Page MenuHomePhabricator

Break down monster class: Database
Closed, ResolvedPublic

Description

This class has more than 6000 lines and more than 100 public functions, it does everything related to databases and needs to be broken down. Some ideas that were brought up by @daniel are:

  • Migrate transaction management to its own class and interface (T299698)
  • Move replication-based stuff to its own class, e.g. getLag()
  • Remove legacy cruft from introduction of ResultWrapper: T286694
  • Move schema update and related functions to IMaintainableDatabase or DatabaseUpdater (T190396)
  • Move SQL building code from Database class to SQLPlatform (T307616)

Related Objects

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

The "Move replication-based stuff to its own class" bullet could use some clarity. I'd imagine a lot of that could easily stay in Database/subclasses (due to rdbms specific logic). I guess you could have separate per-rdbms class hierarchies and inject the objects. Was this referring to more of the LoadBalancer code?

Yes, you basically answered yourself. This needs to be per-rdbms replication manager and we also need to move LB and LBFactory logic to it (as much as possible) as well, that would make the class more conceptually cohesive.

Change 745215 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] rdmbs: Start of SQLPlatform to split out of Database

https://gerrit.wikimedia.org/r/745215

Change 745215 merged by jenkins-bot:

[mediawiki/core@master] rdmbs: Start of SQLPlatform to split out of Database

https://gerrit.wikimedia.org/r/745215

Change 785807 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] rdbms: Start using SQLPlatform and move more methods there

https://gerrit.wikimedia.org/r/785807

Change 785807 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Start using SQLPlatform and move more methods there

https://gerrit.wikimedia.org/r/785807

Jdforrester-WMF updated the task description. (Show Details)
Jdforrester-WMF updated the task description. (Show Details)
Jdforrester-WMF updated the task description. (Show Details)

Change 821201 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] rdbms: Move several methods from IDatabase to IMaintainableDatabase

https://gerrit.wikimedia.org/r/821201

Change 821201 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Move several methods from IDatabase to IMaintainableDatabase

https://gerrit.wikimedia.org/r/821201

Change 824717 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] [WIP] [VERY WIP] Moving replication-related code to its own component

https://gerrit.wikimedia.org/r/824717

Change 824722 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] rdbms: Introduce DatabaseFlagsHolder and move some internal logic there

https://gerrit.wikimedia.org/r/824722

Change 825930 had a related patch set uploaded (by Aaron Schulz; author: Aaron Schulz):

[mediawiki/core@master] rdbms: remove unused getTopologyRootPrimary() method from IDatabase

https://gerrit.wikimedia.org/r/825930

Change 825930 merged by jenkins-bot:

[mediawiki/core@master] rdbms: remove getTopologyRootPrimary() and clean up related fields/parameters

https://gerrit.wikimedia.org/r/825930

Change 615684 had a related patch set uploaded (by Tim Starling; author: Legoktm):

[mediawiki/core@master] rdbms: Move Database::factory() to DatabaseFactory service

https://gerrit.wikimedia.org/r/615684

Change 822696 had a related patch set uploaded (by Krinkle; author: Aaron Schulz):

[mediawiki/core@master] rdbms: avoid DB_PRIMARY queries in getLagFromPtHeartbeat()

https://gerrit.wikimedia.org/r/822696

Change 822696 merged by jenkins-bot:

[mediawiki/core@master] rdbms: avoid DB_PRIMARY queries in getLagFromPtHeartbeat()

https://gerrit.wikimedia.org/r/822696

Change 615684 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Move Database::factory() to DatabaseFactory service

https://gerrit.wikimedia.org/r/615684

Change 824722 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Introduce DatabaseFlagsHolder and move some internal logic there

https://gerrit.wikimedia.org/r/824722

Change 824717 merged by jenkins-bot:

[mediawiki/core@master] rdbms: Moving replication-related code to its own component

https://gerrit.wikimedia.org/r/824717

Ladsgroup claimed this task.

One major challenge for engineers while doing refactoring is knowing when to stop and consider it good enough (ref. "Kill it with fire" book). It's hard, there is always room for improvements and doing more. In this case of database class, we can probably move more code to SQLPlatform, probably set up a dedicated class for "Session" state and much more but:

  • It has already changed drastically, went from > 6000 lines to 3900 which 500 lines are just delegating to smaller classes.
    • This is more pronounced in subclasses. DatabaseMysqlBase went from 1600 lines to 800.
  • It's not longer the most complex class of mediawiki according to this metric: https://doc.wikimedia.org/mediawiki-core/master/phpmetrics/complexity.html
  • Its complexity score has been halved.
  • It's much more testable now.
  • I think its size is no longer rdbms' biggest problem. I can think of several other issues in rdbms. Notably undue complexity of LB/LBF, LoadMonitor's mess, removable complexity in Database class itself, reliance of running regex on SQL (e.g. Database::getTempTableWrites) and many more issues that are more important that Database class's size now.

So I'm closing this, we probably would open a new one in one or two years to revisit but in the mean time, let's fix more important issues.