
Reduce concurrency of RecordLintJobs or shard it per section
Open, Needs Triage, Public

Description

The high concurrency of RecordLintJobs has contributed to the outages we have had in the past couple of weeks (see the parent ticket). The concurrency should be reduced to avoid overwhelming the primary database with too many writes.

Event Timeline

MSantos renamed this task from Reduce concurrancy of RecordLintJobs or shard it per section to Reduce concurrency of RecordLintJobs or shard it per section. Jul 25 2024, 2:11 PM
MSantos added subscribers: Jgiannelos, daniel.

Should this be done in the job queue? Or is there something we can do inside RecordLintJob? Is there an example of other jobs that are sharded by section?

Yeah, grep for partitioned_jobs_config in helm deployment charts.

That being said, if we move linter tables to x1, it won't be needed anymore.
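For anyone unfamiliar with the idea, here is a rough sketch of what "sharding per section" means in principle: jobs are grouped by the database section their wiki lives on, and each section gets its own concurrency cap so that no single primary takes all the writes. This is not the actual partitioned_jobs_config format or the change-propagation implementation. The mapping table, the default section, and the function names (`section_for`, `partition_jobs`, `next_batch`) are all hypothetical and only there for illustration.

```python
from collections import defaultdict

# Hypothetical wiki-to-section mapping, for illustration only.
# The real mapping lives in the production database configuration.
SECTION_BY_WIKI = {
    "enwiki": "s1",
    "commonswiki": "s4",
    "dewiki": "s5",
}
DEFAULT_SECTION = "s3"  # assumption: unlisted wikis fall into a default section


def section_for(wiki):
    """Return the database section a wiki belongs to."""
    return SECTION_BY_WIKI.get(wiki, DEFAULT_SECTION)


def partition_jobs(jobs):
    """Group jobs by section, keeping arrival order within each section."""
    partitions = defaultdict(list)
    for job in jobs:
        partitions[section_for(job["wiki"])].append(job)
    return dict(partitions)


def next_batch(partitions, per_section_limit):
    """Pick at most per_section_limit jobs from the front of each section's queue.

    Capping per section bounds the number of concurrent writes that any
    one primary database receives.
    """
    return {
        section: queue[:per_section_limit]
        for section, queue in partitions.items()
    }
```

With a per-section limit, a burst of jobs for one large wiki only saturates that wiki's section, and the other sections keep draining at their own pace.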

Yeah, grep for partitioned_jobs_config in helm deployment charts.

OK, I found https://gerrit.wikimedia.org/g/mediawiki/services/change-propagation/jobqueue-deploy/+/05420ad000caa34a9351de4774d0196a860ca869/scap/vars.yaml#88, but I think this is probably a bit past what I feel comfortable doing, so I'll leave it for someone else.

I'll note that T330036#9791309 will also address it by moving the updates into refreshLinks rather than having a separate job.

That being said, if we move linter tables to x1, it won't be needed anymore.

This seems harder to do given the joins we're already doing in queries. I don't want to make it more difficult for editors to get access to the linter data :/