Reduce concurrency of RecordLintJobs or shard it per section
Open, Needs Triage, Public

Assigned To

None

Authored By

	• Ladsgroup
	Jul 22 2024, 10:10 AM

Description

This has contributed to the outages we have had in the past couple of weeks (see the parent ticket). The concurrency should go down to avoid overwhelming the primary database with too many writes.

Related Objects

		Status	Subtype	Assigned	Task
		Resolved	Security	• Ladsgroup	T370304 Bursts of occasional severe contention on s4 (commonswiki) primary mariadb causing recurrent user-facing outages on all wikis
		Open		None	T370624 Reduce concurrency of RecordLintJobs or shard it per section

Event Timeline

• Ladsgroup created this task. Jul 22 2024, 10:10 AM

Restricted Application added a subscriber: Aklapper. Jul 22 2024, 10:10 AM

MSantos renamed this task from "Reduce concurrancy of RecordLintJobs or shard it per section" to "Reduce concurrency of RecordLintJobs or shard it per section". Jul 25 2024, 2:11 PM

MSantos edited projects, added Content-Transform-Team-WIP, RESTBase Sunsetting; removed Content-Transform-Team.

MSantos added subscribers: Jgiannelos, daniel.

Should this be done in the job queue? Or is there something we can do inside RecordLintJob? Is there an example of other jobs that are sharded by section?

Yeah, grep for partitioned_jobs_config in helm deployment charts.

That being said, if we move linter tables to x1, it won't be needed anymore.
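To make the "shard it per section" idea concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not how `partitioned_jobs_config` or change-propagation actually work: the job shape, the function names, the wiki-to-section table and the default section are all assumptions made up for the example. The idea is that jobs are grouped by the database section their wiki lives on, and each group is drained with its own small concurrency cap, so a burst of jobs for one wiki cannot pile writes onto a single primary.

```python
from collections import defaultdict
from itertools import islice

# Illustrative wiki -> database section table (an assumption for this sketch;
# the real mapping lives in the production database configuration).
SECTION_BY_WIKI = {"commonswiki": "s4", "enwiki": "s1", "wikidatawiki": "s8"}
DEFAULT_SECTION = "s3"  # assumed fallback for wikis not listed above


def partition_by_section(jobs):
    """Group jobs by the database section of the wiki they target."""
    partitions = defaultdict(list)
    for job in jobs:
        section = SECTION_BY_WIKI.get(job["wiki"], DEFAULT_SECTION)
        partitions[section].append(job)
    return dict(partitions)


def batches(jobs, concurrency):
    """Yield lists of at most `concurrency` jobs, i.e. what may run at once."""
    it = iter(jobs)
    while batch := list(islice(it, concurrency)):
        yield batch


if __name__ == "__main__":
    # Hypothetical job payloads; only the "wiki" key matters here.
    queue = [
        {"wiki": "commonswiki", "page": 1},
        {"wiki": "commonswiki", "page": 2},
        {"wiki": "commonswiki", "page": 3},
        {"wiki": "enwiki", "page": 4},
        {"wiki": "testwiki", "page": 5},
    ]
    for section, section_jobs in partition_by_section(queue).items():
        for batch in batches(section_jobs, concurrency=2):
            print(section, [job["page"] for job in batch])
```

With a cap of 2 per section, the three commonswiki jobs above are split into two batches on s4, while the jobs for other sections are unaffected by that backlog.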

In T370624#10027452, @Ladsgroup wrote:

Yeah, grep for partitioned_jobs_config in helm deployment charts.

OK, I found https://gerrit.wikimedia.org/g/mediawiki/services/change-propagation/jobqueue-deploy/+/05420ad000caa34a9351de4774d0196a860ca869/scap/vars.yaml#88, but I think this is a bit past what I feel comfortable doing, so I'll leave it for someone else.

I'll note that T330036#9791309 will also address it by moving the updates into refreshLinks rather than having a separate job.

In T370624#10027453, @Ladsgroup wrote:

That being said, if we move linter tables to x1, it won't be needed anymore.

This seems harder to do given the joins we're already doing in queries. I don't want to make it more difficult for editors to get access to the linter data :/

mdaniels5757 subscribed. Aug 21 2024, 12:36 AM
