Figure out plan for restbase-async w/r/t datacenter switchover
Closed, Resolved, Public

Description

On https://wikitech.wikimedia.org/wiki/Switch_Datacenter#Services there are special instructions for restbase-async:

Restbase is a bit of a special case, and needs an additional step, if we're just switching active traffic over and not simulating a complete failover:

# pool restbase-async everywhere, then depool restbase-async in the newly active dc, so that async traffic is separated from real-users traffic as much as possible.
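To make the quoted step concrete, here is a minimal sketch of the pooled state it produces. It is only a toy model: the function name, the two-datacenter tuple, and the dict representation are assumptions for illustration, not the cookbook's or conftool's actual interface.

```python
# Toy model of the "pool everywhere, then depool in the newly active DC" step.
# Assumption: only the two core datacenters matter here.
DATACENTERS = ("eqiad", "codfw")


def restbase_async_pooled(active_dc, datacenters=DATACENTERS):
    """Return {dc: pooled?} for restbase-async after the special step."""
    if active_dc not in datacenters:
        raise ValueError(f"unknown datacenter: {active_dc}")
    state = {dc: True for dc in datacenters}  # step 1: pool everywhere
    state[active_dc] = False                  # step 2: depool in the active DC
    return state


print(restbase_async_pooled("codfw"))  # {'eqiad': True, 'codfw': False}
```

With codfw as the newly active DC, async traffic ends up in eqiad only, which is the separation from real-user traffic the instructions aim for.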

This special case is problematic, and during today's switchover we decided to treat restbase-async like everything else and leave it pooled only in codfw.

We should either:

  • stop treating restbase-async specially and just have it be like any other service, pooled in the active DC
  • implement said special handling in the cookbook so it's not an extra step

Event Timeline

If we implement the special handling in the cookbook, we need to keep in mind that we're going to keep restbase-async pooled only in codfw for as long as possible (about 1 week) during T327920: March 2023 Datacenter Switchover, so we need to be able to skip the "pool everywhere, depool in active DC" step.

Clement_Goubert claimed this task.

The plan we landed on is:

  • Use the sre.discovery.datacenter cookbook to depool eqiad completely, thus switching restbase-async to codfw with the rest of the services
  • Wait for as long as we want to test capacity in codfw (around a week, less if issues arise)
  • Use the sre.discovery.service-route cookbook to switch restbase-async back to eqiad, following the procedure in https://wikitech.wikimedia.org/wiki/Switch_Datacenter#restbase-async
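The sequence of states this plan goes through can be sketched roughly as below. This is a toy simulation only: the helper names, the starting state (both services pooled in both DCs), and the "example-service" placeholder are assumptions for illustration, and none of this reflects the real cookbooks' arguments or behaviour.

```python
# Toy simulation of the plan: pooled state is {service: set of pooled DCs}.
from __future__ import annotations


def depool_datacenter(pooled: dict[str, set[str]], dc: str) -> dict[str, set[str]]:
    """Step 1 (hypothetical helper): depool one DC for every service."""
    return {svc: dcs - {dc} for svc, dcs in pooled.items()}


def route_service(pooled: dict[str, set[str]], service: str, dc: str) -> dict[str, set[str]]:
    """Step 3 (hypothetical helper): pool a single service only in the given DC."""
    updated = {svc: set(dcs) for svc, dcs in pooled.items()}
    updated[service] = {dc}
    return updated


# Assumed starting point; "example-service" stands in for every other service.
state = {
    "restbase-async": {"eqiad", "codfw"},
    "example-service": {"eqiad", "codfw"},
}

state = depool_datacenter(state, "eqiad")
after_depool = state  # everything, including restbase-async, is in codfw only

# Step 2 is the capacity-test wait (around a week); nothing changes in the model.

state = route_service(state, "restbase-async", "eqiad")
after_route = state  # restbase-async is back in eqiad, the rest stays in codfw
```

After the last step the model matches the special handling described in the task description: async traffic is served from the DC that is not handling real-user traffic.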