elukey (Luca Toscano)
Site Reliability Engineer - Machine Learning

User Details

User Since
Jan 5 2016, 9:54 PM (454 w, 2 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
LToscano (WMF) [ Global Accounts ]

Recent Activity

Yesterday

elukey added a comment to T332015: Migrate poolcounter hosts to bookworm.

Last step remaining is to decommission the old VMs!

Thu, Sep 19, 3:46 PM · serviceops
elukey added a comment to T375179: sre.network.tls cookbook - CFSSL error: bad request.

This time we have an issue with sign, since a certificate is already there. I verified with manual commands and gencert works fine.

Thu, Sep 19, 3:08 PM · netops, CFSSL-PKI, Infrastructure-Foundations
elukey added a comment to T374443: Move puppet-merge (bash script) to puppetserver1001.

The move was done and everything seems to work as expected!

Thu, Sep 19, 2:40 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey added a comment to T373527: puppetserver1002 thrashing and requiring a power cycle as a result.

puppetserver1001 is also working with the new settings; it was rebooted today after thrashing.

Thu, Sep 19, 7:26 AM · User-Elukey, Infrastructure-Foundations, SRE

Wed, Sep 18

elukey added a comment to T332015: Migrate poolcounter hosts to bookworm.

All poolcounter IPs for MediaWiki/Thumbor are now on Bookworm!

Wed, Sep 18, 1:57 PM · serviceops
elukey added a comment to T373527: puppetserver1002 thrashing and requiring a power cycle as a result.

puppetserver1002 is now running with 35 JRuby workers instead of 48, let's see how it goes at steady state. If everything looks good, we can roll out the change to the rest of the cluster.

Wed, Sep 18, 1:25 PM · User-Elukey, Infrastructure-Foundations, SRE
elukey added a comment to T373527: puppetserver1002 thrashing and requiring a power cycle as a result.

I tried to generate a heap dump with jmap but it is very large and I'd need to copy it to my local laptop to inspect it via VisualVM. There is also an option in jmap to generate a live breakdown in plaintext, but it is full of jruby objects (as expected).

Wed, Sep 18, 9:26 AM · User-Elukey, Infrastructure-Foundations, SRE
elukey added a comment to T374711: keyholder-proxy doesn't restart on config change.

I had a chat with Filippo, the keyholder-proxy is not the daemon that needs re-arming when restarted, so it can be done anytime without extra manual steps.

Wed, Sep 18, 7:43 AM · User-Elukey, Puppet, Keyholder, Infrastructure-Foundations, SRE
elukey closed T331969: Migrate chartmuseum to Bookworm as Resolved.
Wed, Sep 18, 7:42 AM · User-Elukey, serviceops, SRE
elukey closed T331969: Migrate chartmuseum to Bookworm, a subtask of T291916: Tracking task for Bullseye migrations in production, as Resolved.
Wed, Sep 18, 7:41 AM · User-Elukey, Epic, Infrastructure-Foundations, SRE

Tue, Sep 17

elukey added a comment to T332016: Migrate docker registry hosts to bookworm.

registry2005 is now running Bookworm, up and running:

Tue, Sep 17, 5:40 PM · serviceops
elukey closed T374928: codfw: 1 new VM for docker-registry as Resolved.

VM is up and running :)

Tue, Sep 17, 5:39 PM · Patch-For-Review, Infrastructure-Foundations, serviceops
elukey closed T374928: codfw: 1 new VM for docker-registry, a subtask of T332016: Migrate docker registry hosts to bookworm, as Resolved.
Tue, Sep 17, 5:36 PM · serviceops
elukey added a comment to T374443: Move puppet-merge (bash script) to puppetserver1001.

On the infrastructure side we now have:

Tue, Sep 17, 1:05 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey added a comment to T374928: codfw: 1 new VM for docker-registry.
sudo cookbook sre.ganeti.makevm --os bookworm --network private -p 7 --cluster codfw --group B --memory 6 --vcpus 2 --disk 20 registry2005
Tue, Sep 17, 11:14 AM · Patch-For-Review, Infrastructure-Foundations, serviceops
elukey created T374928: codfw: 1 new VM for docker-registry.
Tue, Sep 17, 11:10 AM · Patch-For-Review, Infrastructure-Foundations, serviceops
elukey added a comment to T331969: Migrate chartmuseum to Bookworm.

Last step https://gerrit.wikimedia.org/r/c/integration/config/+/1073426

Tue, Sep 17, 11:06 AM · User-Elukey, serviceops, SRE

Mon, Sep 16

elukey closed T257297: Improve sre.hosts.decommission (additionally find host yaml files) as Declined.

Probably not needed anymore :)

Mon, Sep 16, 2:33 PM · Infrastructure-Foundations, SRE-tools, SRE
elukey lowered the priority of T372485: Spicerack's tox config times out all the time after T342019 from High to Medium.
Mon, Sep 16, 2:18 PM · Patch-For-Review, User-Elukey, Release-Engineering-Team, Infrastructure-Foundations
elukey added a project to T374711: keyholder-proxy doesn't restart on config change: User-Elukey.
Mon, Sep 16, 2:12 PM · User-Elukey, Puppet, Keyholder, Infrastructure-Foundations, SRE
elukey added a comment to T365372: Spicerack: expand Supermicro support in the Redfish module.

Cross-posting from T365167#10148384, where I am testing a reimage for sretest2001.

Mon, Sep 16, 1:30 PM · DC-Ops, Infrastructure-Foundations, SRE-tools, User-Elukey, Spicerack
elukey added a comment to T365167: Q4:rack/setup/install sretest2001.

I checked for RSC in the dump that I made from Redfish, and I see the following:

Mon, Sep 16, 10:47 AM · SRE, Infrastructure-Foundations, ops-codfw, DC-Ops
elukey added a comment to T332016: Migrate docker registry hosts to bookworm.

@JMeybohm could you check the httpbb tests are still relevant and returning the expected results?

Almost. I've uploaded a patch to correct the one thing that seems to have changed.

The plan you laid unfortunately does not work. The httpbb tests, as they are now, do not work against a read-only registry (e.g. eqiad at the moment). But I would argue that we should be able to just add registry2005 (depooled) with bookworm, test against that and then decom one of the old ones (or create two new VMs and decom both of the old ones).

Mon, Sep 16, 10:44 AM · serviceops
elukey added a comment to T332015: Migrate poolcounter hosts to bookworm.

Thumbor has been migrated to the new poolcounter VMs, and the MW network policies support the new VM's IPs.

Mon, Sep 16, 10:26 AM · serviceops
elukey claimed T331969: Migrate chartmuseum to Bookworm.
Mon, Sep 16, 10:20 AM · User-Elukey, serviceops, SRE
elukey added a comment to T331969: Migrate chartmuseum to Bookworm.

The reimage of 2001 went fine, I just repooled it. Let's wait for a day before moving to 1001, so if anything weird comes up we'll have a quick way to fix it (depool 2001).

Mon, Sep 16, 10:19 AM · User-Elukey, serviceops, SRE
elukey added a comment to T365167: Q4:rack/setup/install sretest2001.

@Jhancock.wm you are totally right, thanks a lot! I was able to force PXE on a 10G port by setting the first RSC-W-66G4 option to Legacy. I hope to find an option in Redfish to enable it from the provision cookbook.

Mon, Sep 16, 9:08 AM · SRE, Infrastructure-Foundations, ops-codfw, DC-Ops
elukey added a comment to T331969: Migrate chartmuseum to Bookworm.

Due to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1035854, the VM's RAM was bumped to 2G.

Mon, Sep 16, 8:39 AM · User-Elukey, serviceops, SRE

Fri, Sep 13

elukey added a comment to T332015: Migrate poolcounter hosts to bookworm.

All new VMs created!

Fri, Sep 13, 10:03 AM · serviceops
elukey closed T374629: eqiad: 2 VM request for poolcounter as Resolved.
Fri, Sep 13, 9:54 AM · SRE, Infrastructure-Foundations, vm-requests
elukey closed T374629: eqiad: 2 VM request for poolcounter, a subtask of T332015: Migrate poolcounter hosts to bookworm, as Resolved.
Fri, Sep 13, 9:53 AM · serviceops
elukey added a comment to T374629: eqiad: 2 VM request for poolcounter.
+-------+-------+-----------+----------+-----------+---------+-----------+
| Group | Nodes | Instances |  MFree   | MFree avg |  DFree  | DFree avg |
+-------+-------+-----------+----------+-----------+---------+-----------+
|   A   |   8   |     30    | 331.2GiB |  41.4GiB  | 16.9TiB |   2.1TiB  |
|   B   |   7   |     33    | 242.3GiB |  34.6GiB  | 12.1TiB |   1.7TiB  |
|   C   |   8   |     30    | 327.5GiB |  40.9GiB  | 15.9TiB |   2.0TiB  |
|   D   |   6   |     32    | 207.3GiB |  34.6GiB  | 10.9TiB |   1.8TiB  |
+-------+-------+-----------+----------+-----------+---------+-----------+
Fri, Sep 13, 7:43 AM · SRE, Infrastructure-Foundations, vm-requests
elukey added a parent task for T374629: eqiad: 2 VM request for poolcounter: T332015: Migrate poolcounter hosts to bookworm.
Fri, Sep 13, 7:42 AM · SRE, Infrastructure-Foundations, vm-requests
elukey added a subtask for T332015: Migrate poolcounter hosts to bookworm: T374629: eqiad: 2 VM request for poolcounter.
Fri, Sep 13, 7:42 AM · serviceops

Thu, Sep 12

elukey updated the task description for T374629: eqiad: 2 VM request for poolcounter.
Thu, Sep 12, 3:05 PM · SRE, Infrastructure-Foundations, vm-requests
elukey created T374629: eqiad: 2 VM request for poolcounter.
Thu, Sep 12, 3:04 PM · SRE, Infrastructure-Foundations, vm-requests
elukey closed T374520: codfw: 2 VM request for poolcounter, a subtask of T332015: Migrate poolcounter hosts to bookworm, as Resolved.
Thu, Sep 12, 3:03 PM · serviceops
elukey closed T374520: codfw: 2 VM request for poolcounter as Resolved.
Thu, Sep 12, 3:03 PM · vm-requests, Infrastructure-Foundations, SRE
elukey added a comment to T365372: Spicerack: expand Supermicro support in the Redfish module.

Nasty issue found for sretest2001: T365167#10140713

In the provision cookbook we loop through the NICs, look for the one whose link status is up, and set it as the default PXE NIC. In this case Redfish for Supermicro doesn't return any usable value to us, so our logic cannot be applied. It is unclear where the problem lies; we'll have to check more hosts to confirm.
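
The selection logic described in this comment could look roughly like the sketch below. It is only an illustration, not the cookbook's actual code: the function name `pick_pxe_nic` and the sample payloads are hypothetical, and the `Id` / `LinkStatus` field names are assumed from the DMTF Redfish EthernetInterface schema rather than taken from the task.

```python
from typing import Optional


def pick_pxe_nic(nics: list[dict]) -> Optional[str]:
    """Return the Id of the first NIC that reports an active link, else None.

    Assumption: each dict mimics a Redfish EthernetInterface resource,
    where LinkStatus is one of "LinkUp", "NoLink" or "LinkDown".
    """
    for nic in nics:
        if nic.get("LinkStatus") == "LinkUp":
            return nic.get("Id")
    return None


# Hypothetical payloads, for illustration only.
healthy = [
    {"Id": "1", "LinkStatus": "NoLink"},
    {"Id": "2", "LinkStatus": "LinkUp"},
]
# Mimics the failure mode described above: no usable link status at all.
no_link_info = [
    {"Id": "1", "LinkStatus": None},
    {"Id": "2"},
]

print(pick_pxe_nic(healthy))
print(pick_pxe_nic(no_link_info))
```

When no interface reports `LinkUp`, the function returns `None`, so a caller would need a fallback (for example a manual BIOS setting) instead of guessing a NIC.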

Thu, Sep 12, 2:47 PM · DC-Ops, Infrastructure-Foundations, SRE-tools, User-Elukey, Spicerack
elukey added a comment to T365372: Spicerack: expand Supermicro support in the Redfish module.

Nasty issue found for sretest2001: T365167#10140713

Thu, Sep 12, 1:22 PM · DC-Ops, Infrastructure-Foundations, SRE-tools, User-Elukey, Spicerack
elukey added a comment to T365167: Q4:rack/setup/install sretest2001.

Something not really great: on sretest2001 one of the 10G interfaces has a link up, which I can confirm via BIOS but not via Redfish.

Thu, Sep 12, 1:18 PM · SRE, Infrastructure-Foundations, ops-codfw, DC-Ops
elukey renamed T374520: codfw: 2 VM request for poolcounter from codfw: 1 VM request for poolcounter to codfw: 2 VM request for poolcounter.
Thu, Sep 12, 12:46 PM · vm-requests, Infrastructure-Foundations, SRE
elukey reopened T374520: codfw: 2 VM request for poolcounter, a subtask of T332015: Migrate poolcounter hosts to bookworm, as Open.
Thu, Sep 12, 12:46 PM · serviceops
elukey reopened T374520: codfw: 2 VM request for poolcounter as "Open".

Using this task to create another VM, poolcounter2006.

Thu, Sep 12, 12:45 PM · vm-requests, Infrastructure-Foundations, SRE
elukey added a comment to T332015: Migrate poolcounter hosts to bookworm.

Moved thumbor codfw to poolcounter2005, everything worked nicely.

Thu, Sep 12, 12:45 PM · serviceops
elukey added a comment to T365167: Q4:rack/setup/install sretest2001.

Updated https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1071553 and tested it, it seems to be working. I kicked off a reimage of sretest2001, and I ended up with:

Thu, Sep 12, 9:36 AM · SRE, Infrastructure-Foundations, ops-codfw, DC-Ops
elukey added a comment to T373527: puppetserver1002 thrashing and requiring a power cycle as a result.

It happened again, this time to puppetserver1001. Amir was in the middle of a puppet-merge and it got stuck. OOM killer acting on the puppetserver's JVM :(

Thu, Sep 12, 9:16 AM · User-Elukey, Infrastructure-Foundations, SRE
elukey added a comment to T365167: Q4:rack/setup/install sretest2001.

Thanks! I created a diff from the settings dumped before your fix(es) and after, from the Redfish point of view.

Thu, Sep 12, 8:51 AM · SRE, Infrastructure-Foundations, ops-codfw, DC-Ops

Wed, Sep 11

elukey reopened T369493: Migrate ml-staging/ml-serve clusters off of Pod Security Policies as "Open".

@klausman I was reviewing with Janis the status of the migration, I think that some steps are missing, please check https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters/PSP_replacement

Wed, Sep 11, 2:38 PM · Machine-Learning-Team, Kubernetes
elukey moved T369493: Migrate ml-staging/ml-serve clusters off of Pod Security Policies from 2024-2025 Q1 Done to Unsorted on the Machine-Learning-Team board.
Wed, Sep 11, 2:38 PM · Machine-Learning-Team, Kubernetes
elukey reopened T369493: Migrate ml-staging/ml-serve clusters off of Pod Security Policies, a subtask of T273507: PodSecurityPolicies will be deprecated with Kubernetes 1.21, as Open.
Wed, Sep 11, 2:36 PM · Patch-For-Review, serviceops, Prod-Kubernetes
elukey added a comment to T369491: Migrate aux cluster off of Pod Security Policies.

Filed a PR for the upstream jaeger chart: https://github.com/jaegertracing/helm-charts/pull/600

Wed, Sep 11, 2:07 PM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, Kubernetes
elukey closed T369491: Migrate aux cluster off of Pod Security Policies as Resolved.

AUX migrated to PSS!

Wed, Sep 11, 1:46 PM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, Kubernetes
elukey closed T369491: Migrate aux cluster off of Pod Security Policies, a subtask of T273507: PodSecurityPolicies will be deprecated with Kubernetes 1.21, as Resolved.
Wed, Sep 11, 1:45 PM · Patch-For-Review, serviceops, Prod-Kubernetes
elukey added a comment to T332015: Migrate poolcounter hosts to bookworm.

The poolcounter2005 host is up with Bookworm, and as far as I can see it seems to be working fine.

Wed, Sep 11, 1:33 PM · serviceops
elukey closed T374520: codfw: 2 VM request for poolcounter as Resolved.
Wed, Sep 11, 1:32 PM · vm-requests, Infrastructure-Foundations, SRE
elukey closed T374520: codfw: 2 VM request for poolcounter, a subtask of T332015: Migrate poolcounter hosts to bookworm, as Resolved.
Wed, Sep 11, 1:32 PM · serviceops
elukey added a subtask for T332015: Migrate poolcounter hosts to bookworm: T374520: codfw: 2 VM request for poolcounter.
Wed, Sep 11, 10:00 AM · serviceops
elukey added a parent task for T374520: codfw: 2 VM request for poolcounter: T332015: Migrate poolcounter hosts to bookworm.
Wed, Sep 11, 10:00 AM · vm-requests, Infrastructure-Foundations, SRE
elukey added a comment to T374520: codfw: 2 VM request for poolcounter.

@MoritzMuehlenhoff I'd proceed with the creation of poolcounter2005 in row A if you are ok, using sre.ganeti.makevm.

Wed, Sep 11, 10:00 AM · vm-requests, Infrastructure-Foundations, SRE
elukey renamed T374520: codfw: 2 VM request for poolcounter from codfw: 1 VM %request for poolcounter to codfw: 1 VM request for poolcounter.
Wed, Sep 11, 9:56 AM · vm-requests, Infrastructure-Foundations, SRE
elukey added a comment to T374520: codfw: 2 VM request for poolcounter.
+-------+-------+-----------+----------+-----------+---------+-----------+
| Group | Nodes | Instances |  MFree   | MFree avg |  DFree  | DFree avg |
+-------+-------+-----------+----------+-----------+---------+-----------+
|   A   |   6   |     21    | 265.9GiB |  44.3GiB  | 13.4TiB |   2.2TiB  |
|   B   |   6   |     22    | 250.8GiB |  41.8GiB  | 13.3TiB |   2.2TiB  |
|   C   |   6   |     23    | 247.8GiB |  41.3GiB  | 10.7TiB |   1.8TiB  |
|   D   |   6   |     24    | 256.7GiB |  42.8GiB  | 11.7TiB |   2.0TiB  |
+-------+-------+-----------+----------+-----------+---------+-----------+
Wed, Sep 11, 9:56 AM · vm-requests, Infrastructure-Foundations, SRE
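
A capacity table like the one above can be read programmatically. The sketch below parses it and picks the group with the highest average free memory per node. This is an assumed heuristic for illustration only: the task does not say how the target group was actually chosen, and the helper names (`parse_size_gib`, `parse_table`, `roomiest_group`) are hypothetical.

```python
CODFW_TABLE = """\
+-------+-------+-----------+----------+-----------+---------+-----------+
| Group | Nodes | Instances |  MFree   | MFree avg |  DFree  | DFree avg |
+-------+-------+-----------+----------+-----------+---------+-----------+
|   A   |   6   |     21    | 265.9GiB |  44.3GiB  | 13.4TiB |   2.2TiB  |
|   B   |   6   |     22    | 250.8GiB |  41.8GiB  | 13.3TiB |   2.2TiB  |
|   C   |   6   |     23    | 247.8GiB |  41.3GiB  | 10.7TiB |   1.8TiB  |
|   D   |   6   |     24    | 256.7GiB |  42.8GiB  | 11.7TiB |   2.0TiB  |
+-------+-------+-----------+----------+-----------+---------+-----------+
"""


def parse_size_gib(value: str) -> float:
    """Convert a size such as '44.3GiB' or '2.2TiB' to GiB."""
    if value.endswith("TiB"):
        return float(value[:-3]) * 1024
    if value.endswith("GiB"):
        return float(value[:-3])
    raise ValueError(f"unexpected unit in {value!r}")


def parse_table(table: str) -> list[dict[str, str]]:
    """Turn the ASCII table into one dict per data row, keyed by header."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table.splitlines()
        if line.strip().startswith("|")  # skip the +---+ border lines
    ]
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def roomiest_group(table: str) -> str:
    """Return the group with the most average free memory per node."""
    groups = parse_table(table)
    best = max(groups, key=lambda g: parse_size_gib(g["MFree avg"]))
    return best["Group"]


print(roomiest_group(CODFW_TABLE))  # prints "A"
```

Other criteria (free disk, instance count, spreading VMs of the same service across rows) may matter just as much in practice; the sketch only ranks by one column.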
elukey updated subscribers of T374520: codfw: 2 VM request for poolcounter.
Wed, Sep 11, 9:55 AM · vm-requests, Infrastructure-Foundations, SRE
elukey created T374520: codfw: 2 VM request for poolcounter.
Wed, Sep 11, 9:54 AM · vm-requests, Infrastructure-Foundations, SRE
elukey added a comment to T369491: Migrate aux cluster off of Pod Security Policies.

Found a violation:

Wed, Sep 11, 9:16 AM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, Kubernetes
elukey closed T332011: Migrate dragonfly-supernodes to Bookworm, a subtask of T291916: Tracking task for Bullseye migrations in production, as Resolved.
Wed, Sep 11, 8:55 AM · User-Elukey, Epic, Infrastructure-Foundations, SRE
elukey closed T332011: Migrate dragonfly-supernodes to Bookworm as Resolved.

Both nodes on Bookworm!

Wed, Sep 11, 8:55 AM · User-Elukey, serviceops, SRE

Tue, Sep 10

elukey moved T374443: Move puppet-merge (bash script) to puppetserver1001 from Backlog to In Progress on the User-Elukey board.
Tue, Sep 10, 3:00 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey added a comment to T374443: Move puppet-merge (bash script) to puppetserver1001.
elukey@config-master1001:~$ curl https://puppetserver1001.eqiad.wmnet/puppet-sha1.txt
68278f7164f8b827af56282c0ac8664010886b8d
Tue, Sep 10, 2:50 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey awarded T327396: Migrate Kartotherian to node-mapnik v4.2.1 and unfork a Party Time token.
Tue, Sep 10, 1:10 PM · Essential-Work, Content-Transform-Team-WIP, Patch-Needs-Improvement, WMDE-GeoInfo-FocusArea, Maps (Kartotherian)
elukey updated the task description for T374443: Move puppet-merge (bash script) to puppetserver1001.
Tue, Sep 10, 12:48 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey claimed T374443: Move puppet-merge (bash script) to puppetserver1001.
Tue, Sep 10, 12:44 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey triaged T374443: Move puppet-merge (bash script) to puppetserver1001 as High priority.
Tue, Sep 10, 12:44 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey created T374443: Move puppet-merge (bash script) to puppetserver1001.
Tue, Sep 10, 12:39 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey added a comment to T365167: Q4:rack/setup/install sretest2001.

@Jhancock.wm @Papaul Hi! If you have time I have another strange thing to figure out.

Tue, Sep 10, 9:16 AM · SRE, Infrastructure-Foundations, ops-codfw, DC-Ops

Mon, Sep 9

elukey removed projects from T374233: Migrate the ownership of ML-Owned Docker images in production-images repo to mailing lists: serviceops, Infrastructure-Foundations.
Mon, Sep 9, 2:27 PM · Machine-Learning-Team
elukey added a comment to T374233: Migrate the ownership of ML-Owned Docker images in production-images repo to mailing lists.

I'd also add the knative images:

Mon, Sep 9, 2:26 PM · Machine-Learning-Team
elukey triaged T374073: Unified pattern for RemoteHosts accessors in Spicerack as Medium priority.
Mon, Sep 9, 2:22 PM · User-Elukey, Spicerack, SRE-tools, Infrastructure-Foundations
elukey added a comment to T332015: Migrate poolcounter hosts to bookworm.

Better procedure after chatting with Moritz:

Mon, Sep 9, 10:14 AM · serviceops
elukey added a comment to T332015: Migrate poolcounter hosts to bookworm.

Hey folks, as far as I can tell both poolcounter (debian upstream) and poolcounter-prometheus-exporter (bookworm-wikimedia) are already good to go, so we could attempt a reimage of one of the nodes (namely I can try)?

Mon, Sep 9, 9:09 AM · serviceops
elukey added a comment to T332011: Migrate dragonfly-supernodes to Bookworm.

First node reimaged! Everything looks good afaics.

Mon, Sep 9, 8:56 AM · User-Elukey, serviceops, SRE
elukey added a comment to T331969: Migrate chartmuseum to Bookworm.

Tried to file a patch but I realized that we don't have the helm package for Bookworm/Bullseye, so the build fails. I am wondering if the current version of chartmuseum requires Helm 2 or if we could use Helm 3, but maybe we need to upgrade.

Mon, Sep 9, 8:34 AM · User-Elukey, serviceops, SRE

Fri, Sep 6

elukey added a comment to T365372: Spicerack: expand Supermicro support in the Redfish module.

Great news, the first version of the Supermicro support in provision is live on cumin nodes (namely the cookbook now supports it).

Fri, Sep 6, 4:04 PM · DC-Ops, Infrastructure-Foundations, SRE-tools, User-Elukey, Spicerack
elukey added a comment to T365167: Q4:rack/setup/install sretest2001.

My bad, it was because my factory reset for some reason didn't restore the ADMIN password to its original state. Thanks for the follow up!

Fri, Sep 6, 4:03 PM · SRE, Infrastructure-Foundations, ops-codfw, DC-Ops
elukey added a comment to T368023: Move the private Puppet repository to puppetserver1001.

Next and last step - wait for the new conftool release, and then close!

Fri, Sep 6, 2:48 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey added a comment to T368023: Move the private Puppet repository to puppetserver1001.

Tried to update Wikitech and https://wikitech.wikimedia.org/wiki/Puppet#Private_puppet, the documentation should be relatively good now.

Fri, Sep 6, 2:45 PM · Patch-For-Review, User-Elukey, Puppet-Infrastructure, SRE, Infrastructure-Foundations
elukey added a comment to T365372: Spicerack: expand Supermicro support in the Redfish module.

I've released spicerack 8.13.0, which collects the latest changes for the redfish module, and installed it on cumin2002. The cookbook seems ready to go (https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/10378060) but I'd like to test it on sretest2001 first. I have factory-reset it, but now I think it is missing the Redfish license, so I need to wait for DC-Ops to redeploy it.

Fri, Sep 6, 1:49 PM · DC-Ops, Infrastructure-Foundations, SRE-tools, User-Elukey, Spicerack
elukey moved T372485: Spicerack's tox config times out all the time after T342019 from In Progress to Stalled on the User-Elukey board.
Fri, Sep 6, 1:42 PM · Patch-For-Review, User-Elukey, Release-Engineering-Team, Infrastructure-Foundations
elukey added a comment to T369491: Migrate aux cluster off of Pod Security Policies.

Next steps:

Fri, Sep 6, 1:26 PM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, Kubernetes
elukey added a comment to T365167: Q4:rack/setup/install sretest2001.

@Jhancock.wm Hi! I tried to factory reset sretest2001's BMC, and now I am getting some errors when using the Redfish API (unauthorized etc.). I am wondering if the factory reset deleted the license to use Redfish too. If so, could you please re-add it? Thanks in advance!

Fri, Sep 6, 10:47 AM · SRE, Infrastructure-Foundations, ops-codfw, DC-Ops
elukey closed T367981: Update Proton to include Chromium 128.0.6613.119-1 as Resolved.

Deployed :)

Fri, Sep 6, 8:57 AM · Content-Transform-Team-WIP, Essential-Work, Proton
elukey added a project to T332011: Migrate dragonfly-supernodes to Bookworm: User-Elukey.
Fri, Sep 6, 8:53 AM · User-Elukey, serviceops, SRE
elukey added a comment to T332011: Migrate dragonfly-supernodes to Bookworm.

Next steps:

  • reimage codfw outside the deployment window
  • let it bake for some days
  • do the same for eqiad
Fri, Sep 6, 8:53 AM · User-Elukey, serviceops, SRE
elukey renamed T332011: Migrate dragonfly-supernodes to Bookworm from Migrate dragonfly-supernodes to bullseye to Migrate dragonfly-supernodes to Bookworm.
Fri, Sep 6, 8:52 AM · User-Elukey, serviceops, SRE
elukey renamed T367981: Update Proton to include Chromium 128.0.6613.119-1 from Update Proton to include Chromium 126.0.6478.126 to Update Proton to include Chromium 128.0.6613.119-1.
Fri, Sep 6, 8:31 AM · Content-Transform-Team-WIP, Essential-Work, Proton
elukey added a comment to T367981: Update Proton to include Chromium 128.0.6613.119-1.

As an FYI, I have been taking care of deployments of new versions of Proton; a new announcement went out yesterday and I filed https://gerrit.wikimedia.org/r/c/mediawiki/services/chromium-render/+/1071133.

Fri, Sep 6, 8:24 AM · Content-Transform-Team-WIP, Essential-Work, Proton
elukey added a comment to T369491: Migrate aux cluster off of Pod Security Policies.
root@deploy1003:~# kube-env admin aux-k8s-eqiad
Fri, Sep 6, 8:08 AM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, Kubernetes

Thu, Sep 5

elukey added a project to T374073: Unified pattern for RemoteHosts accessors in Spicerack: User-Elukey.
Thu, Sep 5, 2:51 PM · User-Elukey, Spicerack, Infrastructure-Foundations, SRE-tools
elukey added a comment to T371899: Review how the debmonitor server processes hosts/images when starting fresh.

Important bit after a discussion with Riccardo - the debmonitor DB is already replicated (eqiad -> codfw at the moment) since it is hosted on M2-Master, and the replication/backup is handled by Data Persistence. They also handle which DC is "active", and this is transparent to us since we resolve the m2-master DNS record (which points to whichever M2 master is currently active).

Thu, Sep 5, 7:10 AM · User-Elukey, Infrastructure-Foundations

Wed, Sep 4

elukey closed T276443: Formalize and share the spicerack/cumin release process as Resolved.

We now have https://gitlab.wikimedia.org/repos/sre/python-release, which basically documents how to release Spicerack and other similar projects. Please always ping the SRE Infra Foundations team before doing anything :)

Wed, Sep 4, 3:46 PM · SRE-tools, Cumin, Infrastructure-Foundations, Spicerack