Wikimedia's User-Agent policy specifically forbids using generic values for the User-Agent request header.
Apply stricter rate limiting to requests violating the policy.
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T224891 Rate limit requests in violation of User-Agent policy more aggressively
Open | | CDanis | T313634 Survey the third-party library market for UA policy compliance
Change 514017 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache_upload: return HTTP 403 to requests violating UA policy
Change 514017 merged by Ema:
[operations/puppet@production] cache_upload: return HTTP 403 to requests violating UA policy
For Tech News: Bots and other scripts that do not set an identifiable User-Agent may find their requests blocked until they identify themselves properly.
Not sure if it applies here, but please remember that we allow Api-User-Agent as an alternative to User-Agent for JavaScript solutions.
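For bot and script authors reading along, here is a minimal Python sketch of what "identifying yourself" could look like in practice. The function name, the `tool/version (contact)` layout, and the example values are illustrative assumptions, not an official format; the only details taken from this thread are that generic User-Agent values are not acceptable and that Api-User-Agent is accepted where User-Agent cannot be set.

```python
def identifying_headers(tool: str, version: str, contact: str,
                        can_set_user_agent: bool = True) -> dict:
    """Build request headers that identify a client.

    Hypothetical helper: the 'tool/version (contact)' layout is an
    assumption chosen for illustration, not a mandated format.
    """
    value = f"{tool}/{version} ({contact})"
    # Browser JavaScript usually cannot override User-Agent, so the
    # thread notes Api-User-Agent is allowed as an alternative there.
    name = "User-Agent" if can_set_user_agent else "Api-User-Agent"
    return {name: value}


# Example values are placeholders.
headers = identifying_headers("ExampleBot", "1.0", "ops@example.org")
```

The resulting dictionary can be passed to whichever HTTP client the script already uses.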
We (Traffic) have decided to continue allowing requests violating the UA policy. Instead of blocking them, we will apply stricter rate limiting to those.
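To illustrate the idea of "allow, but rate limit more strictly", here is a rough Python sketch of a two-tier token bucket. It is not the Varnish configuration from the patches in this task; the list of generic prefixes, the rates, and all names are assumptions made up for the example.

```python
# Hypothetical examples of generic values; the real policy check may differ.
GENERIC_PREFIXES = ("python-requests", "python-urllib", "curl/", "wget/")


def violates_ua_policy(user_agent):
    """Treat a missing, empty, or generic User-Agent as a violation."""
    ua = (user_agent or "").strip().lower()
    return ua == "" or ua.startswith(GENERIC_PREFIXES)


class TokenBucket:
    """Refill `rate` tokens per second, holding at most `burst` tokens."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.updated = 0.0

    def allow(self, now):
        # Refill for the elapsed time, capped at the burst size.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def bucket_for(user_agent):
    # Assumed numbers: violating clients get a tenth of the normal budget.
    if violates_ua_policy(user_agent):
        return TokenBucket(rate=1.0, burst=2.0)
    return TokenBucket(rate=10.0, burst=20.0)
```

In this sketch a request is never rejected for its User-Agent alone; it only exhausts its budget sooner, which matches the decision described in the comment above.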
Change 513596 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] varnish: cache_upload rate limit
Change 513596 merged by Ema:
[operations/puppet@production] varnish: cache_upload miss/pass rate limit
TechNews: I've added it to the upcoming edition with this edit, that will be frozen for translation in about 18 hours. Please amend it before then if needed. (And thank you @Legoktm for writing the initial version!). Cheers!
Even with the current rate limiting, some crawlers regularly cause issues, wasting precious SRE time.
I'd like to revisit this task to be more strict on user agents, maybe progressively increasing the way we enforce our policy. For example:
A variant could be to only apply the above on the upload cluster, but the fewer exceptions the better.
Agreed to all that, though I would not exempt WMCS because WMCS can generate significant amounts of traffic much faster by virtue of already being in the cluster and people using WMCS are generally Wikimedians who should be more familiar with our policies than someone who just wants to scrape wiki pages.
I would also add that after a DoS ~2 months ago I spent a while working on advertising the UA policy and our general API usage guidelines: [1], [2].
We responded to another set of pages today and most of the offending requests were coming from a public Cloud with no User-agent, so we've banned those requests from the upload cluster: https://gerrit.wikimedia.org/r/702003
I'm not really sure who or which team needs to approve this or whether no one opposes it and someone just needs to do it.
Change 702896 had a related patch set uploaded (by Ema; author: Ema):
[operations/puppet@production] varnish: use 403 instead of 429 where appropriate
Change 702896 merged by Ema:
[operations/puppet@production] varnish: use 403 instead of 429 where appropriate
The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all tickets that are neither part of our current planned work nor clearly a recent, higher-priority emergent issue. This is simply one step in a larger task cleanup effort. Further triage of these tickets (and especially, organizing future potential project ideas from them into a new medium) will occur afterwards! For more detail, have a look at the extended explanation on the main page of Traffic-Icebox. Thank you!
Change 740818 had a related patch set uploaded (by Jbond; author: jbond):
[operations/puppet@production] R:varnish:instance: Add genral public cloud rate limiting
Change 740828 had a related patch set uploaded (by Jbond; author: jbond):
[operations/puppet@production] R:varnish:instance: Add hiere key to control cloud ratelimits
Change 740818 merged by Jbond:
[operations/puppet@production] R:varnish:instance: Add general public cloud rate limiting
Change 740828 abandoned by Jbond:
[operations/puppet@production] R:varnish:instance: Add hiera key to control cloud ratelimits
Reason:
replaced by requestctl
@Pppery AFAIK, other than blocking empty User-Agent headers on upload (T224891#7182766), no further progress has been made to address the comments in T224891#6983370.