Retry when requested range not satisfied #2222

Open: tsutsu wants to merge 1 commit into base: master

@tsutsu tsutsu commented Jun 13, 2024

I've run into the bad-Range-response-header aborts reported in #1344 a couple of times now, mostly when downloading 100GB+ files from lesser-known CDNs with high connection parallelism.

In the cases I've observed where partial-content responses carry incorrect Range response headers, which responses are affected seems completely non-deterministic. Retrying the same large split download several times results in a different random one or two splits (out of the thousands that make up the download) range-aborting each time.

My hypothesis for what's going on in these cases (and perhaps many of the ones people are noticing in #1344) is that aria2c is talking to a load-balancer and/or hitting DNS round-robin, so that each partial request is directed to a different backend server, and the pool of backend nodes has a heterogeneous mix of webserver versions or configurations, a minority of which are bugged somehow. Each "roll of the dice" when requesting a split can therefore land on a bad server that returns a bad Range response header.

But why it happens doesn't really matter; all that matters is that these cases are non-deterministic, and therefore will almost always be fixed by just retrying the request.
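To put rough numbers on the "roll of the dice" argument, here is a back-of-the-envelope sketch. It assumes every request independently hits a bad backend with the same probability, which is a simplification, and the figures used (a 0.1% bad-response rate, 2000 splits, 5 tries) are made up for illustration rather than measured. The function name `p_download_aborts` is hypothetical.

```python
import math


def p_download_aborts(p_bad: float, splits: int, max_tries: int = 1) -> float:
    """Probability that at least one split fails on every one of its tries.

    Assumes each request gets a bad Range response independently with
    probability p_bad (an illustrative assumption, not a measurement).
    """
    # A split only aborts if all of its tries come back bad.
    p_split_fails = p_bad ** max_tries
    if p_split_fails >= 1.0:
        return 1.0
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(splits * math.log1p(-p_split_fails))


# Abort on the first bad response (one try per split).
abort_now = p_download_aborts(p_bad=0.001, splits=2000, max_tries=1)
# Retry each split, up to five tries in total.
with_retry = p_download_aborts(p_bad=0.001, splits=2000, max_tries=5)

print(f"abort on first bad response: {abort_now:.3f}")
print(f"up to 5 tries per split:     {with_retry:.1e}")
```

Under these assumptions, aborting on the first bad response fails most downloads of this size (about 0.86), while retrying makes a whole-download abort vanishingly unlikely (on the order of 2e-12). If retries are correlated, for example because the load-balancer hands back the same backend, the real benefit is smaller.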

This PR simply makes aria2 retry, rather than abort, when !(HttpRequest::isRangeSatisfied(...)).

I've used aria2 with this fix to download from one of these flaky hosts, and retrying does "solve" the problem. I suspect it will solve most if not all of the problems seen by people in #1344.

(And in the unlikely case where a bad Range response is a deterministic problem with a webserver, aria2 will still eventually hit its retry limit and abort anyway.)
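The behavior change can be sketched outside of aria2 as follows. This is a simplified Python illustration, not aria2's C++ code: `is_range_satisfied` here is a plain equality check standing in for `HttpRequest::isRangeSatisfied(...)`, and `fetch_split`, `send_request`, `RangeMismatch`, and the `max_tries` parameter are hypothetical names chosen for the example.

```python
from typing import Callable, Tuple

ByteRange = Tuple[int, int]  # inclusive (first_byte, last_byte)


class RangeMismatch(Exception):
    """Raised once a split has used up all of its tries."""


def is_range_satisfied(requested: ByteRange, returned: ByteRange) -> bool:
    # Simplified stand-in: the response must cover exactly the requested range.
    return returned == requested


def fetch_split(
    send_request: Callable[[ByteRange], ByteRange],
    requested: ByteRange,
    max_tries: int = 5,
) -> int:
    """Request one split, retrying on a mismatched range.

    Returns the number of tries used; raises RangeMismatch if every try
    comes back with the wrong range (the deterministic-failure case).
    """
    for attempt in range(1, max_tries + 1):
        returned = send_request(requested)
        if is_range_satisfied(requested, returned):
            return attempt
        # Previously: abort here. With the change: fall through and retry.
    raise RangeMismatch(f"range {requested} not satisfied after {max_tries} tries")


# A flaky host: the first response carries the wrong range, the second is fine.
responses = iter([(0, 49), (0, 99)])
attempts = fetch_split(lambda rng: next(responses), (0, 99))
print(f"split succeeded after {attempts} tries")
```

A host that always returns the wrong range still ends in an error, because the loop stops at `max_tries`.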

tsutsu commented Jun 13, 2024

I'd also suggest tweaking the retry logic to enforce a minimum retry-wait of e.g. 2s for retries triggered by this specific error code, even when the user has not specified a --retry-wait.

When talking to a load-balancer that uses least-conn upstream routing, retrying immediately after a bad Range response header can (depending on the load-balancer implementation and how busy it is) land you right back on the same flaky backend: the connection you just released makes that backend the one with the fewest active connections.

Adding an enforced delay for this case would give someone else the opportunity to take the connection to the bugged backend you just released, so that you'll get a different one. 😄
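As a minimal sketch of what such a floor could look like (in Python for illustration, not aria2's actual retry code), the wait applied before a retry would be the larger of the user's setting and the enforced minimum, but only for this error. The names `effective_retry_wait` and `MIN_RANGE_RETRY_WAIT` are hypothetical, and the 2s value is just the example figure from above.

```python
MIN_RANGE_RETRY_WAIT = 2.0  # seconds; the "e.g. 2s" floor, an illustrative value


def effective_retry_wait(
    user_retry_wait: float,
    range_mismatch: bool,
    floor: float = MIN_RANGE_RETRY_WAIT,
) -> float:
    """Return how long to wait before retrying.

    user_retry_wait is the user's --retry-wait value (0 if unspecified).
    The floor applies only to retries caused by a bad Range response.
    """
    if range_mismatch:
        return max(user_retry_wait, floor)
    return user_retry_wait


print(effective_retry_wait(0, range_mismatch=True))    # floor kicks in
print(effective_retry_wait(0, range_mismatch=False))   # other errors unchanged
print(effective_retry_wait(10, range_mismatch=True))   # longer user setting wins
```

Keeping the floor scoped to this one error leaves the retry timing for every other failure exactly as the user configured it.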

Thoughts?
