
[Storage] Retries exceed MaxRetries if server sends partial response

See original GitHub issue

When using Azure.Core@1.1.0 and Azure.Storage.Blobs@12.4.0-dev.20200306.1, BlobClient.DownloadToAsync() appears to retry indefinitely if the server sends a partial body, even though MaxRetries = 1.

Repro Steps

  1. Start a server which can inject HTTP faults, like https://github.com/mikeharder/HttpFaultInjector.

  2. Create a BlobClient with the following options:

// A default handler is assumed here; configure it (proxy, TLS, etc.) if your fault injector requires it.
var httpClientHandler = new HttpClientHandler();

var httpClient = new HttpClient(httpClientHandler)
{
    Timeout = TimeSpan.FromSeconds(5)
};

var blobClientOptions = new BlobClientOptions()
{
    Transport = new ChangeUriTransport(new HttpClientTransport(httpClient), "localhost", 7778),
    Retry = {
        MaxRetries = 1,
        NetworkTimeout = TimeSpan.FromSeconds(10)
    },
};

// connectionString and the container/blob names refer to the real storage account;
// ChangeUriTransport below reroutes each request through the local fault injector.
var blobClient = new BlobClient(connectionString, "testcontainer", "testblob", blobClientOptions);
var response = await blobClient.DownloadToAsync(new MemoryStream());

// Wraps the inner transport and rewrites each request's URI so traffic is routed through the
// local fault injector (localhost:7778 in this repro), preserving the original Host header.
private class ChangeUriTransport : HttpPipelineTransport
{
    private readonly HttpPipelineTransport _transport;
    private readonly string _host;
    private readonly int? _port;

    public ChangeUriTransport(HttpPipelineTransport transport, string host, int? port)
    {
        _transport = transport;
        _host = host;
        _port = port;
    }

    public override Request CreateRequest()
    {
        return _transport.CreateRequest();
    }

    public override void Process(HttpMessage message)
    {
        ChangeUri(message);
        _transport.Process(message);
    }

    public override ValueTask ProcessAsync(HttpMessage message)
    {
        ChangeUri(message);
        return _transport.ProcessAsync(message);
    }

    private void ChangeUri(HttpMessage message)
    {
        // Ensure Host header is only set once, since the same HttpMessage will be reused on retries
        if (!message.Request.Headers.Contains("Host"))
        {
            message.Request.Headers.Add("Host", message.Request.Uri.Host);
        }

        message.Request.Uri.Host = _host;
        if (_port.HasValue)
        {
            message.Request.Uri.Port = _port.Value;
        }
    }
}
  3. Run the client app, and instruct the server to send a partial response (say 50% of the body).

  4. The client will read the partial response, block for 10 seconds (NetworkTimeout), then retry but only requesting the remaining 50% of bytes (with a Content-Range header).

  5. Again instruct the server to send a partial response (50% of the remaining body == 25% of the original body).

  6. The client will again read the partial response, block for 10 seconds (NetworkTimeout), then retry but only requesting the remaining 25% of the original bytes.

  7. This can be repeated indefinitely, despite MaxRetries = 1. I’m not sure if this is a bug or by design.
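A hedged workaround sketch (editorial, not from the original report): if an application needs to bound the total time spent in this download-and-retry loop, it can pass a cancellation token with an overall deadline to DownloadToAsync, since the retried range requests observe the same token. The two-minute budget below is arbitrary.

// Reuses the blobClient from the repro above; the deadline value is illustrative.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));
try
{
    await blobClient.DownloadToAsync(new MemoryStream(), cts.Token);
}
catch (OperationCanceledException)
{
    // Overall budget exhausted; surface a failure instead of retrying indefinitely.
}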

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

2 reactions
tg-msft commented, Mar 17, 2020

So my understanding of this issue is that we’ll do the following:

  1. Start downloading with the customer’s MaxRetries via the pipeline
  2. Wrap in a RetriableStream
  3. If downloading the stream fails after we’ve left the pipeline:
     a. Try to start downloading from where we left off with the customer’s MaxRetries via the pipeline
     b. If that succeeds, go back to step 2
     c. If that fails, loop back to step 3, up to 3 times
  4. Return the downloaded blob

We don’t adjust MaxRetries in Step 3a to account for the number of attempts made in Step 1 or in previous iterations of Step 3a. We also don’t keep a running count of attempts in Step 3c, so every stream failure kicks off a whole new cycle. This could take a very long time under adverse circumstances (the absolute worst case being 3 * MaxRetries * BlobSize requests if Mike sets up a test server that fails after sending back a single byte).
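A minimal, self-contained sketch (an editorial illustration, not SDK code) of the loop structure described above; the class name and the one-byte-per-response server are made up for illustration. It shows why the total number of requests scales with blob size rather than with MaxRetries once each partial read starts a fresh cycle.

using System;

class RetryFlowSketch
{
    const int MaxRetries = 1;    // customer-configured pipeline retries (Steps 1 and 3a)

    static void Main()
    {
        const long blobSize = 8; // pretend blob of 8 bytes
        long offset = 0;
        int totalRequests = 0;

        // RetriableStream-style loop: every partial read starts a fresh cycle (Step 3c),
        // so the retry budget effectively resets whenever a cycle makes any progress.
        while (offset < blobSize)
        {
            // One pipeline pass: the initial request plus up to MaxRetries pipeline retries.
            totalRequests += 1 + MaxRetries;

            // Worst-case server: sends exactly one byte, then drops the connection.
            offset += 1;
        }

        Console.WriteLine($"{totalRequests} requests to download {blobSize} bytes with MaxRetries = {MaxRetries}");
    }
}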

Mike’s point in raising the issue is that the number of actual retries that could happen in a flaky blob download doesn’t correspond to how the customer envisions MaxRetries working. Pavel and I are okay with this until we hear from customers, at which point we could add a few more RetryPolicy knobs to allow configuring what happens in Step 3.
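For reference, the retry knobs a customer can configure today all live on ClientOptions.Retry; a sketch with illustrative values follows. As described above, none of these settings bounds the restarts performed around the download stream itself.

var options = new BlobClientOptions
{
    Retry =
    {
        Mode = RetryMode.Exponential,              // fixed or exponential back-off
        MaxRetries = 1,                            // retries within a single pipeline pass
        Delay = TimeSpan.FromSeconds(1),           // base back-off delay
        MaxDelay = TimeSpan.FromSeconds(30),       // upper bound on the back-off delay
        NetworkTimeout = TimeSpan.FromSeconds(10)  // per-attempt network timeout
    }
};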

0 reactions
pakrym commented, Jan 5, 2022

Customers never reported problems with this behavior.

Read more comments on GitHub >
