
[Storage] Retries exceed MaxRetries if server sends partial response

See original GitHub issue

When using Azure.Core@1.1.0 and Azure.Storage.Blobs@12.4.0-dev.20200306.1, BlobClient.DownloadToAsync() appears to retry indefinitely if the server sends a partial body, even though MaxRetries = 1.

Repro Steps

  1. Start a server which can inject HTTP faults, like https://github.com/mikeharder/HttpFaultInjector.

  2. Create a BlobClient with the following options:

// A default handler is assumed here; configure it (proxy, TLS, etc.) if your fault injector requires it.
var httpClientHandler = new HttpClientHandler();

var httpClient = new HttpClient(httpClientHandler)
{
    Timeout = TimeSpan.FromSeconds(5)
};

var blobClientOptions = new BlobClientOptions()
{
    Transport = new ChangeUriTransport(new HttpClientTransport(httpClient), "localhost", 7778),
    Retry = {
        MaxRetries = 1,
        NetworkTimeout = TimeSpan.FromSeconds(10)
    },
};

// connectionString and the container/blob names refer to the real storage account;
// ChangeUriTransport below reroutes each request through the local fault injector.
var blobClient = new BlobClient(connectionString, "testcontainer", "testblob", blobClientOptions);
var response = await blobClient.DownloadToAsync(new MemoryStream());

// Wraps the inner transport and rewrites each request's URI so traffic is routed through the
// local fault injector (localhost:7778 in this repro), preserving the original Host header.
private class ChangeUriTransport : HttpPipelineTransport
{
    private readonly HttpPipelineTransport _transport;
    private readonly string _host;
    private readonly int? _port;

    public ChangeUriTransport(HttpPipelineTransport transport, string host, int? port)
    {
        _transport = transport;
        _host = host;
        _port = port;
    }

    public override Request CreateRequest()
    {
        return _transport.CreateRequest();
    }

    public override void Process(HttpMessage message)
    {
        ChangeUri(message);
        _transport.Process(message);
    }

    public override ValueTask ProcessAsync(HttpMessage message)
    {
        ChangeUri(message);
        return _transport.ProcessAsync(message);
    }

    private void ChangeUri(HttpMessage message)
    {
        // Ensure Host header is only set once, since the same HttpMessage will be reused on retries
        if (!message.Request.Headers.Contains("Host"))
        {
            message.Request.Headers.Add("Host", message.Request.Uri.Host);
        }

        message.Request.Uri.Host = _host;
        if (_port.HasValue)
        {
            message.Request.Uri.Port = _port.Value;
        }
    }
}
  3. Run the client app, and instruct the server to send a partial response (say 50% of the body).

  4. The client will read the partial response, block for 10 seconds (NetworkTimeout), then retry but only requesting the remaining 50% of bytes (with a Content-Range header).

  5. Again instruct the server to send a partial response (50% of the remaining body == 25% of the original body).

  6. The client will again read the partial response, block for 10 seconds (NetworkTimeout), then retry but only requesting the remaining 25% of the original bytes.

  7. This can be repeated indefinitely, despite MaxRetries = 1. I’m not sure if this is a bug or by design.
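A hedged workaround sketch (editorial, not from the original report): if an application needs to bound the total time spent in this download-and-retry loop, it can pass a cancellation token with an overall deadline to DownloadToAsync, since the retried range requests observe the same token. The two-minute budget below is arbitrary.

// Reuses the blobClient from the repro above; the deadline value is illustrative.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));
try
{
    await blobClient.DownloadToAsync(new MemoryStream(), cts.Token);
}
catch (OperationCanceledException)
{
    // Overall budget exhausted; surface a failure instead of retrying indefinitely.
}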

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

2 reactions
tg-msft commented, Mar 17, 2020

So my understanding of this issue is that we’ll do the following:

  1. Start downloading with the customer’s MaxRetries via the pipeline
  2. Wrap in a RetriableStream
  3. If downloading the stream fails after we’ve left the pipeline:
     a. Try to start downloading from where we left off with the customer’s MaxRetries via the pipeline
     b. If that succeeds, go back to step 2
     c. If that fails, loop back to step 3, up to 3 times
  4. Return the downloaded blob

We don’t adjust MaxRetries in Step 3a to account for the number of attempts made in Step 1 or in previous iterations of Step 3a. We also don’t keep a running count of attempts in Step 3c, so every stream failure kicks off a whole new cycle. This could take a very long time under adverse circumstances (the absolute worst case being 3 * MaxRetries * BlobSize requests if Mike sets up a test server that fails after sending back a single byte).
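A minimal, self-contained sketch (an editorial illustration, not SDK code) of the loop structure described above; the class name and the one-byte-per-response server are made up for illustration. It shows why the total number of requests scales with blob size rather than with MaxRetries once each partial read starts a fresh cycle.

using System;

class RetryFlowSketch
{
    const int MaxRetries = 1;    // customer-configured pipeline retries (Steps 1 and 3a)

    static void Main()
    {
        const long blobSize = 8; // pretend blob of 8 bytes
        long offset = 0;
        int totalRequests = 0;

        // RetriableStream-style loop: every partial read starts a fresh cycle (Step 3c),
        // so the retry budget effectively resets whenever a cycle makes any progress.
        while (offset < blobSize)
        {
            // One pipeline pass: the initial request plus up to MaxRetries pipeline retries.
            totalRequests += 1 + MaxRetries;

            // Worst-case server: sends exactly one byte, then drops the connection.
            offset += 1;
        }

        Console.WriteLine($"{totalRequests} requests to download {blobSize} bytes with MaxRetries = {MaxRetries}");
    }
}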

Mike’s point in raising the issue is that the number of actual retries that could happen in a flaky blob download doesn’t correspond to how the customer envisions MaxRetries working. Pavel and I are okay with this until we hear from customers, at which point we could add a few more RetryPolicy knobs to allow configuring what happens in Step 3.
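For reference, the retry knobs a customer can configure today all live on ClientOptions.Retry; a sketch with illustrative values follows. As described above, none of these settings bounds the restarts performed around the download stream itself.

var options = new BlobClientOptions
{
    Retry =
    {
        Mode = RetryMode.Exponential,              // fixed or exponential back-off
        MaxRetries = 1,                            // retries within a single pipeline pass
        Delay = TimeSpan.FromSeconds(1),           // base back-off delay
        MaxDelay = TimeSpan.FromSeconds(30),       // upper bound on the back-off delay
        NetworkTimeout = TimeSpan.FromSeconds(10)  // per-attempt network timeout
    }
};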

0 reactions
pakrym commented, Jan 5, 2022

Customers never reported problems with this behavior.

Read more comments on GitHub >
