
S3 rate limit encountered during DataFrame computation

See original GitHub issue

I was computing the length of a Dask DataFrame with many (several thousand) partitions, stored in Parquet format on S3. Something like:

import dask.dataframe as dd

df = dd.read_parquet("s3://...", ...)   # df has thousands of partitions
len(df)

Shortly after this computation was kicked off, I got the following error (the full traceback is further down):

ClientError: An error occurred (SlowDown) when calling the ListObjectsV2 operation (reached max retries: 4): Please reduce your request rate.

My guess is that since our len(df) implementation triggers lots of small, fast length computations, we’re able to crank through many tasks quickly and hit some S3 rate limit.

The error message tells me to "Please reduce your request rate", and I'm wondering how to go about doing that. Perhaps there is some exponential-backoff behavior I can enable in s3fs so it makes API requests at a progressively slower rate? Or maybe there's some other throttling mechanism I can use to avoid hitting these kinds of S3 limits?
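One throttling knob worth noting (my own suggestion, not something confirmed in this thread): s3fs forwards `config_kwargs` to botocore's client `Config`, and botocore's retry configuration has an `"adaptive"` mode that layers client-side rate limiting on top of exponential backoff. A sketch:

```python
import dask.dataframe as dd

# Hypothetical sketch: pass botocore retry settings through s3fs.
# "adaptive" retry mode adds client-side rate limiting, so SlowDown
# responses should cause subsequent requests to be issued more slowly.
df = dd.read_parquet(
    "s3://...",  # same elided bucket/path as above
    storage_options={
        "config_kwargs": {
            "retries": {"max_attempts": 10, "mode": "adaptive"},
        },
    },
)
len(df)
```

This only shapes the retry behavior of each S3 client; it doesn't reduce the number of tasks Dask schedules at once.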

cc @martindurant

Full traceback:
---------------------------------------------------------------------------
ClientError                               Traceback (most recent call last)
/opt/conda/envs/coiled/lib/python3.9/site-packages/s3fs/core.py in _call_s3()

/opt/conda/envs/coiled/lib/python3.9/site-packages/aiobotocore/client.py in _make_api_call()

ClientError: An error occurred (SlowDown) when calling the ListObjectsV2 operation (reached max retries: 4): Please reduce your request rate.

The above exception was the direct cause of the following exception:

OSError                                   Traceback (most recent call last)
<timed eval> in <module>

~/mambaforge/envs/coiled-jrbourbeau-parquet-demo/lib/python3.9/site-packages/dask/dataframe/core.py in __len__(self)
   3942             return super().__len__()
   3943         else:
-> 3944             return len(s)
   3945 
   3946     def __contains__(self, key):

~/mambaforge/envs/coiled-jrbourbeau-parquet-demo/lib/python3.9/site-packages/dask/dataframe/core.py in __len__(self)
    579 
    580     def __len__(self):
--> 581         return self.reduction(
    582             len, np.sum, token="len", meta=int, split_every=False
    583         ).compute()

~/mambaforge/envs/coiled-jrbourbeau-parquet-demo/lib/python3.9/site-packages/dask/base.py in compute(self, **kwargs)
    284         dask.base.compute
    285         """
--> 286         (result,) = compute(self, traverse=False, **kwargs)
    287         return result
    288 

~/mambaforge/envs/coiled-jrbourbeau-parquet-demo/lib/python3.9/site-packages/dask/base.py in compute(*args, **kwargs)
    566         postcomputes.append(x.__dask_postcompute__())
    567 
--> 568     results = schedule(dsk, keys, **kwargs)
    569     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    570 

~/mambaforge/envs/coiled-jrbourbeau-parquet-demo/lib/python3.9/site-packages/distributed/client.py in get(self, dsk, keys, workers, allow_other_workers, resources, sync, asynchronous, direct, retries, priority, fifo_timeout, actors, **kwargs)
   2746                     should_rejoin = False
   2747             try:
-> 2748                 results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   2749             finally:
   2750                 for f in futures.values():

~/mambaforge/envs/coiled-jrbourbeau-parquet-demo/lib/python3.9/site-packages/distributed/client.py in gather(self, futures, errors, direct, asynchronous)
   2023             else:
   2024                 local_worker = None
-> 2025             return self.sync(
   2026                 self._gather,
   2027                 futures,

~/mambaforge/envs/coiled-jrbourbeau-parquet-demo/lib/python3.9/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
    864             return future
    865         else:
--> 866             return sync(
    867                 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    868             )

~/mambaforge/envs/coiled-jrbourbeau-parquet-demo/lib/python3.9/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
    324     if error[0]:
    325         typ, exc, tb = error[0]
--> 326         raise exc.with_traceback(tb)
    327     else:
    328         return result[0]

~/mambaforge/envs/coiled-jrbourbeau-parquet-demo/lib/python3.9/site-packages/distributed/utils.py in f()
    307             if callback_timeout is not None:
    308                 future = asyncio.wait_for(future, callback_timeout)
--> 309             result[0] = yield future
    310         except Exception:
    311             error[0] = sys.exc_info()

~/mambaforge/envs/coiled-jrbourbeau-parquet-demo/lib/python3.9/site-packages/tornado/gen.py in run(self)
    760 
    761                     try:
--> 762                         value = future.result()
    763                     except Exception:
    764                         exc_info = sys.exc_info()

~/mambaforge/envs/coiled-jrbourbeau-parquet-demo/lib/python3.9/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
   1888                             exc = CancelledError(key)
   1889                         else:
-> 1890                             raise exception.with_traceback(traceback)
   1891                         raise exc
   1892                     if errors == "skip":

/opt/conda/envs/coiled/lib/python3.9/site-packages/dask/optimization.py in __call__()

/opt/conda/envs/coiled/lib/python3.9/site-packages/dask/core.py in get()

/opt/conda/envs/coiled/lib/python3.9/site-packages/dask/core.py in _execute_task()

/opt/conda/envs/coiled/lib/python3.9/site-packages/dask/core.py in <genexpr>()

/opt/conda/envs/coiled/lib/python3.9/site-packages/dask/core.py in _execute_task()

/opt/conda/envs/coiled/lib/python3.9/site-packages/dask/dataframe/io/parquet/core.py in __call__()

/opt/conda/envs/coiled/lib/python3.9/site-packages/dask/dataframe/io/parquet/core.py in read_parquet_part()

/opt/conda/envs/coiled/lib/python3.9/site-packages/dask/dataframe/io/parquet/core.py in <listcomp>()

/opt/conda/envs/coiled/lib/python3.9/site-packages/dask/dataframe/io/parquet/fastparquet.py in read_partition()

/opt/conda/envs/coiled/lib/python3.9/site-packages/fastparquet/api.py in to_pandas()

/opt/conda/envs/coiled/lib/python3.9/site-packages/fastparquet/api.py in read_row_group_file()

/opt/conda/envs/coiled/lib/python3.9/site-packages/fsspec/spec.py in open()

/opt/conda/envs/coiled/lib/python3.9/site-packages/s3fs/core.py in _open()

/opt/conda/envs/coiled/lib/python3.9/site-packages/s3fs/core.py in __init__()

/opt/conda/envs/coiled/lib/python3.9/site-packages/fsspec/spec.py in __init__()

/opt/conda/envs/coiled/lib/python3.9/site-packages/fsspec/asyn.py in wrapper()

/opt/conda/envs/coiled/lib/python3.9/site-packages/fsspec/asyn.py in sync()

/opt/conda/envs/coiled/lib/python3.9/site-packages/fsspec/asyn.py in _runner()

/opt/conda/envs/coiled/lib/python3.9/site-packages/s3fs/core.py in _info()

/opt/conda/envs/coiled/lib/python3.9/site-packages/s3fs/core.py in _simple_info()

/opt/conda/envs/coiled/lib/python3.9/site-packages/s3fs/core.py in _call_s3()

OSError: [Errno 16] Please reduce your request rate.

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 9 (8 by maintainers)

Top GitHub Comments

1 reaction
quasiben commented, Sep 2, 2021

Pinterest engineering has also been working on improving S3/Parquet read efficiency:

https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0

1 reaction
martindurant commented, Aug 19, 2021

@jrbourbeau , can you see if setting S3FileSystem.retries = 10 (in the client and workers too) avoids this problem?
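Since `S3FileSystem.retries` is a class attribute that s3fs consults in its internal retry loop, applying this suggestion on a distributed cluster means setting it on the workers as well as the client, e.g. via `Client.run`. A sketch (untested against a real cluster; the scheduler address is hypothetical):

```python
from distributed import Client

def raise_s3_retries(n=10):
    # s3fs consults this class attribute in its retry loop around S3 calls
    import s3fs
    s3fs.S3FileSystem.retries = n

client = Client("tcp://scheduler-address:8786")  # hypothetical address
client.run(raise_s3_retries)  # on every worker
raise_s3_retries()            # and locally, for client-side listing calls
```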

