[Bug] Dask on Ray 1TB sort fails with S3 read failure
Search before asking
- I searched the issues and found no similar issues.
Ray Component
Ray Core
What happened + What you expected to happen
The run failed within 12 seconds with the following error:
(dask:generate_s3_file-b0ed8484-e106-492b-95ff-7889df04cc94 pid=1208) S3 partition 6 exists
(dask:generate_s3_file-d0d38f68-747e-4536-b6c9-28fac980a525 pid=1209) S3 partition 7 exists
(dask:generate_s3_file-ccc17522-e547-4a1a-b5a2-f15a20a691f1 pid=189, ip=10.0.3.102) S3 partition 9 exists
(dask:generate_s3_file-dd77b428-0729-499c-9f57-4016e25a3480 pid=194, ip=10.0.3.93) S3 partition 8 exists
(dask:generate_s3_file-3d9fe851-2279-4d04-ba28-7a1ab4b7a390 pid=1210) S3 partition 4 exists
(dask:generate_s3_file-69546713-ce11-4a3f-8a17-4f4e8ee38ad6 pid=192, ip=10.0.3.102) S3 partition 2 exists
(dask:generate_s3_file-75019ef3-37c7-4675-b5d4-32a03e1eea40 pid=379, ip=10.0.3.102) S3 partition 3 exists
(dask:generate_s3_file-0f4164ae-65cb-46a8-a100-d901e069efd9 pid=371, ip=10.0.3.93) S3 partition 0 exists
(dask:generate_s3_file-4a30bc6b-677c-4df6-b86c-df85ec55858b pid=376, ip=10.0.3.102) S3 partition 1 exists
Traceback (most recent call last):
  File "dask_on_ray/dask_on_ray_sort.py", line 200, in <module>
    file_path=args.file_path,
  File "dask_on_ray/dask_on_ray_sort.py", line 112, in trial
    df = load_dataset(client, data_dir, s3_bucket, nbytes, n_partitions)
  File "dask_on_ray/dask_on_ray_sort.py", line 56, in load_dataset
    df = dd.read_parquet(filenames)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/dask/dataframe/io/parquet/core.py", line 342, in read_parquet
    **kwargs,
  File "/home/ray/anaconda3/lib/python3.7/site-packages/dask/dataframe/io/parquet/arrow.py", line 383, in read_metadata
    kwargs,
  File "/home/ray/anaconda3/lib/python3.7/site-packages/dask/dataframe/io/parquet/arrow.py", line 917, in _collect_dataset_info
    **_dataset_kwargs,
  File "/home/ray/anaconda3/lib/python3.7/site-packages/pyarrow/dataset.py", line 670, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/pyarrow/dataset.py", line 422, in _filesystem_dataset
    return factory.finish(schema)
  File "pyarrow/_dataset.pyx", line 1680, in pyarrow._dataset.DatasetFactory.finish
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/_fs.pyx", line 1179, in pyarrow._fs._cb_open_input_file
  File "/home/ray/anaconda3/lib/python3.7/site-packages/pyarrow/fs.py", line 394, in open_input_file
    raise FileNotFoundError(path)
FileNotFoundError: core-nightly-test/df-100-0.parquet.gzip
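The path in the FileNotFoundError above has no `s3://` scheme. That alone does not establish the cause, since a filesystem layer may strip the scheme before opening the file. Still, a quick first check when debugging is to confirm what the file names passed to `dd.read_parquet` look like. The following is a minimal sketch using only the standard library. The helper names `split_path` and `paths_missing_scheme` are hypothetical and not part of the test script, and the example file names are assumed from the bucket and key shown in the error.

```python
from urllib.parse import urlparse


def split_path(path):
    """Split a path into (scheme, bucket, key).

    The scheme is an empty string when the path has none, in which case
    the first path component is treated as the bucket (an assumption).
    """
    parsed = urlparse(path)
    if parsed.scheme:
        return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")
    bucket, _, key = path.partition("/")
    return "", bucket, key


def paths_missing_scheme(paths):
    """Return the paths that carry no URI scheme such as s3://."""
    return [p for p in paths if not split_path(p)[0]]


# Hypothetical file names, modeled on the path in the error message.
filenames = [
    "s3://core-nightly-test/df-100-0.parquet.gzip",
    "core-nightly-test/df-100-0.parquet.gzip",
]
for name in filenames:
    print(name, "->", split_path(name))
print("missing scheme:", paths_missing_scheme(filenames))
```

If every file name already carries the scheme, the problem more likely lies in how the filesystem for those URIs is resolved than in the file names themselves.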
Versions / Dependencies
master
Reproduction script
N/A
Anything else
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Issue Analytics
- State:
- Created 2 years ago
- Comments: 5 (5 by maintainers)
Top GitHub Comments
Looks like it has failed 5 times in a row, so there must be a real problem here.
cc @mwtian would you have time to take a look at this issue?
If you are busy, I can just take it over. I will take a look at it today!