Descriptive error message and usage documentation for read_csv_glob
System information
- Linux Ubuntu 16.04
- ray 1.8.0
- modin 0.12.0
- python 3.8
I’m seeing the issue below when pointing read_csv_glob at an S3 prefix that matches multiple files. I’m using the URL s3://nyc-tlc/trip data/yellow_tripdata_2020-, which matches several files. Note that this works when I point to the S3 URL of a single .csv file.
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_13805/755823039.py in <module>
2
3 file_path = 's3://nyc-tlc/trip data/yellow_tripdata_2020-'
----> 4 modin_df = pd.read_csv_glob(file_path, parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"], quoting=3)
5
6
~/venv/lib/python3.8/site-packages/modin/experimental/pandas/io.py in parser_func(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
181
182 kwargs = {k: v for k, v in f_locals.items() if k in _pd_read_csv_signature}
--> 183 return _read(**kwargs)
184
185 parser_func.__doc__ = _read.__doc__
~/venv/lib/python3.8/site-packages/modin/experimental/pandas/io.py in _read(**kwargs)
206
207 try:
--> 208 pd_obj = FactoryDispatcher.read_csv_glob(**kwargs)
209 except AttributeError:
210 raise AttributeError("read_csv_glob() is only implemented for pandas on Ray.")
~/venv/lib/python3.8/site-packages/modin/core/execution/dispatching/factories/dispatcher.py in read_csv_glob(cls, **kwargs)
183 @_inherit_docstrings(factories.ExperimentalPandasOnRayFactory._read_csv_glob)
184 def read_csv_glob(cls, **kwargs):
--> 185 return cls.__factory._read_csv_glob(**kwargs)
186
187 @classmethod
~/venv/lib/python3.8/site-packages/modin/core/execution/dispatching/factories/factories.py in _read_csv_glob(cls, **kwargs)
511 )
512 def _read_csv_glob(cls, **kwargs):
--> 513 return cls.io_cls.read_csv_glob(**kwargs)
514
515 @classmethod
~/venv/lib/python3.8/site-packages/modin/core/io/text/csv_glob_dispatcher.py in _read(cls, filepath_or_buffer, **kwargs)
60 if isinstance(filepath_or_buffer, str):
61 if not cls.file_exists(filepath_or_buffer):
---> 62 return cls.single_worker_read(filepath_or_buffer, **kwargs)
63 filepath_or_buffer = cls.get_path(filepath_or_buffer)
64 elif not cls.pathlib_or_pypath(filepath_or_buffer):
~/venv/lib/python3.8/site-packages/modin/core/storage_formats/pandas/parsers.py in single_worker_read(cls, fname, **kwargs)
267 ErrorMessage.default_to_pandas("Parameters provided")
268 # Use default args for everything
--> 269 pandas_frame = cls.parse(fname, **kwargs)
270 if isinstance(pandas_frame, pandas.io.parsers.TextFileReader):
271 pd_read = pandas_frame.read
~/venv/lib/python3.8/site-packages/modin/core/storage_formats/pandas/parsers.py in parse(chunks, **kwargs)
310
311 pandas_dfs = []
--> 312 for fname, start, end in chunks:
313 if start is not None and end is not None:
314 # pop "compression" from kwargs because bio is uncompressed
ValueError: not enough values to unpack (expected 3, got 1)
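One plausible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: because the prefix does not exist as a file, the path string itself reaches `parse` as `chunks`, and the loop then tries to unpack each single character into three names. The `parse` function below is a simplified stand-in that only mirrors the loop shape shown in the traceback; it is not Modin's actual implementation, and the bucket name is made up.

```python
def parse(chunks):
    """Simplified stand-in for the loop in the traceback (not Modin's real code)."""
    results = []
    # Each item is expected to be a (fname, start, end) triple.
    for fname, start, end in chunks:
        results.append((fname, start, end))
    return results


# A list of triples unpacks as expected.
ok = parse([("part-0.csv", 0, 100)])

# A bare string iterates character by character, so each item has length 1
# and cannot be unpacked into three names.
try:
    parse("s3://example-bucket/prefix-")  # hypothetical path
except ValueError as exc:
    message = str(exc)
    print(message)
```

If this reading is right, the error says nothing about the missing wildcard, which is why a more descriptive message would help.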
Issue Analytics
- State:
- Created 2 years ago
- Comments:11 (9 by maintainers)
@YarShev this can be repurposed to add a better error message and usage documentation.
Hi @c3-cjazra!
I don’t see a * symbol in your URL, without which the function read_csv_glob will not work. Can you try read_csv_glob("s3://nyc-tlc/trip data/yellow_tripdata_2020-*")?
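To illustrate why the trailing `*` matters, here is a minimal sketch using Python's standard `fnmatch` module on a made-up list of object keys, so it runs without S3 access. The key names, `match_keys`, and `require_wildcard` are hypothetical and only show one way a more descriptive error could be raised; they are not part of Modin.

```python
import fnmatch

# Hypothetical object keys, standing in for what a bucket listing might return.
keys = [
    "trip data/yellow_tripdata_2020-01.csv",
    "trip data/yellow_tripdata_2020-02.csv",
    "trip data/green_tripdata_2020-01.csv",
]


def match_keys(pattern, names):
    """Return the names matching a shell-style pattern (hypothetical helper)."""
    return fnmatch.filter(names, pattern)


def require_wildcard(path):
    """Raise a descriptive error if `path` has no glob character (hypothetical check)."""
    if not any(ch in path for ch in "*?["):
        raise ValueError(
            f"No glob pattern found in {path!r}; "
            "append '*' to match every file that shares this prefix."
        )
    return path


# Without a wildcard the pattern must equal a key exactly, so nothing matches.
prefix_only = match_keys("trip data/yellow_tripdata_2020-", keys)

# With a trailing '*', both yellow_tripdata_2020 files match.
with_star = match_keys("trip data/yellow_tripdata_2020-*", keys)

print(prefix_only)
print(with_star)
```

A check along these lines, run before the read is attempted, would surface the missing wildcard directly instead of the unpacking error above.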