Descriptive error message and usage documentation for read_csv_glob

System information

  • Linux Ubuntu 16.04
  • ray 1.8.0
  • modin 0.12.0
  • python 3.8

I’m seeing the issue below when read_csv_glob points to an S3 prefix that matches multiple files. I’m using the URL s3://nyc-tlc/trip data/yellow_tripdata_2020-, which matches several files. Note that this works when I point to an S3 URL for a single .csv file.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_13805/755823039.py in <module>
      2 
      3 file_path = 's3://nyc-tlc/trip data/yellow_tripdata_2020-'
----> 4 modin_df = pd.read_csv_glob(file_path, parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"], quoting=3)
      5 
      6 

~/venv/lib/python3.8/site-packages/modin/experimental/pandas/io.py in parser_func(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    181 
    182         kwargs = {k: v for k, v in f_locals.items() if k in _pd_read_csv_signature}
--> 183         return _read(**kwargs)
    184 
    185     parser_func.__doc__ = _read.__doc__

~/venv/lib/python3.8/site-packages/modin/experimental/pandas/io.py in _read(**kwargs)
    206 
    207     try:
--> 208         pd_obj = FactoryDispatcher.read_csv_glob(**kwargs)
    209     except AttributeError:
    210         raise AttributeError("read_csv_glob() is only implemented for pandas on Ray.")

~/venv/lib/python3.8/site-packages/modin/core/execution/dispatching/factories/dispatcher.py in read_csv_glob(cls, **kwargs)
    183     @_inherit_docstrings(factories.ExperimentalPandasOnRayFactory._read_csv_glob)
    184     def read_csv_glob(cls, **kwargs):
--> 185         return cls.__factory._read_csv_glob(**kwargs)
    186 
    187     @classmethod

~/venv/lib/python3.8/site-packages/modin/core/execution/dispatching/factories/factories.py in _read_csv_glob(cls, **kwargs)
    511     )
    512     def _read_csv_glob(cls, **kwargs):
--> 513         return cls.io_cls.read_csv_glob(**kwargs)
    514 
    515     @classmethod

~/venv/lib/python3.8/site-packages/modin/core/io/text/csv_glob_dispatcher.py in _read(cls, filepath_or_buffer, **kwargs)
     60         if isinstance(filepath_or_buffer, str):
     61             if not cls.file_exists(filepath_or_buffer):
---> 62                 return cls.single_worker_read(filepath_or_buffer, **kwargs)
     63             filepath_or_buffer = cls.get_path(filepath_or_buffer)
     64         elif not cls.pathlib_or_pypath(filepath_or_buffer):

~/venv/lib/python3.8/site-packages/modin/core/storage_formats/pandas/parsers.py in single_worker_read(cls, fname, **kwargs)
    267         ErrorMessage.default_to_pandas("Parameters provided")
    268         # Use default args for everything
--> 269         pandas_frame = cls.parse(fname, **kwargs)
    270         if isinstance(pandas_frame, pandas.io.parsers.TextFileReader):
    271             pd_read = pandas_frame.read

~/venv/lib/python3.8/site-packages/modin/core/storage_formats/pandas/parsers.py in parse(chunks, **kwargs)
    310 
    311         pandas_dfs = []
--> 312         for fname, start, end in chunks:
    313             if start is not None and end is not None:
    314                 # pop "compression" from kwargs because bio is uncompressed

ValueError: not enough values to unpack (expected 3, got 1)
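
One plausible reading of the last two frames: single_worker_read passes the path string itself to parse(), whose first parameter is named chunks. Iterating over a string yields one character at a time, so the three-way unpacking in the loop header fails on the first character. The sketch below reproduces only that mechanism in plain Python; the helper name unpack_chunks is made up for illustration and is not Modin code.

```python
# Illustration only: a plain string stands in for the path that reaches parse().
path = "s3://nyc-tlc/trip data/yellow_tripdata_2020-"


def unpack_chunks(chunks):
    """Mimic the loop header shown at line 312 of the traceback (hypothetical helper)."""
    for fname, start, end in chunks:
        return fname, start, end
    return None


try:
    # Iterating a str yields 1-character strings, which cannot fill 3 names.
    unpack_chunks(path)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)
```

If this reading is right, the error says nothing about the missing wildcard, which is why a more descriptive message was requested.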

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 11 (9 by maintainers)

Top GitHub Comments

1 reaction
devin-petersohn commented, Dec 6, 2021

@YarShev this can be repurposed to add a better error message and usage documentation.

1 reaction
anmyachev commented, Dec 2, 2021

Hi @c3-cjazra!

I don’t see a * symbol in your URL; read_csv_glob will not work without it.

Can you try read_csv_glob("s3://nyc-tlc/trip data/yellow_tripdata_2020-*")?
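
To illustrate the point about the missing *, here is a minimal sketch using only the standard library. It assumes glob-style matching similar to fnmatch; the object keys are made-up examples, and require_glob_pattern is a hypothetical pre-check a caller could run, not something Modin provides.

```python
import fnmatch

GLOB_CHARS = "*?["


def require_glob_pattern(path):
    """Hypothetical pre-check (not part of Modin): fail early with a clear message."""
    if not any(ch in path for ch in GLOB_CHARS):
        suggestion = path + "*"
        raise ValueError(
            f"expected a glob pattern, but {path!r} contains no wildcard; "
            f"did you mean {suggestion!r}?"
        )
    return path


# Made-up object keys, used only to show how the pattern matches.
keys = [
    "s3://nyc-tlc/trip data/yellow_tripdata_2019-12.csv",
    "s3://nyc-tlc/trip data/yellow_tripdata_2020-01.csv",
    "s3://nyc-tlc/trip data/yellow_tripdata_2020-02.csv",
]
prefix = "s3://nyc-tlc/trip data/yellow_tripdata_2020-"

# Without a wildcard the pattern is a literal name and matches no key.
matched_without_star = fnmatch.filter(keys, prefix)
# With a trailing * it matches every key that starts with the prefix.
matched_with_star = fnmatch.filter(keys, prefix + "*")

print(matched_without_star)
print(matched_with_star)
```

A caller could pass the result of require_glob_pattern(prefix + "*") to read_csv_glob, so that a bare prefix is rejected with a readable message before any read is attempted.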
