Descriptive error message and usage documentation for read_csv_glob

System information

Linux Ubuntu 16.04
ray 1.8.0
modin 0.12.0
python 3.8

I’m seeing the error below when calling read_csv_glob on an S3 prefix that should match multiple files. I’m using the URL s3://nyc-tlc/trip data/yellow_tripdata_2020-, which covers several files. Note that this works when I point to the S3 URL of a single .csv file.
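The failing path contains no wildcard. The name read_csv_glob suggests it expects a glob pattern. Assuming it follows standard shell-style glob semantics (an assumption, not confirmed in the issue), a bare prefix only matches a file with exactly that name, while appending `*` matches every file that starts with it. The stdlib sketch below shows the difference with `fnmatch` on a hypothetical list of object keys. The key names are illustrative and are not a listing of the real bucket.

```python
from fnmatch import fnmatch

# Hypothetical object keys under the prefix (illustrative names only).
keys = [
    "trip data/yellow_tripdata_2020-01.csv",
    "trip data/yellow_tripdata_2020-02.csv",
    "trip data/yellow_tripdata_2020-03.csv",
    "trip data/green_tripdata_2020-01.csv",
]


def matching(pattern, names):
    """Return the names that match a shell-style glob pattern."""
    return [n for n in names if fnmatch(n, pattern)]


# Without a wildcard the pattern is a literal name, so nothing matches.
prefix_matches = matching("trip data/yellow_tripdata_2020-", keys)

# A trailing '*' matches every yellow-taxi file for 2020.
glob_matches = matching("trip data/yellow_tripdata_2020-*", keys)

print(prefix_matches)     # []
print(len(glob_matches))  # 3
```

If read_csv_glob does follow glob semantics, the corresponding call would pass `'s3://nyc-tlc/trip data/yellow_tripdata_2020-*'` as `file_path`; that variant is untested here.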

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_13805/755823039.py in <module>
      2 
      3 file_path = 's3://nyc-tlc/trip data/yellow_tripdata_2020-'
----> 4 modin_df = pd.read_csv_glob(file_path, parse_dates=["tpep_pickup_datetime", "tpep_dropoff_datetime"], quoting=3)
      5 
      6 

~/venv/lib/python3.8/site-packages/modin/experimental/pandas/io.py in parser_func(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    181 
    182         kwargs = {k: v for k, v in f_locals.items() if k in _pd_read_csv_signature}
--> 183         return _read(**kwargs)
    184 
    185     parser_func.__doc__ = _read.__doc__

~/venv/lib/python3.8/site-packages/modin/experimental/pandas/io.py in _read(**kwargs)
    206 
    207     try:
--> 208         pd_obj = FactoryDispatcher.read_csv_glob(**kwargs)
    209     except AttributeError:
    210         raise AttributeError("read_csv_glob() is only implemented for pandas on Ray.")

~/venv/lib/python3.8/site-packages/modin/core/execution/dispatching/factories/dispatcher.py in read_csv_glob(cls, **kwargs)
    183     @_inherit_docstrings(factories.ExperimentalPandasOnRayFactory._read_csv_glob)
    184     def read_csv_glob(cls, **kwargs):
--> 185         return cls.__factory._read_csv_glob(**kwargs)
    186 
    187     @classmethod

~/venv/lib/python3.8/site-packages/modin/core/execution/dispatching/factories/factories.py in _read_csv_glob(cls, **kwargs)
    511     )
    512     def _read_csv_glob(cls, **kwargs):
--> 513         return cls.io_cls.read_csv_glob(**kwargs)
    514 
    515     @classmethod

~/venv/lib/python3.8/site-packages/modin/core/io/text/csv_glob_dispatcher.py in _read(cls, filepath_or_buffer, **kwargs)
     60         if isinstance(filepath_or_buffer, str):
     61             if not cls.file_exists(filepath_or_buffer):
---> 62                 return cls.single_worker_read(filepath_or_buffer, **kwargs)
     63             filepath_or_buffer = cls.get_path(filepath_or_buffer)
     64         elif not cls.pathlib_or_pypath(filepath_or_buffer):

~/venv/lib/python3.8/site-packages/modin/core/storage_formats/pandas/parsers.py in single_worker_read(cls, fname, **kwargs)
    267         ErrorMessage.default_to_pandas("Parameters provided")
    268         # Use default args for everything
--> 269         pandas_frame = cls.parse(fname, **kwargs)
    270         if isinstance(pandas_frame, pandas.io.parsers.TextFileReader):
    271             pd_read = pandas_frame.read

~/venv/lib/python3.8/site-packages/modin/core/storage_formats/pandas/parsers.py in parse(chunks, **kwargs)
    310 
    311         pandas_dfs = []
--> 312         for fname, start, end in chunks:
    313             if start is not None and end is not None:
    314                 # pop "compression" from kwargs because bio is uncompressed

ValueError: not enough values to unpack (expected 3, got 1)
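The last frames hint at why the message is so opaque. `file_exists` appears to return False for the bare prefix, so the dispatcher falls back to `single_worker_read`, which passes the path string to `parse(chunks, ...)`. That loop expects `(fname, start, end)` tuples. Iterating a plain string yields single characters, and unpacking the first character into three names raises exactly this ValueError. The stdlib sketch below reproduces that shape. It also adds a hypothetical `check_glob` helper (not part of Modin) that shows the kind of descriptive error the issue title asks for. It uses `glob.glob` on local temporary files as a stand-in for listing S3 objects. Both pieces are illustrations under these assumptions, not Modin's code.

```python
import glob
import os
import tempfile


def parse(chunks):
    """Mimic the loop in the traceback: expects (fname, start, end) tuples."""
    names = []
    for fname, start, end in chunks:
        names.append(fname)
    return names


def reproduce(path):
    """Pass a bare path string where chunk tuples are expected; return the error text."""
    try:
        parse(path)  # iterating a str yields 1-character strings
    except ValueError as exc:
        return str(exc)
    return None


def check_glob(pattern):
    """Hypothetical helper (not Modin API): raise a clear error when nothing matches."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(
            f"No files match {pattern!r}. A glob pattern is expected; "
            "to read every file under a prefix, try appending '*'."
        )
    return matches


# Reproduce the opaque message from the issue.
print(reproduce("s3://nyc-tlc/trip data/yellow_tripdata_2020-"))
# not enough values to unpack (expected 3, got 1)

# Show the descriptive alternative on local temporary files.
with tempfile.TemporaryDirectory() as tmp:
    for month in ("01", "02"):
        open(os.path.join(tmp, f"yellow_tripdata_2020-{month}.csv"), "w").close()
    prefix = os.path.join(tmp, "yellow_tripdata_2020-")
    print(len(check_glob(prefix + "*")))  # 2
    try:
        check_glob(prefix)
    except FileNotFoundError as exc:
        print(exc)
```

A check like this, run before falling back to a single-file read, would surface the missing wildcard instead of an unpacking error deep inside the parser.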