numpy error using read_csv with parse_dates=[...] and index_col=[...]

Consider a file of the following format:

week,sow,prn,rxstatus,az,elv,l1_cno,s4,s4_cor,secsigma1,secsigma3,secsigma10,secsigma30,secsigma60,code_carrier,c_cstdev,tec45,tecrate45,tec30,tecrate30,tec15,tecrate15,tec00,tecrate00,l1_loctime,chanstatus,l2_locktime,l2_cno
1765,68460.00,126,00E80000,0.00,0.00,39.38,0.118447,0.107595,0.252663,0.532384,0.600540,0.603073,0.603309,-13.255543,0.114,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1692.182,8C023D84,0.000,0.00
1765,68460.00,23,00E80000,0.00,0.00,53.48,0.034255,0.021177,0.035187,0.042985,0.061142,0.061738,0.061801,-22.760003,0.015,24.955111,0.112239,25.115330,-0.119774,25.146603,-0.065852,24.747576,-0.243804,10426.426,08109CC4,10409.660,44.52
1765,68460.00,13,00E80000,0.00,0.00,54.28,0.046218,0.019314,0.037818,0.056421,0.060602,0.060698,0.060735,-20.679035,0.090,25.670250,-0.070761,25.752224,-0.055089,26.045048,-0.180056,25.360369,-0.062119,7553.020,18109CA4,7202.660,47.27

I try to read it with the following code:

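import pandas as pd

# FILE is the path to the CSV above; GPStime2datetime is the custom week/sow parser
data = pd.read_csv(FILE, date_parser=GPStime2datetime,
                   parse_dates={'datetime': ['week', 'sow']},
                   index_col=['datetime', 'prn'])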

Here I’m parsing week and sow into a datetime column using a custom function (this works properly) and using datetime and the prn column as a MultiIndex. The file is read successfully when index_col='datetime', but not when trying to create the MultiIndex using index_col=['datetime', 'prn'] (or when using column numbers instead of names). I get the following traceback:
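The issue does not show the body of GPStime2datetime. As a rough sketch of what such a conversion might look like, assuming week is the GPS week number and sow is seconds of week counted from the GPS epoch (1980-01-06), and ignoring the leap-second offset between GPS time and UTC, a scalar version could be written as follows. The name gps_to_datetime is hypothetical and is not the author's function.

```python
from datetime import datetime, timedelta

# GPS time starts at 1980-01-06 00:00:00 (assumption: no leap-second correction applied)
GPS_EPOCH = datetime(1980, 1, 6)


def gps_to_datetime(week, sow):
    """Hypothetical stand-in for GPStime2datetime: GPS week + seconds of week -> datetime."""
    return GPS_EPOCH + timedelta(weeks=int(week), seconds=float(sow))


# First row of the sample file: week 1765, sow 68460.00
print(gps_to_datetime(1765, 68460.00))
```

For the sample rows this gives a timestamp on 2013-11-03, which is a quick way to sanity-check any custom parser before passing it to read_csv.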

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 474, in parser_f
    return _read(filepath_or_buffer, kwds)

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 260, in _read
    return parser.read()

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 721, in read
    ret = self._engine.read(nrows)

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1223, in read
    index, names = self._make_index(data, alldata, names)

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 898, in _make_index
    index = self._agg_index(index, try_parse_dates=False)

  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 984, in _agg_index
    index = MultiIndex.from_arrays(arrays, names=self.index_names)

  File "C:\Anaconda\lib\site-packages\pandas\core\index.py", line 4410, in from_arrays
    cats = [Categorical.from_array(arr, ordered=True) for arr in arrays]

  File "C:\Anaconda\lib\site-packages\pandas\core\categorical.py", line 355, in from_array
    return Categorical(data, **kwargs)

  File "C:\Anaconda\lib\site-packages\pandas\core\categorical.py", line 271, in __init__
    codes, categories = factorize(values, sort=False)

  File "C:\Anaconda\lib\site-packages\pandas\core\algorithms.py", line 131, in factorize
    (hash_klass, vec_klass), vals = _get_data_algo(vals, _hashtables)

  File "C:\Anaconda\lib\site-packages\pandas\core\algorithms.py", line 412, in _get_data_algo
    mask = com.isnull(values)

  File "C:\Anaconda\lib\site-packages\pandas\core\common.py", line 230, in isnull
    return _isnull(obj)

  File "C:\Anaconda\lib\site-packages\pandas\core\common.py", line 240, in _isnull_new
    return _isnull_ndarraylike(obj)

  File "C:\Anaconda\lib\site-packages\pandas\core\common.py", line 330, in _isnull_ndarraylike
    result = np.isnan(values)

TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

I am using Python 2.7, Pandas 0.16.1 and numpy 1.9.2.
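The last frame of the traceback calls np.isnan on the index values. A plausible reading, not confirmed in the issue itself, is that the parsed dates reached that point as an object-dtype array of datetime objects rather than as datetime64 values. The following minimal sketch reproduces that kind of failure with NumPy alone; the sample value is made up for illustration.

```python
import datetime as dt

import numpy as np

# An object-dtype array holding Python datetime objects (illustrative value)
values = np.array([dt.datetime(2013, 11, 3, 19, 1)], dtype=object)

try:
    np.isnan(values)
    raised = False
    message = ""
except TypeError as exc:
    # np.isnan has no loop for object arrays, so it raises TypeError
    raised = True
    message = str(exc)

print(raised, message)

# Once the values are coerced to datetime64, a null check is well defined
coerced = values.astype("datetime64[us]")
print(np.isnat(coerced))
```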

Issue Analytics

Created 8 years ago
Comments:31 (31 by maintainers)

Top GitHub Comments

jreback commented, Jun 1, 2015

@cmeeren ok thanks.

The basic issue is that some of the inference in read_csv is not as general as to_datetime, which correctly handles all of these cases. So the output of the date_parser needs to be coerced to fix this.

pull-requests are welcome!
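Until the parser itself coerces the date_parser output, one possible workaround is to do the coercion by hand: read the file without parse_dates or index_col, build the datetime column with pd.to_datetime, and only then set the MultiIndex. This is a sketch, not something proposed in the thread. The helper names gps_to_datetime and read_gps_csv are hypothetical, the CSV is a trimmed version of the sample data, and the conversion assumes GPS week plus seconds of week with no leap-second correction.

```python
import io
from datetime import datetime, timedelta

import pandas as pd

# Trimmed copy of the sample file (only a few of the original columns)
CSV = """week,sow,prn,l1_cno
1765,68460.00,126,39.38
1765,68460.00,23,53.48
1765,68460.00,13,54.28
"""

GPS_EPOCH = datetime(1980, 1, 6)


def gps_to_datetime(week, sow):
    """Hypothetical stand-in for the custom date parser used in the issue."""
    return GPS_EPOCH + timedelta(weeks=int(week), seconds=float(sow))


def read_gps_csv(source):
    """Read the file plainly, coerce the dates explicitly, then build the MultiIndex."""
    df = pd.read_csv(source)
    parsed = [gps_to_datetime(w, s) for w, s in zip(df["week"], df["sow"])]
    # pd.to_datetime coerces the Python datetime objects to a datetime64 index
    df["datetime"] = pd.to_datetime(parsed)
    return df.drop(columns=["week", "sow"]).set_index(["datetime", "prn"])


data = read_gps_csv(io.StringIO(CSV))
print(data)
```

Because the index is assembled after the dates have been coerced, the MultiIndex is built from a datetime64 level and an integer level, which sidesteps the code path shown in the traceback.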

jreback commented, Jun 2, 2015

Best to do a pull request. You need to add your example as a test.