numpy error using read_csv with parse_dates=[...] and index_col=[...]
Consider a file of the following format:
week,sow,prn,rxstatus,az,elv,l1_cno,s4,s4_cor,secsigma1,secsigma3,secsigma10,secsigma30,secsigma60,code_carrier,c_cstdev,tec45,tecrate45,tec30,tecrate30,tec15,tecrate15,tec00,tecrate00,l1_loctime,chanstatus,l2_locktime,l2_cno
1765,68460.00,126,00E80000,0.00,0.00,39.38,0.118447,0.107595,0.252663,0.532384,0.600540,0.603073,0.603309,-13.255543,0.114,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,1692.182,8C023D84,0.000,0.00
1765,68460.00,23,00E80000,0.00,0.00,53.48,0.034255,0.021177,0.035187,0.042985,0.061142,0.061738,0.061801,-22.760003,0.015,24.955111,0.112239,25.115330,-0.119774,25.146603,-0.065852,24.747576,-0.243804,10426.426,08109CC4,10409.660,44.52
1765,68460.00,13,00E80000,0.00,0.00,54.28,0.046218,0.019314,0.037818,0.056421,0.060602,0.060698,0.060735,-20.679035,0.090,25.670250,-0.070761,25.752224,-0.055089,26.045048,-0.180056,25.360369,-0.062119,7553.020,18109CA4,7202.660,47.27
I try to read it with the following code:
import pandas as pd
# GPStime2datetime is my own parser; FILE is the path to the CSV above
data = pd.read_csv(FILE, date_parser=GPStime2datetime,
                   parse_dates={'datetime': ['week', 'sow']},
                   index_col=['datetime', 'prn'])
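The issue does not show GPStime2datetime itself. To reproduce the setup, here is a minimal sketch of what such a converter might look like. The name gps_to_datetime and its vectorized signature are assumptions, not the reporter's code. It treats week as the GPS week number and sow as seconds of week, counted from the GPS epoch (1980-01-06). It ignores leap seconds, so the result is GPS time rather than UTC.

```python
import numpy as np
import pandas as pd

# GPS time counts from 1980-01-06 00:00:00; leap seconds are ignored here.
GPS_EPOCH = pd.Timestamp("1980-01-06")
SECONDS_PER_WEEK = 7 * 24 * 3600


def gps_to_datetime(week, sow):
    """Hypothetical stand-in for GPStime2datetime.

    Accepts array-likes of GPS week numbers and seconds of week (numbers or
    numeric strings) and returns a DatetimeIndex (datetime64 values).
    """
    week = np.asarray(week, dtype=float)
    sow = np.asarray(sow, dtype=float)
    seconds = week * SECONDS_PER_WEEK + sow
    return GPS_EPOCH + pd.to_timedelta(seconds, unit="s")
```

For the first sample row (week 1765, sow 68460.00), this gives 2013-11-03 19:01:00. Returning datetime64 values rather than Python datetime objects keeps the result in the form pandas handles natively.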
Here I’m parsing week and sow into a datetime column using a custom function (this works properly) and using datetime and the prn column as a MultiIndex. The file is read successfully when index_col='datetime', but not when trying to create the MultiIndex using index_col=['datetime', 'prn'] (or when using column numbers instead of names). I get the following traceback:
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 474, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 260, in _read
return parser.read()
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 721, in read
ret = self._engine.read(nrows)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1223, in read
index, names = self._make_index(data, alldata, names)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 898, in _make_index
index = self._agg_index(index, try_parse_dates=False)
File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 984, in _agg_index
index = MultiIndex.from_arrays(arrays, names=self.index_names)
File "C:\Anaconda\lib\site-packages\pandas\core\index.py", line 4410, in from_arrays
cats = [Categorical.from_array(arr, ordered=True) for arr in arrays]
File "C:\Anaconda\lib\site-packages\pandas\core\categorical.py", line 355, in from_array
return Categorical(data, **kwargs)
File "C:\Anaconda\lib\site-packages\pandas\core\categorical.py", line 271, in __init__
codes, categories = factorize(values, sort=False)
File "C:\Anaconda\lib\site-packages\pandas\core\algorithms.py", line 131, in factorize
(hash_klass, vec_klass), vals = _get_data_algo(vals, _hashtables)
File "C:\Anaconda\lib\site-packages\pandas\core\algorithms.py", line 412, in _get_data_algo
mask = com.isnull(values)
File "C:\Anaconda\lib\site-packages\pandas\core\common.py", line 230, in isnull
return _isnull(obj)
File "C:\Anaconda\lib\site-packages\pandas\core\common.py", line 240, in _isnull_new
return _isnull_ndarraylike(obj)
File "C:\Anaconda\lib\site-packages\pandas\core\common.py", line 330, in _isnull_ndarraylike
result = np.isnan(values)
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I am using Python 2.7, Pandas 0.16.1 and numpy 1.9.2.
Created 8 years ago
@cmeeren ok thanks.
The basic issue is that some of the inference in read_csv is not as general as to_datetime, which correctly handles all of these cases. So the output of the date_parser needs to be coerced to fix this. Pull requests are welcome!
Best to do a pull request; you need to add your example as a test.
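Following the point that the date_parser output should be coerced the way to_datetime would, one user-side workaround is to skip read_csv's combined-column date parsing entirely. You can build the datetime column yourself, coerce it with pd.to_datetime, and only then create the MultiIndex with set_index. This is a sketch, not the fix that went into pandas. The parser gps_parser_objects is hypothetical and deliberately returns an object array of Python datetime objects to show the coercion step. The CSV is a subset of the sample columns above. It also avoids date_parser and the dict form of parse_dates, which newer pandas releases deprecate.

```python
import datetime
import io

import numpy as np
import pandas as pd

GPS_EPOCH = datetime.datetime(1980, 1, 6)  # leap seconds ignored


def gps_parser_objects(week, sow):
    """Hypothetical parser that returns Python datetimes (object dtype)."""
    return np.array(
        [GPS_EPOCH + datetime.timedelta(weeks=int(w), seconds=float(s))
         for w, s in zip(week, sow)],
        dtype=object,
    )


# A few columns taken from the sample rows in the issue.
CSV = """week,sow,prn,l1_cno
1765,68460.00,126,39.38
1765,68460.00,23,53.48
1765,68460.00,13,54.28
"""

data = pd.read_csv(io.StringIO(CSV))
raw = gps_parser_objects(data["week"], data["sow"])
# Coerce the parser output to datetime64 before using it as an index level.
data["datetime"] = pd.to_datetime(raw)
data = data.drop(columns=["week", "sow"]).set_index(["datetime", "prn"])
print(data)
```

Because the datetime level is already datetime64 when set_index builds the MultiIndex, the index no longer depends on read_csv's own inference for the parser output.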