Weird behavior when concatting dataframes
I'm having an issue with a time index after I run a query and then concat new columns built from the result of an apply on a column. If I skip either the query or the concat, it works fine; the two together seem to cause the problem. I've attached a CSV with sample data, and the code below should reproduce the error.
import pandas as pd
import dask.dataframe

schema_keys = {'index': int, 'timestamp': int, 'origin_port': int}
df = dask.dataframe.read_csv(
    ['test_dask.csv'],
    dtype=schema_keys,
    converters={'timestamp': lambda x: pd.to_datetime(int(x), unit='ns')},
    blocksize=None
)
df = df.set_index('timestamp')
df = df.query("origin_port == 0")

def generate_new_columns(port):
    return pd.Series({'col1': '', 'col2': None})

newcols = df['origin_port'].apply(generate_new_columns, meta={'col1': str, 'col2': object})
df = dask.dataframe.concat([df, newcols], axis=1)
print(df.compute())
The full traceback is:
Traceback (most recent call last):
File "test_dask.py", line 22, in <module>
print(df.compute())
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/base.py", line 96, in compute
(result,) = compute(self, traverse=False, **kwargs)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/base.py", line 203, in compute
results = get(dsk, keys, **kwargs)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/threaded.py", line 76, in get
**kwargs)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/async.py", line 525, in get_async
raise(remote_exception(res, tb))
dask.async.KeyError: 1483228800084000000
Traceback
---------
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/async.py", line 291, in execute_task
result = _execute_task(task, data)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/async.py", line 272, in _execute_task
return func(*args2)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/dataframe/methods.py", line 58, in boundary_slice
result = getattr(df, kind)[start:stop]
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/core/indexing.py", line 1312, in __getitem__
return self._getitem_axis(key, axis=0)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/core/indexing.py", line 1453, in _getitem_axis
return self._get_slice_axis(key, axis=axis)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/core/indexing.py", line 1334, in _get_slice_axis
slice_obj.step, kind=self.name)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/tseries/index.py", line 1498, in slice_indexer
return Index.slice_indexer(self, start, end, step, kind=kind)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 2997, in slice_indexer
kind=kind)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 3176, in slice_locs
start_slice = self.get_slice_bound(start, 'left', kind)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 3125, in get_slice_bound
raise err
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 3119, in get_slice_bound
slc = self.get_loc(label)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/tseries/index.py", line 1402, in get_loc
return Index.get_loc(self, key, method, tolerance)
File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 2136, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/index.pyx", line 553, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:11829)
File "pandas/index.pyx", line 578, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:11425)
File "pandas/index.pyx", line 175, in pandas.index.IndexEngine._get_loc_duplicates (pandas/index.c:4663)
File "pandas/index.pyx", line 421, in pandas.index.Int64Engine._maybe_get_bool_indexer (pandas/index.c:8636)
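The bottom frames of the traceback show a label slice (boundary_slice calling .loc[start:stop]) failing inside get_slice_bound. As a rough illustration only, and not a diagnosis of the dask bug itself, the sketch below shows one way plain pandas raises a KeyError from such a slice: the start label is absent from the index and the index is not sorted. The toy timestamps and the slice_from helper are hypothetical and are not taken from test_dask.csv.

```python
import pandas as pd

# Hypothetical toy data: three nanosecond timestamps in non-sorted order.
unsorted = pd.Series(
    [1, 2, 3],
    index=pd.to_datetime([10, 30, 20], unit="ns"),
)

# A slice bound that is NOT one of the index labels.
missing = pd.Timestamp(15, unit="ns")


def slice_from(series, start):
    """Return series.loc[start:], or None if pandas raises KeyError."""
    try:
        return series.loc[start:]
    except KeyError:
        return None


# On the non-monotonic index the missing label cannot be located.
unsorted_result = slice_from(unsorted, missing)

# After sorting, pandas can place the missing label by binary search.
sorted_result = slice_from(unsorted.sort_index(), missing)

print("unsorted index:", unsorted_result)
print("sorted index:", sorted_result)
```

If the same condition applies here, it would mean a partition's index was unsorted and no longer contained the division boundary 1483228800084000000 after the query; that is an assumption, not something the traceback alone confirms.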
Issue Analytics
- State:
- Created 6 years ago
- Comments:11 (11 by maintainers)
Top GitHub Comments
My fault, I had modified your script to concat to the original, not the version filtered with .query. Running your original script on my branch in #2214 does work.
It breaks other things.
On Thu, Apr 13, 2017 at 1:09 PM, Tom Augspurger notifications@github.com wrote: