Weird behavior when concatting dataframes

I’m having an issue with a datetime index after I run a query and then concat new columns built from the result of an apply on a column. If I skip the query or skip the concat, it works fine. Only the two together cause the error. I’ve attached a CSV with sample data, and the code below should reproduce the error.

import pandas as pd
import dask.dataframe

schema_keys = {'index': int, 'timestamp': int, 'origin_port': int}

df = dask.dataframe.read_csv(
    ['test_dask.csv'],
    dtype=schema_keys,
    converters={'timestamp': lambda x: pd.to_datetime(int(x), unit='ns')},
    blocksize=None
)
df = df.set_index('timestamp')
df = df.query("origin_port == 0")


def generate_new_columns(port):
    return pd.Series({'col1': '', 'col2': None})


newcols = df['origin_port'].apply(generate_new_columns, meta={'col1': str, 'col2': object})
df = dask.dataframe.concat([df, newcols], axis=1)
print(df.compute())

test_dask.csv.zip

The full traceback is

Traceback (most recent call last):
  File "test_dask.py", line 22, in <module>
    print(df.compute())
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/base.py", line 96, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/base.py", line 203, in compute
    results = get(dsk, keys, **kwargs)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/threaded.py", line 76, in get
    **kwargs)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/async.py", line 525, in get_async
    raise(remote_exception(res, tb))
dask.async.KeyError: 1483228800084000000

Traceback
---------
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/async.py", line 291, in execute_task
    result = _execute_task(task, data)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/async.py", line 272, in _execute_task
    return func(*args2)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/dask/dataframe/methods.py", line 58, in boundary_slice
    result = getattr(df, kind)[start:stop]
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/core/indexing.py", line 1312, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/core/indexing.py", line 1453, in _getitem_axis
    return self._get_slice_axis(key, axis=axis)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/core/indexing.py", line 1334, in _get_slice_axis
    slice_obj.step, kind=self.name)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/tseries/index.py", line 1498, in slice_indexer
    return Index.slice_indexer(self, start, end, step, kind=kind)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 2997, in slice_indexer
    kind=kind)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 3176, in slice_locs
    start_slice = self.get_slice_bound(start, 'left', kind)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 3125, in get_slice_bound
    raise err
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 3119, in get_slice_bound
    slc = self.get_loc(label)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/tseries/index.py", line 1402, in get_loc
    return Index.get_loc(self, key, method, tolerance)
  File "/Users/shughes/miniconda3/lib/python3.5/site-packages/pandas/indexes/base.py", line 2136, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/index.pyx", line 553, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:11829)
  File "pandas/index.pyx", line 578, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:11425)
  File "pandas/index.pyx", line 175, in pandas.index.IndexEngine._get_loc_duplicates (pandas/index.c:4663)
  File "pandas/index.pyx", line 421, in pandas.index.Int64Engine._maybe_get_bool_indexer (pandas/index.c:8636)

Issue Analytics

Created 6 years ago
Comments: 11 (11 by maintainers)

Top GitHub Comments

TomAugspurger commented, Apr 13, 2017

Which if I understand you correctly means there should not be an error?

My fault, I had modified your script to concat to the original, not the version filtered with .query.

Running your original script on my branch in #2214 does work.

mrocklin commented, Apr 13, 2017

It breaks other things.

On Thu, Apr 13, 2017 at 1:09 PM, Tom Augspurger notifications@github.com wrote:

Ha, ok. In this case, Matt’s suggestion of changing boundary slice to

start = max(start, df.index.min())
stop = min(stop, df.index.max())

does fix it. I’ll put together a PR quick, once I verify that that change doesn’t break anything else.
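For readers following along, here is a minimal sketch of the clamping idea quoted above, using plain pandas rather than dask. The helper name `clamped_slice` and the toy data are assumptions made for illustration; this is not dask's actual `boundary_slice`, and as noted in the reply above, the change was reported to break other things. The sketch only shows why a label slice can raise `KeyError` when a bound is missing from an unsorted index, and how clamping the bounds to labels that exist avoids that particular error.

```python
import pandas as pd


def clamped_slice(df, start, stop):
    """Hypothetical helper: clamp slice bounds to the index extremes.

    Illustrates the suggestion from the thread; it is not dask's code.
    """
    if len(df) == 0:
        # Nothing to slice, and min()/max() of an empty index are not usable.
        return df
    start = max(start, df.index.min())
    stop = min(stop, df.index.max())
    return df.loc[start:stop]


# Toy frame with an unsorted (non-monotonic) index, assumed for illustration.
df = pd.DataFrame({"value": [10, 30, 20, 40]}, index=[1, 3, 2, 4])

# Slicing with bounds that are absent from an unsorted index raises KeyError.
try:
    df.loc[0:5]
    raised = False
except KeyError:
    raised = True

# After clamping, both bounds are labels that exist in the index.
result = clamped_slice(df, 0, 5)
print(raised, len(result))
```

Note that on an unsorted index the clamped slice selects rows by position between the two labels, so which rows come back depends on where the minimum and maximum sit. That is one reason to treat this as a sketch rather than a general fix.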
