KeyError when slicing a datetime index

This looks similar to #2211, but I’m not sure. I’ve attached a zip with sample data and code that reproduces it. If you uncomment line 25 (the commented-out b.repartition(freq='15m') call), it works for some reason.

import dask.bag
import pandas as pd
import re
from datetime import datetime
schema_dict = {
    'timestamp': 'datetime64[ns]',
}


time_regex = r'\[(?P<time>[^]]+)\]'
time_regex = re.compile(time_regex)


def get_log_dict(line):
    match = time_regex.match(line)
    dt = datetime.strptime(match.groupdict()['time'], '%d/%b/%Y:%H:%M:%S +0000')
    return {'timestamp': dt}


files = ['2012-09-25.log', '2012-09-26.log', '2012-09-27.log']
b = dask.bag.read_text(files, blocksize=5000000).map(get_log_dict).to_dataframe(schema_dict)
b = b[~b.timestamp.isnull()]
b = b.set_index('timestamp')
b = b[sorted(b.columns)]
# b = b.repartition(freq='15m')
start = datetime(2012, 9, 26)
end = datetime(2012, 9, 27)
b = b.loc[start:end]
b.compute()

Archive.zip

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
TomAugspurger commented, May 2, 2017

Yeah, this looks very similar.

I think dask’s .loc will have to protect against the index not being monotonic / sorted, and fall back to boundary_slice if it isn’t. I can take a closer look tonight or tomorrow morning.
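
The guard described above could look roughly like the sketch below. It uses plain pandas on a single out-of-order partition rather than dask internals, and safe_loc_slice is a hypothetical helper name, not dask's boundary_slice or its actual fix. The idea is to use a label slice only when the index is sorted and otherwise fall back to comparing every index value against the bounds.

```python
import pandas as pd


def safe_loc_slice(df, start, end):
    """Hypothetical helper: select rows with start <= index <= end."""
    if df.index.is_monotonic_increasing:
        # Sorted index: a plain label slice is well defined.
        return df.loc[start:end]
    # Unsorted index: fall back to an elementwise boundary comparison.
    mask = (df.index >= start) & (df.index <= end)
    return df[mask]


# One partition whose timestamps are out of order (made-up sample data).
unsorted_part = pd.DataFrame(
    {"value": [1, 2, 3, 4]},
    index=pd.to_datetime(
        [
            "2012-09-26 12:00",
            "2012-09-25 23:00",
            "2012-09-26 01:00",
            "2012-09-27 08:00",
        ]
    ),
)

start = pd.Timestamp("2012-09-26")
end = pd.Timestamp("2012-09-27")
result = safe_loc_slice(unsorted_part, start, end)
```

The fallback keeps the rows in their original order, so the result is still unsorted; sorting the partition first would take the label-slice path instead.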

0 reactions
shughes-uk commented, May 3, 2017

If you’re going to raise an error the docs should probably be changed to reflect the ‘mostly sorted’ status and perhaps include the workaround for it. It doesn’t sound like you’re going to go that route though.


