KeyError when slicing by datetime
This looks similar to #2211, but I’m not sure. I’ve attached a zip with sample data and code that reproduces it. If you uncomment line 25 (the commented-out b.repartition(freq='15m') call), it works for some reason.
import re
from datetime import datetime

import dask.bag
import pandas as pd

schema_dict = {
    'timestamp': 'datetime64[ns]',
}

# Capture the text between square brackets at the start of a line
time_regex = re.compile(r'\[(?P<time>[^]]+)\]')


def get_log_dict(line):
    match = time_regex.match(line)
    # pd.datetime was removed from pandas; datetime.strptime is equivalent
    dt = datetime.strptime(match.groupdict()['time'], '%d/%b/%Y:%H:%M:%S +0000')
    return {'timestamp': dt}


files = ['2012-09-25.log', '2012-09-26.log', '2012-09-27.log']
b = dask.bag.read_text(files, blocksize=5000000).map(get_log_dict).to_dataframe(schema_dict)
b = b[~b.timestamp.isnull()]
b = b.set_index('timestamp')
b = b[sorted(b.columns)]
# b = b.repartition(freq='15m')

# Slice one day out of the datetime index; this is where the KeyError appears
start = datetime(2012, 9, 26)
end = datetime(2012, 9, 27)
b = b.loc[start:end]
b.compute()
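One plausible reading, consistent with the maintainer comment further down, is that the slice reaches pandas with an index that is not sorted. The sketch below uses plain pandas rather than dask, and the three timestamps and the `value` column are made up for illustration. It only shows that a label slice on an unsorted DatetimeIndex raises a KeyError when the bounds are not exact labels; it does not confirm that this is what happens inside dask here.

```python
from datetime import datetime

import pandas as pd

# Hypothetical data: three timestamps deliberately out of order.
index = pd.DatetimeIndex(
    ["2012-09-27 10:00", "2012-09-25 08:00", "2012-09-26 12:00"],
    name="timestamp",
)
df = pd.DataFrame({"value": [1, 2, 3]}, index=index)

start = datetime(2012, 9, 26)
end = datetime(2012, 9, 27)

# Neither bound is an exact label, so pandas has to search for its
# position, which it can only do on a monotonic index.
try:
    df.loc[start:end]
    raised = False
except KeyError:
    raised = True

# After sorting, the same slice is well defined.
sliced = df.sort_index().loc[start:end]
print(raised)
print(sliced)
```

Sorting first is only practical for data that fits in memory; it is shown here to isolate the cause, not as a recommended fix for the dask case.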
Issue Analytics
- State:
- Created 6 years ago
- Comments:6 (6 by maintainers)
Top GitHub Comments
Yeah, this looks very similar. I think dask’s .loc will have to protect against the index not being monotonic / sorted, and fall back to boundary_slice if it isn’t. I can take a closer look tonight or tomorrow morning.

If you’re going to raise an error, the docs should probably be changed to reflect the ‘mostly sorted’ status and perhaps include the workaround for it. It doesn’t sound like you’re going to go that route, though.
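The guard described in the first comment could look roughly like the sketch below, written against plain pandas. `loc_slice_with_fallback` is a hypothetical name, the sample frame is invented, and the fallback is an ordinary boolean mask rather than dask’s boundary_slice, so treat it as an illustration of the idea and not as the change that was made in dask.

```python
from datetime import datetime

import pandas as pd


def loc_slice_with_fallback(frame, start, end):
    """Hypothetical helper: label-slice a sorted index, else use a mask."""
    if frame.index.is_monotonic_increasing:
        return frame.loc[start:end]
    # Fallback for an unsorted index: compare every label against the
    # bounds (inclusive on both ends, like .loc) instead of searching.
    mask = (frame.index >= start) & (frame.index <= end)
    return frame[mask]


# Made-up frame whose index is out of order.
unsorted_df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.DatetimeIndex(
        ["2012-09-27 10:00", "2012-09-25 08:00", "2012-09-26 12:00"]
    ),
)

start = datetime(2012, 9, 26)
end = datetime(2012, 9, 27)

from_unsorted = loc_slice_with_fallback(unsorted_df, start, end)
from_sorted = loc_slice_with_fallback(unsorted_df.sort_index(), start, end)
print(from_unsorted)
print(from_sorted)
```

The mask path scans every row, so it is slower than a sorted label slice, but it returns the same rows whether or not the index is ordered.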