
CLN remove duplicated entries in valid_resos in pandas/core/indexes/datetimes.py

See original GitHub issue
  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Before even running the code it is easy to see: just check lines 560-570 of pandas/core/indexes/datetimes.py and you will understand the human error a developer made 😃
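If you do not want to open the file by hand, one quick way to look at that definition is to print the method's source from the installed package. This is not from the original report, just a hedged sketch; it assumes pandas 1.1.x, where _parsed_string_to_bounds (the method named in the traceback further down) is defined on DatetimeIndex:

import inspect
import pandas as pd

# Print the source of the method named in the traceback further down, so the
# valid_resos set in the locally installed pandas can be inspected directly.
# Assumes pandas 1.1.x, where _parsed_string_to_bounds is defined on DatetimeIndex.
print(inspect.getsource(pd.DatetimeIndex._parsed_string_to_bounds))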

# Your code here



import pandas as pd
import datetime as dt

Inside Github.csv, we have:

'''
,RxID,msgtype,message,confirmed,SNR,strength,format,icao24,correctedBits,LowConf,freqOffst
2020-08-25 11:00:08.187503455,1,10,4000,,0.0,-80.5,,,,,
2020-08-25 11:00:08.189013753,1,10,4000,,0.0,-78.5,,,,,
2020-08-25 11:00:08.189746310,1,8,,,0.0,-79.0,,,,,
2020-08-25 11:00:08.189986916,1,10,2700,,0.0,-78.0,,,,,
2020-08-25 11:00:08.190476779,1,6,20A93040502C94,1,0.0,-79.5,4.0,A1F014,,,
2020-08-25 11:00:08.190482661,1,9,0000,,0.0,-79.0,,,,,
2020-08-25 11:00:08.190963454,1,6,58EB0000171D17,1,0.0,-76.5,11.0,,,,
2020-08-25 11:00:08.191085134,1,9,7F00,,0.0,-78.5,,,,,
2020-08-25 11:00:08.191087092,1,9,0000,,0.0,-79.5,,,,,
2020-08-25 11:00:08.191123383,1,9,1000,,0.0,-72.0,,,,,
2020-08-25 11:00:08.191139020,1,7,,,0.0,-77.0,,,,,
2020-08-25 11:00:08.191150695,1,7,,,0.0,-76.5,,,,,
2020-08-25 11:00:08.191590978,1,10,0000,,0.0,-70.5,,,,,
2020-08-25 11:00:08.193015479,1,10,0000,,0.0,-83.0,,,,,
2020-08-25 11:00:08.193509041,1,9,2800,,0.0,-78.0,,,,,
2020-08-25 11:00:08.193664650,1,8,,,0.0,-80.5,,,,,
2020-08-25 11:00:08.193992571,1,10,0000,,0.0,-81.5,,,,,
2020-08-25 11:00:08.194459459,1,10,7D00,,0.0,-76.5,,,,,
2020-08-25 11:00:08.194461492,1,10,0001,,0.0,-76.0,,,,,
2020-08-25 11:00:08.195045194,1,10,0000,,0.0,-80.0,,,,,
2020-08-25 11:00:08.195061385,1,8,,,0.0,-80.0,,,,,
2020-08-25 11:00:08.195102628,1,10,0000,,0.0,-75.0,,,,,

'''
dfPKT = pd.read_csv(
    'Github.csv',
    dtype={'RxID': int, 'msgtype': int, 'message': str, 'confirmed': object,
           'SNR': float, 'strength': float, 'format': float, 'icao24': str,
           'correctedBits': float, 'LowConf': float, 'freqOffst': float},
    index_col=0,
)
i = pd.DatetimeIndex(dfPKT.index, dayfirst=True)
dfPKT.set_index(i, inplace=True)

#occupancy constant
occupancyMode3Ainterrogation = 0.00015675

#How many messages are inside valid Mode A interrogation.
validModeAint=[]

#We start with the first timestamp as a reference.
Next_timestamp = dfPKT.index[0]
#Then, we get all the minutes of our study/analysis.
allminutes = dfPKT.index.floor('60S').unique()
#and for every minute, we do a data analysis.
for minute in allminutes:
    #Here, since the dataset is very big, I get small datasets of 1 minute each. 
    dfAnalysis = dfPKT.loc[(dfPKT.index > minute) & (dfPKT.index < minute + dt.timedelta(seconds=60))]
    for message, index, format, code, msgtype in zip(dfAnalysis["message"], dfAnalysis.index, dfAnalysis["format"],dfAnalysis["icao24"], dfAnalysis["msgtype"]):
        # Depending on the type of the message, we process it in one way or another.
        if msgtype == 10:
            bits = bin(int(message, 16))[2:].zfill(16)  # these are the 16 bits inside the msgtype 10 message
            if bits[:2] == '00': #if first two bits are zero
                '''
                When these two specific bits of the message are 0, we have to check in the same dataset whether there is a message linked to it, from 8 to 13 microseconds after the actual message. HERE is where we get the error: pandas tries to slice with the datetime strings, and internally it does not have a "millisecond" resolution, although my resolution here is microseconds. Internally it tries to do something with milliseconds and that breaks the code.
                '''
                if (2 in dfAnalysis[str(index + dt.timedelta(seconds=0.000008)):str(index + dt.timedelta(seconds=0.000013))]["msgtype"].to_numpy()):
                    validModeAint.append(index)
                    Next_timestamp = index + dt.timedelta(seconds=occupancyMode3Ainterrogation)

Problem description

Traceback (most recent call last):
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\datetimes.py", line 718, in slice_indexer
    return Index.slice_indexer(self, start, end, step, kind=kind)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\base.py", line 4966, in slice_indexer
    start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\base.py", line 5169, in slice_locs
    start_slice = self.get_slice_bound(start, "left", kind)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\base.py", line 5079, in get_slice_bound
    label = self._maybe_cast_slice_bound(label, side, kind)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\datetimes.py", line 665, in _maybe_cast_slice_bound
    lower, upper = self._parsed_string_to_bounds(reso, parsed)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\datetimes.py", line 536, in _parsed_string_to_bounds
    raise KeyError
KeyError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\OccupancyScripts\OccupancyToKML\Github.py", line 31, in <module>
    if (2 in dfAnalysis[str(index + dt.timedelta(seconds=0.000008)):str(index + dt.timedelta(seconds=0.000013))]["msgtype"].to_numpy()):
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\frame.py", line 2881, in __getitem__
    indexer = convert_to_index_sliceable(self, key)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexing.py", line 2132, in convert_to_index_sliceable
    return idx._convert_slice_indexer(key, kind="getitem")
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\base.py", line 3190, in _convert_slice_indexer
    indexer = self.slice_indexer(start, stop, step, kind=kind)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\datetimes.py", line 728, in slice_indexer
    start_casted = self._maybe_cast_slice_bound(start, "left", kind)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\datetimes.py", line 665, in _maybe_cast_slice_bound
    lower, upper = self._parsed_string_to_bounds(reso, parsed)
  File "C:\Users\bkaguila\dev\anaconda\envs\Occupancies\lib\site-packages\pandas\core\indexes\datetimes.py", line 536, in _parsed_string_to_bounds
    raise KeyError
KeyError

Process finished with exit code 1

The current behavior: because the slice bounds are strings, the index lookup parses them, and the resulting reso (from the Resolution class, which does support nanosecond and millisecond resolutions) is passed on to pandas/core/indexes/datetimes.py, where the valid_resos set (around line 520) is wrong.
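For reference, in the pandas 1.1.x sources that set looks roughly like the sketch below. This is a paraphrase, not a verbatim copy, so the exact members may differ slightly; the point is that "minute" and "second" are listed twice while the sub-second resolutions the parser can return are missing, so the slice bound falls through to the bare raise KeyError seen in the traceback. Duplicates in a set literal are silently collapsed, so the duplication itself is harmless; the damage comes from the entries that were presumably meant to be there instead.

# Approximate shape of the check in pandas/core/indexes/datetimes.py
# (pandas 1.1.x, inside _parsed_string_to_bounds) -- paraphrased, not verbatim:
valid_resos = {
    "year", "month", "quarter", "day", "hour",
    "minute", "second",      # listed once here ...
    "minute", "second",      # ... and again here (the duplicated entries)
    "microsecond",
}
# "millisecond" and "nanosecond" are not in the set, so a string slice bound
# that parses to one of those resolutions triggers the bare raise KeyError.
for reso in ("microsecond", "millisecond", "nanosecond"):
    print(reso, "ok" if reso in valid_resos else "-> raise KeyError")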

Expected Output

Process finished with exit code 0
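Not part of the original report, but a possible workaround until the resolution handling is fixed: slice with Timestamp objects instead of strings, so the lookup never goes through the string-parsing path that consults valid_resos. A minimal sketch, meant to replace the last three lines of the inner loop in the reproduction code above (same variable names):

                # Workaround sketch (assumption, not from the thread): pass Timestamp
                # bounds instead of str(...) bounds, so that _maybe_cast_slice_bound
                # never has to parse a string and valid_resos is never consulted.
                lo = index + dt.timedelta(microseconds=8)
                hi = index + dt.timedelta(microseconds=13)
                if 2 in dfAnalysis.loc[lo:hi]["msgtype"].to_numpy():
                    validModeAint.append(index)
                    Next_timestamp = index + dt.timedelta(seconds=occupancyMode3Ainterrogation)

Note the semantics differ slightly from string slicing: a Timestamp bound is matched exactly, whereas a string bound is expanded to the whole period it describes; for a fixed 8-13 microsecond window the exact behaviour is usually what is wanted anyway.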

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 67a3d4241ab84419856b84fc3ebc9abcbe66c6b3
python : 3.9.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252
pandas : 1.1.4
numpy : 1.19.4
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.1.post20201107
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None


Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 19 (10 by maintainers)

Top GitHub Comments

1 reaction
MarcoGorelli commented, Jan 31, 2021

They’ve opened #39503, which is clearer than this issue, so closing in favour of that one

1 reaction
boringow commented, Jan 31, 2021

Okay! Sorry about that, I didn't know.
