Datetime objects are converted to Timestamp without my consent.
Python datetime.datetime objects get converted inside pandas DataFrames. This is not desired in all cases.
Code Sample
from datetime import datetime
import numpy as np
import pandas as pd

# Build ten random January 2017 dates as plain datetime objects
rd = lambda: datetime(2017, 1, np.random.randint(1, 32))
datelist = [rd() for i in range(10)]
print(type(datelist[0]))  # <type 'datetime.datetime'>
df = pd.DataFrame({"date": datelist,
                   "y": np.random.rand(10)})
print(type(df["date"][0]))  # <class 'pandas._libs.tslib.Timestamp'>
# Convert back element by element
dates = [d.to_pydatetime() for d in df["date"]]
print(type(dates[0]))  # <type 'datetime.datetime'>
Creating a DataFrame with a list of datetime objects converts them to pandas._libs.tslib.Timestamp. This is undesired. In this simple case it is at least possible to get back a datetime object with a list comprehension, dates = [d.to_pydatetime() for d in df["date"]], which I would consider an unnecessary step and which makes the use of a DataFrame somewhat pointless.
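For context, a minimal sketch of what the conversion does and does not change. pd.Timestamp is itself a subclass of datetime.datetime, so isinstance checks still pass after the conversion, but the exact type differs. The fixed dates below are illustrative and not taken from the report.

```python
from datetime import datetime
import pandas as pd

original = datetime(2017, 1, 5)
df = pd.DataFrame({"date": [original, datetime(2017, 1, 9)]})

ts = df["date"].iloc[0]

# Timestamp subclasses datetime, so this check still succeeds ...
print(isinstance(ts, datetime))
# ... but the exact type is no longer datetime.datetime.
print(type(ts) is datetime)

# Converting back yields an equal plain datetime.
roundtrip = ts.to_pydatetime()
print(type(roundtrip) is datetime, roundtrip == original)
```

Code that only relies on the datetime interface keeps working with Timestamp values, while code that checks the exact type, or that depends on attributes of a subclass, does not.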
Background:
The real problem with this behaviour is shown in the following. Any instance of a class derived from datetime is also converted, hence losing all its properties.
from datetime import datetime
import numpy as np
import pandas as pd

class someobject(datetime):
    prop = "property"
    # datetime instances are built in __new__, so no __init__ override is needed

rd = lambda: someobject(2017, 1, np.random.randint(1, 32))
datelist = [rd() for i in range(10)]
print(type(datelist[0]))  # <class '__main__.someobject'>
print(datelist[0].prop)  # property
df = pd.DataFrame({"date": datelist,
                   "y": np.random.rand(10)})
print(type(df["date"][0]))  # <class 'pandas._libs.tslib.Timestamp'>
print(df["date"][0].prop)  # AttributeError: 'Timestamp' object has no attribute 'prop'
How can I prevent the DataFrame from converting an object that I put into it into something else?
As a workaround, how can someobject in the above be changed such that it does not get converted automatically to something else?
[Here, python 2.7, pandas 0.20.1 are used]
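One possible workaround, sketched under the assumption that wrapping the values in a Series with dtype=object before building the DataFrame stops pandas from inferring a datetime64 column. The class name SomeObject and the fixed dates are illustrative, and the behaviour may differ between pandas versions.

```python
from datetime import datetime
import pandas as pd


class SomeObject(datetime):
    """Hypothetical datetime subclass carrying an extra attribute."""
    prop = "property"


datelist = [SomeObject(2017, 1, day) for day in (3, 7, 21)]

# Assumption: an explicit object-dtype Series is stored as-is,
# so the elements are not converted to Timestamp.
df = pd.DataFrame({
    "date": pd.Series(datelist, dtype=object),
    "y": [0.1, 0.2, 0.3],
})

first = df["date"].iloc[0]
print(df["date"].dtype)
print(type(first).__name__, first.prop)
```

The trade-off is that an object column is no longer a datetime64 column, so the .dt accessor and vectorised datetime operations are not available on it.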
Issue Analytics
- State:
- Created 6 years ago
- Reactions: 2
- Comments: 6 (3 by maintainers)
Top GitHub Comments
df['date'].dt.to_pydatetime() does not work when assigning a column to a DataFrame; the values revert to pandas._libs.tslibs.timestamps.Timestamp. This also happens when groupby is used on a datetime column.
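A hedged sketch of one way around the reverting assignment: instead of assigning the converted values directly, wrap them in a Series with dtype=object first. This assumes that pandas stores an object-dtype Series without re-inferring datetime64 on assignment, which may depend on the pandas version. It does not address the groupby case.

```python
from datetime import datetime
import pandas as pd

df = pd.DataFrame({"date": [datetime(2017, 1, 3), datetime(2017, 1, 7)]})

# Convert each Timestamp back to a plain datetime.
py_dates = [ts.to_pydatetime() for ts in df["date"]]

# Assumption: assigning an explicit object-dtype Series keeps the
# plain datetime objects instead of converting them back to Timestamp.
df["date"] = pd.Series(py_dates, index=df.index, dtype=object)

first = df["date"].iloc[0]
print(df["date"].dtype, type(first))
```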
Sure. Apologies for the lack of example. Here’s an excerpt from some bit of code I was profiling where this came up when using line profiler, with python 3.7:
When using pd.Timedelta and pd.Timestamp
When using datetime.datetime and datetime.timedelta
Sorry if the formatting broke, but the time went from 19.3 us to 0.7 us per hit.
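The profiled excerpts did not survive in this thread. As a rough stand-in, and not the code that was actually profiled, the following micro-benchmark sketch compares a single addition with pandas types against the same addition with standard-library types. The iteration count and dates are arbitrary, and absolute numbers will vary by machine and library version.

```python
import timeit
from datetime import datetime, timedelta
import pandas as pd

n = 10000  # arbitrary number of repetitions

ts, td = pd.Timestamp("2017-01-01"), pd.Timedelta(days=1)
dt, delta = datetime(2017, 1, 1), timedelta(days=1)

# Time the same "add one day" operation with each pair of types.
pandas_s = timeit.timeit(lambda: ts + td, number=n)
stdlib_s = timeit.timeit(lambda: dt + delta, number=n)

print("pandas: %.3f us per op" % (pandas_s / n * 1e6))
print("stdlib: %.3f us per op" % (stdlib_s / n * 1e6))
```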
The attached SO post was the incorrect link, here’s the corrected one, and my numbers seem to agree with theirs: https://stackoverflow.com/a/29192601/805763