[BUG] TimeGapSplit mutates the data
TimeGapSplit mutates the pandas.DataFrame it operates on if the date column is not a datetime type. For example:
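from datetime import timedelta

import numpy as np
import pandas as pd
from pandas.api.types import (
    is_object_dtype,
    is_datetime64_any_dtype
)
# Assumption: TimeGapSplit is the scikit-lego splitter
from sklego.model_selection import TimeGapSplit

# Build 30 days of dates and store them as strings, not datetimes
date_range = pd.date_range(start='1/1/2018', end='1/30/2018')
dates = [date.strftime('%m-%d-%Y') for date in date_range]
df = (
    pd.DataFrame(
        data=np.random.randint(0, 30, size=(30, 4)),
        columns=list('ABCy')
    )
    .assign(
        date=dates
    )
)
# Before constructing the splitter, the date column holds strings
assert is_object_dtype(df['date'])
cv = TimeGapSplit(
    df=df,
    date_col='date',
    train_duration=timedelta(days=3),
    valid_duration=timedelta(days=1),
)
# After constructing the splitter, the caller's DataFrame has been changed
assert is_datetime64_any_dtype(df['date'])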
Is this desirable behavior?
Possible remedies are:
- Only accept the datetime type
- Accept str type, but make a copy and leave the pandas.DataFrame as is.
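To make the second remedy concrete, here is a minimal sketch of what "make a copy" could look like. The helper name `with_datetime_column` and its `date_format` parameter are hypothetical and not part of TimeGapSplit; the sketch only assumes that the conversion is done with `pd.to_datetime` on a new DataFrame rather than in place.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def with_datetime_column(df, date_col, date_format=None):
    """Hypothetical helper: return a new DataFrame whose date_col is datetime.

    The DataFrame passed in by the caller is never modified.
    """
    if is_datetime64_any_dtype(df[date_col]):
        return df.copy()
    # assign() builds a new DataFrame, so the caller's column keeps its dtype
    return df.assign(**{date_col: pd.to_datetime(df[date_col], format=date_format)})


df = pd.DataFrame(
    {"date": ["01-01-2018", "01-02-2018", "01-03-2018"], "y": [1, 2, 3]}
)
original_dtype = df["date"].dtype

converted = with_datetime_column(df, "date", date_format="%m-%d-%Y")

# The converted copy is datetime; the original column is left as it was
assert is_datetime64_any_dtype(converted["date"])
assert df["date"].dtype == original_dtype
```

A splitter could call something like this on its input and keep only the converted copy internally, so the user's DataFrame stays as they passed it.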
Happy to hear your thoughts @kayhoogland @stephanecollot @koaning
Issue Analytics
- State:
- Created 4 years ago
- Comments:11 (8 by maintainers)
Top GitHub Comments
Copy feels like a safer option for now. It allows for flexibility in the future.
@koaning can you close?