crazy timestamp bug in feather?
See original GitHub issueHello the dream team,
Thanks for this wonderful package. I was playing with feather and some timestamps and I noticed some dangerous behavior. Maybe it is a bug.
Consider this
import pandas as pd
import feather
import numpy as np
df = pd.DataFrame({'string_time_utc' : [pd.to_datetime('2018-02-01 14:00:00.531'),
pd.to_datetime('2018-02-01 14:01:00.456'),
pd.to_datetime('2018-03-05 14:01:02.200')]})
df['timestamp_est'] = pd.to_datetime(df.string_time_utc).dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.tz_localize(None)
df
Out[17]:
string_time_utc timestamp_est
0 2018-02-01 14:00:00.531 2018-02-01 09:00:00.531
1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
Here I create the corresponding EST
timestamp of my original timestamps (in UTC
time).
Now saving the dataframe to csv
or to feather
will generate two completely different results.
df.to_csv('P://testing.csv')
df.to_feather('P://testing.feather')
Switching to R.
Using the good old csv
gives me something a bit annoying, but expected. R thinks my timezone is UTC
by default, and wrongly attached this timezone to timestamp_est
. No big deal, I can always use with_tz
or even better: import as character and process as timestamp while in R.
> dataframe <- read_csv('P://testing.csv')
Parsed with column specification:
cols(
X1 = col_integer(),
string_time_utc = col_datetime(format = ""),
timestamp_est = col_datetime(format = "")
)
Warning message:
Missing column names filled in: 'X1' [1]
>
> dataframe %>% mutate(mytimezone = tz(timestamp_est))
# A tibble: 3 x 4
X1 string_time_utc timestamp_est
<int> <dttm> <dttm>
1 0 2018-02-01 14:00:00.530 2018-02-01 09:00:00.530
2 1 2018-02-01 14:01:00.456 2018-02-01 09:01:00.456
3 2 2018-03-05 14:01:02.200 2018-03-05 09:01:02.200
mytimezone
<chr>
1 UTC
2 UTC
3 UTC
Now look at what happens with feather
:
> dataframe <- read_feather('P://testing.feather')
>
> dataframe %>% mutate(mytimezone = tz(timestamp_est))
# A tibble: 3 x 3
string_time_utc timestamp_est mytimezone
<dttm> <dttm> <chr>
1 2018-02-01 09:00:00.531 2018-02-01 04:00:00.531 ""
2 2018-02-01 09:01:00.456 2018-02-01 04:01:00.456 ""
3 2018-03-05 09:01:02.200 2018-03-05 04:01:02.200 ""
My timestamps have been converted!!! pure insanity. Am I missing something here?
Thanks!!
Issue Analytics
- State:
- Created 5 years ago
- Comments:13 (6 by maintainers)
Top GitHub Comments
Thanks. It will need a more descriptive / objective title
Please comment on ARROW-3543. I estimate timeline for 0.13 to be end of March