tricky timestamp conversion


Hello there, it's me, the bug hunter, again 😃

I have this massive 200-million-row dataset, and I encountered some very annoying behavior. I wonder if this is a bug.

I load my csv using

mylog = pd.read_csv('/mydata.csv',
                    names = ['mydatetime',  'var2', 'var3', 'var4'],
                    dtype = {'mydatetime' : str},
                    skiprows = 1)

and the datetime column really looks like regular timestamps (tz-aware)

mylog.mydatetime.head()
Out[22]: 
0    2019-03-03T20:58:38.000-0500
1    2019-03-03T20:58:38.000-0500
2    2019-03-03T20:58:38.000-0500
3    2019-03-03T20:58:38.000-0500
4    2019-03-03T20:58:38.000-0500
Name: mydatetime, dtype: object

Now, I take extra care in converting these strings into proper timestamps:

mylog['mydatetime'] = pd.to_datetime(mylog['mydatetime'] ,errors = 'coerce', format = '%Y-%m-%dT%H:%M:%S.%f%z', infer_datetime_format = True, cache = True)

That takes a looong time to process, but seems OK. The output is

mylog.mydatetime.head()
Out[23]: 
0    2019-03-03 20:58:38-05:00
1    2019-03-03 20:58:38-05:00
2    2019-03-03 20:58:38-05:00
3    2019-03-03 20:58:38-05:00
4    2019-03-03 20:58:38-05:00
Name: mydatetime, dtype: object

What is puzzling is that so far I thought I had full control of my dtypes. However, running this simple conversion raises an error:

mylog['myday'] = pd.to_datetime(mylog['mydatetime'].dt.date, errors = 'coerce')

  File "pandas/_libs/tslib.pyx", line 537, in pandas._libs.tslib.array_to_datetime

ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True

The only way I was able to get past this error was by running

mylog['myday'] = pd.to_datetime(mylog['mydatetime'].apply(lambda x: x.date()))

Is this a bug? Before upgrading to 0.24.1 I was not getting the tz error above. What do you think? I can't share the data, but I am happy to try some things to help you out!

Thanks!

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 20 (10 by maintainers)

Top GitHub Comments

mroeschke commented, Mar 8, 2019

When I addressed this timezone parsing, my rationale was that if %z or %Z were passed, the user would want to preserve these timezones, so this error was intentional.

For your use case, if you leave out the format argument and keep utc=True, you should get your dates in UTC with datetime64[ns, UTC] dtype.
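
A minimal sketch of that suggestion, assuming a small made-up sample in the same string format as the issue (the values, the `America/New_York` zone, and the `myday` step are illustrative assumptions, not taken from the original data). It parses without a format argument and with utc=True, then derives a day column from the resulting tz-aware values:

```python
import pandas as pd

# Made-up sample strings in the issue's format, with two different UTC offsets.
raw = pd.Series([
    "2019-03-03T20:58:38.000-0500",
    "2019-03-10T03:00:00.000-0400",
])

# No explicit format; utc=True converts every value to UTC, so the
# result has a single tz-aware datetime64 dtype instead of object.
as_utc = pd.to_datetime(raw, utc=True)
print(as_utc.dtype)

# Assumption: the day should be computed in a local zone rather than in UTC.
# Convert first, then truncate each timestamp to local midnight.
local = as_utc.dt.tz_convert("America/New_York")
myday = local.dt.normalize()
print(myday)
```

Note that the calendar day depends on the zone: the first sample value falls on March 4 in UTC but on March 3 in US Eastern time, so it may be worth deciding which one the day column should reflect before dropping the time part.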

mroeschke commented, Mar 7, 2019

This means that the object dtype is expected. Since your string data contained more than one timezone offset, it’s not possible to cast this data to one datetime64[ns, tz] dtype since there are multiple tzs in your data.

See https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.24.0.html#parsing-datetime-strings-with-timezone-offsets
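
One way to check whether a column really contains more than one offset is to count the trailing offset of each raw string before parsing. This is only a sketch, assuming every string ends in a five-character offset such as -0500, as in the sample shown in the issue; the data below is made up for illustration:

```python
import pandas as pd

# Made-up sample strings in the issue's format.
raw = pd.Series([
    "2019-03-03T20:58:38.000-0500",
    "2019-03-03T20:58:39.000-0500",
    "2019-03-10T03:00:00.000-0400",
])

# The last five characters of each string are the UTC offset, e.g. "-0500".
offset_counts = raw.str[-5:].value_counts()
print(offset_counts)

# More than one distinct offset means the values cannot share a single
# fixed-offset timezone.
has_mixed_offsets = len(offset_counts) > 1
print(has_mixed_offsets)
```

If only one offset shows up, the strings all share a single fixed offset; if several appear (for example across a daylight saving change), a conversion to UTC is the simpler route.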
