SNOW-219884: Pandas datetime with timezone converts to `timestamp_ntz` in snowflake
See original GitHub issuePlease answer these questions before submitting your issue. Thanks!
-
What version of Python are you using (
python --version
)? Python 3.8.2 -
What operating system and processor architecture are you using (
python -c 'import platform; print(platform.platform())'
)? macOS-10.15.7-x86_64-i386-64bit -
What are the component versions in the environment (
pip list
)? Package Version
asn1crypto 1.4.0 awswrangler 1.10.0 azure-common 1.1.25 azure-core 1.8.2 azure-storage-blob 12.5.0 boto3 1.15.18 botocore 1.18.18 certifi 2020.6.20 cffi 1.14.3 chardet 3.0.4 cryptography 3.2.1 idna 2.10 isodate 0.6.0 jmespath 0.10.0 msrest 0.6.19 numpy 1.19.4 oauthlib 3.1.0 oscrypto 1.2.1 packaging 20.4 pandas 1.1.4 pip 20.2.1 psycopg2-binary 2.8.6 pyarrow 2.0.0 pycparser 2.20 pycryptodomex 3.9.9 PyJWT 1.7.1 PyMySQL 0.10.1 pyOpenSSL 19.1.0 pyparsing 2.4.7 python-dateutil 2.8.1 pytz 2020.4 requests 2.23.0 requests-oauthlib 1.3.0 s3transfer 0.3.3 setuptools 50.2.0 six 1.15.0 snowflake-connector-python 2.3.5 snowflake-sqlalchemy 1.2.4 SQLAlchemy 1.3.20 sqlalchemy-redshift 0.8.1 urllib3 1.25.11 wheel 0.35.1
- What did you do? If possible, provide a recipe for reproducing the error. A complete runnable program is good.
I extract data from MySQL and constructed a pandas data frame. For example,
transaction_date created_on modified_on
0 2017-10-04 08:00:00+00:00 2018-10-05 18:56:39+08:00 2018-10-05 18:56:39+08:00
1 2017-10-04 08:00:00+00:00 2018-10-05 18:56:39+08:00 2018-10-05 18:56:39+08:00
2 2017-10-04 08:00:00+00:00 2018-10-05 18:56:39+08:00 2018-10-05 18:56:39+08:00
3 2017-10-04 08:00:00+00:00 2018-10-05 18:56:39+08:00 2018-10-05 18:56:39+08:00
4 2017-10-04 08:00:00+00:00 2018-10-05 18:56:39+08:00 2018-10-05 18:56:39+08:00
... ... ... ...
33934 2020-11-05 00:00:00+00:00 2020-11-06 18:45:03.681785+08:00 2020-11-06 18:45:03.711735+08:00
33935 2020-11-05 00:00:00+00:00 2020-11-06 18:45:03.685278+08:00 2020-11-06 18:45:18.441977+08:00
33936 2020-11-05 00:00:00+00:00 2020-11-06 18:45:03.688080+08:00 2020-11-06 18:45:21.579158+08:00
33937 2020-11-09 10:00:13.190445+00:00 2020-11-09 10:00:13.195200+08:00 2020-11-09 10:00:13.195237+08:00
33938 2020-11-09 10:28:57+00:00 2020-11-09 10:30:02.515862+08:00 2020-11-09 10:30:02.515884+08:00
the column types are:
transaction_date datetime64[ns, UTC]
created_on datetime64[ns, Asia/Hong_Kong]
modified_on datetime64[ns, Asia/Hong_Kong]
I used to following code to load data into snowflake:
data.to_sql(target_table, conn, if_exists='replace',index=False, method=pd_writer)
-
What did you expect to see?
TIMESTAMP_TZ
columns -
What did you see instead?
TIMESTAMP_NTZ(9)
-
Can you set logging to DEBUG and collect the logs?
import logging
import os
for logger_name in ['snowflake.sqlalchemy', 'snowflake.connector', 'botocore']:
logger = logging.getLogger(logger_name)
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
logger.addHandler(ch)
Issue Analytics
- State:
- Created 3 years ago
- Reactions:2
- Comments:12
Top GitHub Comments
This bug was introduced in v1.1.18 and is also touches
sqlalchemy.DateTime(timezone=True)
columns - in that case, the column is being passed asDATETIME
to the snowflake, andDATETIME
is an alias toTIMESTAMP_NTZ
in snowflake. Previous v1.1.17 logic createdTIMESTAMP_TZ
column. https://github.com/snowflakedb/snowflake-sqlalchemy/compare/v1.1.17...v1.1.18Having the same issue Pandas
datetime64[ns, America/New_York]
becomes SnowflakeTIMESTAMP_NTZ(9)
using snowparksession.write_pandas