Athena CSV import fails (no TEXT type)
I import the following CSV into Athena:
viewing time,Date
559211,2020-06-07
454096,2020-06-06
766314,2020-06-03
638622,2020-06-01
506586,2020-05-31
468516,2019-10-01
466481,2019-09-30
519893,2019-09-29
364684,2019-09-28
403074,2019-09-27
Expected results
CSV to be uploaded to Athena
Actual results
A red error message appears, and Athena complains about the column types (TEXT or Date)
Screenshots
with Date in the column “Date”
How to reproduce the bug
- Have a working Athena configuration (test connection is OK) and be allowed to upload CSV files to the schema “default”
- Go to “superset home”
- Click on ‘sources/Upload CSV’
- Enter table name “test”
- Choose your Athena connection
- Choose the CSV test file
- Enter schema “default”
Environment
(please complete the following information):
- superset version: 0.35.2 (also tested with last master AND last release 0.36 as of 2020-06-15)
- python version: Python 3.6.9
- node.js version: v10.20.1
- npm version: 6.14.4
Without specifying any column types:
Unable to upload CSV file "test_athena.csv" to table "test" in database "awsathena". Error message: (pyathena.error.OperationalError) FAILED: ParseException line 3:14 cannot recognize input near 'TEXT' ')' 'STORED' in column type [SQL: CREATE EXTERNAL TABLE
default.test (
559211BIGINT,
2020-06-07 TEXT ) STORED AS PARQUET LOCATION 's3://xxxxxxxx-athena/superset/default/test/' ] (Background on this error at: http://sqlalche.me/e/e3q8)
And if I try to indicate that the Date column is a date:
Unable to upload CSV file "test_athena.csv" to table "test" in database "awsathena". Error message: (pyathena.error.OperationalError) FAILED: SemanticException [Error 10099]: DATETIME type isn't supported yet. Please use DATE or TIMESTAMP instead [SQL: CREATE EXTERNAL TABLE
default.test ( txt BIGINT,
Date DATETIME ) STORED AS PARQUET LOCATION 's3://xxxxxx-athena/superset/default/test/' ] (Background on this error at: http://sqlalche.me/e/e3q8)
Checklist
Make sure these boxes are checked before submitting your issue - thank you!
- [ x ] I have checked the superset logs for python stacktraces and included it here as text if there are any.
- [ x ] I have reproduced the issue with at least the latest released version of superset. (also tested with last master AND last release 0.36 as of 2020-06-15)
- [ x ] I have checked the issue tracker for the same issue and I haven’t found one similar.
Additional context
I think there are two problems here:
- Athena has no TEXT type: https://docs.aws.amazon.com/athena/latest/ug/data-types.html
- Date conversion does not seem to work properly for Athena: the date column is created as DATETIME, which Athena rejects
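To make the two problems concrete, here is a minimal sketch of remapping the rejected type names before the statement is sent. The mapping and both function names are my own assumptions for illustration, not Superset or PyAthena code. TIMESTAMP is chosen only because the error message suggests it; DATE may fit a date-only column better.

```python
import re

# Assumed mapping from the generic type names seen in the failing statements
# to names Athena's DDL accepts. Illustrative only, not exhaustive.
GENERIC_TO_ATHENA = {
    "TEXT": "STRING",
    "DATETIME": "TIMESTAMP",
}


def to_athena_type(type_name: str) -> str:
    """Return an Athena-compatible type name; unknown names pass through upper-cased."""
    upper = type_name.upper()
    return GENERIC_TO_ATHENA.get(upper, upper)


def rewrite_column_types(ddl: str) -> str:
    """Replace the unsupported type names in a CREATE TABLE statement.

    This is a naive whole-word, case-sensitive text substitution: a column
    literally named TEXT or DATETIME would be rewritten as well.
    """
    pattern = re.compile(r"\b(" + "|".join(GENERIC_TO_ATHENA) + r")\b")
    return pattern.sub(lambda m: GENERIC_TO_ATHENA[m.group(1)], ddl)


# Shortened version of the second failing statement from the error message.
failing = "CREATE EXTERNAL TABLE default.test ( txt BIGINT, Date DATETIME ) STORED AS PARQUET"
print(rewrite_column_types(failing))
```

Rewriting the SQL text is only a way to show which names are the problem; a real fix would more likely choose the right types before the statement is generated.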
I’ve dug a little into the code and found some interesting methods, such as get_sqla_column_type
https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/base.py#L850
as done in mssql
https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/mssql.py#L80
or the technique used in hive (a bit overkill?), which overrides the whole CSV-reading method create_table_from_csv
https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/hive.py#L124
Or maybe just override df_to_sql and change the types on the fly: https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/base.py#L445
I would be glad if someone could give me a hint on where to fix the issue.
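For what "changing the type on the fly" could mean, here is a standalone sketch that infers Athena-friendly types from the CSV itself and renders the CREATE EXTERNAL TABLE statement. It does not use Superset's code: the helper names, the column-name normalization, and the S3 location are all hypothetical, and the inference only covers BIGINT, DATE and STRING.

```python
import csv
import io
import re
from datetime import datetime


def infer_athena_type(values):
    """Pick BIGINT, DATE, or STRING for a column from its sample values."""
    def all_parse(parse):
        try:
            for value in values:
                parse(value)
            return True
        except ValueError:
            return False

    if all_parse(int):
        return "BIGINT"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "DATE"
    return "STRING"


def normalize_name(name):
    """Lower-case the name and replace non-word runs with "_" (an assumption)."""
    return re.sub(r"\W+", "_", name.strip()).lower()


def build_create_table(table, csv_text, location):
    """Render a CREATE EXTERNAL TABLE statement with inferred column types."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    columns = []
    for index, name in enumerate(header):
        column_type = infer_athena_type([row[index] for row in rows])
        # Backticks guard against names that may collide with keywords.
        columns.append(f"`{normalize_name(name)}` {column_type}")
    column_sql = ",\n  ".join(columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {column_sql}\n) "
        f"STORED AS PARQUET LOCATION '{location}'"
    )


sample = "viewing time,Date\n559211,2020-06-07\n454096,2020-06-06\n766314,2020-06-03\n"
# The bucket name below is a placeholder, not the one from the report.
ddl = build_create_table("default.test", sample, "s3://example-bucket/superset/default/test/")
print(ddl)
```

With the sample rows from this report, the first column comes out as BIGINT and the second as DATE, so neither TEXT nor DATETIME appears in the statement.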
Top GitHub Comments
Issue-Label Bot is automatically applying the label #bug to this issue, with a confidence of 0.86. Please mark this comment with 👍 or 👎 to give our bot feedback!
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.