Athena CSV import fail (no TEXT type)

I import the following CSV into Athena:

viewing time,Date
559211,2020-06-07
454096,2020-06-06
766314,2020-06-03
638622,2020-06-01
506586,2020-05-31
468516,2019-10-01
466481,2019-09-30
519893,2019-09-29
364684,2019-09-28
403074,2019-09-27

Expected results

CSV to be uploaded to Athena

Actual results

A red error message appears, and Athena complains about the column types (TEXT or Date).

Screenshots

[Screenshot: the error shown after the upload]

[Screenshot: the same upload with Date set for the column “Date”]

How to reproduce the bug

  1. Have a working Athena configuration (test connection is OK) that is allowed to upload CSV files to the schema “default”

  2. Go to “Superset home”

  3. Click on “Sources/Upload CSV”

  4. Enter the table name “test”

  5. Choose your Athena connection

  6. Choose the CSV test file

  7. Enter the schema “default”

Environment

(please complete the following information):

  • superset version: 0.35.2 (also tested with the latest master and the latest release, 0.36, as of 2020-06-15)
  • python version: Python 3.6.9
  • node.js version: v10.20.1
  • npm version: 6.14.4

Without giving any hints about the columns:

Unable to upload CSV file "test_athena.csv" to table "test" in database "awsathena". Error message: (pyathena.error.OperationalError) FAILED: ParseException line 3:14 cannot recognize input near 'TEXT' ')' 'STORED' in column type [SQL: CREATE EXTERNAL TABLE default.test ( 559211BIGINT,2020-06-07 TEXT ) STORED AS PARQUET LOCATION 's3://xxxxxxxx-athena/superset/default/test/' ] (Background on this error at: http://sqlalche.me/e/e3q8)

And if I indicate that the Date column is a date:

Unable to upload CSV file "test_athena.csv" to table "test" in database "awsathena". Error message: (pyathena.error.OperationalError) FAILED: SemanticException [Error 10099]: DATETIME type isn't supported yet. Please use DATE or TIMESTAMP instead [SQL: CREATE EXTERNAL TABLE default.test ( txt BIGINT, Date DATETIME ) STORED AS PARQUET LOCATION 's3://xxxxxx-athena/superset/default/test/' ] (Background on this error at: http://sqlalche.me/e/e3q8)
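To make the two error messages concrete, here is a minimal sketch of the kind of type substitution that would avoid them. It is not Superset or PyAthena code: the mapping and the function name fix_athena_types are assumptions made for illustration. It only swaps the two rejected type names in an already generated DDL string, and does nothing about other problems such as column names that need quoting.

```python
import re

# Assumed mapping from the type names rejected in the errors above to type
# names that Athena's CREATE EXTERNAL TABLE accepts. TIMESTAMP is chosen for
# DATETIME because the second error message suggests DATE or TIMESTAMP.
ATHENA_TYPE_FIXES = {
    "TEXT": "STRING",
    "DATETIME": "TIMESTAMP",
}


def fix_athena_types(ddl: str) -> str:
    """Replace the rejected type names in a DDL string (whole words, case-sensitive)."""
    pattern = re.compile(r"\b(" + "|".join(ATHENA_TYPE_FIXES) + r")\b")
    return pattern.sub(lambda match: ATHENA_TYPE_FIXES[match.group(1)], ddl)


# The statement from the second error message, copied as-is.
failing_ddl = (
    "CREATE EXTERNAL TABLE default.test ( txt BIGINT, Date DATETIME ) "
    "STORED AS PARQUET LOCATION 's3://xxxxxx-athena/superset/default/test/'"
)
fixed_ddl = fix_athena_types(failing_ddl)
print(fixed_ddl)
```

A plain text substitution like this is fragile (it would also rewrite the word TEXT inside a quoted path, for example), so a real fix would more likely belong where the column types are chosen, before the statement is built.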

Checklist

Make sure these boxes are checked before submitting your issue - thank you!

  • [ x ] I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • [ x ] I have reproduced the issue with at least the latest released version of superset. (also tested with the latest master and the latest release, 0.36, as of 2020-06-15)
  • [ x ] I have checked the issue tracker for the same issue and I haven’t found one similar.

Additional context

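I think there are two problems here: Athena does not accept the TEXT type generated for the string column, and it does not accept DATETIME either (it asks for DATE or TIMESTAMP instead).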

I dug a little into the code and found some interesting methods, such as get_sqla_column_type (https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/base.py#L850), as used in mssql (https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/mssql.py#L80).

Another technique, used in hive (a bit overkill?), overrides the whole CSV reading in create_table_from_csv: https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/hive.py#L124

Or maybe just override df_to_sql and change the types on the fly: https://github.com/apache/incubator-superset/blob/master/superset/db_engine_specs/base.py#L445
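As a rough illustration of what "changing the types on the fly" could mean, the sketch below picks an Athena-friendly type for each CSV column before any table is created. It uses only the standard library and is independent of Superset and pandas. The function names infer_athena_type and infer_column_types are hypothetical, the choice of BIGINT, DATE and STRING is an assumption, and date.fromisoformat requires Python 3.7 or later.

```python
import csv
import io
from datetime import date


def infer_athena_type(values):
    """Pick BIGINT, DATE or STRING for a column from its string values."""

    def all_parse(parse):
        # True only if every value is accepted by the given parser.
        try:
            for value in values:
                parse(value)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "BIGINT"
    if values and all_parse(date.fromisoformat):
        return "DATE"
    # Fallback for mixed, free-text or empty columns.
    return "STRING"


def infer_column_types(csv_text):
    """Map each CSV header to an inferred type name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return {
        name: infer_athena_type([row[name] for row in rows])
        for name in (reader.fieldnames or [])
    }


# A few rows from the CSV in this issue.
sample = """viewing time,Date
559211,2020-06-07
454096,2020-06-06
468516,2019-10-01
"""
column_types = infer_column_types(sample)
print(column_types)
```

A mapping like this could then be handed to whichever hook ends up creating the table, so that TEXT and DATETIME never reach Athena. Where exactly that hook should live in Superset is the open question of this issue.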

I would be glad if someone could give me a hint on where to fix the issue.

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

issue-label-bot[bot] commented, Jun 15, 2020 (1 reaction)

Issue-Label Bot is automatically applying the label #bug to this issue, with a confidence of 0.86. Please mark this comment with 👍 or 👎 to give our bot feedback!

Links: app homepage, dashboard and code for this bot.

stale[bot] commented, Aug 22, 2020 (0 reactions)

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. For admin, please label this issue .pinned to prevent stale bot from closing the issue.
