Snowflake: Inconsistent column name case

Superset handles the case of non-case-sensitive column names inconsistently for Snowflake connections:

  • Native tables/views, i.e., physical datasets, have lowercase column names, because SQLAlchemy normalizes them internally.
  • SQLLab’s virtual datasets have UPPERCASE column names.

This is a problem for every dashboard whose filters act on charts built from a mix of related physical and virtual datasets.
Superset’s filter scoping is case sensitive. A filter on a given column name is therefore applied either to the physical datasets or to the virtual datasets, but not to both.
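
To illustrate the effect, here is a minimal sketch of case-sensitive column matching. It is not Superset’s actual filter-scoping code; the function datasets_in_scope and the dataset and column names are made up for this example.

```python
def datasets_in_scope(filter_column, datasets):
    """Return the datasets that expose a column named exactly like filter_column."""
    return [name for name, columns in datasets.items() if filter_column in columns]


# Hypothetical datasets built on the same Snowflake table.
datasets = {
    "orders_physical": ["order_id", "customer_id"],  # lowercase, as stored for physical datasets
    "orders_virtual": ["ORDER_ID", "CUSTOMER_ID"],   # UPPERCASE, as stored for virtual datasets
}

# The comparison is case sensitive, so each spelling reaches only one dataset.
lower_scope = datasets_in_scope("customer_id", datasets)
upper_scope = datasets_in_scope("CUSTOMER_ID", datasets)
print(lower_scope, upper_scope)
```

With an exact string comparison like this, no single filter column can cover both datasets.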

How to reproduce the bug

  1. Register any Snowflake table with non-case-sensitive column names (i.e., UPPERCASE) as a physical dataset in Superset.
  2. Notice how its columns are recognized and stored in lowercase.
  3. Edit the dataset again and convert it to a virtual dataset. Use select * from … as statement text. Save the changes.
  4. Edit the dataset again and synchronize the column names.
  5. Notice how the column names have now changed to UPPERCASE.

Expected results

The case should not depend on whether the dataset is physical or virtual.
All non-case-sensitive column names (i.e., UPPERCASE in Snowflake) should be converted to lowercase consistently, including for virtual datasets created in SQLLab.

Lowercase is the internal representation used by SQLAlchemy.
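
A rough sketch of the expected behavior, following the convention described above: names that come back entirely in UPPERCASE are treated as non-case-sensitive and lowercased, while anything else is assumed to be a quoted, case-sensitive name and left alone. The helper normalize_column_name is hypothetical and not part of Superset or snowflake-sqlalchemy, and it ignores identifiers that need quoting for other reasons (spaces, reserved words).

```python
def normalize_column_name(name):
    """Lowercase names that are entirely uppercase; keep all other names unchanged."""
    # name.upper() == name: no lowercase letters, so the name is not case sensitive.
    # name.lower() != name: it contains at least one letter (skips names like "123").
    if name.upper() == name and name.lower() != name:
        return name.lower()
    return name


# Column names as a virtual dataset might report them (made-up values).
virtual_columns = ["ORDER_ID", "CUSTOMER_ID", "MixedCase", "already_lower"]
normalized = [normalize_column_name(col) for col in virtual_columns]
print(normalized)
```

Applying the same rule to physical and virtual datasets would make the stored column names independent of the dataset type.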

Actual results

The column names of virtual datasets are treated as UPPERCASE.

Screenshots

The easiest way to experience this effect is to explore a Snowflake table in SQLLab. As you can see in the following screenshot, the column names that are extracted from the schema and displayed in the schema browser on the left are lowercase, while the query results of the preview table have UPPERCASE column names:

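[Screenshot: SQLLab schema browser with lowercase column names next to preview query results with UPPERCASE column names]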

Rejected workarounds

One option would be to use double-quoted lowercase column aliases in the SELECT statement of the virtual dataset. This would make these column names case-sensitive, so they would be treated as lowercase. However, with a broad user base working with Snowflake/Superset, this is very impractical.
Furthermore, this approach would rule out any select * statements, because every column has to be aliased explicitly.
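
For reference, this is roughly what the rejected workaround amounts to. The sketch below builds such a statement from an explicit column list; select_with_lowercase_aliases is a hypothetical helper, the table and column names are invented, and no identifier escaping is done.

```python
def select_with_lowercase_aliases(table, columns):
    """Build a SELECT that aliases each column to a double-quoted lowercase name."""
    # Every column must be listed explicitly, which is why "select *" cannot be used.
    select_list = ", ".join(f'{col} AS "{col.lower()}"' for col in columns)
    return f"SELECT {select_list} FROM {table}"


statement = select_with_lowercase_aliases("orders", ["ORDER_ID", "AMOUNT"])
print(statement)
```

Every virtual dataset would need a statement like this, and it would have to be updated whenever a column is added to the underlying table.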

I think it simply is a bug and it needs to be fixed so the way the column names are treated is always consistent.

Environment

Tested versions:

  • 1.0.1
  • 1.4 rc4

Checklist

Make sure to follow these steps before submitting your issue - thank you!

  • I have checked the superset logs for python stacktraces and included it here as text if there are any.
  • I have reproduced the issue with at least the latest released version of superset.
  • I have checked the issue tracker for the same issue and I haven’t found one similar.

Additional context

This snowflake-sqlalchemy issue is closely related: https://github.com/snowflakedb/snowflake-sqlalchemy/issues/157#issuecomment-807922786

My feeling is that we should stick with SQLAlchemy’s way of treating the column names as lowercase instead of uppercase. In that case, though, we need to do so consistently and correct this for the virtual datasets.

tai pointed me in Slack to this piece of code: https://github.com/snowflakedb/snowflake-sqlalchemy/blob/9118cf8f18a0039f9cb5d3892ff2b1e5c82a05e0/snowdialect.py#L217

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 30 (24 by maintainers)

Top GitHub Comments

2 reactions
nytai commented, Jan 19, 2022

Although using the cursor to get the column names is a likely fix, I do feel strongly that the database driver should not be applying any case transformations to the payload returned from the db.

Since @villebro did propose that as a solution to the issue, I agree, it would be good to hear his thoughts on the matter.
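
As a minimal sketch of what reading column names from the cursor looks like, the snippet below uses the standard DB-API cursor.description attribute. It runs against an in-memory sqlite3 database purely as a stand-in, so it says nothing about what the Snowflake driver itself returns; the column aliases are made up.

```python
import sqlite3

# sqlite3 is only a stand-in here; the Snowflake connector is a different DB-API driver.
conn = sqlite3.connect(":memory:")
cursor = conn.execute('SELECT 1 AS ORDER_ID, 2 AS "amount"')

# The first item of each cursor.description entry is the column name,
# taken here exactly as reported, with no case transformation applied.
names = [column[0] for column in cursor.description]
print(names)
conn.close()
```

Whether any case normalization should then happen in the driver or in the application is exactly the open question in this thread.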

0 reactions
villebro commented, Apr 20, 2022

Yes, unfortunately this won’t make it into 2.0, as the list of breaking changes has already been decided. However, if we do reach consensus on how to fix this, we can put it behind a feature flag during 2.0, and then potentially introduce it as the new default behavior in 3.0.
