Snowflake: Inconsistent column name case
Superset handles the case of non-case-sensitive column names inconsistently for Snowflake connections:
- Native tables/views, i.e., physical datasets, have lowercase column names, because that is how SQLAlchemy handles them internally.
- SQLLab’s virtual datasets have UPPERCASE column names.
This is troublesome for any dashboard whose filters act on charts backed by a mix of related physical and virtual datasets.
Superset’s filter scoping is case sensitive. Thus, a filter on a given column name is applied either to the related physical datasets or to the related virtual datasets, but never to both.
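To illustrate the effect, here is a minimal toy model of case-sensitive matching. It is not Superset's actual filter-scoping code; the function name `datasets_matching_filter` and the dataset names are hypothetical and exist only for this sketch.

```python
def datasets_matching_filter(filter_column, datasets):
    """Return the names of datasets that expose a column with exactly this name.

    The comparison is deliberately case sensitive, mirroring the behavior
    described in this issue.
    """
    return [name for name, columns in datasets.items() if filter_column in columns]


# Hypothetical datasets built on the same Snowflake table.
datasets = {
    "orders_physical": ["order_id", "customer_id"],  # lowercase, as reported via SQLAlchemy
    "orders_virtual": ["ORDER_ID", "CUSTOMER_ID"],   # UPPERCASE, as returned by the query
}

lower_hits = datasets_matching_filter("customer_id", datasets)
upper_hits = datasets_matching_filter("CUSTOMER_ID", datasets)
print(lower_hits, upper_hits)
```

Whichever spelling the filter uses, only one of the two datasets matches, so charts on the other dataset are left unfiltered.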
How to reproduce the bug
- Register any Snowflake table with non-case-sensitive column names (i.e., UPPERCASE) as a physical dataset in Superset.
- Notice how its columns are recognized and stored in lowercase.
- Edit the dataset again and convert it to a virtual dataset. Use select * from … as statement text. Save the changes.
- Edit the dataset again and synchronize the column names.
- Notice how the column names now changed to UPPERCASE.
Expected results
The case should not depend on whether the dataset is physical or virtual.
All non-case-sensitive column names (i.e., UPPERCASE in Snowflake) should be converted to lowercase consistently, also for virtual datasets created in SQLLab.
Lowercase is the internal representation of SQLAlchemy.
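A minimal sketch of what such a consistent conversion could look like for the column names of a virtual dataset. This is an assumption about one possible approach, not Superset's or snowflake-sqlalchemy's implementation: the helper `normalize_column_name` is hypothetical, and it treats a name as non-case-sensitive when it has the shape of an unquoted Snowflake identifier stored in uppercase. Reserved words are ignored here.

```python
import re

# Shape of an unquoted Snowflake identifier as stored by Snowflake:
# uppercase letters, digits, underscores and dollar signs, not starting with a digit.
_PLAIN_UPPERCASE = re.compile(r"^[A-Z_][A-Z0-9_$]*$")


def normalize_column_name(name):
    """Lowercase names that look non-case-sensitive; keep all others unchanged."""
    if _PLAIN_UPPERCASE.match(name):
        return name.lower()
    # Mixed case, lowercase, or special characters imply the name was
    # created with double quotes, so its exact spelling matters.
    return name


virtual_columns = ["ORDER_ID", "CUSTOMER_ID", "MixedCase", "quoted lower"]
normalized = [normalize_column_name(c) for c in virtual_columns]
print(normalized)
```

With a rule like this, the UPPERCASE names of a virtual dataset would end up with the same lowercase spelling as the columns of the corresponding physical dataset, while quoted, case-sensitive names stay untouched.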
Actual results
Virtual dataset’s column names are treated as UPPERCASE.
Screenshots
The easiest way to experience this effect is to simply explore a Snowflake table in SQLLab. As you can see in the following screenshot, the column names that are extracted from the schema and displayed in the schema browser on the left are represented as lowercase, while the query results of the preview table have UPPERCASE column names:
Rejected workarounds
An option would of course be to use double-quoted lowercase column aliases in the SELECT statement of the virtual dataset.
This would make these column names case-sensitive, so they would be treated as lowercase.
However, with a broad user base working with Snowflake/Superset, this is very impractical.
Furthermore, this approach would not allow any select * statements here.
I think it simply is a bug and it needs to be fixed so the way the column names are treated is always consistent.
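For reference, the rejected workaround amounts to writing a statement like the one generated below. This is only a sketch: the helper `select_with_lowercase_aliases` and the table and column names are hypothetical. It also makes visible why select * is ruled out, since every column has to be listed explicitly to receive its alias.

```python
def select_with_lowercase_aliases(table, columns):
    """Build a SELECT that gives each column a double-quoted lowercase alias."""
    select_list = ", ".join(f'{col} AS "{col.lower()}"' for col in columns)
    return f"SELECT {select_list} FROM {table}"


sql = select_with_lowercase_aliases("orders", ["ORDER_ID", "CUSTOMER_ID"])
print(sql)
```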
Environment
Tested versions:
- 1.0.1
- 1.4 rc4
Checklist
Make sure to follow these steps before submitting your issue - thank you!
- I have checked the superset logs for python stacktraces and included it here as text if there are any.
- I have reproduced the issue with at least the latest released version of superset.
- I have checked the issue tracker for the same issue and I haven’t found one similar.
Additional context
This SQLAlchemy issue is closely related: https://github.com/snowflakedb/snowflake-sqlalchemy/issues/157#issuecomment-807922786
However, my feeling is that we should probably try to stick with SQLAlchemy’s way of treating the column names as lowercase instead of uppercase. In that case, we need to do so consistently and correct this for the virtual datasets.
tai pointed me in Slack to this piece of code: https://github.com/snowflakedb/snowflake-sqlalchemy/blob/9118cf8f18a0039f9cb5d3892ff2b1e5c82a05e0/snowdialect.py#L217
Issue Analytics
- Created 2 years ago
- Comments: 30 (24 by maintainers)
Top GitHub Comments
Although using the cursor to get the column names is a likely fix, I do feel strongly that the database driver should not be applying any case transformations to the payload returned from the db.
Since @villebro did propose that as a solution to the issue, I agree, it would be good to hear his thoughts on the matter.
Yes, unfortunately this won’t make it into 2.0, as the list of breaking changes has already been decided. However, if we do reach consensus on how to fix this, we can put it behind a feature flag during 2.0, and then potentially introduce it as the new default behavior in 3.0.