
[SIP-84] Case-insensitive handling of datasets' column names


!!! warning “This document is still WIP; a review by @villebro and @agusfigueroa-htg is required.”


Motivation

The default case (upper/lower) and the case sensitivity of object names (schemas, tables, columns, …) are handled very differently by the various DBMSes that Superset supports.
E.g., Postgres folds unquoted column names to lowercase, while Oracle and Snowflake fold them to UPPERCASE.
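To make the difference concrete, the following sketch mimics how each of these three engines stores an identifier. It is a simplification that assumes default engine settings and only models case folding; `fold_unquoted_identifier` and the dialect keys are hypothetical names, not Superset or SQLAlchemy APIs.

```python
# Hypothetical sketch of default identifier case folding per engine.
# Quoted identifiers keep their exact case (default settings assumed).

FOLD_TO_LOWER = {"postgresql"}
FOLD_TO_UPPER = {"oracle", "snowflake"}


def fold_unquoted_identifier(name: str, dialect: str, quoted: bool = False) -> str:
    """Return the identifier as the engine would store it."""
    if quoted:
        # Quoted identifiers are case-sensitive and stored verbatim.
        return name
    if dialect in FOLD_TO_LOWER:
        return name.lower()
    if dialect in FOLD_TO_UPPER:
        return name.upper()
    raise ValueError(f"unknown dialect: {dialect}")


if __name__ == "__main__":
    for dialect in ("postgresql", "oracle", "snowflake"):
        print(dialect, fold_unquoted_identifier("OrderDate", dialect))
```

The same unquoted `OrderDate` therefore ends up as `orderdate` in Postgres but `ORDERDATE` in Oracle and Snowflake.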

Superset currently does not treat the case of column names consistently. As a result, virtual datasets on an UPPERCASE DB like Snowflake have UPPERCASE column names in Superset, while physical datasets on the same DB have lowercase column names. See #18085 for more details and for the discussion of how this ought to be fixed.

The main issues that arise from this inconsistency are:

1. Dashboard filters refer to case-sensitive representations of column names. If a dashboard contains charts based on both physical and virtual datasets, the filters are only applied to the charts whose column name matches the filter's case.
2. If a physical dataset is later changed into a virtual dataset (or vice versa), the case of its column names changes, which breaks existing charts and filters. Such changes are fairly common, e.g., when a virtual dataset is promoted to a view in the database, or when an existing table needs additional logic (e.g., filtering out soft-deleted records).
3. Migrating the data warehouse, e.g. from Postgres to Snowflake while reproducing the data marts, may change the case of column names and thus break existing charts.

Proposed Change

To find a database-agnostic solution that does not require upstream changes to SQLAlchemy drivers, this issue may best be tackled by making Superset handle column names case-insensitively, i.e., all column names should be treated internally as lowercase.
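A minimal sketch of the proposed rule, assuming column names arrive as plain Python strings: every lookup (e.g. resolving a dashboard filter's target column against a dataset) goes through one normalization function, so `CUSTOMER_ID`, `customer_id` and `Customer_Id` all resolve to the same column. `normalize_column_name` and `find_column` are hypothetical helpers, not existing Superset functions.

```python
from typing import Iterable, Optional


def normalize_column_name(name: str) -> str:
    """Internal, case-insensitive key for a column: its lowercase form."""
    return name.lower()


def find_column(target: str, columns: Iterable[str]) -> Optional[str]:
    """Return the dataset column matching `target` regardless of case, if any."""
    key = normalize_column_name(target)
    for column in columns:
        if normalize_column_name(column) == key:
            return column
    return None


if __name__ == "__main__":
    physical = ["customer_id", "order_date"]  # e.g. physical Snowflake dataset
    virtual = ["CUSTOMER_ID", "ORDER_DATE"]   # e.g. virtual dataset on the same table
    # One filter on "customer_id" now resolves against both datasets.
    print(find_column("customer_id", physical))  # customer_id
    print(find_column("customer_id", virtual))   # CUSTOMER_ID
```

`str.lower()` follows the "lowercase" wording of this proposal; `str.casefold()` would be stricter for some non-ASCII names (e.g. German `ß` folds to `ss`), which may be worth considering.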

There is a small risk that a dataset has two columns whose names map to the same case-insensitive (lowercase) representation. However, @villebro feels that very few people actually need to distinguish columns by case alone.
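Because such collisions should be rare, one option would be to detect them when a dataset's columns are synced and report them rather than merge them silently. A minimal sketch, assuming the column names are available as a list of strings; `find_case_collisions` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def find_case_collisions(columns: List[str]) -> Dict[str, List[str]]:
    """Group column names by lowercase form; keep only groups with >1 member."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in columns:
        groups[name.lower()].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    cols = ["id", "Amount", "amount", "AMOUNT", "created_at"]
    print(find_case_collisions(cols))
    # {'amount': ['Amount', 'amount', 'AMOUNT']}
```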

However, we need to ensure that, e.g., CamelCase column names remain human-readable. Thus, I suggest auto-filling the label of the dataset column (a.k.a. the verbose_name) with its original, case-sensitive name wherever this field is not already filled (existing information must not be overwritten).
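The auto-fill rule could look roughly like the following sketch. It assumes a simplified stand-in for a dataset column with `column_name` and `verbose_name` attributes (the latter mirrors the field named above); the `DatasetColumn` dataclass and `normalize_column` are hypothetical and not Superset's actual model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetColumn:
    """Hypothetical, simplified stand-in for a dataset column."""
    column_name: str
    verbose_name: Optional[str] = None


def normalize_column(column: DatasetColumn) -> DatasetColumn:
    """Lowercase the column name; keep the original casing as the label.

    An existing, non-empty verbose_name is never overwritten.
    """
    original = column.column_name
    if not column.verbose_name:
        column.verbose_name = original
    column.column_name = original.lower()
    return column


if __name__ == "__main__":
    print(normalize_column(DatasetColumn("CustomerId")))
    print(normalize_column(DatasetColumn("OrderDate", "Order date")))
```

The first call yields `column_name='customerid'` with the label `CustomerId`; the second keeps the user-defined label `Order date`.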

New or Changed Public Interfaces

@villebro, @agusfigueroa-htg - I need your input regarding this and the following sections of this SIP…

New dependencies

Migration Plan and Compatibility

Rejected Alternatives

Consistency could instead be introduced on a per-DBMS basis, i.e. per SQLAlchemy driver, so that all datasets in UPPERCASE DBMSes would be represented in UPPERCASE, regardless of whether they are physical or virtual. This would fix the aforementioned issues 1 and 2, but not the 3rd issue. Furthermore, this approach may be more error-prone when support for more DBMSes is introduced or when upstream changes occur.

Top GitHub Comments

yousoph commented, Jun 1, 2022

Noticed there were two SIP-82s, I’ve renumbered this one to SIP-84

rumbin commented, Apr 19, 2022

@agusfigueroa-htg same here 😉.

https://github.com/apache/superset/issues/5602 might help, I guess.
