Soda Core 3.0.3+ introduced a breaking change for count tests
Hello,
We have been using soda-core-snowflake 3.0.1 internally for quality checks and recently decided to test 3.0.7. We found the following breaking change when running a check against a column that is numeric in Snowflake:
When running a missing_count > 0 check:
Soda 3.0.1:
SELECT
COUNT(*),
COUNT(CASE WHEN NUMERIC_COLUMN IS NULL THEN 1 END),
COUNT(CASE WHEN NUMERIC_COLUMN IS NULL THEN 1 END),
COUNT(CASE WHEN NOT (NUMERIC_COLUMN IS NULL) AND NOT (REGEXP_LIKE(NUMERIC_COLUMN, '^ *[-+]? *[0-9]+ *$')) THEN 1 END)
FROM schema.table
Soda 3.0.3 and up:
SELECT
COUNT(*),
COUNT(CASE WHEN NUMERIC_COLUMN IS NULL THEN 1 END),
COUNT(CASE WHEN NUMERIC_COLUMN IS NULL THEN 1 END),
COUNT(CASE WHEN NOT (NUMERIC_COLUMN IS NULL) AND NOT (REGEXP_LIKE(COLLATE(NUMERIC_COLUMN, ''), '^ *[-+]? *[0-9]+ *$')) THEN 1 END)
FROM schema.table
The COLLATE call fails in Snowflake unless the numeric field is first cast to a string.
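For illustration, a cast would make the generated query valid on a numeric column. This is a hypothetical workaround sketch, not the actual fix shipped in Soda; schema.table and NUMERIC_COLUMN are the placeholders from above:

SELECT
  COUNT(*),
  COUNT(CASE WHEN NUMERIC_COLUMN IS NULL THEN 1 END),
  -- Casting to VARCHAR before COLLATE avoids the Snowflake type error:
  COUNT(CASE WHEN NOT (NUMERIC_COLUMN IS NULL)
             AND NOT (REGEXP_LIKE(COLLATE(CAST(NUMERIC_COLUMN AS VARCHAR), ''), '^ *[-+]? *[0-9]+ *$'))
        THEN 1 END)
FROM schema.table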
The release notes seem to point to @ScottAtDisney as the contributor of this change; not sure if @m1n0 has any context on it as well.
If these tests were never meant for numeric columns, it might be a good idea to update the documentation to say so.
Thank you!
Got it. That’s really helpful.
The Validity Metric doc states,
The NUMERIC_COLUMN is defined as NUMBER(3,0), so the (valid format: integer) metric would be out of scope as I understand the definition.
Since the data type of that column (NUMBER(3,0)) is enforced by the database and the check is looking for an integer, the check would always pass even if it worked.
A valid min/valid max metric could be used to check the range of values instead, since that column can only hold integers anyway.
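For example, a range check could look like this. This is a hypothetical SodaCL sketch; the table, column, and bounds are placeholders:

checks for schema.table:
  # Bounds are placeholders; NUMBER(3,0) can hold -999 through 999.
  - invalid_count(NUMERIC_COLUMN) = 0:
      valid min: -999
      valid max: 999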
Are you using a metric argument along these lines?
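A hypothetical sketch of the kind of argument meant — the inline valid format configuration and the names are assumptions, since a configured validity format is what would make a missing_count check emit the REGEXP_LIKE clause shown above:

checks for schema.table:
  # Hypothetical: an inline validity argument on a missing_count check.
  - missing_count(NUMERIC_COLUMN) > 0:
      valid format: integer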
Could you please share a minimal check file, the data type of the column, and some sample data that reproduce this issue?