Soda core 3.0.3+ introduced breaking change for count tests

Hello,

We have been using soda-core-snowflake 3.0.1 internally for quality checks and recently decided to test using 3.0.7. We found the following breaking change when trying to make a test using a column that is numeric in Snowflake:

When running a missing_count > 0 check:

Soda 3.0.1:

SELECT 
  COUNT(*),
  COUNT(CASE WHEN NUMERIC_COLUMN IS NULL THEN 1 END),
  COUNT(CASE WHEN NUMERIC_COLUMN IS NULL THEN 1 END),
  COUNT(CASE WHEN NOT (NUMERIC_COLUMN IS NULL) AND NOT (REGEXP_LIKE(NUMERIC_COLUMN, '^ *[-+]? *[0-9]+ *$')) THEN 1 END) 
FROM schema.table

Soda 3.0.3 and up:

SELECT 
  COUNT(*),
  COUNT(CASE WHEN NUMERIC_COLUMN IS NULL THEN 1 END),
  COUNT(CASE WHEN NUMERIC_COLUMN IS NULL THEN 1 END),
  COUNT(CASE WHEN NOT (NUMERIC_COLUMN IS NULL) AND NOT (REGEXP_LIKE(COLLATE(NUMERIC_COLUMN, ''), '^ *[-+]? *[0-9]+ *$')) THEN 1 END) 
FROM schema.table

The COLLATE call fails in Snowflake unless the numeric field is first cast to a string.

The release logs seem to point to @ScottAtDisney as the contributor of this change. I'm not sure whether @m1n0 has any context on it as well.

If these tests were never meant for numeric columns, it might be a good idea to update the documentation to say so.

Thank you!

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 8 (1 by maintainers)

Top GitHub Comments

1 reaction
ScottAtDisney commented, Sep 14, 2022

Got it. That’s really helpful.

The Validity Metric doc states:

"Valid formats apply only to columns using data type TEXT."

The NUMERIC_COLUMN is defined as NUMBER(3,0) so the (valid format: integer) metric would be out of scope as I understand the definition.

Since the data type of that column (NUMBER(3,0)) is enforced by the DB and the check is looking for an INTEGER, even if the check worked, it would always return passed.
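
The reasoning above can be illustrated with a small sketch outside the database. It reuses the integer regex from the generated SQL and applies it to every value a NUMBER(3,0) column can hold. The helper name count_invalid is hypothetical, and Python's str() stands in for Snowflake's number-to-text conversion, which is an assumption rather than a guarantee about how Snowflake renders the value.

```python
import re

# Integer pattern copied from the generated SQL in the issue.
INTEGER_PATTERN = re.compile(r"^ *[-+]? *[0-9]+ *$")


def count_invalid(values):
    """Count non-null values whose text form does not match the integer pattern.

    Hypothetical helper that mirrors the shape of the invalid-count expression:
    NOT (col IS NULL) AND NOT (REGEXP_LIKE(col, pattern)).
    """
    return sum(
        1
        for value in values
        if value is not None and not INTEGER_PATTERN.match(str(value))
    )


# Every value a NUMBER(3,0) column can hold (-999..999), plus a NULL.
number_3_0_values = list(range(-999, 1000)) + [None]
print(count_invalid(number_3_0_values))  # no integer fails the pattern

# A text column is where the check can actually find invalid values.
text_values = ["12", " +7 ", "1.5", "abc", None]
print(count_invalid(text_values))  # "1.5" and "abc" fail the pattern
```

Under these assumptions the numeric column never yields an invalid value, which is consistent with the point that the format check only carries information for TEXT columns.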

A valid min/max metric could be used to check the range of integer values, but that column can only hold integers.

1 reaction
ScottAtDisney commented, Sep 14, 2022

Are you using a metric argument like:

  • missing values
  • missing regex
  • missing format

Could you please share a minimal check file, the data type of the column and some sample data that creates this issue?
