[BUG] Fix SQL API option/properties metadata configuration

Wait for https://github.com/delta-io/delta/issues/1182 first. This PR needs to use the same feature flag that is created for that issue.

Bug

Describe the problem

There are various inconsistencies in the SQL API when writing metadata using OPTIONS and TBLPROPERTIES. Specifically:

Case 1

When using TBLPROPERTIES, we incorrectly write properties that do not start with the delta. prefix. This happens with any of CREATE TABLE, REPLACE TABLE, and CREATE OR REPLACE TABLE.

Case 2

When using CREATE TABLE (without AS SELECT) on a path together with OPTIONS:

  • we incorrectly write options that do not start with the delta. prefix
  • we incorrectly duplicate each option under an additional option.$key entry
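The intended rule for both cases can be modeled as a simple key filter. This is a minimal sketch in plain Python, not Delta's actual Scala implementation: the function name `expected_table_configuration` is hypothetical, and it assumes that only keys starting with `delta.` should reach the table metadata, which also drops the duplicated `option.$key` entries.

```python
def expected_table_configuration(user_settings):
    """Keep only the entries whose key starts with 'delta.'.

    Hypothetical helper for illustration; it is not part of Delta Lake.
    """
    return {
        key: value
        for key, value in user_settings.items()
        if key.startswith("delta.")
    }


# Case 1: the TBLPROPERTIES used in the reproduction steps of this issue.
case_1 = {
    "logRetentionDuration": "interval 60 days",
    "delta.checkpointInterval": 20,
}

# Case 2: the metadata currently written for OPTIONS, including the
# duplicated option.$key entries.
case_2 = {
    "delta.deletedFileRetentionDuration": "interval 2 weeks",
    "dataSkippingNumIndexedCols": 33,
    "option.delta.deletedFileRetentionDuration": "interval 2 weeks",
    "option.dataSkippingNumIndexedCols": 33,
}

print(expected_table_configuration(case_1))
# prints {'delta.checkpointInterval': 20}
print(expected_table_configuration(case_2))
# prints {'delta.deletedFileRetentionDuration': 'interval 2 weeks'}
```

Keys that begin with `option.` fail the prefix check on their own, so under this assumption no separate de-duplication step is needed.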

Steps to reproduce

Case 1

CREATE TABLE tbl (id INT) USING DELTA
TBLPROPERTIES('logRetentionDuration'='interval 60 days', 'delta.checkpointInterval'=20)

Case 2

CREATE TABLE tbl (id INT) USING DELTA
OPTIONS('dataSkippingNumIndexedCols'=33,'delta.deletedFileRetentionDuration'='interval 2 weeks')
LOCATION '/private/var/folders/mv/gj5n7hvn78n7td5pp1c_jgjh0000gp/T/spark-69b43bf6-b143-459f-a05b-751bdbf4308a'

Observed results

Case 1

The following is written out as table metadata

logRetentionDuration -> interval 60 days
delta.checkpointInterval -> 20  

However, what we actually want written out is only

delta.checkpointInterval -> 20

Case 2

The following is written out as table metadata

delta.deletedFileRetentionDuration -> interval 2 weeks
dataSkippingNumIndexedCols -> 33
option.delta.deletedFileRetentionDuration -> interval 2 weeks
option.dataSkippingNumIndexedCols -> 33

However, what we actually want written out is only

delta.deletedFileRetentionDuration -> interval 2 weeks

Implementation Requirement

After you fix this issue, please update the test and test output table here: https://github.com/delta-io/delta/blob/master/core/src/test/scala/org/apache/spark/sql/delta/DeltaWriteConfigsSuite.scala#L316

Also, please add tests with the feature flag from #1182 both enabled and disabled, as well as tests that read an “older” table containing these invalid delta properties (the latter should go into EvolvabilitySuite).

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (2 by maintainers)

Top GitHub Comments

1 reaction
zsxwing commented, Aug 26, 2022

so it’s only related to those that start with option.

Correct!

1 reaction
scottsand-db commented, Aug 10, 2022

@alfonsorr - you can read the table metadata by doing deltaLog.snapshot.metadata.configuration.

So, we expect the result (i.e. the table metadata) after this fix to look as described in the issue description above.

For example, for case 1, deltaLog.snapshot.metadata.configuration currently returns

logRetentionDuration -> interval 60 days
delta.checkpointInterval -> 20  

but instead we want it to only return

delta.checkpointInterval -> 20  

Let me know if you have any more questions!

BTW, a similar fix for a similar problem has already been done. See https://github.com/delta-io/delta/issues/1182 and https://github.com/delta-io/delta/pull/1254
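The `deltaLog.snapshot.metadata.configuration` check mentioned in the comment above can be approximated without a Spark session by reading the table's transaction log directly. The sketch below is an assumption-laden illustration, not Delta's API: the function name `read_latest_configuration` is hypothetical, and it only scans the JSON commit files under `_delta_log`, ignoring checkpoints, so it is suitable only for small, freshly created test tables.

```python
import json
from pathlib import Path


def read_latest_configuration(table_path):
    """Return the configuration map of the most recent metaData action.

    Hypothetical helper: scans only the JSON commit files in _delta_log
    and ignores checkpoint files.
    """
    log_dir = Path(table_path) / "_delta_log"
    configuration = {}
    # Commit file names are zero-padded version numbers, so sorting by
    # name visits them in version order.
    for commit_file in sorted(log_dir.glob("*.json")):
        if not commit_file.stem.isdigit():
            continue  # skip anything that is not a plain commit file
        for line in commit_file.read_text().splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "metaData" in action:
                # A later metaData action replaces the earlier one.
                configuration = action["metaData"].get("configuration", {})
    return configuration


def invalid_keys(configuration):
    """List the keys that this issue says should not have been written."""
    return sorted(k for k in configuration if not k.startswith("delta."))
```

Calling `invalid_keys(read_latest_configuration(path))` on a table created with the case 1 statement would, under these assumptions, list `logRetentionDuration` before the fix and return an empty list afterwards.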
