[BUG] metric `numDeletedRows` missing in Delta log when DELETING complete partition

Describe the problem

When a DELETE operation is performed on a Delta table, operational metrics are written to the Delta log / table history (attribute operationMetrics), such as the number of rows deleted (numDeletedRows) and the number of files added or removed (numAddedFiles, numRemovedFiles). See: https://docs.delta.io/latest/delta-utility.html#operation-metrics-keys

However, I noticed that when a complete partition of a partitioned table is deleted via its partition key, some of those metrics are missing, including the central metric numDeletedRows, which reports how many rows were deleted.

Steps to reproduce

CREATE OR REPLACE TABLE TestDeletePartitioned (
  id bigint,
  part string
)
USING DELTA
PARTITIONED BY (part)
;

INSERT INTO TestDeletePartitioned (id, part) values (1,'a'),(2,'a'),(3,'b'),(4,'b'),(5,'c'),(6,'c'),(7,'d'),(8,'d');

DELETE FROM TestDeletePartitioned WHERE id = 1;                 /* only one row is deleted from partition part=a, one row remains in part=a */
DELETE FROM TestDeletePartitioned WHERE id IN (3, 4);           /* two rows are deleted from partition part=b, which effectively corresponds to deleting the whole partition part=b */
DELETE FROM TestDeletePartitioned WHERE part = 'c';             /* complete partition part=c is deleted */
DELETE FROM TestDeletePartitioned WHERE id = 7 AND part = 'd';  /* only one row is deleted from partition part=d, one row remains in part=d */

DESC HISTORY TestDeletePartitioned;

Observed results

When only the partition key is used in the DELETE condition, the resulting entry in the Delta log does not contain all of the operational metrics.
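The observation above can also be checked directly against the table's transaction log instead of the DESC HISTORY output. The following is only a minimal sketch, assuming the table is stored on a local filesystem and that each commit file under `_delta_log` is newline-delimited JSON whose `commitInfo` action carries `operation`, `operationParameters` and `operationMetrics`. The function name `delete_commits_missing_metric` is hypothetical and not part of Delta Lake.

```python
import json
from pathlib import Path


def delete_commits_missing_metric(table_path, metric="numDeletedRows"):
    """Return (version, predicate) pairs for DELETE commits that lack `metric`.

    Hypothetical helper: it reads the JSON commit files directly and does not
    use any Delta Lake API.
    """
    missing = []
    log_dir = Path(table_path) / "_delta_log"
    # Commit files are named after their zero-padded version number;
    # anything whose stem is not purely numeric is skipped.
    for commit_file in sorted(log_dir.glob("*.json")):
        if not commit_file.stem.isdigit():
            continue
        version = int(commit_file.stem)
        with commit_file.open(encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                # Each non-empty line is one action; only commitInfo is relevant here.
                info = json.loads(line).get("commitInfo")
                if info is None or info.get("operation") != "DELETE":
                    continue
                metrics = info.get("operationMetrics") or {}
                if metric not in metrics:
                    params = info.get("operationParameters") or {}
                    missing.append((version, params.get("predicate")))
    return missing
```

Calling `delete_commits_missing_metric("/path/to/TestDeletePartitioned")` (the path is a placeholder) should list the versions and predicates of any DELETE commits that were written without numDeletedRows.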

Expected results

I’d like to see the numDeletedRows metric in the log when complete partitions are deleted as well.

Environment information

  • Delta Lake version: 1.2.1.4
  • Spark version: 3.2.2.5.0
  • Scala version: 2.12.15

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

zsxwing commented, Oct 10, 2022 (2 reactions)

@keen85 we are working on migrating our doc to https://github.com/delta-io/website. Will post the update here when it’s done.

rahulsmahadev commented, Oct 27, 2022 (1 reaction)

Top Results From Across the Web

FileReadException when reading a Delta table - Microsoft Learn
A FileReadException error occurs when you attempt to read from a Delta table. The underlying data has been deleted, or the storage blob...

Table utility commands - Delta Lake Documentation
Operation metrics keys ; numRemovedFiles. Number of files removed. ; numDeletedRows. Number of rows removed. Not provided when partitions of the table are...

Databricks Delta Lake — Database on top of a Data Lake
Part 1 of 2 — Understanding the Basics of Databricks Delta Lake — ACID Transactions, Checkpoints, Transaction Log & Time Travel.

Tech Talk | Diving into Delta Lake Part 3 - YouTube
In the earlier Delta Lake Internals tech talk series sessions, we described how the Delta Lake transaction log works.

Understanding the Delta Lake Transaction Log - Databricks Blog
Users can delete the files that are no longer needed by using VACUUM. Quickly Recomputing State With Checkpoint Files. Once we've made a...
