Roadmap 2022 H1 (discussion)

This is the proposed Delta Lake 2022 H1 roadmap discussion thread. Below are the initially proposed items for the roadmap to be completed by June 2022. We will also be sending out a survey (we will update this issue with the survey) to get more feedback from the Delta Lake community!

Performance Optimizations

Based on the overwhelming feedback from the Delta Users Slack, Google Groups, Community AMAs (on Delta Lake YouTube), Delta Lake 2021H2 survey, and 2021H2 roadmap, we propose the following Delta Lake performance enhancements in the next two quarters.

| Issue | Description | Target CY2022 |
|-------|-------------|---------------|
| 927 | OPTIMIZE (file compaction): Table optimize is an operation to rearrange the data and/or metadata to speed up queries and/or reduce the metadata size. | Released in 1.2 |
| 923 | File skipping using column stats: This is a performance optimization that aims at speeding up queries that contain filters (WHERE clauses) on non-partitionBy columns. | Released in 1.2 |
| 931 | Automatic data skipping using generated columns: Enhance generated columns to include automatic data skipping. | Released in 1.2 |
| 1134 | OPTIMIZE ZORDER: Data clustering via multi-column locality-preserving space-filling curves with offline sorting. | Q3/Q4 |
| | MERGE Performance Improvements: We will be providing a project improvement plan (PIP) document shortly on the proposed design for discussion. | Q2/Q3 |
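
To make the file-skipping item (issue 923) concrete, here is a minimal sketch of the general idea: if each data file carries min/max statistics for a column, a range filter only needs to read the files whose range overlaps the filter. This is a toy model in plain Python, not Delta Lake's implementation; the `FileStats` class, the `files_to_scan` helper, and the file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Per-file min/max for one column (hypothetical structure, not Delta's log schema)."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lower, upper):
    """Return the paths whose [min, max] range overlaps lower <= col <= upper."""
    return [
        f.path
        for f in files
        if f.max_value >= lower and f.min_value <= upper
    ]


# Made-up statistics for three data files.
files = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 150, 300),
]

# Equivalent of: WHERE col BETWEEN 120 AND 160
selected = files_to_scan(files, 120, 160)
print(selected)
```

In this sketch `part-0.parquet` is never opened because its maximum is below the filter's lower bound. Clustering data so that file ranges overlap less, which is what the OPTIMIZE ZORDER item aims at, would make this kind of pruning more selective.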

Schema Operations

For this year, our focus will be on column mapping.

| Issue | Description | Target CY2022 |
|-------|-------------|---------------|
| 958 | Support for renaming column: Rename column with ALTER TABLE. | Released in 1.2 |
| 957 | Support for arbitrary column names: Support characters in column names not allowed by Parquet. | Released in 1.2 |
| 1064 | Support for dropping columns: Drop column with ALTER TABLE. | Released in 2.0 |
| 348 | Support for dynamic partition overwrite: Currently you can overwrite using the replaceWhere option but in various scenarios, it is more convenient to specify overwrite partition. | Q2 |
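
The rename, drop, and arbitrary-name items above can all be understood through one idea: keep a mapping from the logical column name users see to a stable physical name used in storage, so that schema changes only touch metadata. The following is a toy illustration of that idea in plain Python, not Delta Lake's implementation; the `MappedTable` class and the `col-N` physical naming scheme are assumptions made up for this sketch.

```python
class MappedTable:
    """Toy table: rows are stored under stable physical names; logical names are metadata."""

    def __init__(self, columns):
        # Each logical column gets a physical name that never changes.
        self.mapping = {name: f"col-{i}" for i, name in enumerate(columns)}
        self.rows = []  # each stored row is a dict keyed by physical name

    def insert(self, values):
        """Store a row given as {logical_name: value}."""
        self.rows.append({self.mapping[k]: v for k, v in values.items()})

    def rename_column(self, old, new):
        """Metadata-only rename: stored rows are not rewritten."""
        if new in self.mapping:
            raise ValueError(f"column {new!r} already exists")
        self.mapping[new] = self.mapping.pop(old)

    def drop_column(self, name):
        """Metadata-only drop: stored values remain but are no longer visible."""
        del self.mapping[name]

    def select(self):
        """Read rows back under their current logical names."""
        return [
            {logical: row.get(physical) for logical, physical in self.mapping.items()}
            for row in self.rows
        ]


table = MappedTable(["id", "user name"])  # a space is fine in a logical name
table.insert({"id": 1, "user name": "ada"})
stored_before = [dict(r) for r in table.rows]

table.rename_column("user name", "username")
after_rename = table.select()

table.drop_column("id")
after_drop = table.select()
print(after_rename, after_drop)
```

Because the stored rows are keyed by physical name, neither the rename nor the drop rewrites any data in this sketch; only the `mapping` dictionary changes.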

Integrations

Extending from the recent releases of PrestoDB, Hive 3, and Delta Sink for Apache Flink Streams API, we have additional integrations planned.

| Issue | Description | Target CY2022 |
|-------|-------------|---------------|
| 112 | Delta Source for Apache Pulsar: Build a Pulsar/Delta reader leveraging Delta Standalone. Join us via the Delta Users Slack #connector-pulsar channel. | Q2 |
| 238 | Flink Sink on Table API: Build a Flink/Delta sink (i.e., Flink writes to Delta Lake) using the Apache Flink Table API. Join us via the Delta Users Slack #flink-delta-connector channel; we have bi-weekly meetings on Tuesdays. | Q2/Q3 |
| 110 | Delta Source for Apache Flink: Build a Flink/Delta source (i.e., Flink reads from Delta Lake) leveraging Delta Standalone. Join us via the Delta Users Slack #flink-delta-connector channel; we have bi-weekly meetings on Tuesdays. | Q2/Q3 |
| 82 | Delta Source for Trino: Joint Delta Lake and Trino community collaboration on the following PRs: 10987, 10300. This is a community effort and all are welcome! Join us via the Delta Users Slack #trino channel; we will have bi-weekly meetings on Thursdays. | Released |
| | Delta Source for Big Query: Allows Big Query to natively read Delta Lake tables. | Q2/Q3 |
| 523, 566 | Delta Rust Writer: Extending Delta Rust API to write to Delta Lake. | Q2/Q3 |
| | Hive/Delta writer: Extending Hive to write to Delta Lake. | Q3 |

Operations Enhancements

Two very popular requests are planned for this semester: Table Restore and S3 multi-cluster writes.

| Issue | Description | Target CY2022 |
|-------|-------------|---------------|
| 903, 863 | Table Restore: Rollback to a previous version of a Delta table using Python, Scala, and/or SQL APIs. | Released in 1.2 |
| 41 | S3 multi-cluster writes: Allows multiple clusters/drivers/JVMs to concurrently write to S3 using DynamoDB as the lock store. Please refer to this PIP: [2021-12-22] Delta OSS S3 Multi-Cluster Writes | Released in 1.2 |
| 747 | delta.io.Guide: Enhance the Delta Lake documentation by creating a new guide (PIP will follow soon). | Q2/Q3 |
| | Iceberg to Delta Converter: Ability to convert Iceberg table to Delta table without a rewrite. | Q3 |
| | Table Cloning: Clones a source Delta table to a target destination at a specific version. A clone can be either deep or shallow: deep clones copy over the data from the source and shallow clones do not. | Q3 |
| 1105 | Change Data Feed: The Delta change data feed represents row-level changes between versions of a Delta table. When enabled on a Delta table, the runtime records “change events” for all the data written into the table. | Q2 |
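
As a rough illustration of what "rollback to a previous version" can mean for a table backed by a commit log, the sketch below replays add/remove file actions to build a snapshot and implements restore as a new commit that brings the current state back to an older one. It is a toy model under stated assumptions, not Delta Lake's transaction log or its restore API; `ToyLog` and all of its methods are hypothetical names.

```python
class ToyLog:
    """Toy commit log: each commit records the files it adds and the files it removes."""

    def __init__(self):
        self.commits = []  # list of (added_files, removed_files) frozensets

    def commit(self, add=(), remove=()):
        """Append a commit and return its version number (0-based)."""
        self.commits.append((frozenset(add), frozenset(remove)))
        return len(self.commits) - 1

    def snapshot(self, version=None):
        """Replay commits 0..version and return the set of live files."""
        if version is None:
            version = len(self.commits) - 1
        files = set()
        for added, removed in self.commits[: version + 1]:
            files -= removed
            files |= added
        return files

    def restore(self, version):
        """Write a NEW commit whose effect is to return to an older snapshot."""
        target = self.snapshot(version)
        current = self.snapshot()
        return self.commit(add=target - current, remove=current - target)


log = ToyLog()
log.commit(add={"a.parquet"})                        # version 0
log.commit(add={"b.parquet"})                        # version 1
log.commit(add={"c.parquet"}, remove={"a.parquet"})  # version 2

restored_version = log.restore(0)                    # version 3
print(restored_version, sorted(log.snapshot()))
```

In this sketch the restore appends to the history rather than truncating it, so the state at version 2 is still readable afterwards via `log.snapshot(2)`.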

Updates

  • 2022-05-18: Include Issue 348 for the dynamic partition overwrite feature request
  • 2022-05-03: Updated tables with Delta Lake 1.2 release.
  • 2022-03-08: Based on community feedback, we are also prioritizing Hive/Delta writer, clones, and CDF

If there are other issues that should be considered within this roadmap, let’s have a discussion here or via the Delta Users Slack #deltalake-oss channel.

Top GitHub Comments

sliu4 commented, Mar 8, 2022 (9 reactions)

Would love to see a built-in solution for implementing a retention policy / archiving delta data on append-only tables - this would be a huge help for my team!

novemberdude commented, Feb 26, 2022 (5 reactions)

It would be great if the CDF was open source on the latest date. I am really interested in this feature!
