Roadmap 2022 H1 (discussion)

This is the proposed Delta Lake 2022 H1 roadmap discussion thread. Below are the initially proposed items for the roadmap to be completed by June 2022. We will also be sending out a survey (we will update this issue with the survey) to get more feedback from the Delta Lake community!

Performance Optimizations

Based on the overwhelming feedback from the Delta Users Slack, Google Groups, Community AMAs (on Delta Lake YouTube), Delta Lake 2021H2 survey, and 2021H2 roadmap, we propose the following Delta Lake performance enhancements in the next two quarters.

Issue	Description	Target CY2022
927	OPTIMIZE (file compaction): Table optimize is an operation to rearrange the data and/or metadata to speed up queries and/or reduce the metadata size	Released in 1.2
923	File skipping using columns stats: This is a performance optimization that aims at speeding up queries that contain filters (WHERE clauses) on non-partitionBy columns.	Released in 1.2
931	Automatic data skipping using generated columns: Enhance generated columns to include automatic data skipping	Released in 1.2
1134	OPTIMIZE ZORDER: Data clustering via multi-column locality-preserving space-filling curves with offline sorting.	Q3/Q4
	MERGE Performance Improvements: We will be providing a project improvement plan (PIP) document shortly on the proposed design for discussion.	Q2/Q3
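
To make the file-skipping item (issue 923) concrete, here is a minimal sketch of the idea in plain Python. It does not use Delta Lake or reproduce its implementation: `FileStats`, `files_to_scan`, and the file names are hypothetical, and the only assumption is that each data file carries a minimum and maximum value for the filtered column.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: min/max of a single column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lower, upper):
    """Keep only files whose [min, max] range overlaps [lower, upper]."""
    return [
        f.path
        for f in files
        if f.max_value >= lower and f.min_value <= upper
    ]


files = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 150, 300),
]

# Roughly: WHERE value BETWEEN 120 AND 160
selected = files_to_scan(files, 120, 160)
print(selected)
```

A file is skipped only when its value range cannot overlap the filter, so skipping never changes query results; it only reduces how many files are read. In this sketch `part-0.parquet` is pruned because its maximum (100) is below the lower bound.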

Schema Operations

For this year, our focus will be on columnar mappings.

Issue	Description	Target CY2022
958	Support for renaming column: Rename column with ALTER TABLE	Released in 1.2
957	Support for arbitrary column names: Support characters in column names not allowed by Parquet	Released in 1.2
1064	Support for dropping columns: Drop column with ALTER TABLE	Released in 2.0
348	Support for dynamic partition overwrite: Currently you can overwrite using the `replaceWhere` option but in various scenarios, it is more convenient to specify overwrite partition.	Q2
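
The difference between the two overwrite styles in issue 348 can be sketched in plain Python. This is an illustration of the semantics only, not Delta Lake code: the table is modeled as a list of dicts, and `overwrite_replace_where` and `overwrite_dynamic_partitions` are hypothetical helper names.

```python
def overwrite_replace_where(table, new_rows, predicate):
    """Drop every existing row matching `predicate`, then append new_rows.

    The caller has to spell out which slice of the table is replaced.
    """
    kept = [row for row in table if not predicate(row)]
    return kept + list(new_rows)


def overwrite_dynamic_partitions(table, new_rows, partition_col):
    """Replace only the partitions that actually appear in new_rows."""
    new_rows = list(new_rows)
    touched = {row[partition_col] for row in new_rows}
    kept = [row for row in table if row[partition_col] not in touched]
    return kept + new_rows


table = [
    {"date": "2022-01-01", "v": 1},
    {"date": "2022-01-02", "v": 2},
    {"date": "2022-01-03", "v": 3},
]
new_rows = [{"date": "2022-01-02", "v": 20}]

dynamic = overwrite_dynamic_partitions(table, new_rows, "date")
explicit = overwrite_replace_where(
    table, new_rows, lambda row: row["date"] == "2022-01-02"
)
print(dynamic)
```

Both calls yield the same table here. The convenience of the dynamic form is that the set of partitions to replace is derived from the incoming data instead of being written out as a predicate.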

Integrations

Extending from the recent releases of PrestoDB, Hive 3, and Delta Sink for Apache Flink Streams API, we have additional integrations planned.

Issue	Description	Target CY2022
112	Delta Source for Apache Pulsar: Build a Pulsar/Delta reader leveraging Delta Standalone. Join us via the Delta Users Slack #connector-pulsar channel.	Q2
238	Flink Sink on Table API: Build a Flink/Delta sink (i.e., Flink writes to Delta Lake) using the Apache Flink Table API. Join us via the Delta Users Slack #flink-delta-connector channel; we hold bi-weekly meetings on Tuesdays.	Q2/Q3
110	Delta Source for Apache Flink: Build a Flink/Delta source (i.e., Flink reads from Delta Lake) leveraging Delta Standalone. Join us via the Delta Users Slack #flink-delta-connector channel; we hold bi-weekly meetings on Tuesdays.	Q2/Q3
82	Delta Source for Trino: Joint Delta Lake and Trino community collaboration on the following PRs: 10987, 10300. This is a community effort and all are welcome! Join us via the Delta Users Slack #trino channel; we will have bi-weekly meetings on Thursdays.	Released
	Delta Source for Big Query: Allows Big Query to natively read Delta Lake tables.	Q2/Q3
523, 566	Delta Rust Writer: Extending Delta Rust API to write to Delta Lake.	Q2/Q3
	Hive/Delta writer: Extending Hive to write to Delta Lake	Q3

Operations Enhancements

Two very popular requests are planned for this semester: Table Restore, S3 multi-cluster writes.

Issue	Description	Target CY2022
903, 863	Table Restore: Rollback to a previous version of a Delta table using Python, Scala, and/or SQL APIs.	Released in 1.2
41	S3 multi-cluster writes: Allows multiple clusters/drivers/JVMs to concurrently write to S3 using DynamoDB as the lock store. Please refer to this PIP: [2021-12-22] Delta OSS S3 Multi-Cluster Writes	Released in 1.2
747	delta.io.Guide: Enhance the Delta Lake documentation by creating a new guide (PIP will follow soon)	Q2/Q3
	Iceberg to Delta Converter: Ability to convert Iceberg table to Delta table without a rewrite.	Q3
	Table Cloning: Clones a source Delta table to a target destination at a specific version. A clone can be either deep or shallow: deep clones copy over the data from the source and shallow clones do not.	Q3
1105	Change Data Feed: The Delta change data feed represents row-level changes between versions of a Delta table. When enabled on a Delta table, the runtime records “change events” for all the data written into the table.	Q2
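
As a rough illustration of what "row-level changes between versions" means for the Change Data Feed item (issue 1105), the sketch below diffs two snapshots of a keyed table in plain Python. This is an assumption-laden simplification: per the description above, Delta records change events at write time rather than diffing snapshots, and `change_events` is a hypothetical helper. The `_change_type` and `_commit_version` field names follow the Delta change data feed's column naming.

```python
def change_events(before, after, version):
    """Return row-level change events between two snapshots keyed by id."""
    events = []

    def emit(key, value, change_type):
        events.append({
            "id": key,
            "value": value,
            "_change_type": change_type,
            "_commit_version": version,
        })

    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            emit(key, after[key], "insert")
        elif key not in after:
            emit(key, before[key], "delete")
        elif before[key] != after[key]:
            # An update is reported as the old row followed by the new row.
            emit(key, before[key], "update_preimage")
            emit(key, after[key], "update_postimage")
    return events


v1 = {1: "a", 2: "b", 3: "c"}
v2 = {1: "a", 2: "B", 4: "d"}

events = change_events(v1, v2, version=2)
for event in events:
    print(event)
```

Unchanged rows (id 1 here) produce no event, which is what lets downstream consumers process only the rows that changed between versions.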

Updates

2022-05-18: Include Issue 348 for the dynamic partition overwrite feature request
2022-05-03: Updated tables with Delta Lake 1.2 release.
2022-03-08: Based on community feedback, we are also prioritizing Hive/Delta writer, clones, and CDF

If there are other issues that should be considered within this roadmap, let’s have a discussion here or via the Delta Users Slack #deltalake-oss channel.

Issue Analytics

Created 2 years ago
Reactions: 60
Comments: 18 (9 by maintainers)

Top GitHub Comments

9 reactions

sliu4 commented, Mar 8, 2022

Would love to see a built-in solution for implementing a retention policy / archiving delta data on append-only tables - this would be a huge help for my team!

5 reactions

novemberdude commented, Feb 26, 2022

It would be great if the CDF was open source on the latest date. I'm really interested in this feature!