REPLACE COLUMNS unsupported?
The Delta Lake 1.0.0 docs contain an example for replacing columns using ALTER TABLE table_name REPLACE COLUMNS... However, when I try to run this, I’m getting an exception from DeltaCatalog.
ALTER TABLE table_name REPLACE COLUMNS (col_1 string, col_2 double)
throws:
java.lang.UnsupportedOperationException: Unrecognized column change class org.apache.spark.sql.connector.catalog.TableChange$DeleteColumn. You may be running an out of date Delta Lake version.
at org.apache.spark.sql.delta.catalog.DeltaCatalog.$anonfun$alterTable$9(DeltaCatalog.scala:504)
at org.apache.spark.sql.delta.catalog.DeltaCatalog.$anonfun$alterTable$9$adapted(DeltaCatalog.scala:476)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at org.apache.spark.sql.delta.catalog.DeltaCatalog.$anonfun$alterTable$2(DeltaCatalog.scala:476)
at scala.collection.immutable.Map$Map2.foreach(Map.scala:159)
at org.apache.spark.sql.delta.catalog.DeltaCatalog.alterTable(DeltaCatalog.scala:431)
at org.apache.spark.sql.delta.catalog.DeltaCatalog.alterTable(DeltaCatalog.scala:57)
at org.apache.spark.sql.execution.datasources.v2.AlterTableExec.run(AlterTableExec.scala:37)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:46)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
...
I took a look at DeltaCatalog and confirmed that alterTable() doesn’t handle DeleteColumn.
Is this actually a supported scenario? I took a look through the unit tests and there seem to be no tests covering this.
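To make the failure mode concrete, here is a small self-contained Python model of what the trace suggests is happening. It assumes (this is my reading, not something confirmed in the thread) that Spark expands REPLACE COLUMNS into one DeleteColumn per existing column followed by one AddColumn per new column, and that the catalog walks the list and throws on any change type it has no branch for. The class and function names below are simplified stand-ins, not the real Spark or Delta APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class AddColumn:
    """Stand-in for a TableChange that adds a column."""
    name: str
    data_type: str


@dataclass(frozen=True)
class DeleteColumn:
    """Stand-in for a TableChange that deletes a column."""
    name: str


TableChange = Union[AddColumn, DeleteColumn]


def replace_columns_changes(
    existing: List[str], new_columns: List[Tuple[str, str]]
) -> List[TableChange]:
    """Expand REPLACE COLUMNS into delete-all-existing, then add-all-new."""
    deletes: List[TableChange] = [DeleteColumn(name) for name in existing]
    adds: List[TableChange] = [AddColumn(n, t) for n, t in new_columns]
    return deletes + adds


def alter_table(
    schema: List[Tuple[str, str]], changes: List[TableChange]
) -> List[Tuple[str, str]]:
    """Toy catalog with no DeleteColumn branch, mirroring the reported error."""
    result = list(schema)
    for change in changes:
        if isinstance(change, AddColumn):
            result.append((change.name, change.data_type))
        else:
            # Any change type without a branch is rejected outright.
            raise NotImplementedError(
                f"Unrecognized column change {type(change).__name__}"
            )
    return result
```

In this model, any REPLACE COLUMNS on a non-empty table fails on the very first change, because the deletes come before the adds. That would match the exception naming DeleteColumn rather than anything about the new columns.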
Issue Analytics
- State:
- Created 2 years ago
- Reactions:2
- Comments:8 (8 by maintainers)
Delta Lake does not support deleting a column. This is an opinionated approach we have taken. We believe that deleting and renaming columns in tables lead to a lot of downstream confusion, and it’s easy for folks to shoot themselves in the foot with it - incorrect results, data loss, etc. Hence we do not support it as of now.
Quick note, we currently have issue #732 to support column drop and rename. Closing this issue for now - thanks!
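For anyone who needs to get rid of columns before that issue is resolved, one workaround that is often used is to rewrite the table with only the columns you want to keep, using Delta's overwriteSchema write option. The sketch below is not from the maintainers and is only a minimal illustration: the function name, the path argument and the keep list are hypothetical, and spark is assumed to be a SparkSession already configured for Delta Lake. Note that this rewrites all of the table's data and removes the omitted columns from the current version, which is exactly the kind of data loss the comment above warns about, so treat it with care.

```python
from typing import List


def rewrite_without_columns(spark, path: str, keep: List[str]) -> None:
    """Overwrite the Delta table at `path`, keeping only the `keep` columns.

    `spark` is assumed to be a SparkSession with Delta Lake configured.
    """
    # Read the current table contents and project down to the kept columns.
    df = spark.read.format("delta").load(path).select(*keep)

    # Overwrite data and schema together; without overwriteSchema, Delta
    # rejects a write whose schema no longer matches the table's schema.
    (
        df.write.format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .save(path)
    )
```

A call might look like rewrite_without_columns(spark, "/path/to/table", ["col_1", "col_2"]), where the path is a placeholder for your own table location.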