REPLACE COLUMNS unsupported?

The Delta Lake 1.0.0 docs contain an example of replacing columns with the ALTER TABLE table_name REPLACE COLUMNS ... statement. However, when I try to run it, I get an exception from DeltaCatalog.

ALTER TABLE table_name REPLACE COLUMNS (col_1 string, col_2 double)

throws:

java.lang.UnsupportedOperationException: Unrecognized column change class org.apache.spark.sql.connector.catalog.TableChange$DeleteColumn. You may be running an out of date Delta Lake version.
	at org.apache.spark.sql.delta.catalog.DeltaCatalog.$anonfun$alterTable$9(DeltaCatalog.scala:504)
	at org.apache.spark.sql.delta.catalog.DeltaCatalog.$anonfun$alterTable$9$adapted(DeltaCatalog.scala:476)
	at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
	at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
	at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
	at org.apache.spark.sql.delta.catalog.DeltaCatalog.$anonfun$alterTable$2(DeltaCatalog.scala:476)
	at scala.collection.immutable.Map$Map2.foreach(Map.scala:159)
	at org.apache.spark.sql.delta.catalog.DeltaCatalog.alterTable(DeltaCatalog.scala:431)
	at org.apache.spark.sql.delta.catalog.DeltaCatalog.alterTable(DeltaCatalog.scala:57)
	at org.apache.spark.sql.execution.datasources.v2.AlterTableExec.run(AlterTableExec.scala:37)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
	at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:46)
	at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3687)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3685)
	at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
	at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
	at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:615)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:772)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:610)
        ...

I took a look at DeltaCatalog and confirmed that alterTable() doesn’t handle DeleteColumn.

Is this actually a supported scenario? I looked through the unit tests, and there seem to be no tests covering it.
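Why the first unrecognized change is a DeleteColumn: as far as I understand Spark 3.x's resolution of REPLACE COLUMNS for v2 catalogs, the statement is expanded into "delete every existing column, then add each listed column", and the resulting list of TableChange objects is handed to the catalog's alterTable(). A catalog that only handles additions therefore fails on the very first change. The Scala sketch below models that flow. The types Change, DeleteColumn, AddColumn and ReplaceColumnsModel are hypothetical stand-ins, not Spark's or Delta's API, and the "existing" columns are invented because the report does not show the original schema.

```scala
// Hypothetical, simplified model of how REPLACE COLUMNS reaches a catalog.
// These types are NOT Spark's API; they only mirror the idea of
// org.apache.spark.sql.connector.catalog.TableChange.
sealed trait Change
final case class DeleteColumn(name: String) extends Change
final case class AddColumn(name: String, dataType: String) extends Change

object ReplaceColumnsModel {
  // Expand REPLACE COLUMNS into: drop every existing column, then add the new ones.
  def expandReplaceColumns(existing: Seq[String], newCols: Seq[(String, String)]): Seq[Change] = {
    val deletes: Seq[Change] = existing.map(DeleteColumn(_))
    val adds: Seq[Change] = newCols.map { case (n, t) => AddColumn(n, t) }
    deletes ++ adds
  }

  // A catalog that, like the DeltaCatalog in the report, accepts additions
  // but rejects deletions, so it throws on the first DeleteColumn it sees.
  def applyChanges(schema: Seq[(String, String)], changes: Seq[Change]): Seq[(String, String)] =
    changes.foldLeft(schema) {
      case (s, AddColumn(n, t)) => s :+ (n -> t)
      case (_, DeleteColumn(n)) =>
        throw new UnsupportedOperationException(
          s"Unrecognized column change DeleteColumn($n). Column deletion is not supported.")
    }
}
```

With the report's statement, the expanded list starts with DeleteColumn entries, so applyChanges throws before any AddColumn is processed, which matches the stack trace above.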

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 8 (8 by maintainers)

Top GitHub Comments

3 reactions
tdas commented, Jun 22, 2021

Delta Lake does not support deleting a column. This is an opinionated approach we have taken. We believe that deleting and renaming columns in tables lead to a lot of downstream confusion, and it's easy for folks to shoot themselves in the foot with it: incorrect results, data loss, etc. Hence we do not support it as of now.
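The comment does not spell out a specific failure mode. One plausible way a drop can produce incorrect results, offered here only as an illustrative assumption and not as the maintainers' stated reasoning: if data files are resolved by column name and a drop only edits the table's metadata (leaving existing files untouched), then dropping a column and later adding a new column with the same name makes the old values reappear. The DataFile and Table types below are hypothetical and do not reflect Delta's real file format or API.

```scala
// Hypothetical model: each data file stores values keyed by column name,
// and schema changes only touch metadata, never the files themselves.
final case class DataFile(values: Map[String, String])

final case class Table(schema: Seq[String], files: Seq[DataFile]) {
  // Reading resolves every schema column by name; a missing value reads as None (null).
  def read: Seq[Seq[Option[String]]] =
    files.map(f => schema.map(c => f.values.get(c)))

  // Metadata-only drop: the column disappears from the schema, the file data stays.
  def dropColumn(c: String): Table = copy(schema = schema.filterNot(_ == c))

  // Metadata-only add: a user would expect this new column to start out empty.
  def addColumn(c: String): Table = copy(schema = schema :+ c)
}
```

For example, a table with columns id and email whose only file holds ("1", "old@example.com") reads back Some("old@example.com") for the re-added email column instead of None, which is the kind of silent incorrect result the comment warns about.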

0 reactions
dennyglee commented, Oct 12, 2021

Quick note: we currently have issue #732 to support column drop and rename. Closing this issue for now. Thanks!


Top Results From Across the Web

Spark - Operation not allowed: alter table replace columns
Sadly, it seems ALTER TABLE table REPLACE is not implemented by Spark. Take a look at SparkSqlParser.scala.
ALTER TABLE REPLACE COLUMNS - Amazon Athena
The following ALTER TABLE REPLACE COLUMNS command replaces the column names with first_name , last_name , and city . The underlying source data...
Solved: Replace column to hive - Cloudera Community - 147538
Solved: Hi: I want to delete one column to Hive table, my table is like that: CREATE TABLE journey_v4( CODTF - 147538.
DataFrame.replace - Dask documentation
For a DataFrame a dict of values can be used to specify which value to use for each column (columns not in the...
ALTER TABLE … ALTER COLUMN - Snowflake Documentation
The following table describes the supported/unsupported actions for modifying column properties: ... Replace a Masking Policy on a Column.
