[BUG] INSERT INTO with generated columns fails (not enough data columns)
When executing sql("INSERT INTO delta_gencols VALUES 1"), Delta Lake 1.2.1 fails with the following AnalysisException:
org.apache.spark.sql.AnalysisException: Cannot write to 'default.delta_gencols', not enough data columns; target table has 2 column(s) but the inserted data has 1 column(s)
at org.apache.spark.sql.delta.DeltaErrors$.notEnoughColumnsInInsert(DeltaErrors.scala:356)
at org.apache.spark.sql.delta.DeltaAnalysis.org$apache$spark$sql$delta$DeltaAnalysis$$needsSchemaAdjustment(DeltaAnalysis.scala:322)
at org.apache.spark.sql.delta.DeltaAnalysis$$anonfun$apply$1.applyOrElse(DeltaAnalysis.scala:67)
at org.apache.spark.sql.delta.DeltaAnalysis$$anonfun$apply$1.applyOrElse(DeltaAnalysis.scala:64)
The table was created as follows:
import io.delta.tables.DeltaTable
import org.apache.spark.sql.types.DataTypes

val tableName = "delta_gencols"
sql(s"DROP TABLE IF EXISTS $tableName")

DeltaTable.create
  .addColumn("id", DataTypes.LongType, nullable = false)
  .addColumn(
    DeltaTable.columnBuilder("value")
      .dataType(DataTypes.BooleanType)
      .generatedAlwaysAs("true")
      .build)
  .tableName(tableName)
  .execute
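As a sanity check, the generation expression should be visible in the column metadata after creation. A sketch, assuming the spark-shell session above (to my knowledge Delta stores the expression under the "delta.generationExpression" metadata key):

```scala
// Sketch, assuming the spark-shell session above. Delta records the
// generation expression in the column metadata of the generated column.
sql(s"DESCRIBE TABLE EXTENDED $tableName").show(truncate = false)
println(spark.table(tableName).schema("value").metadata)
```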
I found this conversation with @zsxwing in the delta-users group, but it does not seem to apply (unless I'm mistaken):

  The INSERT INTO issue and CREATE TABLE syntax for generated columns are not available in OSS Delta Lake right now because the Apache Spark SQL parser doesn't support them.
The DataFrame writeTo API, on the other hand, works just fine:
spark.range(5).writeTo(tableName).append()
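To confirm that the append path actually fills in the generated column, a quick read-back in the same session (a sketch; the expected contents follow from the generatedAlwaysAs("true") expression above):

```scala
// Sketch, assuming the spark-shell session above: only `id` was written,
// so `value` should have been computed by Delta at write time.
spark.table(tableName).orderBy("id").show()
```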
Environment information
- Delta Lake version: 1.2.1
- Spark version: 3.2.1
- Scala version: 2.12.15
Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.14)
Issue Analytics
- State:
- Created a year ago
- Reactions: 1
- Comments: 8 (4 by maintainers)
For the query

INSERT INTO delta_gencols VALUES 1

(delta_gencols has the id and value columns as shown in the issue description), we don't support it because it's not SQL standard. The following two syntaxes are the standard:

1. INSERT INTO delta_gencols VALUES (1, DEFAULT). Spark is adding support for the DEFAULT keyword, and we will work with them to make it work with generated columns as well (https://issues.apache.org/jira/browse/SPARK-38334).
2. INSERT INTO delta_gencols (id) VALUES (1). Currently Spark blocks such an INSERT query if the user doesn't provide the entire column list. We are working on removing this restriction so that it can work with generated columns in Delta.

@keen85 SPARK-41290 is the Spark-side ticket. We will make sure INSERT INTO works with generated columns once the change lands on the Spark side.
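Until that restriction is lifted, the column-list form can be expressed through the DataFrame API that Delta already supports. A sketch, assuming the delta_gencols table from the issue description:

```scala
// Sketch: DataFrame equivalent of INSERT INTO delta_gencols (id) VALUES (1).
// Writing only `id` lets Delta compute the generated `value` column.
import spark.implicits._
Seq(1L).toDF("id").writeTo("delta_gencols").append()
```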