[BUG] INSERT INTO with generated columns fails (not enough data columns)


Executing sql("INSERT INTO delta_gencols VALUES 1") on Delta Lake 1.2.1 fails with the following AnalysisException:

org.apache.spark.sql.AnalysisException: Cannot write to 'default.delta_gencols', not enough data columns; target table has 2 column(s) but the inserted data has 1 column(s)
  at org.apache.spark.sql.delta.DeltaErrors$.notEnoughColumnsInInsert(DeltaErrors.scala:356)
  at org.apache.spark.sql.delta.DeltaAnalysis.org$apache$spark$sql$delta$DeltaAnalysis$$needsSchemaAdjustment(DeltaAnalysis.scala:322)
  at org.apache.spark.sql.delta.DeltaAnalysis$$anonfun$apply$1.applyOrElse(DeltaAnalysis.scala:67)
  at org.apache.spark.sql.delta.DeltaAnalysis$$anonfun$apply$1.applyOrElse(DeltaAnalysis.scala:64)

The table was created as follows:

import io.delta.tables.DeltaTable
import org.apache.spark.sql.types.DataTypes

val tableName = "delta_gencols"
sql(s"DROP TABLE IF EXISTS $tableName")
DeltaTable.create
  .addColumn("id", DataTypes.LongType, nullable = false)
  .addColumn(
    DeltaTable.columnBuilder("value")
      .dataType(DataTypes.BooleanType)
      .generatedAlwaysAs("true")
      .build)
  .tableName(tableName)
  .execute

I found this conversation with @zsxwing in the delta-users group, but it does not seem to apply here (unless I’m mistaken):

The INSERT INTO issue and CREATE TABLE syntax for generated columns are not available in OSS Delta Lake right now because Apache Spark SQL parser doesn't support them.

Appending through the DataFrame API (writeTo) works just fine:

spark.range(5).writeTo(tableName).append()

Environment information

  • Delta Lake version: 1.2.1
  • Spark version: 3.2.1
  • Scala version: 2.12.15
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.2.1
      /_/

Using Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 11.0.14)

Issue Analytics

  • State: open
  • Created: a year ago
  • Reactions: 1
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

zsxwing commented, Jun 29, 2022 (1 reaction)

We don’t support the query INSERT INTO delta_gencols VALUES 1 (where delta_gencols has the id and value columns shown in the issue description) because it is not standard SQL. The following two forms are standard:

  • INSERT INTO delta_gencols VALUES(1, DEFAULT). Spark is adding support for the DEFAULT keyword, and we will work with them to make it work with generated columns as well (https://issues.apache.org/jira/browse/SPARK-38334).
  • INSERT INTO delta_gencols(id) VALUES(1). Currently Spark blocks such an INSERT query if the user doesn’t provide the entire column list. We are working on this and hope to remove the restriction so that it works with generated columns in Delta.

zsxwing commented, Dec 1, 2022 (0 reactions)

@keen85 SPARK-41290 is the Spark-side ticket. We will make sure INSERT INTO works with generated columns once the change on the Spark side is finished.
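
Until the Spark-side change lands, the DataFrame append from the issue description can stand in for the single-row INSERT. The following is a minimal sketch rather than an official workaround. It assumes a spark-shell session (so `spark` is predefined) with the versions listed above, and that the delta_gencols table from the issue description already exists.

```scala
// Sketch for spark-shell (Spark 3.2.1, Delta Lake 1.2.1), where `spark` is predefined.
// Assumes the delta_gencols table from the issue description already exists.
val tableName = "delta_gencols"

// spark.range(1, 2) yields a single row with id = 1; the generated
// `value` column is deliberately left out of the DataFrame.
spark.range(1, 2).writeTo(tableName).append()

// Inspect the table, including the generated column.
spark.table(tableName).show()
```

This is the same writeTo(...).append() call the reporter found to work, narrowed to the one row that INSERT INTO delta_gencols VALUES 1 was meant to add.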
