
[SUPPORT] change column type from int to long, schema compatibility check failed

See original GitHub issue

Tips before filing an issue

  • Have you gone through our FAQs?

  • Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

I have an existing COW table foo (int a, string b). When I set hoodie.avro.schema.validate to true and upsert a dataframe (long a, string b) onto table foo, a “schema compatibility check failed” exception is thrown.

To Reproduce

Steps to reproduce the behavior:

  1. Write existing data, with column type int:
val existedDf = spark.sql("select 1 as a, '' as b, 1 as __row_key, 0 as __row_version")
val testTable = "foo"


val testConf = Map(
  "hoodie.table.name" -> testTable,
  "hoodie.avro.schema.validate" -> "true",
  "hoodie.datasource.write.recordkey.field" -> "__row_key",
  "hoodie.datasource.write.table.name" -> testTable,
  "hoodie.datasource.write.precombine.field" -> "__row_version",
  "hoodie.datasource.write.partitionpath.field" -> "",
  "hoodie.datasource.write.keygenerator.class" -> classOf[org.apache.hudi.keygen.NonpartitionedKeyGenerator].getName,
  "hoodie.datasource.write.hive_style_partitioning" -> "true",
  "hoodie.datasource.write.operation" -> "upsert"
)

existedDf.write.format("org.apache.hudi").options(testConf).mode("append").save(s"file:///jfs/cadl/hudi_data/schema/foo")
  2. Write new data, with column type long:
val newDf = spark.sql("select cast(1 as long) as a, '' as b, 1 as __row_key, 1 as __row_version")
newDf.write.format("org.apache.hudi").options(testConf).mode("append").save(s"file:///jfs/cadl/hudi_data/schema/foo")

Then the exception is raised. If I remove hoodie.avro.schema.validate from testConf, the upsert succeeds.
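
For illustration, a minimal sketch of that working variant, reusing testConf and newDf from the steps above (noValidateConf is just a name introduced here; note this skips the upsert-time schema safety check entirely):

// Drop the validation flag and retry the second write.
val noValidateConf = testConf - "hoodie.avro.schema.validate"
newDf.write.format("org.apache.hudi").options(noValidateConf).mode("append").save("file:///jfs/cadl/hudi_data/schema/foo")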

Expected behavior

Upsert and read successfully.

Environment Description

  • Hudi version : 0.6.0

  • Spark version : 2.4.3

  • Hive version :

  • Hadoop version :

  • Storage (HDFS/S3/GCS…) : LocalFile

  • Running on Docker? (yes/no) :

Additional context

Stacktrace

org.apache.hudi.exception.HoodieUpsertException: Failed upsert schema compatibility check.
  at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:572)
  at org.apache.hudi.client.HoodieWriteClient.upsert(HoodieWriteClient.java:190)
  at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:260)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
  ... 69 elided
Caused by: org.apache.hudi.exception.HoodieException: Failed schema compatibility check for writerSchema :{"type":"record","name":"foo_record","namespace":"hoodie.foo","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"a","type":"long"},{"name":"b","type":"string"},{"name":"__row_key","type":"int"},{"name":"__row_version","type":"int"}]}, table schema :{"type":"record","name":"foo_record","namespace":"hoodie.foo","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"a","type":"int"},{"name":"b","type":"string"},{"name":"__row_key","type":"int"},{"name":"__row_version","type":"int"}]}, base path :file:///jfs/cadl/hudi_data/schema/foo
  at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:564)
  at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:570)
  ... 94 more

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
bvaradar commented, Sep 8, 2020

@cadl: Regarding the schema validation error, @prashantwason will be looking into it when he is back.

Meanwhile, after you disable the schema compatibility check, can you try setting hoodie.avro.schema.externalTransformation=true and see if you are able to upsert without any issues?
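
For anyone trying the same thing, a sketch of what that combination could look like on top of the testConf map from the reproduction above (the flag names are taken verbatim from this comment; whether it resolves the upsert on 0.6.0 is exactly what is being asked):

// Relax the strict check and enable the suggested external schema transformation.
val relaxedConf = testConf ++ Map(
  "hoodie.avro.schema.validate" -> "false",
  "hoodie.avro.schema.externalTransformation" -> "true"
)
newDf.write.format("org.apache.hudi").options(relaxedConf).mode("append").save("file:///jfs/cadl/hudi_data/schema/foo")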

0 reactions
nsivabalan commented, Jan 24, 2021

@cadl: did you get a chance to try out the setting? We plan to close out this issue due to inactivity in a week's time. But feel free to reopen or create a new ticket if you find any more issues.

Read more comments on GitHub >

Top Results From Across the Web

Schema Evolution and Compatibility - Confluent Documentation
The following table presents a summary of the types of schema changes allowed for the different compatibility types, for a given subject. The...
Postgresql change column type from int to UUID
It permanently throws away the old values in colA . ALTER TABLE tableA ALTER COLUMN colA SET DATA TYPE UUID USING (uuid_generate_v4()); A...
Changing incompatible column types - Cloudera Documentation
A default configuration change can cause applications that change column types to fail. ... Compatible column type changes, such as INT, STRING, BIGINT, ......
Resolve data incompatibility errors in Amazon Redshift
1. Retrieve the complete error message from the SVL_S3LOG system view: · 2. Check the Message column to view the error description. ·...
Schema Evolution - Apache Hudi
Make sure disable hive.metastore.disallow.incompatible.col.type.changes in hive side. Adding Columns​. Syntax.
