
[SUPPORT] change column type from int to long, schema compatibility check failed

See original GitHub issue

Tips before filing an issue

  • Have you gone through our FAQs?

  • Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

I have an existing COW table foo (int a, string b). When I set hoodie.avro.schema.validate to true and upsert a dataframe (long a, string b) onto table foo, a “schema compatibility check failed” exception is thrown.

To Reproduce

Steps to reproduce the behavior:

  1. Write existing data, with column type int:
val existedDf = spark.sql("select 1 as a, '' as b, 1 as __row_key, 0 as __row_version")
val testTable = "foo"


val testConf = Map(
  "hoodie.table.name" -> testTable,
  "hoodie.avro.schema.validate" -> "true",
  "hoodie.datasource.write.recordkey.field" -> "__row_key",
  "hoodie.datasource.write.table.name" -> testTable,
  "hoodie.datasource.write.precombine.field" -> "__row_version",
  "hoodie.datasource.write.partitionpath.field" -> "",
  "hoodie.datasource.write.keygenerator.class" -> classOf[org.apache.hudi.keygen.NonpartitionedKeyGenerator].getName,
  "hoodie.datasource.write.hive_style_partitioning" -> "true",
  "hoodie.datasource.write.operation" -> "upsert"
)

existedDf.write.format("org.apache.hudi").options(testConf).mode("append").save(s"file:///jfs/cadl/hudi_data/schema/foo")
  2. Write new data, with column type long:
val newDf = spark.sql("select cast(1 as long) as a, '' as b, 1 as __row_key, 1 as __row_version")
newDf.write.format("org.apache.hudi").options(testConf).mode("append").save(s"file:///jfs/cadl/hudi_data/schema/foo")

Then the exception is raised. If I remove hoodie.avro.schema.validate from testConf, the upsert succeeds.
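
For illustration, a minimal sketch of that working variant, reusing testConf and newDf from the steps above (noValidateConf is just a name introduced here; note this skips the upsert-time schema safety check entirely):

// Drop the validation flag and retry the second write.
val noValidateConf = testConf - "hoodie.avro.schema.validate"
newDf.write.format("org.apache.hudi").options(noValidateConf).mode("append").save("file:///jfs/cadl/hudi_data/schema/foo")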

Expected behavior

Upsert and read successfully.

Environment Description

  • Hudi version : 0.6.0

  • Spark version : 2.4.3

  • Hive version :

  • Hadoop version :

  • Storage (HDFS/S3/GCS…) : LocalFile

  • Running on Docker? (yes/no) :

Additional context

Stacktrace

org.apache.hudi.exception.HoodieUpsertException: Failed upsert schema compatibility check.
  at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:572)
  at org.apache.hudi.client.HoodieWriteClient.upsert(HoodieWriteClient.java:190)
  at org.apache.hudi.DataSourceUtils.doWriteOperation(DataSourceUtils.java:260)
  at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
  at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:125)
  at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
  at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
  at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
  at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
  ... 69 elided
Caused by: org.apache.hudi.exception.HoodieException: Failed schema compatibility check for writerSchema :{"type":"record","name":"foo_record","namespace":"hoodie.foo","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"a","type":"long"},{"name":"b","type":"string"},{"name":"__row_key","type":"int"},{"name":"__row_version","type":"int"}]}, table schema :{"type":"record","name":"foo_record","namespace":"hoodie.foo","fields":[{"name":"_hoodie_commit_time","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_commit_seqno","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_record_key","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_partition_path","type":["null","string"],"doc":"","default":null},{"name":"_hoodie_file_name","type":["null","string"],"doc":"","default":null},{"name":"a","type":"int"},{"name":"b","type":"string"},{"name":"__row_key","type":"int"},{"name":"__row_version","type":"int"}]}, base path :file:///jfs/cadl/hudi_data/schema/foo
  at org.apache.hudi.table.HoodieTable.validateSchema(HoodieTable.java:564)
  at org.apache.hudi.table.HoodieTable.validateUpsertSchema(HoodieTable.java:570)
  ... 94 more

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 6 (5 by maintainers)

Top GitHub Comments

1 reaction
bvaradar commented, Sep 8, 2020

@cadl: Regarding the schema validation error, @prashantwason will be looking into it when he is back.

Meanwhile, after you disable the schema compatibility check, can you try setting hoodie.avro.schema.externalTransformation=true and see if you are able to upsert without any issues?
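
For anyone trying the same thing, a sketch of what that combination could look like on top of the testConf map from the reproduction above (the flag names are taken verbatim from this comment; whether it resolves the upsert on 0.6.0 is exactly what is being asked):

// Relax the strict check and enable the suggested external schema transformation.
val relaxedConf = testConf ++ Map(
  "hoodie.avro.schema.validate" -> "false",
  "hoodie.avro.schema.externalTransformation" -> "true"
)
newDf.write.format("org.apache.hudi").options(relaxedConf).mode("append").save("file:///jfs/cadl/hudi_data/schema/foo")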

0 reactions
nsivabalan commented, Jan 24, 2021

@cadl: did you get a chance to try out the setting? We plan to close out this issue due to inactivity in a week's time. But feel free to reopen or create a new ticket if you find any more issues.

Read more comments on GitHub >

Top Results From Across the Web

Schema Evolution and Compatibility - Confluent Documentation
The following table presents a summary of the types of schema changes allowed for the different compatibility types, for a given subject. The...
Postgresql change column type from int to UUID
It permanently throws away the old values in colA . ALTER TABLE tableA ALTER COLUMN colA SET DATA TYPE UUID USING (uuid_generate_v4()); A...
Changing incompatible column types - Cloudera Documentation
A default configuration change can cause applications that change column types to fail. ... Compatible column type changes, such as INT, STRING, BIGINT, ......
Resolve data incompatibility errors in Amazon Redshift
1. Retrieve the complete error message from the SVL_S3LOG system view: · 2. Check the Message column to view the error description. ·...
Schema Evolution - Apache Hudi
Make sure disable hive.metastore.disallow.incompatible.col.type.changes in hive side. Adding Columns​. Syntax.
