Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Hive Sync Error when creating a table with partition

See original GitHub issue

I have a bunch of data that I am writing to S3 and doing a Hive sync during the write process. The write to S3 is successful but the Hive sync is failing.

Hudi version - 0.5.0-SNAPSHOT (from the master branch)
Hive version - 2.3.2-amzn-2
Spark version - 2.4.3

Sample Data Frame

|     gender|comments|               title|              cc|    ip_address|last_name| id| birthdate|   salary|   registration_dttm|             country|               email|first_name|       key          | timestamp          |   date       |
+-----------+--------+--------------------+----------------+--------------+---------+---+----------+---------+--------------------+--------------------+--------------------+----------+--------------------+--------------------+--------------+
|     Female|   1E+02|    Internal Auditor|6759521864920116|   1.197.201.2|   Jordan|  1|  3/8/1971| 49756.53|2016-02-03T07:55:29Z|           Indonesia|    ajordan0@com.com|    Amanda|aHVkaV8wXzRfN190Z...|2019-09-04 14:28:...|    2019/09/04|
|       Male|        |       Accountant IV|                |218.111.175.34|  Freeman|  2| 1/16/1968|150280.17|2016-02-03T17:04:03Z|              Canada|     afreeman1@is.gd|    Albert|aHVkaV8wXzRfN190Z...|2019-09-04 14:28:...|    2019/09/04|
|     Female|        | Structural Engineer|6767119071901597|  7.161.136.94|   Morgan|  3|  2/1/1960|144972.51|2016-02-03T01:09:31Z|              Russia|emorgan2@altervis...|    Evelyn|aHVkaV8wXzRfN190Z...|2019-09-04 14:28:...|    2019/09/04|
|     Female|        |Senior Cost Accou...|3576031598965625| 140.35.109.83|    Riley|  4|  4/8/1997| 90263.05|2016-02-03T12:36:21Z|               China|    driley3@gmpg.org|    Denise|aHVkaV8wXzRfN190Z...|2019-09-04 14:28:...|    2019/09/04|
|           |        |                    |5602256255204850|169.113.235.40|    Burns|  5|          |         |2016-02-03T05:05:31Z|        South Africa|cburns4@miitbeian...|    Carlos|aHVkaV8wXzRfN190Z...|2019-09-04 14:28:...|    2019/09/04|
|Transgender|        |   Account Executive|3583136326049310|195.131.81.179|    White|  6| 2/25/1983| 69227.11|2016-02-03T07:22:34Z|           Indonesia|  kwhite5@google.com|   Kathryn|aHVkaV8wXzRfN190Z...|2019-09-04 14:28:...|    2019/09/04|
|    Unknown|        |Senior Financial ...|3582641366974690|232.234.81.197|   Holmes|  7|12/18/1987| 14247.62|2016-02-03T08:33:08Z|            Portugal|sholmes6@foxnews.com|    Samuel|aHVkaV8wXzRfN190Z...|2019-09-04 14:28:...|    2019/09/04|
|     Secret|        |    Web Developer IV|                |  91.235.51.73|   Howell|  8|  3/1/1962|186469.43|2016-02-03T06:47:06Z|Bosnia and Herzeg...| hhowell7@eepurl.com|     Harry|aHVkaV8wXzRfN190Z...|2019-09-04 14:28:...|    2019/09/04|
|    Obvious|   1E+02|Software Test Eng...|                |  132.31.53.61|   Foster|  9| 3/27/1992|231067.84|2016-02-03T03:52:53Z|         South Korea|   jfoster8@yelp.com|      Jose|aHVkaV8wXzRfN190Z...|2019-09-04 14:28:...|    2019/09/04|
|     Female|        |     Health Coach IV|3574254110301671|143.28.251.245|  Stewart| 10| 1/28/1997| 27234.28|2016-02-03T18:29:47Z|             Nigeria|estewart9@opensou...|     Emily|aHVkaV8wXzRfN190Z...|2019-09-04 14:28:...|    2019/09/04|
+-----------+--------+--------------------+----------------+--------------+---------+---+----------+---------+--------------------+--------------------+--------------------+----------+--------------------+--------------------+--------------+
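For reference, a rough sketch of how a string partition column like date (values such as 2019/09/04) could be derived from the event timestamp; rawDF and the format pattern below are assumptions, not the original pipeline's code:

import org.apache.spark.sql.functions.{col, date_format}

// Assumed derivation of the `date` partition column (illustrative only)
val cleanedDF = rawDF.withColumn("date", date_format(col("timestamp"), "yyyy/MM/dd"))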

Spark Scala Code

import org.apache.hudi.DataSourceWriteOptions
import org.apache.hudi.config.HoodieWriteConfig
import org.apache.spark.sql.SaveMode

cleanedDF
    .write.format("org.apache.hudi")
    .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
    .option(DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY, "key")
    .option(DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY, "date")   // partition column
    .option(DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY, "timestamp")
    .option(HoodieWriteConfig.TABLE_NAME, catalogName)
    // Hive sync: register/update the table in the Hive metastore after the write
    .option(DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY, "true")
    .option(DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY, "date")
    .option(DataSourceWriteOptions.HIVE_TABLE_OPT_KEY, catalogName)
    .option(DataSourceWriteOptions.HIVE_URL_OPT_KEY, sparkConfig.hiveJDBCUri)
    .option("path", basePath)
    .mode(SaveMode.Append)
    .save()

Error

837579 [stream execution thread for [id = 2bb17a39-4a86-4952-8378-0d431b1ba74f, runId = a6d04a3e-1316-407f-a41e-1b24822b3420]] WARN  com.amazonaws.services.s3.internal.S3AbortableInputStream  - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
837580 [stream execution thread for [id = 2bb17a39-4a86-4952-8378-0d431b1ba74f, runId = a6d04a3e-1316-407f-a41e-1b24822b3420]] INFO  org.apache.hudi.hive.HiveSyncTool  - Table hudi_test is not found. Creating it
837594 [stream execution thread for [id = 2bb17a39-4a86-4952-8378-0d431b1ba74f, runId = a6d04a3e-1316-407f-a41e-1b24822b3420]] INFO  org.apache.hudi.hive.HoodieHiveClient  - Creating table with CREATE EXTERNAL TABLE  IF NOT EXISTS default.hudi_test( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `gender` string, `comments` string, `title` string, `cc` string, `ip_address` string, `last_name` string, `id` bigint, `birthdate` string, `salary` string, `registration_dttm` string, `country` string, `email` string, `first_name` string, `key` string, `timestamp` bigint) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://<some-bucket>/catalogs/hudi_test/hudi'
837602 [stream execution thread for [id = 2bb17a39-4a86-4952-8378-0d431b1ba74f, runId = a6d04a3e-1316-407f-a41e-1b24822b3420]] INFO  org.apache.hudi.hive.HoodieHiveClient  - Executing SQL CREATE EXTERNAL TABLE  IF NOT EXISTS default.hudi_test( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `gender` string, `comments` string, `title` string, `cc` string, `ip_address` string, `last_name` string, `id` bigint, `birthdate` string, `salary` string, `registration_dttm` string, `country` string, `email` string, `first_name` string, `key` string, `timestamp` bigint) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://<some-bucket>/catalogs/hudi_test/hudi'
837698 [stream execution thread for [id = 2bb17a39-4a86-4952-8378-0d431b1ba74f, runId = a6d04a3e-1316-407f-a41e-1b24822b3420]] ERROR org.apache.spark.sql.execution.streaming.MicroBatchExecution  - Query [id = 2bb17a39-4a86-4952-8378-0d431b1ba74f, runId = a6d04a3e-1316-407f-a41e-1b24822b3420] terminated with error
org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing SQL CREATE EXTERNAL TABLE  IF NOT EXISTS default.hudi_test( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `gender` string, `comments` string, `title` string, `cc` string, `ip_address` string, `last_name` string, `id` bigint, `birthdate` string, `salary` string, `registration_dttm` string, `country` string, `email` string, `first_name` string, `key` string, `timestamp` bigint) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://<some-bucket>/catalogs/hudi_test/hudi'
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:467)
	at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:265)
	at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:132)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:96)
	at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:68)
	at org.apache.hudi.HoodieSparkSqlWriter$.syncHive(HoodieSparkSqlWriter.scala:235)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:169)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:91)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
	at org.apache.spark.sql.execution.streaming.sources.ForeachBatchSink.addBatch(ForeachBatchSink.scala:35)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:537)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:535)
	at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
	at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:534)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
	at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
	at org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
	at org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
	at org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:160)
	at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:281)
	at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:193)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 1:522 cannot recognize input near 'date' 'string' ')' in column specification
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:267)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:253)
	at org.apache.hive.jdbc.HiveStatement.runAsyncOnServer(HiveStatement.java:313)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:253)
	at org.apache.hudi.org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
	at org.apache.hudi.org.apache.commons.dbcp.DelegatingStatement.execute(DelegatingStatement.java:264)
	at org.apache.hudi.hive.HoodieHiveClient.updateHiveSQL(HoodieHiveClient.java:465)
	... 53 more
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: ParseException line 1:522 cannot recognize input near 'date' 'string' ')' in column specification
	at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
	at org.apache.hive.service.cli.operation.Operation.run(Operation.java:320)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
	at com.sun.proxy.$Proxy35.executeStatementAsync(Unknown Source)
	at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
	at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
	at org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.ParseException:line 1:522 cannot recognize input near 'date' 'string' ')' in column specification
	at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:211)
	at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:77)
	at org.apache.hadoop.hive.ql.parse.ParseUtils.parse(ParseUtils.java:70)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:468)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1295)
	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:204)
	... 27 more

As far as I understand, Hive does not allow you to add partitions to tables during CREATE.
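
One way to confirm that the DDL itself is what Hive rejects, independent of Hudi, is to run a trimmed version of the generated statement directly against HiveServer2 over JDBC; the ParseException above already points at the date token in the PARTITIONED BY clause. This is only a sketch; the connection details and probe table name are placeholders:

import java.sql.DriverManager

// Probe sketch: expected to fail with the same ParseException on this Hive version,
// because `date` is left unquoted in the PARTITIONED BY clause.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://<hive-host>:10000/default", "hive", "")
val stmt = conn.createStatement()
try {
  stmt.execute(
    """CREATE EXTERNAL TABLE IF NOT EXISTS default.hudi_sync_probe (`id` bigint)
      |PARTITIONED BY (date string)
      |STORED AS PARQUET
      |LOCATION 's3a://<some-bucket>/catalogs/hudi_test/probe'""".stripMargin)
} finally {
  stmt.close()
  conn.close()
}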

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
vinothchandar commented, Sep 11, 2019

doh. Thanks for chasing this down @firecast. Opened HUDI-244 to fix this. I think the other fields are escaped correctly.

0 reactions
firecast commented, Sep 11, 2019

It looks like the issue is with the partition name date. When I tried the SQL Hudi was generating, it gave the following error in Hive: FAILED: ParseException line 1:503 cannot recognize input near 'date' 'string' ')' in column specification

So I escaped the partition column manually in the SQL and it worked.

CREATE EXTERNAL TABLE  IF NOT EXISTS default.hudi_test( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `gender` string, `comments` string, `title` string, `cc` string, `ip_address` string, `last_name` string, `id` bigint, `birthdate` string, `salary` string, `registration_dttm` string, `country` string, `email` string, `first_name` string, `key` string, `timestamp` bigint) PARTITIONED BY (`date` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://demo-bucket/catalogs/hudi_test/hudi'

It looks like HUDI does not escape the column name in the PARTITIONED BY clause.
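
A minimal sketch of the kind of escaping the PARTITIONED BY clause appears to need (the helper below is hypothetical, not Hudi's actual code; HUDI-244 tracks the real fix), along with an interim workaround of renaming the partition column so it is not a reserved word:

// Hypothetical helper: backtick-quote each partition field, the same way the data
// columns in the generated CREATE TABLE statement are already quoted.
def partitionClause(fields: Seq[String]): String =
  fields.map(f => s"`$f` string").mkString("PARTITIONED BY (", ", ", ")")

partitionClause(Seq("date"))  // "PARTITIONED BY (`date` string)"

// Interim workaround sketch until the fix lands: avoid the reserved word entirely by
// renaming the partition column and pointing both Hudi options at the new name.
val safeDF = cleanedDF.withColumnRenamed("date", "partition_date")
// then pass "partition_date" to PARTITIONPATH_FIELD_OPT_KEY and HIVE_PARTITION_FIELDS_OPT_KEY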

Read more comments on GitHub >

Top Results From Across the Web

Hive Sync Error when creating a table with partition · Issue #879
I have a bunch of data that I am writing to s3 and doing a Hive sync during the write process. The write...
Read more >
HIVE partitions adding not working as expected..pa... - 224083
Currently i am working on HIVE tables and facing issue with hive partitions ,we have script to drop partitions if exist based on...
Read more >
Hive sync fails to register tables partitioned by Date Type column
Hive is not able to make sense of the partition field values like 17897 as it is not able to convert it to...
Read more >
[jira] [Updated] (HUDI-4099) hive sync no partition table error
ERROR > Caused by: org.apache.hudi.hive.HoodieHiveSyncException: Failed in executing > SQL CREATE EXTERNAL TABLE IF NOT EXISTS > `default`.
Read more >
Troubleshoot Athena query failing with the error ...
If you created the table manually, then use an Athena data definition language (DDL) statement to drop the affected partition and recreate the ......
Read more >
