[SUPPORT] Trouble getting yyyy/mm partitioning to work with Hive sync

Describe the problem you faced

Hi, everyone! We ingest data with options:

hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM
hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-dd'T'HH:mm:ssZ,yyyy-MM-dd'T'HH:mm:ss.SSSZ
hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex=
hoodie.deltastreamer.keygen.timebased.input.timezone='
hoodie.datasource.write.partitionpath.field=time:TIMESTAMP

The time field is in the format 2021-05-16T21:36:39Z. We want some tables to be partitioned by yyyy/MM, because they are small and do not need deeper partitioning. However, we have a problem with run_sync_tool.sh. What we tried:

--partitioned-by time obviously didn’t help.
--partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor --partitioned-by _hoodie_partition_path didn’t help much either: we get the error shown in the screenshot (in the Parquet file, _hoodie_partition_path=2021/05).

Any ideas how to fix it?
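
For reference, here is a rough Python sketch of how a value such as 2021-05-16T21:36:39Z ends up as the partition path 2021/05 under the key generator settings above. It is only an approximation: the function name partition_path is made up for illustration, the strptime patterns are hand-translated stand-ins for the two Java input formats, and no time zone conversion is modelled.

```python
from datetime import datetime

# Hand-translated Python stand-ins (assumption) for the two input patterns in
# hoodie.deltastreamer.keygen.timebased.input.dateformat.
INPUT_FORMATS = ["%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M:%S.%f%z"]

# Stand-in for the output pattern yyyy/MM.
OUTPUT_FORMAT = "%Y/%m"


def partition_path(value: str) -> str:
    """Try each input format in turn and render the result as year/month."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime(OUTPUT_FORMAT)
        except ValueError:
            continue  # this pattern did not match, try the next one
    raise ValueError(f"no input format matched {value!r}")


print(partition_path("2021-05-16T21:36:39Z"))
```

The point is that the resulting path has two segments (a year and a month) separated by a slash, which matters for how the sync tool maps it onto Hive partition columns.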

https://apache-hudi.slack.com/archives/C4D716NPQ/p1625675498061500

Top GitHub Comments

affei commented, Jul 29, 2021

Solved by using --partitioned-by 'year,month'. Thanks, everybody!
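
A plausible reading of why this works, sketched in Python: a multi-part extractor splits the partition path on slashes and yields one value per segment, so the number of partition fields passed to --partitioned-by has to match the number of segments. This is a simplified model, not Hudi's actual code; the function names extract_partition_values and to_partition_spec are hypothetical, and the error raised here is not necessarily the one from the screenshot.

```python
def extract_partition_values(partition_path: str) -> list:
    """Split a path like '2021/05' into one value per segment.

    Segments written Hive-style as key=value keep only the value.
    """
    values = []
    for part in partition_path.split("/"):
        values.append(part.split("=", 1)[1] if "=" in part else part)
    return values


def to_partition_spec(partition_fields: list, partition_path: str) -> dict:
    """Pair each partition field with one extracted value, or fail on a mismatch."""
    values = extract_partition_values(partition_path)
    if len(values) != len(partition_fields):
        raise ValueError(
            f"partition fields {partition_fields} do not match values {values}"
        )
    return dict(zip(partition_fields, values))


# Two fields for the two path segments: the mapping lines up.
print(to_partition_spec(["year", "month"], "2021/05"))

# One field for two segments: the counts differ, so this model rejects it.
try:
    to_partition_spec(["_hoodie_partition_path"], "2021/05")
except ValueError as err:
    print(err)
```

In this model, 'year,month' gives the sync tool two Hive partition columns to pair with the two segments of 2021/05, while a single field such as _hoodie_partition_path or time cannot be paired with two values.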

nsivabalan commented, Jul 29, 2021

cool, thanks.
