[SUPPORT] Trouble getting yyyy/mm partitioning to work with Hive sync
Describe the problem you faced
Hi, everyone! We ingest data with options:
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM
hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-dd'T'HH:mm:ssZ,yyyy-MM-dd'T'HH:mm:ss.SSSZ
hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex=
hoodie.deltastreamer.keygen.timebased.input.timezone='
hoodie.datasource.write.partitionpath.field=time:TIMESTAMP
The time field has the format 2021-05-16T21:36:39Z. We want some tables to be partitioned by yyyy/MM, because they are small and do not need deeper partitioning. However, we have a problem with run_sync_tool.sh. What we tried:
- --partitioned-by time: obviously didn't help.
- --partition-value-extractor org.apache.hudi.hive.MultiPartKeysValueExtractor --partitioned-by _hoodie_partition_path: didn't help much either; we get the error shown in the screenshot (in the parquet file, _hoodie_partition_path=2021/05).
Any ideas how to fix it?
https://apache-hudi.slack.com/archives/C4D716NPQ/p1625675498061500
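For readers following along, here is a rough Python sketch of how a time value ends up as a yyyy/MM partition path under the options above. It is not Hudi code: partition_path is a hypothetical helper, the two strptime patterns are only approximate Python equivalents of the two configured input date formats, and no timezone conversion is applied.

```python
from datetime import datetime

# Approximate Python equivalents of the two configured input formats
# (yyyy-MM-dd'T'HH:mm:ssZ and yyyy-MM-dd'T'HH:mm:ss.SSSZ). Assumption only.
INPUT_FORMATS = ("%Y-%m-%dT%H:%M:%S%z", "%Y-%m-%dT%H:%M:%S.%f%z")


def partition_path(value: str) -> str:
    """Hypothetical helper: format a timestamp string as a yyyy/MM path."""
    for fmt in INPUT_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue  # try the next candidate input format
        return parsed.strftime("%Y/%m")
    raise ValueError(f"unsupported timestamp format: {value!r}")


print(partition_path("2021-05-16T21:36:39Z"))
```

The point of the sketch is that the resulting path has two segments (year and month), which matters for how Hive sync maps the path to partition columns.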
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
Environment Description
-
Hudi version :
-
Spark version :
-
Hive version :
-
Hadoop version :
-
Storage (HDFS/S3/GCS…) :
-
Running on Docker? (yes/no) :
Additional context
Add any other context about the problem here.
Stacktrace
Add the stacktrace of the error.
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (3 by maintainers)
Top GitHub Comments
Solved by using --partitioned-by 'year,month'. Thanks everybody!
cool, thanks.
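A minimal Python sketch of why two partition names were needed, assuming the multi-part extractor splits the partition path on "/" and that Hive sync expects exactly one path segment per name passed to --partitioned-by. The function extract_partition_values is hypothetical and only illustrates that assumption; it is not the Hudi implementation.

```python
def extract_partition_values(partition_path: str, partition_fields: list) -> dict:
    """Hypothetical illustration: pair each path segment with a partition field."""
    segments = partition_path.split("/")
    if len(segments) != len(partition_fields):
        raise ValueError(
            f"{len(partition_fields)} partition field(s) given, "
            f"but the path {partition_path!r} has {len(segments)} segment(s)"
        )
    # Accept hive-style segments such as year=2021 as well as bare values.
    values = [s.split("=", 1)[1] if "=" in s else s for s in segments]
    return dict(zip(partition_fields, values))


# Two names line up with the two segments of 2021/05.
print(extract_partition_values("2021/05", ["year", "month"]))

# A single name cannot be matched against a two-segment path.
try:
    extract_partition_values("2021/05", ["_hoodie_partition_path"])
except ValueError as err:
    print(err)
```

Under this assumption, a single name such as time or _hoodie_partition_path cannot be matched against the two-segment path 2021/05, while 'year,month' can.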