Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

[hivewriter] with partition result with empty data

See original GitHub issue

My flinkx config, with a kafkareader and a hivewriter, looks like this:

{
    "job": {
        "content": [
            {
                "reader": {
                    "parameter": {
                        "topic": "flinkx_test",
                        "mode": "earliest-offset",
                        "codec": "json",
                        "consumerSettings": {
                            "bootstrap.servers": "demo:9092"
                        },
                        "column": [
                            {
                                "name": "distinct_id",
                                "type": "string"
                            },
                            {
                                "name": "event_id",
                                "type": "string"
                            },
                            {
                                "name": "event",
                                "type": "string"
                            },
                            {
                                "name": "properties_push_content",
                                "type": "string"
                            }
                        ]
                    },
                    "name": "kafkareader"
                },
                "writer": {
                    "parameter": {
                        "jdbcUrl": "jdbc:hive2://ip:10000/demo;principal=hive/_HOST@demo.COM",
                        "username": "demo",
                        "fileType": "parquet",
                        "writeMode": "append",
                        "compress": "SNAPPY",
                        "charsetName": "UTF-8",
                        "maxFileSize": 134217728,
                        "tablesColumn": "{\"flinkx_test\":[{\"key\":\"distinct_id\",\"type\":\"string\"},{\"key\":\"event_id\",\"type\":\"string\"},{\"key\":\"event\",\"type\":\"string\"},{\"key\":\"type\",\"type\":\"string\"},{\"key\":\"time\",\"type\":\"string\"},{\"key\":\"properties_push_content\",\"type\":\"string\"},{\"key\":\"properties_push_status\",\"type\":\"string\"}]}",
                        "partition": "pt_mi",
                        "partitionType": "MINUTE",
                        "defaultFS": "hdfs://nameHAservice",
                        "hadoopConfig": {}
                    },
                    "name": "hivewriter"
                }
            }
        ],
        "setting": {
            "restore": {
                "isRestore": false,
                "isStream": false
            },
            "speed": {
                "readerChannel": 1,
                "writerChannel": 1
            }
        }
    }
}

The table and the partition paths are created in HDFS, but the data path is empty; no parquet file exists:

spark-sql> dfs -du -h  /user/hive/warehouse/demo.db/flinkx_test;
0  /user/hive/warehouse/demo.db/flinkx_test/pt_mi=202105251620
0  /user/hive/warehouse/demo.db/flinkx_test/pt_mi=202105251621
0  /user/hive/warehouse/demo.db/flinkx_test/pt_mi=202105251622
spark-sql> 

So why is the data not written to the path?

With the same kafkareader I can write to MySQL, but it does not work with the hivewriter.

any idea would be appreciated!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
kanata163 commented, May 25, 2021

Configure a checkpoint (CK) interval; data is flushed when the checkpoint fires.
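In the job config from the question, that advice would mean running the pipeline as a restorable streaming job so checkpoints actually fire. A minimal sketch of the `setting` block (the `flink.checkpoint.interval` launch key is taken from the FlinkX docs; verify it against your version):

```
"setting": {
    "restore": {
        "isRestore": true,
        "isStream": true
    }
}
```

and start the job with something like `-confProp "{\"flink.checkpoint.interval\":60000}"` so data is flushed roughly every 60 seconds.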

0 reactions
geosmart commented, Feb 15, 2022

Semantic configuration

Flink's OutputFormat interface involves four steps: open, configure, writeRecord, and close. The outputFormat is essentially a RichSinkFunction.

  • With semantic=exactly-once, commits go through two-phase commit (2PC);
  • With semantic=at-least-once, writes are committed directly to the DB;
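The 2PC lifecycle above can be sketched as a minimal state machine (hypothetical class and method names for illustration, not the actual Flink API):

```python
# Sketch of the two-phase-commit pattern an exactly-once sink follows:
# records buffer in an open transaction, a checkpoint pre-commits it,
# and the data only becomes visible once the checkpoint completes.
class TwoPhaseSink:
    def __init__(self):
        self.pending = []      # records in the currently open transaction
        self.committed = []    # records visible after commit

    def write_record(self, rec):
        self.pending.append(rec)

    def pre_commit(self):
        # checkpoint trigger: close the open transaction, start a new one
        txn, self.pending = self.pending, []
        return txn

    def commit(self, txn):
        # checkpoint complete: make the pre-committed data visible
        self.committed.extend(txn)

    def abort(self, txn):
        # checkpoint abort: discard the pre-committed data
        del txn[:]

sink = TwoPhaseSink()
sink.write_record("a")
sink.write_record("b")
txn = sink.pre_commit()
sink.commit(txn)
print(sink.committed)  # ['a', 'b']
```

With at-least-once semantics there is no pre-commit step, so a failure between a flush and the checkpoint can replay records, which is why it may duplicate rows but never holds them back.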

jdbc

  • The sink format initializes the JDBC connection in open; with exactly-once it calls con.setAutoCommit(false)
  • When a checkpoint triggers, it performs the batched write via stmt.addBatch
  • On checkpoint complete, it executes tx.commit
  • On checkpoint abort, it executes tx.rollback
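The JDBC lifecycle above can be illustrated with Python's sqlite3 standing in for a JDBC driver (a sketch of the pattern only, not FlinkX code; table and column names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.isolation_level = None   # manual transaction control, like setAutoCommit(false)
cur = con.cursor()
cur.execute("CREATE TABLE flinkx_test (distinct_id TEXT, event TEXT)")

cur.execute("BEGIN")         # open: start a transaction
cur.executemany(             # checkpoint trigger: batched write, like stmt.addBatch
    "INSERT INTO flinkx_test VALUES (?, ?)",
    [("u1", "click"), ("u2", "view")],
)
cur.execute("COMMIT")        # checkpoint complete: tx.commit
# on checkpoint abort, the same place would execute ROLLBACK instead

print(cur.execute("SELECT COUNT(*) FROM flinkx_test").fetchone()[0])  # 2
```

Until the COMMIT runs, a concurrent reader would see zero rows; the same invisibility applies to the hivewriter below, which is why no data shows up when checkpoints never fire.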

hdfs

  • In writeRecord, the sink function writes records into memory via the parquet/orc writer;
  • When a checkpoint triggers, it flushes the data (writes the writer's buffer into a file under the .data directory) and copies the .data/xx.parquet file into the actual data directory;
  • On checkpoint complete, it deletes .data/xx.parquet;
  • Checkpoint abort does the same as checkpoint complete;
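The staging flow above can be mimicked with ordinary local files (illustrative file and directory names; the real writer produces parquet blocks, not text):

```python
# Sketch of the flush-then-publish pattern: records buffer in memory, a
# checkpoint writes them into a hidden .data staging directory, the file is
# copied into the real partition directory, and the staging copy is removed
# once the checkpoint completes.
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
partition = root / "flinkx_test" / "pt_mi=202105251620"
staging = partition / ".data"
staging.mkdir(parents=True)

buffered = ["row1", "row2"]               # writeRecord: rows held in memory

# checkpoint trigger: flushData -> write the buffer into .data, then publish
staged_file = staging / "0.parquet"       # illustrative name
staged_file.write_text("\n".join(buffered))
shutil.copy(staged_file, partition / staged_file.name)

# checkpoint complete: delete the staging copy
staged_file.unlink()

print(sorted(p.name for p in partition.iterdir()))  # ['.data', '0.parquet']
```

This matches the symptom in the question: with checkpoints disabled, the flush-and-copy step never runs, so the partition directories exist but stay at 0 bytes.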

ck=checkpoint

Read more comments on GitHub >

Top Results From Across the Web

Query from partitioned table returns empty result
MSCK REPAIR TABLE <tablename>;. But the problem is when new partition is added, say new date, we need to run this command again...
Read more >
Does INSERT OVERWRITE create new empty partition if ...
Yes, it will create empty partition even if SELECT returned 0 results. ... and run SELECT part again and check if it returns...
Read more >
empty result on hive table with integer partition keys #2029
Querying this table with presto seems to work fine. Haven't done anything special to the hdfs config except to add the AWS key...
Read more >
Hive Writer - Striim
When a HiveWriter target is deployed on multiple Striim servers, partition the input stream or use an environment variable in table mappings ...
Read more >
External Hive Partitioned table, is empty!! - Cloudera Community
Im trying to create an external hive partitioned table which location ... to get my appending HDFS data in to an external Partition...
Read more >
