
[SUPPORT] Transform from kafka complains about table not found when using transformer.sql

See original GitHub issue

Dear experts, we are trying to read data from a Kafka topic and write it with Hudi while applying a transformation. Note that our Hudi command worked before without the transformation, and the cluster we run on is well configured and otherwise working fine.

**Command used" the relevant part of the spark-submit command is:

spark-submit \
--jars abfs://somelocation/hudi-utilities-bundle_2.11-0.6.0.jar \
--props /somepath/mypropfile.properties \
--target-base-path abfs://somepath/somedb/unmanaged/sometable \
--table-type COPY_ON_WRITE \
--transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer \
--hoodie-conf hoodie.deltastreamer.transformer.sql='select fielda, fieldb from <SRC>'
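
For context on how the pieces fit together: DeltaStreamer learns about the Kafka source from the --props file, not from the SQL itself. Below is a hedged sketch of what a mypropfile.properties for a Kafka source typically contains; the key names come from the Hudi utilities configs, but the topic, brokers, and field names are placeholders, not values taken from this issue:

# Illustrative sketch only; adjust to your environment.
hoodie.deltastreamer.source.kafka.topic=sometopic
bootstrap.servers=broker1:9092
auto.offset.reset=earliest
# Record key and partition path the Hudi writer needs:
hoodie.datasource.write.recordkey.field=fielda
hoodie.datasource.write.partitionpath.field=fieldb
# The transformer SQL can also live here instead of --hoodie-conf;
# the <SRC> token is written literally:
hoodie.deltastreamer.transformer.sql=SELECT fielda, fieldb FROM <SRC>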

Stacktrace: we keep getting the following error, which complains about the table provided in target-base-path:

ERROR yarn.Client: Application diagnostics message: User class threw exception: org.apache.hudi.exception.TableNotFoundException: Hoodie table not found in path abfs://somepath/somedb/unmanaged/sometable/.hoodie

We have defined an external table “sometable” pointing at the location above (abfs://somepath/somedb/unmanaged/sometable/); the location is empty so that Hudi can write the data there.

Questions I have:

1) In the “hoodie.deltastreamer.transformer.sql” statement, the SELECT … FROM <SRC>: what table do we choose here? Do we really write <SRC> literally in the query? Does DeltaStreamer then know, from the props file we provided, that it needs to fetch the data from a Kafka topic?

2) Is our approach correct? We have a working Kafka topic and an empty external table whose location we pass in target-base-path; Hudi should read the data from the Kafka topic and then write it to the external table's location.

Environment Description

  • Hudi version : 0.6.0
  • Spark version : 2.4.5
  • Hive version : Hive 3.1 on Tez
  • Hadoop version : 3.1.1
  • Storage (HDFS/S3/GCS…) : ABFS (Azure Blob Storage)
  • Running on Docker? (yes/no) : no

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 8 (4 by maintainers)

Top GitHub Comments

1 reaction
xushiyan commented, Nov 17, 2021

@JB-data I have not used it myself, but from the code and the example here, you write <SRC> literally and it is substituted at runtime (a rough sketch follows the references below). See:

  • org.apache.hudi.utilities.transform.TestSqlQueryBasedTransformer
  • org.apache.hudi.utilities.transform.SqlFileBasedTransformer#SRC_PATTERN
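
To make the substitution concrete, here is a rough Java sketch of what a SQL-based transformer conceptually does, based on the classes referenced above. This is illustrative, not the exact Hudi implementation; the temp-view naming in particular is made up:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Conceptual sketch: register the incoming batch as a temp view, then replace
// the literal <SRC> token with that view's name before running the SQL.
public class SqlTransformerSketch {
  static Dataset<Row> transform(SparkSession spark, Dataset<Row> sourceBatch, String transformerSql) {
    String tmpView = "HOODIE_SRC_TMP_" + System.currentTimeMillis(); // illustrative naming
    sourceBatch.createOrReplaceTempView(tmpView);
    return spark.sql(transformerSql.replace("<SRC>", tmpView));
  }
}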
1 reaction
xushiyan commented, Nov 13, 2021

This is similar to https://github.com/apache/hudi/issues/3906

@JB-data can you try not manually creating the path before the job starts? DeltaStreamer is supposed to initialize a hoodie table for you. Also, please consider upgrading to 0.8 or later, since 0.6 is a bit too old.
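
If you want to verify what the exception is actually complaining about: TableNotFoundException means the .hoodie metadata folder is missing under the base path, and DeltaStreamer creates it on first run when allowed to initialize the table itself. A minimal check using the Hadoop FileSystem API, assuming your cluster already has abfs configured; the base path below is the placeholder from the issue, not a real location:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HoodieMetaCheck {
  public static void main(String[] args) throws Exception {
    // Placeholder base path from the issue; replace with your real table location.
    Path metaPath = new Path("abfs://somepath/somedb/unmanaged/sometable/.hoodie");
    FileSystem fs = metaPath.getFileSystem(new Configuration());
    // TableNotFoundException is thrown when this folder is absent.
    System.out.println(".hoodie exists: " + fs.exists(metaPath));
  }
}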
