[Bug] [Module Name] Kudu2ClickHouse Timestamp type error
Search before asking
- I had searched in the issues and found no similar issues.
What happened
When I extract data from Kudu 1.10.0 to ClickHouse 21.x, the Kudu timestamp column fails with:
java.lang.ClassCastException: java.sql.Timestamp cannot be cast to java.lang.String at io.github.interestinglab.waterdrop.output.batch.Clickhouse.renderBaseTypeStatement(Clickhouse.scala:351)
The Kudu timestamp column cannot be cast to the ClickHouse column type, so the executor aborts the job ("job aborted").
Kudu Table
+-----------------+-----------+---------+-------------+----------+---------------+---------------+---------------------+------------+
| name            | type      | comment | primary_key | nullable | default_value | encoding      | compression         | block_size |
+-----------------+-----------+---------+-------------+----------+---------------+---------------+---------------------+------------+
| cust_no         | string    |         | true        | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
| tag_code        | string    |         | true        | false    |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
| update_datetime | timestamp |         | false       | true     |               | AUTO_ENCODING | DEFAULT_COMPRESSION | 0          |
+-----------------+-----------+---------+-------------+----------+---------------+---------------+---------------------+------------+
CH Table
CREATE TABLE test.k_tag_sb
(
    `cust_no` String,
    `tag_code` String,
    `update_datetime` Date
)
ENGINE = MergeTree
ORDER BY cust_no;
SeaTunnel Version
1.5.5
SeaTunnel Config
# File: /opt/seatunnel-1.5.5/config/kudu2ch.batch.all.conf
spark {
  spark.app.name = "kudu2ch"
  spark.executor.instances = 2
  spark.executor.cores = 1
  spark.executor.memory = "1g"
}
input {
  kudu {
    kudu_master = "newcdh01:7051,newcdh02:7051,newcdh04:7051"
    kudu_table = "impala::ukudu.k_tag_sb"
    result_table_name = "kudu_k_tab_sb_source"
  }
}
filter {
}
output {
  clickhouse {
    source_table_name = "kudu_k_tab_sb_source"
    host = "newcdh04:8123"
    clickhouse.socket_timeout = 50000
    database = "test"
    table = "k_tag_sb1"
    # fields = ["cust_no","tag_code","update_datetime"]
    username = "default"
    password = "admin"
    bulk_size = 20000
  }
}
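As a workaround until the connector handles java.sql.Timestamp itself, the empty filter block above can cast the timestamp column to a string before the clickhouse output consumes it. A minimal sketch, assuming the sql filter plugin of Waterdrop/SeaTunnel 1.x (option names such as table_name should be checked against the 1.5.5 docs):

filter {
  sql {
    table_name = "kudu_k_tab_sb_source"
    sql = "select cust_no, tag_code, date_format(update_datetime, 'yyyy-MM-dd') as update_datetime from kudu_k_tab_sb_source"
  }
}

Because the target column is a ClickHouse Date, the value is formatted as yyyy-MM-dd; ClickHouse parses such strings on insert.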
Running Command
/opt/seatunnel-1.5.5/bin/start-seatunnel.sh --master local[3] --deploy-mode client --config /opt/seatunnel-1.5.5/config/kudu2ch.batch.all.conf
Error Exception
2021-12-22 15:23:47 ERROR TaskSetManager:70 - Task 2 in stage 0.0 failed 1 times; aborting job
Exception in thread "main" java.lang.Exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 1 times, most recent failure: Lost task 2.0 in stage 0.0 (TID 2, localhost, executor driver): java.lang.ClassCastException: java.sql.Timestamp cannot be cast to java.lang.String
at io.github.interestinglab.waterdrop.output.batch.Clickhouse.renderBaseTypeStatement(Clickhouse.scala:351)
at io.github.interestinglab.waterdrop.output.batch.Clickhouse.io$github$interestinglab$waterdrop$output$batch$Clickhouse$$renderStatementEntry(Clickhouse.scala:373)
at io.github.interestinglab.waterdrop.output.batch.Clickhouse$$anonfun$io$github$interestinglab$waterdrop$output$batch$Clickhouse$$renderStatement$1.apply$mcVI$sp(Clickhouse.scala:403)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
at io.github.interestinglab.waterdrop.output.batch.Clickhouse.io$github$interestinglab$waterdrop$output$batch$Clickhouse$$renderStatement(Clickhouse.scala:391)
at io.github.interestinglab.waterdrop.output.batch.Clickhouse$$anonfun$process$2.apply(Clickhouse.scala:187)
at io.github.interestinglab.waterdrop.output.batch.Clickhouse$$anonfun$process$2.apply(Clickhouse.scala:162)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
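The trace shows renderBaseTypeStatement assuming every value destined for a string-rendered column is already a java.lang.String. A possible direction for a fix (a hypothetical sketch, not the actual Clickhouse.scala code) is to pattern-match on the runtime type and format timestamps instead of casting:

// Hypothetical sketch of value rendering that tolerates java.sql.Timestamp.
// ClickHouse accepts 'yyyy-MM-dd HH:mm:ss' strings for Date/DateTime columns.
private def renderValue(value: Any): String = value match {
  case ts: java.sql.Timestamp =>
    new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(ts)
  case s: String => s
  case other => String.valueOf(other)
}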
Flink or Spark Version
spark-2.3.2-bin-hadoop2.6
Java or Scala Version
java 1.8
Screenshots
No response
Are you willing to submit PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project’s Code of Conduct
Top GitHub Comments
I also tried this:
But it still doesn’t work.
Is it possible to convert the timestamp types internally to string datatypes? I have 100 columns, and casting them to string is a very tedious process. Any suggestions?
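For the 100-column case, the casts do not have to be written by hand: a small Spark transformation can rewrite every TimestampType column in one pass before the data reaches the ClickHouse writer. A sketch in plain Spark Scala (castTimestampsToString is a helper name invented here, not a SeaTunnel API):

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, date_format}
import org.apache.spark.sql.types.TimestampType

// Rewrite every TimestampType column as a formatted string so the
// ClickHouse output plugin only ever sees java.lang.String values.
def castTimestampsToString(df: DataFrame): DataFrame =
  df.schema.fields.foldLeft(df) { (acc, f) =>
    if (f.dataType == TimestampType)
      acc.withColumn(f.name, date_format(col(f.name), "yyyy-MM-dd HH:mm:ss"))
    else acc
  }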