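FAQ 1. When developers build their own Waterdrop plugins, do they need to understand the Waterdrop code, and do they need to put their code into the Waterdrop project?

A plugin you develop can be completely independent of the Waterdrop project; you do not need to put your plugin code into the Waterdrop project. A plugin can be a fully standalone project in which you are free to use Java, Scala, Maven, sbt, Gradle, or whatever you prefer. This is also the approach we recommend for plugin development.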


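FAQ 2. Running Waterdrop in cluster mode reports that plugins.tar.gz cannot be found

Before submitting in cluster mode, run the following command first:

# Note: the next release, v1.2.3, is expected to package the plugins directory automatically, so this command will no longer be needed.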
tar zcvf plugins.tar.gz plugins

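After packaging the plugins directory, run the following (you do not need to repackage afterwards unless you add or remove plugins in the plugins directory):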

./bin/start-waterdrop.sh --master yarn --deploy-mode cluster --config ./config/first.conf
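
The repackaging rule above (rebuild the archive only when plugins were added or removed) can be sketched as a small shell helper. This is a minimal sketch, not part of Waterdrop: the function name package_plugins is hypothetical, and it assumes a find implementation that supports -newer and -quit (GNU or BSD find).

```shell
# Hypothetical helper: rebuild plugins.tar.gz only when needed.
# Arguments (both optional): plugins directory, archive name.
package_plugins() {
  local plugins_dir="${1:-plugins}"
  local archive="${2:-plugins.tar.gz}"

  # Rebuild when the archive is missing, or when any file or directory
  # under the plugins directory was modified after the archive was written.
  # Adding or deleting a plugin updates the mtime of its parent directory,
  # so both cases are detected.
  if [ ! -f "$archive" ] || [ -n "$(find "$plugins_dir" -newer "$archive" -print -quit)" ]; then
    tar zcf "$archive" "$plugins_dir"
    echo "repackaged $archive"
  else
    echo "$archive is up to date"
  fi
}
```

Calling package_plugins from the Waterdrop home directory right before the cluster-mode start command keeps the archive in sync without rebuilding it on every submit.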

If you need further help, add garyelephant on WeChat.


FAQ 3. Waterdrop reports the following error after startup:

ANTLR Runtime version 4.7 used for parser compilation does not match the current runtime version 4.5.3
ANTLR Runtime version 4.7 used for parser compilation does not match the current runtime version 4.5.3

This is a JAR dependency conflict. Try downloading the latest version, which should resolve it:

https://github.com/InterestingLab/waterdrop/releases/download/v1.2.3/waterdrop-1.2.3.zip


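FAQ 4. I want to study the Waterdrop source code. Where should I start?

Waterdrop's code is fully abstracted and well structured, and many people already use the Waterdrop source code as a way to learn Spark. You can start reading from the main program entry point: Waterdrop.scala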


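FAQ 5. Does Waterdrop support dynamic variable substitution? For example, I want to replace the where condition of a SQL statement in a scheduled job.

Yes, this is fully supported. For a concrete configuration, see the configuration example that uses ${varname} for variable substitution.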


FAQ 6. Can Waterdrop run inside job scheduling frameworks such as Azkaban and Oozie?

Yes, it can. See the screenshot below:


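FAQ 7. I ran into a problem using Waterdrop and cannot solve it myself. What should I do?

Go to the project home page, find the project maintainer's WeChat ID, and add them on WeChat.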


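FAQ 8. How do I declare variables in the Waterdrop configuration and then set their values dynamically at run time?

Starting with v1.2.4, Waterdrop supports variables in the configuration. This feature is commonly used in scheduled or ad hoc offline processing to substitute values such as times and dates. Usage is as follows:

In the configuration, declare the variable names, for example: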

...

filter {
  sql {
    table_name = "user_view"
    sql = "select * from user_view where city ='"${city}"' and dt = '"${date}"'"
  }
}

...

The sql filter is only an example here. In fact, the value of any key = value pair anywhere in the configuration file can use variable substitution.

For a detailed configuration example, see variable substitution.

The start commands are as follows:

# local mode
./bin/start-waterdrop.sh -c ./config/your_app.conf -e client -m local[2] -i city=shanghai -i date=20190319

# yarn client mode
./bin/start-waterdrop.sh -c ./config/your_app.conf -e client -m yarn -i city=shanghai -i date=20190319

# yarn cluster mode
./bin/start-waterdrop.sh -c ./config/your_app.conf -e cluster -m yarn -i city=shanghai -i date=20190319

# mesos and spark standalone are started the same way.

Use -i or --variable followed by key=value to set the value of a variable; the key must match the variable name used in the configuration.
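
For scheduled jobs, the -i arguments are usually assembled by a wrapper script. The following is a minimal sketch of that idea: build_waterdrop_cmd is a hypothetical helper, not something shipped with Waterdrop, and it only uses the flags shown in the commands above (-c, -e, -m, -i). It builds the yarn client command line and rejects any argument that is not in key=value form.

```shell
# Hypothetical helper: print a yarn client start command with one -i per variable.
# Usage: build_waterdrop_cmd <config file> key=value [key=value ...]
build_waterdrop_cmd() {
  local conf="$1"
  shift
  local cmd="./bin/start-waterdrop.sh -c $conf -e client -m yarn"
  local kv
  for kv in "$@"; do
    case "$kv" in
      # Each key must match a ${key} placeholder in the config file.
      *=*) cmd="$cmd -i $kv" ;;
      *)
        echo "invalid variable (expected key=value): $kv" >&2
        return 1
        ;;
    esac
  done
  echo "$cmd"
}
```

For example, build_waterdrop_cmd ./config/your_app.conf city=shanghai date=20190319 prints the yarn client command shown above. A scheduler task could compute the date value first and pass it in the same way.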


FAQ 9. How do I resolve OOM errors when Waterdrop consumes from Kafka?

In most cases, the OOM is caused by consuming without a rate limit. For the solution, see:

https://www.processon.com/view/link/5c9862ece4b0c996d36fe7d7


RickyHuo commented, Jun 6, 2019

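FAQ 10. Exception in thread "main" java.lang.NoSuchFieldError: INSTANCE

The cause is that the httpclient.jar bundled with the CDH distribution of Spark is an older version, while the ClickHouse JDBC driver depends on httpclient 4.5.2, so the two versions conflict. The fix is to replace the JAR bundled with CDH with httpclient-4.5.2.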
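
Before replacing anything, it helps to see which copies of the library are actually present. The sketch below lists every JAR for a given artifact name under the directories you pass in. The function name list_matching_jars is hypothetical, and the directories to search (for example the CDH jars directory and the Waterdrop lib directory) depend on your installation, so they are left as arguments.

```shell
# Hypothetical helper: list JARs named <artifact>-<version>.jar under the given directories.
# Usage: list_matching_jars <artifact> <dir> [<dir> ...]
list_matching_jars() {
  local name="$1"
  shift
  # The trailing "-" in the pattern keeps httpclient from also matching
  # artifacts such as httpcore; unreadable directories are skipped quietly.
  find "$@" -type f -name "${name}-*.jar" 2>/dev/null | sort
}
```

If the output of list_matching_jars httpclient followed by your directories shows more than one version, the older copy is the candidate to replace.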

FAQ 11. The default JDK on my Spark cluster is Java 7. After installing Java 8, how do I make Waterdrop start with Java 8?

In the Waterdrop config file, set the following:

spark {
 ...
 spark.executorEnv.JAVA_HOME="/your/java_8_home/directory"
 spark.yarn.appMasterEnv.JAVA_HOME="/your/java_8_home/directory"
 ...
}

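FAQ 11. How do I specify a different JDK version for Waterdrop on Yarn?

For example, suppose you want to use JDK 8.

There are two cases:

  • JDK 8 is already deployed on the Yarn cluster but is not the default JDK. In this case, you only need to add two settings.

In the Waterdrop config file, set the following: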

spark {
 ...
 spark.executorEnv.JAVA_HOME="/your/java_8_home/directory"
 spark.yarn.appMasterEnv.JAVA_HOME="/your/java_8_home/directory"
 ...
}

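FAQ 12. Can Waterdrop be configured with multiple data sources, for example both es and hdfs in input? Is there an example of such a multi-source setup?

A multi-source example: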

spark {
	...
}

input {
  hdfs { ... }	
  elasticsearch { ... }
  mysql {...}
}

filter {
	sql {
	 sql = """
	 	select .... from hdfs_table 
	 	join es_table 
	 	on hdfs_table.uid = es_table.uid where ..."""
	}
}

output {
	elasticsearch { ... }
}

A configuration like this enables processing data from multiple sources.


garyelephant commented, Nov 20, 2021

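FAQ 19. Waterdrop-v1 (Spark) has three ways to set logging parameters (such as the log level)

  • [Not recommended] Modify the default $SPARK_HOME/conf/log4j.properties

    • This affects the logging configuration of every program submitted through this $SPARK_HOME/bin/spark-submit
  • [Not recommended] Modify the logging parameters directly in Waterdrop's Spark code

    • This effectively hard-codes them, so every change requires recompiling
  • [Recommended] Change the logging configuration as follows (takes effect only in Waterdrop >= 1.5.5). In the Waterdrop configuration file: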

spark {
    spark.driver.extraJavaOptions = "-Dlog4j.configuration=file:<file path>/log4j.properties"
    spark.executor.extraJavaOptions = "-Dlog4j.configuration=file:<file path>/log4j.properties"
}
input {
  ...
}
filter {
 ...
}
output {
  ...
}

A reference log4j.properties file looks like this:

$ cat log4j.properties
log4j.rootLogger=ERROR, console

# set the log level for these components
log4j.logger.org=ERROR
log4j.logger.org.apache.spark=ERROR
log4j.logger.org.spark-project=ERROR
log4j.logger.org.apache.hadoop=ERROR
log4j.logger.io.netty=ERROR
log4j.logger.org.apache.zookeeper=ERROR

# add a ConsoleAppender to the logger stdout to write to the console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
# use a simple message format
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

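Question: how do I configure logging in Waterdrop-v2 (Spark, Flink)?

This cannot currently be set directly. Users need to modify the Waterdrop start script and pass the relevant parameters in the spark-submit or Flink job submission command. For the specific parameters, see the references below.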


Reference:

https://stackoverflow.com/questions/27781187/how-to-stop-info-messages-displaying-on-spark-console

http://spark.apache.org/docs/latest/configuration.html#configuring-logging

https://medium.com/@iacomini.riccardo/spark-logging-configuration-in-yarn-faf5ba5fdb01
