
[Hail on Apache Spark] Using pyspark, py4j.protocol.Py4JError


Hi, I’m studying Hail and installing Hail on Spark.

I plan to run a GWAS on the 1000 Genomes data, so I installed and set up Hail on Spark.

<my environment>
Linux: CentOS 7.8
Python: 3.7.3 (Anaconda)
Apache Spark: spark-2.2.0-bin-hadoop2.6
Hadoop: hadoop-2.6.0
java -version (note: I’m using a Linux server run by a Korean institution, so I can’t use root permissions):
openjdk version “1.8.0_262”
OpenJDK Runtime Environment (build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)
Hail version: 0.2.68
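For reference, the same details can be checked from inside Python. A minimal sketch (the commented values are the ones reported above; hail.version() is assumed to be available, as in recent Hail releases):

import subprocess
import sys

import hail
import pyspark

print(sys.version.split()[0])         # 3.7.3
print(pyspark.__version__)            # 2.2.0
print(hail.version())                 # 0.2.68
subprocess.run(["java", "-version"])  # openjdk version "1.8.0_262" (printed to stderr)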

<My workflow>

  1. Run start-master.sh and start-slaves.sh in the Spark sbin directory.
  2. Run pyspark from bash (a sketch of attaching Hail to the resulting session follows below).
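For context, the usual way to attach Hail to a pyspark session started this way is to pass the existing SparkContext to hl.init. A minimal sketch, assuming a compatible Hail/Spark pairing (the smoke test at the end is illustrative):

# Inside the pyspark shell, the SparkContext `sc` already exists;
# Hail should attach to it rather than create its own context.
import hail as hl

# Reuse the standalone cluster started by start-master.sh / start-slaves.sh.
hl.init(sc=sc)

# Smoke test: build a 10-row table and count it.
ht = hl.utils.range_table(10)
print(ht.count())  # expected: 10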

I got the error message below.

[Screenshots: py4j.protocol.Py4JError traceback from pyspark]

How can I set up Hail on Spark? Do I need to change my Java version?

Thank you for your help.

My <.bashrc>, <conf/spark-defaults.conf> and <conf/spark-env.sh> are below.

<.bashrc>

# Hail ($HAIL_HOME must be defined before the SPARK block below uses it)
export HAIL_HOME=/home/edu1/miniconda2/envs/Hail-on-spark/lib/python3.7/site-packages/hail
export PATH=$PATH:$HAIL_HOME/bin
export PYTHONPATH=$PYTHONPATH:$HAIL_HOME/python
export SPARK_CLASSPATH=$HAIL_HOME/backend/hail-all-spark.jar

# SPARK
export SPARK_HOME=/home/edu1/tools/spark-2.2.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/python
export PYTHONPATH=$SPARK_HOME/python:$(echo ${SPARK_HOME}/python/lib/py4j-*-src.zip):$PYTHONPATH

# JAVA (I can only modify .bashrc, so this may not change the system Java path.)
export JAVA_HOME=/home/edu1/tools/jdk-1.8.0_231
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib/tools.jar

# Hadoop
export HADOOP_INSTALL=/home/edu1/tools/hadoop-2.6.0
export PATH=$PATH:$HADOOP_INSTALL/bin
export LD_LIBRARY_PATH=$HADOOP_INSTALL/lib/native
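As a quick sanity check that these PYTHONPATH entries resolve to the intended installations, you can print where Python actually loads each package from. A minimal sketch (run inside pyspark or plain python):

# Confirm which installation of each package the PYTHONPATH above picks up.
import hail
import py4j
import pyspark

for mod in (hail, pyspark, py4j):
    print(mod.__name__, "->", mod.__file__)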

<conf/spark-defaults.conf>

spark.master                     spark://training.server:7077

spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator           is.hail.kryo.HailKryoRegistrator
spark.speculation                true

spark.driver.memory              37414m
spark.executor.memory            37414m
spark.executor.instances         1

spark.driver.extraClassPath      /home/edu1/miniconda2/envs/Hail-on-spark/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar
spark.executor.extraClassPath    /home/edu1/miniconda2/envs/Hail-on-spark/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar
spark.jars                       /home/edu1/miniconda2/envs/Hail-on-spark/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar

spark.eventLog.enabled           true
spark.history.fs.logDirectory    file:/tmp/spark-events
spark.eventLog.dir               file:/tmp/spark-events

spark.ui.reverseProxy            true
spark.ui.reverseProxyUrl         spark://training.server/spark
spark.executor.extraJavaOptions  -Dlog4j.debug=true
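To confirm which of these settings a running session actually picked up (Spark silently ignores misspelled keys), you can read the effective configuration back from the SparkContext. A minimal sketch, run inside pyspark:

# Dump the effective Spark configuration; a key that was misspelled in
# spark-defaults.conf simply will not appear in this output.
for key, value in sorted(sc.getConf().getAll()):
    if key.startswith(("spark.driver", "spark.executor", "spark.eventLog",
                       "spark.serializer", "spark.jars")):
        print(key, "=", value)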

<conf/spark-env.sh>

export SPARK_WORKER_INSTANCES=1

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
tpoterba commented, Jul 6, 2021

This is a Scala version mismatch: this version of Hail was compiled with Scala 2.12, but your Spark version uses Scala 2.11. If you use Spark 3.1 with Scala 2.12, it should work fine.
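For context, one way to confirm which Scala version a Spark build ships with is to query the JVM from inside pyspark through the py4j gateway (sc._jvm is an internal but long-standing pyspark attribute; the output string shown is illustrative):

# Ask the JVM which Scala version Spark was built with. Hail 0.2.68
# expects Scala 2.12 (i.e. Spark 3.x); spark-2.2.0 ships Scala 2.11.
scala_version = sc._jvm.scala.util.Properties.versionString()
print(scala_version)  # e.g. "version 2.11.8" for spark-2.2.0-bin-hadoop2.6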

0 reactions
johnc1231 commented, Aug 16, 2021

I see you posted on Discuss. Let's keep this issue closed; debugging can happen there.
