
[Hail on Apache Spark] Using pyspark, py4j.protocol.Py4JError


Hi, I’m studying Hail and installing Hail on Spark.

I plan to run a GWAS on the 1000 Genomes data, so I installed and set up Hail on Spark.

<my environment>
Linux: CentOS 7.8
Python: 3.7.3 (Anaconda)
Apache Spark: spark-2.2.0-bin-hadoop2.6
Hadoop: hadoop-2.6.0
java -version (note: I’m using a Linux server run by a Korean institution, so I can’t use root permissions):
openjdk version “1.8.0_262”
OpenJDK Runtime Environment (build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)
Hail version: 0.2.68
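For reference, the same details can be checked from inside Python. A minimal sketch (the commented values are the ones reported above; hail.version() is assumed to be available, as in recent Hail releases):

import subprocess
import sys

import hail
import pyspark

print(sys.version.split()[0])         # 3.7.3
print(pyspark.__version__)            # 2.2.0
print(hail.version())                 # 0.2.68
subprocess.run(["java", "-version"])  # openjdk version "1.8.0_262" (printed to stderr)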

<My workflow>

  1. Run start-master.sh and start-slaves.sh in the Spark sbin directory.
  2. Run pyspark from bash (a sketch of attaching Hail to the resulting session follows below).
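For context, the usual way to attach Hail to a pyspark session started this way is to pass the existing SparkContext to hl.init. A minimal sketch, assuming a compatible Hail/Spark pairing (the smoke test at the end is illustrative):

# Inside the pyspark shell, the SparkContext `sc` already exists;
# Hail should attach to it rather than create its own context.
import hail as hl

# Reuse the standalone cluster started by start-master.sh / start-slaves.sh.
hl.init(sc=sc)

# Smoke test: build a 10-row table and count it.
ht = hl.utils.range_table(10)
print(ht.count())  # expected: 10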

I got the error message below.

[Screenshots: py4j.protocol.Py4JError traceback from pyspark]

How can I set up Hail on Spark? Do I need to change my Java version?

Thank you for your help.

My <.bashrc>, <conf/spark-defaults.conf> and <conf/spark-env.sh> are below.

<.bashrc>

# Hail ($HAIL_HOME must be defined before the SPARK block below uses it)
export HAIL_HOME=/home/edu1/miniconda2/envs/Hail-on-spark/lib/python3.7/site-packages/hail
export PATH=$PATH:$HAIL_HOME/bin
export PYTHONPATH=$PYTHONPATH:$HAIL_HOME/python
export SPARK_CLASSPATH=$HAIL_HOME/backend/hail-all-spark.jar

# SPARK
export SPARK_HOME=/home/edu1/tools/spark-2.2.0-bin-hadoop2.6
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/python
export PYTHONPATH=$SPARK_HOME/python:$(echo ${SPARK_HOME}/python/lib/py4j-*-src.zip):$PYTHONPATH

# JAVA (I can only modify .bashrc, so this may not change the system Java path.)
export JAVA_HOME=/home/edu1/tools/jdk-1.8.0_231
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=$JAVA_HOME/lib/tools.jar

# Hadoop
export HADOOP_INSTALL=/home/edu1/tools/hadoop-2.6.0
export PATH=$PATH:$HADOOP_INSTALL/bin
export LD_LIBRARY_PATH=$HADOOP_INSTALL/lib/native
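As a quick sanity check that these PYTHONPATH entries resolve to the intended installations, you can print where Python actually loads each package from. A minimal sketch (run inside pyspark or plain python):

# Confirm which installation of each package the PYTHONPATH above picks up.
import hail
import py4j
import pyspark

for mod in (hail, pyspark, py4j):
    print(mod.__name__, "->", mod.__file__)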

<conf/spark-defaults.conf>

spark.master                     spark://training.server:7077

spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator           is.hail.kryo.HailKryoRegistrator
spark.speculation                true

spark.driver.memory              37414m
spark.executor.memory            37414m
spark.executor.instances         1

spark.driver.extraClassPath      /home/edu1/miniconda2/envs/Hail-on-spark/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar
spark.executor.extraClassPath    /home/edu1/miniconda2/envs/Hail-on-spark/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar
spark.jars                       /home/edu1/miniconda2/envs/Hail-on-spark/lib/python3.7/site-packages/hail/backend/hail-all-spark.jar

spark.eventLog.enabled           true
spark.history.fs.logDirectory    file:/tmp/spark-events
spark.eventLog.dir               file:/tmp/spark-events

spark.ui.reverseProxy            true
spark.ui.reverseProxyUrl         spark://training.server/spark
spark.executor.extraJavaOptions  -Dlog4j.debug=true
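To confirm which of these settings a running session actually picked up (Spark silently ignores misspelled keys), you can read the effective configuration back from the SparkContext. A minimal sketch, run inside pyspark:

# Dump the effective Spark configuration; a key that was misspelled in
# spark-defaults.conf simply will not appear in this output.
for key, value in sorted(sc.getConf().getAll()):
    if key.startswith(("spark.driver", "spark.executor", "spark.eventLog",
                       "spark.serializer", "spark.jars")):
        print(key, "=", value)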

<conf/spark-env.sh>

export SPARK_WORKER_INSTANCES=1

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
tpoterba commented, Jul 6, 2021

This is a Scala version mismatch: this version of Hail was compiled with Scala 2.12, but your Spark version uses Scala 2.11. If you use Spark 3.1 with Scala 2.12, it should work fine.
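For context, one way to confirm which Scala version a Spark build ships with is to query the JVM from inside pyspark through the py4j gateway (sc._jvm is an internal but long-standing pyspark attribute; the output string shown is illustrative):

# Ask the JVM which Scala version Spark was built with. Hail 0.2.68
# expects Scala 2.12 (i.e. Spark 3.x); spark-2.2.0 ships Scala 2.11.
scala_version = sc._jvm.scala.util.Properties.versionString()
print(scala_version)  # e.g. "version 2.11.8" for spark-2.2.0-bin-hadoop2.6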

0 reactions
johnc1231 commented, Aug 16, 2021

I see you posted on Discuss. Let's keep this issue closed; debugging can happen there.
