Spark UI not accessible
Hi everybody,
As @ryanlovett asked, I am opening this issue here; it is related to jupyterhub/zero-to-jupyterhub-k8s#1030. The problem is as follows:
After starting PySpark I am not able to access the Spark UI; every attempt ends in a JupyterHub 404 error. Here is (hopefully) all the relevant information:
- I build a new user image from the jupyter/pyspark-notebook image.
- The Dockerfile for this image contains:
```dockerfile
FROM jupyter/pyspark-notebook:5b2160dfd919

RUN pip install nbserverproxy
RUN jupyter serverextension enable --py nbserverproxy

USER root
RUN echo "$NB_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/notebook
USER $NB_USER
```
- I create the `SparkContext()` in a pod spawned from this image, which gives me the link to the UI.
- The `SparkContext()` is created with the following config:
```python
conf.setMaster('k8s://https://' + os.environ['KUBERNETES_SERVICE_HOST'] + ':443')
conf.set('spark.kubernetes.container.image', 'idalab/spark-py:spark')
conf.set('spark.submit.deployMode', 'client')
conf.set('spark.executor.instances', '2')
conf.setAppName('pyspark-shell')
conf.set('spark.driver.host', '10.16.205.42')
os.environ['PYSPARK_PYTHON'] = 'python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3'
```
- The link created by Spark is obviously not accessible on the hub, as it points to `<POD_IP>:4040`.
- I try to access the UI via `.../username/proxy/4040` and `.../username/proxy/4040/`; both don't work and lead to a JupyterHub 404.
- Other ports are accessible via this method, so I assume nbserverproxy is working correctly.
- This is the output of `netstat -pl`:
```text
Proto Recv-Q Send-Q Local Address        Foreign Address State  PID/Program name
tcp        0      0 localhost:54695      0.0.0.0:*       LISTEN 23/python
tcp        0      0 localhost:33896      0.0.0.0:*       LISTEN 23/python
tcp        0      0 localhost:34577      0.0.0.0:*       LISTEN 23/python
tcp        0      0 localhost:52211      0.0.0.0:*       LISTEN 23/python
tcp        0      0 0.0.0.0:8888         0.0.0.0:*       LISTEN 7/python
tcp        0      0 localhost:39388      0.0.0.0:*       LISTEN 23/python
tcp        0      0 localhost:39971      0.0.0.0:*       LISTEN 23/python
tcp        0      0 localhost:32867      0.0.0.0:*       LISTEN 23/python
tcp6       0      0 jupyter-hagen:43878  [::]:*          LISTEN 45/java
tcp6       0      0 [::]:4040            [::]:*          LISTEN 45/java
tcp6       0      0 localhost:32816      [::]:*          LISTEN 45/java
tcp6       0      0 jupyter-hagen:41793  [::]:*          LISTEN 45/java
```
One can see that the Java processes appear in a different format because they listen on tcp6.
- To check whether this is the error, I set the environment variable `_JAVA_OPTIONS` to `"-Djava.net.preferIPv4Stack=true"`.
- This results in the following output, but does not resolve the problem:
```text
Proto Recv-Q Send-Q Local Address        Foreign Address State  PID/Program name
tcp        0      0 localhost:54695      0.0.0.0:*       LISTEN 456/python
tcp        0      0 0.0.0.0:4040         0.0.0.0:*       LISTEN 475/java
tcp        0      0 localhost:33896      0.0.0.0:*       LISTEN 456/python
tcp        0      0 localhost:34990      0.0.0.0:*       LISTEN 475/java
tcp        0      0 localhost:36079      0.0.0.0:*       LISTEN 456/python
tcp        0      0 jupyter-hagen:35119  0.0.0.0:*       LISTEN 475/java
tcp        0      0 localhost:34577      0.0.0.0:*       LISTEN 456/python
tcp        0      0 jupyter-hagen:42195  0.0.0.0:*       LISTEN 475/java
tcp        0      0 localhost:34836      0.0.0.0:*       LISTEN 456/python
tcp        0      0 0.0.0.0:8888         0.0.0.0:*       LISTEN 7/python
tcp        0      0 localhost:39971      0.0.0.0:*       LISTEN 456/python
tcp        0      0 localhost:32867      0.0.0.0:*       LISTEN 456/python
```
- I checked whether the UI is generally accessible by running a local version of the user image on my PC and forwarding the port. That works fine!
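This kind of reachability check can also be scripted from a notebook cell instead of reading `netstat` output; a minimal sketch using only the standard library (the host and port values below are examples):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections and timeouts
        return False

# The UI is only proxyable if it listens on an interface the
# notebook server can actually reach, e.g.:
#   port_open("127.0.0.1", 4040)      # loopback inside the pod
#   port_open("10.16.205.42", 4040)   # the pod IP (example address)
```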
My user image is available on Docker Hub at `idalab/spark-user:1.0.2`, so it should be easy to pull for debugging if necessary.
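For what it's worth, Spark has built-in reverse-proxy settings that are commonly used to make the UI's generated links work behind a path-rewriting proxy such as nbserverproxy. A sketch of how they could be added to the config above; the `/user/hagen/proxy/4040` prefix is an assumption about the JupyterHub URL layout, not something verified here:

```python
# Assumed sketch: make Spark generate UI links relative to the
# jupyter-server-proxy path instead of <POD_IP>:4040.
conf.set('spark.ui.proxyBase', '/user/hagen/proxy/4040')

# Alternatively, Spark's reverse-proxy mode (hub host is a placeholder):
# conf.set('spark.ui.reverseProxy', 'true')
# conf.set('spark.ui.reverseProxyUrl', 'https://<hub-host>/user/hagen/proxy/4040')
```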
Thank you for your help.
Issue Analytics
- State:
- Created: 5 years ago
- Comments: 37 (1 by maintainers)
Top GitHub Comments
Thanks for the documentation. I did as described above and it all works fine except the Executors tab in the Spark UI. It seems that the proxy replaces the `[app-id]` with the port instead of the actual app-id.
From https://spark.apache.org/docs/latest/monitoring.html: `/applications/[app-id]/allexecutors` returns a list of all (active and dead) executors for the given application.
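One way to see what the correct app-id is (and to check that the REST API behind the Executors tab responds at all) is to query Spark's documented monitoring endpoint directly; a small sketch, assuming the driver UI listens on `localhost:4040` inside the pod:

```python
import json

def first_app_id(payload: str) -> str:
    """Extract the id of the first application from the JSON returned
    by Spark's /api/v1/applications monitoring endpoint."""
    apps = json.loads(payload)
    return apps[0]["id"]

# Against a live driver (assuming the UI is reachable on localhost:4040):
# from urllib.request import urlopen
# raw = urlopen("http://localhost:4040/api/v1/applications").read().decode()
# print(first_app_id(raw))
```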
The problem that the `allexecutors` endpoint returns 404 can be fixed by modifying `core/src/main/resources/org/apache/spark/ui/static/utils.js`. For example, our hub URL includes `jupyter` in the URL. But basically this problem could be fixed simply if the jupyter-server-proxy extension supported modifying the `proxy/` infix of the URL, since Spark's JavaScript functions in the UI try to handle the `proxy` string in the URL, as you can see in the code above.
Is it possible to modify the `proxy` infix in the URL for the jupyter-server-proxy extension (e.g. by setting some options)? I searched the code of this repository, but could not find any hardcoded `proxy` string. The `proxy` string might come from jupyter-server extensions or somewhere outside of this repository 😦
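As a side note, newer versions of jupyter-server-proxy (the successor of nbserverproxy) let you register a named entry for an externally started server via traitlets configuration, which at least changes the URL prefix. A sketch only: the `sparkui` name is made up here, and the assumption that an entry without a launch command works (because Spark is already running) would need to be checked against the installed version:

```python
# jupyter_notebook_config.py -- traitlets config sketch (names are assumptions)
c.ServerProxy.servers = {
    "sparkui": {
        "port": 4040,           # the already-running Spark UI in the same pod
        "absolute_url": False,  # forward with the /sparkui prefix stripped
    }
}
# The UI would then be reachable under .../user/<name>/sparkui/
```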