Spark UI not accessible

Hi everybody,

As @ryanlovett asked, I am opening this issue here, related to jupyterhub/zero-to-jupyterhub-k8s#1030. The problem is as follows:

After starting PySpark I am not able to access the Spark UI; the request ends in a JupyterHub 404 error. Here is (hopefully) all the relevant information:

  1. I build a new user image from the jupyter/pyspark-notebook image.
  2. The Dockerfile for this image contains:
FROM jupyter/pyspark-notebook:5b2160dfd919
RUN pip install nbserverproxy
RUN jupyter serverextension enable --py nbserverproxy
USER root
# grant the notebook user passwordless sudo
RUN echo "$NB_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/notebook
USER $NB_USER
  3. I create the SparkContext in a pod spawned from this image, which gives me the link to the UI.
  4. The SparkContext is created with the following config (a proxy-aware variant is sketched after this list):
import os
from pyspark import SparkConf

conf = SparkConf()
conf.setMaster('k8s://https://' + os.environ['KUBERNETES_SERVICE_HOST'] + ':443')
conf.set('spark.kubernetes.container.image', 'idalab/spark-py:spark')
conf.set('spark.submit.deployMode', 'client')
conf.set('spark.executor.instances', '2')
conf.setAppName('pyspark-shell')
conf.set('spark.driver.host', '10.16.205.42')  # the driver pod's IP
os.environ['PYSPARK_PYTHON'] = 'python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3'
  5. The link created by Spark is obviously not accessible from the hub, as it points to <POD_IP>:4040.
  6. I try to access the UI via .../username/proxy/4040 and .../username/proxy/4040/; both fail with a JupyterHub 404.
  7. Other ports are accessible via this method, so I assume nbserverproxy is working correctly.
  8. This is the output of netstat -pl:
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 localhost:54695         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:33896         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:34577         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:52211         0.0.0.0:*               LISTEN      23/python
tcp        0      0 0.0.0.0:8888            0.0.0.0:*               LISTEN      7/python
tcp        0      0 localhost:39388         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:39971         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:32867         0.0.0.0:*               LISTEN      23/python
tcp6       0      0 jupyter-hagen:43878     [::]:*                  LISTEN      45/java
tcp6       0      0 [::]:4040               [::]:*                  LISTEN      45/java
tcp6       0      0 localhost:32816         [::]:*                  LISTEN      45/java
tcp6       0      0 jupyter-hagen:41793     [::]:*                  LISTEN      45/java

Note that the Java processes listen on tcp6 sockets, unlike the Python ones.

  9. To check whether this is the cause, I set the environment variable _JAVA_OPTIONS to "-Djava.net.preferIPv4Stack=true".

  10. This results in the following output, but does not resolve the problem:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 localhost:54695         0.0.0.0:*               LISTEN      456/python
tcp        0      0 0.0.0.0:4040            0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:33896         0.0.0.0:*               LISTEN      456/python
tcp        0      0 localhost:34990         0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:36079         0.0.0.0:*               LISTEN      456/python
tcp        0      0 jupyter-hagen:35119     0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:34577         0.0.0.0:*               LISTEN      456/python
tcp        0      0 jupyter-hagen:42195     0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:34836         0.0.0.0:*               LISTEN      456/python
tcp        0      0 0.0.0.0:8888            0.0.0.0:*               LISTEN      7/python
tcp        0      0 localhost:39971         0.0.0.0:*               LISTEN      456/python
tcp        0      0 localhost:32867         0.0.0.0:*               LISTEN      456/python
  11. I checked whether the UI is generally accessible by running a local copy of the user image on my PC and forwarding the port. That works fine!

My user image is available on Docker Hub at idalab/spark-user:1.0.2, so it should be easy to pull for debugging if necessary.

Thank you for your help.
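
A proxy-aware variant of the config from step 4, as a minimal sketch (untested in this exact setup): Spark's spark.ui.proxyBase setting tells the UI which external prefix it is served under, so the links it generates resolve through the hub's proxy. JUPYTERHUB_USER is set by JupyterHub in single-user pods; everything else stays as in step 4.

import os
from pyspark import SparkConf

conf = SparkConf()
# Generate UI links under the JupyterHub proxy prefix instead of <POD_IP>:4040.
conf.set('spark.ui.proxyBase',
         '/user/' + os.environ['JUPYTERHUB_USER'] + '/proxy/4040')
# ... remaining settings as in step 4 ...

If this works, the UI should appear at .../user/<username>/proxy/4040/, i.e. the path already tried in step 6.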

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 37 (1 by maintainers)

Top GitHub Comments

3 reactions
ransoor2 commented, Mar 17, 2019

Thanks for the documentation. I followed the steps above and everything works fine, except for the Executors tab in the Spark UI. It seems that the proxy replaces the [app-id] with the port instead of the actual app-id.

From https://spark.apache.org/docs/latest/monitoring.html: /applications/[app-id]/allexecutors returns a list of all (active and dead) executors for the given application.

(screenshot of the broken Executors tab omitted)
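
For reference, those endpoints belong to Spark's REST API, so the rewriting can be checked outside the browser. A rough sketch, with a hypothetical hub URL and username:

import requests

# Hypothetical values: substitute your hub host and JupyterHub username.
base = 'https://hub.example.com/user/hagen/proxy/4040'
apps = requests.get(base + '/api/v1/applications').json()
app_id = apps[0]['id']
executors = requests.get(base + '/api/v1/applications/' + app_id + '/allexecutors').json()
print([e['id'] for e in executors])

If the proxy were substituting the port for [app-id], the second request is the one that would come back 404.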

2 reactions
1ambda commented, Jun 12, 2021

The problem of the allexecutors endpoint returning 404 can be fixed by modifying core/src/main/resources/org/apache/spark/ui/static/utils.js. For example, our hub URL includes "jupyter" in its path:

function getStandAloneAppId(cb) {
  var words = document.baseURI.split('/');
  var ind = words.indexOf("proxy");
  if (document.baseURI.indexOf("jupyter") > 0) { ind = 0 }   // newly added line
  // ... rest of the function unchanged ...

function createRESTEndPointForExecutorsPage(appId) {
  var words = document.baseURI.split('/');
  var ind = words.indexOf("proxy");
  if (document.baseURI.indexOf("jupyter") > 0) { ind = 0 }   // newly added line
  // ... rest of the function unchanged ...

function createTemplateURI(appId, templateName) {
  var words = document.baseURI.split('/');
  var ind = words.indexOf("proxy");
  if (document.baseURI.indexOf("jupyter") > 0) { ind = 0 }   // newly added line
  // ... rest of the function unchanged ...

But fundamentally this could be fixed if the jupyter-server-proxy extension supported changing the proxy/ infix in the URL, since the Spark UI's JavaScript tries to locate the proxy segment in the URL, as the code above shows.

Is it possible to change the proxy infix in the URL for the jupyter-server-proxy extension (e.g., by setting some option)? I searched the code of this repository but could not find any hardcoded proxy string. It might come from jupyter-server extensions or from somewhere outside this repository 😦
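
One avenue worth checking (an assumption on my side, not confirmed in this thread): jupyter-server-proxy also supports named servers registered through traitlets configuration, which are served under a fixed /<name>/ prefix rather than /proxy/<port>. Whether Spark's JavaScript copes with that prefix is exactly the open question above. A minimal sketch:

# jupyter_server_config.py -- sketch; assumes jupyter-server-proxy is installed
c.ServerProxy.servers = {
    'sparkui': {
        'port': 4040,            # no command given: assumes the Spark UI is already listening
        'absolute_url': False,   # pass the path to the backend relative to the proxy prefix
    }
}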
