Debugging PySpark applications no longer works after August update
Environment data
- VS Code version: 1.27
- Extension version (available under the Extensions sidebar): 2018.8.0
- OS and version: Linux Mint 18.1 x64
- Python version (& distribution if applicable, e.g. Anaconda): 2.7.12
- Type of virtual environment used (N/A | venv | virtualenv | conda | …): N/A
- Relevant/affected Python packages and their versions: PySpark 2.2.1
Actual behavior
Debugging a PySpark application no longer works after updating to 2018.8.0. After starting the debugger, the terminal shows the following command and error:
cd /home/user/etl ; env "PYSPARK_PYTHON=python" "PYTHONPATH=/home/user/etl:/home/user/.vscode/extensions/ms-python.python-2018.8.0/pythonFiles/experimental/ptvsd" "PYTHONIOENCODING=UTF-8" "PYTHONUNBUFFERED=1" /home/user/Spark/spark-2.2.1-bin-hadoop2.7/bin/spark-submit -m ptvsd --host localhost --port 39763 /home/user/etl/etl/jobs/process_data.py
Error: Unrecognized option: -m
Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
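The failure is reproducible outside VS Code: as the usage text above shows, spark-submit only accepts [options] <app jar | python file> [app arguments], so the interpreter-style -m ptvsd flags that 2018.8.0 now prepends are rejected before the job even starts. A minimal sketch that replays the generated command, assuming the paths from the log above (SPARK_SUBMIT and SCRIPT are placeholders for your installation):

# repro.py - replay the launch command the extension generates (paths are placeholders)
import subprocess

SPARK_SUBMIT = "/home/user/Spark/spark-2.2.1-bin-hadoop2.7/bin/spark-submit"
SCRIPT = "/home/user/etl/etl/jobs/process_data.py"

# Mirrors the 2018.8.0 launch: spark-submit exits with
# "Error: Unrecognized option: -m", since -m is a CPython interpreter
# flag, not a spark-submit option.
cmd = [SPARK_SUBMIT, "-m", "ptvsd", "--host", "localhost",
       "--port", "39763", SCRIPT]
print(subprocess.call(cmd))  # prints a non-zero exit code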
Expected behavior
With version 2018.7.0, PySpark debugging works fine: the old launcher is passed to spark-submit as its <python file> argument, with the debug parameters following as app arguments, so spark-submit accepts it. The following command is displayed in the terminal after starting the debugger:
cd /home/user/etl ; env "PYSPARK_PYTHON=python" "PYTHONPATH=/home/user/etl" "PYTHONIOENCODING=UTF-8" "PYTHONUNBUFFERED=1" /home/user/Spark/spark-2.2.1-bin-hadoop2.7/bin/spark-submit /home/user/.vscode/extensions/ms-python.python-2018.7.0/pythonFiles/PythonTools/visualstudio_py_launcher.py /home/user/etl 46508 34806ad9-833a-4524-8cd6-18ca4aa74f14 RedirectOutput,RedirectOutput /home/user/etl/etl/jobs/process_data.py
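Until a fixed extension build is available, one workaround is to bypass the broken launch step entirely and attach instead: instrument the job script with ptvsd and use an "attach" debug configuration in VS Code. A minimal sketch, assuming ptvsd 4.x is installed in the Python environment that spark-submit uses (pip install ptvsd) and that port 5678 is free; the job body here is a placeholder:

# top of etl/jobs/process_data.py (hypothetical instrumentation)
import ptvsd

# Listen for the IDE and block until it attaches, so breakpoints set
# early in the script are not missed.
ptvsd.enable_attach(address=("localhost", 5678))
ptvsd.wait_for_attach()

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("process_data").getOrCreate()
print(spark.range(10).count())  # placeholder for the real ETL logic
spark.stop()

Then run spark-submit etl/jobs/process_data.py as usual and attach from VS Code to localhost:5678. Note this only covers driver-side code; breakpoints inside executor tasks will not be hit.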
Issue Analytics
- Created: 5 years ago
- Comments: 20 (3 by maintainers)
Top GitHub Comments
I’ll have a fix today.
I ran into this issue today. I can confirm that the fix worked for me. Thanks!