PySpark setting for launch.json has bad default values
Environment data
- VS Code version: 1.27.2
- Extension version (available under the Extensions sidebar): 2018.8.0
- OS and version: Mac OS High Sierra 10.13.6
- Python version (& distribution if applicable, e.g. Anaconda): Python 3.7
- Type of virtual environment used (N/A | venv | virtualenv | conda | …): direnv
- Relevant/affected Python packages and their versions: XXX
Actual behavior
I followed the steps on this page to set up IntelliSense for PySpark. I have a standalone Spark download and wanted to try integrating it with VS Code. I had already set SPARK_HOME, and I ran the command "Update Workspace PySpark Libraries".
However, I was still not able to get IntelliSense working as expected for PySpark. It turns out the environment-variable reference was incorrect:
Expected behavior
The reference to SPARK_HOME should be ${env:SPARK_HOME} instead of ${env.SPARK_HOME}, as specified in the VS Code documentation.
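For reference, a launch.json configuration using the working colon syntax might look like the sketch below. The configuration name and the PYTHONPATH value are illustrative assumptions, not necessarily what the extension generates:

```json
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: PySpark",
      "type": "python",
      "request": "launch",
      "program": "${file}",
      "env": {
        // colon syntax: VS Code substitutes the shell's SPARK_HOME here;
        // the dot syntax ${env.SPARK_HOME} is passed through unresolved
        "PYTHONPATH": "${env:SPARK_HOME}/python"
      }
    }
  ]
}
```

(launch.json is JSON with comments, so the inline comments above are legal in VS Code.)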
Steps to reproduce:
- Download a standalone Spark binary from https://spark.apache.org/downloads.html
- Set SPARK_HOME as an environment variable in your shell (e.g. in .bash_profile)
- Open VS Code in a Python workspace
- Run "Update Workspace PySpark Libraries" from the Command Palette
- Create a .py file and add from pyspark import SparkContext
- Autocompletion fails to find information about SparkContext
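The SPARK_HOME setup in the second step can be sketched as follows; the Spark version and install location are hypothetical, so adjust them to wherever you extracted the download:

```shell
# Hypothetical install location; point SPARK_HOME at your extracted Spark
# directory, then expose its bundled Python bindings to the interpreter.
export SPARK_HOME="$HOME/spark-2.3.1-bin-hadoop2.7"
export PYTHONPATH="$SPARK_HOME/python:$PYTHONPATH"
```

(Running PySpark, as opposed to just getting completions, also needs the py4j zip shipped under $SPARK_HOME/python/lib on PYTHONPATH.)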
I know that I could have just installed pyspark with pip install, which does work as expected, but I was trying out this setup, and it took me quite a while to realize the mistake.
Issue Analytics
- State:
- Created 5 years ago
- Comments: 6 (2 by maintainers)
Top GitHub Comments
@btruhand PRs are always welcome, but realize you are actually looking at Don’s personal copy of the extension code and not the code in this repo. 😄
If there is a better way to do it in the team's opinion then yeah, probably a documentation update. Otherwise I was thinking something needs to be changed so that ${env.SPARK_HOME} is made to be ${env:SPARK_HOME} instead.
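Until the extension itself is changed, an already-generated settings file can be patched by hand. The sketch below assumes the bad value landed in .vscode/settings.json under python.autoComplete.extraPaths; the file path and key are assumptions that may not match your workspace:

```shell
# Create an illustrative settings file containing the broken dot syntax.
mkdir -p .vscode
printf '%s\n' '{ "python.autoComplete.extraPaths": ["${env.SPARK_HOME}/python"] }' > .vscode/settings.json

# Rewrite the dot syntax to the colon syntax that VS Code actually
# substitutes, printing the corrected file to stdout.
sed 's/${env\.SPARK_HOME}/${env:SPARK_HOME}/g' .vscode/settings.json
```

Add sed's in-place flag (which differs between GNU and BSD sed) to write the fix back to the file.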