[BUG]: Install-worker.sh for HDInsight is out of date.
Documentation/Readme change suggestion: the Install-Worker.sh shell script in the deploy folder is outdated and does not work at all for HDInsight Spark 2.4. The version below works better; it just needs AWS and git support, etc. I updated it for .NET Core 3.1.
Section 2 of the deployment readme should be updated to use this script. Hopefully this helps. I copied it from here
#!/bin/bash
##############################################################################
# Description:
# This is a helper script to install the worker binaries on your Apache Spark cluster
#
##############################################################################
set +e
# Uncomment if you want full tracing (for debugging purposes)
#set -o xtrace
# Install SparkDotNet
SPARK_DOTNET_VERSION=$1
# Check if parameter exists, otherwise error out
[ -z "$SPARK_DOTNET_VERSION" ] && { echo "Error: Sparkdotnet version parameter is missing..."; exit 1; }
sudo dpkg --purge --force-all packages-microsoft-prod
sudo wget -q "https://packages.microsoft.com/config/ubuntu/$(lsb_release -rs)/packages-microsoft-prod.deb" -O packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
sudo add-apt-repository universe
sudo apt-get -yq install apt-transport-https
sudo apt-get -yq update
sudo apt-get -yq install dotnet-sdk-3.1
sudo dotnet tool uninstall dotnet-try --tool-path /usr/share/dotnet-tools || true
sudo dotnet tool install dotnet-try --add-source https://dotnet.myget.org/F/dotnet-try/api/v3/index.json --tool-path /usr/share/dotnet-tools --version 1.0.19473.13
# copy .NET for Apache Spark jar to SPARK's jar folder
sudo mkdir -p /tmp/temp_jar
sudo wget "https://www.nuget.org/api/v2/package/Microsoft.Spark/${SPARK_DOTNET_VERSION}" -O /tmp/temp_jar/"microsoft.spark.${SPARK_DOTNET_VERSION}.nupkg"
sudo unzip -o /tmp/temp_jar/"microsoft.spark.${SPARK_DOTNET_VERSION}.nupkg" -d /tmp/temp_jar
sudo install --verbose --mode 644 /tmp/temp_jar/jars/"microsoft-spark-2.4.x-${SPARK_DOTNET_VERSION}.jar" "/usr/hdp/current/spark2-client/jars/microsoft-spark-2.4.x-${SPARK_DOTNET_VERSION}.jar"
# cleanup unneeded packages
sudo apt-get autoremove -yq
# Remove the prod deb file and temporary jar file.
sudo rm packages-microsoft-prod.deb
# Install Microsoft.Spark.Worker
# Path where packaged worker file (tgz) exists.
SRC_WORKER_PATH_OR_URI="https://github.com/dotnet/spark/releases/download/v${SPARK_DOTNET_VERSION}/Microsoft.Spark.Worker.netcoreapp3.1.linux-x64-${SPARK_DOTNET_VERSION}.tar.gz"
# The path on the executor nodes where Microsoft.Spark.Worker executable is installed.
WORKER_INSTALLATION_PATH=/usr/local/bin
# The path where all the dependent libraries are installed so that they don't
# pollute the $WORKER_INSTALLATION_PATH.
SPARKDOTNET_ROOT=$WORKER_INSTALLATION_PATH/spark-dotnet
# Temporary worker file.
TEMP_WORKER_FILENAME=/tmp/temp_worker.tgz
# Extract version
IFS='-' read -ra BASE_FILENAME <<< "$(basename $SRC_WORKER_PATH_OR_URI .tar.gz)"
VERSION=${BASE_FILENAME[2]}
IFS='.' read -ra VERSION_CHECK <<< "$VERSION"
[[ ${#VERSION_CHECK[@]} == 3 ]] || { echo >&2 "Error: unexpected worker version format '$VERSION'. Raise an issue at https://github.com/dotnet/spark"; exit 1; }
# Path of the final destination for the worker binaries
# (the one we just downloaded and extracted)
DEST_WORKER_PATH=$SPARKDOTNET_ROOT/Microsoft.Spark.Worker-$VERSION
DEST_WORKER_BINARY=$DEST_WORKER_PATH/Microsoft.Spark.Worker
# Clean up any existing files.
sudo rm -f $WORKER_INSTALLATION_PATH/Microsoft.Spark.Worker
sudo rm -rf $SPARKDOTNET_ROOT
# Download the worker tarball to a local temporary file.
sudo wget $SRC_WORKER_PATH_OR_URI -O $TEMP_WORKER_FILENAME
# Untar the file.
sudo mkdir -p $SPARKDOTNET_ROOT
sudo tar xzf $TEMP_WORKER_FILENAME -C $SPARKDOTNET_ROOT
# Make the file executable since dotnet doesn't set this correctly.
sudo chmod 755 $DEST_WORKER_BINARY
# Create a symlink.
sudo ln -sf $DEST_WORKER_BINARY $WORKER_INSTALLATION_PATH/Microsoft.Spark.Worker
# Remove the temporary nuget and worker file.
sudo rm $TEMP_WORKER_FILENAME
sudo rm -rf /tmp/temp_jar
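The version-extraction step above can be sanity-checked in isolation. A minimal sketch, assuming the v1.0.0 release filename as an example:

```shell
# Demonstrates the IFS-based parsing used in the script; 1.0.0 is illustrative.
SRC="Microsoft.Spark.Worker.netcoreapp3.1.linux-x64-1.0.0.tar.gz"
IFS='-' read -ra BASE_FILENAME <<< "$(basename "$SRC" .tar.gz)"
VERSION=${BASE_FILENAME[2]}                 # third dash-separated field
IFS='.' read -ra VERSION_CHECK <<< "$VERSION"
echo "$VERSION has ${#VERSION_CHECK[@]} components"
```

This prints "1.0.0 has 3 components", which is why the script insists on exactly three dot-separated version parts before proceeding.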
Issue Analytics
- Created 3 years ago
- Comments: 7 (2 by maintainers)
@Niharikadutta Can you help fix this?
OK, it may be related to this other issue; I should've clarified that I was using vector UDFs, and I did have to specify --files flags when testing to deploy the UDF.
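For context, deploying a UDF-containing assembly generally means shipping it to the executors via spark-submit's --files option. A hypothetical sketch against the jar installed by the script above (the app and assembly names, master, and version are illustrative, not from this issue):

```shell
# Hypothetical spark-submit for a .NET for Apache Spark app whose UDFs live in mySparkApp.dll.
# All file names and the 1.0.0 version here are examples only.
spark-submit \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  --master yarn \
  --files mySparkApp.dll \
  /usr/hdp/current/spark2-client/jars/microsoft-spark-2.4.x-1.0.0.jar \
  dotnet mySparkApp.dll
```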
On Sun, Apr 26, 2020 at 8:51 PM Niharika Dutta notifications@github.com wrote: