
[BUG]: Install-worker.sh for HDInsight is out of date.


Documentation/readme change suggestion: the Install-Worker.sh shell script in the deploy folder is outdated and does not work at all for HDInsight Spark 2.4. The script below works better; it just needs AWS/git support, etc. I updated it for .NET Core 3.1.

Section 2 of the deployment readme should be completely updated to use this script. Hopefully this helps. I copied it from here

#!/bin/bash

##############################################################################
# Description:
# This is a helper script to install the worker binaries on your Apache Spark cluster
#
##############################################################################

set +e

# Uncomment if you want full tracing (for debugging purposes)
#set -o xtrace

# Install SparkDotNet
SPARK_DOTNET_VERSION=$1
# Check if parameter exists, otherwise error out
[ -z "$SPARK_DOTNET_VERSION" ] && { echo "Error: Sparkdotnet version parameter is missing..."; exit 1; }
    
sudo dpkg --purge --force-all packages-microsoft-prod
sudo wget -q https://packages.microsoft.com/config/ubuntu/`lsb_release -rs`/packages-microsoft-prod.deb -O packages-microsoft-prod.deb
sudo dpkg -i packages-microsoft-prod.deb
sudo add-apt-repository universe
sudo apt-get -yq install apt-transport-https
sudo apt-get -yq update
sudo apt-get -yq install dotnet-sdk-3.1

sudo dotnet tool uninstall dotnet-try --tool-path /usr/share/dotnet-tools || true
sudo dotnet tool install dotnet-try --add-source https://dotnet.myget.org/F/dotnet-try/api/v3/index.json --tool-path /usr/share/dotnet-tools --version 1.0.19473.13

# copy .NET for Apache Spark jar to SPARK's jar folder
sudo mkdir -p /tmp/temp_jar
sudo wget "https://www.nuget.org/api/v2/package/Microsoft.Spark/${SPARK_DOTNET_VERSION}" -O /tmp/temp_jar/"microsoft.spark.${SPARK_DOTNET_VERSION}.nupkg"
sudo unzip -o /tmp/temp_jar/"microsoft.spark.${SPARK_DOTNET_VERSION}.nupkg" -d /tmp/temp_jar
sudo install --verbose --mode 644 /tmp/temp_jar/jars/"microsoft-spark-2.4.x-${SPARK_DOTNET_VERSION}.jar" "/usr/hdp/current/spark2-client/jars/microsoft-spark-2.4.x-${SPARK_DOTNET_VERSION}.jar"

# cleanup unneeded packages
sudo apt-get autoremove -yq

# Remove the prod deb file (the temporary jar folder is cleaned up at the end).
sudo rm packages-microsoft-prod.deb

# Install Microsoft.Spark.Worker
# Path where packaged worker file (tgz) exists.
SRC_WORKER_PATH_OR_URI="https://github.com/dotnet/spark/releases/download/v${SPARK_DOTNET_VERSION}/Microsoft.Spark.Worker.netcoreapp3.1.linux-x64-${SPARK_DOTNET_VERSION}.tar.gz"

# The path on the executor nodes where Microsoft.Spark.Worker executable is installed.
WORKER_INSTALLATION_PATH=/usr/local/bin

# The path where all the dependent libraries are installed so that they don't
# pollute the $WORKER_INSTALLATION_PATH.
SPARKDOTNET_ROOT=$WORKER_INSTALLATION_PATH/spark-dotnet

# Temporary worker file.
TEMP_WORKER_FILENAME=/tmp/temp_worker.tgz

# Extract version
IFS='-' read -ra BASE_FILENAME <<< "$(basename $SRC_WORKER_PATH_OR_URI .tar.gz)"
VERSION=${BASE_FILENAME[2]}

IFS='.' read -ra VERSION_CHECK <<< "$VERSION"
[[ ${#VERSION_CHECK[@]} == 3 ]] || { echo >&2 "Version check failed. Raise an issue here: https://github.com/dotnet/spark"; exit 1; }

# Path of the final destination for the worker binaries
# (the one we just downloaded and extracted)
DEST_WORKER_PATH=$SPARKDOTNET_ROOT/Microsoft.Spark.Worker-$VERSION
DEST_WORKER_BINARY=$DEST_WORKER_PATH/Microsoft.Spark.Worker

# Clean up any existing files.
sudo rm -f $WORKER_INSTALLATION_PATH/Microsoft.Spark.Worker
sudo rm -rf $SPARKDOTNET_ROOT

# Download the worker file to a local temporary file.
sudo wget $SRC_WORKER_PATH_OR_URI -O $TEMP_WORKER_FILENAME

# Untar the file.
sudo mkdir -p $SPARKDOTNET_ROOT
sudo tar xzf $TEMP_WORKER_FILENAME -C $SPARKDOTNET_ROOT

# Make the file executable since dotnet doesn't set this correctly.
sudo chmod 755 $DEST_WORKER_BINARY

# Create a symlink.
sudo ln -sf $DEST_WORKER_BINARY $WORKER_INSTALLATION_PATH/Microsoft.Spark.Worker

# Remove the temporary nuget and worker file.
sudo rm $TEMP_WORKER_FILENAME
sudo rm -rf /tmp/temp_jar
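For reference, the version-extraction step in the script above can be exercised standalone. This is a minimal sketch; the filename is a hypothetical release asset that mirrors the naming pattern the script parses, not a real download:

```shell
#!/bin/bash
# Hypothetical worker release asset name, matching the pattern the script expects.
SRC_WORKER_PATH_OR_URI="Microsoft.Spark.Worker.netcoreapp3.1.linux-x64-1.0.0.tar.gz"

# Same parsing as the script: strip the .tar.gz suffix, split the base name
# on '-', and take the third field as the version string.
IFS='-' read -ra BASE_FILENAME <<< "$(basename "$SRC_WORKER_PATH_OR_URI" .tar.gz)"
VERSION=${BASE_FILENAME[2]}

# The script then requires the version to have exactly three dot-separated parts.
IFS='.' read -ra VERSION_CHECK <<< "$VERSION"

echo "$VERSION"   # prints 1.0.0
```

This shows why the version check demands a `major.minor.patch` shape: anything else means the release asset name did not match the expected pattern.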

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 7 (2 by maintainers)

Top GitHub Comments

imback82 commented, Apr 2, 2020

@Niharikadutta Can you help fix this?

jammman commented, Apr 28, 2020

OK, it may be related to this other issue. I should have clarified that I was using vector UDFs and did have to specify the --files flag when testing the UDF deployment.
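For context, deploying a UDF assembly with --files means shipping it to the executors alongside the job. A hedged sketch of such an invocation follows; the app name, version number, and jar path are illustrative assumptions, not details taken from this issue:

```shell
#!/bin/bash
# Illustrative spark-submit invocation for a .NET for Apache Spark app with UDFs.
# App name (mySparkApp.dll), version, and jar path are assumptions for this sketch.
SPARK_DOTNET_VERSION="1.0.0"
CMD=(spark-submit
  --master yarn
  --files mySparkApp.dll   # ship the UDF assembly so executors can load it
  --class org.apache.spark.deploy.dotnet.DotnetRunner
  "/usr/hdp/current/spark2-client/jars/microsoft-spark-2.4.x-${SPARK_DOTNET_VERSION}.jar"
  dotnet mySparkApp.dll)

# Print the assembled command rather than submitting (no cluster assumed here).
echo "${CMD[@]}"
```

Building the command as an array and echoing it makes the sketch inspectable without a live cluster; on a real HDInsight node you would run `"${CMD[@]}"` directly.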

On Sun, Apr 26, 2020 at 8:51 PM Niharika Dutta notifications@github.com wrote:

@jammman https://github.com/jammman We are going to close this issue, as we have tested the script and it is working as expected on a Spark 2.4 HDI 4.0 cluster. Just as an FYI, we do not want to upload the Microsoft.Spark jar to every worker as is done in the script you pasted here, since that is not the intended behavior of this script. Thanks for your question, and please feel free to open another issue in case you come across any errors.


