[BUG]: UDFs do not work on MacOS

I have a very simple example where UDFs do not seem to work properly on macOS. Here is my setup:

  • Environment variable pointing to the Spark worker directory:
➜  env | grep DOTNET_WORKER_DIR
DOTNET_WORKER_DIR=/Users/cosmin/lib/Microsoft.Spark.Worker/
  • Contents of the folder the environment variable is pointing to:
➜  ls /Users/cosmin/lib/Microsoft.Spark.Worker/
Microsoft.Spark.Worker-0.4.0\Apache.Arrow.dll
Microsoft.Spark.Worker-0.4.0\Microsoft.CSharp.dll
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.Worker
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.Worker.deps.json
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.Worker.dll
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.Worker.pdb
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.Worker.runtimeconfig.json
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.Worker.xml
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.dll
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.pdb
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.xml
Microsoft.Spark.Worker-0.4.0\Microsoft.VisualBasic.dll
...............
Microsoft.Spark.Worker-0.4.0\libhostpolicy.dylib
Microsoft.Spark.Worker-0.4.0\libmscordaccore.dylib
Microsoft.Spark.Worker-0.4.0\libmscordbi.dylib
Microsoft.Spark.Worker-0.4.0\libsos.dylib
Microsoft.Spark.Worker-0.4.0\mscorlib.dll
Microsoft.Spark.Worker-0.4.0\netstandard.dll
Microsoft.Spark.Worker-0.4.0\sosdocsunix.txt
  • Program listing:
using Microsoft.Spark.Sql;

namespace HelloSpark
{
    class Program
    {
        static void Main(string[] args)
        {
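            var spark = SparkSession.Builder().GetOrCreate();
            // Register a UDF that returns true when the input text equals "Andy".
            spark.Udf().Register<string, bool>("MyUDF", (text) => text == "Andy");
            // Load the sample data and expose it to SQL as the "people" view.
            spark.Read().Json("people.json").CreateOrReplaceTempView("people");
            // Calling the UDF from SQL is the step that launches the worker process.
            spark.Sql("SELECT name, MyUDF(name) FROM people").Show();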
        }
    }
}
  • Project file:
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.1</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.Spark" Version="0.4.0" />
  </ItemGroup>

</Project>

I run the compiled binary with the following command: spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local microsoft-spark-2.4.x-0.4.0.jar dotnet HelloSpark.dll

I get the following error:

[2019-08-13T11:31:38.6919900Z] [osX] [Error] [JvmBridge] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.io.IOException: Cannot run program "/Users/cosmin/lib/Microsoft.Spark.Worker/Microsoft.Spark.Worker": error=2, No such file or directory

Changing the query in Program.cs from SELECT name, MyUDF(name) FROM people to SELECT name FROM people (that is, removing the UDF call) results in a successful execution.

I suspect the problem lies in how I set the environment variable or in the worker itself, but I cannot figure out what exactly.
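The error above names the exact path Spark tries to launch, so one way to narrow things down is to check whether a file with that name actually exists in the worker directory. The following is a minimal sketch, not part of the original report: `worker_present` is a hypothetical helper name, and the only assumption is that the executable is expected at `<DOTNET_WORKER_DIR>/Microsoft.Spark.Worker`, as the error message indicates.

```shell
#!/bin/sh
# Hypothetical helper (not part of .NET for Apache Spark): succeeds only if
# an executable file named Microsoft.Spark.Worker sits directly inside the
# given directory, which is the path the error message reports.
worker_present() {
    dir=${1%/}    # drop one trailing slash, if any
    [ -f "$dir/Microsoft.Spark.Worker" ] && [ -x "$dir/Microsoft.Spark.Worker" ]
}

# Example usage (commented out so the snippet only defines the helper):
# worker_present "$DOTNET_WORKER_DIR" || echo "worker executable not found"
```

With the directory listing shown earlier, this check would be expected to fail, because every file name there starts with the literal text Microsoft.Spark.Worker-0.4.0\ rather than being a plain Microsoft.Spark.Worker.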

Help is appreciated.

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 12 (8 by maintainers)

Top GitHub Comments

1 reaction
imback82 commented, Aug 15, 2019

@cosmincatalin we updated the worker binaries with #209 and they are available here: https://github.com/dotnet/spark/releases/download/v0.4.0/Microsoft.Spark.Worker.netcoreapp2.1.osx-x64-0.4.0.zip

Thanks @suhsteve for quickly fixing this and @GoEddie for helping out!

0 reactions
cosmincatalin commented, Aug 14, 2019

OK, so the workaround until this issue is fixed is to mass-rename the files using the procedure here, removing the Microsoft.Spark.Worker-0.4.0\ prefix (including the backslash) from each file name. Then point the DOTNET_WORKER_DIR environment variable to the folder the renamed files are in. It will work after this.
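The mass rename described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure the comment links to: `strip_worker_prefix` is a hypothetical helper name, and it assumes the affected files sit directly in the worker directory with the prefix embedded in their names, as in the listing from the question.

```shell
#!/bin/sh
# Hypothetical helper: removes a literal prefix (backslash included) from the
# name of every file in a directory. Files without the prefix are left alone.
strip_worker_prefix() {
    dir=${1%/}
    prefix=$2
    for path in "$dir"/*; do
        name=${path##*/}               # file name without the directory part
        case $name in
            "$prefix"*)                # quoted, so the backslash is matched literally
                new=${name#"$prefix"}  # same name with the prefix removed
                mv -- "$path" "$dir/$new"
                ;;
        esac
    done
}

# Example usage (single quotes keep the trailing backslash literal):
# strip_worker_prefix "$DOTNET_WORKER_DIR" 'Microsoft.Spark.Worker-0.4.0\'
```

Quoting the prefix inside the `case` pattern and the `${name#...}` expansion matters here: unquoted, the shell would treat the backslash as an escape character instead of as part of the file name.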
