[BUG]: UDFs do not work on MacOS
I have a very simple example where UDFs do not seem to work properly on macOS. Here is my set-up:
- Environment variable pointing to Spark:
➜ env | grep DOTNET_WORKER_DIR
DOTNET_WORKER_DIR=/Users/cosmin/lib/Microsoft.Spark.Worker/
- Contents of the folder the environment variable is pointing to:
➜ ls /Users/cosmin/lib/Microsoft.Spark.Worker/
Microsoft.Spark.Worker-0.4.0\Apache.Arrow.dll
Microsoft.Spark.Worker-0.4.0\Microsoft.CSharp.dll
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.Worker
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.Worker.deps.json
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.Worker.dll
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.Worker.pdb
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.Worker.runtimeconfig.json
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.Worker.xml
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.dll
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.pdb
Microsoft.Spark.Worker-0.4.0\Microsoft.Spark.xml
Microsoft.Spark.Worker-0.4.0\Microsoft.VisualBasic.dll
...............
Microsoft.Spark.Worker-0.4.0\libhostpolicy.dylib
Microsoft.Spark.Worker-0.4.0\libmscordaccore.dylib
Microsoft.Spark.Worker-0.4.0\libmscordbi.dylib
Microsoft.Spark.Worker-0.4.0\libsos.dylib
Microsoft.Spark.Worker-0.4.0\mscorlib.dll
Microsoft.Spark.Worker-0.4.0\netstandard.dll
Microsoft.Spark.Worker-0.4.0\sosdocsunix.txt
- Program listing
using Microsoft.Spark.Sql;

namespace HelloSpark
{
    class Program
    {
        static void Main(string[] args)
        {
            var spark = SparkSession.Builder().GetOrCreate();
            spark.Udf().Register<string, bool>("MyUDF", (text) => text == "Andy");
            spark.Read().Json("people.json").CreateOrReplaceTempView("people");
            spark.Sql("SELECT name, MyUDF(name) FROM people").Show();
        }
    }
}
- Project file:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.1</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.Spark" Version="0.4.0" />
  </ItemGroup>
</Project>
I run the compiled binary with the following command:
spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local microsoft-spark-2.4.x-0.4.0.jar dotnet HelloSpark.dll
I get the following error:
[2019-08-13T11:31:38.6919900Z] [osX] [Error] [JvmBridge] org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): java.io.IOException: Cannot run program "/Users/cosmin/lib/Microsoft.Spark.Worker/Microsoft.Spark.Worker": error=2, No such file or directory
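The error=2, No such file or directory part of that message means the JVM found no executable file at the path it tried to run. A quick sanity check can be sketched as a few shell lines (the path is taken from the report above; this is a diagnostic sketch, not part of the original issue):

```shell
# Sketch: verify that the worker binary Spark is trying to launch
# actually exists and is executable at the expected path.
WORKER="$DOTNET_WORKER_DIR/Microsoft.Spark.Worker"
if [ -x "$WORKER" ]; then
  echo "worker found and executable: $WORKER"
else
  echo "worker missing or not executable: $WORKER"
fi
```

With the file listing shown above, this check would fail, because the only entries in the directory are files whose names begin with the literal prefix Microsoft.Spark.Worker-0.4.0\ rather than a bare Microsoft.Spark.Worker binary.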
Modifying the query in the Program.cs file from SELECT name, MyUDF(name) FROM people to SELECT name FROM people results in a successful execution.
I suspect the problem is around how I set the environment variable or something about the worker, but I just cannot figure out what.
Help is appreciated.
Issue Analytics
- State:
- Created 4 years ago
- Comments: 12 (8 by maintainers)
Top GitHub Comments
@cosmincatalin we updated the worker binaries with #209 and they are available here: https://github.com/dotnet/spark/releases/download/v0.4.0/Microsoft.Spark.Worker.netcoreapp2.1.osx-x64-0.4.0.zip
Thanks @suhsteve for quickly fixing this and @GoEddie for helping out!
Ok, so the workaround until this issue is fixed is to do a mass rename using the procedure here and remove the
Microsoft.Spark.Worker-0.4.0\
part (including the backslash). Then point the environment variable (DOTNET_WORKER_DIR) to the folder the files are in. It will work after this.
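The mass rename described above can be sketched as a small shell loop. This is an illustrative sketch, not the linked procedure; the directory path and the prefix string are taken verbatim from the listing earlier in the report:

```shell
# Sketch of the workaround: strip the literal "Microsoft.Spark.Worker-0.4.0\"
# prefix that got embedded into each file name during extraction.
cd /Users/cosmin/lib/Microsoft.Spark.Worker/

prefix='Microsoft.Spark.Worker-0.4.0\'
for f in "$prefix"*; do
  # ${f#"$prefix"} strips the quoted prefix literally (no glob interpretation)
  mv "$f" "${f#"$prefix"}"
done

# The worker binary must be executable for Spark to launch it
chmod +x Microsoft.Spark.Worker
```

After this, DOTNET_WORKER_DIR can stay pointed at the same folder, since the files now sit directly inside it under their expected names.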