Spark can't find the specified DLLs
I am new to .NET for Apache Spark and am having trouble passing DLLs. I have some DLL files (from another C# project) that I want to reuse in a UDF in my Spark project.
Error:

```
[Warn] [AssemblyLoader] Assembly 'Classes, Version=3.0.142.0, Culture=neutral, PublicKeyToken=910ab64095116ac0' file not found 'Classes[.dll,.ni.dll]' in '/tmp/spark-e2e6444a-99fc-42c6-ae15-8a5b328e3038/userFiles-aafb5491-4485-46d9-8e17-0849aed7c57a,/home/ubuntu/project/mySparkApp/bin/Debug/net5.0,/opt/Microsoft.Spark.Worker-1.0.0/'
[2021-04-13T11:16:15.1691280Z] [ubuntu-Vostro] [Error] [TaskRunner] [1] ProcessStream() failed with exception: System.IO.FileNotFoundException: Could not load file or assembly 'Classes, Version=3.0.142.0, Culture=neutral, PublicKeyToken=910ab64095116ac0'. The system cannot find the file specified.
```
I have copied Classes.dll (an external DLL) into /home/ubuntu/project/mySparkApp. Initially I hit the same error with mySparkApp.dll itself; I resolved that by copying it into my current directory, and that worked. For this third-party DLL, however, it still fails to find the file.
Here is my .csproj file where I reference Classes.dll:

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net5.0</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.Spark" Version="1.0.0" />
  </ItemGroup>
  <ItemGroup>
    <Reference Include="Classes">
      <HintPath>/home/incs83/project/mySparkApp/Classes.dll</HintPath>
    </Reference>
    <Reference Include="CSharpZip">
      <HintPath>/home/incs83/project/mySparkApp/CSharpZip.dll</HintPath>
    </Reference>
  </ItemGroup>
</Project>
```
Here is the spark-submit command:

```
spark-submit \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  --master local \
  bin/Debug/net5.0/microsoft-spark-3-0_2.12-1.0.0.jar \
  dotnet bin/Debug/net5.0/mySparkApp.dll
```
I have spent a lot of time digging into this, still no luck.
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:15 (4 by maintainers)
Top GitHub Comments
.NET for Apache Spark looks for your custom DLLs using the DOTNET_ASSEMBLY_SEARCH_PATHS environment variable. So, just before spark-submit, you can set that variable to point at your DLL folder:
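For example (a sketch only: the folder is the app directory from the question, and the spark-submit arguments are copied from the question's own command):

```shell
# Tell the .NET worker where to look for Classes.dll / CSharpZip.dll
export DOTNET_ASSEMBLY_SEARCH_PATHS=/home/ubuntu/project/mySparkApp

# Then submit as before
spark-submit \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  --master local \
  bin/Debug/net5.0/microsoft-spark-3-0_2.12-1.0.0.jar \
  dotnet bin/Debug/net5.0/mySparkApp.dll
```

Multiple folders can be listed in the variable, comma-separated, if your DLLs live in more than one place.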
You can also copy these DLLs into the Microsoft.Spark.Worker installation folder. (This is what is done on the Databricks environment.)
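A minimal sketch of that option, assuming the worker path shown in the error log (/opt/Microsoft.Spark.Worker-1.0.0/) and the app folder from the question:

```shell
# Copy the third-party assemblies next to the Microsoft.Spark.Worker binaries,
# which is one of the locations the AssemblyLoader searches by default
# (it appears in the search-path list printed in the warning above).
cp /home/ubuntu/project/mySparkApp/Classes.dll   /opt/Microsoft.Spark.Worker-1.0.0/
cp /home/ubuntu/project/mySparkApp/CSharpZip.dll /opt/Microsoft.Spark.Worker-1.0.0/
```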
@suhsteve I have used the sources directly to get rid of Classes.dll. But I am deserializing some data in a UDF using System.Runtime.Serialization.Formatters (BinaryFormatter and MemoryStream), and it gives me the error below:

```
[Warn] [AssemblyLoader] Assembly 'System.Runtime.Serialization.Formatters.resources, Version=4.0.4.0, Culture=en-IN, PublicKeyToken=b03f5f7f11d50a3a' file not found 'System.Runtime.Serialization.Formatters.resources[.dll,.ni.dll]' in '/tmp/spark-024dfc93-f0fc-4c04-8737-ba0dbc8370bf/userFiles-599198e1-61d3-43f7-b810-c6d5376c2d65,/home/incs83/project/rs-etl-test/bin/Debug/netcoreapp3.1,/opt/Microsoft.Spark.Worker-1.0.0/'
[2021-04-20T06:51:51.5112399Z] [incs83-Vostro-3490] [Warn] [AssemblyLoader] Assembly 'System.Runtime.Serialization.Formatters.resources, Version=4.0.4.0, Culture=en, PublicKeyToken=b03f5f7f11d50a3a' file not found 'System.Runtime.Serialization.Formatters.resources[.dll,.ni.dll]' in '/tmp/spark-024dfc93-f0fc-4c04-8737-ba0dbc8370bf/userFiles-599198e1-61d3-43f7-b810-c6d5376c2d65,/home/incs83/project/rs-etl-test/bin/Debug/netcoreapp3.1,/opt/Microsoft.Spark.Worker-1.0.0/'
```
@clegendre I learned that BinaryFormatter is obsolete in .NET 5, so I am using .NET Core 3.1 instead, but I am still facing this issue.
Please help!!