.NET Native Library Packaging (RuntimeIdentifiers, build, testing, VS etc.)

cc: @tannergooding @richlander @jkotas

This is yet another issue about how best to author native library NuGet packages and how to define, build, test, publish, and deploy applications that consume them. I have tried hard to wrap my head around this by reading many issues and studying existing packages. My particular need is similar to TorchSharp: massive native libraries that not only need to be split into fragments, but where it would ideally also be possible to “download” only the runtime identifier (RID) specific packages needed for local development. (On Windows, though, local development often means BOTH x86 and x64 in our case.)

Below is a walk-through of using ClangSharp (in excessive detail, for reference) and the many questions it raised for me compared to how I am used to working. Our own way of authoring native library packages explicitly copies the libraries to sub-directories (x64, x86) alongside the exe, and those directories are then added to the DLL search directories at runtime, based on the process arch/OS/system, via AddDllDirectory. Having something “custom” is of course a maintenance issue, but also an onboarding issue. Using documented best practices would be best, but as far as I can tell there are none?

In any case, at the end of the walk-through I encounter the problem that when specifying multiple RIDs i.e.

<RuntimeIdentifiers>win-x64;win-x86</RuntimeIdentifiers>

then the runtime.json trick does not appear to work when running unit tests from inside Visual Studio. I have to explicitly add the RID specific NuGet packages anyway. So how exactly is one supposed to author NuGet packages that support running under multiple RIDs (in this case only win-x86 and win-x64 for now), with the usual full support in VS and other tools? We need to be able to debug and run from VS.

And how do you switch which RID you run with when F5 running in VS?

Should I simply accept that the runtime.json way is too flawed and explicitly reference all needed NuGet packages? Would that avoid the need to specify RIDs? Specifying RIDs also has the issue of “forcing” self-contained (we don’t want that). In fact, we’d like to simply be able to deploy/copy-paste the build output as something like:

App.exe
win-x64\
  // win-x64 specific native libraries
win-x86\
  // win-x86 specific native libraries

where the app is not RID specific (framework-dependent, of course), and this should work on both win-x86 and win-x64. This is what we have now and what works. Our developers are used to this. But it’s based on native library NuGet packages that explicitly copy their native library contents to those folders, and of course on referencing all those RID specific ones. I had hoped one could perhaps avoid the RID specific referencing, but that does not seem to work “smoothly”. Which I’d guess then means the whole runtime.json approach is not the way to go.
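A minimal sketch of that copy step, assuming a hypothetical package called MyNative that ships its binaries under a non-standard native\ folder (so NuGet does not treat them as runtime assets) plus a build\MyNative.targets file that NuGet imports into the consuming project. The package id, folder names, and file layout are illustrative assumptions, not taken from any real package:

```xml
<!-- Hypothetical build\MyNative.targets inside the MyNative package. -->
<Project>
  <ItemGroup>
    <!-- Copy the x64 binaries to a win-x64\ folder next to the app. -->
    <None Include="$(MSBuildThisFileDirectory)..\native\win-x64\*.dll"
          Link="win-x64\%(Filename)%(Extension)"
          CopyToOutputDirectory="PreserveNewest"
          Visible="false" />
    <!-- Copy the x86 binaries to a win-x86\ folder next to the app. -->
    <None Include="$(MSBuildThisFileDirectory)..\native\win-x86\*.dll"
          Link="win-x86\%(Filename)%(Extension)"
          CopyToOutputDirectory="PreserveNewest"
          Visible="false" />
  </ItemGroup>
</Project>
```

The Link metadata determines the path relative to the output directory, so the result matches the layout above. The application still has to add the right sub-directory to the DLL search path at runtime.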

Secondly, I think I read somewhere (can’t find or remember where) that for .NET 8 forcing a specific RID on build is being considered? Given my experience below I can see why one might consider doing that, but it would raise other issues, such as losing what used to be a core tenet (IMHO) of .NET: that a build output (not publish) is RID agnostic. Would that be lost then?

All in all, to solve these issues I have to author my own little tool for packaging the native libraries and consider all the issues around consumption, testing, etc. And after going through all this I am still left feeling rather lost 😅 I still don’t know exactly what the best solution is here. And the packages I am creating are intended to be published for the public, e.g. so I can publish the revived CNTK packages I’ve made on nuget.org.

On top of this we still want to support publishing RID specific applications, but then we don’t want native libraries embedded in the single file. There is an option for that, which is great, but then we want those dlls in a sub-folder, not directly next to the exe. That means we have to hack around it in MSBuild and then face issues with mixed-mode assemblies etc. Yes, we also have those, which makes things very interesting.

ML/AI isn’t going away. For each new CUDA or whatever release the native libraries double in size (minimum!). Easy authoring and consumption of those would be great, but I am sure that won’t be solved in the immediate future either. I need to know what to do now.

The walk-through will come as the next comment.

`ClangSharp`/`libclang` Walkthrough

Create a simple console application in, for example, a Tester directory.

dotnet new console

Add package reference to ClangSharp package so csproj looks like:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="ClangSharp" Version="16.0.0" />
  </ItemGroup>
  
</Project>

Run dotnet restore -verbosity:detailed > restore.txt on the project. Verbosity is raised to be able to check what happens. Nothing of note here. Look in the .nuget package cache to see what was downloaded:

"C:\Users\<USERNAME>\.nuget\packages\clangsharp"
"C:\Users\<USERNAME>\.nuget\packages\clangsharp.interop"
"C:\Users\<USERNAME>\.nuget\packages\libclang"
"C:\Users\<USERNAME>\.nuget\packages\libclangsharp"

What’s interesting here is no RID specific packages appear to be downloaded (yet). The ClangSharp package has a nuspec file with:

<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd">
  <metadata minClientVersion="4.3">
    <id>ClangSharp</id>
    <version>16.0.0</version>
    <authors>.NET Foundation and Contributors</authors>
    <requireLicenseAcceptance>true</requireLicenseAcceptance>
    <license type="expression">MIT</license>
    <licenseUrl>https://licenses.nuget.org/MIT</licenseUrl>
    <projectUrl>https://github.com/dotnet/clangsharp/</projectUrl>
    <description>ClangSharp are strongly-typed safe Clang bindings written in C# for .NET and Mono, tested on Linux and Windows.</description>
    <copyright>Copyright © .NET Foundation and Contributors</copyright>
    <repository type="git" url="https://github.com/dotnet/clangsharp/" commit="1c5588c84a5d22d2ddab41dbf7854667bf722332" />
    <dependencies>
      <group targetFramework="net6.0">
        <dependency id="ClangSharp.Interop" version="16.0.0" exclude="Build,Analyzers" />
      </group>
      <group targetFramework="net7.0">
        <dependency id="ClangSharp.Interop" version="16.0.0" exclude="Build,Analyzers" />
      </group>
      <group targetFramework=".NETStandard2.0">
        <dependency id="ClangSharp.Interop" version="16.0.0" exclude="Build,Analyzers" />
      </group>
    </dependencies>
  </metadata>
</package>

Skipping the interop package and looking at libclang, its nuspec has:

<?xml version="1.0" encoding="utf-8"?>
<package xmlns="http://schemas.microsoft.com/packaging/2013/01/nuspec.xsd">
  <metadata minClientVersion="2.12">
    <id>libclang</id>
    <version>16.0.6</version>
    <authors>.NET Foundation and Contributors</authors>
    <owners>.NET Foundation and Contributors</owners>
    <requireLicenseAcceptance>true</requireLicenseAcceptance>
    <license type="expression">Apache-2.0 WITH LLVM-exception</license>
    <licenseUrl>https://licenses.nuget.org/Apache-2.0%20WITH%20LLVM-exception</licenseUrl>
    <projectUrl>https://github.com/dotnet/clangsharp</projectUrl>
    <description>Multi-platform native library for libclang.</description>
    <copyright>Copyright © LLVM Project</copyright>
    <repository type="git" url="https://github.com/llvm/llvm-project" branch="llvmorg-16.0.6" />
    <dependencies>
      <group targetFramework=".NETStandard2.0" />
    </dependencies>
  </metadata>
</package>

That’s interesting given it has no dependencies and contains no libraries:

"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\.nupkg.metadata"
"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\.signature.p7s"
"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\libclang.16.0.6.nupkg"
"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\libclang.16.0.6.nupkg.sha512"
"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\libclang.nuspec"
"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\LICENSE.TXT"
"C:\Users\<USERNAME>\.nuget\packages\libclang\16.0.6\runtime.json"

But what’s in the runtime.json file:

{
  "runtimes": {
    "linux-arm64": {
      "libclang": {
        "libclang.runtime.linux-arm64": "16.0.6"
      }
    },
    "linux-x64": {
      "libclang": {
        "libclang.runtime.linux-x64": "16.0.6"
      }
    },
    "osx-arm64": {
      "libclang": {
        "libclang.runtime.osx-arm64": "16.0.6"
      }
    },
    "osx-x64": {
      "libclang": {
        "libclang.runtime.osx-x64": "16.0.6"
      }
    },
    "win-arm64": {
      "libclang": {
        "libclang.runtime.win-arm64": "16.0.6"
      }
    },
    "win-x64": {
      "libclang": {
        "libclang.runtime.win-x64": "16.0.6"
      }
    },
    "win-x86": {
      "libclang": {
        "libclang.runtime.win-x86": "16.0.6"
      }
    }
  }
}

Ah, that appears to map RIDs to runtime specific packages. But none were downloaded, so what happens when we build the project? Run dotnet build -verbosity:detailed > build.txt on the project. Examining the build output and the .nuget cache, none of those runtime specific packages appear to have been downloaded (yet). Let’s try running the project with some dummy code in Program.cs.

using ClangSharp.Interop;

using var index = CXIndex.Create();

It runs, but still no runtime specific packages downloaded nor any native libraries in build output. Let’s try a more involved example copied from a unit test in ClangSharp.

// https://github.com/dotnet/ClangSharp/blob/main/tests/ClangSharp.UnitTests/CXTranslationUnitTest.cs
using ClangSharp.Interop;
using static ClangSharp.Interop.CXTranslationUnit_Flags;

var name = "basic";
var dir = Path.GetRandomFileName();
_ = Directory.CreateDirectory(dir);

try
{
    // Create a file with the right name
    var file = new FileInfo(Path.Combine(dir, name + ".c"));
    File.WriteAllText(file.FullName, "int main() { return 0; }");

    using var index = CXIndex.Create();
    using var translationUnit = CXTranslationUnit.Parse(
        index, file.FullName, Array.Empty<string>(),
        Array.Empty<CXUnsavedFile>(), CXTranslationUnit_None);
    var clangFile = translationUnit.GetFile(file.FullName);
}
finally
{
    Directory.Delete(dir, true);
}

This runs fine. But still no runtime specific packages downloaded nor any native libraries in build output. Let’s try running the code in Visual Studio with native debugging enabled, that is, add launch settings with "nativeDebugging": true. This is just a quick way to look at which native libraries are loaded and from where. There are many ways of doing that; Visual Studio is simply quick and easy. In the Debug window one can see:

(Win32): Loaded '\bin\Debug\net7.0\ClangSharp.Interop.dll'. 
(CoreCLR: clrhost): Loaded '\bin\Debug\net7.0\ClangSharp.Interop.dll'. Skipped loading symbols. Module is optimized and the debugger option 'Just My Code' is enabled.
(Win32): Loaded 'C:\Program Files\LLVM\bin\libclang.dll'. Module was built without symbols.

Ah, turns out I have LLVM with clang installed 🤷 So its location must be on the PATH environment variable, and it turns out C:\Program Files\LLVM\bin is. Let’s try removing that and restarting all consoles and applications in use.

Running the example program again will then fail with exception:

System.DllNotFoundException: 'Unable to load DLL 'libclang' or one of its dependencies: 
The specified module could not be found. (0x8007007E)'

Hmm, so the libclang native library is not available and the package is not downloaded automatically? How does runtime.json then work? Let’s try running the application with a runtime identifier defined:

dotnet run -r win-x64 > run.txt

This takes a while, and the only output is:

C:\Program Files\dotnet\sdk\7.0.400-preview.23274.1\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.Sdk.targets(1142,5): 
  warning NETSDK1179: One of '--self-contained' or '--no-self-contained' options are required when '--runtime' is used. 
  [Tester.csproj]
Tester\4lzfbeoi.214\basic.c

but the program runs fine. Looking in .nuget, we can see the runtime specific packages have actually been downloaded now.

"C:\Users\<USERNAME>\.nuget\packages\libclangsharp.runtime.win-x64"
"C:\Users\<USERNAME>\.nuget\packages\libclang.runtime.win-x64"

So what this means is that we cannot actually define and run the application without specifying a runtime identifier? That seems problematic if we want to use this as a framework-dependent AnyCPU application… In fact, if we run the application from Visual Studio again, it fails with the same exception as before.

Use tree /F to see the files in the bin output, which shows all the native libraries related to libclang for win-x64 (and others).

├───bin
│   └───Debug
│       └───net7.0
│           │   ClangSharp.dll
│           │   ClangSharp.Interop.dll
│           │   Tester.deps.json
│           │   Tester.dll
│           │   Tester.exe
│           │   Tester.pdb
│           │   Tester.runtimeconfig.json
│           │
│           ├───egfakait.om3
│           │       basic.c
│           │
│           └───win-x64
│                   ClangSharp.dll
│                   ClangSharp.Interop.dll
│                   clretwrc.dll
│                   clrgc.dll
│                   clrjit.dll
│                   coreclr.dll
│                   createdump.exe
│                   hostfxr.dll
│                   hostpolicy.dll
│                   libclang.dll
│                   libClangSharp.dll
│                   Microsoft.CSharp.dll
│                   Microsoft.DiaSymReader.Native.amd64.dll
│                   Microsoft.VisualBasic.Core.dll
│                   Microsoft.VisualBasic.dll
│                   Microsoft.Win32.Primitives.dll
│                   Microsoft.Win32.Registry.dll
│                   mscordaccore.dll
│                   mscordaccore_amd64_amd64_7.0.523.17405.dll
│                   mscordbi.dll
│                   mscorlib.dll
│                   mscorrc.dll
│                   msquic.dll
│                   Tester.deps.json
│                   Tester.dll
│                   Tester.exe
│                   Tester.pdb
│                   Tester.runtimeconfig.json
│                   netstandard.dll
│                   // Almost all System.*.dll files follow here
│                   System.*.dll

Note how this has an exe under the specific runtime folder and all the dlls next to it.

As far as I can tell this means the runtime.json way of mapping runtime identifier specific packages only works if you define a hard-coded specific runtime identifier in the program you want to run. Which is incredibly annoying if you want to build and deploy runtime agnostic applications. E.g. if we wanted to deploy a win-x86 + win-x64 single exe. How is that supposed to work then? Am I getting this wrong?
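For comparison, NuGet also recognizes native binaries placed directly in a package under runtimes/<rid>/native/. The documented behaviour is that a framework-dependent app built without a RID gets each RID's binaries copied to a runtimes\ folder in the build output, and the host picks the matching one at run time, with no runtime.json involved. The trade-off is that every consumer downloads the binaries for all RIDs, which is exactly what the runtime.json split tries to avoid. A nuspec sketch, where the package id, version, and file names are hypothetical:

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Hypothetical "fat" native package: all RIDs in one package. -->
<package xmlns="http://schemas.microsoft.com/packaging/2013/05/nuspec.xsd">
  <metadata>
    <id>MyNative</id>
    <version>1.0.0</version>
    <authors>Example Author</authors>
    <description>Example native library packaged for win-x64 and win-x86.</description>
  </metadata>
  <files>
    <!-- NuGet's convention: runtimes/<rid>/native/ holds RID specific native assets. -->
    <file src="bin\win-x64\mynative.dll" target="runtimes\win-x64\native" />
    <file src="bin\win-x86\mynative.dll" target="runtimes\win-x86\native" />
  </files>
</package>
```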

Let’s try a hack: adding the RID specific package to the project. That is, add <PackageReference Include="libclang.runtime.win-x64" Version="16.0.0" /> to the project. Run it from VS and it now runs fine. Right, so in some ways this works fine if we add the RID specific packages explicitly.

Still, how does this work with regard to testing, for example if you use MSTest for both x86 and x64 testing? Let’s add a unit test project that references the Tester console project, and copy the code from the unit test above (in Program.cs) into this project. If we run the unit test with Processor Architecture for AnyCPU Projects set to Auto, it succeeds. If we change this to x86, it fails with the same exception as before:

System.DllNotFoundException: Unable to load DLL 'libclang' or one of its dependencies: 
The specified module could not be found. (0x8007007E)

Interestingly, in the output we will get:

*****IMPORTANT*****
Failed to resolve libclang.
If you are running as a dotnet tool, you may need to manually copy the appropriate DLLs 
from NuGet due to limitations in the dotnet tool support. 
Please see https://github.com/dotnet/clangsharp for more details.
*****IMPORTANT*****

Note that the RID is win10-x86 in this case if logged e.g. with log(RuntimeInformation.RuntimeIdentifier);. If we select x64 it is win10-x64 and the test succeeds, but only because we added the RID specific libclang.runtime.win-x64 package to the project.

In https://github.com/dotnet/ClangSharp/issues/118#issuecomment-598305888 this issue is expanded upon with the comment by Tanner Gooding:

The simple fix for now is to add <RuntimeIdentifier Condition="'$(RuntimeIdentifier)' == '' AND '$(PackAsTool)' != 'true'">$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier> to your project (under a PropertyGroup), unfortunately because of the way NuGet restore works, we can’t just add this to a build/*.targets in the ClangSharp nuget package.

The issue is essentially that libClang and libClangSharp just contain a runtime.json file which point to the real packages. This was done to avoid users needing to download hundreds of megabytes just to consume ClangSharp (when they only need one of the native binaries most often). You can see some more details on the sizes here: #46 (comment), noting that that is the size of the compressed NuGet.

I had thought this was working for dev scenarios where the RID wasn’t specified, but it apparently isn’t. I’ll log an issue on NuGet to see if this is something that can be improved.

I wonder whether this actually works for the case of switching processor architecture in VS or similar? Let’s try adding it to the unit tests project and remove the RID specific package from the console project. Hence we have console project:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="ClangSharp" Version="16.0.0" />
  </ItemGroup>
  
</Project>

and unit test project:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>

    <IsPackable>false</IsPackable>

    <RuntimeIdentifier Condition="'$(RuntimeIdentifier)' == '' AND '$(PackAsTool)' != 'true'">$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.NET.Test.Sdk" Version="17.3.2" />
    <PackageReference Include="MSTest.TestAdapter" Version="2.2.10" />
    <PackageReference Include="MSTest.TestFramework" Version="2.2.10" />
    <PackageReference Include="coverlet.collector" Version="3.1.2" />
  </ItemGroup>

  <ItemGroup>
    <ProjectReference Include="..\Tester\Tester.csproj" />
  </ItemGroup>

</Project>

The first time you try to build this, you get a well-known error:

Assets file 'TesterUnitTests\obj\project.assets.json' doesn't have a target for 'net7.0/win-x64'. 
Ensure that restore has run and that you have included 'net7.0' in the TargetFrameworks for your project. 
You may also need to include 'win-x64' in your project's RuntimeIdentifiers.

So restore and build again. Let’s try running x86 unit tests in VS. This succeeds, but the RID is actually now win10-x64, so we can no longer run or debug x86 tests from Visual Studio?

Let’s first try to define test running via a script test-x86-x64.ps1:

#!/usr/bin/env pwsh
Write-Host "Testing Debug X86"
dotnet test --nologo -c Debug -- RunConfiguration.TargetPlatform=x86
Write-Host "Testing Release X86"
dotnet test --nologo -c Release -- RunConfiguration.TargetPlatform=x86
Write-Host "Testing Debug X64"
dotnet test --nologo -c Debug -- RunConfiguration.TargetPlatform=x64
Write-Host "Testing Release X64"
dotnet test --nologo -c Release -- RunConfiguration.TargetPlatform=x64

For x86 this will then fail with:

Test run detected DLL(s) which would use different framework and platform versions. Following DLL(s) do not match current settings, which are .NETCoreApp,Version=v7.0 framework and X86 platform.
TesterUnitTests.dll would use Framework .NETCoreApp,Version=v7.0 and Platform X64.

Again, this isn’t great. We need to be able to run both x64 and x86 without having to jump through hoops.

Perhaps if we add both win-x64 and win-x86 to a RuntimeIdentifiers property instead? So change

    <RuntimeIdentifier Condition="'$(RuntimeIdentifier)' == '' AND '$(PackAsTool)' != 'true'">$(NETCoreSdkRuntimeIdentifier)</RuntimeIdentifier>

to

    <RuntimeIdentifiers>win-x64;win-x86</RuntimeIdentifiers>

then run test-x86-x64.ps1. Now everything fails with the same exception:

System.DllNotFoundException: Unable to load DLL 'libclang' or one of its dependencies: 
The specified module could not be found. (0x8007007E)

According to https://learn.microsoft.com/en-us/dotnet/core/project-sdk/msbuild-props#runtimeidentifiers I should have defined the RIDs correctly. An example from there is:

<PropertyGroup>
  <RuntimeIdentifiers>win10-x64;osx.10.11-x64;ubuntu.16.04-x64</RuntimeIdentifiers>
</PropertyGroup>

Okay, perhaps running tests then needs to be done differently, and not with the RunConfiguration.TargetPlatform property? Let’s try to run the tests with --runtime instead, in a new script test-x86-x64-rid.ps1:

#!/usr/bin/env pwsh
Write-Host "Testing Debug win-x86"
dotnet test --nologo -c Debug --runtime win-x86
Write-Host "Testing Release win-x86"
dotnet test --nologo -c Release --runtime win-x86
Write-Host "Testing Debug win-x64" 
dotnet test --nologo -c Debug --runtime win-x64
Write-Host "Testing Release win-x64"
dotnet test --nologo -c Release --runtime win-x64

Then the tests succeed, albeit with the annoying warnings below.

C:\Program Files\dotnet\sdk\7.0.400-preview.23274.1\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.Sdk.targets(1142,5): 
warning NETSDK1179: One of '--self-contained' or '--no-self-contained' options are required when '--runtime' is used. [TesterUnitTests\TesterUnitTests.csproj]
C:\Program Files\dotnet\sdk\7.0.400-preview.23274.1\Sdks\Microsoft.NET.Sdk\targets\Microsoft.NET.Sdk.targets(1142,5): 
warning NETSDK1179: One of '--self-contained' or '--no-self-contained' options are required when '--runtime' is used. [Tester.csproj]

Why do I need to specify whether to be self-contained or not when I am just running tests? I am not publishing.

And are the tests really running x86 as expected? To test this I add two simple tests:
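The warning itself names --no-self-contained as one way to answer that question on the command line. A sketch of stating the same intent in the project file instead, using the SDK's SelfContained property; that this also silences NETSDK1179 on this SDK version is an assumption worth verifying, not something the output above confirms:

```xml
<PropertyGroup>
  <!-- Keep the output framework-dependent even when a RID is passed via --runtime. -->
  <SelfContained>false</SelfContained>
</PropertyGroup>
```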

    [TestMethod]
    public void X86() => Assert.AreEqual("win10-x86", RuntimeInformation.RuntimeIdentifier);
    [TestMethod]
    public void X64() => Assert.AreEqual("win10-x64", RuntimeInformation.RuntimeIdentifier);

and run the tests again. On win-x86 the X64 test fails as expected:

Assert.AreEqual failed. Expected:<win10-x64>. Actual:<win10-x86>.

and vice versa on win-x64:

Assert.AreEqual failed. Expected:<win10-x86>. Actual:<win10-x64>.

so at least that works as expected.

Let’s try running these tests from Visual Studio again. First, by setting processor architecture to x86. All tests except X86 fail, so this does switch the runtime identifier to win10-x86, but it does not fix the libclang problem.

System.DllNotFoundException: Unable to load DLL 'libclang' or one of its dependencies: 
The specified module could not be found. (0x8007007E)

So even though RIDs are now specified, this doesn’t work when running tests from VS? Switching to x64 in VS, only the X64 test passes, and the libclang dll still cannot be found, so now this doesn’t work either. The difference apparently is that there are now multiple RIDs, not just one.

The only way I think this can be resolved is to explicitly add those RID specific runtime packages after all, so the console project looks like:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <!--<PackageReference Include="libtorch-cuda-11.7-win-x64" Version="2.0.1.1" />-->
    <PackageReference Include="ClangSharp" Version="16.0.0" />
    <PackageReference Include="libclang.runtime.win-x64" Version="16.0.6" />
    <PackageReference Include="libclang.runtime.win-x86" Version="16.0.6" />
  </ItemGroup>
  
</Project>

Re-running the unit tests, libclang can now be loaded and that test succeeds. Trying the command line too gives the same result.
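One way to keep such explicit references manageable is to condition them, so that a machine only restores the RID specific packages it can actually run. A sketch, assuming a condition on the OS of the build machine is good enough (it says nothing about the target RID); the package ids and versions are the ones listed in the runtime.json shown earlier:

```xml
<!-- Sketch only: restore RID specific packages based on the OS of the build machine. -->
<ItemGroup Condition="$([MSBuild]::IsOSPlatform('Windows'))">
  <PackageReference Include="libclang.runtime.win-x64" Version="16.0.6" />
  <PackageReference Include="libclang.runtime.win-x86" Version="16.0.6" />
</ItemGroup>

<ItemGroup Condition="$([MSBuild]::IsOSPlatform('Linux'))">
  <PackageReference Include="libclang.runtime.linux-x64" Version="16.0.6" />
</ItemGroup>
```

This keeps Windows developers on both x86 and x64 while sparing them the Linux binaries, at the cost of a build output that differs per build machine.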

So after all this, it seems like the runtime.json way of packaging native libraries has its set of challenges: you basically end up having to explicitly add the RID specific packages anyway if you target multiple RIDs. In the process you then end up implicitly forcing the Any CPU build to no longer be framework-dependent but self-contained? This is all very confusing and hard to understand, and not least hard to convey to other developers.