CollectDumpOnTestSessionHang doesn't produce a dump file

Description

I’m trying to troubleshoot hanging builds on a CI server. I found this RFC, which seems very promising:

https://github.com/microsoft/vstest-docs/blob/master/RFCs/0028-BlameCollector-Hang-Detection.md

However, when I use the hang detector, I don’t get a dump file.

Steps to reproduce

The test hangs are intermittent, so they are hard to reproduce.

dotnet vstest is invoked with:

<lots of DLLs> --Parallel --logger:"trx;LogFileName=NUnitTestsCore.trx" --logger:"console;verbosity=minimal" --ResultsDirectory:.../build/test-reports --Settings:...\tmpCF7A.tmp

The settings file is auto generated and contains something like this:

<RunSettings>
  <RunConfiguration>
    <MaxCpuCount>4</MaxCpuCount>
  </RunConfiguration>
  <DataCollectionRunSettings>
    <DataCollectors>
      <DataCollector friendlyName="blame" enabled="True">
        <Configuration>
          <ResultsDirectory>...\build</ResultsDirectory>
          <CollectDumpOnTestSessionHang TestTimeout="120000" DumpType="full"/>
        </Configuration>
      </DataCollector>
    </DataCollectors>
  </DataCollectionRunSettings>    
</RunSettings>

Expected behavior

I expect the hang detector to detect a hang and produce a crash dump file.

Actual behavior

The hang detector did detect a hang after ~2 minutes:

The active test run was aborted. Reason: Test host process crashed
...
Test Run Aborted.
Attachments:
  ...\build\test-reports\4a680b77-23cd-471a-9b82-ead6630865fa\Sequence_af08f6cfd55f4dd5989add68f10ea91f.xml

However, it only produces a sequence file, not a crash dump.

Note that the sequence file ends up in the results directory passed on the command line, rather than in the results directory from the settings file.
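
Since the blame output may land in either directory, a CI step can search both before concluding that no dump was written. The following is a minimal Python sketch, not part of vstest: the function name `find_blame_artifacts` is hypothetical, and it assumes that dump files carry a `.dmp` extension. The `Sequence_*.xml` pattern follows the attachment name shown in the run output above.

```python
from pathlib import Path


def find_blame_artifacts(*roots):
    """Recursively collect blame sequence files and dump files under each root.

    Assumption: dumps use the .dmp extension. Sequence files are matched as
    Sequence_*.xml, like the attachment listed in the run output.
    Paths are resolved and de-duplicated because the roots may be nested.
    """
    sequences, dumps = set(), set()
    for root in roots:
        base = Path(root)
        if not base.is_dir():
            continue  # a directory that was never created has nothing to report
        sequences.update(p.resolve() for p in base.rglob("Sequence_*.xml"))
        dumps.update(p.resolve() for p in base.rglob("*.dmp"))
    return {"sequence": sorted(sequences), "dump": sorted(dumps)}


if __name__ == "__main__":
    import sys

    # Pass both the command-line results directory and the one from the settings file.
    for kind, paths in find_blame_artifacts(*sys.argv[1:]).items():
        print(f"{kind}: {len(paths)} file(s)")
        for path in paths:
            print(f"  {path}")
```

Running the script with both directories as arguments lists whatever the collector produced; an empty `dump` list alongside a non-empty `sequence` list matches the behavior described in this issue.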

Diagnostic logs

None produced by the above command.

Environment

Windows Server 2012, .NET Core version 3.0.100

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 22 (11 by maintainers)

Top GitHub Comments

2 reactions
provegard commented, Apr 8, 2020

Problem solved; the part about the 4.6 targeting pack was a red herring. Installing VS extension development support in VS2017 did it.

1 reaction
nohwnd commented, Jul 16, 2020

Actually, there is a lot. In the latest net5.0 release (I think since preview6), we are leveraging the Diagnostics NetCore client to create hang dumps. This works on Windows (with any target framework) and on Linux (with netcoreapp3.1 and newer). There is no need for procdump.exe when creating hang dumps, or for the temporary folder.

To trigger a hang dump you can now simply do: dotnet test --blame-hang-timeout 2min or vstest.console /Blame:"CollectHangDump;TestTimeout=2min".

For crash dumps the situation is similar to before, but it errors out a bit better. There you still need procdump, because that flow needs to attach to a running process and detect failure, which is no easy task. Luckily, crash dumps are usually far less interesting than hang dumps, because when the process crashes, the reason is often easy to see.
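
Because crash dumps still depend on procdump, a CI script can check for the tools before enabling --blame-crash. The sketch below is a hypothetical preflight helper written in Python, not something vstest ships. It approximates the rule from the dotnet test help text quoted in this comment (both executables on PATH, or in the directory named by PROCDUMP_PATH); the blame collector's own lookup may differ in detail.

```python
import os
import shutil
from pathlib import Path

# Executable names taken from the dotnet test help text.
PROCDUMP_NAMES = ("procdump.exe", "procdump64.exe")


def missing_procdump_tools(env=None):
    """Return the procdump executables that cannot be located.

    A tool counts as found if it is a file in the PROCDUMP_PATH directory
    or can be resolved through PATH. This is an approximation for CI use.
    """
    env = os.environ if env is None else env
    procdump_dir = env.get("PROCDUMP_PATH")
    search_path = env.get("PATH", "")
    missing = []
    for name in PROCDUMP_NAMES:
        in_procdump_dir = bool(procdump_dir) and (Path(procdump_dir) / name).is_file()
        on_path = shutil.which(name, path=search_path) is not None
        if not (in_procdump_dir or on_path):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_procdump_tools()
    if missing:
        print("--blame-crash may not be able to collect dumps; missing:", ", ".join(missing))
    else:
        print("procdump tools found.")
```

Passing a dictionary as `env` makes the check easy to exercise without touching the real environment.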

From dotnet test help:

  --blame                                  Runs the tests in blame mode. This option is helpful in isolating problematic tests that cause the test host to crash or hang.
                                           When a crash is detected, it creates an sequence file in TestResults/guid/guid_Sequence.xml that captures the order of tests that were run before the crash.
                                           Based on the additional settings, hang dump or crash dump can also be collected.
                                           Example:
                                           Timeout the test run when test takes more than the default timeout of 1 hour, and collect crash dump when the test host exits unexpectedly.
                                           (Crash dumps require additional setup, see below.)
                                           dotnet test --blame-hang --blame-crash
                                           Example:
                                           Timeout the test run when a test takes more than 20 minutes and collect hang dump.
                                           dotnet test --blame-hang-timeout 20min
  --blame-crash                            Runs the tests in blame mode and enables collecting crash dump when testhost exits unexpectedly.
                                           This option is currently only supported on Windows, and requires procdump.exe and procdump64.exe to be available in PATH.
                                           Or PROCDUMP_PATH environment variable to be set, and point to a directory that contains procdump.exe and procdump64.exe.
                                           The tools can be downloaded here: https://docs.microsoft.com/en-us/sysinternals/downloads/procdump
                                           Implies --blame.
  --blame-crash-dump-type <DUMP_TYPE>      The type of crash dump to be collected. Implies --blame-crash.
  --blame-crash-collect-always             Enables collecting crash dump on expected as well as unexpected testhost exit.
  --blame-hang                             Run the tests in blame mode and enables collecting hang dump when test exceeds the given timeout. Implies --blame-hang.
  --blame-hang-dump-type <DUMP_TYPE>       The type of crash dump to be collected. When None, is used then test host is terminated on timeout, but no dump is collected. Implies --blame-hang.
  --blame-hang-timeout <TIMESPAN>          Per-test timeout, after which hang dump is triggered and the testhost process is terminated.
                                           The timeout value is specified in the following format: 1.5h / 90m / 5400s / 5400000ms. When no unit is used (e.g. 5400000), the value is assumed to be in milliseconds.
                                           When used together with data driven tests, the timeout behavior depends on the test adapter used. For xUnit and NUnit the timeout is renewed after every test case,
                                           For MSTest, the timeout is used for all testcases.
                                           This option is currently supported only on Windows together with netcoreapp2.1 and newer. And on Linux with netcoreapp3.1 and newer. OSX and UWP are not supported.
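
The timeout format described in the help text can be illustrated with a short conversion routine. This is a Python sketch of how the documented units relate to each other, not the parser vstest actually uses; the function name `timeout_to_ms` is hypothetical, and only the suffixes that appear in the help text and examples (h, m, min, s, ms) are handled.

```python
import re

# Multipliers to milliseconds for the suffixes seen in the help text and
# examples. Any other spelling is rejected by this sketch.
_UNIT_TO_MS = {"h": 3_600_000, "m": 60_000, "min": 60_000, "s": 1_000, "ms": 1}

_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*$", re.IGNORECASE)


def timeout_to_ms(value):
    """Convert a timeout such as '1.5h', '90m', '5400s' or '5400000' to milliseconds."""
    match = _PATTERN.match(value)
    if not match:
        raise ValueError(f"unrecognized timeout: {value!r}")
    number, unit = match.group(1), match.group(2).lower()
    if unit == "":
        unit = "ms"  # the help text says a bare number is taken as milliseconds
    if unit not in _UNIT_TO_MS:
        raise ValueError(f"unrecognized unit in timeout: {value!r}")
    return round(float(number) * _UNIT_TO_MS[unit])


if __name__ == "__main__":
    for text in ("1.5h", "90m", "5400s", "5400000ms", "5400000", "2min"):
        print(text, "->", timeout_to_ms(text), "ms")
```

Under this reading, the `TestTimeout="120000"` value in the original settings file and the `2min` used on the command line describe the same two-minute timeout.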