Error starting embedded DCGM engine

I’m trying to run model-analyzer in Kubernetes, but it fails with the following error:

    Unhandled exception. System.TypeInitializationException: The type initializer for 'Triton.MemoryAnalyzer.Metrics.GpuMetrics' threw an exception.
     ---> System.InvalidOperationException: Error starting embedded DCGM engine. DCGM initialization error.
       at Triton.MemoryAnalyzer.Metrics.GpuMetrics..cctor()
       --- End of inner exception stack trace ---
       at Triton.MemoryAnalyzer.Metrics.GpuMetrics..ctor()
       at Triton.MemoryAnalyzer.MetricsCollector..ctor(MetricsCollectorConfig config)
       at Triton.MemoryAnalyzer.Program.<>c__DisplayClass7_0.<Main>b__2(K8sOptions options)
       at CommandLine.ParserResultExtensions.MapResult[T1,T2,TResult](ParserResult`1 result, Func`2 parsedFunc1, Func`2 parsedFunc2, Func`2 notParsedFunc)
       at Triton.MemoryAnalyzer.Program.Main(String[] args)

Has anyone seen this before?
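The trace shows the static constructor of `GpuMetrics` failing while it starts DCGM in embedded mode. The thread does not name a root cause. A common suspicion in Kubernetes is that the pod has no GPU access, or that the DCGM shared library cannot be loaded inside the container. The following is a minimal preflight sketch built on that assumption. The function name `dcgm_preflight` is hypothetical. The checks (device nodes under `/dev/nvidia*`, the `NVIDIA_VISIBLE_DEVICES` variable read by the NVIDIA container runtime, and whether `libdcgm` is visible to the dynamic loader) are illustrative guesses and do not call DCGM itself.

```python
import ctypes.util
import glob
import os


def dcgm_preflight(dev_glob="/dev/nvidia*", env=None):
    """Hypothetical helper: list container-level reasons embedded DCGM might fail.

    Assumption: these checks cover common causes (no GPU device nodes, no GPUs
    exposed by the container runtime, DCGM library not loadable). They do not
    start or query DCGM.
    """
    env = os.environ if env is None else env
    problems = []

    # GPU device nodes are normally injected into the container by the NVIDIA
    # container runtime when the pod is granted a GPU.
    if not glob.glob(dev_glob):
        problems.append(f"no device nodes match {dev_glob}")

    # The NVIDIA container runtime reads NVIDIA_VISIBLE_DEVICES; an empty value,
    # "void", or "none" means no GPU is exposed to the container.
    visible = env.get("NVIDIA_VISIBLE_DEVICES")
    if visible is None:
        problems.append("NVIDIA_VISIBLE_DEVICES is not set")
    elif visible.strip().lower() in ("", "void", "none"):
        problems.append(f"NVIDIA_VISIBLE_DEVICES={visible!r} exposes no GPUs")

    # Embedded mode runs the host engine inside the calling process,
    # so libdcgm must be loadable from that process.
    if ctypes.util.find_library("dcgm") is None:
        problems.append("libdcgm not found by the dynamic loader")

    return problems


if __name__ == "__main__":
    for problem in dcgm_preflight():
        print("possible cause:", problem)
```

You could run this inside the failing pod, for example through `kubectl exec`. An empty list rules out only these particular causes. It does not prove that DCGM will initialize.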

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 19 (1 by maintainers)

Top GitHub Comments

dyastremsky commented, Oct 30, 2020 (3 reactions)

@fabito Not yet. The Triton team is integrating this tool into the Triton universe, including a complete rewrite in C++. I briefly skimmed the code in that branch. My assumption is that it will support more current versions of Triton and better integrate into the experience you are accustomed to with Triton.

I’m not involved in that project, so I do not know the timeline. @deadeyegoodwin and @dzier would be better contacts for that. In the meantime, hopefully this version of Triton Memory Analyzer can provide you with approximate memory metrics.

dzier commented, Nov 24, 2020 (1 reaction)

We have just pushed the new Python rewrite to the main branch. Please try the newer version and see whether the issues persist. Note that we will officially release v1.0.0 of ModelAnalyzer in the 20.12 release of the Triton SDK, which is expected sometime in December.

Top Results From Across the Web

Administrative — NVIDIA DCGM Documentation latest ...
Start an embedded host engine agent within this process. ... DCGM_ST_UNINITIALIZED if DCGM has not been initialized with dcgmInit yet.

DCGM Samples
DCGM can be run in three different ways. Embedded Mode. In embedded mode, hostengine is started as part of the running process and...

TOOLS FOR MANAGING GPUs
Embedded into a daemon called NVIDIA Host Engine. DCGM clients prefer to interact with a daemon. Multiple clients wish to interact with DCGM,...

CVE-2022-21820 – NVIDIA DCGM contains a vulnerability in ...
To start an embedded host engine and check that it is publishing: ... dcgm_prometheus.py error AttributeError: 'DcgmPrometheus' object has ...

DCGM Error: unable to establish a connection to the ...
after install DCGM, When I enter that statement, I get the following error ... Error: Unable to connect to host engine.
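The results above contrast embedded mode, where the host engine runs inside the calling process, with a standalone NVIDIA Host Engine daemon. The last result reports "Unable to connect to host engine." When debugging the standalone case, a first step is to check whether anything is listening where the daemon is expected. The sketch below assumes `nv-hostengine` uses its default TCP port of 5555. The helper name `hostengine_reachable` is hypothetical. A successful connect shows only that some listener is present. It does not confirm a healthy DCGM host engine.

```python
import socket


def hostengine_reachable(host="127.0.0.1", port=5555, timeout=1.0):
    """Hypothetical helper: return True if a TCP listener accepts at host:port.

    Assumption: a standalone nv-hostengine listens on TCP port 5555 by default.
    This checks only TCP reachability. It does not speak the DCGM protocol.
    """
    try:
        # create_connection raises OSError (including timeouts and refusals)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("host engine port open:", hostengine_reachable())
```

If this returns False from inside the pod, the client cannot reach a standalone daemon at that address. That does not rule out embedded mode, which needs no daemon.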
