Error starting embedded DCGM engine
I'm trying to run model-analyzer in Kubernetes, but it is failing with the following error:
Unhandled exception. System.TypeInitializationException: The type initializer for 'Triton.MemoryAnalyzer.Metrics.GpuMetrics' threw an exception.
 ---> System.InvalidOperationException: Error starting embedded DCGM engine. DCGM initialization error.
   at Triton.MemoryAnalyzer.Metrics.GpuMetrics..cctor()
   --- End of inner exception stack trace ---
   at Triton.MemoryAnalyzer.Metrics.GpuMetrics..ctor()
   at Triton.MemoryAnalyzer.MetricsCollector..ctor(MetricsCollectorConfig config)
   at Triton.MemoryAnalyzer.Program.<>c__DisplayClass7_0.<Main>b__2(K8sOptions options)
   at CommandLine.ParserResultExtensions.MapResult[T1,T2,TResult](ParserResult`1 result, Func`2 parsedFunc1, Func`2 parsedFunc2, Func`2 notParsedFunc)
   at Triton.MemoryAnalyzer.Program.Main(String[] args)
stream closed
Has anyone seen this before?
Issue Analytics
- Created 3 years ago
- Comments: 19 (1 by maintainers)
Top GitHub Comments
@fabito Not yet. The Triton team is integrating this tool into the Triton universe, including a complete rewrite in C++. I briefly skimmed the code in that branch. My assumption is that it will support more current versions of Triton and better integrate into the experience you are accustomed to with Triton.
I’m not involved in that project, so I do not know the timeline. @deadeyegoodwin and @dzier would be better contacts for that. In the meantime, hopefully this version of Triton Memory Analyzer can provide you with approximate memory metrics.
We have just pushed out the new rewrite in Python to the main branch. Please try the newer version and see if the issues still persist. Note that we will be officially releasing v1.0.0 of ModelAnalyzer in the 20.12 release of the Triton SDK, which will be sometime in December.