
[BUG] Cannot load an exported deepfm model with NGC 22.03 inference container

See original GitHub issue

I run into the following errors:

I0318 00:00:18.082645 172 hugectr.cc:1926] TRITONBACKEND_ModelInstanceInitialize: deepfm_0 (device 0)
I0318 00:00:18.082694 172 hugectr.cc:1566] Triton Model Instance Initialization on device 0
I0318 00:00:18.082792 172 hugectr.cc:1576] Dense Feature buffer allocation:
I0318 00:00:18.083026 172 hugectr.cc:1583] Categorical Feature buffer allocation:
I0318 00:00:18.083095 172 hugectr.cc:1601] Categorical Row Index buffer allocation:
I0318 00:00:18.083143 172 hugectr.cc:1611] Predict result buffer allocation:
I0318 00:00:18.083203 172 hugectr.cc:1939] ******Loading HugeCTR Model******
I0318 00:00:18.083217 172 hugectr.cc:1631] The model origin json configuration file path is: /ensemble_models/deepfm/1/deepfm.json
[HCTR][00:00:18][INFO][RK0][main]: Global seed is 1305961709
[HCTR][00:00:19][WARNING][RK0][main]: Peer-to-peer access cannot be fully enabled.
[HCTR][00:00:19][INFO][RK0][main]: Start all2all warmup
[HCTR][00:00:19][INFO][RK0][main]: End all2all warmup
[HCTR][00:00:19][INFO][RK0][main]: Create inference session on device: 0
[HCTR][00:00:19][INFO][RK0][main]: Model name: deepfm
[HCTR][00:00:19][INFO][RK0][main]: Use mixed precision: False
[HCTR][00:00:19][INFO][RK0][main]: Use cuda graph: True
[HCTR][00:00:19][INFO][RK0][main]: Max batchsize: 64
[HCTR][00:00:19][INFO][RK0][main]: Use I64 input key: True
[HCTR][00:00:19][INFO][RK0][main]: start create embedding for inference
[HCTR][00:00:19][INFO][RK0][main]: sparse_input name data1
[HCTR][00:00:19][INFO][RK0][main]: create embedding for inference success
[HCTR][00:00:19][INFO][RK0][main]: Inference stage skip BinaryCrossEntropyLoss layer, replaced by Sigmoid layer
I0318 00:00:19.826815 172 hugectr.cc:1639] ******Loading HugeCTR model successfully
I0318 00:00:19.827763 172 model_repository_manager.cc:1149] successfully loaded 'deepfm' version 1
E0318 00:00:19.827767 172 model_repository_manager.cc:1152] failed to load 'deepfm_nvt' version 1: Internal: TypeError: 'NoneType' object is not subscriptable

At:
  /ensemble_models/deepfm_nvt/1/model.py(91): _set_output_dtype
  /ensemble_models/deepfm_nvt/1/model.py(76): initialize

E0318 00:00:19.827960 172 model_repository_manager.cc:1332] Invalid argument: ensemble 'deepfm_ens' depends on 'deepfm_nvt' which has no loaded version
I0318 00:00:19.828048 172 server.cc:522]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0318 00:00:19.828117 172 server.cc:549]
+---------+---------------------------------------------------------+-----------------------------------------------+
| Backend | Path                                                    | Config                                        |
+---------+---------------------------------------------------------+-----------------------------------------------+
| hugectr | /opt/tritonserver/backends/hugectr/libtriton_hugectr.so | {"cmdline":{"ps":"/ensemble_models/ps.json"}} |
+---------+---------------------------------------------------------+-----------------------------------------------+

I0318 00:00:19.828209 172 server.cc:592]
+------------+---------+--------------------------------------------------------------------------+
| Model      | Version | Status                                                                   |
+------------+---------+--------------------------------------------------------------------------+
| deepfm     | 1       | READY                                                                    |
| deepfm_nvt | 1       | UNAVAILABLE: Internal: TypeError: 'NoneType' object is not subscriptable |
|            |         |                                                                          |
|            |         | At:                                                                      |
|            |         |   /ensemble_models/deepfm_nvt/1/model.py(91): _set_output_dtype          |
|            |         |   /ensemble_models/deepfm_nvt/1/model.py(76): initialize                 |
+------------+---------+--------------------------------------------------------------------------+

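The `UNAVAILABLE` status above means the ensemble cannot come up because one of its steps failed to initialize. A common cause for this class of error is a mismatch between the outputs a step actually produces and the outputs declared in its `config.pbtxt`. A quick sanity check can be sketched as follows (the column names are placeholders, not the real `deepfm_nvt` configuration):

```python
# Hedged sketch: cross-check the outputs declared in a step's
# config.pbtxt against the outputs the model/workflow actually emits.
# The names below are illustrative placeholders; substitute your own.
declared_outputs = ["DES", "CATCOLUMN", "ROWINDEX"]  # from config.pbtxt
produced_outputs = ["DES", "CATCOLUMN"]              # from the workflow

missing = [name for name in declared_outputs if name not in produced_outputs]
if missing:
    print(f"Declared in config.pbtxt but not produced: {missing}")
```

Any name reported here is a candidate for the `None` lookup that crashes `initialize()`.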
I0318 00:00:19.845925 172 metrics.cc:623] Collecting metrics for GPU 0: Tesla T4
I0318 00:00:19.846404 172 tritonserver.cc:1932]
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                              |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                             |
| server_version                   | 2.19.0                                                                                                                             |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_mem |
|                                  | ory cuda_shared_memory binary_tensor_data statistics trace                                                                         |
| model_repository_path[0]         | /ensemble_models                                                                                                                   |
| model_control_mode               | MODE_NONE                                                                                                                          |
| strict_model_config              | 1                                                                                                                                  |
| rate_limit                       | OFF                                                                                                                                |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                          |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                           |
| response_cache_byte_size         | 0                                                                                                                                  |
| min_supported_compute_capability | 6.0                                                                                                                                |
| strict_readiness                 | 1                                                                                                                                  |
| exit_timeout                     | 30                                                                                                                                 |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------------------+


Aha! Link: https://nvaiinfa.aha.io/features/MERLIN-818

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 26 (10 by maintainers)

Top GitHub Comments

2 reactions
bschifferer commented, Apr 22, 2022

I tested the Criteo HugeCTR Inference Example and it worked for me

1 reaction
mengdong commented, Apr 28, 2022

yes please

On Thu, Apr 28, 2022 at 14:53 viswa-nvidia wrote:

> @mengdong, can we close this issue? @sohn21c for viz.

(via GitHub: https://github.com/NVIDIA-Merlin/Merlin/issues/125#issuecomment-1112688473)
