[BUG] Mismatch between total number of items and length of prediction logit values (end-to-end-session-based notebook)
Bug description
In the example here (using the Tensorflow approach), after we perform ETL with NVTabular, train the model with Tensorflow, and deploy it to Triton Inference Server, there is a mismatch in the length of the predictions that makes it impossible (?) to interpret the prediction results. We have a total of 52739 item ids. However, if we try to predict recommended items for a few (new) sessions, the length of the logit vector is 52743. Therefore:
total number of existing items (52739) < total number of recommended options (52743)
(Screenshot: confirming the total number of items)
(Screenshot: checking the total number of predictions/recommendations)
Could you please clarify whether this is expected? As far as I understood, the displayed number is supposed to be the index of the ITEM ID to be recommended, so the full length of the predictions should match the number of items.
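The arithmetic behind the mismatch can be sketched as follows. The sizing rule (prediction head size = schema domain max + 1) is an assumption based on the maintainers' reply further down, not something stated in the notebook itself:

```python
# Hypothetical sketch of why the logit length differs from the item count.
# The numbers come from the issue; the sizing rule is an assumption.
schema_item_id_max = 52742          # max hard-coded for item-id in schema_demo.pb
n_logits = schema_item_id_max + 1   # prediction head sized from the schema domain
n_unique_items = 52739              # unique item ids actually in the dataset

print(n_logits)                     # 52743 — the logit length observed in the issue
print(n_logits - n_unique_items)    # 4 logit positions that map to no real item
```

Under this assumption, the 4 extra logit positions correspond to ids in the schema's declared domain that never occur in the data, which is why they cannot be resolved back to items.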
Steps/Code to reproduce bug
- Follow the example here (using the Tensorflow approach)
- Try to resolve the predictions: compare the length of the logit vector in the prediction result with the total number of items in the initial dataset
Expected behavior
The full length of the predicted logit values equals the number of items, so that we can understand/resolve the predicted items.
Environment details
- Transformers4Rec version: 0.1.8
- Platform: GCP - Linux (4vCPU, 15GB RAM, 1 Tesla T4 GPU)
- Python version: 3.8.10
- Huggingface Transformers version:
- PyTorch version (GPU?):
- Tensorflow version (GPU?): 2.8.0+nv22.4 - with GPU enabled
Additional context
- Used the nvcr.io/nvidia/merlin/merlin-training:22.05 image for training
- Used nvcr.io/nvidia/merlin/merlin-tensorflow-inference:22.05 for serving on the Inference Server
Issue Analytics
- State:
- Created a year ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
@hosseinkalbasi If you call to_parquet() after fit_transform on the nvtabular.workflow object, it will auto-generate a schema file stored in output_dir.

@hosseinkalbasi this is not really a bug. This is because we set the max num for the item-id column as follows in the schema_demo.pb. This schema file was created manually, so it does not read the unique items from the NVT pipeline. Please go to the schema file and replace 52742 with 52739, and you should be getting +1 item_ids predicted for each session id. Hope that helps.
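For reference, the entry to edit in schema_demo.pb looks roughly like the text-format fragment below. The exact field layout is an assumption based on the tensorflow_metadata schema proto that Merlin schema files follow; only the domain max needs to change:

```protobuf
feature {
  name: "item_id"
  type: INT
  int_domain {
    name: "item_id"
    min: 1
    max: 52742    # manually-set value; replace with 52739 per the maintainer's advice
    is_categorical: true
  }
}
```

Regenerating the schema via the workflow's to_parquet() output, as suggested above, avoids this manual edit entirely, since the domain is then derived from the unique items seen during fit.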