[BUG] Error running ETL with NVTabular tutorial notebook -- version compatibility issue?
Bug description
When running 02-ETL-with-NVTabular.ipynb on REES46 data, the following error was reported:
Failed to transform operator <nvtabular.ops.groupby.Groupby object at 0x7f62fed519a0>
Traceback (most recent call last):
File "/home/hwan/miniconda3/envs/trans4rec/lib/python3.9/site-packages/nvtabular/workflow/workflow.py", line 519, in _transform_partition
output_df = node.op.transform(selection, input_df)
File "/home/hwan/miniconda3/envs/trans4rec/lib/python3.9/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/home/hwan/miniconda3/envs/trans4rec/lib/python3.9/site-packages/nvtabular/ops/groupby.py", line 132, in transform
new_df = _apply_aggs(
File "/home/hwan/miniconda3/envs/trans4rec/lib/python3.9/site-packages/nvtabular/ops/groupby.py", line 244, in _apply_aggs
df[f"{col}{name_sep}{_agg}"] = _first_or_last(
File "/home/hwan/miniconda3/envs/trans4rec/lib/python3.9/site-packages/nvtabular/ops/groupby.py", line 288, in _first_or_last
return _first(x)
File "/home/hwan/miniconda3/envs/trans4rec/lib/python3.9/site-packages/nvtabular/ops/groupby.py", line 302, in _first
return elements[offsets[:-1]]
TypeError: 'NumericalColumn' object is not subscriptable
....
File ~/miniconda3/envs/trans4rec/lib/python3.9/site-packages/nvtabular/ops/groupby.py:302, in _first(x)
300 offsets = x.list._column.offsets
301 elements = x.list._column.elements
--> 302 return elements[offsets[:-1]]
303 else:
304 # cpu/pandas
305 return x.apply(lambda y: y[0])
TypeError: 'NumericalColumn' object is not subscriptable
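For readers unfamiliar with the layout that the failing line relies on: a list column is stored as one flat `elements` array plus an `offsets` array marking where each row starts and ends, so `elements[offsets[:-1]]` picks the first element of every row. The following is a minimal pure-Python sketch of that idea. Plain lists stand in for the cuDF columns, and `first_per_row` and `last_per_row` are hypothetical helper names for illustration, not NVTabular functions.

```python
# Flattened representation of the list column [[10, 11], [20, 21, 22], [30]].
elements = [10, 11, 20, 21, 22, 30]
offsets = [0, 2, 5, 6]  # row i spans elements[offsets[i]:offsets[i + 1]]


def first_per_row(elements, offsets):
    """Return the first element of every row (assumes no row is empty)."""
    return [elements[start] for start in offsets[:-1]]


def last_per_row(elements, offsets):
    """Return the last element of every row (assumes no row is empty)."""
    return [elements[end - 1] for end in offsets[1:]]


first_values = first_per_row(elements, offsets)
last_values = last_per_row(elements, offsets)
print(first_values)  # [10, 20, 30]
print(last_values)   # [11, 22, 30]
```

In the traceback, the same indexing step fails because the `elements` object is a `NumericalColumn`, which, per the error message, does not support `[]` in the installed combination of libraries.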
Steps/Code to reproduce bug
- Load the REES46 Oct-2019.parquet file into examples/tutorial/02-ETL-with-NVTabular.ipynb.
- Run the cells until reaching the cell right beneath "Initialize the NVTabular dataset object and workflow graph".
Expected behavior
The workflow is expected to fit and transform the data without error.
Environment details
- Transformers4Rec version: The current codebase from Github (commit 0fd858dcee5ce5b2508f7e91733ae68710affd76)
- Platform: Ubuntu 18.04
- Python version: 3.9
- Huggingface Transformers version: 4.18
- PyTorch version (GPU?): 1.12 (GPU)
- Tensorflow version (GPU?):
- NVTabular version: 1.3.3
- CUDF version: 22.08 (cuda version, from rapidsai)
Additional context
I suspect this is a compatibility issue between the NVTabular and cuDF versions. I would much appreciate your help. Thanks!
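When a version mismatch is suspected, it can help to collect the installed versions programmatically instead of by hand. The sketch below uses only the standard library's `importlib.metadata`. The distribution names in the list are assumptions (a conda-installed package may be registered under a different name or not at all), and `installed_versions` is a hypothetical helper, not part of any of these libraries.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed distribution names; adjust to match your environment.
packages = ["nvtabular", "cudf", "transformers4rec", "torch", "transformers"]
report = installed_versions(packages)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Pasting this output into a bug report gives maintainers the same information as the Environment details list above.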
Issue Analytics
- State:
- Created a year ago
- Comments:7 (3 by maintainers)
Top GitHub Comments
I believe this issue is fixed by https://github.com/NVIDIA-Merlin/NVTabular/pull/1654, which will be in the next release of NVTabular.
Update: The issue went away with the installation of the latest NVTabular version 1.4.0 released on 09/06/2022.
Thank you for your help @benfred @rnyak!
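Since the error was reported with NVTabular 1.3.3 and went away with 1.4.0, a notebook could guard against the older version before running the workflow. This is a minimal sketch under that assumption. `version_tuple` and `has_fix` are hypothetical helpers, the parser only looks at the leading numeric components (pre-release suffixes are ignored), and the 1.4.0 threshold comes from the update above, not from NVTabular's documentation.

```python
import re


def version_tuple(version):
    """Parse leading numeric components, e.g. '1.3.3' -> (1, 3, 3), '1.4' -> (1, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(part) for part in match.group(0).split("."))
    # Pad to three components so that '1.4' compares equal to '1.4.0'.
    return parts + (0,) * max(0, 3 - len(parts))


def has_fix(installed, minimum="1.4.0"):
    """Return True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)


print(has_fix("1.3.3"))  # False
print(has_fix("1.4.0"))  # True
```

In practice the `installed` argument could come from `nvtabular.__version__` or `importlib.metadata.version("nvtabular")`, assuming the package is importable in the environment.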