[BUG] Error running ETL with NVTabular tutorial notebook -- version compatibility issue?


Bug description

When running 02-ETL-with-NVTabular.ipynb on REES46 data, the following error was reported:

Failed to transform operator <nvtabular.ops.groupby.Groupby object at 0x7f62fed519a0>
Traceback (most recent call last):
  File "/home/hwan/miniconda3/envs/trans4rec/lib/python3.9/site-packages/nvtabular/workflow/workflow.py", line 519, in _transform_partition
    output_df = node.op.transform(selection, input_df)
  File "/home/hwan/miniconda3/envs/trans4rec/lib/python3.9/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/home/hwan/miniconda3/envs/trans4rec/lib/python3.9/site-packages/nvtabular/ops/groupby.py", line 132, in transform
    new_df = _apply_aggs(
  File "/home/hwan/miniconda3/envs/trans4rec/lib/python3.9/site-packages/nvtabular/ops/groupby.py", line 244, in _apply_aggs
    df[f"{col}{name_sep}{_agg}"] = _first_or_last(
  File "/home/hwan/miniconda3/envs/trans4rec/lib/python3.9/site-packages/nvtabular/ops/groupby.py", line 288, in _first_or_last
    return _first(x)
  File "/home/hwan/miniconda3/envs/trans4rec/lib/python3.9/site-packages/nvtabular/ops/groupby.py", line 302, in _first
    return elements[offsets[:-1]]
TypeError: 'NumericalColumn' object is not subscriptable
....
File ~/miniconda3/envs/trans4rec/lib/python3.9/site-packages/nvtabular/ops/groupby.py:302, in _first(x)
    300     offsets = x.list._column.offsets
    301     elements = x.list._column.elements
--> 302     return elements[offsets[:-1]]
    303 else:
    304     # cpu/pandas
    305     return x.apply(lambda y: y[0])

TypeError: 'NumericalColumn' object is not subscriptable
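
The traceback shows the GPU branch of `_first` reading a flat `elements` buffer and an `offsets` array from a list column, then indexing one with the other. The sketch below reproduces that indexing pattern in plain Python, with made-up data and ordinary lists standing in for the cuDF column objects. It is an illustration of the pattern only, not NVTabular's code; the helper names `first_per_list` and `last_per_list` are hypothetical, and every list is assumed to be non-empty.

```python
# Stand-in for a list column holding [[10, 11, 12], [20], [30, 31]].
# `elements` is the flattened data; `offsets` marks where each list starts,
# with one trailing entry for the end of the last list.
elements = [10, 11, 12, 20, 30, 31]
offsets = [0, 3, 4, 6]


def first_per_list(elements, offsets):
    """Return the first element of each list (assumes no empty lists)."""
    # Mirrors `elements[offsets[:-1]]`: every offset except the last
    # points at the first element of a list.
    return [elements[start] for start in offsets[:-1]]


def last_per_list(elements, offsets):
    """Return the last element of each list (assumes no empty lists)."""
    # Every offset except the first is one past the end of a list.
    return [elements[end - 1] for end in offsets[1:]]


print(first_per_list(elements, offsets))
print(last_per_list(elements, offsets))
```

The pattern requires `elements` to support `[]` indexing. The `TypeError` above reports that the `NumericalColumn` object returned in this environment does not.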

Steps/Code to reproduce bug

  1. Load the REES46 Oct-2019.parquet data into examples/tutorial/02-ETL-with-NVTabular.ipynb.
  2. Run the cells up to and including the cell directly beneath "Initialize the NVTabular dataset object and workflow graph".

Expected behavior

The workflow is expected to fit and transform the data without error.

Environment details

  • Transformers4Rec version: The current codebase from GitHub (commit 0fd858dcee5ce5b2508f7e91733ae68710affd76)
  • Platform: Ubuntu 18.04
  • Python version: 3.9
  • Hugging Face Transformers version: 4.18
  • PyTorch version (GPU?): 1.12 (GPU)
  • TensorFlow version (GPU?):
  • NVTabular version: 1.3.3
  • cuDF version: 22.08 (CUDA version, from rapidsai)
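
When reporting a suspected version mismatch like this one, it can help to collect the installed versions programmatically rather than by hand. The following is a minimal sketch using only the standard library. The distribution names in `PACKAGES` are assumptions about how each package is registered in the environment; conda installs may register a package under a different name or not expose metadata at all, in which case it is reported as not installed.

```python
from importlib import metadata

# Assumed distribution names; adjust to match the environment.
PACKAGES = ["nvtabular", "cudf", "transformers4rec", "transformers", "torch"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # No metadata found under this name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```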

Additional context

I suspect this is a compatibility issue between the NVTabular and cuDF versions. I would much appreciate your help. Thanks!

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 7 (3 by maintainers)

Top GitHub Comments

benfred commented, Sep 1, 2022 (2 reactions)

I believe this issue is fixed by https://github.com/NVIDIA-Merlin/NVTabular/pull/1654. The fix will be in the next release of NVTabular.

hui-wan commented, Sep 14, 2022 (0 reactions)

Update: The issue went away after installing the latest NVTabular release, version 1.4.0, released on 09/06/2022.

Thank you for your help @benfred @rnyak!
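
For anyone hitting the same error, a quick way to check whether an environment predates the release mentioned above is to compare the installed NVTabular version against 1.4.0. The sketch below is a rough illustration: the 1.4.0 threshold comes from the comment above, the helper names `version_tuple` and `has_fix` are hypothetical, and the parser is a simplification that ignores pre-release and local suffixes (so `1.4.0rc1` is treated as `1.4.0`).

```python
from importlib import metadata

# Release reported in this thread as resolving the error.
FIXED_IN = "1.4.0"


def version_tuple(version):
    """Parse the leading numeric components of a version string."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(installed, fixed_in=FIXED_IN):
    """Return True if `installed` is at or above `fixed_in`."""
    return version_tuple(installed) >= version_tuple(fixed_in)


if __name__ == "__main__":
    try:
        installed = metadata.version("nvtabular")  # assumed distribution name
    except metadata.PackageNotFoundError:
        print("nvtabular is not installed")
    else:
        status = "includes" if has_fix(installed) else "predates"
        print(f"nvtabular {installed} {status} release {FIXED_IN}")
```

The versions are compared as integer tuples rather than strings so that, for example, `1.10.0` sorts after `1.4.0`.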


