[BUG] Rossmann notebook still broken from recent sweeping changes to NVTabular

After the recent dataset and dask API compatibility changes, and after the recent fix to the Rossmann notebook (https://github.com/NVIDIA/NVTabular/pull/140) that made it run without runtime errors, convergence and final results are still much worse than they once were.

This is true for both the TensorFlow and fast.ai implementations, although the fast.ai implementation is affected more harshly.

It’s difficult to be more specific about this bug because I do not know the root cause, but I thought it was worth reporting so that someone familiar with the dataset / dask changes can investigate.

Steps to reproduce:

First, roll back to an old commit, before dataset / dask changes, but incorporate the changes to the Rossmann notebook that fixed bugs, improved organization, etc.:

# d44defa Refactor get_emb_sz (#110)
git checkout d44defa

# c179905 Merge pull request #123 from NVIDIA/vinhn-demo-notebook
git checkout c179905 -- examples/rossmann-store-sales-example.ipynb

In this setting, if we run the notebook 3 times, training is always stable, and we get final RMSPEs of

  • 0.19, 0.19, 0.21 for TensorFlow
  • 0.19, 0.22, 0.21 for fast.ai
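
For readers unfamiliar with the metric, here is a minimal sketch of RMSPE (root mean squared percentage error). It assumes the usual Kaggle Rossmann definition, in which rows with a true value of zero are skipped; the function name `rmspe` is illustrative, and the notebook's own implementation may differ in detail.

```python
import math


def rmspe(y_true, y_pred):
    """Root mean squared percentage error.

    Assumption: rows whose true value is 0 are skipped, since the
    percentage error is undefined there.
    """
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != 0]
    if not pairs:
        raise ValueError("RMSPE is undefined when every true value is zero")
    # Relative error per row, squared, then averaged before the square root.
    squared = [((t - p) / t) ** 2 for t, p in pairs]
    return math.sqrt(sum(squared) / len(squared))
```

Under this definition, predictions that are each off by 10% of the true value give an RMSPE of about 0.1, so the values reported here correspond to typical relative errors of roughly 20%.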

Next, come back to the more recent commit, after dataset / dask changes, and run the notebook:

# 294b480 Rossmann notebook fixes (#140)
git checkout 294b480

In this setting, if we run the notebook 3 times, training is no longer stable, and we get final RMSPEs of

  • 0.27, 0.29, 0.29 for TensorFlow
  • 0.69, 0.42, 0.48 for fast.ai
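
To make the regression easier to compare at a glance, the sketch below summarizes the RMSPEs reported in this issue by mean and sample standard deviation. The numbers are copied from the two lists in this report; the dictionary layout and the `summarize` helper are illustrative only. With three runs per setting, the spread is a rough indication rather than a reliable estimate.

```python
from statistics import mean, stdev

# Final RMSPEs reported in this issue, keyed by (commit, framework).
# "d44defa" is the older commit with the notebook taken from c179905.
runs = {
    ("d44defa", "TensorFlow"): [0.19, 0.19, 0.21],
    ("d44defa", "fast.ai"): [0.19, 0.22, 0.21],
    ("294b480", "TensorFlow"): [0.27, 0.29, 0.29],
    ("294b480", "fast.ai"): [0.69, 0.42, 0.48],
}


def summarize(values):
    """Return (mean, sample standard deviation) for one setting."""
    return mean(values), stdev(values)


summary = {key: summarize(vals) for key, vals in runs.items()}

for (commit, framework), (avg, spread) in summary.items():
    print(f"{commit} {framework:<10} mean={avg:.3f} stdev={spread:.3f}")
```

The mean RMSPE goes from about 0.20 to 0.28 for TensorFlow and from about 0.21 to 0.53 for fast.ai, which matches the observation that fast.ai is hit harder.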

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 9 (5 by maintainers)

Top GitHub Comments

rjzamora commented, Jul 21, 2020

Thanks for investigating @rdipietro. It seems reasonable to me that there could be a bug in one of the dask-based operations. There is not much “validation” in the unit tests, mostly high-level sanity checks.

I will try to investigate the pre-processing phase of the notebook later tonight or tomorrow and see if there are any obvious problems with the processed dataset. My understanding is that the dataset is small enough to do preprocessing with cudf/pandas alone. Is that right?

rdipietro commented, Jul 22, 2020

Thanks – I’ll try that
