[BUG] Rossmann notebook still broken from recent sweeping changes to NVTabular
After the recent dataset changes / dask API compatibility changes, and after the recent fix to the Rossmann notebook (https://github.com/NVIDIA/NVTabular/pull/140), which made the notebook run without runtime errors, convergence and final results are still much worse than they used to be.
This is true for both the TensorFlow and fast.ai implementations, although the fast.ai implementation is affected more severely.
It’s difficult to be more specific because I do not know the root cause, but it seems worth reporting so that someone familiar with the dataset / dask changes can investigate.
Steps to reproduce:
First, roll back to an old commit from before the dataset / dask changes, but bring in the later changes to the Rossmann notebook that fixed bugs, improved organization, etc.:
```
# d44defa Refactor get_emb_sz (#110)
git checkout d44defa

# c179905 Merge pull request #123 from NVIDIA/vinhn-demo-notebook
git checkout c179905 -- examples/rossmann-store-sales-example.ipynb
```
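To run the notebook non-interactively for the repeated runs below, one option is nbclient. This is only a sketch, assuming nbclient and nbformat are installed and that the notebook path matches the repo layout at this commit:

```python
import nbformat
from nbclient import NotebookClient

# Path taken from the checkout above; adjust if the repo layout differs.
nb = nbformat.read("examples/rossmann-store-sales-example.ipynb", as_version=4)

# timeout=None disables the per-cell timeout so long training cells can finish.
NotebookClient(nb, timeout=None).execute()

# Save the executed copy so the printed RMSPE values can be inspected afterwards.
nbformat.write(nb, "rossmann-executed.ipynb")
```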
In this setting, if we run the notebook 3 times, training is always stable, and we get final RMSPEs of
- 0.19, 0.19, 0.21 for TensorFlow
- 0.19, 0.22, 0.21 for fast.ai
Next, check out the more recent commit, after the dataset / dask changes, and run the notebook:
```
# 294b480 Rossmann notebook fixes (#140)
git checkout 294b480
```
In this setting, if we run the notebook 3 times, training is no longer stable, and we get final RMSPEs of
- 0.27, 0.29, 0.29 for TensorFlow
- 0.69, 0.42, 0.48 for fast.ai
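For reference, RMSPE here is the root mean squared percentage error used for the Rossmann Kaggle competition. A minimal NumPy sketch of how such numbers can be computed (variable names are illustrative, not taken from the notebook):

```python
import numpy as np

def rmspe(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared percentage error: sqrt(mean(((y - yhat) / y) ** 2)).

    Rows with y_true == 0 are ignored, following the usual Kaggle convention.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    mask = y_true != 0
    pct_err = (y_true[mask] - y_pred[mask]) / y_true[mask]
    return float(np.sqrt(np.mean(pct_err ** 2)))

# Illustrative usage with made-up numbers:
print(rmspe(np.array([100.0, 200.0, 300.0]), np.array([110.0, 180.0, 330.0])))
```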
Top GitHub Comments
Thanks for investigating, @rdipietro. It seems reasonable to me that there could be a bug in one of the dask-based operations. There is not much "validation" in the unit tests; they are mostly high-level sanity checks.
I will try to investigate the pre-processing phase of the notebook later tonight or tomorrow and see if there are any obvious problems with the processed dataset. My understanding is that the dataset is small enough to do preprocessing with cudf/pandas alone. Is that right?
Thanks – I’ll try that
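A minimal sketch of the kind of pandas-only sanity check suggested above, assuming the dataset fits in memory. The file paths, the column name, and the choice of transform (a standard-score normalization of "Sales") are assumptions for illustration, not the notebook's actual code:

```python
import pandas as pd

# Paths are placeholders; point them at the raw CSV and the NVTabular output.
raw = pd.read_csv("rossmann/train.csv")
processed = pd.read_parquet("rossmann/output/train.parquet")

# If "Sales" was standard-score normalized, the processed column should have
# roughly zero mean and unit variance, and the same row count as the input.
print("rows:", len(raw), len(processed))
print("processed mean/std:", processed["Sales"].mean(), processed["Sales"].std())

# An independently computed reference, for eyeballing the scale:
mean, std = raw["Sales"].mean(), raw["Sales"].std()
print("raw-derived mean/std used for normalization:", mean, std)
```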