[BUG] Rossmann notebook not hitting expected RMSPE (again)
Describe the bug
Rossmann convergence and final RMSPEs are again considerably worse than they once were.
I’m separating this out from https://github.com/NVIDIA/NVTabular/issues/146 because that issue was resolved by @rjzamora's PR (I confirmed this, as described below).
Steps/Code to reproduce bug
First, rewind to @rjzamora's PR, which (for reasons that are currently unknown) fixed https://github.com/NVIDIA/NVTabular/issues/146:
# Align Dask and Single-GPU Writer Logic (#160)
git checkout 7407cfd
Run examples/rossmann-store-sales-preproc.ipynb
Run examples/rossmann-store-sales-example.ipynb
Here we see consistent convergence and good final RMSPEs, confirming the fix.
Note: NVTabular’s Workflow outputs are now saved in examples/data/jp_ross
Now fast forward to master as of 2020-08-04:
# [REVIEW] Async torch Dataloaders (#127)
git checkout 7935f7e
Note: examples/rossmann-store-sales-preproc.ipynb is completely unchanged from 7407cfd to 7935f7e.
If we run examples/rossmann-store-sales-example.ipynb 3 times, we now (1) see unstable convergence and (2) obtain final RMSPEs of
TensorFlow: 25.0%, 22.3%, 22.3%
fast.ai: 29.9%, 29.1%, 21.5%
The problem seems to have to do with Workflow processing.
Note that the newer version of examples/rossmann-store-sales-example.ipynb does not use examples/data/jp_ross for exporting Workflow data, but rather examples/data/ross_pre.
So, we can now run this notebook exactly as is, but using 7407cfd's Workflow outputs instead. This was done by inserting
PREPROCESS_DIR = os.path.join(DATA_DIR, 'jp_ross')
PREPROCESS_DIR_TRAIN = os.path.join(PREPROCESS_DIR, 'train')
PREPROCESS_DIR_VALID = os.path.join(PREPROCESS_DIR, 'valid')
right before the Training a Network section.
Now, if we rerun the notebook 3 times, we once again get stable convergence, and the final RMSPEs are
TensorFlow: 18.9%, 17.4%, 17.9%
fast.ai: 19.7%, 19.5%, 21.4%
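For anyone comparing numbers, here is a minimal sketch of the metric being reported, assuming the standard Rossmann competition definition of RMSPE (root mean square percentage error, with zero targets skipped). The function name `rmspe` and its arguments are illustrative; the notebook's own implementation may differ in detail, for example by working on log-transformed sales.

```python
import math


def rmspe(y_true, y_pred):
    """Root mean square percentage error over paired values.

    Assumption: rows with a zero target are skipped, since the
    percentage error is undefined there.
    """
    squared_pct_errors = [
        ((t - p) / t) ** 2
        for t, p in zip(y_true, y_pred)
        if t != 0
    ]
    if not squared_pct_errors:
        raise ValueError("no non-zero targets to evaluate")
    return math.sqrt(sum(squared_pct_errors) / len(squared_pct_errors))


# Each prediction here is off by 10% of its target.
score = rmspe([100.0, 200.0], [110.0, 180.0])
print(f"RMSPE: {score:.1%}")
```

Under this definition, lower is better, so the 17-21% figures above are the healthy runs.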
@benfred @rjzamora @jperez999 for visibility
Top GitHub Comments
👍 reproduced on my end. Now results in stable convergence + good final RMSPEs
Can you use git bisect to figure out where this started to break?
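If it helps, the bisect could be automated with `git bisect run`, which treats exit code 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. Below is a rough sketch of the pass/fail decision only. The function name `bisect_exit_code` and the 20% threshold on the mean RMSPE are assumptions chosen to separate the TensorFlow numbers reported above, and how the RMSPEs get collected from the notebook at each commit is left out.

```python
import statistics

# Exit codes understood by `git bisect run`.
GOOD, BAD, SKIP = 0, 1, 125


def bisect_exit_code(rmspes, threshold=0.20):
    """Map final RMSPEs from repeated runs to a bisect exit code.

    Assumption: a commit counts as good when the mean RMSPE across
    runs is below `threshold`; the 0.20 default is illustrative.
    """
    if not rmspes:
        # Nothing could be measured at this commit, so ask bisect to skip it.
        return SKIP
    return GOOD if statistics.mean(rmspes) < threshold else BAD


# TensorFlow results reported in this issue.
print(bisect_exit_code([0.189, 0.174, 0.179]))  # runs using the 7407cfd Workflow outputs
print(bisect_exit_code([0.250, 0.223, 0.223]))  # runs at 7935f7e
```

A wrapper script would pass this value to `sys.exit` so that `git bisect run` can read it. Averaging several runs per commit seems worthwhile given how much the individual fast.ai results vary.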