[BUG] record_stats=False not working as expected
Describe the bug
record_stats=False corrupts the validation dataset. However, it works fine when I set record_stats=True on the validation dataset.
Steps/Code to reproduce bug
Workflow Processor:
- Train dataset record_stats=True and Validation dataset record_stats=False -> CumCount Max is 896 instead of 299.
- Train dataset record_stats=True and Validation dataset record_stats=True -> CumCount Max is 299.
Expected behavior
I expect the same output (CumCount Max of 299) when record_stats is set to False on the Validation dataset.
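To make the expected behavior concrete, here is a minimal pure-Python sketch of the semantics I have in mind: statistics are recorded on the training pass and only reused (never recomputed) on the validation pass. `TinyCategorify` and its `apply` method are hypothetical names made up for illustration; this is not NVTabular code and does not reproduce the CumCount numbers above.

```python
from typing import Dict, Iterable, List


class TinyCategorify:
    """Hypothetical stand-in for a stats-based op; not NVTabular code."""

    def __init__(self) -> None:
        # Category -> integer id, learned only when record_stats=True.
        self.mapping: Dict[str, int] = {}

    def apply(self, values: Iterable[str], record_stats: bool) -> List[int]:
        values = list(values)
        if record_stats:
            # Training pass: (re)compute the mapping; id 0 is reserved for unseen values.
            self.mapping = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
        # Both passes: encode with whatever mapping is currently stored.
        return [self.mapping.get(v, 0) for v in values]


op = TinyCategorify()
train_ids = op.apply(["a", "b", "c", "a"], record_stats=True)
valid_ids = op.apply(["b", "d"], record_stats=False)
print(train_ids)
print(valid_ids)
```

In this sketch the validation pass leaves the stored mapping untouched, so validation values are encoded with the ids learned from the training data and unseen values fall back to 0.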
Environment details (please complete the following information):
- Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
- Method of NVTabular install: [conda, Docker, or from source]
- If method of install is [Docker], provide docker pull & docker run commands used
Installed version 0.2 from https://pypi.org/project/nvtabular/
and am using it with RAPIDS 0.16, as I need pivot().
Issue Analytics
- State:
- Created 3 years ago
- Comments: 7 (1 by maintainers)
Top GitHub Comments
Perhaps it would be reasonable to add a Workflow parameter to specify a list of columns that should pass through NVTabular unchanged - Thoughts on this @benfred ? I’m honestly unsure how often a feature like this would be used.

Rather than allowing the Categorify to act on all categorical columns (the default), you can specify a subset with Categorify(columns=<your-list>).
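As a rough illustration of the idea behind restricting an op to a subset of columns, the sketch below encodes only the listed columns and passes every other column through unchanged. It is plain Python, not NVTabular; `categorify_subset` and the example column names are hypothetical.

```python
from typing import Dict, List, Sequence


def categorify_subset(table: Dict[str, List[str]], columns: Sequence[str]) -> Dict[str, list]:
    """Encode only the columns named in `columns`; copy the rest as-is."""
    out: Dict[str, list] = {}
    for name, values in table.items():
        if name in columns:
            # Assign ids in sorted order of the distinct values.
            ids = {v: i for i, v in enumerate(sorted(set(values)))}
            out[name] = [ids[v] for v in values]
        else:
            # Pass-through column: left untouched.
            out[name] = list(values)
    return out


table = {"city": ["nyc", "sf", "nyc"], "user_tag": ["x1", "x2", "x3"]}
encoded = categorify_subset(table, columns=["city"])
print(encoded)
```

Here only "city" is converted to integer ids, while "user_tag" keeps its original string values.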