[BUG] record_stats=False not working as expected


Describe the bug

Setting record_stats=False corrupts the validation dataset. However, it works fine when I set record_stats=True on the validation dataset.

Steps/Code to reproduce bug

Workflow Processor: workflow_processor

  • Train dataset record_stats=True and validation dataset record_stats=False -> as you can see, CumCount Max is 896 instead of 299.

1_train_true_valid_false

  • Train dataset record_stats=True and validation dataset record_stats=True -> as you can see, CumCount Max is now 299.

2_train_true_valid_true

Expected behavior

I expect the output below when record_stats is set to False on the validation dataset.

2_train_true_valid_true

Environment details (please complete the following information):

  • Environment location: [Bare-metal, Docker, Cloud(specify cloud provider)]
  • Method of NVTabular install: [conda, Docker, or from source]
    • If method of install is [Docker], provide docker pull & docker run commands used

Installed version 0.2 from https://pypi.org/project/nvtabular/

I am using it with RAPIDS 0.16 because I need pivot().


Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

1 reaction
rjzamora commented, Nov 3, 2020

Perhaps it would be reasonable to add a Workflow parameter to specify a list of columns that should pass through NVTabular unchanged. Thoughts on this, @benfred? I’m honestly unsure how often a feature like this would be used.

1 reaction
rjzamora commented, Nov 3, 2020

“Is there a better way to pass down the id column untouched via NVTabular?”

Rather than allowing Categorify to act on all categorical columns (the default), you can specify a subset with Categorify(columns=<your-list>).
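
To make the suggestion above concrete, here is a minimal plain-Python sketch of the idea. It is not NVTabular code: the column names, the sample rows, and the helpers fit_categories and apply_categories are all hypothetical stand-ins. It only illustrates building the column subset that would be passed as Categorify(columns=...), fitting category statistics on the training rows, and reusing them on the validation rows while the id column passes through untouched.

```python
# Hypothetical column names, for illustration only.
cat_names = ["user_id", "city", "device"]
passthrough = {"user_id"}  # columns that should keep their raw values

# The subset one would hand to the op, e.g. Categorify(columns=encode_cols).
encode_cols = [c for c in cat_names if c not in passthrough]


def fit_categories(rows, columns):
    """Record statistics: map each distinct value to an integer code.

    Codes start at 1; 0 is reserved for values never seen during fitting.
    """
    mapping = {c: {} for c in columns}
    for row in rows:
        for c in columns:
            mapping[c].setdefault(row[c], len(mapping[c]) + 1)
    return mapping


def apply_categories(rows, mapping):
    """Encode only the mapped columns; all other columns pass through unchanged."""
    return [
        {k: (mapping[k].get(v, 0) if k in mapping else v) for k, v in row.items()}
        for row in rows
    ]


# Hypothetical sample data.
train = [
    {"user_id": 101, "city": "Oslo", "device": "ios"},
    {"user_id": 102, "city": "Lima", "device": "ios"},
]
valid = [{"user_id": 201, "city": "Lima", "device": "android"}]

stats = fit_categories(train, encode_cols)      # statistics come from train only
encoded_valid = apply_categories(valid, stats)  # validation reuses those statistics
print(encoded_valid)
```

In this sketch the validation row keeps its original user_id, while city and device are encoded with the codes learned from the training rows; a value absent from training falls back to 0.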
