_partial_dependence_brute uses unnecessary memory
Describe the bug
When users run _partial_dependence_brute from inspection/_partial_dependence.py, the following code is executed for every grid point (usually 100 times), at lines 149-150:
for new_values in grid:
X_eval = X.copy()
Since each iteration overwrites a single feature at a time, it would be more efficient to move the X.copy() before the loop. Unless the model itself mutates the dataset, I don't think there's a reason to make a fresh copy on every iteration. For large datasets, this causes extreme memory use.
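The proposed change can be sketched as follows. This is a minimal sketch, not scikit-learn's actual implementation: `predict`, `feature_idx`, and `grid` are placeholder names, and the real code lives in sklearn/inspection/_partial_dependence.py.

```python
import numpy as np

def partial_dependence_brute_sketch(predict, X, feature_idx, grid):
    # Copy once, before the loop: every iteration fully overwrites the
    # same target column, so no stale values can leak between grid points.
    X_eval = X.copy()
    averaged = []
    for new_value in grid:
        X_eval[:, feature_idx] = new_value  # set the feature to the grid value
        averaged.append(predict(X_eval).mean())
    return np.asarray(averaged)
```

With a toy `predict` that just returns the targeted column, the output equals the grid values and the original `X` is left untouched, which is the behavior the per-iteration copy was guarding.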
Steps/Code to Reproduce
N/A
Expected Results
Same results as currently produced, but with only one copy of X made before the loop.
Actual Results
Correct results, but with unnecessary memory use from copying X on every grid point.
Versions
Since 0.23, or whenever partial_dependence was introduced.
Issue Analytics
- State:
- Created 2 years ago
- Comments:5 (5 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Oops, my bad. I went a bit more in-depth into the code and you are right. At this stage in the code we are handling a single interaction and we will always overwrite the same targeted feature(s); I thought that the features here corresponded to a loop over all possible interactions. We can safely put the copy outside of the loop, then. If garbage collection were working as expected we should not see any gain in memory usage; however, we avoid triggering copies, which could be a potential speed-up.
My only concern would be if the model is a pipeline which modifies the DataFrame in place during prediction.
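That concern can be demonstrated with a toy example. Here `mutating_predict` is a hypothetical misbehaving model, not a real scikit-learn estimator: if prediction mutates its input in place, a single hoisted copy lets the mutation leak between grid points.

```python
import numpy as np

def mutating_predict(X):
    # Hypothetical misbehaving model: scales its input in place during
    # prediction (imagine a pipeline step invoked with copy=False).
    X *= 2.0
    return X.sum(axis=1)

X = np.ones((3, 2))
X_eval = X.copy()                 # single copy hoisted out of the loop
results = []
for new_value in (0.0, 0.0):      # evaluate the same grid point twice
    X_eval[:, 0] = new_value      # overwrite the target feature
    results.append(mutating_predict(X_eval).mean())
# results[0] != results[1]: column 1 was doubled in place on the first
# call, and the hoisted copy lets that mutation carry into the second.
```

With a per-iteration copy the two evaluations would agree, so the only models affected by hoisting the copy are those that modify their input during predict.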