Yeo-Johnson PowerTransformer gives NumPy warning
See original GitHub issue

Describe the bug
When I use a PowerTransformer with the yeo-johnson method, I get this warning from NumPy:
../lib/python3.10/site-packages/numpy/core/_methods.py:235: RuntimeWarning: overflow encountered in multiply
Strangely, I can't suppress this warning with warnings.filterwarnings().
Tl;dr: the error seems to appear when multiple instances of the transformer are called, in this example because n_jobs=2 is set in the preprocessor. But the bug also appears with n_jobs=1 when this preprocessor is called in a grid search, for example.
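A hedged sketch of the grid-search variant described above (the pipeline, estimator, and parameter grid here are illustrative assumptions, not the original poster's setup):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(0)
# Strictly positive data shifted to around 100, similar to the reproducer.
X = 100 + rng.exponential(size=(50, 1))
y = rng.normal(size=50)

pipe = make_pipeline(PowerTransformer(method="yeo-johnson"), Ridge())
grid = GridSearchCV(pipe, {"ridge__alpha": [0.1, 1.0]}, cv=3, n_jobs=1)
grid.fit(X, y)  # may emit the RuntimeWarning even with n_jobs=1
```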
Steps/Code to Reproduce
The following code snippet will raise a warning:
import numpy as np
from sklearn.preprocessing import PowerTransformer
rng = np.random.default_rng(0)
X = rng.exponential(size=(50,))
X -= X.max()
X = -X
X += 100
X = X.reshape(-1, 1)
method = "yeo-johnson"
transformer = PowerTransformer(method=method, standardize=False)
transformer.fit_transform(X)
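For context, a minimal sketch of the Yeo-Johnson transform itself (the textbook definition, not scikit-learn's implementation, and the λ value is an illustrative assumption): for data shifted to around 100, a large candidate λ produces transformed values that are finite but huge, and squaring them, as NumPy's variance helper does internally via um.multiply(x, x), overflows float64.

```python
import numpy as np

def yeo_johnson(x, lmbda):
    """Textbook Yeo-Johnson transform for a 1-D float array."""
    out = np.empty_like(x, dtype=float)
    pos = x >= 0
    if abs(lmbda) > 1e-12:
        out[pos] = ((x[pos] + 1.0) ** lmbda - 1.0) / lmbda
    else:
        out[pos] = np.log1p(x[pos])
    if abs(lmbda - 2.0) > 1e-12:
        out[~pos] = -(((1.0 - x[~pos]) ** (2.0 - lmbda)) - 1.0) / (2.0 - lmbda)
    else:
        out[~pos] = -np.log1p(-x[~pos])
    return out

# Data near 100, as in the reproducer above: the transformed values
# (~1e198 for lambda=100) are finite, but their squares are not.
x = np.full(5, 100.0)
t = yeo_johnson(x, 100.0)
print(np.isfinite(t).all())      # True
print(np.isfinite(t * t).all())  # False: squaring overflows to inf
```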
Expected Results
No Warnings
Actual Results
/Users/glemaitre/mambaforge/envs/dev/lib/python3.8/site-packages/numpy/core/_methods.py:233: RuntimeWarning: overflow encountered in multiply
x = um.multiply(x, x, out=x)
Versions
System:
python: 3.8.12 | packaged by conda-forge | (default, Sep 16 2021, 01:38:21) [Clang 11.1.0 ]
executable: /Users/glemaitre/mambaforge/envs/dev/bin/python
machine: macOS-12.3.1-arm64-arm-64bit
Python dependencies:
sklearn: 1.2.dev0
pip: 21.3
setuptools: 58.2.0
numpy: 1.21.6
scipy: 1.8.0
Cython: 0.29.24
pandas: 1.4.2
matplotlib: 3.4.3
joblib: 1.0.1
threadpoolctl: 2.2.0
Built with OpenMP: True
threadpoolctl info:
user_api: blas
internal_api: openblas
prefix: libopenblas
filepath: /Users/glemaitre/mambaforge/envs/dev/lib/libopenblas_vortexp-r0.3.18.dylib
version: 0.3.18
threading_layer: openmp
architecture: VORTEX
num_threads: 8
user_api: openmp
internal_api: openmp
prefix: libomp
filepath: /Users/glemaitre/mambaforge/envs/dev/lib/libomp.dylib
version: None
num_threads: 8
Issue Analytics
- State:
- Created a year ago
- Comments: 21 (11 by maintainers)
Top GitHub Comments
Can someone craft a minimal reproducer with a single-column dataset, possibly synthetically generated?
This may explain why the warnings cannot be turned off with warnings.filterwarnings: the filter applies only to the main thread/process, while the problem is caused by threads/processes created in parallel.
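Building on that explanation, a hedged sketch (an observation about standard NumPy/warnings behavior, not an official fix) of the two single-process suppression mechanisms; both are local to the process that enters them, which is consistent with filters installed in the parent not reaching parallel workers:

```python
import warnings
import numpy as np

x = np.full(3, 1e200)

# Mechanism 1: silence the overflow at the NumPy floating-point level.
with np.errstate(over="ignore"):
    y = x * x  # overflows to inf, but no RuntimeWarning is raised

# Mechanism 2: filter the RuntimeWarning via the warnings machinery.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    y = x * x

print(np.isinf(y).all())  # True: the result still overflowed, just silently
```

With n_jobs > 1, either context would have to be entered inside each worker to take effect there.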