Yeo-Johnson Power Transformer gives Numpy warning

Describe the bug

When I use a PowerTransformer with the yeo-johnson method, I get this warning from NumPy:

../lib/python3.10/site-packages/numpy/core/_methods.py:235: RuntimeWarning: overflow encountered in multiply

Strangely, I can’t suppress this warning with warnings.filterwarnings().
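
For reference, a minimal sketch of how a warning filter is usually scoped around a single call in the current process. The helper name `fit_transform_quietly` is hypothetical and not part of scikit-learn. It only affects the process in which it runs, so it is a sketch of the intended usage rather than a confirmed fix for this issue.

```python
import warnings


def fit_transform_quietly(transformer, X):
    """Call transformer.fit_transform(X) with NumPy overflow warnings ignored.

    Hypothetical helper: the filter is active only inside the `with` block
    and only in the current process.
    """
    with warnings.catch_warnings():
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings(
            "ignore",
            message="overflow encountered",
            category=RuntimeWarning,
        )
        return transformer.fit_transform(X)
```

The previous filter state is restored when the `with` block exits, so other RuntimeWarnings elsewhere in the program are still reported.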

TL;DR: The warning seems to appear when multiple instances of the transformer are called, in this example because n_jobs=2 is set in the preprocessor. But the bug also appears with n_jobs=1 when you call this preprocessor in a grid search, for example.

Steps/Code to Reproduce

The following code snippet will raise a warning:

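import numpy as np
from sklearn.preprocessing import PowerTransformer

# Build a single-column dataset: a reflected exponential sample
# shifted so that its minimum value is 100.
rng = np.random.default_rng(0)
X = rng.exponential(size=(50,))
X -= X.max()
X = -X
X += 100
X = X.reshape(-1, 1)  # scikit-learn expects a 2D array

method = "yeo-johnson"
transformer = PowerTransformer(method=method, standardize=False)
transformer.fit_transform(X)  # emits the RuntimeWarning shown below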

Expected Results

No Warnings

Actual Results

/Users/glemaitre/mambaforge/envs/dev/lib/python3.8/site-packages/numpy/core/_methods.py:233: RuntimeWarning: overflow encountered in multiply
  x = um.multiply(x, x, out=x)
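
The traceback line `x = um.multiply(x, x, out=x)` is the step in NumPy's variance computation that squares the deviations from the mean. The sketch below shows one way that step can overflow even though every input is finite. It uses a hand-written Yeo-Johnson formula for non-negative inputs and a hypothetical, deliberately extreme exponent `lmbda=80`; it is an illustration under those assumptions, not scikit-learn's actual code path.

```python
import warnings

import numpy as np


def yeo_johnson_nonneg(x, lmbda):
    """Yeo-Johnson transform for x >= 0 and lmbda != 0."""
    return ((x + 1.0) ** lmbda - 1.0) / lmbda


x = np.array([100.0, 101.0, 102.0])

# Hypothetical extreme exponent: the outputs stay finite (around 1e158 to 1e159).
transformed = yeo_johnson_nonneg(x, lmbda=80.0)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # var() squares the deviations from the mean; (1e158)**2 exceeds the
    # float64 maximum of about 1.8e308, so the result overflows to inf.
    variance = transformed.var()

overflow_warnings = [
    w for w in caught if issubclass(w.category, RuntimeWarning)
]
```

Under these assumptions, the transformed values themselves are representable, and the overflow only appears once they are squared inside the variance.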

Versions

System:
    python: 3.8.12 | packaged by conda-forge | (default, Sep 16 2021, 01:38:21)  [Clang 11.1.0 ]
executable: /Users/glemaitre/mambaforge/envs/dev/bin/python
   machine: macOS-12.3.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.2.dev0
          pip: 21.3
   setuptools: 58.2.0
        numpy: 1.21.6
        scipy: 1.8.0
       Cython: 0.29.24
       pandas: 1.4.2
   matplotlib: 3.4.3
       joblib: 1.0.1
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /Users/glemaitre/mambaforge/envs/dev/lib/libopenblas_vortexp-r0.3.18.dylib
        version: 0.3.18
threading_layer: openmp
   architecture: VORTEX
    num_threads: 8

       user_api: openmp
   internal_api: openmp
         prefix: libomp
       filepath: /Users/glemaitre/mambaforge/envs/dev/lib/libomp.dylib
        version: None
    num_threads: 8

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 21 (11 by maintainers)

Top GitHub Comments

ogrisel commented, May 10, 2022 (1 reaction)

Can someone craft a minimal reproducer with a single-column dataset, possibly synthetically generated?

nilslacroix commented, May 10, 2022 (1 reaction)

This may explain why the warnings cannot be turned off with warnings.filterwarnings(): the setting only applies to the main thread/process, but the problem is caused by processes/threads that are created in parallel.
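
One way to probe this hypothesis is the `PYTHONWARNINGS` environment variable, which each Python interpreter reads at startup and which child processes inherit. The sketch below uses a plain `subprocess` child as a stand-in for a worker process and emits the warning by hand; whether joblib workers in the reported setup behave the same way is an assumption, not something the thread confirms.

```python
import os
import subprocess
import sys

# Stand-in for a worker: a fresh interpreter that emits the same warning text.
CHILD_CODE = (
    "import warnings; "
    "warnings.warn('overflow encountered in multiply', RuntimeWarning)"
)


def run_child(extra_env):
    """Run CHILD_CODE in a new interpreter and return its stderr output."""
    env = {k: v for k, v in os.environ.items() if k != "PYTHONWARNINGS"}
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stderr


# Without the variable, the child reports the warning on stderr.
stderr_default = run_child({})

# With the variable set, the filter is installed when the child starts.
stderr_filtered = run_child({"PYTHONWARNINGS": "ignore::RuntimeWarning"})
```

Setting the variable before any workers are started applies the filter in every new interpreter, at the cost of hiding all RuntimeWarnings rather than only the overflow one.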
