Incremental PCA - ValueError: array must not contain infs or NaNs
I’m trying to use IncrementalPCA from sklearn.decomposition. My code couldn’t really be simpler:
from sklearn.decomposition import IncrementalPCA
import pandas as pd
with open('C:/My/File/Path/file.csv', 'r') as fp:
    data = pd.read_csv(fp)
ipca = IncrementalPCA(n_components=4)
ipca.fit(data)
but this is how it finishes when launched:
C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\sklearn\decomposition\_incremental_pca.py:293: RuntimeWarning: overflow encountered in long_scalars
  np.sqrt((self.n_samples_seen_ * n_samples) /
C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\sklearn\decomposition\_incremental_pca.py:293: RuntimeWarning: invalid value encountered in sqrt
  np.sqrt((self.n_samples_seen_ * n_samples) /
Traceback (most recent call last):
  File "C:/Users/myuser/AppData/Roaming/JetBrains/PyCharmCE2020.1/scratches/scratch_9.py", line 6, in <module>
    ipca.fit(data)
  File "C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\sklearn\decomposition\_incremental_pca.py", line 215, in fit
    self.partial_fit(X_batch, check_input=False)
  File "C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\sklearn\decomposition\_incremental_pca.py", line 298, in partial_fit
    U, S, V = linalg.svd(X, full_matrices=False)
  File "C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\scipy\linalg\decomp_svd.py", line 106, in svd
    a1 = _asarray_validated(a, check_finite=check_finite)
  File "C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\scipy\_lib\_util.py", line 263, in _asarray_validated
    a = toarray(a)
  File "C:\Users\myuser\PycharmProjects\mushu\venv\lib\site-packages\numpy\lib\function_base.py", line 498, in asarray_chkfinite
    raise ValueError(
ValueError: array must not contain infs or NaNs
Process finished with exit code 1
I already checked:
- There are no NaN, infinite, or negative values anywhere in my data
- I had scikit-learn v0.22.2.post1, I updated to 0.23.1, no difference
- If I use PCA instead of IncrementalPCA leaving everything else the same, everything works fine, no warnings, no errors, all good
- Tried using both
  data = pd.read_csv(fp, dtype = 'Int64')
  and
  data = pd.read_csv(fp, dtype = np.float64)
  with no difference in results.
- There were similar issues in previous versions, but they refer to versions around 0.16/0.17, most were with more complex code and afaik all were fixed around those versions
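For anyone who wants to repeat the first check in the list, here is a minimal sketch of how the "no NaN, infinite, or negative values" claim could be verified with NumPy. The helper name `is_clean` is hypothetical and not part of scikit-learn or pandas; a DataFrame can be passed in because `np.asarray` accepts it.

```python
import numpy as np


def is_clean(X):
    """Return True if X has no NaN, no +/-inf, and no negative entries.

    `is_clean` is a hypothetical helper written for this illustration.
    """
    arr = np.asarray(X, dtype=np.float64)
    # np.isfinite is False for NaN and for +/-inf, so one pass covers both.
    # The `and` short-circuits, so the sign check only runs on finite data.
    return bool(np.isfinite(arr).all() and (arr >= 0).all())


# Small stand-in for the real 0/1 matrix.
sample = np.array([[0, 1, 1], [1, 0, 0]])
print(is_clean(sample))
print(is_clean(np.array([[0.0, np.nan]])))
```

If this returns True for the full dataset, the NaNs reported by the traceback are being produced during the computation rather than coming from the input.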
Here is my data, exactly as I feed it to the code above. It is really just 243 columns x 2,000,000 rows of 0s and 1s. https://drive.google.com/file/d/1JBIliADt9TViTk8qjnmIS3RFEO934dY6/view?usp=sharing
Update: It seems the issue is related to the dataset size. If I fit on a smaller portion of the data, everything works fine, up to around 1,800,000 rows. That’s where the error starts showing.
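The row count at which the error appears lines up with a 32-bit integer overflow in the product `self.n_samples_seen_ * n_samples` shown in the warning. The sketch below is only back-of-the-envelope arithmetic, under two assumptions that are not confirmed in this report: that the product is computed in a 32-bit integer (the "long_scalars" warning on Windows suggests this), and that the batch size is the documented default of 5 * n_features because `batch_size` was not set. `wrap_int32` is a hypothetical helper that simulates two's-complement wraparound; it is not a scikit-learn function.

```python
import ctypes

INT32_MAX = 2**31 - 1

n_features = 243
# Assumption: batch_size=None, so IncrementalPCA uses 5 * n_features.
batch_size = 5 * n_features


def wrap_int32(value):
    """Simulate storing `value` in a signed 32-bit integer (hypothetical helper)."""
    return ctypes.c_int32(value).value


# Largest number of rows already seen for which rows * batch_size fits in int32.
max_rows = INT32_MAX // batch_size
print("batch size:", batch_size)
print("rows before overflow:", max_rows)

for n_seen in (1_700_000, 1_800_000):
    product = n_seen * batch_size
    print(n_seen, "->", product, "as int32:", wrap_int32(product))
```

Under these assumptions the limit comes out at 1,767,476 rows, close to the roughly 1,800,000 observed. Past that point the wrapped product is negative, which would explain the "invalid value encountered in sqrt" warning and the NaNs that `linalg.svd` then rejects.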
Issue Analytics
- State:
- Created 3 years ago
- Comments:17 (7 by maintainers)
Top GitHub Comments
Glad the change worked. I’m still working on writing some tests before submitting another push.
Now that I have uninstalled and reinstalled everything, including the change from the pull request, the error is no longer present. Thanks for the feedback.