
MemoryError issue

See original GitHub issue

My machine has 120 GB of RAM, of which about 40 GB is free for the MCA computation.

The DataFrame has a shape of (1244210, 37), and I have one-hot encoded it with the get_dummies() function in pandas.

I want to compute 10 components, but I get a MemoryError:

>>> mca_result = prince.MCA(X_MCA, n_components=10)
MemoryError                               Traceback (most recent call last)
<ipython-input-20-ee2308cc121f> in <module>()
----> 1 mca_result = prince.MCA(X_MCA, n_components=10)

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/mca.py in __init__(self, dataframe, n_components, use_benzecri_rates, plotter)
     43             dataframe=pd.get_dummies(dataframe),
     44             n_components=n_components,
---> 45             plotter=plotter
     46         )
     47 

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in __init__(self, dataframe, n_components, plotter)
     26         self._set_plotter(plotter_name=plotter)
     27 
---> 28         self._compute_svd()
     29 
     30     def _compute_svd(self):

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in _compute_svd(self)
     29 
     30     def _compute_svd(self):
---> 31         self.svd = SVD(X=self.standardized_residuals, k=self.n_components)
     32 
     33     def _set_plotter(self, plotter_name):

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in standardized_residuals(self)
    123         """
    124         residuals = (self.P - self.expected_frequencies).values
--> 125         return self.row_masses.dot(residuals).dot(self.column_masses)
    126 
    127     @property

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in row_masses(self)
     99             represents the weight of the matching row; the non-diagonal cells are equal to 0.
    100         """
--> 101         return np.diag(1 / np.sqrt(self.row_sums))
    102 
    103     @property

/home/libertatis/anaconda3/lib/python3.6/site-packages/numpy/lib/twodim_base.py in diag(v, k)
    247     if len(s) == 1:
    248         n = s[0]+abs(k)
--> 249         res = zeros((n, n), v.dtype)
    250         if k >= 0:
    251             i = k

MemoryError: 

There are still 40 GB of memory free, and I can apply PCA to the same DataFrame without problems. How can I solve this?

I found a similar issue describing this problem: https://github.com/esafak/mca/issues/15
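
As a rough back-of-the-envelope check (not something stated in the thread): the row_masses property in this version of prince builds a dense diagonal matrix with np.diag, one row and one column per observation, so with 1,244,210 rows the failing allocation is on the order of terabytes, far beyond the 40 GB available.

# Rough memory estimate for the dense diagonal matrix that the
# row_masses property allocates via np.diag (row count from the issue).
n_rows = 1_244_210                 # rows in the one-hot encoded DataFrame
bytes_per_float64 = 8
diag_bytes = n_rows ** 2 * bytes_per_float64
print(f"dense diagonal matrix: {diag_bytes / 2**40:.1f} TiB")   # ~11.3 TiB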

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 14 (6 by maintainers)

Top GitHub Comments

1 reaction
abdoulsn commented, Apr 14, 2020

Something like this:

        reseau  cdapet
0       XX      7010Z
1       YY      2030Z
2       YY      4674B
3       XZ      6820B
4       YY_XX   6820A
...     ...     ...
680553  XX      6832A
680554  YY      4120A
680555  XX_WX   7820Z
680556  YZ      4941A
680557  WX      4669A
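
To make the scale concrete, here is a minimal sketch (a toy example, not code from the thread) of how pd.get_dummies expands two categorical columns like these into one indicator column per distinct value:

import pandas as pd

# Toy frame with the same two columns as the sample above,
# using a few of the values shown there.
toy = pd.DataFrame({
    "reseau": ["XX", "YY", "YY", "XZ", "YY_XX"],
    "cdapet": ["7010Z", "2030Z", "4674B", "6820B", "6820A"],
})

one_hot = pd.get_dummies(toy)
print(one_hot.columns.tolist())
# ['reseau_XX', 'reseau_XZ', 'reseau_YY', 'reseau_YY_XX',
#  'cdapet_2030Z', 'cdapet_4674B', 'cdapet_6820A', 'cdapet_6820B', 'cdapet_7010Z']

# At 680,558 rows with many distinct codes, a dense indicator matrix of
# this kind is what can exhaust memory when MCA one-hot encodes the data.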
0 reactions
thomlennon commented, Apr 1, 2021

df.describe

               Cust_no Risk_Rating        Date  _Nb_day
0      ARAR64757686100        High  1989-07-14      9.0
1      SHDH64757636547         Low  1978-06-28     23.0
2      AYZY33546757585      Medium  1999-09-15     44.0
3      QISS46575859494      Medium  2000-02-18     61.0
4      SODJ24253673838        high  2001-07-22     50.0
...                ...         ...         ...      ...
62644  DGDT28387374645      Medium  2002-10-03     61.0
62645  ARZU36464748484        High  1993-03-06    232.0
62646  ZRRF16263636353        High  1950-02-13    356.0
62647  ERER14253536373        High  1992-05-30    224.0
62648  ETRF53536353536      Medium  2002-10-14    984.0

[62649 rows x 4 columns]>

mca = prince.MCA(n_components=3, n_iter=3, copy=False, engine='sklearn')


MemoryError                               Traceback (most recent call last)
<ipython-input-6-839f04045ccc> in <module>
----> 1 mca.fit(df2)

~/.local/lib/python3.6/site-packages/prince/mca.py in fit(self, X, y)
     22
     23         # One-hot encode the data
---> 24         one_hot = pd.get_dummies(X)
     25
     26         # Apply CA to the indicator matrix

/opt/disk1/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first, dtype)
    897                 )
    898             with_dummies.append(dummy)
--> 899         result = concat(with_dummies, axis=1)
    900     else:
    901         result = _get_dummies_1d(

/opt/disk1/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    285     )
    286
--> 287     return op.get_result()
    288
    289

/opt/disk1/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py in get_result(self)
    501
    502             new_data = concatenate_block_managers(
--> 503                 mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy,
    504             )
    505             if not self.copy:

/opt/disk1/anaconda3/lib/python3.6/site-packages/pandas/core/internals/concat.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
     58             values = b.values
     59             if copy:
---> 60                 values = values.copy()
     61             else:
     62                 values = values.view()

MemoryError: Unable to allocate 3.40 GiB for an array with shape (58264, 62649) and data type uint8
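
The error message matches a simple size estimate: a dense uint8 indicator matrix with shape (58264, 62649) needs 58,264 × 62,649 bytes ≈ 3.40 GiB. A minimal sketch of one possible workaround (an assumption, not a fix proposed in the thread) is to keep the indicator matrix sparse:

import pandas as pd

# Size check using the numbers from the error message above.
n_dummy_cols, n_rows = 58_264, 62_649
print(f"dense uint8 indicator matrix: {n_dummy_cols * n_rows / 2**30:.2f} GiB")  # ~3.40 GiB

# pd.get_dummies can return a sparse indicator frame that stores only
# the non-zero entries (one per original cell):
toy = pd.DataFrame({"Risk_Rating": ["High", "Low", "Medium", "High"]})
sparse_one_hot = pd.get_dummies(toy, sparse=True)
print(sparse_one_hot.dtypes)       # columns get a Sparse dtype

# Caveat: as the traceback shows, prince.MCA.fit() calls pd.get_dummies()
# itself on the raw frame, so a sparse pre-encoding only helps if you first
# drop or bucket high-cardinality columns (Cust_no looks unique per row) or
# run the decomposition on the sparse matrix yourself.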


Top Results From Across the Web

  • memory error in python - Stack Overflow
    The issue is that 32-bit python only has access to ~4GB of RAM. This can shrink even further if your operating system is...
  • MemoryError · Issue #538 · benfred/implicit - GitHub
    I faced a problem, that BayesianPersonalizedRanking model can't allocate memory for fitting data neither using gpu nor cpu mode.
  • How to Solve the Python Memory Error - HackerNoon
    A memory error occurs when an operation runs out of memory. It's most likely because you're using a 32-bit Python version.
  • How to solve MemoryError problem
    You could try the following: 1.) Convert to greyscale images instead of RGB if your application does not need RGB.
  • Pandas Dataframes Memory Error - MemoryError: unable to ...
    Problem. "MemoryError: Unable to allocate …" is the last thing that you want to see during data loading into Pandas Dataframe.
