
MemoryError issue

See original GitHub issue

My machine has 120 GB of RAM, of which about 40 GB is free for the MCA computation.

The DataFrame has a shape of (1244210, 37), and I have one-hot encoded it with the get_dummies() function in pandas.

I want to compute 10 components, but I get a MemoryError:

>>> mca_result = prince.MCA(X_MCA, n_components=10)
MemoryError                               Traceback (most recent call last)
<ipython-input-20-ee2308cc121f> in <module>()
----> 1 mca_result = prince.MCA(X_MCA, n_components=10)

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/mca.py in __init__(self, dataframe, n_components, use_benzecri_rates, plotter)
     43             dataframe=pd.get_dummies(dataframe),
     44             n_components=n_components,
---> 45             plotter=plotter
     46         )
     47 

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in __init__(self, dataframe, n_components, plotter)
     26         self._set_plotter(plotter_name=plotter)
     27 
---> 28         self._compute_svd()
     29 
     30     def _compute_svd(self):

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in _compute_svd(self)
     29 
     30     def _compute_svd(self):
---> 31         self.svd = SVD(X=self.standardized_residuals, k=self.n_components)
     32 
     33     def _set_plotter(self, plotter_name):

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in standardized_residuals(self)
    123         """
    124         residuals = (self.P - self.expected_frequencies).values
--> 125         return self.row_masses.dot(residuals).dot(self.column_masses)
    126 
    127     @property

/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in row_masses(self)
     99             represents the weight of the matching row; the non-diagonal cells are equal to 0.
    100         """
--> 101         return np.diag(1 / np.sqrt(self.row_sums))
    102 
    103     @property

/home/libertatis/anaconda3/lib/python3.6/site-packages/numpy/lib/twodim_base.py in diag(v, k)
    247     if len(s) == 1:
    248         n = s[0]+abs(k)
--> 249         res = zeros((n, n), v.dtype)
    250         if k >= 0:
    251             i = k

MemoryError: 

There are still 40 GB of memory free, and I can apply PCA to the same DataFrame without problems. How can I solve this?

I found a similar issue describing this problem: https://github.com/esafak/mca/issues/15
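
As a rough back-of-the-envelope check (not something stated in the thread): the row_masses property in this version of prince builds a dense diagonal matrix with np.diag, one row and one column per observation, so with 1,244,210 rows the failing allocation is on the order of terabytes, far beyond the 40 GB available.

# Rough memory estimate for the dense diagonal matrix that the
# row_masses property allocates via np.diag (row count from the issue).
n_rows = 1_244_210                 # rows in the one-hot encoded DataFrame
bytes_per_float64 = 8
diag_bytes = n_rows ** 2 * bytes_per_float64
print(f"dense diagonal matrix: {diag_bytes / 2**40:.1f} TiB")   # ~11.3 TiB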

Issue Analytics

  • State: closed
  • Created: 6 years ago
  • Comments: 14 (6 by maintainers)

Top GitHub Comments

1 reaction
abdoulsn commented, Apr 14, 2020

Something like this:

        reseau  cdapet
0       XX      7010Z
1       YY      2030Z
2       YY      4674B
3       XZ      6820B
4       YY_XX   6820A
...     ...     ...
680553  XX      6832A
680554  YY      4120A
680555  XX_WX   7820Z
680556  YZ      4941A
680557  WX      4669A
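
To make the scale concrete, here is a minimal sketch (a toy example, not code from the thread) of how pd.get_dummies expands two categorical columns like these into one indicator column per distinct value:

import pandas as pd

# Toy frame with the same two columns as the sample above,
# using a few of the values shown there.
toy = pd.DataFrame({
    "reseau": ["XX", "YY", "YY", "XZ", "YY_XX"],
    "cdapet": ["7010Z", "2030Z", "4674B", "6820B", "6820A"],
})

one_hot = pd.get_dummies(toy)
print(one_hot.columns.tolist())
# ['reseau_XX', 'reseau_XZ', 'reseau_YY', 'reseau_YY_XX',
#  'cdapet_2030Z', 'cdapet_4674B', 'cdapet_6820A', 'cdapet_6820B', 'cdapet_7010Z']

# At 680,558 rows with many distinct codes, a dense indicator matrix of
# this kind is what can exhaust memory when MCA one-hot encodes the data.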
0 reactions
thomlennon commented, Apr 1, 2021

df.describe

               Cust_no Risk_Rating        Date  _Nb_day
0      ARAR64757686100        High  1989-07-14      9.0
1      SHDH64757636547         Low  1978-06-28     23.0
2      AYZY33546757585      Medium  1999-09-15     44.0
3      QISS46575859494      Medium  2000-02-18     61.0
4      SODJ24253673838        high  2001-07-22     50.0
...                ...         ...         ...      ...
62644  DGDT28387374645      Medium  2002-10-03     61.0
62645  ARZU36464748484        High  1993-03-06    232.0
62646  ZRRF16263636353        High  1950-02-13    356.0
62647  ERER14253536373        High  1992-05-30    224.0
62648  ETRF53536353536      Medium  2002-10-14    984.0

[62649 rows x 4 columns]>

mca = prince.MCA(n_components=3, n_iter=3, copy=False, engine='sklearn')


MemoryError                               Traceback (most recent call last)
<ipython-input-6-839f04045ccc> in <module>
----> 1 mca.fit(df2)

~/.local/lib/python3.6/site-packages/prince/mca.py in fit(self, X, y)
     22
     23         # One-hot encode the data
---> 24         one_hot = pd.get_dummies(X)
     25
     26         # Apply CA to the indicator matrix

/opt/disk1/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first, dtype)
    897                 )
    898             with_dummies.append(dummy)
--> 899         result = concat(with_dummies, axis=1)
    900     else:
    901         result = _get_dummies_1d(

/opt/disk1/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    285     )
    286
--> 287     return op.get_result()
    288
    289

/opt/disk1/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py in get_result(self)
    501
    502             new_data = concatenate_block_managers(
--> 503                 mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy,
    504             )
    505             if not self.copy:

/opt/disk1/anaconda3/lib/python3.6/site-packages/pandas/core/internals/concat.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
     58             values = b.values
     59             if copy:
---> 60                 values = values.copy()
     61             else:
     62                 values = values.view()

MemoryError: Unable to allocate 3.40 GiB for an array with shape (58264, 62649) and data type uint8
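
The error message matches a simple size estimate: a dense uint8 indicator matrix with shape (58264, 62649) needs 58,264 × 62,649 bytes ≈ 3.40 GiB. A minimal sketch of one possible workaround (an assumption, not a fix proposed in the thread) is to keep the indicator matrix sparse:

import pandas as pd

# Size check using the numbers from the error message above.
n_dummy_cols, n_rows = 58_264, 62_649
print(f"dense uint8 indicator matrix: {n_dummy_cols * n_rows / 2**30:.2f} GiB")  # ~3.40 GiB

# pd.get_dummies can return a sparse indicator frame that stores only
# the non-zero entries (one per original cell):
toy = pd.DataFrame({"Risk_Rating": ["High", "Low", "Medium", "High"]})
sparse_one_hot = pd.get_dummies(toy, sparse=True)
print(sparse_one_hot.dtypes)       # columns get a Sparse dtype

# Caveat: as the traceback shows, prince.MCA.fit() calls pd.get_dummies()
# itself on the raw frame, so a sparse pre-encoding only helps if you first
# drop or bucket high-cardinality columns (Cust_no looks unique per row) or
# run the decomposition on the sparse matrix yourself.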


Top Results From Across the Web

  • memory error in python - Stack Overflow
    The issue is that 32-bit python only has access to ~4GB of RAM. This can shrink even further if your operating system is...
  • MemoryError · Issue #538 · benfred/implicit - GitHub
    I faced a problem, that BayesianPersonalizedRanking model can't allocate memory for fitting data neither using gpu nor cpu mode.
  • How to Solve the Python Memory Error - HackerNoon
    A memory error occurs when an operation runs out of memory. It's most likely because you're using a 32-bit Python version.
  • How to solve MemoryError problem
    You could try the following: 1.) Convert to greyscale images instead of RGB if your application does not need RGB.
  • Pandas Dataframes Memory Error - MemoryError: unable to ...
    Problem. "MemoryError: Unable to allocate …" is the last thing that you want to see during data loading into Pandas Dataframe.
