MemoryError issue
My machine has 120 GB of memory, of which 40 GB is available for the MCA computation.
The DataFrame has a shape of (1244210, 37), and I have processed it with the get_dummies() function in Pandas.
I want to get 10 components, but I get a MemoryError here:
>>> mca_result = prince.MCA(X_MCA, n_components=10)
MemoryError Traceback (most recent call last)
<ipython-input-20-ee2308cc121f> in <module>()
----> 1 mca_result = prince.MCA(X_MCA, n_components=10)
/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/mca.py in __init__(self, dataframe, n_components, use_benzecri_rates, plotter)
43 dataframe=pd.get_dummies(dataframe),
44 n_components=n_components,
---> 45 plotter=plotter
46 )
47
/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in __init__(self, dataframe, n_components, plotter)
26 self._set_plotter(plotter_name=plotter)
27
---> 28 self._compute_svd()
29
30 def _compute_svd(self):
/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in _compute_svd(self)
29
30 def _compute_svd(self):
---> 31 self.svd = SVD(X=self.standardized_residuals, k=self.n_components)
32
33 def _set_plotter(self, plotter_name):
/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in standardized_residuals(self)
123 """
124 residuals = (self.P - self.expected_frequencies).values
--> 125 return self.row_masses.dot(residuals).dot(self.column_masses)
126
127 @property
/home/libertatis/anaconda3/lib/python3.6/site-packages/prince/ca.py in row_masses(self)
99 represents the weight of the matching row; the non-diagonal cells are equal to 0.
100 """
--> 101 return np.diag(1 / np.sqrt(self.row_sums))
102
103 @property
/home/libertatis/anaconda3/lib/python3.6/site-packages/numpy/lib/twodim_base.py in diag(v, k)
247 if len(s) == 1:
248 n = s[0]+abs(k)
--> 249 res = zeros((n, n), v.dtype)
250 if k >= 0:
251 i = k
MemoryError:
I have 40 GB of memory available, and I can apply PCA to the same DataFrame. How can I solve this?
I found a similar issue: https://github.com/esafak/mca/issues/15
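The traceback ends in np.diag, which builds a dense n x n array from the row sums. With n = 1244210 rows and 8-byte floats (assuming float64), that single array would need roughly 11 TiB, far more than the 40 GB available. Below is a minimal sketch of one possible workaround, not the library's own code: it scales rows and columns by broadcasting instead of multiplying by diagonal matrices. The function name standardized_residuals_broadcast is hypothetical, and the sketch assumes the expected frequencies are the outer product of the row and column sums, as in standard correspondence analysis (the traceback does not show that property).

```python
import numpy as np

# Size of the dense diagonal matrix that np.diag tries to allocate
# for 1,244,210 rows, assuming float64 (8 bytes per element).
n_rows = 1_244_210
dense_diag_bytes = n_rows * n_rows * 8
print(f"dense diagonal matrix: {dense_diag_bytes / 2**40:.1f} TiB")


def standardized_residuals_broadcast(P):
    """Hypothetical helper: D_r^(-1/2) (P - r c^T) D_c^(-1/2) without np.diag.

    P is a correspondence matrix (non-negative entries summing to 1).
    Assumption: expected frequencies are the outer product r c^T.
    """
    P = np.asarray(P, dtype=float)
    row_sums = P.sum(axis=1)
    col_sums = P.sum(axis=0)
    residuals = P - np.outer(row_sums, col_sums)
    # Dividing by a column vector scales rows; dividing by a row vector scales columns.
    # This matches left/right multiplication by the diagonal mass matrices.
    return residuals / np.sqrt(row_sums)[:, None] / np.sqrt(col_sums)[None, :]
```

This only removes the n x n allocation. The residual matrix itself is still dense, so it needs (rows x dummy columns) floats, which may or may not fit in 40 GB depending on how many indicator columns there are.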
Issue Analytics
- State:
- Created 6 years ago
- Comments:14 (6 by maintainers)
Something like this
df.describe
0      ARAR64757686100  High    1989-07-14  9.0
1      SHDH64757636547  Low     1978-06-28  23.0
2      AYZY33546757585  Medium  1999-09-15  44.0
3      QISS46575859494  Medium  2000-02-18  61.0
4      SODJ24253673838  high    2001-07-22  50.0
…      …                …       …           …
62644  DGDT28387374645  Medium  2002-10-03  61.0
62645  ARZU36464748484  High    1993-03-06  232.0
62646  ZRRF16263636353  High    1950-02-13  356.0
62647  ERER14253536373  High    1992-05-30  224.0
62648  ETRF53536353536  Medium  2002-10-14  984.0
[62649 rows x 4 columns]>
mca = prince.MCA(n_components=3, n_iter=3, copy=False, engine='sklearn')
MemoryError Traceback (most recent call last)
<ipython-input-6-839f04045ccc> in <module>
----> 1 mca.fit(df2)
~/.local/lib/python3.6/site-packages/prince/mca.py in fit(self, X, y)
22
23 # One-hot encode the data
---> 24 one_hot = pd.get_dummies(X)
25
26 # Apply CA to the indicator matrix
/opt/disk1/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/reshape.py in get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first, dtype)
897 )
898 with_dummies.append(dummy)
--> 899 result = concat(with_dummies, axis=1)
900 else:
901 result = _get_dummies_1d(
/opt/disk1/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
285 )
286
--> 287 return op.get_result()
288
289
/opt/disk1/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py in get_result(self)
501
502 new_data = concatenate_block_managers(
--> 503 mgrs_indexers, self.new_axes, concat_axis=self.bm_axis, copy=self.copy,
504 )
505 if not self.copy:
/opt/disk1/anaconda3/lib/python3.6/site-packages/pandas/core/internals/concat.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
58 values = b.values
59 if copy:
---> 60 values = values.copy()
61 else:
62 values = values.view()
MemoryError: Unable to allocate 3.40 GiB for an array with shape (58264, 62649) and data type uint8
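The shape in this error message suggests that one-hot encoding produced 58264 indicator columns for 62649 rows, and 58264 x 62649 one-byte cells is about 3.40 GiB. That many indicator columns usually means a nearly unique column, presumably the ID-like first column or the dates, was encoded as a categorical variable. Here is a minimal sketch of checking the cardinality before fitting, using a toy frame modelled on the rows above. The helper names dense_dummy_bytes and drop_high_cardinality, the column names, and the max_levels threshold are all hypothetical; the estimate assumes every column is treated as categorical.

```python
import pandas as pd


def dense_dummy_bytes(df):
    """Estimate the size in bytes of a dense one-byte-per-cell indicator matrix.

    Assumption: every column is categorical, so each distinct value
    becomes one indicator column.
    """
    n_dummy_cols = sum(df[col].nunique() for col in df.columns)
    return len(df) * n_dummy_cols


def drop_high_cardinality(df, max_levels=50):
    """Keep only columns with at most max_levels distinct values."""
    keep = [col for col in df.columns if df[col].nunique() <= max_levels]
    return df[keep]


# Toy data shaped like the frame above; the column names are made up.
df = pd.DataFrame({
    "id": ["ARAR64757686100", "SHDH64757636547",
           "AYZY33546757585", "QISS46575859494"],
    "level": ["High", "Low", "Medium", "Medium"],
})

print(df.nunique())            # distinct values per column
print(dense_dummy_bytes(df))   # cells in the dense indicator matrix

small = drop_high_cardinality(df, max_levels=3)
one_hot = pd.get_dummies(small)
print(one_hot.shape)
```

Identifier columns carry no categorical structure for MCA to use, so dropping them before fitting is usually harmless. pd.get_dummies also accepts sparse=True, though the traceback does not confirm whether the downstream MCA code handles sparse columns.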