Regressing out, potential boundary problem
>>> sc.pp.regress_out(adata, ['n_counts', 'percent_mito', 'S_score', 'G2M_score'], n_jobs=1)
regressing out ['n_counts', 'percent_mito', 'S_score', 'G2M_score']
sparse input is densified and may lead to high memory use
... storing 'phase' as categorical
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scanpy/preprocessing/simple.py", line 783, in regress_out
    res = list(map(_regress_out_chunk, tasks))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scanpy/preprocessing/simple.py", line 809, in _regress_out_chunk
    result = sm.GLM(data_chunk[:, col_index], regres, family=sm.families.Gaussian()).fit()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/statsmodels/genmod/generalized_linear_model.py", line 1012, in fit
    cov_kwds=cov_kwds, use_t=use_t, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/statsmodels/genmod/generalized_linear_model.py", line 1109, in _fit_irls
    raise ValueError("The first guess on the deviance function "
ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported.
Hello, after a first pass through the data to work out which clusters are likely unwanted cell types, I grabbed their barcodes (the cell names, i.e. the observation indices), freshly reloaded the dataset, and removed those cells so that I could rerun the preprocessing steps without them. When I call regress_out, the error above occurs. It did not occur before I removed those cells.
Here's how I removed those cells:
keep_cells = [i for i in adata.obs.index if i not in e13_blood2.obs.index]
adata = adata[keep_cells, :]
adata
Any help appreciated.
Issue Analytics
- State:
- Created 5 years ago
- Comments: 9 (1 by maintainers)
You can do:
To remove the problematic genes.
Probably, after some cells were removed, some genes no longer had any nonzero values in the matrix.
Let me know if this helps with your problem.
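To make the idea concrete, here is a minimal sketch on a toy NumPy matrix (not the poster's data) of how dropping cells can leave a gene with no counts at all, and how such genes can be filtered out. The matrix values and variable names are made up for illustration.

```python
import numpy as np

# Toy counts matrix: rows are cells, columns are genes (illustrative values only).
X = np.array([
    [3, 0, 1],
    [2, 0, 4],
    [0, 5, 0],  # the only cell that expresses gene 1
])

# Drop the last cell, as if it belonged to an unwanted cluster.
X_kept = X[:2, :]

# Count, for each gene, the remaining cells in which it is detected.
n_cells_per_gene = (X_kept > 0).sum(axis=0)
empty_genes = n_cells_per_gene == 0

# Keep only the genes detected in at least one remaining cell.
X_filtered = X_kept[:, ~empty_genes]
print(empty_genes, X_filtered.shape)
```

On an AnnData object, `sc.pp.filter_genes(adata, min_cells=1)` should apply the same kind of filter, assuming it is run after the cells have been removed and before `regress_out`.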
On Fri, Aug 10, 2018 at 2:33 AM jayypaul notifications@github.com wrote:
Can you check that adata.X does not contain columns or rows with only zeros?
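One possible way to run that check is sketched below. `count_empty` is a hypothetical helper, not part of scanpy; it is shown on a small dense array, and it should also accept a SciPy sparse matrix such as a sparse `adata.X`, though that is an assumption worth verifying on your own data.

```python
import numpy as np

def count_empty(X):
    """Return (n_empty_rows, n_empty_cols): rows/columns that contain only zeros."""
    nonzero = X != 0
    # np.asarray(...).ravel() flattens both ndarray and np.matrix results.
    per_row = np.asarray(nonzero.sum(axis=1)).ravel()
    per_col = np.asarray(nonzero.sum(axis=0)).ravel()
    return int((per_row == 0).sum()), int((per_col == 0).sum())

# Toy matrix: the middle row (cell) and the middle column (gene) are all zeros.
X = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 0.0],
    [4.0, 0.0, 0.0],
])
empty_rows, empty_cols = count_empty(X)
print(empty_rows, empty_cols)
```

With real data this would be called as `count_empty(adata.X)`; a nonzero second number means some genes have no counts left in any cell.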