Regressing out, potential boundary problem

>>> sc.pp.regress_out(adata, ['n_counts', 'percent_mito', 'S_score', 'G2M_score'], n_jobs = 1)
regressing out ['n_counts', 'percent_mito', 'S_score', 'G2M_score']
    sparse input is densified and may lead to high memory use
... storing 'phase' as categorical

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scanpy/preprocessing/simple.py", line 783, in regress_out
    res = list(map(_regress_out_chunk, tasks))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scanpy/preprocessing/simple.py", line 809, in _regress_out_chunk
    result = sm.GLM(data_chunk[:, col_index], regres, family=sm.families.Gaussian()).fit()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/statsmodels/genmod/generalized_linear_model.py", line 1012, in fit
    cov_kwds=cov_kwds, use_t=use_t, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/statsmodels/genmod/generalized_linear_model.py", line 1109, in _fit_irls
    raise ValueError("The first guess on the deviance function "
ValueError: The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.

Hello, after going through the data once and identifying which clusters are potentially unwanted cell types, I grabbed their barcodes (cell sample names, i.e. observation indices), freshly reloaded the dataset, and removed those cells so that I could redo the preprocessing steps without them. When I then call regress_out, the error above occurs; it did not occur before I removed those cell types.

Here’s an example of how I removed those cell types:

keep_cells = [i for i in adata.obs.index if i not in e13_blood2.obs.index]
adata = adata[keep_cells, :]
adata

Any help appreciated.

Top GitHub Comments

fidelram commented, Aug 10, 2018

You can do:

adata = adata[:,adata.X.sum(axis=0) > 0]

To remove the problematic genes.

Probably, after you removed some cells, some genes were left with no counts in any of the remaining cells, so their columns in the matrix are all zeros.

Let me know if this helps with your problem.
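The explanation above can be illustrated with a minimal sketch. It uses plain NumPy and SciPy on a made-up toy matrix rather than a real AnnData object, so the names `counts`, `keep_cells`, and `filtered` are hypothetical. It shows how dropping cells can leave a gene column with only zeros, and how a flattened per-gene sum gives a 1-D mask to drop such genes. It assumes non-negative counts, so a zero sum means an all-zero column.

```python
import numpy as np
from scipy import sparse

# Toy counts matrix: 4 cells (rows) x 3 genes (columns), stored sparse.
# Gene 2 (last column) is expressed only in cell 3 (last row).
counts = sparse.csr_matrix(np.array([
    [1, 0, 0],
    [2, 3, 0],
    [0, 1, 0],
    [4, 0, 5],
]))

# Drop cell 3, mimicking the removal of an unwanted cluster.
keep_cells = np.array([True, True, True, False])
subset = counts[keep_cells, :]

# Per-gene totals. A sparse sum returns a 1 x n_genes matrix, so it is
# flattened into a plain 1-D array before being used as a mask.
gene_totals = np.asarray(subset.sum(axis=0)).ravel()
empty_genes = gene_totals == 0  # True for genes with no counts left

# Keep only genes that still have at least one nonzero count.
filtered = subset[:, ~empty_genes]
print(gene_totals, filtered.shape)
```

On an AnnData object the same kind of 1-D mask could presumably be applied as `adata[:, ~empty_genes]`. scanpy's `sc.pp.filter_genes(adata, min_cells=1)` is another way to drop genes that are not detected in any cell.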

On Fri, Aug 10, 2018 at 2:33 AM jayypaul notifications@github.com wrote:

Hello Fidelram,

Here’s the output. Looks like I have column(s) containing only zeros. Any suggestions for a remedy and/or a possible explanation for why this occurs after removing certain indices from the anndata structure? I’m almost done processing each of my datasets for each timepoint, and it hasn’t been a problem except for this one.

>>> print(np.any(adata.X.sum(axis=0) == 0))
True
>>> print(np.any(adata.X.sum(axis=1) == 0))
False


fidelram commented, Aug 9, 2018

Can you check that adata.X does not contain columns or rows with only zeros?

import numpy as np
print(np.any(adata.X.sum(axis=0) == 0))  # True if any gene (column) sums to zero
print(np.any(adata.X.sum(axis=1) == 0))  # True if any cell (row) sums to zero
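If it helps to know how many rows or columns are affected rather than just whether any exist, the check could be wrapped in a small helper. This is only a sketch: `count_empty` is a hypothetical name, not part of scanpy or anndata, and it assumes non-negative values so that a zero sum implies an all-zero row or column.

```python
import numpy as np
from scipy import sparse

def count_empty(X):
    """Return (n_empty_rows, n_empty_columns) for a dense or sparse 2-D matrix."""
    # np.asarray(...).ravel() flattens the np.matrix that sparse sums return.
    col_sums = np.asarray(X.sum(axis=0)).ravel()
    row_sums = np.asarray(X.sum(axis=1)).ravel()
    return int((row_sums == 0).sum()), int((col_sums == 0).sum())

# Made-up example: the middle column holds only zeros.
X = sparse.csr_matrix(np.array([[1, 0, 2],
                                [0, 0, 3]]))
print(count_empty(X))  # (0, 1): no empty rows, one empty column
```

The same function could be called as `count_empty(adata.X)`, since it only relies on `.sum(axis=...)`, which both NumPy arrays and SciPy sparse matrices provide.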
