Regressing out, potential boundary problem

>>> sc.pp.regress_out(adata, ['n_counts', 'percent_mito', 'S_score', 'G2M_score'], n_jobs = 1)
regressing out ['n_counts', 'percent_mito', 'S_score', 'G2M_score']
    sparse input is densified and may lead to high memory use
... storing 'phase' as categorical
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scanpy/preprocessing/simple.py", line 783, in regress_out
    res = list(map(_regress_out_chunk, tasks))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/scanpy/preprocessing/simple.py", line 809, in _regress_out_chunk
    result = sm.GLM(data_chunk[:, col_index], regres, family=sm.families.Gaussian()).fit()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/statsmodels/genmod/generalized_linear_model.py", line 1012, in fit
    cov_kwds=cov_kwds, use_t=use_t, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/statsmodels/genmod/generalized_linear_model.py", line 1109, in _fit_irls
    raise ValueError("The first guess on the deviance function "
ValueError: The first guess on the deviance function returned a nan.  This could be a boundary  problem and should be reported.

Hello, after a first pass through the data to identify which clusters are potentially unwanted cell types, I collected their barcodes (cell sample names, i.e. observation indices). I then reloaded the dataset from scratch and removed those cells so that I could rerun the preprocessing steps without them. When I get to regress_out, the error above occurs; it did not occur before I removed those cell types.

Here’s an example of how I removed those cell types:

keep_cells = [i for i in adata.obs.index if i not in e13_blood2.obs.index]
adata = adata[keep_cells, :]
adata

Any help appreciated.
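
The list comprehension above checks each barcode against the other index one at a time. A vectorized membership test gives the same selection as a boolean mask. The sketch below uses plain NumPy arrays as hypothetical stand-ins for adata.obs.index and e13_blood2.obs.index; the barcode strings are made up for illustration.

```python
import numpy as np

# Hypothetical barcodes standing in for adata.obs.index and e13_blood2.obs.index.
barcodes = np.array(["AAAC", "AAAG", "AACT", "AAGT", "ACGT"])
unwanted = np.array(["AAAG", "AAGT"])

# True for every cell to keep: one vectorized membership test
# instead of a Python-level lookup per barcode.
keep_mask = ~np.isin(barcodes, unwanted)

kept = barcodes[keep_mask]
print(kept)
```

Assuming AnnData accepts a boolean mask on the observation axis in the same way it accepts the list of names, the mask could be used as adata[keep_mask, :]. Either way, subsetting cells does not touch the gene axis, so genes that were expressed only in the removed cells stay in the matrix.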

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 9 (1 by maintainers)

Top GitHub Comments

2 reactions
fidelram commented, Aug 10, 2018

You can do:

adata = adata[:,adata.X.sum(axis=0) > 0]

To remove the problematic genes.

Probably after removing some cells, some genes no longer had any value in the matrix.

Let me know if this helps with your problem.
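
One caveat with the one-liner above: when adata.X is a scipy sparse matrix, X.sum(axis=0) comes back as a 2-D 1 x n_genes matrix rather than a flat array, which may not work as a column mask. A minimal sketch of a mask that is flat in both cases follows. The helper name nonzero_gene_mask is hypothetical, and the sketch assumes the matrix holds non-negative values (raw or normalized counts), so a zero column sum means the gene is absent from every remaining cell.

```python
import numpy as np

def nonzero_gene_mask(X):
    """Return a 1-D boolean mask of columns (genes) whose total is > 0.

    Assumes X holds non-negative values (raw or normalized counts).
    """
    # Dense arrays give a 1-D sum; scipy sparse matrices give a 1 x n_genes
    # np.matrix. np.asarray(...).ravel() flattens both to shape (n_genes,).
    col_sums = np.asarray(X.sum(axis=0)).ravel()
    return col_sums > 0

# Toy matrix: 2 cells x 3 genes, the middle gene is zero everywhere.
X = np.array([[1.0, 0.0, 3.0],
              [2.0, 0.0, 0.0]])
mask = nonzero_gene_mask(X)
X_filtered = X[:, mask]
print(mask)
print(X_filtered.shape)
```

With an AnnData object the mask would be applied as adata[:, mask]. scanpy's own sc.pp.filter_genes(adata, min_cells=1) should have a similar effect by dropping genes detected in no cells.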

On Fri, Aug 10, 2018 at 2:33 AM jayypaul notifications@github.com wrote:

Hello Fidelram,

Here’s the output. Looks like I have column(s) with only zeros. Any suggestions for a remedy, or a possible explanation for why this occurs after removing certain indices from the AnnData structure? I’m almost done processing my datasets for each timepoint, and it hasn’t been a problem except for this one.

print(np.any(adata.X.sum(axis=0) == 0))
True
print(np.any(adata.X.sum(axis=1) == 0))
False


1 reaction
fidelram commented, Aug 9, 2018

Can you check that adata.X does not contain columns or rows with only zeros?

print(np.any(adata.X.sum(axis=0) == 0))
print(np.any(adata.X.sum(axis=1) == 0))
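
To see why this check can start returning True only after cells are removed, here is a small worked example on a made-up count matrix (rows are cells, columns are genes). It illustrates the mechanism under toy assumptions and does not reproduce the reporter's data: one gene is expressed only in the cells that get dropped, so its column becomes all zeros in the subset.

```python
import numpy as np

# Toy count matrix: rows are cells, columns are genes.
# The last gene is expressed only in the last two cells (the "unwanted" type).
X = np.array([
    [5, 1, 0],
    [2, 4, 0],
    [3, 2, 0],
    [0, 1, 7],
    [1, 0, 9],
])

# Before removing cells, every gene has at least one count.
has_empty_gene_before = bool(np.any(X.sum(axis=0) == 0))

# Drop the unwanted cells (the last two rows), as in the question.
X_subset = X[:3, :]
has_empty_gene_after = bool(np.any(X_subset.sum(axis=0) == 0))
has_empty_cell_after = bool(np.any(X_subset.sum(axis=1) == 0))

print(has_empty_gene_before, has_empty_gene_after, has_empty_cell_after)
```

The subset has an empty gene column but no empty cell row, which is the same True/False pattern as the check on the filtered dataset.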