problem reproducing tutorial

See original GitHub issue

Hi,

I’m trying to reproduce your tutorial using Python 3.8 and SpaGCN version 1.2.0.

But I’m running into some issues at step 5.3, Run SpaGCN. When I run this code:

clf=spg.SpaGCN()
clf.set_l(l)
#Set seed
random.seed(r_seed)
torch.manual_seed(t_seed)
np.random.seed(n_seed)
#Run
clf.train(adata,adj,init_spa=True,init="louvain",res=res, tol=5e-3, lr=0.05, max_epochs=200)
y_pred, prob=clf.predict()
adata.obs["pred"]= y_pred
adata.obs["pred"]=adata.obs["pred"].astype('category')
#Do cluster refinement(optional)
#shape="hexagon" for Visium data, "square" for ST data.
adj_2d=spg.calculate_adj_matrix(x=x_array,y=y_array, histology=False)
refined_pred=spg.refine(sample_id=adata.obs.index.tolist(), pred=adata.obs["pred"].tolist(), dis=adj_2d, shape="hexagon")
adata.obs["refined_pred"]=refined_pred
adata.obs["refined_pred"]=adata.obs["refined_pred"].astype('category')
#Save results
adata.write_h5ad("151673/results.h5ad")

I get this output:

Initializing cluster centers with louvain, resolution =  0.7
Epoch  0
Epoch  10
Epoch  20
Epoch  30
Epoch  40
delta_label  0.004396812311074471 < tol  0.005
Reach tolerance threshold. Stopping training.
Total epoch: 46
Calculateing adj matrix using xy only...

And when trying to plot the spatial domains:

adata=sc.read("151673/results.h5ad")
#Set colors used
plot_color=["#F56867","#FEB915","#C798EE","#59BE86","#7495D3","#D1D1D1","#6D1A9C","#15821E","#3A84E6","#997273","#787878","#DB4C6C","#9E7A7A","#554236","#AF5F3C","#93796C","#F9BD3F","#DAB370","#877F6C","#268785"]
#Plot spatial domains
domains="pred"
num_celltype=len(adata.obs[domains].unique())
adata.uns[domains+"_colors"]=list(plot_color[:num_celltype])
ax=sc.pl.scatter(adata,alpha=1,x="y_pixel",y="x_pixel",color=domains,title=domains,color_map=plot_color,show=False,size=100000/adata.shape[0])
ax.set_aspect('equal', 'box')
ax.axes.invert_yaxis()
plt.savefig("151673/pred.png", dpi=600)
plt.close()

#Plot refined spatial domains
domains="refined_pred"
num_celltype=len(adata.obs[domains].unique())
adata.uns[domains+"_colors"]=list(plot_color[:num_celltype])
ax=sc.pl.scatter(adata,alpha=1,x="y_pixel",y="x_pixel",color=domains,title=domains,color_map=plot_color,show=False,size=100000/adata.shape[0])
ax.set_aspect('equal', 'box')
ax.axes.invert_yaxis()
plt.savefig("151673/refined_pred.png", dpi=600)
plt.close()

I get:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/var/folders/3b/2z91xnmx6vq99jc862r_5xn40000gn/T/ipykernel_12057/419635388.py in <module>
     6 num_celltype=len(adata.obs[domains].unique())
     7 adata.uns[domains+"_colors"]=list(plot_color[:num_celltype])
----> 8 ax=sc.pl.scatter(adata,alpha=1,x="y_pixel",y="x_pixel",color=domains,title=domains,color_map=plot_color,show=False,size=100000/adata.shape[0])
     9 ax.set_aspect('equal', 'box')
    10 ax.axes.invert_yaxis()

~/Downloads/spagcn3.8/lib/python3.8/site-packages/scanpy/plotting/_anndata.py in scatter(adata, x, y, color, use_raw, layers, sort_order, alpha, basis, groups, components, projection, legend_loc, legend_fontsize, legend_fontweight, legend_fontoutline, color_map, palette, frameon, right_margin, left_margin, size, title, show, save, ax)
   145         adata.uns = adata_T.uns
   146         return axs
--> 147     raise ValueError(
   148         '`x`, `y`, and potential `color` inputs must all '
   149         'come from either `.obs` or `.var`'

ValueError: `x`, `y`, and potential `color` inputs must all come from either `.obs` or `.var`

Lastly, and I think more importantly, with the SVGs:

#Read in raw data
raw=sc.read("151673/sample_data.h5ad")
raw.var_names_make_unique()
raw.obs["pred"]=adata.obs["pred"].astype('category')
raw.obs["x_array"]=raw.obs["x2"]
raw.obs["y_array"]=raw.obs["x3"]
raw.obs["x_pixel"]=raw.obs["x4"]
raw.obs["y_pixel"]=raw.obs["x5"]
#Convert sparse matrix to non-sparse
raw.X=(raw.X.A if issparse(raw.X) else raw.X)
raw.raw=raw
sc.pp.log1p(raw)
#Use domain 0 as an example
target=0
#Set filtering criterials
min_in_group_fraction=0.8
min_in_out_group_ratio=1
min_fold_change=1.5
#Search radius such that each spot in the target domain has approximately 10 neighbors on average
adj_2d=spg.calculate_adj_matrix(x=x_array, y=y_array, histology=False)
start, end= np.quantile(adj_2d[adj_2d!=0],q=0.001), np.quantile(adj_2d[adj_2d!=0],q=0.1)
r=spg.search_radius(target_cluster=target, cell_id=adata.obs.index.tolist(), x=x_array, y=y_array, pred=adata.obs["pred"].tolist(), start=start, end=end, num_min=10, num_max=14,  max_run=100)
#Detect neighboring domains
nbr_domians=spg.find_neighbor_clusters(target_cluster=target,
                                   cell_id=raw.obs.index.tolist(), 
                                   x=raw.obs["x_array"].tolist(), 
                                   y=raw.obs["y_array"].tolist(), 
                                   pred=raw.obs["pred"].tolist(),
                                   radius=r,
                                   ratio=1/2)

nbr_domians=nbr_domians[0:3]
de_genes_info=spg.rank_genes_groups(input_adata=raw,
                                target_cluster=target,
                                nbr_list=nbr_domians, 
                                label_col="pred", 
                                adj_nbr=True, 
                                log=True)
#Filter genes
de_genes_info=de_genes_info[(de_genes_info["pvals_adj"]<0.05)]
filtered_info=de_genes_info
filtered_info=filtered_info[(filtered_info["pvals_adj"]<0.05) &
                            (filtered_info["in_out_group_ratio"]>min_in_out_group_ratio) &
                            (filtered_info["in_group_fraction"]>min_in_group_fraction) &
                            (filtered_info["fold_change"]>min_fold_change)]
filtered_info=filtered_info.sort_values(by="in_group_fraction", ascending=False)
filtered_info["target_dmain"]=target
filtered_info["neighbors"]=str(nbr_domians)
print("SVGs for domain ", str(target),":", filtered_info["genes"].tolist())

I get NO SVGs for domain 0:

Calculateing adj matrix using xy only...
Calculateing adj matrix using xy only...
Calculateing adj matrix using xy only...
Run 1: radius [1.4142135381698608, 16.970561981201172], num_nbr [1.0, 315.1679389312977]
Calculateing adj matrix using xy only...
Run 2: radius [1.4142135381698608, 9.192387759685516], num_nbr [1.0, 107.9587786259542]
Calculateing adj matrix using xy only...
Run 3: radius [1.4142135381698608, 5.303300648927689], num_nbr [1.0, 37.621374045801524]
Calculateing adj matrix using xy only...
Run 4: radius [1.4142135381698608, 3.3587570935487747], num_nbr [1.0, 18.21526717557252]
Calculateing adj matrix using xy only...
Run 5: radius [2.386485315859318, 3.3587570935487747], num_nbr [8.125190839694657, 18.21526717557252]
Calculateing adj matrix using xy only...
recommended radius =  2.8726212047040462 num_nbr=11.500763358778626
radius= 2.8726212047040462 average number of neighbors for each spot is 11.500763358778626
 Cluster 0 has neighbors:
Dmain  5 :  1041
Dmain  2 :  598
Dmain  3 :  392
Dmain  1 :  390
WARNING: It seems you use rank_genes_groups on the raw count data. Please logarithmize your data before calling rank_genes_groups.
SVGs for domain  0 : []

How do you suggest I address these errors? Thanks!

Issue Analytics

  • State: closed
  • Created 2 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
yoda-vid commented, Nov 20, 2021

Thanks for this excellent project. Really appreciate it and the tutorial to get oriented.

I’ve encountered this error as well and found a workaround:

ValueError: `x`, `y`, and potential `color` inputs must all come from either `.obs` or `.var`

I think it relates to the commented-out code in Section 3 of the tutorial. When I uncommented this section to regenerate the ../tutorial/data/151673/sample_data.h5ad file rather than using the one currently included with the toy dataset, the plot worked fine.
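
One way to follow up on this error, sketched under assumptions: scanpy raises this ValueError when the names passed as x, y, and color are not all found together in .obs (or all in .var), so listing which names are absent from .obs narrows the problem down. The helper missing_keys and the example column list below are hypothetical; in a real session you would pass list(adata.obs.columns) in place of the stand-in list.

```python
from typing import Iterable, List


def missing_keys(available: Iterable[str], required: Iterable[str]) -> List[str]:
    """Return the required names that are absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


# Hypothetical stand-in for list(adata.obs.columns); the real columns may differ.
obs_columns = ["x1", "x2", "x3", "x4", "x5", "pred"]

# The three names the failing sc.pl.scatter call expects to find in .obs.
required = ["x_pixel", "y_pixel", "pred"]

missing = missing_keys(obs_columns, required)
print("Missing from .obs:", missing)
```

If the list is non-empty, the plotting call cannot succeed until those columns are created, for example by regenerating the input file as described above.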

For the SVGs, I also got zero SVGs as reported above, at least for the given target domain:

Calculateing adj matrix using xy only...
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Calculateing adj matrix using xy only...
Calculateing adj matrix using xy only...
Calculateing adj matrix using xy only...
Run 1: radius [1.4142135381698608, 16.970561981201172], num_nbr [1.0, 407.4214559386973]
Calculateing adj matrix using xy only...
Run 2: radius [1.4142135381698608, 9.192387759685516], num_nbr [1.0, 130.99233716475095]
Calculateing adj matrix using xy only...
Run 3: radius [1.4142135381698608, 5.303300648927689], num_nbr [1.0, 43.46743295019157]
Calculateing adj matrix using xy only...
Run 4: radius [1.4142135381698608, 3.3587570935487747], num_nbr [1.0, 20.39080459770115]
Calculateing adj matrix using xy only...
Run 5: radius [2.386485315859318, 3.3587570935487747], num_nbr [8.78544061302682, 20.39080459770115]
Calculateing adj matrix using xy only...
recommended radius =  2.8726212047040462 num_nbr=12.651340996168582
radius= 2.8726212047040462 average number of neighbors for each spot is 12.651340996168582
 Cluster 0 has neighbors:
Dmain  4 :  983
Dmain  5 :  414
Dmain  1 :  358
SVGs for domain  0 : []

As a workaround, I could get it to find a gene by reducing the min_fold_change variable to 1. It also found several SVGs when I looked at a different target (e.g., target = 2) without changing min_fold_change (leaving it at 1.5).
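
To illustrate why lowering min_fold_change can turn an empty result into a non-empty one, here is a small sketch of the same threshold logic on plain Python dictionaries. The gene names and numbers are made up purely for illustration and are not SpaGCN output; filter_svgs is a hypothetical helper that mirrors the filter used in the question.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, float]]


def filter_svgs(rows: List[Row], min_in_group_fraction: float,
                min_in_out_group_ratio: float, min_fold_change: float,
                max_pval_adj: float = 0.05) -> List[Row]:
    """Keep rows that pass every threshold, sorted by in_group_fraction (descending)."""
    kept = [
        r for r in rows
        if r["pvals_adj"] < max_pval_adj
        and r["in_out_group_ratio"] > min_in_out_group_ratio
        and r["in_group_fraction"] > min_in_group_fraction
        and r["fold_change"] > min_fold_change
    ]
    return sorted(kept, key=lambda r: r["in_group_fraction"], reverse=True)


# Made-up example rows; the values are NOT taken from the 151673 dataset.
rows = [
    {"genes": "gene_a", "pvals_adj": 0.01, "in_out_group_ratio": 1.1,
     "in_group_fraction": 0.90, "fold_change": 1.2},
    {"genes": "gene_b", "pvals_adj": 0.01, "in_out_group_ratio": 0.9,
     "in_group_fraction": 0.95, "fold_change": 1.8},
]

strict = filter_svgs(rows, 0.8, 1, min_fold_change=1.5)
relaxed = filter_svgs(rows, 0.8, 1, min_fold_change=1.0)
print([r["genes"] for r in strict], [r["genes"] for r in relaxed])
```

Because the conditions are combined with a logical AND, a gene that narrowly misses a single threshold is dropped entirely, which is consistent with an empty list for one domain and several hits for another.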

I tested this using the “Environment 1” package versions from the System Requirements section, except that I needed AnnData 0.7.5 to avoid an error. I also tried a new environment with the current versions of each package and got slightly different numbers but otherwise unchanged SVG output.

I imagine it could be related to #3 but am just very excited this is working overall. Thanks again for this package!

0 reactions
jianhuupenn commented, Oct 18, 2021
  1. This tutorial has been available for a while, and you are the only person who has run into this error:
ValueError: `x`, `y`, and potential `color` inputs must all come from either `.obs` or `.var`
     In my first response I explained why you are getting this error and clearly stated that the solution is to check whether, and why, the three variables are missing, but you NEVER did that. Once you have tried that, we can discuss further.

  2. On the SpaGCN GitHub main page, the “System Requirements” section lists all the tested environments.

  3. SpaGCN has an early-stopping criterion, which is described in the Methods section of the paper. The loss may converge after a different number of epochs.
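
As a rough illustration of an early-stopping rule of this kind, the sketch below assumes that the delta_label value in the training log is the fraction of spots whose cluster label changed since the previous check. That reading is an assumption based on the log line "delta_label 0.0043... < tol 0.005", not something confirmed from SpaGCN's source, and the function names are hypothetical.

```python
from typing import Sequence


def label_change_fraction(previous: Sequence[int], current: Sequence[int]) -> float:
    """Fraction of positions whose label differs between two assignments."""
    if len(previous) != len(current):
        raise ValueError("label sequences must have the same length")
    if not current:
        return 0.0
    changed = sum(p != c for p, c in zip(previous, current))
    return changed / len(current)


def should_stop(previous: Sequence[int], current: Sequence[int], tol: float) -> bool:
    """Stop training once fewer than `tol` of the labels changed (assumed rule)."""
    return label_change_fraction(previous, current) < tol


# Toy example: 1000 spots, 4 of which switch cluster between two checks.
previous = [0] * 1000
current = [1] * 4 + [0] * 996

print(label_change_fraction(previous, current), should_stop(previous, current, tol=5e-3))
```

Under a rule like this, the epoch at which training stops depends on the random initialization and package versions, so a different total epoch count between runs is not by itself a sign of a problem.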
