problem reproducing tutorial
Hi,
I’m trying to reproduce your tutorial using Python 3.8 and SpaGCN version 1.2.0.
I run into problems at step 5.3, Run SpaGCN. Running this code:
clf=spg.SpaGCN()
clf.set_l(l)
#Set seed
random.seed(r_seed)
torch.manual_seed(t_seed)
np.random.seed(n_seed)
#Run
clf.train(adata,adj,init_spa=True,init="louvain",res=res, tol=5e-3, lr=0.05, max_epochs=200)
y_pred, prob=clf.predict()
adata.obs["pred"]= y_pred
adata.obs["pred"]=adata.obs["pred"].astype('category')
#Do cluster refinement(optional)
#shape="hexagon" for Visium data, "square" for ST data.
adj_2d=spg.calculate_adj_matrix(x=x_array,y=y_array, histology=False)
refined_pred=spg.refine(sample_id=adata.obs.index.tolist(), pred=adata.obs["pred"].tolist(), dis=adj_2d, shape="hexagon")
adata.obs["refined_pred"]=refined_pred
adata.obs["refined_pred"]=adata.obs["refined_pred"].astype('category')
#Save results
adata.write_h5ad("151673/results.h5ad")
I get this output:
Initializing cluster centers with louvain, resolution = 0.7
Epoch 0
Epoch 10
Epoch 20
Epoch 30
Epoch 40
delta_label 0.004396812311074471 < tol 0.005
Reach tolerance threshold. Stopping training.
Total epoch: 46
Calculateing adj matrix using xy only...
Then, when I try to plot the spatial domains:
adata=sc.read("151673/results.h5ad")
#Set colors used
plot_color=["#F56867","#FEB915","#C798EE","#59BE86","#7495D3","#D1D1D1","#6D1A9C","#15821E","#3A84E6","#997273","#787878","#DB4C6C","#9E7A7A","#554236","#AF5F3C","#93796C","#F9BD3F","#DAB370","#877F6C","#268785"]
#Plot spatial domains
domains="pred"
num_celltype=len(adata.obs[domains].unique())
adata.uns[domains+"_colors"]=list(plot_color[:num_celltype])
ax=sc.pl.scatter(adata,alpha=1,x="y_pixel",y="x_pixel",color=domains,title=domains,color_map=plot_color,show=False,size=100000/adata.shape[0])
ax.set_aspect('equal', 'box')
ax.axes.invert_yaxis()
plt.savefig("151673/pred.png", dpi=600)
plt.close()
#Plot refined spatial domains
domains="refined_pred"
num_celltype=len(adata.obs[domains].unique())
adata.uns[domains+"_colors"]=list(plot_color[:num_celltype])
ax=sc.pl.scatter(adata,alpha=1,x="y_pixel",y="x_pixel",color=domains,title=domains,color_map=plot_color,show=False,size=100000/adata.shape[0])
ax.set_aspect('equal', 'box')
ax.axes.invert_yaxis()
plt.savefig("151673/refined_pred.png", dpi=600)
plt.close()
I get:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/var/folders/3b/2z91xnmx6vq99jc862r_5xn40000gn/T/ipykernel_12057/419635388.py in <module>
6 num_celltype=len(adata.obs[domains].unique())
7 adata.uns[domains+"_colors"]=list(plot_color[:num_celltype])
----> 8 ax=sc.pl.scatter(adata,alpha=1,x="y_pixel",y="x_pixel",color=domains,title=domains,color_map=plot_color,show=False,size=100000/adata.shape[0])
9 ax.set_aspect('equal', 'box')
10 ax.axes.invert_yaxis()
~/Downloads/spagcn3.8/lib/python3.8/site-packages/scanpy/plotting/_anndata.py in scatter(adata, x, y, color, use_raw, layers, sort_order, alpha, basis, groups, components, projection, legend_loc, legend_fontsize, legend_fontweight, legend_fontoutline, color_map, palette, frameon, right_margin, left_margin, size, title, show, save, ax)
145 adata.uns = adata_T.uns
146 return axs
--> 147 raise ValueError(
148 '`x`, `y`, and potential `color` inputs must all '
149 'come from either `.obs` or `.var`'
ValueError: `x`, `y`, and potential `color` inputs must all come from either `.obs` or `.var`
Last, and I think most important, is the problem with SVGs:
#Read in raw data
raw=sc.read("151673/sample_data.h5ad")
raw.var_names_make_unique()
raw.obs["pred"]=adata.obs["pred"].astype('category')
raw.obs["x_array"]=raw.obs["x2"]
raw.obs["y_array"]=raw.obs["x3"]
raw.obs["x_pixel"]=raw.obs["x4"]
raw.obs["y_pixel"]=raw.obs["x5"]
#Convert sparse matrix to non-sparse
raw.X=(raw.X.A if issparse(raw.X) else raw.X)
raw.raw=raw
sc.pp.log1p(raw)
#Use domain 0 as an example
target=0
#Set filtering criterials
min_in_group_fraction=0.8
min_in_out_group_ratio=1
min_fold_change=1.5
#Search radius such that each spot in the target domain has approximately 10 neighbors on average
adj_2d=spg.calculate_adj_matrix(x=x_array, y=y_array, histology=False)
start, end= np.quantile(adj_2d[adj_2d!=0],q=0.001), np.quantile(adj_2d[adj_2d!=0],q=0.1)
r=spg.search_radius(target_cluster=target, cell_id=adata.obs.index.tolist(), x=x_array, y=y_array, pred=adata.obs["pred"].tolist(), start=start, end=end, num_min=10, num_max=14, max_run=100)
#Detect neighboring domains
nbr_domians=spg.find_neighbor_clusters(target_cluster=target,
cell_id=raw.obs.index.tolist(),
x=raw.obs["x_array"].tolist(),
y=raw.obs["y_array"].tolist(),
pred=raw.obs["pred"].tolist(),
radius=r,
ratio=1/2)
nbr_domians=nbr_domians[0:3]
de_genes_info=spg.rank_genes_groups(input_adata=raw,
target_cluster=target,
nbr_list=nbr_domians,
label_col="pred",
adj_nbr=True,
log=True)
#Filter genes
de_genes_info=de_genes_info[(de_genes_info["pvals_adj"]<0.05)]
filtered_info=de_genes_info
filtered_info=filtered_info[(filtered_info["pvals_adj"]<0.05) &
(filtered_info["in_out_group_ratio"]>min_in_out_group_ratio) &
(filtered_info["in_group_fraction"]>min_in_group_fraction) &
(filtered_info["fold_change"]>min_fold_change)]
filtered_info=filtered_info.sort_values(by="in_group_fraction", ascending=False)
filtered_info["target_dmain"]=target
filtered_info["neighbors"]=str(nbr_domians)
print("SVGs for domain ", str(target),":", filtered_info["genes"].tolist())
I get NO SVGs for domain 0:
Calculateing adj matrix using xy only...
Calculateing adj matrix using xy only...
Calculateing adj matrix using xy only...
Run 1: radius [1.4142135381698608, 16.970561981201172], num_nbr [1.0, 315.1679389312977]
Calculateing adj matrix using xy only...
Run 2: radius [1.4142135381698608, 9.192387759685516], num_nbr [1.0, 107.9587786259542]
Calculateing adj matrix using xy only...
Run 3: radius [1.4142135381698608, 5.303300648927689], num_nbr [1.0, 37.621374045801524]
Calculateing adj matrix using xy only...
Run 4: radius [1.4142135381698608, 3.3587570935487747], num_nbr [1.0, 18.21526717557252]
Calculateing adj matrix using xy only...
Run 5: radius [2.386485315859318, 3.3587570935487747], num_nbr [8.125190839694657, 18.21526717557252]
Calculateing adj matrix using xy only...
recommended radius = 2.8726212047040462 num_nbr=11.500763358778626
radius= 2.8726212047040462 average number of neighbors for each spot is 11.500763358778626
Cluster 0 has neighbors:
Dmain 5 : 1041
Dmain 2 : 598
Dmain 3 : 392
Dmain 1 : 390
WARNING: It seems you use rank_genes_groups on the raw count data. Please logarithmize your data before calling rank_genes_groups.
SVGs for domain 0 : []
How would you suggest resolving these errors? Thanks!

Thanks for this excellent project. Really appreciate it and the tutorial to get oriented.
I’ve encountered this error as well and found a workaround:
I think it relates to the commented-out code in Section 3 of the tutorial. When I uncommented this section to regenerate the ../tutorial/data/151673/sample_data.h5ad file rather than using the one currently included with the toy dataset, the plot worked fine.
For the SVGs, I also got zero SVGs as reported above, at least for the given target domain.
As a workaround, I could get it to find a gene by reducing the min_fold_change variable to 1. It also found several SVGs when I looked at a different target (e.g. target = 2) without changing min_fold_change (leaving it at 1.5).
I tested this using the “Environment 1” package versions from the System Requirements section, except that I needed AnnData 0.7.5 to avoid an error. I also tried a new environment with the current versions of each package and got slightly different numbers but otherwise unchanged SVG output.
I imagine it could be related to #3 but am just very excited this is working overall. Thanks again for this package!
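To illustrate why lowering min_fold_change can turn an empty SVG list into a non-empty one, here is a minimal sketch of the same four-condition filter in plain Python. The rows and gene names (GENE_A, GENE_B) are made-up toy values, not SpaGCN output, and filter_svgs is a hypothetical helper; only the column names and default thresholds are taken from the code in the original post.

```python
# Toy rows with invented values; the keys mirror the columns of de_genes_info
# used in the tutorial's filtering step.
rows = [
    {"genes": "GENE_A", "pvals_adj": 0.001, "in_out_group_ratio": 1.3,
     "in_group_fraction": 0.95, "fold_change": 1.2},
    {"genes": "GENE_B", "pvals_adj": 0.200, "in_out_group_ratio": 2.0,
     "in_group_fraction": 0.90, "fold_change": 3.0},
]


def filter_svgs(rows, min_in_group_fraction=0.8, min_in_out_group_ratio=1,
                min_fold_change=1.5):
    """Hypothetical helper: keep rows that pass all four filters."""
    kept = [
        r for r in rows
        if r["pvals_adj"] < 0.05
        and r["in_out_group_ratio"] > min_in_out_group_ratio
        and r["in_group_fraction"] > min_in_group_fraction
        and r["fold_change"] > min_fold_change
    ]
    # Same ordering as the tutorial: highest in-group fraction first.
    kept.sort(key=lambda r: r["in_group_fraction"], reverse=True)
    return [r["genes"] for r in kept]


print(filter_svgs(rows))                     # default thresholds from the post
print(filter_svgs(rows, min_fold_change=1))  # relaxed fold-change threshold
```

In this toy table GENE_A fails only the fold-change test at 1.5, so it appears once the threshold is relaxed, while GENE_B is excluded by its adjusted p-value regardless of the fold-change setting.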
I explained in my first response why you are having this error and clearly stated that the solution is to check whether, and why, the three variables are missing, but you NEVER did that. Once you have tried that, we can discuss further.
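A quick way to run that check is sketched below. It assumes the three variables are the x, y and color arguments of the failing sc.pl.scatter call (x_pixel, y_pixel and pred); the thread does not name them explicitly, and missing_obs_columns is a hypothetical helper, not part of SpaGCN or scanpy.

```python
def missing_obs_columns(obs_columns, required=("x_pixel", "y_pixel", "pred")):
    """Return the required column names that are absent from obs_columns.

    The default names are an assumption taken from the failing plot call.
    """
    present = set(obs_columns)
    return [name for name in required if name not in present]


# Stand-in for the column names of a loaded AnnData object's .obs table.
example_columns = ["x1", "x2", "x3", "x4", "x5", "pred"]
print(missing_obs_columns(example_columns))
```

With a results file loaded as adata, the call would be missing_obs_columns(adata.obs.columns). Any name it returns is not in .obs, which is the condition under which scanpy raises the ValueError shown in the traceback.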
On the SpaGCN GitHub main page, the “System Requirements” section lists all the tested environments.
SpaGCN has an early-stopping criterion, which is described in the Methods section of the paper. The loss may converge after a different number of epochs.
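The log line "delta_label 0.004396812311074471 < tol 0.005" in the original post suggests the criterion compares the fraction of spots whose cluster label changed against tol. The sketch below shows that idea in plain Python; it is an assumption inferred from the log output, not a copy of SpaGCN's implementation, and delta_label and should_stop are hypothetical names.

```python
def delta_label(previous, current):
    """Fraction of items whose label differs between two assignments."""
    if len(previous) != len(current):
        raise ValueError("label lists must have the same length")
    if not current:
        raise ValueError("label lists must not be empty")
    changed = sum(1 for a, b in zip(previous, current) if a != b)
    return changed / len(current)


def should_stop(previous, current, tol=5e-3):
    """Stop when fewer than a fraction tol of the labels changed."""
    return delta_label(previous, current) < tol


# 3 of 1000 labels changed, so the fraction is below tol=0.005.
before = [0] * 1000
after = [0] * 997 + [1] * 3
print(delta_label(before, after), should_stop(before, after))
```

Because the check depends on how many labels change between updates, small differences in package versions or random seeds can shift the epoch at which training stops, which is consistent with the "Total epoch: 46" in the post differing from other runs.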