sp.spatial_high_variable_genes() - Improve computational speed #2

Closed
PSSUN opened this issue Mar 13, 2025 · 2 comments

Comments

Collaborator

PSSUN commented Mar 13, 2025

Discussed in #1

Originally posted by ZhangZao-HB March 13, 2025
When I run sp.spatial_high_variable_genes() to compute highly variable genes, the estimated run time is close to 100 hours. Is there any way to speed this up, for example by increasing the number of threads or converting the matrix to a sparse matrix?

PSSUN changed the title from "sp.spatial_high_variable_genes() 提升运算速度" to "sp.spatial_high_variable_genes() - Improve computational speed" on Mar 13, 2025
Collaborator Author

PSSUN commented Mar 13, 2025

This is covered in the documentation. I suspect the dataset you are using has a high resolution. You can increase the bin_size parameter when loading the data, for example:
sp.read_h5ad(file=file_path, bin_size=50, merge_bin=True)
A larger bin size merges neighboring spots, so there are fewer points to process. This will not reduce the accuracy of STMiner, as resolution generally does not affect the spatial distribution of genes.
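
To illustrate why a larger bin size cuts the workload, here is a minimal sketch of spot binning in plain Python. It is not STMiner's implementation; the function name bin_spots and the toy 100 x 100 grid are assumptions made only for this example.

```python
from collections import defaultdict


def bin_spots(spots, bin_size):
    """Merge spots into square bins of side `bin_size`, summing their counts.

    `spots` is an iterable of (x, y, count) tuples with integer index
    coordinates. Hypothetical helper, not part of STMiner.
    """
    merged = defaultdict(int)
    for x, y, count in spots:
        # Integer division maps every spot to the bin that contains it.
        merged[(x // bin_size, y // bin_size)] += count
    return dict(merged)


# Toy data: a 100 x 100 grid with one count per spot.
spots = [(x, y, 1) for x in range(100) for y in range(100)]
merged = bin_spots(spots, bin_size=50)
print(len(spots), "spots ->", len(merged), "bins")
```

On this toy grid, 10,000 spots collapse into 4 bins while the total count is preserved, which is the kind of reduction that shortens the downstream computation.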

We do have a multi-process version, but we are concerned that it might cause memory overflow issues on personal computers. Therefore, we have not mentioned it in the documentation and tutorials yet. We will update the multi-process version once it is optimized and stable.

In general, spatial transcriptomics data contains two sets of coordinates. One is the index coordinates of the sampling points, counted from the top-left corner (or another corner) of the tissue slice; these values are usually small. The other is the pixel coordinates of the sampling points in the corresponding H&E image; these values are larger. By default, STMiner uses the first set of coordinates (retrieved from adata.uns). If this does not resolve the issue, please check whether the x and y values in adata.obs are the index coordinates.
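
One rough way to tell the two coordinate sets apart is to look at the range and the smallest step between distinct values. The sketch below is only a heuristic under that assumption; coord_summary is a hypothetical helper and the sample values are made up for illustration.

```python
def coord_summary(values):
    """Summarize a coordinate column: range, distinct values, smallest step.

    Hypothetical helper for eyeballing whether coordinates look like
    indices (small values, small step) or image pixels (large values,
    large step).
    """
    uniq = sorted(set(values))
    steps = [b - a for a, b in zip(uniq, uniq[1:])]
    return {
        "min": uniq[0],
        "max": uniq[-1],
        "n_unique": len(uniq),
        "min_step": min(steps) if steps else None,
    }


# Made-up examples: index-like versus pixel-like x coordinates.
index_x = [0, 1, 2, 3, 3, 2]
pixel_x = [1200, 1337, 1474, 1611]
print(coord_summary(index_x))
print(coord_summary(pixel_x))
```

With a real dataset, the same function could be called on the x and y columns of adata.obs (for example list(adata.obs["x"]), assuming the columns are named as in the comment above) to see which kind of coordinates they hold.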

PSSUN closed this as completed on Mar 13, 2025
PSSUN pinned this issue on Mar 13, 2025
Collaborator Author

PSSUN commented Mar 15, 2025

Also try:

sp.spatial_high_variable_genes(thread=12)
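
Since the multi-process path may be memory-hungry on personal computers, it may be worth capping the thread count at what the machine actually has. The following is a small sketch of that idea; pick_thread_count and its reserve argument are assumptions for illustration, and only the thread parameter itself comes from the call above.

```python
import os


def pick_thread_count(requested=12, reserve=1):
    """Return a thread count no larger than the available CPU cores.

    Keeps `reserve` cores free for the rest of the system and never
    returns less than 1. Hypothetical helper, not part of STMiner.
    """
    available = os.cpu_count() or 1
    return max(1, min(requested, available - reserve))


n_threads = pick_thread_count(requested=12)
print("Using", n_threads, "thread(s)")
# Then, with an STMiner object `sp` already loaded:
# sp.spatial_high_variable_genes(thread=n_threads)
```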
