use_observed_lib_size=False fails with NaN for some datasets #1903

Comments
Can you provide the minimum (and maximum) of your library size, and whether this behavior depends on it? I have observed this when not filtering for min_counts. I am not sure whether the problem is the range or the absolute value. It might not be what you are looking for, but it could help in figuring out the problem.
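A minimal sketch of how the library size range could be checked, assuming an AnnData object with raw counts in .X (the dataset and the min_counts threshold below are placeholders):

```python
import numpy as np
import scanpy as sc

# Placeholder dataset; substitute your own AnnData with raw counts in .X
adata = sc.datasets.pbmc3k()

# Per-cell library size (total counts); np.asarray handles sparse matrices
lib_size = np.asarray(adata.X.sum(axis=1)).ravel()
print("library size: min =", lib_size.min(), "max =", lib_size.max())

# One way to avoid extreme values: drop near-empty cells before training
sc.pp.filter_cells(adata, min_counts=500)  # threshold chosen arbitrarily here
```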
This is why we switched the default. I imagine the problem exists because we use an exp to transform the log library size, whereas we could instead consider using softplus. The prior is described here. It's designed so that …
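To illustrate the numerical point (a sketch only, not the actual scvi-tools code): exp of a large log library size overflows float32 to inf, which then propagates NaNs through the loss, whereas softplus grows only linearly for large inputs:

```python
import torch
import torch.nn.functional as F

# Hypothetical decoder outputs for the log library size; values this large can
# appear transiently early in training
log_library = torch.tensor([2.0, 8.0, 20.0, 100.0])

print(torch.exp(log_library))   # exp(100) overflows float32 to inf -> NaNs downstream
print(F.softplus(log_library))  # ~[2.13, 8.00, 20.00, 100.00], finite and well-behaved
```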
After reviewing this more carefully, I think …
I think a combination of 1 and 3 could solve this. I assume that when the library size is learnable, the biological expression is transformed to positive values using softplus rather than softmax, correct?
This line https://github.com/scverse/scvi-tools/blob/library_stability/scvi/module/_vae.py#L218 means that if you …
We need to preserve backwards compatibility. scVI was originally described with a latent library size and a softmax transformation.
Ok, I can define an option to use softplus but keep the rest the same.
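A rough sketch of what such an opt-in option could look like (hypothetical names, not the actual scvi-tools API): keep the softmax parameterisation as the default and expose softplus as an alternative transform for the expression scale.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaleActivation(nn.Module):
    """Hypothetical opt-in transform for the decoder's expression scale.

    "softmax" mirrors the original scVI parameterisation (rows sum to 1 and are
    multiplied by a library size); "softplus" only enforces positivity.
    """

    def __init__(self, mode: str = "softmax"):
        super().__init__()
        if mode not in ("softmax", "softplus"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode

    def forward(self, raw_scale: torch.Tensor) -> torch.Tensor:
        if self.mode == "softmax":
            return F.softmax(raw_scale, dim=-1)
        return F.softplus(raw_scale)

# Default stays backwards compatible; softplus is opt-in
legacy = ScaleActivation()                  # softmax, as in the original model
alternative = ScaleActivation("softplus")   # the option discussed above
x = torch.randn(4, 10)
print(legacy(x).sum(-1))     # each row sums to 1
print(alternative(x).min())  # strictly positive, no sum-to-one constraint
```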
Hi,

I see that use_observed_lib_size=False fails with NaN for some datasets. I would like to use use_observed_lib_size=False to estimate the technical normalisation effect per cell, which does not necessarily match the total count per cell, especially in more complex data from multiple tissues and cell types. For one dataset, a few attempts to restart the notebook and run all cells help. For the same dataset, it also helps to reduce n_hidden 1024 -> 512. For a different dataset, even n_hidden=128 and n_latent=30 don't work. It would be hard to provide a reproducible example because I observed this issue using unpublished snRNA-seq data.

My guess would be that use_observed_lib_size=False is not particularly numerically stable. I observed a similar issue with other models when priors are selected in suboptimal ways. Which prior is used for this technical normalisation effect?

I generally use a batch-specific prior that regularises the model to keep the cell-specific normalisation y_c close to 1 (e.g. in the cell2location package). It regularises the batch-specific normalisation effect y_e to be close to 1, and regularises the cell-specific normalisation effect y_c to be close to the per-batch average y_e, using hyperparameter a.
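As a rough sketch of a prior with this structure (the parameterisation below is assumed for illustration, not taken from cell2location): both effects can be given Gamma priors whose means encode the regularisation targets, with the concentration acting as the regularisation strength.

```python
import torch
from torch.distributions import Gamma

# Sketch only: Gamma(concentration, rate) has mean = concentration / rate.
a_e, a = 20.0, 200.0              # assumed regularisation strengths (hyperparameters)
n_batches, n_cells = 2, 5
batch_of_cell = torch.tensor([0, 0, 0, 1, 1])

# Batch-specific normalisation effect y_e, regularised towards 1
y_e = Gamma(torch.full((n_batches,), a_e),
            torch.full((n_batches,), a_e)).sample()      # mean 1, variance 1 / a_e

# Cell-specific normalisation effect y_c, regularised towards its batch's y_e
y_c = Gamma(torch.full((n_cells,), a),
            a / y_e[batch_of_cell]).sample()             # mean y_e, tighter as a grows

print(y_e)
print(y_c)
```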
Please let me know what you think is going on with this NaN use_observed_lib_size=False issue, and what you think about using more regularised priors.