
use_observed_lib_size=False fails with NaN for some datasets #1903

Open · vitkl opened this issue Feb 11, 2023 · 6 comments

@vitkl (Contributor) commented Feb 11, 2023

Hi,

I see that use_observed_lib_size=False fails with NaN for some datasets. I would like to use use_observed_lib_size=False to estimate the technical normalisation effect per cell, which does not necessarily match the total count per cell, especially in more complex data from multiple tissues and cell types. For one dataset, restarting the notebook and rerunning all cells a few times helps. For the same dataset, reducing n_hidden from 1024 to 512 also helps. For a different dataset, even n_hidden=128 and n_latent=30 don't work. It is hard to provide a reproducible example because I observed this issue on unpublished snRNA-seq data.

My guess would be that use_observed_lib_size=False is not particularly numerically stable. I have observed similar issues with other models when priors are chosen in suboptimal ways. Which prior is used for this technical normalisation effect?

I generally use a batch-specific prior that regularises the model to keep the cell-specific normalisation y_c close to 1 (e.g. in the cell2location package):

y_c ~ Gamma(a, a / y_e)
y_e ~ Gamma(10, 10)

which regularises the batch-specific normalisation effect y_e to be close to 1, and regularises the cell-specific normalisation effect y_c to be close to the per-batch average y_e, with the strength of the pull controlled by the hyperparameter a.
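
For concreteness, a minimal Pyro sketch of this prior (the function name, the default value of a, and the batch_index tensor are illustrative, not the cell2location code):

```python
import torch
import pyro
import pyro.distributions as dist

def normalisation_prior(batch_index: torch.LongTensor, n_batches: int, a: float = 9.0):
    """Hypothetical sketch; batch_index maps each cell to its batch."""
    # Batch-specific effect y_e ~ Gamma(10, 10): prior mean 10 / 10 = 1,
    # so batches are regularised towards no normalisation effect.
    y_e = pyro.sample(
        "y_e",
        dist.Gamma(torch.full((n_batches,), 10.0),
                   torch.full((n_batches,), 10.0)).to_event(1),
    )
    # Cell-specific effect y_c ~ Gamma(a, a / y_e): prior mean
    # a / (a / y_e) = y_e, so cells are pulled towards their batch
    # average; larger a gives a stronger pull.
    a_vec = torch.full(batch_index.shape, a)
    y_c = pyro.sample(
        "y_c",
        dist.Gamma(a_vec, a_vec / y_e[batch_index]).to_event(1),
    )
    return y_c
```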

Please let me know what you think is going on with this use_observed_lib_size=False NaN issue, and what you think about using more regularised priors.

vitkl added the bug label on Feb 11, 2023
@canergen (Member)

Can you provide the minimum (and maximum) of your library size, and say whether this behaviour depends on it? I have observed this when not filtering for min_counts. I am not sure whether the problem is the range or the absolute value. This might not be what you are looking for, but it could help to figure out the problem.
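
For reference, a quick way to check this (assuming raw counts in adata.X of an AnnData object; not part of any scvi-tools API):

```python
import numpy as np

# Per-cell library size = total counts per cell; np.asarray(...).ravel()
# handles both dense and scipy-sparse adata.X.
lib_size = np.asarray(adata.X.sum(axis=1)).ravel()
print(f"min={lib_size.min():.0f}, max={lib_size.max():.0f}, "
      f"range ratio={lib_size.max() / lib_size.min():.1f}")
```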

@adamgayoso (Member)

This is why we switched the default. I imagine the problem exists because we use exp to transform the log library size, where we could instead consider using softplus.

The prior is described here. It is designed so that $\ell_n$ is on the same scale as the observed library size.
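
To illustrate the exp point (a toy example, unrelated to the actual encoder outputs):

```python
import torch
import torch.nn.functional as F

# exp of a large log-library-size estimate overflows float32 to inf,
# which then propagates NaNs through the loss:
log_l = torch.tensor([5.0, 30.0, 90.0])
print(torch.exp(log_l))    # tensor([1.4841e+02, 1.0686e+13,        inf])

# softplus(x) = log(1 + exp(x)) grows only linearly for large x,
# so it stays finite on the same inputs:
print(F.softplus(log_l))   # tensor([ 5.0067, 30.0000, 90.0000])
```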

@vitkl (Contributor, Author) commented Mar 13, 2023

After reviewing this more carefully, I think:

  1. The prior l_sigma is an overestimate of the total variance that can be attributed to the technical effect. A potential (and easy) fix would be to add a hyperparameter that allows the user to reduce the prior variance with a simple weight.

  2. Indeed, it is possible that softplus would make the computation more stable. Is it easy to change to softplus exclusively for size factors? Is this the operation here: https://github.com/scverse/scvi-tools/blob/library_stability/scvi/nn/_base_components.py#L290?

  3. I assume that the encoder network size n_hidden is the same for both z and l, which in my example is a pretty large number. This would mean the network is massively overparameterised. In my trials of amortizing inference for cell2location, I observed that such 1-d parameters need to be amortized with a much smaller network (n_hidden=10) to achieve numerical stability and avoid increases in the loss; see the sketch at the end of this comment. Is it possible to change n_hidden exclusively for this parameter? If yes, I would like to try this and report results.

I think a combination of 1 and 3 could solve this.

I assume that when the library size is learnable, the biological expression is transformed to a positive scale using softplus rather than softmax, correct?
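
Regarding point 3, a minimal sketch of what a separate, much smaller encoder for the 1-d library size could look like (illustrative names, not the scvi-tools Encoder class):

```python
import torch
from torch import nn

class SmallLibraryEncoder(nn.Module):
    """Hypothetical tiny encoder amortizing only the 1-d log library size."""

    def __init__(self, n_input: int, n_hidden: int = 10):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(n_input, n_hidden), nn.ReLU())
        self.mean = nn.Linear(n_hidden, 1)      # posterior mean of log(l)
        self.log_var = nn.Linear(n_hidden, 1)   # posterior log-variance

    def forward(self, x: torch.Tensor):
        h = self.hidden(torch.log1p(x))         # log1p-stabilised counts
        return self.mean(h), self.log_var(h)
```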

@vitkl (Contributor, Author) commented Mar 14, 2023

This line https://github.com/scverse/scvi-tools/blob/library_stability/scvi/module/_vae.py#L218 means that softplus is used here https://github.com/scverse/scvi-tools/blob/library_stability/scvi/nn/_base_components.py#L406 only when use_size_factor_key == True. This doesn't make sense: it should be "softplus" if (use_size_factor_key or not use_observed_lib_size) else "softmax".

If you softmax-transform the gene expression prediction, it doesn't matter whether the library size is estimated or is the "observed total count": it has to match the same total count per cell either way.
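
In code, the proposed change would look something like this (a sketch against the snippet quoted above, not a tested patch):

```python
# Hypothetical patch around _vae.py#L218: pick the decoder's output
# activation so that a learnable library size also gets softplus.
scale_activation = (
    "softplus"
    if (use_size_factor_key or not use_observed_lib_size)
    else "softmax"
)
```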

@adamgayoso (Member)

> only when use_size_factor_key == True. This doesn't make sense: it should be "softplus" if (use_size_factor_key or not use_observed_lib_size) else "softmax".

We need to preserve backwards compatibility: scVI was originally described with a latent library size and a softmax transformation.

@vitkl (Contributor, Author) commented Mar 14, 2023

OK, I can define an option to use softplus but keep the rest the same.
