Description: scGSEA is an extension of ssGSEA tailored for single-cell data analysis. It addresses the challenge of sparsity by employing a normalization method and scoring metric chosen to minimize any variability. By utilizing scGSEA, scientists can explore and interpret pathway activity and functional alterations within heterogeneous populations of cells.
Authors: John Jun; UCSD - Mesirov Lab, UCSD
Contact: Forum Link.
| Parameter Group | Name | Description | Default Value |
|---|---|---|---|
| Input Files | gex_file * | File containing raw counts or mRNA abundance estimates | |
| | gene_set_database_file * | Gene sets in GMT format | |
| | chip_file | Chip file used for conversion to gene symbols | |
| | output_file_name * | Basename to use for output file | scGSEA_scores |
| Cell Grouping Data * (Only use one of the two) | metacell_data_label | Metadata label for cell grouping (metacell) information; clustering data | seurat_clusters |
| | metacell_data_file | Metadata file for cell grouping (metacell) information; clustering data | |
| Multi-threading | n_cpu | Number of CPUs to utilize for parallel computing | 3 |

* Required
- gex_file
  This is a file containing unnormalized gene expression data as raw read counts or estimated RNA abundance. The scGSEA module supports multiple input file formats, including Seurat RDS, H5seurat, and H5ad, as well as 10x Market Exchange (MEX) and HDF5 (h5) formats. For a Seurat object, the $RNA@counts slot will be used. For an AnnData object, the raw.X slot will be used.
  - If you come across the following message in the stderr.txt file, please verify that the input file contains unnormalized raw counts data: "The raw counts matrix was not composed of integer values. This may represent an issue with the processing pipeline. Please be advised..."
  - If you have used kallisto or salmon.alevin for alignment, please disregard the message about the raw counts data not being in integer format; these tools generate estimated RNA abundances, which may consist of non-integer count values.
  - For the 10x MEX file format, please compress the folder containing the three files (barcodes.tsv, matrix.mtx, features.tsv) and supply the .zip file.
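The two practical points above (non-integer counts and zipping a 10x MEX folder) can be checked locally before uploading. The following is a minimal sketch using only the Python standard library; the function names `mex_values_are_integers` and `zip_mex_folder` are hypothetical helpers, not part of the scGSEA module, and the sketch assumes the uncompressed file names listed above and a coordinate-format matrix.mtx.

```python
import zipfile
from pathlib import Path

# File names as listed in the gex_file description (assumed uncompressed).
MEX_FILES = ("barcodes.tsv", "matrix.mtx", "features.tsv")


def mex_values_are_integers(mtx_path):
    """Return True if every stored value in a Matrix Market file is a whole number."""
    with open(mtx_path) as handle:
        # Skip banner/comment lines (they start with '%') and blank lines.
        rows = [line.split() for line in handle
                if line.strip() and not line.startswith("%")]
    # rows[0] is the size line; the remaining rows are "row col value".
    return all(float(fields[2]).is_integer() for fields in rows[1:])


def zip_mex_folder(folder, zip_path):
    """Zip a 10x MEX folder, keeping the folder itself as the top-level entry."""
    folder = Path(folder)
    missing = [name for name in MEX_FILES if not (folder / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing MEX files: {missing}")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in MEX_FILES:
            archive.write(folder / name, arcname=f"{folder.name}/{name}")
    return Path(zip_path)
```

A `False` result from `mex_values_are_integers` is expected for kallisto or salmon.alevin output and is only a concern when the matrix is supposed to hold raw read counts.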
- gene_set_database_file
  - This parameter's drop-down allows you to select gene sets from the Molecular Signatures Database (MSigDB) on the GSEA website. The drop-down provides access to only the most current (2023) version of MSigDB. You can also upload your own gene set file(s) in GMT format.
  - If you want to use files from an earlier version of MSigDB, you will need to download them from the archived releases on the GSEA website.
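If you are preparing your own gene set file, it can help to confirm that it parses as GMT before uploading it. In the GMT format, each line is tab-delimited: a gene set name, a description, and then the member genes. The parser below is a minimal sketch; `parse_gmt` is a hypothetical helper and not part of the scGSEA module.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into {gene_set_name: [gene, ...]}."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        # A valid GMT line has a name, a description, and at least one gene.
        if len(fields) < 3:
            continue
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs.
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets
```

Lines with fewer than three fields are skipped here rather than reported; a stricter check could raise an error instead.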
- chip_file
  This parameter's drop-down allows you to select CHIP files from the Molecular Signatures Database (MSigDB) on the GSEA website. The drop-down provides access to only the most current version (2023) of MSigDB. How do I choose a chip file?
- output_file_name
  The prefix used for the names of the output GCT and CSV files. The default output prefix is scGSEA_scores. The output CSV and GCT files will contain a gene set x metacell matrix of enrichment scores.
- metacell_data_label
  The name of the metadata label for cell grouping information within the input Seurat/AnnData object. This label is used to access the cell grouping information that determines how cells are aggregated into metacells. The default value for this parameter is seurat_clusters, which is the metadata label for the slot that stores the cell-to-cluster mapping generated by Seurat's FindClusters method. Otherwise, provide the appropriate metadata label for the slot that stores cell grouping information.
- metacell_data_file
  If your input file is in 10x HDF5 or 10x MEX format, a separate cell grouping data file (tab-delimited .txt file) must be supplied here. The first column, "Name", contains cell names and the second column, "Metacell", contains metacell (cell group) names. The grouping information in this file is used to aggregate cells prior to computing scGSEA scores. Therefore, if you have 10x HDF5 or 10x MEX formatted files and do not have a metacell data file, please first perform clustering using a clustering method of your choice.
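To make the file layout and the aggregation step concrete, the sketch below reads a tab-delimited table with "Name" and "Metacell" columns and combines per-cell counts into per-metacell totals. This is an illustration only: `read_metacell_table` and `aggregate_counts` are hypothetical helpers, and summing counts is an assumption about how cells are aggregated, not a statement of what the module does internally.

```python
import csv
import io
from collections import defaultdict


def read_metacell_table(text):
    """Read a tab-delimited Name/Metacell table into {cell_name: metacell}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["Name"]: row["Metacell"] for row in reader}


def aggregate_counts(counts, cell_to_metacell):
    """Sum {cell: {gene: count}} into {metacell: {gene: total}}.

    Summation is an assumed aggregation rule used for illustration.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for cell, gene_counts in counts.items():
        group = cell_to_metacell[cell]
        for gene, value in gene_counts.items():
            totals[group][gene] += value
    return {group: dict(genes) for group, genes in totals.items()}
```

A cell that appears in the counts but not in the table raises a `KeyError`, which is a quick way to spot cell names that do not match between the two files.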
- n_cpu
  The number of CPUs to utilize for parallel computing. The scGSEA package parallelizes the computation of enrichment scores by dividing it into n_cpu subprocesses. The default value for this parameter is 3.
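As a rough illustration of what dividing the work into n_cpu parts can look like, the sketch below splits a list of work items (for example, gene set names) into at most n_cpu near-equal chunks, each of which could then be handed to one worker process. How the scGSEA package actually partitions its computation is not described here, so `split_into_chunks` is a hypothetical helper and the chunking rule is an assumption.

```python
def split_into_chunks(items, n_cpu=3):
    """Split items into at most n_cpu contiguous chunks of near-equal size."""
    items = list(items)
    # Never create more chunks than items, and always create at least one.
    n_chunks = max(1, min(n_cpu, len(items)))
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for index in range(n_chunks):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks
```

With this scheme, setting n_cpu higher than the number of work items yields no additional chunks.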
- <output_file_name>.csv
  This is a gene set x metacell matrix consisting of scGSEA scores.
- <output_file_name>.gct
  This is a gene set x metacell matrix consisting of scGSEA scores.
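For downstream work, the GCT output can be loaded with a few lines of code. The reader below is a minimal sketch that assumes the file follows the standard GCT v1.2 layout (a "#1.2" version line, a line with the row and column counts, a header beginning with Name and Description, then one row per gene set); `read_gct` is a hypothetical helper, not part of the scGSEA module.

```python
def read_gct(text):
    """Parse GCT v1.2 text into {gene_set: {metacell: score}}."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("#1.2"):
        raise ValueError("expected a GCT v1.2 file")
    # Second line: number of data rows and number of data columns.
    n_rows, n_cols = (int(value) for value in lines[1].split("\t")[:2])
    # Third line: Name, Description, then one column per metacell.
    columns = lines[2].split("\t")[2:]
    scores = {}
    for line in lines[3:3 + n_rows]:
        fields = line.split("\t")
        scores[fields[0]] = dict(zip(columns, map(float, fields[2:])))
    if len(columns) != n_cols or len(scores) != n_rows:
        raise ValueError("matrix dimensions do not match the GCT header")
    return scores
```

The returned mapping is indexed first by gene set and then by metacell, mirroring the gene set x metacell orientation of the output matrix.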
For more details, please refer to the full documentation.