scGSEA

Description: scGSEA is an extension of single-sample GSEA (ssGSEA) tailored for single-cell data analysis. It addresses the sparsity of single-cell data by using a normalization method and scoring metric chosen to minimize variability. With scGSEA, scientists can explore and interpret pathway activity and functional alterations within heterogeneous populations of cells.

Authors: John Jun; UCSD - Mesirov Lab, UCSD

Contact: Forum Link.

Parameters

| Parameter Group | Name | Description | Default Value |
|---|---|---|---|
| Input Files | gex_file * | File containing raw counts or mRNA abundance estimates | |
| | gene_set_database_file * | Gene sets in GMT format | |
| | chip_file | Chip file used for conversion to gene symbols | |
| | output_file_name * | Basename to use for output file | scGSEA_scores |
| Cell Grouping Data * (use only one of the two) | metacell_data_label | Metadata label for cell grouping (metacell) information; clustering data | seurat_clusters |
| | metacell_data_file | Metadata file for cell grouping (metacell) information; clustering data | |
| Multi-threading | n_cpu | Number of CPUs to utilize for parallel computing | 3 |

* Required

Input Files

gex_file
A file containing unnormalized gene expression data, either raw read counts or estimated RNA abundance. The scGSEA module supports several input file formats: Seurat RDS, h5Seurat, and h5ad, as well as 10x Market Exchange (MEX) and 10x HDF5 (h5). For a Seurat object, the $RNA@counts slot will be used. For an AnnData object, the raw.X slot will be used.
- If you come across the following message in the stderr.txt file, please verify that the input file contains unnormalized raw counts data.
```
The raw counts matrix was not composed of integer values. This may represent an issue with the processing pipeline. Please be advised...
```
- If you have used kallisto or salmon.alevin for alignment, please disregard the message about the raw counts data not being in integer format; the aforementioned tools generate estimated RNA abundances, which may consist of non-integer count values.
- For 10x MEX file format, please compress the folder containing the three files (barcodes.tsv, matrix.mtx, features.tsv) and supply the .zip file.
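The check behind the non-integer warning above can be approximated in a few lines. This is a minimal sketch under stated assumptions, not the module's actual code: the function name `looks_like_raw_counts` and the toy matrices are hypothetical, and the real module may test its input differently.

```python
def looks_like_raw_counts(matrix):
    """Return True if every value is a non-negative whole number."""
    return all(
        value >= 0 and float(value).is_integer()
        for row in matrix
        for value in row
    )


# Toy genes x cells matrices (hypothetical values, for illustration only).
raw_counts = [[0, 3, 1], [12, 0, 0]]
estimated_abundance = [[0.0, 2.7, 1.0], [11.4, 0.0, 0.0]]

print(looks_like_raw_counts(raw_counts))           # True
print(looks_like_raw_counts(estimated_abundance))  # False
```

A matrix of estimated abundances, such as the second one, would fail this check even though it is valid input, which is why the warning can be disregarded in that case.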
gene_set_database_file
- This parameter’s drop-down allows you to select gene sets from the Molecular Signatures Database (MSigDB) on the GSEA website. This drop-down provides access to only the most current (2023) version of MSigDB. You can also upload your own gene set file(s) in GMT format.
- If you want to use files from an earlier version of MSigDB you will need to download them from the archived releases on the GSEA website.
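If you are preparing your own gene set file, it may help to see how a GMT file is laid out: each line is tab-delimited, with the gene set name first, a description second, and the member genes after that. The sketch below parses such text; the helper name `read_gmt` and the example gene sets are hypothetical and are not part of the scGSEA package.

```python
import io


def read_gmt(handle):
    """Parse GMT text into a dict mapping gene set name -> list of genes."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines without any genes
        name, _description, *genes = fields
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


# Two made-up gene sets in GMT layout: name, description, then genes.
example_gmt = io.StringIO(
    "SET_A\thttp://example.org/SET_A\tTP53\tMYC\tEGFR\n"
    "SET_B\tna\tCD3E\tCD4\n"
)
gene_sets = read_gmt(example_gmt)
print(gene_sets["SET_A"])  # ['TP53', 'MYC', 'EGFR']
```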
chip_file
This parameter’s drop-down allows you to select CHIP files from the Molecular Signatures Database (MSigDB) on the GSEA website. This drop-down provides access to only the most current version (2023) of MSigDB. How do I choose a chip file?
output_file_name
The prefix used for the names of the output GCT and CSV files. The default output prefix is scGSEA_scores. The output CSV and GCT files each contain a gene set x metacell matrix of enrichment scores.

Cell Grouping Data

metacell_data_label
The name of the metadata label that holds cell grouping information within the input Seurat/AnnData object. This label is used to look up the cell grouping information needed to aggregate cells into metacells. The default value for this parameter is seurat_clusters, which is the metadata label for the slot that stores the cell-to-cluster mapping generated by Seurat's FindClusters function. If your grouping is stored under a different label, provide that label instead.
metacell_data_file
If your input file is in 10x HDF5 or 10x MEX format, a separate cell grouping data file (tab-delimited .txt file) must be supplied here. The first column, "Name", contains cell names and the second column, "Metacell", contains metacell (cell group) names. The grouping information in this file is used to aggregate cells before scGSEA scores are computed. Therefore, if you have 10x HDF5 or 10x MEX files and do not have a metacell data file, please first cluster the cells using a clustering method of your choice.
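The layout of the metacell data file, and the way it could be used to group cells, can be sketched as follows. This is an illustration only: the helper names, barcodes, and counts are hypothetical, and summing counts within each metacell is an assumption, since the aggregation method is not stated here.

```python
import csv
import io


def read_metacell_file(handle):
    """Read a tab-delimited file with 'Name' and 'Metacell' columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: row["Metacell"] for row in reader}


def aggregate_by_metacell(counts_by_cell, cell_to_metacell):
    """Sum per-cell count vectors within each metacell (assumed method)."""
    totals = {}
    for cell, counts in counts_by_cell.items():
        group = cell_to_metacell[cell]
        if group not in totals:
            totals[group] = list(counts)
        else:
            totals[group] = [a + b for a, b in zip(totals[group], counts)]
    return totals


# Made-up metacell data file: cell name, then metacell name.
metacell_txt = io.StringIO(
    "Name\tMetacell\n"
    "AAAC-1\tcluster_0\n"
    "AAAG-1\tcluster_0\n"
    "AACT-1\tcluster_1\n"
)
cell_to_metacell = read_metacell_file(metacell_txt)

# Made-up counts for three genes in each cell.
counts_by_cell = {
    "AAAC-1": [1, 0, 4],
    "AAAG-1": [2, 5, 0],
    "AACT-1": [0, 1, 1],
}
metacell_counts = aggregate_by_metacell(counts_by_cell, cell_to_metacell)
print(metacell_counts)  # {'cluster_0': [3, 5, 4], 'cluster_1': [0, 1, 1]}
```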

Multi-Threading

n_cpu
The number of CPUs to utilize for parallel computing. The scGSEA package parallelizes the computation of enrichment scores by dividing it across n_cpu subprocesses. The default value for this parameter is 3.
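One way to picture this division of work is to split the list of gene sets into n_cpu chunks of near-equal size, one per subprocess. The sketch below shows only that splitting step. It assumes the work is divided by gene set, which is not stated here, and `split_into_chunks` is a hypothetical helper rather than a function of the scGSEA package.

```python
def split_into_chunks(items, n_cpu):
    """Split items into at most n_cpu contiguous chunks of near-equal size."""
    if n_cpu < 1:
        raise ValueError("n_cpu must be at least 1")
    n_chunks = min(n_cpu, len(items)) or 1
    base, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Seven made-up gene set names divided among the default of 3 CPUs.
gene_set_names = [f"SET_{i}" for i in range(7)]
chunks = split_into_chunks(gene_set_names, n_cpu=3)
print([len(chunk) for chunk in chunks])  # [3, 2, 2]
```

Each chunk could then be handed to its own worker process, so setting n_cpu higher than the number of available cores is unlikely to help.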

Output Files

<output_file_name>.csv
A gene set x metacell matrix of scGSEA scores, in CSV format.
<output_file_name>.gct
A gene set x metacell matrix of scGSEA scores, in GCT format.
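For downstream analysis, the GCT output can be read with a small parser. The sketch below assumes the file follows the GCT v1.2 layout (a `#1.2` version line, a dimensions line, then a header row starting with Name and Description); whether scGSEA writes exactly this version is an assumption. The helper name `read_gct` and the scores shown are hypothetical.

```python
import csv
import io


def read_gct(handle):
    """Parse GCT v1.2 text into (column_names, {row_name: [scores]})."""
    version = handle.readline().strip()
    if version != "#1.2":
        raise ValueError(f"unsupported GCT version line: {version!r}")
    n_rows, n_cols = (int(x) for x in handle.readline().split()[:2])
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = header[2:]  # skip the Name and Description columns
    scores = {row[0]: [float(v) for v in row[2:]] for row in reader if row}
    if len(scores) != n_rows or len(columns) != n_cols:
        raise ValueError("matrix shape does not match the dimensions line")
    return columns, scores


# Made-up gene set x metacell matrix in GCT v1.2 layout.
example_gct = io.StringIO(
    "#1.2\n"
    "2\t2\n"
    "Name\tDescription\tcluster_0\tcluster_1\n"
    "SET_A\tna\t0.41\t-0.12\n"
    "SET_B\tna\t-0.05\t0.33\n"
)
columns, scores = read_gct(example_gct)
print(columns)          # ['cluster_0', 'cluster_1']
print(scores["SET_A"])  # [0.41, -0.12]
```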

For more details, please refer to the full documentation.
