
seb-mueller/snakemake_sRNAseq

Latest commit 1b381a9 · Apr 12, 2021

History

16 Commits
Mar 3, 2020
Apr 12, 2021
Oct 23, 2018
Feb 7, 2021
Apr 12, 2021
Feb 7, 2021
Apr 12, 2021
Feb 7, 2021
Mar 3, 2020

Repository files navigation

Automated workflow for small RNA sequence data

Snakemake workflow for processing small RNA-seq libraries produced with Illumina small RNA sequencing kits.

Requirements

  • Demultiplexed FASTQ files located in the data directory. They must be named {sample}_R1.fastq.gz.

  • Snakefile shipped with this repository.

  • config.yaml shipped with this repository. It contains all parameters and settings to customize the processing of the current dataset.

  • samples.csv listing all samples in the data directory without the _R1.fastq.gz suffix. The first line is the header, i.e. the word library. An example is shipped with this repository and can be used as a template.

  • Optional: environment.yaml to create the software environment if conda is used.

  • An installation of Snakemake and, optionally, conda.

  • If conda is not used, bowtie, fastqc, samtools and deeptools need to be in the PATH.

The above files can be downloaded as a whole by cloning the repository (which requires git):

git clone https://github.com/seb-mueller/snakemake_sRNAseq.git

Or individually, for example the Snakefile, using wget:

wget https://raw.githubusercontent.com/seb-mueller/snakemake_sRNAseq/master/Snakefile
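If samples.csv does not exist yet, it can be derived from the file names in the data directory. The Python sketch below assumes the single-column layout described in the requirements: a header line containing the word library, then one sample name per line. The functions `sample_names` and `write_samples_csv` are hypothetical helpers, not part of the workflow. Compare the result against the example samples.csv shipped with the repository before relying on it.

```python
import csv
from pathlib import Path

SUFFIX = "_R1.fastq.gz"


def sample_names(data_dir):
    """Return sorted sample names taken from {sample}_R1.fastq.gz files in data_dir."""
    return sorted(
        p.name[: -len(SUFFIX)]
        for p in Path(data_dir).iterdir()
        if p.is_file() and p.name.endswith(SUFFIX) and len(p.name) > len(SUFFIX)
    )


def write_samples_csv(data_dir="data", out_path="samples.csv"):
    """Write a one-column samples.csv with the header 'library' (assumed layout).

    Returns the list of sample names that were written.
    """
    names = sample_names(data_dir)
    with open(out_path, "w", newline="") as fh:
        # Unix line endings; csv.writer defaults to "\r\n".
        writer = csv.writer(fh, lineterminator="\n")
        writer.writerow(["library"])
        for name in names:
            writer.writerow([name])
    return names
```

For the example layout in the Usage section, `write_samples_csv("data", "samples.csv")` would write the header followed by test2 and test3.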

Creating the conda environment:

conda env create --file environment.yaml --name srna_mapping

Activate it:

conda activate srna_mapping

To deactivate the environment, run:

conda deactivate

To update the repository and the conda environment:

git pull
conda env update --file environment.yaml --name srna_mapping
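When conda is not used, the requirements state that bowtie, fastqc, samtools and deeptools must be on the PATH. A quick pre-flight check in Python could look like the sketch below. `missing_tools` is a hypothetical helper. Checking `bamCoverage` as a stand-in for deeptools is an assumption: it is one of deepTools' command-line programs, but this README does not say which ones the Snakefile calls.

```python
import shutil

# Executables expected on the PATH when conda is not used.
# Assumption: bamCoverage stands in for deepTools; the Snakefile may call other deepTools programs.
REQUIRED_TOOLS = ["bowtie", "fastqc", "samtools", "bamCoverage"]


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """Return the tools that shutil.which cannot find.

    path=None searches the current PATH; a custom search path string can be passed instead.
    """
    return [tool for tool in tools if shutil.which(tool, path=path) is None]
```

An empty list from `missing_tools()` means every executable was found. Otherwise the returned names show what still needs to be installed or added to the PATH.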

Usage:

In a Unix shell, navigate to the base directory containing the files listed above plus the data directory with the input data, as in this example:

.
├── data
│   ├── test2_R1.fastq.gz
│   └── test3_R1.fastq.gz
├── config.yaml
├── environment.yaml
├── samples.csv
└── Snakefile
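Before the first run, it can help to check two things: that the base directory matches this layout, and that every sample in samples.csv has a matching FASTQ file. The sketch below assumes that sample names are in the first column of samples.csv, after the header line. `check_layout` is a hypothetical helper, not part of the workflow. environment.yaml is not checked because it is only needed when conda is used.

```python
import csv
from pathlib import Path

REQUIRED_FILES = ["Snakefile", "config.yaml", "samples.csv"]


def check_layout(base="."):
    """Return a list of problems found in the base directory (empty if none)."""
    base = Path(base)
    problems = [f"missing {name}" for name in REQUIRED_FILES if not (base / name).is_file()]

    samples_csv = base / "samples.csv"
    if samples_csv.is_file():
        with open(samples_csv, newline="") as fh:
            rows = list(csv.reader(fh))
        # Skip the header row; sample names are assumed to be in the first column.
        for row in rows[1:]:
            if not row or not row[0].strip():
                continue  # ignore blank lines
            rel = f"data/{row[0].strip()}_R1.fastq.gz"
            if not (base / rel).is_file():
                problems.append(f"missing {rel}")
    return problems
```

Running `check_layout()` from the base directory should return an empty list when the layout and sample list are consistent.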

Then run snakemake in the base directory:

# the most basic usage (recent Snakemake versions also require --cores N)
snakemake
# recommended: automatic conda management in a central location
snakemake --use-conda --conda-prefix ~/.myconda -p

Useful parameters:

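  • --cores: maximum number of CPU cores (threads) to use
  • -n: dry run; show what would be executed without running anything
  • -p: print the shell commands that are executed
  • --use-conda: create and use the conda environments defined in the workflow
  • --conda-prefix ~/.myconda: store those conda environments in a central location (here ~/.myconda) so they can be reused
  • --forcerun postmapping: force a rerun of the given rule (here postmapping)
  • --keep-going: if, for example, one sample fails, the pipeline still tries to process the other samples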
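These options can be combined, for example to do a dry run first and then the real run with the same settings. The Python sketch below assembles the command line from the flags listed above and calls snakemake via subprocess. `snakemake_args` and `run_snakemake` are hypothetical helpers. The defaults (conda enabled, ~/.myconda as prefix) mirror the recommended command rather than anything the workflow requires.

```python
import os
import subprocess


def snakemake_args(cores, dry_run=False, use_conda=True, conda_prefix="~/.myconda",
                   forcerun=(), keep_going=False, targets=()):
    """Assemble a snakemake command line from the options above (hypothetical helper)."""
    # Targets (e.g. staridx) go first: --forcerun accepts several rule names,
    # so a target placed after it would be read as another rule to force.
    args = ["snakemake", *targets, "--cores", str(cores), "-p"]
    if dry_run:
        args.append("-n")
    if use_conda:
        # subprocess does not use a shell here, so "~" must be expanded explicitly.
        args += ["--use-conda", "--conda-prefix", os.path.expanduser(conda_prefix)]
    if forcerun:
        args += ["--forcerun", *forcerun]
    if keep_going:
        args.append("--keep-going")
    return args


def run_snakemake(**options):
    """Dry-run first; if that succeeds, run for real with the same options."""
    subprocess.run(snakemake_args(dry_run=True, **options), check=True)
    subprocess.run(snakemake_args(**options), check=True)
```

For example, `run_snakemake(cores=4, forcerun=["postmapping"], keep_going=True)` performs the recommended conda-based run with a forced rerun of postmapping, after checking it with a dry run.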

Output:

The trimmed, log and mapped directories contain the trimming and mapping results.

Update: added STAR support

# create star index (goes in staridx folder)
snakemake -p --skip-script-cleanup staridx --cores 3
# then map using star
snakemake -p --skip-script-cleanup starmap --cores 3
# TODO: create bw files from STAR mapping
