https://github.com/palfalvi/rnaseq
This pipeline is a basic workflow to QC and quantify raw RNA-seq reads with the following steps:
Utilizing fastp
software, quality values are recorded and low quality reads removed. fastp
also recognizes and trims adaptors.
This depends on the quantification mode selected. Currently salmon
, kallisto
and star
are supported.
This is connected to the index generation. 'salmon' and 'kallisto' natively quantifies reads, while the 'star' mode utilizes featureCounts
As this is a nextflow project dependent on conda environments, the only 2 things you need before running are nextflow and conda:
In your bin
directory, or in some places you can access from your $PATH
, copy and execute the following:
wget -qO- https://get.nextflow.io | bash
Done.
Please download the miniconda installer into your bin
directory form the miniconda repository and execute the follwing :
bash Miniconda3-latest-Linux-x86_64.sh
Done.
- If it did not work on NIBB-BIAS5, please try on bias5-db.
If you wish to use SRA ids directly, you can provide with --sra SRP043510
instead of --reads
. This feature, however uses the NCBI Esearch API, which requires an NCBI API Key in your environment. To get an API Key, follow this link.
After you get your API key, you need to set it into your environment. In your favorite editor (e.g. nano or vim) open ~/.bashrc
and in the end of the file, insert the following:
export NCBI_API_KEY=0123456789abcdef
then save. You need to run source ~/.bashrc
or log out and in again to make it work. From that point, you do not need to modify or rerun this part.
First, you need to decide which mode you would like to run. The default and recommended way is salmon
, but kallisto
and star
are also available. Just change specify with the --mode
flag. e.g --mode star
.
Then, you need to know if you have pair-end or single-end dataset. The pipeline runs by default on pair end data, but you can specify single end mode with the --single
flag.
If you run in single end mode with kallisto, please also specify --fragment_length
and --fragment_sd
, which can be calculated from your BioAnalyzer file of the final libraries.
Finally, you need to have a reference set. For salmon
and kallisto
, this should be a fasta file of transcripts (not genes!) specified with the --transcriptome
flag. In the case of star
, please specify the genome and a corresponding GTF annotation file with --genome
and --gtf
flags.
nextflow run palfalvi/rnaseq --reads /path/to/reads/*R{1,2}.fastq --transcriptome transcripts.fasta
nextflow run palfalvi/rnaseq --mode kallisto --reads /path/to/reads/*.fastq --transcriptome transcripts.fasta --fragment_length 300 --fragment_sd 30
nextflow run palfalvi/rnaseq --mode star --reads /path/to/reads/*R{1,2}.fastq --genome genome.fastq --gtf annotation.gtf
Salmon with no QC and trimming and save the index files for later use. Also, save the results into salmon_output
directory.
nextflow run palfalvi/rnaseq --reads /path/to/reads/*R{1,2}.fastq --transcriptome transcripts.fasta --save_index --skip_qc --out salmon_output