Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Short read variants and consensus depth #482

Draft
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

gorbruno
Copy link

@gorbruno gorbruno commented Mar 25, 2025

Description:

#447

This PR adds the min_depth parameter for Illumina variant/consensus calling. The Nanopore artic_make_depth_mask remains unchanged. MultiQC mosdepth coverage > min_depth has not been added either.

Key Changes:

  • The parameter has been added to the pipeline's schema and config
  • MAKE_BED_MASK — new variable min_depth instead of args2
  • MAKE_BED_MASK — updated Python and Samtools versions
  • Illumina config is updated to include the new parameter

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs- [ ] If necessary, also make a PR on the nf-core/viralrecon branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.

@gorbruno
Copy link
Author

Also I have a couple comments.

  1. Should the minimum depth for variant calling and consensus calling be separated?
  2. Adding this column to the final MQC table means that multiqc_config_illumina.yml asset must be updated. Also multiqc_to_custom_csv.py must be updated in that case somehow. But as a feature looks interesting.

@gorbruno gorbruno marked this pull request as ready for review March 25, 2025 10:07
@gorbruno gorbruno modified the milestone: 2.7 Mar 25, 2025
@gorbruno gorbruno changed the title Feat/short read consensus depth Short read variants and consensus depth Mar 25, 2025
@gorbruno gorbruno requested a review from svarona March 25, 2025 10:22
@svarona
Copy link
Contributor

svarona commented Mar 25, 2025

Also I have a couple comments.

  1. Should the minimum depth for variant calling and consensus calling be separated?
  2. Adding this column to the final MQC table means that multiqc_config_illumina.yml asset must be updated. Also multiqc_to_custom_csv.py must be updated in that case somehow. But as a feature looks interesting.

I think it is better to have different parameters for variant calling and consensus, because if it won’t be configurable in the config file anymore, it is better to give both options. Don't forget that viralrecon has two callers and two consensus softwates, those both params should be added to both of them.

@svarona
Copy link
Contributor

svarona commented Mar 25, 2025

Also, if you update the conda-forge versions of the softwares, you need to create new mulled files. I think latest python is conda-forge::python=3.13.2, doesn't it work with that version? I was updating local modules version in this issue #467 , I can link the make bed mask version with this PR if you manage it

Copy link

github-actions bot commented Mar 25, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 9d34db6

+| ✅ 204 tests passed       |+
#| ❔   2 tests were ignored |#
!| ❗  47 tests had warnings |!

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • files_exist - File not found: conf/igenomes_ignored.config
  • readme - README contains the placeholder zenodo.XXXXXXX. This should be replaced with the zenodo doi (after the first release).
  • pipeline_todos - TODO string in nextflow.config: Update the field with the details of the contributors to your pipeline. New with Nextflow version 24.10.0
  • pipeline_todos - TODO string in ro-crate-metadata.json: "description": "

    \n \n <source media="(prefers-color-scheme: dark)" srcset="docs/images/nf-core-viralrecon_logo_dark.png">\n <img alt="nf-core/viralrecon" src="docs/images/nf-core-viralrecon_logo_light.png">\n \n

    \n\nGitHub Actions CI Status\nGitHub Actions Linting StatusAWS CICite with Zenodo\nnf-test\n\nNextflow\nrun with conda\nrun with docker\nrun with singularity\nLaunch on Seqera Platform\n\nGet help on SlackFollow on TwitterFollow on MastodonWatch on YouTube\n\n## Introduction\n\nnf-core/viralrecon is a bioinformatics pipeline that ...\n\n TODO nf-core:\n Complete this sentence with a 2-3 sentence summary of what types of data the pipeline ingests, a brief overview of the\n major pipeline sections and the types of output it produces. You're giving an overview to someone new\n to nf-core here, in 15-20 seconds. For an example, see https://github.com/nf-core/rnaseq/blob/master/README.md#introduction\n\n\n Include a figure that guides the user through the major workflow steps. Many nf-core\n workflows use the "tube map" design for that. See https://nf-co.re/docs/contributing/design_guidelines#examples for examples. \n Fill in short bullet-pointed list of the default steps in the pipeline 1. Read QC (FastQC)2. Present QC for raw reads (MultiQC)\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to this page on how to set-up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.\n\n Describe the minimum required steps to execute the pipeline, e.g. how to prepare samplesheets.\n Explain what rows and columns represent. For instance (please edit as appropriate):\n\nFirst, prepare a samplesheet with your input data that looks as follows:\n\nsamplesheet.csv:\n\ncsv\nsample,fastq_1,fastq_2\nCONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz\n\n\nEach row represents a fastq file (single-end) or a pair of fastq files (paired end).\n\n\n\nNow, you can run the pipeline using:\n\n update the following command to include all required parameters for a minimal example \n\nbash\nnextflow run nf-core/viralrecon \\\n -profile <docker/singularity/.../institute> \\\n --input samplesheet.csv \\\n --outdir <OUTDIR>\n\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.\n\nFor more details and further functionality, please refer to the usage documentation and the parameter documentation.\n\n## Pipeline output\n\nTo see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page.\nFor more details about the output files and reports, please refer to the\noutput documentation.\n\n## Credits\n\nnf-core/viralrecon was originally written by Patel H, Varona S and Monzon S.\n\nWe thank the following people for their extensive assistance in the development of this pipeline:\n\n If applicable, make list of people who have also contributed \n\n## Contributions and Support\n\nIf you would like to contribute to this pipeline, please see the contributing guidelines.\n\nFor further information or help, don't hesitate to get in touch on the Slack #viralrecon channel (you can join with this invite).\n\n## Citations\n\n Add citation for pipeline after first release. Uncomment lines below and update Zenodo doi and badge at the top of this file. \n If you use nf-core/viralrecon for your analysis, please cite it using the following doi: 10.5281/zenodo.XXXXXX \n\n Add bibliography of tools and data used in your pipeline \n\nAn extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.\n\nYou can cite the nf-core publication as follows:\n\n> The nf-core framework for community-curated bioinformatics pipelines.\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.\n",
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • system_exit - System.exit in nanopore.nf: System.exit(1) [line 246]
  • local_component_structure - rename_fasta_header.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - multiqc_nanopore.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - plot_mosdepth_regions.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - snpeff_ann.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - filter_blastn.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - prepare_primer_fasta.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - make_bed_mask.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - asciigenome.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - plot_base_density.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - snpsift_extractfields.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - collapse_primers.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - make_variants_long_table.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - kraken2_build.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - snpeff_build.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - multiqc_illumina.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - ivar_variants_to_vcf.nf in modules/local should be moved to a TOOL/SUBTOOL/main.nf structure
  • local_component_structure - assembly_qc.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - bam_trim_primers_ivar.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - consensus_bcftools.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - fastq_trim_fastp_fastqc.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - vcf_bgzip_tabix_stats.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - consensus_ivar.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - assembly_spades.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_genome_illumina.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - variants_qc.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - assembly_unicycler.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - snpeff_snpsift.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - consensus_qc.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - variants_bcftools.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - variants_long_table.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - additional_annotation.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - assembly_minia.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - vcf_tabix_stats.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - filter_bam_samtools.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - prepare_genome_nanopore.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure
  • local_component_structure - variants_ivar.nf in subworkflows/local should be moved to a SUBWORKFLOW_NAME/main.nf structure

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt

✅ Tests passed:

Run details

  • nf-core/tools version 3.2.0
  • Run at 2025-03-25 13:49:42

@svarona
Copy link
Contributor

svarona commented Mar 25, 2025

Also I have a couple comments.

  1. Should the minimum depth for variant calling and consensus calling be separated?
  2. Adding this column to the final MQC table means that multiqc_config_illumina.yml asset must be updated. Also multiqc_to_custom_csv.py must be updated in that case somehow. But as a feature looks interesting.

I think it is better to have different parameters for variant calling and consensus, because if it won’t be configurable in the config file anymore, it is better to give both options. Don't forget that viralrecon has two callers and two consensus softwates, those both params should be added to both of them.

Which column do you want to add to MQC?

@gorbruno
Copy link
Author

As I understood — column is a suggestion from the issue, I meant that. And I totally forgot to add mulled version, yes. But I created one for myself while testing via wave 😆

@gorbruno
Copy link
Author

Ok, then I separate consensus depth for different callers too. But I didn't change frequency options — I should do it too? With that it seems like 4 new parameters. Is it ok?

@gorbruno
Copy link
Author

And should I add similar parameters to nanopore workflow?

@svarona
Copy link
Contributor

svarona commented Mar 25, 2025

As I understood — column is a suggestion from the issue, I meant that. And I totally forgot to add mulled version, yes. But I created one for myself while testing via wave 😆

Oh true, I see they asked for the genome fraction of the sample at the min_depth param. I think it is hardcoded as 10X, so yes, if you want to add that to this PR you should change that too. Else we can keep the issue open and maybe someone else wants to add that to MQC.

@svarona
Copy link
Contributor

svarona commented Mar 25, 2025

Ok, then I separate consensus depth for different callers too. But I didn't change frequency options — I should do it too? With that it seems like 4 new parameters. Is it ok?

sorry, I was mixing with another PR, with min_depth, separated for variant_calling and consensus is enough, and adding them to both softwares (2 new params only).

@svarona
Copy link
Contributor

svarona commented Mar 25, 2025

And should I add similar parameters to nanopore workflow?

You can add it too. Just make sure which is the param that is used in artic minion, I don't know if it makes any difference between variant_calling and consensus depths....

@svarona
Copy link
Contributor

svarona commented Mar 25, 2025

❗ I forgot to say, add the new params info to viralrecon usage documentation and the CHANGELOG.

@gorbruno gorbruno closed this Mar 25, 2025
@gorbruno gorbruno deleted the feat/short-read-consensus-depth branch March 25, 2025 12:55
@gorbruno gorbruno restored the feat/short-read-consensus-depth branch March 25, 2025 13:26
@gorbruno gorbruno deleted the feat/short-read-consensus-depth branch March 25, 2025 13:26
@gorbruno gorbruno restored the feat/short-read-consensus-depth branch March 25, 2025 13:27
@gorbruno gorbruno reopened this Mar 25, 2025
@gorbruno
Copy link
Author

So many buttons I need to push them all 🤦‍♂️

@gorbruno gorbruno marked this pull request as draft March 25, 2025 13:31
@gorbruno
Copy link
Author

After staring at the code for a while I see this scheme for adding a new column:

%%{init: {'theme':'sea'}}%%
flowchart LR
  a[params.min_depth] --> b([Modify mqc yml template with new mosdepth_config]) --> c[Gather depth numbers from General Stats file after first MultiQC run] --> d[Add new column to multiqc_to_custom_csv.py] --> e([Add that column to mqc yml template for the second MultiQC run])
Loading

Where round charts involve modifying YML files and adding values from the provided min_depth param. It seems like another Python script is needed to customize YML file with mosdepth_config, despite modifying multiqc_to_custom_csv.py.

The last chart is needed only for the column order in the MultiQC table (otherwise, the new column will be last).

Does this logic seem correct? I want to confirm before proceeding with writing a Python script if necessary.

@gorbruno
Copy link
Author

Reference docks

@svarona
Copy link
Contributor

svarona commented Mar 26, 2025

After staring at the code for a while I see this scheme for adding a new column:

%%{init: {'theme':'sea'}}%%
flowchart LR
  a[params.min_depth] --> b([Modify mqc yml template with new mosdepth_config]) --> c[Gather depth numbers from General Stats file after first MultiQC run] --> d[Add new column to multiqc_to_custom_csv.py] --> e([Add that column to mqc yml template for the second MultiQC run])
Loading

Where round charts involve modifying YML files and adding values from the provided min_depth param. It seems like another Python script is needed to customize YML file with mosdepth_config, despite modifying multiqc_to_custom_csv.py.

The last chart is needed only for the column order in the MultiQC table (otherwise, the new column will be last).

Does this logic seem correct? I want to confirm before proceeding with writing a Python script if necessary.

That approach makes sense for me! Make sure that all this makes sense with the cnages in this PR? #479

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants