VGP/vgp-assembly

vgp-assembly

VGP repository for the genome assembly working group

Contents

  • Galaxy Workflow
  • DNAnexus Workflow
  • Docker images with WDL Workflow
  • Pipeline for local run
  • MitoVGP pipeline
  • Instructions for AWS s3 genomeark
  • Meta data
  • Citation

Galaxy Workflow

Starting with the VGP v2.0 pipeline, production has moved to the Galaxy environment. The major differences from Rhie et al. 2021 are the replacement of the CLR component with HiFi and additional options for QC and Hi-C scaffolding.

  • Tutorials: starting point for new trainees
  • Workflows: Dockstore workflow, input, and output for each assembly step

DNAnexus Workflow

The VGP v1.0-1.1 assemblies were produced on DNAnexus, which is available to anyone who registers. We welcome new trainees who are interested in leading the assembly of VGP and other genomes. Feel free to contact us.

Docker images with WDL Workflow

A scaffolding pipeline that runs on generic architecture with Docker containers is available to the public. This includes a WDL implementation of the scaffolding portion of VGP Assembly, as well as some of the QC steps. Note that Falcon assembly and Arrow polishing are not included.

Pipeline for local run

The local pipeline is available as bash scripts for each individual step of scaffolding, polishing, and evaluation. These scripts were used to locally assemble the first 17 genomes described in Rhie et al. 2021.

Note that the scripts are optimized to run on a Slurm scheduler and were tested on Biowulf. All submitter scripts have a prefix of _submit_.
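Because the submitter scripts share the _submit_ prefix, they can be told apart from the worker scripts by file name alone. The following is a minimal sketch, not part of the repository, that lists them under a local checkout; the function name `find_submitter_scripts` and the example directory `pipeline` are assumptions made for illustration.

```python
from pathlib import Path

# Prefix stated in the README for all submitter scripts.
SUBMIT_PREFIX = "_submit_"


def find_submitter_scripts(root):
    """Return sorted paths of files under `root` whose names start with _submit_.

    `root` is a hypothetical path to a local checkout of the pipeline scripts.
    """
    root = Path(root)
    # rglob searches recursively; the pattern matches on the file name only.
    return sorted(p for p in root.rglob(f"{SUBMIT_PREFIX}*") if p.is_file())


if __name__ == "__main__":
    # "pipeline" is an assumed directory name; adjust to the local layout.
    for script in find_submitter_scripts("pipeline"):
        print(script)
```

This only lists the entry points. Each submitter script still has to be read for its own arguments and Slurm resource requests before it is run.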

MitoVGP pipeline

The pipeline for generating mitochondrial sequences is available as a conda release.

Instructions for AWS s3 genomeark

This is only relevant for our collaborators and data managers who share sequencing data not produced by VGP on genomeark.

Meta data

This section covers the meta data proposal and specifications. The actual meta data is stored in this repository.

Citation