(not so) small tasks around swarm 3.0 #122

Closed
20 tasks done
frederic-mahe opened this issue Jan 17, 2019 · 7 comments

@frederic-mahe
Collaborator

frederic-mahe commented Jan 17, 2019

Changelog for end-users

  • swarm 3.0 expects dereplicated fasta files and exits with an error message if this is not the case,
  • the only exception to that rule is -d 0 (i.e. when the number of differences is set to zero),
  • swarm 3.0 yields results identical to those of swarm 2.0, while being up to 10x faster and using half the memory,
  • clusters are written to output files in the order they are generated, except for the seeds (-w), which are sorted by decreasing abundance and then by alphabetical order of sequences,
  • ported to Windows x86-64, ARM 64, and POWER8.
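
To illustrate the dereplication requirement above, here is a minimal sketch that counts the distinct sequences occurring more than once in a fasta file. It is not swarm's own check: the helper name `count_duplicated_sequences`, the toy file name and the case-insensitive comparison are assumptions made for this example.

```shell
# Hypothetical helper (not part of swarm): print the number of distinct
# sequences that appear more than once in a fasta file.
# Assumption: sequences are compared case-insensitively.
count_duplicated_sequences() {
    awk '
        # header line: store the previous sequence, start a new one
        /^>/ { if (seq != "") seen[toupper(seq)]++; seq = ""; next }
        # sequence line: concatenate (handles multi-line fasta entries)
        { seq = seq $0 }
        END {
            if (seq != "") seen[toupper(seq)]++
            n = 0
            for (s in seen) if (seen[s] > 1) n++
            print n
        }
    ' "$1"
}

# toy input: s1 and s2 carry the same sequence, so the file is not dereplicated
printf ">s1_2\nAAAA\n>s2_1\nAAAA\n>s3_1\nAAAT\n" > tmp_derep.fas
duplicates=$(count_duplicated_sequences tmp_derep.fas)
echo "distinct sequences seen more than once: ${duplicates}"
rm tmp_derep.fas
```

A result greater than zero suggests the file still needs to be dereplicated before it is passed to swarm 3.0 with d > 0.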

Under the hood:

  • complete rewriting of the d = 1 algorithm,
  • code modernization and code quality improvements for all functions (more than 1,600 fixes),
  • tested with state-of-the-art static and dynamic C++ analyzers,
  • tested with a wide range of compilers and on three of the most common CPU architectures,
  • 669 unit tests created, covering more than 95% of swarm's code (on a given architecture),
  • automatic testing of code modifications with Travis-CI,
  • first steps toward replicable compilations.

To do

  • review the wiki (provide an updated version of my pipeline?),
  • add a link to the seminal paper by Qu et al., 2009 (algorithm similar to swarm's, unknown to us when we created swarm).

Done

  • tag the release as v3.0.0,
  • test the effect of compilation parameters on speed (-march=native vs. -march=x86-64),
  • review README (mandatory dereplication, update memory consumption values, shorter README),
  • review man page (mandatory dereplication),
  • use code coverage results to add new tests,
  • run cppcheck and sanitizer (see softwipe),
  • review help (-h) output,
  • bump up version numbers and copyright years (readme, makefile, binary, individual source files),
  • review version (-v) output,
  • compilation tests (gcc 4.9 to gcc-10-alpha, clang 3.8 to 9) (all good, no warnings),
  • run valgrind (automatic unit tests for each swarm usage),
  • run fuzzing tests (three weeks of stress-tests on 12 cores, no crash detected),
  • run fuzzing tests with address sanitizer (two weeks of stress-tests on 12 cores, no crash detected),
  • convert companion python scripts to python 3 (support for python 2 ends this year),
  • review unit tests:
    • add missing tests,
    • eliminate unnecessary tmp files,
    • eliminate unnecessary variables,
    • use shorter input,
    • one input per test (avoid using a general input test),
    • eliminate obsolete bash syntax.
@frederic-mahe frederic-mahe added this to the swarm 3.0 milestone Jan 17, 2019
@frederic-mahe frederic-mahe self-assigned this Jan 17, 2019
@frederic-mahe frederic-mahe changed the title from "small tasks around swarm 3.0" to "(not so) small tasks around swarm 3.0" May 17, 2019
@frederic-mahe
Collaborator Author

frederic-mahe commented May 17, 2019

Help entries that could be modified (some suggestions):

 -d, --differences INTEGER           resolution (1)

 -b, --boundary INTEGER              min mass of large OTUs (3)
 -c, --ceiling INTEGER               max memory in MB for the Bloom filter (unlimited) 

 -i, --internal-structure FILENAME   write internal OTU structure to file

 -o, --output-file FILENAME          output result to file (stdout)
 -r, --mothur                        output using a mothur-like format
 -u, --uclust-file FILENAME          output using a UCLUST-like format to file
 -w, --seeds FILENAME                write OTU representatives to FASTA file

@frederic-mahe
Collaborator Author

frederic-mahe commented May 17, 2019

swarm's UCLUST-format output: should the "cluster size" reported in column 3 of the C entries be the number of amplicons or the number of reads (sum of abundances)?

EDIT

usearch tallies amplicons, not reads:

printf ">s1;size=2;\nAAAA\n>s2;size=1;\nAAAA\n" > tmp.fas
usearch7 -cluster_fast tmp.fas -minseqlength 1 -id 0.5 -uc tmp.uc
cat tmp.uc
rm tmp.*

usearch reports a cluster size of 2 amplicons, not an abundance of 3. vsearch and swarm behave like usearch.
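
The two candidate values can be computed directly from the toy input, without any clustering tool. This is only a sketch to make the distinction concrete: it assumes `;size=N;` abundance annotations as in the example above, and the file name `tmp_size.fas` is made up for the illustration.

```shell
# same toy input as above: two amplicons, three reads in total
printf ">s1;size=2;\nAAAA\n>s2;size=1;\nAAAA\n" > tmp_size.fas

# number of amplicons: one fasta header per amplicon
amplicons=$(grep -c "^>" tmp_size.fas)

# number of reads: sum of the size= annotations found in the headers
reads=$(sed -n 's/^>.*;size=\([0-9][0-9]*\);.*$/\1/p' tmp_size.fas |
            awk '{ sum += $1 } END { print sum + 0 }')

echo "amplicons: ${amplicons}, reads: ${reads}"
rm tmp_size.fas
```

With this input the amplicon count is 2 and the read count is 3, so the value of 2 reported by usearch corresponds to the amplicon count.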

@torognes
Owner

Made the suggested changes to the help text in commit c07a0d4.

@torognes
Owner

torognes commented Oct 1, 2019

Is it ok if I set up Travis CI to compile (and possibly test) Swarm, as done for vsearch?

@frederic-mahe
Collaborator Author

> Is it ok if I set up Travis CI to compile (and possibly test) Swarm, as done for vsearch?

Yes, that sounds like a good idea. Can Travis CI fetch tests from the swarm-tests repository?

@torognes
Owner

torognes commented Oct 1, 2019

> Can Travis CI fetch tests from the swarm-tests repository?

Yes, I think so.

@torognes
Owner

torognes commented Oct 2, 2019

Travis CI will now automatically compile and test swarm3 after any commit is pushed or any pull request is submitted.

The status can be seen here: https://travis-ci.org/torognes/swarm

The build fails if compilation produces an error or if any of the tests fails.

Swarm is compiled with g++ version 7.4.0 on Ubuntu 18.04.3 (Bionic) Linux.

The swarm-tests repo is automatically cloned from its source.

It all seems to work fine now after a series of modifications.

There is a badge on the front page (in README.md) showing the latest status.

This only applies to the swarm3 branch.
