Many more reads in SNP mapping for 1.3.0 vs 1.2.1 #66
Comments
Thank you for reporting this. One thing I forgot to add in the release notes is that I switched from global to local Bowtie2 alignment in the SNPs module. I made this change because the minimum read alignment coverage is specified by the [...]. For consistency, I will add an option to allow the user to specify local or global alignment and make a new release in the next day or two. I should be able to confirm that this change results in output that is consistent with v1.2.1.
Stephen
Thanks for clarifying. Do you then recommend using the local alignment with 75% query coverage? I can see that this will produce more hits, but how much specificity is sacrificed?
Thanks,
Yes, that is what I would recommend. But feel free to experiment with increasing the alignment coverage and seeing if it changes your results. In the future, this is something that could be more rigorously evaluated along with other parameters in the SNP calling pipeline.
Stephen
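To make the "75% query coverage" idea concrete, here is a minimal sketch of how the aligned fraction of a read could be computed from a SAM CIGAR string. This is an assumption about what alignment coverage means here (aligned read bases divided by read length), not MIDAS's actual implementation; the function names `aligned_fraction` and `passes_coverage` are hypothetical. With local alignment, Bowtie2 may soft-clip read ends, which is why a coverage threshold matters more than in end-to-end mode.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def aligned_fraction(cigar: str) -> float:
    """Fraction of read bases that are aligned rather than clipped.

    Assumption: M, I, = and X count as aligned read bases; S and H count
    as clipped read bases. D, N and P do not consume read bases.
    """
    aligned = 0
    read_len = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "MI=X":
            aligned += n
            read_len += n
        elif op in "SH":
            read_len += n
    return aligned / read_len if read_len else 0.0


def passes_coverage(cigar: str, min_cov: float = 0.75) -> bool:
    """Keep a read only if at least min_cov of its bases are aligned."""
    return aligned_fraction(cigar) >= min_cov


if __name__ == "__main__":
    for cigar in ("100M", "25S75M", "30S70M"):
        print(cigar, aligned_fraction(cigar), passes_coverage(cigar))
```

Raising `min_cov` toward 1.0 makes the filter behave more like global (end-to-end) alignment, which is one way to explore the specificity trade-off discussed above.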
Hi Sur, I've addressed this issue with a new release. Please see the release notes for details. I'm closing this issue, but feel free to reopen it in the future if you notice other discrepancies.
Thanks,
I recently switched from version 1.2.1 to version 1.3.0. I realize there are newer versions of the alignment software in this release so I wasn't surprised when I saw some tiny differences in species identifications. Here is an example of abundances from MIDAS/1.2.1
And here are the corresponding results from mapping with MIDAS/1.3.0
You can see that there are some tiny differences that probably don't influence the results.
However, I then tried to obtain SNPs from a species, with the following command:
I used the exact same command (except for output directory) with both MIDAS/1.2.1 and MIDAS/1.3.0
This is what I get from MIDAS/1.2.1:
$ zcat midas121/SRS051941/snps/output/Haemophilus_parainfluenzae_62356.snps.gz | head
ref_id    ref_pos  ref_allele  alt_allele  ref_freq        depth  count_atcg
FQ312002  1        T           NA          0.0             0      0,0,0,0
FQ312002  2        A           NA          1.0             1      1,0,0,0
FQ312002  3        T           NA          1.0             2      0,2,0,0
FQ312002  4        G           NA          1.0             2      0,0,0,2
FQ312002  5        G           NA          1.0             4      0,0,0,4
FQ312002  6        C           NA          1.0             6      0,0,6,0
FQ312002  7        T           A           0.833333333333  6      1,5,0,0
FQ312002  8        A           NA          1.0             7      7,0,0,0
FQ312002  9        T           NA          1.0             7      0,7,0,0
And this is what I get from MIDAS/1.3.0
$ zcat midas130/SRS051941/snps/output/Haemophilus_parainfluenzae_62356.snps.gz | head
ref_id    ref_pos  ref_allele  depth  count_a  count_c  count_g  count_t
FQ312002  1        T           15     0        0        0        15
FQ312002  2        A           16     16       0        0        0
FQ312002  3        T           17     0        0        0        17
FQ312002  4        G           18     0        0        18       0
FQ312002  5        G           17     0        0        17       0
FQ312002  6        C           20     0        20       0        0
FQ312002  7        T           20     0        0        0        20
FQ312002  8        A           23     23       0        0        0
FQ312002  9        T           21     0        0        0        21
You can see I get dramatically more reads in MIDAS/1.3.0 though it seems like the major allele is consistent.
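For anyone who wants to quantify the difference rather than eyeball it, here is a rough sketch that compares per-position depth between the two outputs. It only assumes that both files are whitespace-delimited with a header containing `ref_id`, `ref_pos` and `depth`, as in the listings above; the helper names `read_depths` and `total_depth` are made up for illustration, and the embedded rows are the nine positions shown in this post.

```python
def read_depths(lines):
    """Map (ref_id, ref_pos) -> depth from a whitespace-delimited table with a header row."""
    rows = (line.split() for line in lines if line.strip())
    header = next(rows)
    id_col, pos_col, depth_col = (
        header.index(name) for name in ("ref_id", "ref_pos", "depth")
    )
    return {(r[id_col], int(r[pos_col])): int(r[depth_col]) for r in rows}


def total_depth(depths):
    """Sum of depth over all positions in the table."""
    return sum(depths.values())


# First nine positions reported by each version (copied from the outputs above).
V121 = """ref_id ref_pos ref_allele alt_allele ref_freq depth count_atcg
FQ312002 1 T NA 0.0 0 0,0,0,0
FQ312002 2 A NA 1.0 1 1,0,0,0
FQ312002 3 T NA 1.0 2 0,2,0,0
FQ312002 4 G NA 1.0 2 0,0,0,2
FQ312002 5 G NA 1.0 4 0,0,0,4
FQ312002 6 C NA 1.0 6 0,0,6,0
FQ312002 7 T A 0.833333333333 6 1,5,0,0
FQ312002 8 A NA 1.0 7 7,0,0,0
FQ312002 9 T NA 1.0 7 0,7,0,0
"""

V130 = """ref_id ref_pos ref_allele depth count_a count_c count_g count_t
FQ312002 1 T 15 0 0 0 15
FQ312002 2 A 16 16 0 0 0
FQ312002 3 T 17 0 0 0 17
FQ312002 4 G 18 0 0 18 0
FQ312002 5 G 17 0 0 17 0
FQ312002 6 C 20 0 20 0 0
FQ312002 7 T 20 0 0 0 20
FQ312002 8 A 23 23 0 0 0
FQ312002 9 T 21 0 0 0 21
"""

old = read_depths(V121.splitlines())
new = read_depths(V130.splitlines())

if __name__ == "__main__":
    # Print depth side by side for positions present in both tables.
    for key in sorted(old.keys() & new.keys()):
        print(key[0], key[1], old[key], new[key])
    print("total depth:", total_depth(old), "vs", total_depth(new))
```

To run this on the full files, the same `read_depths` function should work on a handle opened with `gzip.open(path, "rt")` from the standard library.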
I just wonder if such big differences are expected. I attach a couple of small read files that seem to reproduce the pattern, which I didn't see with the test sample in the test directory.
Thanks,
Sur
read1.fastq.gz
read2.fastq.gz