Issue with strain phylogeny #53
Comments
Hi Kyle,
Unfortunately at the current time, MIDAS only produces strain phylogenies
based on consensus sequences. As you observe, that means that you will get
only 1 leaf per metagenome. These phylogenies will be accurate only if the
intra-metagenome diversity is low.
I would suggest performing some followup analysis to assess the
intra-metagenome and inter-metagenome diversity levels. The script
`snp_diversity.py` should help in that regard. If the intra-diversity is
very low, you have clonal populations and the phylogenetic tree based on
consensus sequences is likely accurate. Please let me know if you have any
questions.
In the future I may add code to estimate strain phylogenies that take into
account multiple strains per species per sample.
Best,
Stephen
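To illustrate why a consensus-based approach yields exactly one leaf per metagenome, here is a minimal sketch of majority-vote consensus calling. It is not the code in call_consensus.py; the function name, the input layout (one dict of base counts per site), and the read counts are all hypothetical.

```python
def call_consensus(allele_counts):
    """Return one consensus sequence from per-site allele counts.

    allele_counts: list of dicts mapping base -> read count, one dict per site.
    Sites with no reads are reported as 'N'.
    """
    consensus = []
    for counts in allele_counts:
        if not counts or sum(counts.values()) == 0:
            consensus.append("N")
        else:
            # Majority vote: minor alleles (possibly from secondary strains)
            # are discarded, so within-sample variation is lost here.
            consensus.append(max(counts, key=counts.get))
    return "".join(consensus)


# Hypothetical read counts at three sites for three metagenomes.
samples = {
    "sample_1": [{"A": 9, "G": 1}, {"C": 10}, {"T": 6, "C": 4}],
    "sample_2": [{"A": 2, "G": 8}, {"C": 10}, {"T": 10}],
    "sample_3": [{"A": 10}, {}, {"C": 7, "T": 3}],
}

# One sequence per sample, hence one leaf per sample in the tree.
consensus_by_sample = {name: call_consensus(sites) for name, sites in samples.items()}
```

Because each metagenome collapses to a single sequence, a tree built from three samples can only have three leaves, regardless of how many strains each sample contains.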
On Wed, May 3, 2017 at 3:43 PM, kylecampbell3 wrote:
Hi Stephen,
Thank you for your program!
I am trying to use MIDAS to construct a detailed strain phylogeny for a
single species, and am running into a bit of trouble in the final steps. I
created a custom database using a pan-genome reference that had previously
been assembled and ran run_midas.py species and run_midas.py snps
individually for three metagenomic samples (each around 2GB). I merged the
snp results for the three samples and then ran call_consensus.py. When I
used the consensus.fa file output to construct a phylogenetic tree on
FastTree, it produced a tree with only three leaves, treating each
metagenomic output as a strain, rather than identifying many different
strains within each metagenome. Is this the way that the program is
supposed to work, or could settings be adjusted to produce a more detailed
phylogeny?
Thanks for your help,
Kyle Campbell
Hi Stephen,
ubuntu@ip-172-31-61-218:~/MIDAS$ python /home/ubuntu/MIDAS/scripts/snp_diversity.py --indir /home/ubuntu/MIDAS/psb1_snpmerge_out/psb1/ --out /home/ubuntu/MIDAS/snp_diversity
MIDAS: Metagenomic Intra-species Diversity Analysis System
===============================
Selecting subset of samples...
Having not had much coding experience, we do not know what this means or how to fix it. Do you have any suggestions?
Thanks for reporting the error. I will look into it and get back to you.
We think the `ValueError: invalid literal for int() with base 10: 'quiver'` comes from our input: the program expects an integer in one field and finds the string 'quiver' instead. We removed 'quiver' from our input file and the program ran.
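For anyone hitting the same message, the sketch below reproduces the error and shows one way to locate the offending values before running the script. Which file and which field MIDAS was parsing is not stated in this thread, so the row layout and the `find_non_integer_fields` helper are hypothetical.

```python
def find_non_integer_fields(rows, column):
    """Return (row_number, value) pairs where rows[i][column] is not an integer."""
    bad = []
    for row_number, row in enumerate(rows, start=1):
        value = row[column]
        try:
            int(value)
        except ValueError:
            bad.append((row_number, value))
    return bad


# Reproduce the message from the traceback.
try:
    int("quiver")
except ValueError as err:
    message = str(err)

# Hypothetical rows: the second column is expected to hold integers.
rows = [["contig_1", "1042"], ["contig_2", "quiver"], ["contig_3", "77"]]
problems = find_non_integer_fields(rows, column=1)
```

Here `problems` lists the row numbers to fix, which is easier than searching a large input file by hand.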
Hi Kyle - I'm posting your questions from email here in case it is useful to others. You wrote:
First off, it looks like these are intra-sample diversity estimates. Could you run the script for inter-sample diversity also?

The diversity statistic is the number of nucleotide substitutions summed across all sites analyzed. So if you divide by count_sites, you will get the number of substitutions per site. You can read more about this statistic here: https://en.wikipedia.org/wiki/Nucleotide_diversity

After dividing by the number of sites, per-bp diversity ranges from 2.2e-3 to 3.8e-3. In my experience this indicates a relatively low-diversity community that is probably composed of one dominant strain. If you look at figure 4 from my biorxiv paper you can compare your results to other human gut bacteria: http://biorxiv.org/content/early/2015/11/14/031757

I would recommend using the same script to compute between-sample (i.e. inter-sample) diversity. This script will pool the data across your 3 samples (assuming they are the same species) and use this pooled data to compute diversity again. If this number is much higher than the per-sample diversity, then you can conclude that the nucleotide differences between populations are much greater than those within the individual populations.

Does this make sense?
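The normalization and comparison described above can be sketched as follows. The raw diversity sums, the site count, and the pooled value are made-up numbers chosen to land in the 2.2e-3 to 3.8e-3 range mentioned in this thread; they are not Kyle's actual results, and `per_site_diversity` is a hypothetical helper, not part of MIDAS.

```python
def per_site_diversity(diversity_sum, count_sites):
    """Convert a summed diversity value into substitutions per site."""
    if count_sites <= 0:
        raise ValueError("count_sites must be positive")
    return diversity_sum / count_sites


# Hypothetical within-sample (intra-sample) results: (diversity sum, count_sites).
within_raw = {
    "sample_1": (2200.0, 1000000),
    "sample_2": (3800.0, 1000000),
    "sample_3": (3000.0, 1000000),
}
within = {name: per_site_diversity(d, n) for name, (d, n) in within_raw.items()}
mean_within = sum(within.values()) / len(within)

# Hypothetical pooled (inter-sample) result from running on all samples together.
between = per_site_diversity(9000.0, 1000000)

# A ratio well above 1 suggests differences between populations exceed
# the variation within each population.
ratio = between / mean_within
```

With these illustrative numbers the pooled diversity is about three times the mean within-sample diversity; how large a ratio counts as "much higher" is a judgment call the thread does not quantify.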