Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with strain phylogeny #53

Closed
kylecampbell3 opened this issue May 3, 2017 · 5 comments
Closed

Issue with strain phylogeny #53

kylecampbell3 opened this issue May 3, 2017 · 5 comments

Comments

@kylecampbell3
Copy link

Hi Stephen,

Thank you for your program!

I am trying to use MIDAS to construct a detailed strain phylogeny for a single species, and am running into a bit of trouble in the final steps. I created a custom database using a pan-genome reference that had previously been assembled and ran run_midas.py species and run_midas.py snps individually for three metagenomic samples (each around 2GB). I merged the snp results for the three samples and then ran call_consensus.py. When I used the consensus.fa file output to construct a phylogenetic tree on FastTree, it produced a tree with only three leaves, treating each metagenomic output as a strain, rather than identifying many different strains within each metagenome. Is this the way that the program is supposed to work, or could settings be adjusted to produce a more detailed phylogeny?

Thanks for your help,

Kyle Campbell

@snayfach
Copy link
Owner

snayfach commented May 3, 2017 via email

@marianne-LotterJones
Copy link

Hi Stephen,
I am working with Kyle on this project. Thank you for your suggestion! However, when we enter the code, we keep coming across this error:

ubuntu@ip-172-31-61-218:~/MIDAS$ python /home/ubuntu/MIDAS/scripts/snp_diversity.py --indir /home/ubuntu/MIDAS/psb1_snpmerge_out/psb1/ --out /home/ubuntu/MIDAS/snp_diversity

MIDAS: Metagenomic Intra-species Diversity Analysis System
version 1.2.1; github.com/snayfach/MIDAS
Copyright (C) 2015-2016 Stephen Nayfach
Freely distributed under the GNU General Public License (GPLv3)

===============================
===========Parameters===========
Command: /home/ubuntu/MIDAS/scripts/snp_diversity.py --indir /home/ubuntu/MIDAS/psb1_snpmerge_out/psb1/ --out /home/ubuntu/MIDAS/snp_diversity
Script: snp_diversity.py
Input directory: /home/ubuntu/MIDAS/psb1_snpmerge_out/psb1/
Output file: /home/ubuntu/MIDAS/snp_diversity
Diversity options:
genomic_type: genome-wide
sample_type: per-sample
site_type: ['NC', '1D', '2D', '3D', '4D']
weight_by_depth: False
rand_reads: None
replace_reads: False
rand_samples: None
rand_sites: None
Sample filters:
sample_depth: 0.0
fract_cov: 0.0
max_samples: inf
keep_samples: None
exclude_samples: None
Site filters:
site_depth: 2
site_prev: 0.0
site_maf: 0.0
site_ratio: inf
site_freq: 0.0
max_sites: inf

Selecting subset of samples...
Estimating diversity metrics...
Traceback (most recent call last):
File "/home/ubuntu/MIDAS/scripts/snp_diversity.py", line 339, in
pi = compute_pi(args, samples)
File "/home/ubuntu/MIDAS/scripts/snp_diversity.py", line 215, in compute_pi
for site in diversity.parse_sites(args['indir'], samples): # loop over sites
File "/home/ubuntu/MIDAS/midas/analyze/diversity.py", line 144, in parse_sites
site = GenomicSite(files, samples, info)
File "/home/ubuntu/MIDAS/midas/analyze/diversity.py", line 19, in init
self.ref_id, self.ref_pos, self.ref_allele = self.parse_id()
File "/home/ubuntu/MIDAS/midas/analyze/diversity.py", line 34, in parse_id
ref_pos = int(self.id.split('|')[1])
ValueError: invalid literal for int() with base 10: 'quiver'

Having not had much coding experience, we do not know what this means or how to fix it. Do you have any suggestions?

@snayfach
Copy link
Owner

snayfach commented May 3, 2017

Thanks for reporting the error. I will look into it and get back to you.

@marianne-LotterJones
Copy link

We think the ValueError: invalid literal for int() with base 10: 'quiver' is based on our input system where the program is expecting an integer and instead there's string. We removed 'quiver' from our input file and the program ran.

@snayfach
Copy link
Owner

Hi Kyle - I'm posting your questions from email here in case it is useful to others. You wrote:

As you suggested, I ran the followup analysis to assess the intra-metagenome and inter-metagenome diversity of three metagenomic samples using the 'snp_diversity.py' script.   
It gave me the following output:   

sample_id     depth               count_sites     diversity
psb1out_3    1494503086     7159108         19335.1858363
lizzy2out       144701011       7159108         16136.8075072
lizzy9out       247182331       7159108         27027.809093

Could you explain the meaning of the diversity statistic?  Is this the number of unique strains found in each sample, or is there some other method you used to calculate this? 

First off, it looks like these are intra-sample diversity estimates. Could you run the script for inter-sample diversity also?

The diversity statistic is the sum of the number of nucleotide substitutions over the total number of sites analyzed. So if you divide by count_sites, you will get the number of substitutions per site. You can read more about this statistic here: https://en.wikipedia.org/wiki/Nucleotide_diversity

After dividing by the number of sites, per-bp diversity ranges from 2.2e-3 to 3.8e-3. In my experience this indicates a relatively low-diversity community that is probably composed of one dominant strain. If you look at figure 4 from my biorxiv paper you can compare your results to other human gut bacteria: http://biorxiv.org/content/early/2015/11/14/031757

I would recommend using the same script to compute between-sample (i.e. inter-sample) diversity. This script will pool the data across your 3 samples (assuming they are the same species) and use this pooled data to compute diversity again. If this number is much higher than the per-sample diversity, then you can conclude that the nucleotide differences between populations are much greater than those within the individual populations. Does this make sense?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants