Add "chromosome" column to edges #11

Closed
hyanwong opened this issue Sep 21, 2023 · 5 comments

Comments

@hyanwong
Owner

hyanwong commented Sep 21, 2023

It suddenly occurred to me that we might well want genetic material to move between different chromosomes.

The "obvious" way to do that is to have a parent_chromosome and child_chromosome field in the "edges" table. If these are all -1, we can assume we are using the "default" chromosome (whatever that is), and can then convert it to the normal tskit format.
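As a rough illustration of the idea above, here is a minimal Python sketch. It does not use tskit itself: the `Edge` dataclass and the `to_plain_edges` helper are hypothetical names invented for this example, and only the `parent_chromosome` / `child_chromosome` field names and the -1 default come from the proposal.

```python
from dataclasses import dataclass

DEFAULT_CHROMOSOME = -1  # sentinel from the proposal: the "default" chromosome


@dataclass(frozen=True)
class Edge:
    # Hypothetical stand-in for one row of the extended edges table
    left: float
    right: float
    parent: int
    child: int
    parent_chromosome: int = DEFAULT_CHROMOSOME
    child_chromosome: int = DEFAULT_CHROMOSOME


def to_plain_edges(edges):
    """Drop the chromosome columns, giving (left, right, parent, child) rows.

    Only allowed when every edge uses the default chromosome on both ends.
    """
    for e in edges:
        if e.parent_chromosome != DEFAULT_CHROMOSOME or e.child_chromosome != DEFAULT_CHROMOSOME:
            raise ValueError("edge uses a non-default chromosome; cannot convert")
    return [(e.left, e.right, e.parent, e.child) for e in edges]


# Two edges that leave both chromosome fields at -1
edges = [Edge(0, 50, parent=2, child=0), Edge(50, 100, parent=3, child=0)]
plain = to_plain_edges(edges)
```

The resulting rows carry the same four columns as an ordinary tskit edge, so in this sketch the "all -1" case needs no extra handling before conversion.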

Edit: an alternative option would be for each node to represent a different chromosome. I'm not sure which is better.
Further edit: we have decided on extra columns in the edges table.

@hyanwong
Owner Author

hyanwong commented Sep 27, 2023

Note that in cases where chromosomes are duplicated, we might (or might not) want to give the copies different chromosome IDs. Essentially, the chromosome ID serves as a marker of which segment of the genome you would normally recombine with (and this is really a continuous property, especially in cases such as #15).

It may be that it is better to have a separate node for each chromosome instead, and group the nodes together somehow into a haploid genome.

@duncanMR
Collaborator

It would be great to support this! I prefer the option of adding chromosome information to the nodes table instead of the edges. Since genetic material transfer between different chromosomes is not a common event to simulate, I think adding a column to the edges just for this purpose isn't efficient. On the other hand, if we are simulating multiple chromosomes, we will need to keep track of which nodes correspond to which chromosome anyway, in order to calculate properties like chromosome length. Having separate nodes for each chromosome is intuitive to me since we effectively do that with tsinfer already (inferring trees for each chromosome arm, then stitching them together).

@hyanwong
Owner Author

hyanwong commented Sep 28, 2023

I'm not sure adding a single integer column to the edge table will impact efficiency, to be honest. But let's see what Jerome and Ben think. There was discussion about this in tskit, e.g. at tskit-dev/msprime#848 (comment)

If we do use nodes, we will need another layer (another table?) lying between the individual and the nodes tables, which ties the nodes together into a haploid genome. I guess this could be an integer column in the nodes table.
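For comparison, the node-based alternative could be sketched roughly as follows. This is plain Python rather than tskit, and the `Node` dataclass, the `genome` column name, and the `group_by_genome` helper are all assumptions made for illustration, not part of any existing table definition.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical row of a nodes table in the one-node-per-chromosome design
    time: float
    chromosome: int  # which chromosome this node represents
    genome: int      # assumed integer column tying nodes into a haploid genome


def group_by_genome(nodes):
    """Map each genome ID to the IDs (row indexes) of the nodes it contains."""
    genomes = defaultdict(list)
    for node_id, node in enumerate(nodes):
        genomes[node.genome].append(node_id)
    return dict(genomes)


# Two haploid genomes, each made of two chromosome nodes
nodes = [
    Node(time=0, chromosome=0, genome=0),
    Node(time=0, chromosome=1, genome=0),
    Node(time=0, chromosome=0, genome=1),
    Node(time=0, chromosome=1, genome=1),
]
genomes = group_by_genome(nodes)
```

Every operation that treats a genome as a unit would need a lookup like `group_by_genome` in this design, which is the extra layer mentioned above.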

@duncanMR
Collaborator

Fair point; that discussion is helpful, thanks for the link. One downside of using the nodes table is that we already have the headache of how to store duplicate sample nodes in local tree sequences! The problem there is similar: we have to decide whether to add complexity to the edges table or the nodes table of local tree sequences.

@hyanwong
Owner Author

hyanwong commented Mar 15, 2024

After some thought, I am fixed on implementing chromosomes as two extra columns in the edges table, rather than as separate nodes. The reason for this is mainly that a "genome" (consisting of multiple chromosomes) is a coherent thing that is mostly passed about as a unit. Separate chromosomes cannot survive and lead an independent life of their own: they mostly have to be part of a whole genome. Thus the node grouping, consisting of multiple chromosomes, is a natural one as the base unit of selection. If we allocated each chromosome a separate node, keeping them all tied together properly would be difficult. We would also need to match different nodes to each other whenever we recombined, which would be a major hassle.

The downside, however, is that during meiosis, the chromosomes can be treated as independent units, which means that e.g. autopolyploidy may be a little tricky to simulate (see #15 (comment)). I think we can get around this by specific MRCA hacks, though, and the alternative is much worse. So I'm closing this.
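To make the chosen design concrete, here is a small hedged sketch in plain Python (not tskit) in which each node stands for a whole genome and a transfer between chromosomes shows up as a single edge whose two chromosome fields differ. The `Edge` dataclass and the `inter_chromosome_edges` helper are hypothetical, and the thread does not specify how `left` / `right` coordinates map between chromosomes, so the values here are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    # Hypothetical row of the edges table with the two extra columns
    left: float
    right: float
    parent: int
    child: int
    parent_chromosome: int = -1
    child_chromosome: int = -1


def inter_chromosome_edges(edges):
    """Return the edges whose material changes chromosome between parent and child."""
    return [e for e in edges if e.parent_chromosome != e.child_chromosome]


edges = [
    # Ordinary inheritance on chromosome 0 (node 2 is the parent genome)
    Edge(0, 100, parent=2, child=0, parent_chromosome=0, child_chromosome=0),
    # Ordinary inheritance on chromosome 1
    Edge(0, 80, parent=2, child=0, parent_chromosome=1, child_chromosome=1),
    # Transfer: material from the parent's chromosome 1 lands on the child's
    # chromosome 0 (coordinates are arbitrary, see the note above)
    Edge(100, 120, parent=2, child=0, parent_chromosome=1, child_chromosome=0),
]
moved = inter_chromosome_edges(edges)
```

Note that all three edges share the same parent and child node IDs, so no separate bookkeeping is needed to keep a genome's chromosomes together.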

2 participants