Add "chromosome" column to edges #11

hyanwong · 2023-09-21T08:17:37Z

I suddenly thought, we might well want to have material moving between different chromosomes.

The "obvious" way to do that is to add parent_chromosome and child_chromosome fields to the "edges" table. If all of these are -1, we can assume we are using the "default" chromosome (whatever that is), and the tables can then be converted to the normal tskit format.

Edit: an alternative option would be for each node to represent a different chromosome. I'm not sure which is better.
Further edit: we have decided on extra columns in the edges table.
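
As a rough illustration of the edge-column idea above, here is a minimal sketch in plain Python. It does not use the real tskit tables API: the `Edge` class, the `DEFAULT_CHROMOSOME` constant, and both helper functions are hypothetical names invented for this example. The only details taken from the discussion are the two column names and the convention that -1 means the "default" chromosome.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed sentinel: -1 means "the default chromosome", as proposed above.
DEFAULT_CHROMOSOME = -1


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge row with the two extra chromosome columns."""
    left: float
    right: float
    parent: int
    child: int
    parent_chromosome: int = DEFAULT_CHROMOSOME
    child_chromosome: int = DEFAULT_CHROMOSOME


def uses_only_default_chromosome(edges: List[Edge]) -> bool:
    """True if every edge has both chromosome columns set to -1."""
    return all(
        e.parent_chromosome == DEFAULT_CHROMOSOME
        and e.child_chromosome == DEFAULT_CHROMOSOME
        for e in edges
    )


def to_plain_edges(edges: List[Edge]) -> List[Tuple[float, float, int, int]]:
    """Drop the chromosome columns, giving (left, right, parent, child) rows.

    Only valid when no edge refers to a non-default chromosome; otherwise
    dropping the columns would lose information, so we refuse.
    """
    if not uses_only_default_chromosome(edges):
        raise ValueError("edges refer to non-default chromosomes")
    return [(e.left, e.right, e.parent, e.child) for e in edges]
```

With this layout, an edge such as `Edge(0, 5, parent=3, child=2, parent_chromosome=1, child_chromosome=0)` would record material that sits on chromosome 1 in the parent and chromosome 0 in the child, and `to_plain_edges` would refuse to flatten it.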

hyanwong · 2023-09-27T09:53:35Z

Note that in cases where chromosomes are duplicated, we might (or might not) want to give the copies different chromosome IDs. Essentially, the chromosome ID serves as a marker of which segment of the genome you normally recombine with (and this is really a matter of degree, especially in cases such as #15).

It may be that it is better to have a separate node for each chromosome instead, and group the nodes together somehow into a haploid genome.

duncanMR · 2023-09-28T13:56:53Z

It would be great to support this! I prefer the option of adding chromosome information to the nodes table instead of the edges. Since genetic material transfer between different chromosomes is not a common event to simulate, I think adding a column to the edges just for this purpose isn't efficient. On the other hand, if we are simulating multiple chromosomes, we will need to keep track of which nodes correspond to which chromosome anyway, in order to calculate properties like chromosome length. Having separate nodes for each chromosome is intuitive to me since we effectively do that with tsinfer already (inferring trees for each chromosome arm, then stitching them together).

hyanwong · 2023-09-28T14:00:13Z

I'm not sure that adding a single integer column to the edge table will impact efficiency, to be honest. But let's see what Jerome and Ben think. There was some discussion about this in tskit, e.g. at tskit-dev/msprime#848 (comment)

If we do use nodes, we will need another layer (another table?) lying between the individual and the nodes tables, which ties the nodes together into a haploid genome. I guess this could be an integer column in the nodes table.
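
For comparison, the node-based alternative could look something like the sketch below, again in plain Python rather than the tskit API. The `Node` class, its `chromosome` and `genome` columns, and `group_by_genome` are all hypothetical; the `genome` field stands in for the "integer column in the nodes table" suggested above, tying one node per chromosome into a haploid genome.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    """Hypothetical node row: one node per chromosome copy."""
    time: float
    individual: int
    chromosome: int  # which chromosome this node represents
    genome: int      # assumed extra column: ID of the haploid genome it belongs to


def group_by_genome(nodes: List[Node]) -> Dict[int, Dict[int, int]]:
    """Map genome ID -> {chromosome ID -> node ID}.

    Node IDs are taken to be row positions in the list. A genome holding
    two nodes for the same chromosome is treated as an error here; that is
    a simplifying assumption, not a requirement from the discussion.
    """
    genomes: Dict[int, Dict[int, int]] = defaultdict(dict)
    for node_id, node in enumerate(nodes):
        if node.chromosome in genomes[node.genome]:
            raise ValueError(
                f"genome {node.genome} has two nodes for chromosome {node.chromosome}"
            )
        genomes[node.genome][node.chromosome] = node_id
    return dict(genomes)
```

Under this scheme, anything that operates on a whole haploid genome first has to look up the group via `group_by_genome`, which is the extra bookkeeping layer described above.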

duncanMR · 2023-09-28T14:19:18Z

Fair point; that discussion is helpful, thanks for the link. One downside of using the nodes table is that we already have the headache of how to store duplicate sample nodes in local tree sequences! The problem there is similar: we have to decide whether to add complexity to the edges table or the nodes table of local tree sequences.

hyanwong · 2024-03-15T11:58:49Z

After some thought, I have settled on implementing chromosomes as 2 extra columns in the edges table, rather than as separate nodes. The main reason is that a "genome" (consisting of multiple chromosomes) is a coherent thing that is mostly passed about as a unit. Separate chromosomes cannot survive and lead an independent life of their own: they mostly have to be part of a whole genome. Thus a node that represents a whole genome, consisting of multiple chromosomes, is a natural grouping to use as the base unit of selection. If we allocated each chromosome a separate node, keeping them all tied together properly would be difficult. We would also need to match different nodes to each other whenever we recombined, which would be a major hassle.

The downside, however, is that during meiosis the chromosomes can be treated as independent units, which means that e.g. autopolyploidy may be a little tricky to simulate (see #15 (comment)). I think we can get around this with specific MRCA hacks, though, and the alternative is much worse. So I'm closing this.

This was referenced Sep 21, 2023

Modelling the short arm acrocentric arms of human chromosomes #12

Open

Use-case: whole genome duplication / polyploidy #15

Open

hyanwong closed this as completed Mar 15, 2024

hyanwong mentioned this issue Mar 15, 2024

Account for chromosome IDs when tracing intervals #103

Closed
