Skip to content

window_by_interval #974

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 6, 2022
Merged

window_by_interval #974

merged 1 commit into from
Dec 6, 2022

Conversation

tomwhite
Copy link
Collaborator

@tomwhite tomwhite commented Dec 5, 2022

Adds a function to create windows from arbitrary intervals, which may come from interval lists or BED files, for example. It's left to the user to load from these files (though there's an example for BED in the tests), mainly to avoid having to handle genomic coordinate conventions in sgkit itself (#434).

This provides the useful minimum needed for gene-ε (#692), where some of the examples have gene lists that I would like to convert to sgkit windows.

@timothymillar you may be interested in this, since it implements a case you brought up in #790.

Copy link
Collaborator

@timothymillar timothymillar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I think leaving the IO to users is the best approach for simple tabular formats.

Just out of curiosity, was there a particular reason that contig names are stored in an attribute of the dataset rather than in an array (e.g., contig_id)?

Copy link
Collaborator

@jeromekelleher jeromekelleher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@benjeffery benjeffery left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@tomwhite
Copy link
Collaborator Author

tomwhite commented Dec 6, 2022

Just out of curiosity, was there a particular reason that contig names are stored in an attribute of the dataset rather than in an array (e.g., contig_id)?

I looked back and found that contig names were introduced in #10, and there doesn't seem to have been any particular reason that they are stored as an attribute. Sample IDs are stored as a variable, for example. I guess we could change this, but might need to think about backward compatibility.

@tomwhite tomwhite added the auto-merge Auto merge label for mergify test flight label Dec 6, 2022
@mergify mergify bot merged commit 7e73e8f into sgkit-dev:main Dec 6, 2022
@timothymillar
Copy link
Collaborator

I guess we could change this, but might need to think about backward compatibility.

I don't think it's a big issue, just struck me as a little inconsistent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-merge Auto merge label for mergify test flight
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants