Miriam identifiers #8
- kegg namespace handles compound/drug/glycan
- metanetx deprecated is changed to metanetx.reaction/metanetx.chemical
- chebi identifiers are prefixed with CHEBI:
- swisslipid identifiers are prefixed with SLM:
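The identifier rules listed in this commit message could be sketched roughly as follows. This is only an illustration, not the code from this PR: the function name `to_miriam`, the raw namespace labels (`kegg`, `deprecated`, `metanetx`, `chebi`, `slm`, `swisslipid`) and the assumption that KEGG and MetaNetX identifiers can be told apart by their leading letters (`C`/`D`/`G`, `MNXM`/`MNXR`) are all hypothetical.

```python
import re

# Assumption: KEGG entries are distinguished by their leading letter.
KEGG_NAMESPACES = {"C": "kegg.compound", "D": "kegg.drug", "G": "kegg.glycan"}
# Assumption: MetaNetX chemicals start with MNXM and reactions with MNXR.
METANETX_NAMESPACES = {"MNXM": "metanetx.chemical", "MNXR": "metanetx.reaction"}
# Hypothetical raw labels under which MetaNetX identifiers may arrive.
RAW_METANETX_LABELS = {"metanetx", "deprecated"}
# Namespaces whose identifiers carry a fixed prefix.
REQUIRED_PREFIXES = {"chebi": "CHEBI:", "slm": "SLM:", "swisslipid": "SLM:"}


def to_miriam(namespace, identifier):
    """Return a (namespace, identifier) pair adjusted to the rules above."""
    namespace = namespace.lower()
    if namespace == "kegg":
        match = re.match(r"^([CDG])\d+$", identifier)
        if match:
            return KEGG_NAMESPACES[match.group(1)], identifier
        return namespace, identifier
    if namespace in RAW_METANETX_LABELS:
        for prefix, target in METANETX_NAMESPACES.items():
            if identifier.startswith(prefix):
                return target, identifier
        return namespace, identifier
    prefix = REQUIRED_PREFIXES.get(namespace)
    if prefix and not identifier.startswith(prefix):
        # Add the prefix only when it is missing, so the call is idempotent.
        return namespace, prefix + identifier
    return namespace, identifier
```

Unrecognised namespaces and identifiers pass through unchanged, so the function can be applied to every cross-reference without filtering first.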
b5a5751 to d488b82
You can indeed get the tables quite easily by running the following nextflow pipeline. I'm working on extending the pipeline to load everything into a SQL database and a few further enhancements to the data.
params.sdk_version = '0.3.2'
params.mnx_release = '3.2'
params.outdir = 'data'

// MetaNetX tables to download.
table_names = Channel.from([
    "chem_prop.tsv",
    "chem_xref.tsv",
    "comp_prop.tsv",
    "comp_xref.tsv",
    "reac_prop.tsv",
    "reac_xref.tsv",
])

// Download all raw tables in a single task.
process pullTables {
    container "midnighter/metanetx-sdk:${params.sdk_version}"

    input:
    val list from table_names.collect()

    output:
    file '*.tsv.gz' into raw_tables

    shell:
    """
    metanetx pull --version !{params.mnx_release} . !{list.join(' ')}
    """
}

// Transform each downloaded table in its own task and publish the result.
process transformTables {
    container "midnighter/metanetx-sdk:${params.sdk_version}"
    publishDir params.outdir, mode:'copy', overwrite: true

    input:
    file table from raw_tables.flatten()

    output:
    file "processed_${table}" into processed_tables

    // The ETL subcommand is derived from the file name, e.g. chem_prop -> chem-prop.
    """
    metanetx etl ${table.getSimpleName().replace('_', '-')} \
        ${table} processed_${table}
    """
}
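For readers less familiar with Nextflow, the way `transformTables` derives the `metanetx etl` subcommand from each file name can be mirrored in a few lines of Python. This is a sketch only: `simple_name` and `etl_command` are hypothetical helpers that imitate `getSimpleName()` (file name with all extensions removed) followed by the underscore-to-hyphen replacement.

```python
from pathlib import Path


def simple_name(filename):
    """Return the file name with every extension removed."""
    return Path(filename).name.split(".", 1)[0]


def etl_command(filename):
    """Build the argument list that the transformTables script would run."""
    subcommand = simple_name(filename).replace("_", "-")
    return ["metanetx", "etl", subcommand, filename, f"processed_{filename}"]
```

For a downloaded `chem_prop.tsv.gz`, the simple name is `chem_prop`, so the subcommand becomes `chem-prop` and the output file is `processed_chem_prop.tsv.gz`.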
I know the documentation is missing, but I structured the metanetx-sdk package so that not only the CLI but also the internal functions should be easily accessible. However, I can see that you want to avoid pandas, so the two wouldn't fit together, indeed.
It appears swisslipids also need the identifier prefixed with SLM:
Thanks for noticing, you're absolutely correct! I will correct that in the package and you can decide whether you want to use the workflow above or run the code as is.
I did try to run the workflow, but it's been frozen here for close to 30 minutes now:
It also seems to hang when running
I'll look into that but there might also be problems on MetaNetX' end:
Ok, then I'll move forward with this PR until we're able to make the transformed files available somewhere.
Follow-up on #7 which was closed by accident.
That package is great stuff! It would be nice to pull it in as a dependency, but it's unfortunately not a good fit: it's mostly a CLI tool with a different workflow, etc., so I decided to reimplement the transformation features. However, if those intermediate corrected tables were available, we could maybe use them as source material instead of MetaNetX directly and avoid the MIRIAM mapping in this service.
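If the corrected tables were published as gzipped TSV files, they could be consumed with the standard library alone, without pandas. The sketch below assumes tab-separated rows with `#` comment lines; the layout of the processed tables is not described in this thread, so the column names passed in by the caller are hypothetical.

```python
import csv
import gzip


def iter_rows(handle, columns):
    """Yield one dict per data row, skipping blank lines and '#' comments."""
    data_lines = (
        line for line in handle if line.strip() and not line.startswith("#")
    )
    # QUOTE_NONE keeps quote characters in free-text fields untouched.
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield dict(zip(columns, row))


def read_table(path, columns):
    """Stream rows from a gzipped TSV file without loading it into memory."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from iter_rows(handle, columns)
```

Because both functions are generators, a large cross-reference table can be processed row by row, for example `for row in read_table("processed_chem_xref.tsv.gz", ("xref", "mnx_id")): ...`, where the file name follows the workflow's `processed_` naming and the column names are placeholders.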
This is now implemented. It appears swisslipids also need the identifier prefixed with SLM:, at least if it is to be understood similarly to the chebi ones.