Miriam identifiers #8

kvikshaug · 2019-10-11T10:41:51Z

Follow-up on #7 which was closed by accident.

That package is great stuff! It would be nice to pull it in as a dependency, but unfortunately it's not a good fit: it's mostly a CLI tool with a different workflow, so I decided to reimplement the transformation features. However, if those intermediate corrected tables were available, we could perhaps use them as source material instead of MetaNetX directly and avoid the MIRIAM mapping in this service.

KEGG can be either kegg.compound, kegg.drug, or kegg.glycan. MetaNetX identifiers themselves need the registry prefix. And chebi identifiers all need to be prefixed with CHEBI: in order to be correct.

This is now implemented. It appears swisslipids also need the identifier prefixed with SLM:, at least if it is to be understood similarly to the chebi ones.

- kegg namespace handles compound/drug/glycan
- metanetx deprecated is changed to metanetx.reaction/metanetx.chemical
- chebi identifiers are prefixed with CHEBI:
- swisslipid identifiers are prefixed with SLM:
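The special cases listed above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `to_miriam`, the source namespace keys (`kegg`, `deprecated`, `chebi`, `slm`), and the rule of picking the KEGG and MetaNetX registry namespace from the leading characters of the identifier are all assumptions made for the example.

```python
# Hypothetical lookup tables. The first letter of a KEGG accession
# (C, D or G) is assumed to select the registry namespace.
KEGG_NAMESPACES = {
    "C": "kegg.compound",
    "D": "kegg.drug",
    "G": "kegg.glycan",
}
# Deprecated MetaNetX identifiers are assumed to keep their MNXM/MNXR prefix.
METANETX_NAMESPACES = {
    "MNXM": "metanetx.chemical",
    "MNXR": "metanetx.reaction",
}
# Namespaces whose identifiers must carry a fixed prefix.
REQUIRED_PREFIXES = {
    "chebi": "CHEBI:",
    "slm": "SLM:",
}


def to_miriam(namespace, identifier):
    """Return a (namespace, identifier) pair adjusted for the special cases."""
    if namespace == "kegg":
        miriam_ns = KEGG_NAMESPACES.get(identifier[:1])
        if miriam_ns is None:
            # Only compound, drug and glycan are covered in this sketch.
            raise ValueError(f"unhandled KEGG identifier: {identifier!r}")
        return miriam_ns, identifier
    if namespace == "deprecated":
        for prefix, miriam_ns in METANETX_NAMESPACES.items():
            if identifier.startswith(prefix):
                return miriam_ns, identifier
        raise ValueError(f"unhandled MetaNetX identifier: {identifier!r}")
    if namespace in REQUIRED_PREFIXES:
        prefix = REQUIRED_PREFIXES[namespace]
        # Add the prefix only when it is missing, so the call is idempotent.
        if not identifier.startswith(prefix):
            identifier = prefix + identifier
        return namespace, identifier
    # Every other namespace is passed through unchanged.
    return namespace, identifier
```

Under these assumptions, `to_miriam("chebi", "15377")` yields `("chebi", "CHEBI:15377")`, and calling it again on an already prefixed identifier leaves it untouched.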

Midnighter

You can indeed get the tables quite easily by running the following nextflow pipeline. I'm working on extending the pipeline to load everything into a SQL database and a few further enhancements to the data.

params.sdk_version = '0.3.2'
params.mnx_release = '3.2'
params.outdir = 'data'

table_names = Channel.from([
    "chem_prop.tsv",
    "chem_xref.tsv",
    "comp_prop.tsv",
    "comp_xref.tsv",
    "reac_prop.tsv",
    "reac_xref.tsv",
])

process pullTables {
    container "midnighter/metanetx-sdk:${params.sdk_version}"

    input:
    val list from table_names.collect()

    output:
    file '*.tsv.gz' into raw_tables

    shell:
    """
    metanetx pull --version !{params.mnx_release} . !{list.join(' ')}
    """
}

process transformTables {
    container "midnighter/metanetx-sdk:${params.sdk_version}"
    publishDir params.outdir, mode:'copy', overwrite: true

    input:
    file table from raw_tables.flatten()

    output:
    file "processed_${table}" into processed_tables

    """
    metanetx etl ${table.getSimpleName().replace('_', '-')} \
        ${table} processed_${table}
    """
}

Midnighter

I know the documentation is missing but I actually structured the metanetx-sdk package in a way that not only the CLI but also the internal functions should be easily accessible. However, I can see that you want to avoid pandas so they wouldn't fit together, indeed.

> It appears swisslipids also need the identifier prefixed with SLM:

Thanks for noticing, you're absolutely correct! I will correct that in the package and you can decide whether you want to use the workflow above or run the code as is.

kvikshaug · 2019-10-15T08:38:09Z

> I will correct that in the package and you can decide whether you want to use the workflow above or run the code as is.

I did try to run the workflow, but it's been frozen here for close to 30 minutes now:

N E X T F L O W  ~  version 19.07.0
Launching `metanetx` [small_galileo] - revision: e8f7eaaef7
executor >  local (1)
[02/8704ca] process > pullTables      [  0%] 0 of 1
[-        ] process > transformTables -

It also seems to hang when I run metanetx pull --version 3.2 . chem_prop.tsv chem_xref.tsv comp_prop.tsv comp_xref.tsv reac_prop.tsv reac_xref.tsv directly in the shell. I'm on Python 3.7.2.
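One way to tell a real hang from a slow download would be to run the same command under a time limit. The following is only a sketch using the standard library; the helper name `run_with_timeout` and the ten-minute limit in the usage comment are illustrative assumptions, not part of metanetx-sdk.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command; return its exit code, or None if it hit the timeout."""
    try:
        # subprocess.run kills the child process once the timeout expires.
        completed = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        print(
            f"timed out after {timeout_seconds} s: {' '.join(command)}",
            file=sys.stderr,
        )
        return None
    return completed.returncode


# Example call (assumes the metanetx CLI is on the PATH):
# run_with_timeout(
#     ["metanetx", "pull", "--version", "3.2", ".", "chem_prop.tsv"], 600
# )
```

A return value of `None` means the command was still running when the limit was reached, which is consistent with either a hang or an unreachable server.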

Midnighter · 2019-10-15T08:39:40Z

I'll look into that, but there might also be problems on MetaNetX's end:

> Due to network and computer room works MetaNetX/MNXref will be sporadically inaccessible between Oct 7th and Oct 30th

kvikshaug · 2019-10-15T09:22:26Z

Ok, then I'll move forward with this PR until we're able to make the transformed files available somewhere.

kvikshaug added 4 commits October 11, 2019 12:33

chore: add miriam identifier maps

1912dcd

feat: map namespaces to their corresponding miriam identifiers

8c34aaf

refactor: move parsing logic to separate module

f381899

feat: handle special cases for miriam mappings

b50875f

- kegg namespace handles compound/drug/glycan
- metanetx deprecated is changed to metanetx.reaction/metanetx.chemical
- chebi identifiers are prefixed with CHEBI:
- swisslipid identifiers are prefixed with SLM:

kvikshaug requested a review from Midnighter October 11, 2019 10:41

style: fix qa issues

d488b82

kvikshaug force-pushed the miriam-identifiers branch from b5a5751 to d488b82 Compare October 11, 2019 10:50

Midnighter reviewed Oct 14, 2019

Midnighter approved these changes Oct 15, 2019

kvikshaug merged commit d488b82 into master Oct 15, 2019

kvikshaug deleted the miriam-identifiers branch October 15, 2019 09:22
