Miriam identifiers #8

Merged
merged 5 commits into from
Oct 15, 2019

kvikshaug
Member

Follow-up on #7 which was closed by accident.

That package is great stuff! It would be nice to pull it in as a dependency, but unfortunately it's not a good fit: it's mostly a CLI tool with a different workflow, so I decided to reimplement the transformation features. However, if those intermediate corrected tables were available, we could maybe use them as source material instead of MetaNetX directly and avoid the MIRIAM mapping in this service.

KEGG can be either kegg.compound, kegg.drug, or kegg.glycan. MetaNetX identifiers themselves need the registry prefix. And chebi identifiers all need to be prefixed with CHEBI: in order to be correct.

This is now implemented. It appears swisslipids also need the identifier prefixed with SLM:, at least if it is to be understood similarly to the chebi ones.

- kegg namespace handles compound/drug/glycan
- metanetx deprecated is changed to metanetx.reaction/metanetx.chemical
- chebi identifiers are prefixed with CHEBI:
- swisslipid identifiers are prefixed with SLM:
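
The four rules above could be sketched roughly as follows. This is an illustration only, not the code merged in this PR: the function name `to_miriam`, the raw namespace spellings (`kegg`, `metanetx`, `chebi`, `swisslipid`), the use of the first letter of a KEGG accession (C, D, G) to pick the namespace, and the `MNXM`/`MNXR` accession prefixes are all assumptions.

```python
from typing import Tuple

# Assumption: the first letter of a KEGG accession tells the entry type.
KEGG_NAMESPACES = {"C": "kegg.compound", "D": "kegg.drug", "G": "kegg.glycan"}

# Assumption: MetaNetX accessions start with MNXM (chemical) or MNXR (reaction).
METANETX_NAMESPACES = {"MNXM": "metanetx.chemical", "MNXR": "metanetx.reaction"}

# Namespaces whose accessions must carry a fixed prefix.
ACCESSION_PREFIXES = {"chebi": "CHEBI:", "swisslipid": "SLM:"}


def to_miriam(namespace: str, identifier: str) -> Tuple[str, str]:
    """Map a raw (namespace, identifier) pair to a MIRIAM-style pair."""
    namespace = namespace.lower()
    if namespace == "kegg":
        first_letter = identifier[:1].upper()
        if first_letter not in KEGG_NAMESPACES:
            raise ValueError(f"Unrecognized KEGG identifier: {identifier!r}")
        return KEGG_NAMESPACES[first_letter], identifier
    if namespace == "metanetx":
        for prefix, miriam_namespace in METANETX_NAMESPACES.items():
            if identifier.startswith(prefix):
                return miriam_namespace, identifier
        raise ValueError(f"Unrecognized MetaNetX identifier: {identifier!r}")
    # Add the required prefix only if it is not already present.
    prefix = ACCESSION_PREFIXES.get(namespace)
    if prefix and not identifier.startswith(prefix):
        identifier = prefix + identifier
    return namespace, identifier
```

With these assumptions, `to_miriam("chebi", "15377")` and `to_miriam("chebi", "CHEBI:15377")` both yield the prefixed form, so the mapping is safe to apply more than once.
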
@kvikshaug kvikshaug requested a review from Midnighter October 11, 2019 10:41
Contributor

@Midnighter Midnighter left a comment

You can indeed get the tables quite easily by running the following Nextflow pipeline. I'm working on extending the pipeline to load everything into a SQL database and to make a few further enhancements to the data.

params.sdk_version = '0.3.2'
params.mnx_release = '3.2'
params.outdir = 'data'

table_names = Channel.from([
    "chem_prop.tsv",
    "chem_xref.tsv",
    "comp_prop.tsv",
    "comp_xref.tsv",
    "reac_prop.tsv",
    "reac_xref.tsv",
])

process pullTables {
    container "midnighter/metanetx-sdk:${params.sdk_version}"

    input:
    val list from table_names.collect()

    output:
    file '*.tsv.gz' into raw_tables

    shell:
    """
    metanetx pull --version !{params.mnx_release} . !{list.join(' ')}
    """
}

process transformTables {
    container "midnighter/metanetx-sdk:${params.sdk_version}"
    publishDir params.outdir, mode:'copy', overwrite: true

    input:
    file table from raw_tables.flatten()

    output:
    file "processed_${table}" into processed_tables

    """
    metanetx etl ${table.getSimpleName().replace('_', '-')} \
        ${table} processed_${table}
    """
}
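
For a consumer that wants to avoid pandas, the published files could be read with the standard library alone. The sketch below is hypothetical: the helper name `iter_table` is made up, and it assumes the processed tables are gzip-compressed, tab-separated, and start with a header row, which this thread does not confirm.

```python
import csv
import gzip
from typing import Dict, Iterator


def iter_table(path: str) -> Iterator[Dict[str, str]]:
    """Yield one dict per row of a gzip-compressed, tab-separated table."""
    # Assumption: the first line of the file is a header row.
    # newline="" lets the csv module handle line endings itself.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.DictReader(handle, delimiter="\t")
```

Given the `publishDir` and output naming in the pipeline above, a call such as `iter_table("data/processed_chem_xref.tsv.gz")` would stream the rows of one processed table without loading the whole file into memory.
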

Contributor

@Midnighter Midnighter left a comment

I know the documentation is missing, but I actually structured the metanetx-sdk package so that not only the CLI but also the internal functions are easily accessible. However, I can see that you want to avoid pandas, so indeed they wouldn't fit together.

> It appears swisslipids also need the identifier prefixed with SLM:

Thanks for noticing, you're absolutely correct! I will correct that in the package and you can decide whether you want to use the workflow above or run the code as is.

@kvikshaug
Member Author

> I will correct that in the package and you can decide whether you want to use the workflow above or run the code as is.

I did try to run the workflow, but it's been frozen here for close to 30 minutes now:

N E X T F L O W  ~  version 19.07.0
Launching `metanetx` [small_galileo] - revision: e8f7eaaef7
executor >  local (1)
[02/8704ca] process > pullTables      [  0%] 0 of 1
[-        ] process > transformTables -

It also seems to hang when I run metanetx pull --version 3.2 . chem_prop.tsv chem_xref.tsv comp_prop.tsv comp_xref.tsv reac_prop.tsv reac_xref.tsv directly in the shell. I'm on Python 3.7.2.
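
One way to tell a hang from a slow download is to run the command under a time limit. A minimal sketch using only the standard library follows; the helper name `run_with_timeout` is hypothetical, and nothing here is specific to metanetx-sdk.

```python
import subprocess
from typing import List, Optional


def run_with_timeout(argv: List[str], seconds: float) -> Optional[int]:
    """Run a command; return its exit code, or None if it exceeded the timeout."""
    try:
        completed = subprocess.run(argv, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising.
        return None
    return completed.returncode
```

For example, `run_with_timeout(["metanetx", "pull", "--version", "3.2", ".", "chem_prop.tsv"], 600)` would return `None` if pulling a single table does not finish within ten minutes.
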

@Midnighter
Contributor

I'll look into that, but there might also be problems on MetaNetX's end:

> Due to network and computer room works MetaNetX/MNXref will be sporadically inaccessible between Oct 7th and Oct 30th

@kvikshaug
Member Author

Ok, then I'll move forward with this PR until we're able to make the transformed files available somewhere.

@kvikshaug kvikshaug merged commit d488b82 into master Oct 15, 2019
@kvikshaug kvikshaug deleted the miriam-identifiers branch October 15, 2019 09:22