Support conda (python) package manager #2213

Open
rarkins opened this issue Jul 4, 2018 · 70 comments
Assignees
Labels
help wanted Help is needed or welcomed on this issue new package manager New package manager support priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others type:feature Feature (new functionality)

Comments

@rarkins
Collaborator

rarkins commented Jul 4, 2018

https://conda.io/docs/

@rarkins rarkins added type:feature Feature (new functionality) needs-requirements priority-4-low Low priority, unlikely to be done unless it becomes important to more people labels Jul 4, 2018
@rarkins rarkins changed the title Support conda (python) Support conda (python) package manager Mar 8, 2019
@rarkins rarkins added the new package manager New package manager support label Mar 8, 2019
@meg-hegde

Hi, does Renovate now support conda?

@meg-hegde

Hi, just wondering whether there are plans to add conda support soon? Alternatively, shall I try adding it using the instructions here: https://github.com/renovatebot/renovate/blob/master/docs/development/adding-a-package-manager.md?

@rarkins
Collaborator Author

rarkins commented Aug 5, 2020

No plans, and a PR would be very welcome! I updated the doc just now to make sure it's current.

@meg-hegde

Thank you for updating the docs - I'll give this a go when I have some time :)

@gerbenoostra
Contributor

#6969 duplicated this issue, so it has been closed. The relevant comments from there:
What would you like Renovate to be able to do?

To also verify python package versions in conda environment files (environment.yml)

Did you already have any implementation ideas?
no

Are there any workarounds or alternative ideas you've tried to avoid needing this feature?

Conda environments can also include pip requirements. A workaround is to put those in a separate txt file and have Renovate check that file.
environment.yml would then be:

dependencies:
- python=3.7
- jupyter
- pip
- pip:
  - -rrequirements.txt

This workaround, however, does not cover the conda packages themselves (python=3.7 here, and any other conda package that is installed, such as jupyter in this case).

Is this a feature you'd be interested in implementing yourself?
maybe

**Related features**
Relates to #931 , but that only implemented pip dependencies.

@rarkins
Collaborator Author

rarkins commented Aug 13, 2020

It would be helpful if anyone can provide some public repo examples that can be tested against, as well as clarifications on file naming / file syntax. For example should we match against every environment.yml file in the repo?

@padmick

padmick commented Sep 7, 2020

Their docs at https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-from-an-environment-yml-file reference environment.yml files. Internally we use different names for our different projects (i.e. we give each env file the project's codename, so titan.yml etc.), but if that is an issue we can look at changing them back to the conda default name.

@davidspek

I'm also interested in this as the PyTorch images (and the jupyter-stack ones) use conda.

@AndreaGiardini

We could give it a shot, starting from the datasource. I am not very familiar with typescript but I could give it a go.

The datasource docs require a function called `getReleases` with input:

`lookupName`: the package's full name including scope if present (e.g. @foo/bar)
`registryUrls`: an array of registry Urls to try

If I understood it correctly, for something like https://anaconda.org/conda-forge/proj/ we will have:

lookupName -> conda-forge/proj
registryUrl -> https://anaconda.org/
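
The mapping above can be sketched as a small helper. This is only an illustration: `splitAnacondaUrl` and the `LookupInput` shape are hypothetical names, not part of Renovate, and the sketch assumes the page URL always has the form `https://<host>/<channel>/<package>/`.

```typescript
// Hypothetical helper (not a Renovate API): split an anaconda.org package
// page URL into the two inputs described in the datasource docs.
interface LookupInput {
  lookupName: string;
  registryUrl: string;
}

function splitAnacondaUrl(pageUrl: string): LookupInput {
  // Assumed layout: <origin>/<channel>/<package>[/]
  const match = /^(https?:\/\/[^/]+)\/([^/]+)\/([^/]+)\/?$/.exec(pageUrl);
  if (!match) {
    throw new Error(`Expected <origin>/<channel>/<package> in ${pageUrl}`);
  }
  const [, origin = '', channel = '', pkg = ''] = match;
  return {
    lookupName: `${channel}/${pkg}`,
    registryUrl: `${origin}/`,
  };
}

console.log(splitAnacondaUrl('https://anaconda.org/conda-forge/proj/'));
```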

Am I missing something?

@davidspek

I would just like to make a note that the conda-forge channel would be important to have support for (also in terms of ToS of the regular conda channel).

@rarkins
Collaborator Author

rarkins commented Jun 6, 2021

I agree that starting with the datasource first makes best sense.

Importantly, note that Renovate doesn't yet have the concept of "platform" for datasources but it looks like that might be necessary for conda packages?

@mathbunnyru

Jupyter docker images would also greatly benefit from this feature.
jupyter/docker-stacks#1153

Right now, we're updating our dependencies manually, and it would be great to get rid of this maintenance burden.

@morremeyer
Contributor

morremeyer commented Feb 16, 2022

Hey everyone, Anaconda engineer here. We've assembled a small group of engineers that is looking into adding functionality for conda over time.

Please note that none of us is working on this full time right now, but work will be done over time.

I've started implementing a datasource for conda in #14257, any help with my testing issue and feedback in general is very welcome!

@mdehollander

> Hey everyone, Anaconda engineer here. We've assembled a small group of engineers that is looking into adding functionality for conda over time.
>
> Please note that none of us is working on this full time right now, but work will be done over time.
>
> I've started implementing a datasource for conda in #14257, any help with my testing issue and feedback in general is very welcome!

Thanks for starting the work on this! I am happy to start testing, but I am not sure how, since I am new to Renovate. I gave it a try by adding a renovate.json to my repo (https://github.com/mdehollander/orochi/blob/master/renovate.json). The conda environment files are in the folder src/envs/. But I did not manage to set it up correctly, since I get this error:

"validationMessage": "Invalid configuration option: conda, The following managers configured in enabledManagers are not supported: \"conda\"",

What is the best way to test and configure this? Or am I too early 😺

@morremeyer
Contributor

morremeyer commented Feb 25, 2022

@mdehollander You're not too early, but there's no manager implemented yet, just a datasource. I'll take a first shot at a manager next Friday.

You can check the datasource documentation at https://docs.renovatebot.com/modules/datasource/#conda-datasource.

If you want to use it right now, here's an example for how to do so. In your environment.yml, add a comment that annotates the line for renovate:

name: your-project
channels:
  - defaults
dependencies:
  - pytest
  - pytest-cov
  - coverage
  # renovate datasource=conda depName=main/yapf
  - yapf==0.31.0

This annotates the yapf package for the regexManager and will query the main channel for the versions. You can then use the regexManager with the following configuration:

{
  "reviewersFromCodeOwners": true,
  "regexManagers": [
    {
      "description": "Upgrade conda dependencies",
      "fileMatch": [
        "(^|/)environment\\.yml$"
      ],
      "matchStrings": [
        "# renovate datasource=conda\\sdepName=(?<depName>.*?)\\s+- [a-z0-9]+==\"?(?<currentValue>.*)\"?"
      ],
      "datasourceTemplate": "conda"
    }
  ]
}

The job of the package manager is to do the discovery/annotation that I show above automatically so that you don’t need any configuration in the default case.
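
To see what the matchStrings pattern above captures, here is a minimal sketch that applies the same regular expression to the annotated environment.yml using plain JavaScript regex semantics. `findAnnotatedDeps` is a hypothetical helper for illustration; Renovate's own regex handling may differ in details.

```typescript
// The matchStrings pattern from the config above, after JSON unescaping.
const matchString =
  '# renovate datasource=conda\\sdepName=(?<depName>.*?)\\s+- [a-z0-9]+=="?(?<currentValue>.*)"?';

const environmentYml = [
  'name: your-project',
  'channels:',
  '  - defaults',
  'dependencies:',
  '  - pytest',
  '  # renovate datasource=conda depName=main/yapf',
  '  - yapf==0.31.0',
  '',
].join('\n');

interface AnnotatedDep {
  depName: string;
  currentValue: string;
}

// Hypothetical helper: collect every annotated dependency in a file.
function findAnnotatedDeps(content: string): AnnotatedDep[] {
  const deps: AnnotatedDep[] = [];
  const re = new RegExp(matchString, 'g');
  let m: RegExpExecArray | null;
  while ((m = re.exec(content)) !== null) {
    const depName = m.groups?.depName;
    const currentValue = m.groups?.currentValue;
    if (depName && currentValue) {
      deps.push({ depName, currentValue });
    }
  }
  return deps;
}

console.log(findAnnotatedDeps(environmentYml));
```

Only the annotated yapf line is picked up; pytest is ignored because it has no preceding comment. Note that `[a-z0-9]+` would not match a hyphenated name such as pytest-cov, and because `(?<currentValue>.*)` is greedy, a quoted version like `yapf=="0.31.0"` would be captured with its trailing quote.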

@mdehollander

Thanks for the extra information and the example config. With this I managed to get a PR triggered for an update of a conda environment.
From the logs:

DEBUG: Matched 33 file(s) for manager regex: src/envs/amos.yaml, src/envs/antismash.yaml, src/envs/bamm.yaml, src/envs/bbmap.yaml, src/envs/bedtools.yaml, src/envs/bigscape.yaml, src/envs/bigslice.yaml, src/envs/bwa.yaml, src/envs/cat.yaml, src/envs/checkm.yaml, src/envs/concoct.yaml, src/envs/coverm.yaml, src/envs/dastool.yaml, src/envs/fraggenescan.yaml, src/envs/groopm.yaml, src/envs/khmer.yaml, src/envs/kraken.yaml, src/envs/mash.yaml, src/envs/megahit.yaml, src/envs/metabat.yaml, src/envs/minimap2.yaml, src/envs/mmgenome.yaml, src/envs/mmgenome_prepare.yaml, src/envs/pigz.yaml, src/envs/prodigal.yaml, src/envs/quast.yaml, src/envs/report.yaml, src/envs/samtools.yaml, src/envs/seqtk.yaml, src/envs/spades.yaml, src/envs/tree.yaml, src/envs/vamb.yaml, src/envs/vsearch.yaml

DEBUG: 2 flattened updates found: bioconda/spades, bioconda/vsearch
DEBUG: Returning 2 branch(es)
DEBUG: Fetching changelog: https://github.com/ablab/spades (3.14.0 -> 3.15.4)

And 2 PRs for 2 packages that I enabled: https://github.com/mdehollander/orochi/pulls


To get the environment files in subfolders recognized I changed the regular expression in the config to:

      "fileMatch": [
        "^(?:src/envs/)?\\w+\\.yaml$"
      ],
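
As a quick check of the fileMatch pattern above, the following sketch tests it against a few paths. The first two paths come from the log output; the other three are made-up examples.

```typescript
// The fileMatch pattern from the snippet above, after JSON unescaping.
const fileMatch = new RegExp('^(?:src/envs/)?\\w+\\.yaml$');

const candidatePaths = [
  'src/envs/amos.yaml',
  'src/envs/mmgenome_prepare.yaml',
  'environment.yml', // made-up: different extension
  'docs/envs/tool.yaml', // made-up: different folder
  'renovate.yaml', // made-up: top-level .yaml file
];

const matchedPaths = candidatePaths.filter((path) => fileMatch.test(path));
console.log(matchedPaths);
```

Because the `src/envs/` prefix is optional, the pattern also matches any top-level `.yaml` file, which may or may not be intended.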

Looks very promising! Thanks! Looking forward to a manager :)

@mdehollander

@morremeyer I am wondering whether you managed to work on automatic discovery of conda packages via the package manager. That would make it easier to use with existing installations, because you wouldn't need to annotate the env files :)

@trim21
Contributor

trim21 commented Feb 27, 2025

> The files are often also served using zstd compression, which makes them much smaller (20 MB). There is also an accepted conda enhancement proposal called "sharded repodata" that splits the repodata into individual files per package, but at the moment it is only supported by channels on prefix.dev.
>
> rattler provides the so-called repodata gateway object, which hides all this complexity and allows one to simply query for the repodata of a specific package. The gateway then figures out the most efficient way of fetching the data, as well as caching all of it.
>
> I have been working on adding this to the JavaScript bindings of rattler with some good results, but it's not done yet.
>
> For the time being, I recommend you use the zstd compressed files.

As for the rattler_repodata_gateway crate, I think Renovate should handle the HTTP requests rather than the Rust code, so that it can use Renovate's shared HTTP cache.

@baszalmstra

I assume renovate uses the fetch api, which is what the rust code will also use.

@trim21
Contributor

trim21 commented Feb 27, 2025

> I assume renovate uses the fetch api, which is what the rust code will also use.

It has a wrapped HTTP request client that is used in all datasources:

response = await this.http.getJsonUnchecked(url);

@baszalmstra

I see, yeah, that complicates things. We can probably make this work, but it will make things more complicated. From what I understand, the reqwest crate used by rattler calls back into JavaScript to call the fetch API. I assume we can also inject another method to do this manually and allow overriding the client?

But I believe the fetch API also does caching, so as long as Renovate is not requesting the same URLs (which doesn't make a lot of sense when you use the gateway), I don't think it's that bad of a problem.

I'll not be implementing a custom fetch API in the initial version of the rattler gateway API. Would be happy to accept PRs though!

@trim21
Contributor

trim21 commented Feb 27, 2025

> The files are often also served using zstd compression, which makes them much smaller (20 MB). There is also an accepted conda enhancement proposal called "sharded repodata" that splits the repodata into individual files per package, but at the moment it is only supported by channels on prefix.dev.
>
> rattler provides the so-called repodata gateway object, which hides all this complexity and allows one to simply query for the repodata of a specific package. The gateway then figures out the most efficient way of fetching the data, as well as caching all of it.
>
> I have been working on adding this to the JavaScript bindings of rattler with some good results, but it's not done yet.
>
> For the time being, I recommend you use the zstd compressed files.

It's not only the file size but also the memory usage. For example, parsing conda-forge/linux-64/repodata.json takes around 1 GB of memory:

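import * as fs from 'node:fs';

// Read the whole index as a string, then parse it in one go.
const file = fs.readFileSync('./conda-forge/linux-64/repodata.json', 'utf8');
const obj = JSON.parse(file);

// Report memory while the parsed object is still referenced.
console.log(process.memoryUsage());

// Reference `obj` after the measurement so it stays reachable until then.
const _ = obj;

Output:

{
  rss: 1052028928,
  heapTotal: 1012838400,
  heapUsed: 983224232,
  external: 1691001,
  arrayBuffers: 10475
}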

@trim21
Contributor

trim21 commented Feb 27, 2025

Does anaconda have sharded_repodata now?

It looks like not: conda/conda-index#161

@baszalmstra

baszalmstra commented Feb 27, 2025

The memory usage is significantly reduced by using the gateway. It does not parse the entire file as JSON but only cleverly parses the parts from the repodata that it actually needs. It does however need all the bytes in memory. With sharded repodata this problem is also mitigated.

> Does anaconda have sharded_repodata now?

Unfortunately not yet.

@trim21
Contributor

trim21 commented Feb 27, 2025

> I see, yeah, that complicates things. We can probably make this work, but it will make things more complicated. From what I understand, the reqwest crate used by rattler calls back into JavaScript to call the fetch API. I assume we can also inject another method to do this manually and allow overriding the client?
>
> But I believe the fetch API also does caching, so as long as Renovate is not requesting the same URLs (which doesn't make a lot of sense when you use the gateway), I don't think it's that bad of a problem.
>
> I'll not be implementing a custom fetch API in the initial version of the rattler gateway API. Would be happy to accept PRs though!

I think Renovate also supports auth config per HTTP host, which this.http handles here.

@trim21
Contributor

trim21 commented Feb 27, 2025

I only use anaconda so I also won't implement it. 😅

@trim21
Contributor

trim21 commented Feb 27, 2025

I'd like to suggest we rename the current conda datasource to anaconda, since it only works with the anaconda repo and not with generic conda repos.

@rarkins
Collaborator Author

rarkins commented Feb 27, 2025

> I'd like to suggest we rename the current conda datasource to anaconda, since it only works with the anaconda repo and not with generic conda repos.

It depends.

Assuming that non-anaconda registries will be supported in future, would they be best added to the existing datasource, or to a separate one?

If the anaconda API is close to identical to the non-anaconda Conda APIs, then it should be the same datasource (like we do with docker datasource).

@trim21
Contributor

trim21 commented Feb 27, 2025

> Assuming that non-anaconda registries will be supported in future, would they be best added to the existing datasource, or to a separate one?

I think a separate one.

> If the anaconda API is close to identical to the non-anaconda Conda APIs, then it should be the same datasource (like we do with docker datasource).

It's not very close. A non-anaconda conda repo doesn't even have an API for a single package. The Anaconda API is also not part of the spec.

@trim21
Contributor

trim21 commented Feb 27, 2025

It's kind of like git tags / github tags.

@rarkins
Collaborator Author

rarkins commented Feb 27, 2025

The next challenge is that although we could rename datasource/conda to datasource/anaconda, and add migration code so that any existing config for conda is now massaged to anaconda, this concept would then break if we added back a conda datasource. Then we wouldn't know if user config was referring to conda (new) or the older conda/anaconda.
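
The ambiguity can be illustrated with a toy migration. This is a sketch under stated assumptions: `migrateDatasource` and `ManagerConfig` are hypothetical names and do not reflect Renovate's actual migration code.

```typescript
// Hypothetical config shape, reduced to the one field that matters here.
interface ManagerConfig {
  datasourceTemplate?: string;
}

// Hypothetical migration: rewrite the old name to the new one.
function migrateDatasource(config: ManagerConfig): ManagerConfig {
  return config.datasourceTemplate === 'conda'
    ? { ...config, datasourceTemplate: 'anaconda' }
    : config;
}

// Config written before the rename: migrating it is what we want.
const oldConfig: ManagerConfig = { datasourceTemplate: 'conda' };

// Config written after a new generic "conda" datasource exists. It looks
// identical, so the migration cannot tell the two apart and rewrites it too.
const newConfig: ManagerConfig = { datasourceTemplate: 'conda' };

console.log(migrateDatasource(oldConfig), migrateDatasource(newConfig));
```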

@trim21
Contributor

trim21 commented Feb 27, 2025

> The next challenge is that although we could rename datasource/conda to datasource/anaconda, and add migration code so that any existing config for conda is now massaged to anaconda, this concept would then break if we added back a conda datasource. Then we wouldn't know if user config was referring to conda (new) or the older conda/anaconda.

Oops, I forgot there is the regex manager, and someone is already using it.

@baszalmstra

We could call the new one simply conda-repodata or something along those lines? or conda-channel? Naming is hard..

@pavelzw

pavelzw commented Feb 27, 2025

Couldn't we call that one anaconda-api?

@trim21
Contributor

trim21 commented Feb 27, 2025

conda-channel looks good and makes more sense.

@rarkins
Collaborator Author

rarkins commented Feb 27, 2025

> It's kind of like git tags / github tags.

Funny that you mention that. I have plans for git-tags to contain the logic which identifies "oh this is a github repository - let's use the github-tags datasource instead". Otherwise you force each manager to have to implement that logic. Similar with conda - it could be a single datasource but dispatch logic separately. Docker isn't the only one like that - it's quite common for the default registry in ones like PyPI or Cargo to have specific implementations.

@trim21
Contributor

trim21 commented Feb 27, 2025

I was actually just thinking the same thing.

For packages that use multiple channels, you can't use different datasources, so a single conda datasource would be more ideal.

In this case, the manager that produces conda packages should prepare the full registry URLs (pointing to the conda repo): for conda-forge on anaconda it should be https://conda.anaconda.org/conda-forge/, for conda-forge on prefix.dev it is https://prefix.dev/conda-forge/, and for a mirror it is https://mirrors.some.org/anaconda/conda-forge/. Always add a trailing slash.

Then we parse the registry URL in the conda datasource to decide how to get package versions: for example, use api.anaconda for https://conda.anaconda.org/, the GraphQL API of prefix.dev for https://prefix.dev/, or (if someone implements it) generic conda repo logic based on repodata.json for all unknown registries.

The manager should then output packages in the following forms, all using conda versioning and the conda datasource:
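
The dispatch step could look roughly like this. It is a sketch only: `pickStrategy` and the strategy names are hypothetical, and matching by URL prefix is an assumption about how the registry URL would be inspected.

```typescript
// Hypothetical strategy names for the three cases described above.
type Strategy = 'anaconda-api' | 'prefix-graphql' | 'repodata-json';

// Registry URLs are expected to end with a slash; normalise just in case.
function ensureTrailingSlash(url: string): string {
  return url.endsWith('/') ? url : `${url}/`;
}

// Hypothetical dispatcher: choose how to look up versions for a registry.
function pickStrategy(registryUrl: string): Strategy {
  const url = ensureTrailingSlash(registryUrl);
  if (url.startsWith('https://conda.anaconda.org/')) {
    return 'anaconda-api';
  }
  if (url.startsWith('https://prefix.dev/')) {
    return 'prefix-graphql';
  }
  // Unknown registries fall back to generic repodata.json handling.
  return 'repodata-json';
}

console.log(pickStrategy('https://mirrors.some.org/anaconda/conda-forge/'));
```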

// what current conda data source support, goes to api.anaconda.org
{
  packageName: 'conda-forge/numpy',
}

// goes to https://prefix.dev/api/graphql.
{
  packageName: 'numpy',
  registryUrls: ["https://prefix.dev/conda-forge/"]
}

// for multiple channel support, goes to api.anaconda.org and will fallback to
// https://prefix.dev/api/graphql
// if it's missing from api.anaconda.org conda-forge.
{
  packageName: 'numpy',
  registryUrls: [
    "https://conda.anaconda.org/conda-forge/", 
    "https://prefix.dev/conda-forge/",
  ]
}

// for multiple channel support, goes to api.anaconda.org first,
// then use generic conda logic to get versions from
// https://conda.repo.some.org/internal/
{
  packageName: 'package-not-exists-in-conda-forge',
  registryUrls: [
    "https://conda.anaconda.org/conda-forge/",
    "https://conda.repo.some.org/internal/",
  ]
}

We also keep the current default registry URL of the conda datasource (which is api.anaconda), so current regex manager users will also be happy.

For an example environment.yml:

name: example
channels:
    - https://conda.anaconda.org/menpo
    - conda-forge
dependencies:
    - python==3.5.2
    - conda-forge::numpy
    - pip:
        - tensorflow

I would expect it to produce this:

[
  {
    packageName: 'python',
    datasource: 'conda',
    versioning: 'conda',
    currentValue: '==3.5.2',
    registryUrls: [
      "https://conda.anaconda.org/menpo/",
      "https://conda.anaconda.org/conda-forge/",
    ]
  },
  {
    packageName: 'conda-forge/numpy',
    versioning: 'conda',
    datasource: 'conda',
  },
  {
    packageName: 'tensorflow',
    versioning: 'pep440',
    datasource: 'pypi',
  }
]
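
A rough sketch of that extraction, working on an already-parsed environment.yml. All names here (`extractDeps`, `channelToRegistryUrl`, the interfaces) are hypothetical, and the sketch assumes that named channels resolve to `https://conda.anaconda.org/<channel>/`; it does not handle the defaults channel.

```typescript
type CondaDependency = string | { pip: string[] };

interface EnvironmentFile {
  channels: string[];
  dependencies: CondaDependency[];
}

interface PackageDependency {
  packageName: string;
  datasource: 'conda' | 'pypi';
  versioning: 'conda' | 'pep440';
  currentValue?: string;
  registryUrls?: string[];
}

// Assumption: a bare channel name lives on conda.anaconda.org.
function channelToRegistryUrl(channel: string): string {
  const base = /^https?:\/\//.test(channel)
    ? channel
    : `https://conda.anaconda.org/${channel}`;
  return base.endsWith('/') ? base : `${base}/`;
}

function extractDeps(env: EnvironmentFile): PackageDependency[] {
  const registryUrls = env.channels.map(channelToRegistryUrl);
  const deps: PackageDependency[] = [];
  for (const entry of env.dependencies) {
    if (typeof entry !== 'string') {
      // Nested "pip:" list: these packages come from PyPI.
      for (const name of entry.pip) {
        deps.push({ packageName: name, datasource: 'pypi', versioning: 'pep440' });
      }
      continue;
    }
    // Split "name<constraint>" at the first comparison character.
    const idx = entry.search(/[=<>!~\s]/);
    const spec = idx === -1 ? entry : entry.slice(0, idx);
    const constraint = idx === -1 ? '' : entry.slice(idx).trim();
    // "channel::name" pins one channel, so no registryUrls are attached.
    const sep = spec.indexOf('::');
    const dep: PackageDependency =
      sep === -1
        ? { packageName: spec, datasource: 'conda', versioning: 'conda', registryUrls }
        : {
            packageName: `${spec.slice(0, sep)}/${spec.slice(sep + 2)}`,
            datasource: 'conda',
            versioning: 'conda',
          };
    if (constraint) {
      dep.currentValue = constraint;
    }
    deps.push(dep);
  }
  return deps;
}

const exampleEnv: EnvironmentFile = {
  channels: ['https://conda.anaconda.org/menpo', 'conda-forge'],
  dependencies: ['python==3.5.2', 'conda-forge::numpy', { pip: ['tensorflow'] }],
};
console.log(JSON.stringify(extractDeps(exampleEnv), null, 2));
```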

There is also a defaults channel, which means main + r + msys2; this should be handled by the conda manager.

Should we allow the conda manager and the pixi manager to produce packages like this? It currently works with our conda manager, but it is not very ideal:

{
  packageName: 'numpy',
  registryUrls: [
    "https://api.anaconda.org/package/cuda/", 
    "https://api.anaconda.org/package/conda-forge/", 
  ]
}

I think it would be best if managers never used an API URL as the registry URL in the future, but currently they have to do this to support multiple channels from anaconda, and to ignore channels that are not from anaconda (for now).

@trim21
Contributor

trim21 commented Mar 5, 2025

Another problem: there is no way to know whether a version is yanked. So you will encounter this:

Renovate will try to update a package to a yanked version, leaving you with a broken manifest file; the package manager then tells you it can't find an available distribution.


@baszalmstra

If you use the prefix graphql API you can use the yankedReason to determine if a package is yanked.

If you use repodata.json, yanked entries should be under the removed key.

Does that help?
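
For the repodata.json route, a minimal sketch of how the removed key might be used. Several things here are assumptions rather than confirmed behaviour: the reduced `Repodata` shape, that `removed` lists artifact file names, that those names follow `<name>-<version>-<build>.tar.bz2` or `.conda`, and the hypothetical helpers `parseArtifactName` and `yankedVersions`.

```typescript
// Assumed, heavily reduced shape of repodata.json.
interface Repodata {
  packages: Record<string, { name: string; version: string }>;
  removed: string[];
}

// Assumed file name layout: <name>-<version>-<build>.<ext>, where version
// and build contain no hyphens (the name may).
function parseArtifactName(
  fileName: string,
): { name: string; version: string } | null {
  const match = /^(.+)-([^-]+)-([^-]+)\.(?:tar\.bz2|conda)$/.exec(fileName);
  if (!match) {
    return null;
  }
  const [, name = '', version = ''] = match;
  return { name, version };
}

// A version counts as yanked only if it appears under "removed" and no
// build of that version is still listed under "packages".
function yankedVersions(repodata: Repodata, packageName: string): string[] {
  const available = new Set<string>();
  for (const key of Object.keys(repodata.packages)) {
    const record = repodata.packages[key];
    if (record && record.name === packageName) {
      available.add(record.version);
    }
  }
  const yanked = new Set<string>();
  for (const fileName of repodata.removed) {
    const parsed = parseArtifactName(fileName);
    if (parsed && parsed.name === packageName && !available.has(parsed.version)) {
      yanked.add(parsed.version);
    }
  }
  return Array.from(yanked);
}

// Made-up data for illustration only.
const sampleRepodata: Repodata = {
  packages: { 'foo-1.0-py_0.tar.bz2': { name: 'foo', version: '1.0' } },
  removed: ['foo-1.1-py_0.tar.bz2', 'foo-1.0-py_1.tar.bz2', 'foo-bar-2.0-0.conda'],
};
console.log(yankedVersions(sampleRepodata, 'foo'));
```

A datasource could then filter these versions out of the release list before proposing updates.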

@trim21
Contributor

trim21 commented Mar 5, 2025

#34646 should work for most pixi cases; now I just need to get it merged 😄

You should be able to get lock file maintenance for pixi once Renovate deploys 39.190.0 to production.
