A command line utility for managing and running jobs in complex Python environments.
Support tested for:
Most ongoing research in the ML4GW organization leverages Poetry for managing Python virtual environments in the context of a Python monorepo. In particular, Poetry makes managing a shared set of libraries between jobs within a project simple and straightforward.
However, several tools in the Python gravitational wave analysis ecosystem cannot be installed via Pip (in particular the library GWpy uses to read and write .gwf files and the library it uses for reading archival data from the NDS2 server). This complicates the environment management picture: some projects use Poetry to install local libraries and their own code into Conda virtual environments, while others don't require Conda at all and can install everything they need into Poetry virtual environments.
Pinto attempts to simplify this picture by installing a single tool in the base Conda environment which can dynamically detect whether a project requires Conda, create the appropriate virtual environment, and install all necessary libraries into it.
pinto -p /path/to/my/project build
It can then be used to run jobs inside of that virtual environment.
pinto -p /path/to/my/project run my-command --arg1
If you're currently in the project's directory, you can drop the -p/--project flag altogether for any pinto command, e.g.
pinto build
pinto run my-command --arg1
To leverage Pinto in a project, all you need is the pyproject.toml file required by Poetry which specifies your project's dependencies. If just this file is present, pinto will treat your project as a "vanilla" Poetry project and manage all of its dependencies inside a Poetry virtual environment.
Indicating to Pinto that your project requires Conda is as simple as including a poetry.toml file in your project directory with the lines
[virtualenvs]
create = false
Alternatively, from your project directory you can run
poetry config virtualenvs.create false --local
When building your project, pinto will first look for an entry that looks like
[tool.pinto]
base_env = "/path/to/environment.yaml"
in your project's pyproject.toml. If this entry doesn't exist, pinto will look for a file called either environment.yaml or environment.yml, starting in your project's directory and then ascending your directory tree to the root, using the first file it finds. This way, you can keep a base environment.yaml in the root of a monorepo on top of which all your projects build, while leaving projects the option of overriding this base environment with their own environment.yaml.
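The fallback search described above might look roughly like the following. This is a sketch under stated assumptions rather than pinto's own code: find_base_env is a hypothetical name, the order in which the two file names are checked within a directory is a guess, and the [tool.pinto] base_env override is not handled here.

```python
from pathlib import Path
from typing import Optional

# Candidate file names; the order between the two is an assumption
ENV_FILE_NAMES = ("environment.yaml", "environment.yml")


def find_base_env(project_dir: Path) -> Optional[Path]:
    """Hypothetical helper: nearest environment file at or above project_dir."""
    start = Path(project_dir).resolve()

    # Check the project directory first, then each parent up to the root
    for directory in (start, *start.parents):
        for name in ENV_FILE_NAMES:
            candidate = directory / name
            if candidate.is_file():
                return candidate

    # Nothing found anywhere between the project and the filesystem root
    return None
```

Because the project directory is checked before any of its parents, a project-level environment.yaml takes precedence over one at the monorepo root, which matches the override behavior described above.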
In fact, if the name listed in the environment.yaml discovered by pinto ends with -base, pinto will automatically name your project's virtual environment <prefix>-<project-name>, where <prefix> is the part of that name before -base. For example, if the name of your project (as given in the pyproject.toml) is nn-trainer, and the environment.yaml at the root of your monorepo looks like
name: myproject-base
dependencies:
- ...
then pinto will name your project's virtual environment myproject-nn-trainer.
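As a quick illustration of this naming rule, here is a small sketch. The function name project_env_name is hypothetical, and returning None when the base name lacks the -base suffix is a placeholder, since this document does not say what pinto does in that case.

```python
from typing import Optional

BASE_SUFFIX = "-base"


def project_env_name(base_env_name: str, project_name: str) -> Optional[str]:
    """Hypothetical helper: derive <prefix>-<project-name> from a *-base env name."""
    if not base_env_name.endswith(BASE_SUFFIX):
        # Behavior for other names is not specified here; None is a placeholder
        return None

    # Strip the trailing "-base" to recover the prefix
    prefix = base_env_name[: -len(BASE_SUFFIX)]
    return f"{prefix}-{project_name}"
```

With the example above, project_env_name("myproject-base", "nn-trainer") gives "myproject-nn-trainer".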
To see more examples of project structures, consult the examples folder.
For any non-containerized installation methods, please consult the support matrix at the top of this document to see which versions of Anaconda and Poetry are supported by pinto.
The simplest way to get started with pinto is to use the container published by this repository, which is made available through GitHub's container registry. You can pull it by running
docker pull ghcr.io/ml4gw/pinto:main
See this document for information about how to authenticate to the GitHub container registry.
Pinto can only be installed on top of Anaconda, so make sure you have a local install available to work with (instructions found here). I particularly recommend using Miniconda for a bare install, since most of your work will be in virtual environments anyway.
NOTE: pinto is currently only compatible with 4.x Conda versions! To find the appropriate Miniconda installer, please look at the installer archives.
Your options are then to either install pinto in the base conda environment (recommended), or in a virtual environment. If you choose the latter route, the conda environments managed by pinto will be kept in a subdirectory of pinto's environment.
First install poetry via pip
(base) ~$ python -m pip install "poetry>1.2.0,<1.3.0"
then install pinto via pip
(base) ~$ python -m pip install git+https://github.com/ML4GW/pinto@main
If you don't want to install pinto into your base conda environment, you can install it by creating an environment file like the one found here, and creating a virtual environment like:
(base) ~$ conda env create -f environment.yaml
You can then activate your pinto environment and execute commands inside of it
(base) ~$ conda activate pinto
(pinto) ~$ pinto --version
Whether you installed pinto in your base environment or in a virtual environment, we recommend configuring Poetry's default virtual environment path so that it installs environments to the same location as conda. With the desired environment activated, run
(base OR pinto) ~$ poetry config virtualenvs.path $CONDA_PREFIX/envs
To develop pinto, clone the repo locally
(base) ~$ git clone https://github.com/ML4GW/pinto.git
Then complete either installation method above, but with the local library installed editably. For base installs:
(base) ~$ python -m pip install -e ./pinto[dev]
For virtual environment installs, edit the environment.yaml so that the pinto install line is replaced with
- -e .[dev]
Then run
(base) ~$ cd pinto
(base) ~$ conda env create -f environment.yaml