Define JSON Schema for construct.yml #943

Merged
merged 62 commits into from
Mar 26, 2025

Conversation

jaimergp
Contributor

@jaimergp jaimergp commented Feb 28, 2025

Description

This PR converts our venerable construct.py "schema" to a proper JSON Schema. The Schema is built with Pydantic at "commit time" as we do with docs regeneration. At runtime, only the lightweight jsonschema package is used for validation.

With this change we have:

  • IDE autocompletion (examples have been edited to point to the local copy; it should just work if you open them in VS Code). I'll also submit it to schemastore to associate construct.yml with the schema in the repo
  • Accurate type validation with detailed error reports
  • Self-documenting schema

However, we have lost the ability to report type information in the docs (the autogenerated annotations are too wordy and noisy). I'm not too worried about this because the information was sometimes inaccurate anyway. Instead, users are encouraged to use the schema directly, either in their IDE or by exploring it interactively in apps like https://json-schema.app/view/%23?url=https%3A%2F%2Fraw.githubusercontent.com%2Fjaimergp%2Fconstructor%2Frefs%2Fheads%2Fpydantic-schema%2Fconstructor%2Fdata%2Fconstruct.schema.json.

Checklist - did you ...

  • Add a file to the news directory (using the template) for the next release's release notes?
  • Add / update necessary tests?
  • Add / update outdated documentation?

Additional context:

@bollwyvl mentioned this in a conda-forge Zulip thread and I couldn't resist attempting a crude conversion. The first commit already gives you inline descriptions and some type validation.

Plenty of work ahead for the very diverse types and actual default values, plus integration with constructor itself.

For fun and laughs, this is the crude script that I used to convert our venerable KEYS "schema" to a Pydantic model:

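from textwrap import indent

from constructor.construct import KEYS

# Emit the model header; the loop below appends one field per KEYS entry.
print("""
from pydantic import BaseModel, ConfigDict

class ConstructorModel(BaseModel):
    model_config: ConfigDict = ConfigDict(
        extra='forbid',
        use_attribute_docstrings=True,
    )
"""
)

# Each KEYS entry is unpacked as (key, required, type, description).
for key, required, type_, desc in KEYS:
    # Use the object's own __name__ if it has one, else the name of its class.
    type_name = getattr(type_, "__name__", type(type_).__name__)
    # Required fields get `...`; optional ones default to an empty instance.
    print(f"    {key}: {type_name}", "= ..." if required else f"= {type_name}()")
    print('    """')
    print(indent(desc.strip(), "    "))
    print('    """')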

@conda-bot conda-bot added the cla-signed label Feb 28, 2025
@jaimergp jaimergp changed the title Crude schema migration to Pydantic model and JSON Schema (WIP) Define JSON Schema for `construct.yml Mar 5, 2025
@jaimergp jaimergp changed the title Define JSON Schema for `construct.yml Define JSON Schema for construct.yml Mar 5, 2025
@jaimergp jaimergp marked this pull request as ready for review March 6, 2025 09:07
@jaimergp jaimergp requested a review from a team as a code owner March 6, 2025 09:07
@jaimergp
Contributor Author

If it's not too much manual work, it would be awesome to alphabetize the schema keywords.

I don't think alphabetic order would be good. Some values should be grouped together. That said, that's not currently enforced. I couldn't find any tools to do this automatically, so I'd rather not introduce a requirement we can't enforce easily in the future. For now I'll leave it as is.

@jaimergp
Contributor Author

Thank you @marcoesters for the thorough review! I think we are getting closer. There are some open questions:

  • Do you like the schema for build_outputs? It's not very elegant but properly encodes our previous logic, I think. It will have to be kept in sync with build_outputs.py though.
  • Alphabetical sort of all attributes. I only sorted the JSON output, not the Pydantic module itself. See thread. Applied some of Nicholas' suggestions.
  • Where to deploy the schema. Right now it can be accessed via GitHub raw view, but I think @bollwyvl mentioned at some point that that URL is a little flaky? We can defer that decision to a different PR after accumulating feedback for a bit.

@marcoesters
Contributor

* Do you like the [schema for `build_outputs`](https://github.com/conda/constructor/pull/943#discussion_r2006447594)? It's not very elegant but properly encodes our previous logic, I think. It will have to be kept in sync with `build_outputs.py` though.

This doesn't render very well with the viewer in this PR. It just becomes Array<anyOf[string, object, object, object, object, object]>. So, the old way is better for users.

* Alphabetical sort of all attributes. I only sorted the JSON output, not the Pydantic module itself. [See thread](https://github.com/conda/constructor/pull/943#discussion_r2007626258). Applied some of Nicholas' suggestions.

I think that's fine. I personally find it easier to find things when the keywords are alphabetized since we don't have any groupings by installer type.

* Where to deploy the schema. Right now it can be accessed via GitHub raw view, but I think @bollwyvl mentioned at some point that that URL is a little flaky? We can defer that decision to a different PR after accumulating feedback for a bit.

Where is the menuinst schema published?

@jaimergp
Contributor Author

This doesn't render very well with the viewer in this PR. It just becomes Array<anyOf[string, object, object, object, object, object]>. So, the old way is better for users.

Found a way to render this better.

@jaimergp
Contributor Author

Where is the menuinst schema published?

Nowhere besides the repo for now. I need to finish that part where we publish it to schemas.conda.org.

@bollwyvl

On the current pydantic source: the "triple quote after the thing" is a less widespread convention, so you might consider some whitespace to break it up.

Some values should be grouped together. That said, that's not currently enforced. I couldn't find any tools to do this automatically

For primarily machine-readable files, it doesn't matter as long as it's consistent over time, especially for new things, again to maintain diffability. If a change is small, it should generate a small diff.
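
As a rough illustration of the diffability point, assuming nothing about how the PR actually writes its file: serializing with sorted keys and a fixed indent makes the output independent of the order in which fields were declared, so a small change produces a small diff. The `dump_schema` helper and the sample dictionaries are made up for this sketch.

```python
import json


def dump_schema(schema: dict) -> str:
    """Serialize with sorted keys and fixed indentation so the output is stable."""
    return json.dumps(schema, indent=2, sort_keys=True) + "\n"


# Two dictionaries with the same content but different insertion order.
schema_a = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "channels": {"type": "array"}},
}
schema_b = {
    "properties": {"channels": {"type": "array"}, "name": {"type": "string"}},
    "type": "object",
}

# Both serialize to the same text, so reordering fields yields no diff.
same_output = dump_schema(schema_a) == dump_schema(schema_b)
```

Note that `sort_keys=True` also sorts nested objects such as `properties`, so this approach alphabetizes the exported JSON without touching the order in the Python source.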

Over on pixi's contraption, instead of trying to do anything with the pydantic export API, it works with the json.JSONEncoder, and as it targets TOML has to deal with pydantic's inability to create proper optional fields instead of accepting null everywhere. I've intended to pull some of that out as a standalone package, but still haven't gotten around to it... mostly due to issues like the ones discussed here.

To group things together, there are some choices:

  • use a top-level anyOf to structurally group things together in definitions
    • GroupAOptions | GroupBOptions
  • put something human-readable in the description that can be parsed out by both humans and machines
    • "description": "Yadda Yadda. _Group: Yadda_"
  • add some stuff to the annotation (incompatible with docstrings-as-description)
    • json_schema_extra={"$comment": "group: yadda"}
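
To make the second option a bit more concrete, here is a small sketch of how a trailing `_Group: ..._` marker in a description could be parsed back out by a tool. The marker format follows the example in the list above; the `group_properties` helper, the regular expression, and the sample property names are assumptions for illustration only.

```python
import re
from collections import defaultdict

# Matches a trailing marker such as "_Group: Metadata_" at the end of a description.
GROUP_MARKER = re.compile(r"_Group:\s*(?P<group>[^_]+)_\s*$")


def group_properties(schema: dict) -> dict[str, list[str]]:
    """Bucket property names by the group marker found in their description."""
    groups = defaultdict(list)
    for key, spec in schema.get("properties", {}).items():
        match = GROUP_MARKER.search(spec.get("description", ""))
        group = match.group("group").strip() if match else "ungrouped"
        groups[group].append(key)
    return dict(groups)


sample_schema = {
    "properties": {
        "name": {"description": "Installer name. _Group: Metadata_"},
        "version": {"description": "Installer version. _Group: Metadata_"},
        "channels": {"description": "Channels to use."},
    }
}
grouped = group_properties(sample_schema)
```

The marker stays readable for humans browsing the schema, while keeping the docstrings-as-description approach intact.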

Contributor

@marcoesters marcoesters left a comment


Thank you! If the loss of information in the docs becomes an issue, we can always revisit regenerating the docs from the schema.

@jaimergp jaimergp merged commit 8b4f49d into conda:main Mar 26, 2025
18 checks passed
Labels
cla-signed [bot] added once the contributor has signed the CLA
Projects
Status: 🏁 Done

4 participants