Skip to content

Repository module and example #2193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Dec 19, 2022
Merged

Conversation

jku
Copy link
Member

@jku jku commented Nov 25, 2022

Calling this a repository module sounds a little too grand (it's literally an abstract class that does almost nothing)... but I think it works so maybe we start with this.

I'm adding some reviewers for visibility. If someone wants to see what it does:

Note that the Repository class is just a "ergonomic metadata editing framework" -- it makes producing valid and logical next versions of TUF metadata within a repository easier. So it's not really a "repository" (module name makes sense as this is for repository side tooling but the class name could still be debated)

examples/repository should also be extended (or other examples added):

  • it should have a tiny developer API and there should be a developer tool (built using the same building blocks) that can sign and upload things to the example repository app
  • some of the more complex repository details demonstrated in the manual examples should be implemented here as well (and eventually manual repo examples deleted)

Fixes #1136 maybe?

  • The code follows the Code Style Guidelines
  • Tests have been added for the bug fix or new feature
  • Docs have been added for the bug fix or new feature

@coveralls
Copy link

coveralls commented Nov 25, 2022

Pull Request Test Coverage Report for Build 3697521735

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 3 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.001%) to 98.19%

Files with Coverage Reduction New Missed Lines %
tuf/ngclient/updater.py 3 97.97%
Totals Coverage Status
Change from base Build 3609587369: 0.001%
Covered Lines: 1347
Relevant Lines: 1363

💛 - Coveralls

@jku
Copy link
Member Author

jku commented Nov 26, 2022

for demo, please try https://github.com/jku/python-tuf/blob/repository-lib/examples/client_example/README.md

Forgot to mention: you may be tempted to run the server and use a browser to see the metadata being served. This sort of works but ultimately breaks things as at least Chrome does things that are incompatible with single thread servers like http.server.HTTPServer: it opens connections (just in case) and then does nothing.

@jku
Copy link
Member Author

jku commented Nov 28, 2022

Having played with this for about a week now, I think there is one "Repository implementation" we may want to include in the library itself. While the repositories themselves may be wildly different, the "developer tools" or "signing tools" are probably very similar:

  • they have a root of trust like a downloader client
  • they have a cache for local, valid metadata like a downloader client
  • they may update to current metadata by downloading from the remote repository like a downloader client would (but may also have other means like git pull)
  • they do want to only operate on top of a valid metadata cache... but the definition of valid is not the same as in a downloader client: as an example they do want to load expired metadata so that they can make a new version of it...
  • they likely never want to modify the metadata in cache: they'll want to modify the metadata to send a new version to the remote repository. The method of "sending" will vary (http post, git commit & push) but the idea is the same: the metadata cache should represent the remote repository state, and the modified metadata gets sent to that repository to integrate

So the result should be a mix of ngclient.Updater and the SimpleRepository from this PR. This is clearly future work and will not happen in in this PR, I'm just writing the idea out there.

jku added 4 commits November 28, 2022 12:13
I plan to add another repository example as well.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Plan for tuf.repository is:
* provides useful functionality for TUF repository-side implementations
  (repository applications, developer tools, etc)
* is minimalistic: only features that most implementations will use
  should be icluded
* Only example implementations will be provided in python-tuf
* As more repository implementations are built using tuf.repository
  we can evaluate what extended functionality is useful

In this PR, a single abstract class is added that provides a framework
for building repository-modifying tools. In subsequent commits
some examples will be added that demonstrate how to use the class.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
This uses the repository module to create an app that
* generates everything from scratch
* serves metadata and targets from memory
* simulates a live repository by adding new targets every few seconds

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
* Support any repository (that serves /targets/ and /metadata/)
  with --url
* Support multiple repositories by aking the local cache
  repository-specific
* Add "tofu" command to initialize with Trust-On-First-Use
* Update README so it uses the new repository application example

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
@jku
Copy link
Member Author

jku commented Nov 28, 2022

plain rebase, no changes

@jku
Copy link
Member Author

jku commented Nov 30, 2022

A potential change to snapshot/timestamp:

  • Current implementation requires "meta" as argument to both methods
  • alternative is to require Repository to provide snapshot meta and timestamp meta as properties, this way the methods wouldn't need arguments.

I think this would be as flexible but would make calling snapshot()/timestamp() easier

EDIT: I've implemented this in later commit

jku added 5 commits November 30, 2022 18:44
This does not make the examples simpler now, but it will when
there are multiple locations where snapshot/timestamp are called.

* This way the snapshot/timestamp input material is an internal detail
  of Repository and the call sites will be simpler.
* Both methods now have a "force" argument that can be used to create a
  new version regardless of meta info changes
* but implementations are now required to implement snapshot_info
  and targets_infos properties that represent the current snapshot and
  targets versions in the repository

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Otherwise the metafile cache and the metadata object end up
pointing to same instances which starts breaking later.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
This is not required for the demo but is more realistic: we keep
a cache of targets versions so that we can produce a new snapshot
whenever one is needed, without accessing all of the targets metadata
to do so.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Tweak comments as well

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive work! I've given this a very close look, especially tuf/repository and examples/repository/_simplerepo.py, and only found some minor nits. I think we can get this merged and even released soon, but maybe note somewhere that this is more experimental than ngclient and metadata api.


Attributes:
role_cache: Contains every historical metadata version of every role in
this repositorys. Keys are rolenames and values are lists of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
this repositorys. Keys are rolenames and values are lists of
this repository. Keys are role names and values are lists of

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping s/repositorys/repository

This is a collection of comment, documentation and logging fixes.

The noteworthy part is making it clear that repository is not stable
API yet: I think this is a good idea.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
@jku
Copy link
Member Author

jku commented Dec 3, 2022

(I can't believe I'm asking for bikeshedding but here goes...)

One thing that I would like bring up for discussion is naming:

  • open()/close() -- on one hand these are descriptive but it's not perfect: as an example, not closing a file descriptor is a bug but not closing a role in this api is fine: you just decided not to write a new version after all. I have also considered open()/commit() but so far the symmetry of the current naming has won. Maybe this is not even an important detail as most users should be using edit() which handles open/close.
  • snapshot_info and targets_infos: these caches are really hard to name without making a mess in the code: A timestamp role contains snapshot meta and snapshot role contains targets meta, and these two should sort of map to those metas. snapshot_info and targets_infos represent the current state of stored metadata while the metas are a view of the published repository at a specific point in time -- including already removed roles which these caches do not contain.

Another thing is the idea of the edit() contextmanager: does that make sense to others? Is the use of AbortEdit still understandable -- I imagine it won't be needed often but snapshot() does use it so it should be understandable...

@jku jku marked this pull request as ready for review December 3, 2022 09:50
Comment on lines 74 to 76
def edit(
self, role: str, init: bool = False
) -> Generator[Signed, None, None]:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the core usability complaint I have with edit() is that it doesn't handle better static typing:

with edit("targets") as targets:
    # mypy sees targets as type Signed, not Targets

Most issues would just be solved by adding a typed helper:

    def edit_targets(
        self, role: str, init: bool = False
    ) -> Generator[Targets, None, None]:

(not sure if the other types even need a helper: timestamp and snapshot at least don't need so much editing: Repository handles the usual cases already)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not very exited about adding a helper just for typing. What's the implication if mypy doesn't know the subtype?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implication is that more complex code in the with-block (like this attempt at easy target removal) won't have type suggestions in your editor and won't be type checked by mypy unless you do some extra work.

The workaround I use in that example makes VSCode suggestions work but doesn't make mypy happy... I think the least disruptive workaround is probably a assert(isinstance(Targets, targets)) as the first line inside the with-block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but just to be clear: I'm not suggesting to add a helper in this PR -- just bringing up potential areas of improvement

@lukpueh
Copy link
Member

lukpueh commented Dec 5, 2022

(I can't believe I'm asking for bikeshedding but here goes...)

One thing that I would like bring up for discussion is naming:

Fair. Naming is important, and hard.

  • open()/close() -- on one hand these are descriptive but it's not perfect: as an example, not closing a file descriptor is a bug but not closing a role in this api is fine: you just decided not to write a new version after all. I have also considered open()/commit() but so far the symmetry of the current naming has won. Maybe this is not even an important detail as most users should be using edit() which handles open/close.

I had similar thoughts. These names do have other connotations. What about more abstract terms, like start/end, enter/exit, pre_edit/post_edit? ... otoh, those make less sense, when used outside of edit. Maybe open/close are just fine.

  • snapshot_info and targets_infos: these caches are really hard to name without making a mess in the code: A timestamp role contains snapshot meta and snapshot role contains targets meta, and these two should sort of map to those metas. snapshot_info and targets_infos represent the current state of stored metadata while the metas are a view of the published repository at a specific point in time -- including already removed roles which these caches do not contain.

Can't think of better names. Let's solve this with documentation. Maybe add a note to look at timestamp and snapshot methods, and e.g. SimpleRepository, when in doubt about how these are maintained.

For that it might be helpful to rename the second return value in snapshot (e.g. to
replaced/updated/old/previous), to make it clear that this does not contradict with "the dictionary published by snapshot() will also include metadata that no longer exists in the repository".

Btw., it also took me a bit to understand the magical defaultdict+lambda -powered assignment in SimpleRepository, when I tried to understand targets_infos. But maybe that's just me. And it's definitely a cool too. Maybe a comment would help?

Another thing is the idea of the edit() contextmanager: does that make sense to others?

I think it does. One thing I was wondering was whether open/close arguments should also be left unspecified given that open and close can be anything the implementer wants them to be. E.g. they could both accept **kwargs passed on like so:

def edit(role: str, open_kwargs: Dict[str, Any] = None, close_kwargs: Dict[str, Any] = None)

Is the use of AbortEdit still understandable -- I imagine it won't be needed often but snapshot() does use it so it should be understandable...

I think it's fine.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
@jku
Copy link
Member Author

jku commented Dec 5, 2022

Btw., it also took me a bit to understand the magical defaultdict+lambda -powered assignment in SimpleRepository, when I tried to understand targets_infos. But maybe that's just me. And it's definitely a cool too. Maybe a comment would help?

yeah, that defaultdict is likely only useful once in in close() ... so the case of missing key could be handled manually there. But on the other hand role_cache and signer_cache use the same mechanism (it would even look the same if MetaFile() had a default version value).

I'll just add a comment

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
@jku
Copy link
Member Author

jku commented Dec 5, 2022

For that it might be helpful to rename the second return value in snapshot (e.g. to
replaced/updated/old/previous), to make it clear that this does not contradict with "the dictionary published by snapshot() will also include metadata that no longer exists in the repository".

I've kept the names but I tweaked the docstring of the methods to make it clear that the returned values are metadata versions that are no longer part of snapshot/timestamp meta.

I think I've now addressed everything so far.

@jku
Copy link
Member Author

jku commented Dec 14, 2022

From chat with lukas:

open(self, role: str, init: bool = False)

this does not really need the init argument...

could also move sign() out of the ABC -- then close also does not need the sign_only argument.

jku added 2 commits December 14, 2022 19:53
This no longer seems needed: if the metadata store does not contain
a single version of role, then open() can assume it is initializing.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
This is only needed for threshold signing and not even used in the
example: leave it to the implementations to handle for now.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
@jku jku requested a review from lukpueh December 14, 2022 18:08
@jku
Copy link
Member Author

jku commented Dec 14, 2022

Two changes:

  • init argument was removed from open: it's just not necessary
  • sign() (meaning signing an already existing version) was removed from the ABC: this should be done by implementations that really need this. It's not clear if the open/close concept works in that case at all

Copy link
Member

@lukpueh lukpueh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix two typos and one outdated docstring and this is good to go.

Signed-off-by: Jussi Kukkonen <jkukkonen@google.com>
@lukpueh lukpueh merged commit 216ae64 into theupdateframework:develop Dec 19, 2022
@jku jku deleted the repository-lib branch December 30, 2024 09:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Design repository library
3 participants