-
Notifications
You must be signed in to change notification settings - Fork 430
pkg: allow arbitrary opam repo layouts #11616
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Opam repos can have fairly arbitrary directory layouts. The only requirement is that all package manifests are under the top-level "packages" directory, and each manifest is in a file named "opam" whose parent directory is named like "<name>.<version>". Prior to this change dune assumed a layout similar to the official opam-repository, where manifests are at "packages/<name>/<name>.<version>/opam". This generalizes dune's expectations of opam repositories such that arbitrary directory layouts are supported. Signed-off-by: Stephen Sherratt <stephen@sherra.tt>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I have noticed few typos, but I mostly have questions regarding the errors handling. My main question is shouldn't we prevent the users from using a corrupted opam repository?
> opam-version: "2.0" | ||
> EOF | ||
|
||
Make some packgaes at various depths in the directory hierarchy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make some packgaes at various depths in the directory hierarchy: | |
Make some packages at various depths in the directory hierarchy: |
(Package_version.Map.update version_to_entry version ~f:(function | ||
| None -> Some entry | ||
| Some existing_entry -> | ||
User_warning.emit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't we want to fail instead if such cases occur? It means that the repository doesn't respect the structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a warning is ok when loading but we should definitely refuse to lock a package like this. Especially since which entry is the existing_entry
and which one is the new entry depends on Fpath.traverse
which doesn't guarantee order, so you might potentially get different results depending e.g. on which FS the repo is on.
let opam_file_path_relative_to_repo_root = | ||
Path.Local.relative (Path.Local.relative Paths.packages dir) file | ||
in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be computed at line 171 to make it closer to where it is used?
let entry = | ||
{ Entry.opam_file_path_relative_to_repo_root = path | ||
; rev_store_file_if_loaded_from_git_repo = Some file | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let entry = | |
{ Entry.opam_file_path_relative_to_repo_root = path | |
; rev_store_file_if_loaded_from_git_repo = Some file | |
} | |
let entry : Entry.t = | |
{ opam_file_path_relative_to_repo_root = path | |
; rev_store_file_if_loaded_from_git_repo = Some file | |
} |
match Path.exists (Path.append_local dir opam_file_path) with | ||
| false -> None | ||
| true -> | ||
let files_dir = Opam_file_table.files_dir_path_of_package opam_file_table package in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't it call twice the opam_file_path_of_package
function? Shouldn't it be simplified to avoid it?
@@ -0,0 +1,105 @@ | |||
Test that dune can solve dependencies with opam repos using a layout other than | |||
the one used by the official opam repo. Opam allows arbitrary dependency | |||
hierarchies inside the "package" directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hierarchies inside the "package" directory. | |
hierarchies inside the "packages" directory. |
- c.3 | ||
|
||
|
||
Now test that dune can handle some degenerate cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now test that dune can handle some degenerate cases. | |
Now test that dune can handle some problematic cases. |
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the code is fine mostly and conceptually it is the right thing, as it enables us to use the Jane Street OPAM repo and any repo really, which is definitely neat.
However, I wonder if the complexity and cost in loading the whole repo into memory is worth it. Realistically people are going to use opam-repository
, our overlays whose layout we control ourselves. So maybe the Jane Street repo layout could be added when inspecting the repo (e.g. how many layers are there between package/**/<name.version>/opam
and then forcing the rest of the repo to match instead of allowing any nesting for any opam
file), thus changing very little about what we currently do, basically just some additional processing in Opam_repo.Paths
.
It is less flexible but I would argue that the added flexibility is not all that useful (OPAM suffers from this already, where it allows a lot of flexibility, sometimes to its detriment) and the price might be quite high.
I don't have a strong opinion on this however, so if people think that the added complexity, processing time and memory usage is worth it, I am okay with that.
module Opam_file_table = struct | ||
module Entry = struct | ||
type t = | ||
{ opam_file_path_relative_to_repo_root : Path.Local.t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels overly very verbose, maybe worth shortening it a little?
;; | ||
end | ||
|
||
module Version_to_entry = struct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Version_map
?
|
||
let files_dir_path_of_package (t : t) package = | ||
Option.map (opam_file_path_of_package t package) ~f:(fun opam_file_path -> | ||
Path.Local.relative (Path.Local.parent_exn opam_file_path) "files") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So files
is always relative to the opam
file? If so, maybe worth writing a short comment to explain the layout.
~f:(fun (entry : Entry.t) -> entry.opam_file_path_relative_to_repo_root)) | ||
;; | ||
|
||
let files_dir_path_of_package (t : t) package = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let files_dir_path_of_package (t : t) package = | |
let files_dir_of_package (t : t) package = |
Path.Local.relative (Path.Local.parent_exn opam_file_path) "files") | ||
;; | ||
|
||
let all_package_versions_of_name (t : t) package_name = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let all_package_versions_of_name (t : t) package_name = | |
let all_package_versions (t : t) package_name = |
of_name
is kind of implied by the type of the argument, Package_name.t
.
(Package_version.Map.update version_to_entry version ~f:(function | ||
| None -> Some entry | ||
| Some existing_entry -> | ||
User_warning.emit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a warning is ok when loading but we should definitely refuse to lock a package like this. Especially since which entry is the existing_entry
and which one is the new entry depends on Fpath.traverse
which doesn't guarantee order, so you might potentially get different results depending e.g. on which FS the repo is on.
let of_repo_dir_path opam_repo_dir_path ~loc : t = | ||
let repo_name_for_messages = Path.to_absolute_filename opam_repo_dir_path in | ||
try | ||
Fpath.traverse_files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd really want a Fiber
version of this, as for large repos or repos on slow shares this will block Dune for a while. On my machine with an SSD just find -name opam
in OPAM repository takes half a second.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fiber or not, it doesn't seem like a good trade-off to introduce such an expensive step that would benefit so few users. Can you either make this optional or introduce a lazy loading scheme for the JST repo?
files named "opam". *) | ||
let of_rev_store_at_rev at_rev ~repo_name_for_messages ~loc : t = | ||
Rev_store.At_rev.directory_entries ~recursive:true at_rev Paths.packages | ||
|> Rev_store.File.Set.fold ~init:empty ~f:(fun file acc -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this materialize every opam
file from a Git repo into memory? In the case of OPAM-Repository that's 30MB of data, most of which will never be looked at.
- b.2 | ||
- c.3 | ||
|
||
$ rm custom-opam-repo/packages/opam |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this action for? It seems like there should be some kind of test but isn't.
|
||
Create an opam file in a directory that doesn't have a version number. There's | ||
no way to refer to this package in dune-project but dune should still be able | ||
to parse the repo with crashing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to parse the repo with crashing. | |
to parse the repo without crashing. |
Maybe?
let of_repo_dir_path opam_repo_dir_path ~loc : t = | ||
let repo_name_for_messages = Path.to_absolute_filename opam_repo_dir_path in | ||
try | ||
Fpath.traverse_files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fiber or not, it doesn't seem like a good trade-off to introduce such an expensive step that would benefit so few users. Can you either make this optional or introduce a lazy loading scheme for the JST repo?
If a repo does not follow the convention set by opam-repository, that doesn't mean it is corrupt. The documentation on repo layout in the opam manual specifies that any layout where opam files are contained in directories named after the package is acceptable. The document also suggests a potential future change to the layout of opam-repository where another level of hierarchy is introduced to group packages by common prefixes, e.g. I agree with the sentiment that in practice we're probably only ever going to see repos formatted like opam-repository or JST, and the complexity/runtime cost of fully general opam repos is too high to be worth it. I briefly thought about introducing a fast-path where if a repo is formatted like opam-repository then we can make some optimizing assumptions, but the only way to be sure that a repo is formatted like opam-repository is to do a full scan of all its opam files which is just as expensive as not having a fast-path to begin with. So I'm thinking I'll defer solving #11615 until I have more time to work on it, and eventually we can use a solution more specialized to the layouts we see in opam-repository and JST. Alternatively, @rgrinberg how feasible would it be to ask Jane Street to change the layout of their repos to match opam-repository? |
It is a misunderstanding: I was referring to repositories with twice the same file or an opam file at the root. |
I'll ask, but I don't really see an issue with implementing an alternative lazy loading scheme. What we're doing is implementing the following function for a single repository:
So you could implement this function by:
With this scheme, the common users always get the optimized experience, while we still support this alternative layout. |
What do you mean by "multiple repositories" here? |
I just noticed that Jane Street's compiler extensions branch uses a mix of different layouts within its packages directory. Most packages are directly under packages but a few (e.g. |
Multiple repositories can be defined in dune's workspace file in order. Refer to tests like Mixing the two layouts defeats the ability to implement any lazy loading scheme. I'm going to ask about sticking to one scheme. For now, I think you could just make eager loading "opt-in" for some repos. |
Is lazy-loading the opam repo something we're planning to implement in the future? Unless I'm mistaken it looks like we currently enumerate all the paths in the opam repo when creating a |
Opam repos can have fairly arbitrary directory layouts. The only requirement is that all package manifests are under the top-level "packages" directory, and each manifest is in a file named "opam" whose parent directory is named like ".".
Prior to this change dune assumed a layout similar to the official opam-repository, where manifests are at
"packages//./opam".
This generalizes dune's expectations of opam repositories such that arbitrary directory layouts are supported.
Fixes #11615