-
Notifications
You must be signed in to change notification settings - Fork 278
delegate_hashed_bins
must hash a target path as it appears in metadata
#995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
On a related side-note:
The refactor could be done while fixing above or separately. Below is some semi-pseudo-code, that shows how I would distribute target paths in bins. Feel free to use as inspiration, enhancements/optimizations are welcome... :) def distribute_in_bins(target_paths, bin_count):
prefix_length = len(f"{bin_count - 1:x}")
prefix_count = 16 ** prefix_length
bin_size = prefix_count // bin_count
# Create targets object for each bin and assign it to a list, where the index
# equals the lower bound of the hash prefix range served by the bin divided
# by the bin size
ordered_roles = []
for idx in range(0, prefix_count, bin_size):
name = f"{idx:0{prefix_length}x}-{idx+bin_size-1:0{prefix_length}x}"
role = {"name": name, "target_paths": []}
ordered_roles.append(role)
# Use floor division to assign targets to their serving bin
for target_path in target_paths:
# hash_prefix = _get_hash(target_path)[:prefix_length]
hash_prefix = target_path # <-- only for testing purpose (FIXME)
ordered_roles[int(hash_prefix, 16) // bin_size]["target_paths"].append(target_path)
return ordered_roles
# Pass hex strings instead of paths to better visualize functionality (see FIXME above)
distribute_in_bins([f"{x:02x}" for x in range(0,256)], 32) |
Huh, I am curious how that floor division works, it looks like base-10 divided by base-16... |
Well spotted, @trishankatdatadog. It misses an |
Also, as hinted above, this snippet should not be copy-pasted as is. It will need some fleshing out to replace the existing functionality in repository_tool. I just wanted to outline the algorithm I had in mind, for distributing targets in an ordered list of bins, where the index equals the bin size divided by the lower bound of the hash prefix range served by the bin. |
Understood, @lukpueh. That bin indexing logic still seems wrong, but I'll double-check later. Understood that it's a sketch! |
@trishankatdatadog, you were right. The variables in the division were switched plus two other minor issues (see edit history of comment). For my defense I only quickly wrote this up today from some pen and paper notes I took last week. Apologies. :) You should be able to copy-paste the updated snippet into your interpreter now and it should show that it works. |
Thanks @lukpueh ! |
Add a helper function to determine the name of a bin that a hashed targetfile will be delegated to. Based sketches by Lukas Puehringer in issues theupdateframework#994 & theupdateframework#995 Signed-off-by: Joshua Lock <jlock@vmware.com>
Add a helper function to determine the name of a bin that a hashed targetfile will be delegated to. Based sketches by Lukas Puehringer in issues theupdateframework#994 & theupdateframework#995 Signed-off-by: Joshua Lock <jlock@vmware.com>
Simplify the delegate_hashed_bins function based on the semi-pseudo-code by Lukas Puehringer in theupdateframework#995. Signed-off-by: Joshua Lock <jlock@vmware.com>
Add a helper function to determine the name of a bin that a hashed targetfile will be delegated to. Based sketches by Lukas Puehringer in issues theupdateframework#994 & theupdateframework#995 Signed-off-by: Joshua Lock <jlock@vmware.com>
Simplify the delegate_hashed_bins function based on the semi-pseudo-code by Lukas Puehringer in theupdateframework#995. Signed-off-by: Joshua Lock <jlock@vmware.com>
Add a helper function to determine the name of a bin that a hashed targetfile will be delegated to. Based sketches by Lukas Puehringer in issues theupdateframework#994 & theupdateframework#995 Signed-off-by: Joshua Lock <jlock@vmware.com>
Simplify the delegate_hashed_bins function based on the semi-pseudo-code by Lukas Puehringer in theupdateframework#995. Signed-off-by: Joshua Lock <jlock@vmware.com>
Add a helper function to determine the name of a bin that a hashed targetfile will be delegated to. Based sketches by Lukas Puehringer in issues theupdateframework#994 & theupdateframework#995 Signed-off-by: Joshua Lock <jlock@vmware.com>
Simplify the delegate_hashed_bins function based on the semi-pseudo-code by Lukas Puehringer in theupdateframework#995. Signed-off-by: Joshua Lock <jlock@vmware.com>
Simplify the delegate_hashed_bins function based on the semi-pseudo-code by Lukas Puehringer in theupdateframework#995. Signed-off-by: Joshua Lock <jlock@vmware.com>
Fixed in #1007 |
Description of issue or feature request:
Fix in accordance with #957 (and #963)!!
The repository tool's
delegate_hashed_bins
creates hash bins (delegated targets roles) and assigns passed target file paths based on their hash. Later it callsdelegate
for each bin passing the relevant paths.delegate
then might left-strip the targets directory path from each targets path, changing the targets path in a way that it no longer is in the right bin.Current behavior:
delegate_hashed_bins
might hash target path based on a different path than the one that will appear in the (delegated) targets metadata.Expected behavior:
delegate_hashed_bins
must hash a target path as it appears in metadata.The text was updated successfully, but these errors were encountered: