-
Notifications
You must be signed in to change notification settings - Fork 820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Deterministic foreach task id's for Argo Workflows #1704
fix: Deterministic foreach task id's for Argo Workflows #1704
Conversation
Testing[608] @ d7a320b |
Testing[608] @ d7a320b had 1 FAILURE. |
I'll check but failure is probably that card test that I need to increase timeout for. There is no change to common code. |
Testing[608] @ bcc2026 |
@@ -883,6 +885,34 @@ def _visit(node, exit_node=None, templates=None, dag_tasks=None): | |||
) | |||
) | |||
] | |||
# We need to add the split-index and root-input-path for the last step in any foreach to be able to generate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add more comments in the code about the need for root-input-path
. Given the complexity of this work, it will be non-trivial to ship further changes without full context on this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still reviewing
from hashlib import md5 | ||
|
||
|
||
def generate_input_paths(step_name, timestamp, input_paths, max_splits): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input_paths
have been flow/run/step/task
previously but it seems that now input_paths
are run/step/task
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what the join step used to receive as input-paths was the following
"argo-{{workflow.name}}/%s/{{tasks.%s.outputs.parameters}}"
It seems to be a case of mislabeled variables in the original script?
# For foreach splits generate the expected input-paths based on max_split and base_id via. | ||
# newline required at the end due to 'echo' appending one in the shell side task_id creation. | ||
task_str = "%s-%s\n" % (base, idx) | ||
hash = md5(task_str.encode("utf-8")).hexdigest()[-8:] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be desirable to get task ids of the nature - t-<hash>-<idx>
- in that scenario - we can probably get a much tighter compression of input_paths (assuming we change the input_paths compression/decompression logic) in addition to the UI looking a lot more nicer.
changes the Task ID generation for tasks in foreach branches to be deterministic. This is so that the following join steps do not require an enormous list of ID's to be passed in through the
input-paths
input Parameter, which has limited the number of foreach tasks we can run on Argo Workflows until now.Note: Still requires rigorous testing for different split cases to confirm this is ready for prime time.
Closes #1538 properly. Supersedes #1655