feature: Kubernetes kill and list commands #1998
Conversation
…menting the run state machine seems out of scope
@kubernetes.command(help="List all runs of the flow on Kubernetes.")
@click.pass_obj
def list_runs(obj):
`list` can be congruent to `batch list`
@kubernetes.command(help="Kill flow execution on Kubernetes.")
@click.argument("run-id", required=True, type=str)
@click.pass_obj
def kill(obj, run_id):
`kill` can be congruent to `batch kill`
    flow_name, run_id, user, field_selector="status.successful==0"
)

def _kill_job(job):
does this work for argo-workflows?
As-is, no, this does not work. I was leveraging the existing RunningJob for the kill logic, but from what I understood we do not create Kubernetes Jobs for runs on Argo Workflows, as the workflow DAG wraps the pods directly. As such, the lookup for executing jobs doesn't return anything to be terminated.
I would maybe leave Argo Workflows out of scope for this PR, as there is a more direct way to issue termination to a workflow on that side instead of individually killing pods.
If we want this to apply to Argo Workflows as well, I can change the lookup to search for active pods instead of jobs, and introduce the kill logic separately.
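The pod-based lookup proposed above could be sketched roughly as follows. This is a hypothetical illustration, not code from this PR: the pod dictionaries, label keys (`flow_name`, `run_id`), and function name are all assumptions standing in for whatever metadata Metaflow actually attaches.

```python
# Sketch of the proposed alternative lookup: instead of listing Kubernetes Job
# objects (which Argo Workflows does not create), select the run's pods that
# are still active. Pod shapes and label keys here are illustrative only.

ACTIVE_PHASES = {"Pending", "Running"}

def select_active_pods(pods, flow_name, run_id):
    """Return pods of the given flow/run that are still active."""
    selected = []
    for pod in pods:
        labels = pod.get("labels", {})
        if (
            labels.get("flow_name") == flow_name
            and labels.get("run_id") == run_id
            and pod.get("phase") in ACTIVE_PHASES
        ):
            selected.append(pod)
    return selected

# Emulated API response; a real lookup would come from the cluster.
pods = [
    {"name": "step-a", "phase": "Running",
     "labels": {"flow_name": "HelloFlow", "run_id": "42"}},
    {"name": "step-b", "phase": "Succeeded",
     "labels": {"flow_name": "HelloFlow", "run_id": "42"}},
    {"name": "other", "phase": "Running",
     "labels": {"flow_name": "OtherFlow", "run_id": "7"}},
]
active = select_active_pods(pods, "HelloFlow", "42")
```

Only pods still in an active phase would then be candidates for termination, which is what makes this path viable for Argo Workflows where no Job objects exist.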
closing for the time being in favour of #2023
First draft for introducing `kubernetes list-runs` and `kubernetes kill RUN_ID` commands.

List of major caveats:

label selectors

Because the Kubernetes API is limited to label selectors for filtering resources, we need to introduce a new label on flow objects that encodes the flow name for later retrieval. This makes the changes not backwards compatible.
Even with the flow hash, we still need to do some in-memory filtering of results for the `kill` command in order to select only the jobs of a specific run. Introducing the run_id as a label would introduce an unnecessarily large number of new labels.

run status
Displaying a status for a run with the `list` command, or filtering by status, would require introducing a client-side state machine that goes through all the jobs of a run in order to ascertain its current status. The main problem here is that, unlike with Argo Workflows, with client-driven Kubernetes we do not have a central status location to query on the cluster side.

closes #1631