feature: Kubernetes kill and list commands #1998

Closed
wants to merge 8 commits into from

Conversation

Collaborator

@saikonen saikonen commented Aug 28, 2024

First draft introducing the kubernetes list-runs and kubernetes kill RUN_ID commands.

Major caveats:

label selectors
The Kubernetes API can filter the resources it returns only by label selectors (plus a small set of field selectors), so we need to introduce a new label on flow objects that encodes the flow name for later retrieval. This makes the change backwards incompatible, since objects created without the label cannot be found by the new commands.

Even with the flow hash label, the kill command still needs some in-memory filtering of the results in order to select only the jobs of a specific run. Adding the run_id as a label as well would introduce an unnecessarily large number of new labels.
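
The two-stage lookup described above can be sketched roughly as follows. This is a minimal illustration rather than the code in this PR: the label key, the annotation key, and the choice of a truncated SHA-256 digest are all assumptions made for the example, and jobs are modelled as plain dicts.

```python
import hashlib

# Hypothetical keys, chosen for illustration only (not the names used in the PR).
FLOW_LABEL = "example.com/flow-hash"
RUN_ID_ANNOTATION = "example.com/run-id"


def flow_label_value(flow_name):
    """Hash the flow name into a string that is always a valid label value.

    Label values are limited to 63 characters from a restricted alphabet,
    so a truncated hex digest is a safe encoding for arbitrary flow names.
    """
    return hashlib.sha256(flow_name.encode("utf-8")).hexdigest()[:32]


def flow_selector(flow_name):
    """Build the label selector string used for the server-side filter."""
    return "%s=%s" % (FLOW_LABEL, flow_label_value(flow_name))


def jobs_for_run(jobs, run_id):
    """Client-side filter: keep only the jobs annotated with this run id."""
    return [
        job
        for job in jobs
        if job["metadata"].get("annotations", {}).get(RUN_ID_ANNOTATION) == run_id
    ]
```

With the official Python client, a selector string like this would be passed as the label_selector argument of BatchV1Api.list_namespaced_job, and the returned jobs would then be narrowed down in memory.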

run status
Displaying a run's status with the list command, or filtering by status, would require a client-side state machine that goes through all the jobs of a run to work out the current status. The main problem is that, unlike with Argo Workflows, client-driven Kubernetes gives us no central status location to query on the cluster side.
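
To make the cost concrete, here is a rough sketch of what such a client-side aggregation might look like. It is not part of this PR. It assumes each job's status is available as a dict with the active, succeeded and failed counters from the Job status; the state names and precedence rules are invented for illustration.

```python
def job_state(status):
    """Classify one job from its status counters (active/succeeded/failed)."""
    if status.get("active"):
        return "running"  # a retrying job with earlier failures still counts as running
    if status.get("failed"):
        return "failed"
    if status.get("succeeded"):
        return "succeeded"
    return "pending"


def run_state(job_statuses):
    """Fold the states of all jobs of a run into a single run-level state."""
    states = {job_state(status) for status in job_statuses}
    if not states:
        return "unknown"
    if "failed" in states:
        return "failed"
    if states & {"running", "pending"}:
        return "running"
    return "succeeded"
```

Even this sketch can be wrong: a run whose jobs so far have all succeeded may simply be between steps, which is exactly the information a central status location would provide.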

closes #1631

@saikonen saikonen requested a review from savingoyal August 28, 2024 14:00

@kubernetes.command(help="List all runs of the flow on Kubernetes.")
@click.pass_obj
def list_runs(obj):
Collaborator

list can be congruent to batch list

@kubernetes.command(help="Kill flow execution on Kubernetes.")
@click.argument("run-id", required=True, type=str)
@click.pass_obj
def kill(obj, run_id):
Collaborator

kill can be congruent to batch kill

@saikonen saikonen marked this pull request as ready for review September 3, 2024 12:43

flow_name, run_id, user, field_selector="status.successful==0"
)

def _kill_job(job):
Collaborator

does this work for argo-workflows?

Collaborator Author

As-is, no, this does not work. I was leveraging the existing RunningJob for the kill logic, but as far as I understand we do not create Kubernetes Jobs for runs on Argo Workflows, since the workflow DAG wraps the pods directly. As such, the lookup for executing jobs returns nothing to terminate.

I would probably leave Argo Workflows out of scope for this PR, since there is a more direct way to terminate a workflow on that side than killing pods individually.

If we want this to apply to Argo Workflows as well, I can change the lookup to search for active pods instead of jobs and introduce the kill logic separately.
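
A pod-based lookup along those lines could be sketched like this. It is a hypothetical illustration, not code from this PR: pods are modelled as plain dicts, the annotation key is made up, and treating Pending and Running as the "active" phases is an assumption.

```python
# Pod phases treated as "still active" in this sketch (an assumption).
ACTIVE_PHASES = {"Pending", "Running"}

# Hypothetical annotation key, chosen for illustration only.
RUN_ID_ANNOTATION = "example.com/run-id"


def active_pods_for_run(pods, run_id):
    """Keep only the pods of the given run that have not finished yet."""
    return [
        pod
        for pod in pods
        if pod["status"].get("phase") in ACTIVE_PHASES
        and pod["metadata"].get("annotations", {}).get(RUN_ID_ANNOTATION) == run_id
    ]
```

The kill logic would then act on the returned pods rather than on Job objects, which is why it would need to be introduced separately from the existing RunningJob path.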

@saikonen
Collaborator Author

closing for the time being in favour of #2023

@saikonen saikonen closed this Sep 10, 2024
Development

Successfully merging this pull request may close these issues.

Add list and kill support for kubernetes CLI
2 participants