-
Notifications
You must be signed in to change notification settings - Fork 820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: scrub logs #1802
feature: scrub logs #1802
Conversation
f1ae701
to
a3497fc
Compare
@saikonen we could introduce a top-level |
this now introduces the following command still works the same
but now we can also support the following
|
run_id, step_name, task_id = parts | ||
else: | ||
raise CommandException( | ||
"input_path should either be run_id/step_name " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
input-path
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as in the existing 'show' command. should this be reworded to something else? pathspec for example?
echo("Logs have been scrubbed.") | ||
|
||
else: | ||
raise CommandException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the behavior for tasks that haven't finished yet? a few different cases here -
- the attempt is still in flight
- the attempt has finished but we expect further attempts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one approach would be that we instead init the datastore with allow_not_done=False
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bunch of fixes regarding this. How things work now:
we rely on allow_not_done=False
and check when initializing the datastore in mode="d"
that the latest / specified attempt is done, otherwise we raise an error.
- trying to
logs scrub [pathspec]
on a running task will lead to an error. logs scrub [pathspec] --attempt x
will work even on a running task, as long as the specified attempt has finished.
so specifically for the above cases:
- if attempt is in flight, log scrubbing raises an exception for that specific attempt (or no attempt specified)
- attempt has finished, if scrubbing with
--attempt
it will work. If not specifying an attempt, the latest attempt will point to an attempt that has not yet finished, raising an exception
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I also tried simply allowing the scrubbing of logs for running tasks, but due to how logs are written in bursts from a buffer, the effects of overwriting a log with the [REDACTED]
message were being nullified, as every log write cycle would simply rewrite the scrubbed log lines if they were still in buffer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would definiltely not advocate for allowing the redaction of in-flight logs. THere are just too many complications. We would have to prevent writing to a log that has been redacted which we can't do with past versions of metaflow so there would be this big gapping hole.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the current behavior described @saikonen -- ie: can scrub past completed attempts with --attempt
; can scrub latest attempt if finished with no arg and can't scrub any other case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a few comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No issues with the larger feature -- a few comments here and there.
|
||
# Do not allow destructive operations on the datastore if attempt is still in flight | ||
# and we explicitly did not allow operating on running tasks. | ||
if not allow_not_done and not self.has_metadata(self.METADATA_DONE_SUFFIX): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a case where we want to delete logs for running tasks? It seems in delete mode we may want to ingore allow_not_done
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left this in mainly in order to keep the datastore side more generic (allowing delete for in-flight attempts as well, though no use case exists yet). It is definitely not required for this feature, so I could remove it
# NOTE: We need this in order to not introduce breaking changes to existing CLI, as we wanted to | ||
# nest both existing `logs` and the new `logs scrub` under a shared group, but `logs` already has | ||
# a well defined behavior of showing the logs. | ||
class CustomGroup(click.Group): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to check if this is handled by @madhur-ob 's click parsing thing as well (should be but mentioning).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tested, and this does not work as-is with the low level click api due to the fallback logic happening when invoking the command. Calling the explicit subcommand works as expected though, but the group level fallback fails
from metaflow.metaflow_runner import MetaflowAPI
from metaflow.cli import start
if __name__ == "__main__":
api = MetaflowAPI.from_cli("AttemptLog.py", start)
# works
cmd = api().logs().show(input_path="217190/start")
print(cmd)
# does not
cmd = api().logs(input_path="217190/start")
print(cmd)
Will look into how easy of a fix there could be to support it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool -- we don't need to fix it for this but ya, we need to probbaly have it working at some point. As in, the low level api should be able to handle all our cases.
echo("Logs have been scrubbed.") | ||
|
||
else: | ||
raise CommandException( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the current behavior described @saikonen -- ie: can scrub past completed attempts with --attempt
; can scrub latest attempt if finished with no arg and can't scrub any other case.
… output better scrubbing failure messages
This reverts commit b541e4f.
show_default=False, | ||
help="Scrub latest/all attempts of a step or task", | ||
) | ||
@click.option( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this also try to scrub in-flight tasks?
…with existing one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this refactor. I think with a couple name changes, it makes a lot more sense and the code looks clear imho.
introduces
logs scrub run_id/step/task_id
command for scrubbing logs of tasks.by default it deletes both stdout/stderr for the latest attempt of a task.
Final feature set:
logs [pathspec]
works as previously, with the new default subcommand custom click grouplogs show [pathspec]
is the default subcommand for logs cmd grouplogs scrub [pathspec]
will overwrite logs with a redacted messagealso adds support for both show/scrub commands to specify
--attempt [number]
to target a specific attempt of a task, instead of the latest.