Cut a few seconds off of every test using Cockroach #6934
```diff
@@ -137,7 +137,6 @@ impl DbRunArgs {
             // receive SIGINT.
             tokio::select! {
                 _ = db_instance.wait_for_shutdown() => {
-                    db_instance.cleanup().await.context("clean up after shutdown")?;
```
I removed this line because `wait_for_shutdown` already calls `cleanup`.
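The invariant the review relies on can be sketched in plain Rust. This is a minimal, hypothetical model (the names `DbInstance`, `cleaned_up`, and the method bodies are illustrative, not the real Omicron code): if `wait_for_shutdown` performs cleanup itself, an explicit `cleanup` call afterward is redundant, and making cleanup idempotent keeps a stray second call harmless.

```rust
// Hypothetical sketch: wait_for_shutdown subsumes cleanup, so callers
// that wait for shutdown do not need a separate cleanup() call.
struct DbInstance {
    cleaned_up: bool,
}

impl DbInstance {
    fn cleanup(&mut self) {
        if self.cleaned_up {
            return; // idempotent: a second call is a no-op
        }
        // ... remove temporary storage, reap the child process ...
        self.cleaned_up = true;
    }

    fn wait_for_shutdown(&mut self) {
        // ... block until the database process exits ...
        self.cleanup(); // cleanup happens here, on behalf of the caller
    }
}

fn main() {
    let mut db = DbInstance { cleaned_up: false };
    db.wait_for_shutdown();
    // no explicit db.cleanup() needed; it already ran
    assert!(db.cleaned_up);
}
```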
```diff
@@ -593,24 +609,43 @@ impl CockroachInstance {
                 .await;
             }
             self.child_process = None;

             // It shouldn't really matter which cleanup API we use, since
             // the child process is gone anyway.
             self.cleanup().await
```
(See above comment) this is where `wait_for_shutdown` already calls `cleanup`.
Nice! I only wonder if `db-dev` ought to shut down gracefully, but it seems like not a big deal either way.
This one has some history!
While working on some database-related APIs, I noticed that my tests using CockroachDB were a lot slower than other tests. To some degree, I expect this, but also, this was on the order of ~5-10 seconds per test doing very little other than CockroachDB setup and teardown.
After doing a little profiling, I noticed that teardown alone took several seconds per test, and significantly longer if any schema changes had occurred.
Why does teardown take so long? Well, it turns out we were sending `SIGTERM` to the CockroachDB process to terminate it gracefully, instead of `SIGKILL`, which would terminate it much more abruptly.

This is where the history comes in: gracefully terminating CockroachDB was a choice we made a few years ago to avoid a test flake: #540. Basically, when creating the "seed database" -- a database where we apply schema changes that are shared between many tests -- we want to terminate gracefully to avoid leaving the database in a "dirty state", where it might still need to flush work and clean up intermediate state. In the case of #540, that dirty intermediate state was an absolute path, which meant copies of that seed database trampled on each other if graceful shutdown did not complete.
Our approach was to apply graceful termination to all CockroachDB teardown invocations, but this was overkill.
Only the seed database expects its storage to still be in use after the call to `cleanup` -- all other test-only invocations expect to immediately remove their storage. They don't need to terminate gracefully, and arguably should exit as quickly as they can.

This PR changes the disposition:
- `cleanup_gracefully` uses `SIGTERM` and waits for graceful shutdown. This is still used when constructing the seed db.
- `cleanup` uses `SIGKILL` and kills the database immediately. This is now used for all other use-cases.

As an example of the performance difference, here's a comparison for some datastore tests:
Before
After