-
Notifications
You must be signed in to change notification settings - Fork 119
Implement SKIP LOCKED during action claiming #1259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement SKIP LOCKED during action claiming #1259
Conversation
…tests. Some cleanup of DBStore::claim_actions() around the usage of SKIP LOCKED
} | ||
if ( property_exists( $wp_object_cache, 'cache' ) ) { | ||
$wp_object_cache->cache = array(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes aren't directly related to this PR, but the CLI commands being spawned when testing with https://github.com/woocommerce/as-pressure-cooker were causing a bunch of errors/warnings.
@@ -449,7 +449,7 @@ public function test_as_recover_from_incorrect_schema() { | |||
$this->assertContains( 'Caught exception while enqueuing action "hook_18": Error saving action', $logged_errors ); | |||
$this->assertContains( 'Caught exception while enqueuing action "hook_19": Error saving action', $logged_errors ); | |||
$this->assertContains( 'Caught exception while enqueuing action "hook_20": Error saving action', $logged_errors ); | |||
$this->assertContains( "Unknown column 'priority' in 'field list'", $logged_errors ); | |||
$this->assertContains( "Unknown column 'priority' in ", $logged_errors ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The format of this error is slightly different across supported DBs.
I've done some testing against this and am happy with the results. So I've made some additional changes to deal with compatibility:
Testing Instructions
Notes when testing:
Testing ResultsI ran the following tests locally with the below diff applied to better simulate how action scheduler runs normally through requests and the limitation of an assumed 30s request timeout and forcing the clean up queries to be run.
This PR - MySQL 8.0.4 - InnoDB Engine
TRUNK - MySQL 8.0.4 - InnoDB Engine
This PR - MySQL 8.0.4 - MyISAM Engine (SKIP LOCKED ignored)This was really just run to verify no issues were caused on systems using MyISAM engines.
|
/** | ||
* Determines whether the database supports using SKIP LOCKED. | ||
* | ||
* SKIP_LOCKED support was added to MariaDB in 10.6.0 and to MySQL in 8.0.1 | ||
* | ||
* @return bool | ||
*/ | ||
private function db_supports_skip_locked() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should maybe cache this check so it doesn't have to query and confirm the version support before every claim. Can do local runtime cache, but I'd prolly just do object cache with some relatively short TTL like 5 minutes even.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WPprodigy, I don't think that is necessary. My mysqli::$server_info
is a read-only property of the underlying mysqli
object. I can't find documentation explicitly saying it, but I believe it's set when the connection is made or at the very least would be saved within the connected mysqli
object property between calls. https://www.php.net/manual/en/class.mysqli.php
…rces can override this if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting to dive in this PR: leaving some questions in comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the answers @prettyboymp! I'll do the second review round.
Can you add some testing steps (I guess with the pressure cooker) so I can try them in my environment?
…info determine DB capabilities.
Sorry, the testing instructions on in this comment: #1259 (comment) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I tried my best running the tests, but the pressure cooker is not working properly under our default wp-env cli-container:
Deleting existing actions and claims.
Generating test actions 100% [===================================================================================================================================================================================================================================================================] 1:26 / 1:26
Created 60000 past due actions.
Elapsed: 120s (100%)
Generated actions: 60000
Processed actions: 0
Processing Rate: 0/s
Active queue runners: 29/30
Replacement rate 15/29
Database errors: none detected
Hence I'll rely to passing CI and not hold this PR further.
Avoiding SQL deadlocks during action claiming
When running multiple action scheduler queues concurrently, many will start to fatal error due to SQL deadlocks. I have found that this always stems from the action claims query. This query ends up scanning many rows, and being an UPDATE, it ends up creating locks on the table/rows. This results in concurrent claims queries conflicting with each other and deadlocking. It also causes deadlocks during individual action status updates from other queues that are already processing actions. This leaves actions in unknown states and results in lots needing to be failed later on due to timeouts during the cleanup periods.
This PR prevents the deadlocks altogether. The magic here comes from the
SKIP LOCKED
SQL feature: https://dev.mysql.com/blog-archive/mysql-8-0-1-using-skip-locked-and-nowait-to-handle-hot-rows/. It's a perfect fit for the claims query, because we don't really care if it needs to skip over a few rows. All we want is for it to claim actions without conflict.The
SKIP LOCKED
has to come from a SELECT query. So the new query here uses a SELECT sub-query (w/FOR UPDATE
to claim a lock) and then JOINs the results to accomplish first selecting the items and then updating them all within a single transaction.I've done a lot of testing with this and ran into no deadlocks or failing actions. I've run hundreds of thousands of actions at 250 queue concurrency without any signs of trouble.
Considerations
I'm not necessarily suggesting we merge this PR as is (leaving as a draft for now), but I wanted to show the solution here and what I've found. We'll need to test and discover database compatibility before shipping this as the new default behavior.
SKIP LOCKED
shipped with MySQL 8.0.1 (2017-04-10) and MariaDB in 10.6.0 (2021-4-26), and works only for InnoDB tables (ignored otherwise). I've only done the testing in MySQL only, and I'm not sure if it's safely ignored on the older versions or if it would cause issues.I do think this would definitely be a worthwhile change to get into the plugin, as I've seen it be a source of a lot of issues. Some possible rollout options:
A) Put it into a new method (
claim_actions_v2()
), and use the new method when a feature flag is present via some constant or filter.B) Check the database version, and automatically use the v2 method when it's modern enough.