-
Notifications
You must be signed in to change notification settings - Fork 119
Address deadlocking and performance issues around existing indexes and claiming process. #1251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
$error = empty( $wpdb->last_error ) | ||
? _x( 'unknown', 'database error', 'action-scheduler' ) | ||
: $wpdb->last_error; | ||
$use_priorities = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this variable and flow clause below specifically for simplifying comparing the performance of the two operations.
|
||
throw new \RuntimeException( | ||
sprintf( | ||
foreach ( $pending_priorities as $pending_priority ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This potentially loops over up to 255 priorities. However, in the real world, it's somewhat unlikely that we would have a spread of a full 255 different levels of priorities being scheduled. Even with the full potential range scheduled, testing has shown that it is still more efficient than the deadlocks caused by order by clauses.
If we find we need to reduce this range further, my first thoughts are to apply a filter when adding actions to limit the unique priorities that can be used. I don't see a need for 255 options.
I need to make some updates for PHP <7.4 compat. |
…backwards compatibility by continuing to use the orderby filter.
I've found this to be a magic bullet: #1259 :) |
@WPprodigy, thank you pointing this out. I'll add it to the options we compare when we take a deeper dive into this. |
…hen staking claims
} | ||
if ( property_exists( $wp_object_cache, 'cache' ) ) { | ||
$wp_object_cache->cache = array(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These changes aren't directly related to this PR, but the CLI commands being spawned when testing with https://github.com/woocommerce/as-pressure-cooker were causing a bunch of errors/warnings.
This PR is a combination of fixes that will be broken into smaller, more specific PRs. I combined them into one for initial review/testing as all of the issues together contributed to deadlocking and performance problems that limit the ability to increase the rate which actions can be processed. Testing while fixing only one of these issues at a time still caused the system to bog down enough to prevent a clear understanding of the improvements.
Main issues being addressed:
Reduce the the deadlock frequency caused by claiming pending actions.
Ref #1104
The atomic query used to claim actions frequently causes deadlocks on the actions table due to the amount of time it takes to apply the updates to the table. The best way to reduce this is to get rid of any sorting needed as it limits the records that end up being locked during the update.
I'm proposing two different options for addressing this:
"priority list"
This proposes that we change the philosophy around claiming actions so that attempts, scheduled_date_gmt, or action_id are no longer relevant when claiming actions. Only priority is considered. So actions with the same priority may execute in any order.
Then, to fully remove the order by clause, we will query the unique priorities that are pending and loop through those in order to claim actions of each priority until our Queue is filled.
To further improve this update query, the KEY
claim_id_status_priority_scheduled_date_gmt
was added to thewp_actionscheduler_actions
table.Downsides:
action_scheduler_claim_actions_order_by
"pre query actions"
This is the more backward compatible change. It simply makes a select query to get a set of action_ids to work with prior to running the update to claim the actions. The update query still applies the same where clauses, so it remains atomic and avoids more than one claim running actions, but limits the records that become locked by the action_ids returned from the prequery.
I decided to try this late, so I haven't tested it as much as the priority list option. But after some initial tests, it seems more promising and has less downside.
Downsides:
[ ] - Need to keep the
claim_id_status_scheduled_date_gmt
index if this is the path chosen.Stop resetting the claim_id of completed actions.
Ref #1105
Many queries used by ActionScheduler apply
claim_id
filters. However, the cardinality of this column is extremely low because all actions, outside of those with actively running claims, have a claim_id of0
. The current behavior resets even completed actions back to a claim_id of0
. As a result, the MySQL optimizer will often choose to use indexes that aren't as efficient because it's estimations don't show that the indexes with the claim_id column will be as efficient.To fix this, only actions that are still
pending
will get set back to a claim_id of0
after a claim queue has been processed. This allows.Improve the query performance of QueueCleaner::reset_timeouts()
This adds a usable index (status_last_attempt_gmt) for the query used in QueueCleaner::reset_timeouts(). The
orderby
argument was also updated tomodified
to help force the use of the index. During testing, the optimizer would often use status_claim_id based indexes which always performed much worse. This also helps make sure the oldest claimed actions are reset first.Improve the query performance of calculating the current number of claims running.
Adds a key on
status_claim_id
for the query used to determine the number of claims already running when deciding whether to spawn more.Testing
I added the following to a file in
mu-plugins
to allow testing a heavier load on ActionScheduler:wp generate_actions
. This will add 50K pending actions that are randomly spread out across the next 24 hours. When each is run, it will reschedule itself again randomly within the next 24 hour period.wp generate_actions
command to generate a lot of actions. I was testing against continually processing ~1M in a 24 hour period.action_scheduler_queue_runner_concurrent_batches
filter to set different levels of load and observe the point where deadlocks start to appear, if at all.CLAIM_STAKING_METHOD
constant to change the algorithm used for claiming.