-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable Sidekiq processing on search_api_govuk_index
queue
#3216
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
search_api_govuk_index
queue
kevindew
reviewed
Mar 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one. Just a couple of minor comments, but this looks great and I'm very much hoping we don't hit issues when we ship it.
The `Indexer::MesageProcessor` class handles processing RabbitMQ messages specifically pulled from the `search_api_to_be_indexed` queue. This commit reworks the class so that it retries message processing up to 5 times if an error is raised. Previously the class handled two errors: * If a `ProcessingError` is raised, we simply report the exception and stop trying to process. However looking in the codebase, it seems like this `ProcessingError` is defined but never actually raised anywhere. So this rescue block wouldn't be executed * If a `StandardError` is raised, we sleep for 1 second and retry the message indefinitely. There's no logic to break out of this retrying loop This change just rescues from `StandardError` and will inspect the message to see how many times it's been retried already. If it's <= 5, we retry, otherwise we report the error and ack the message which puts it on a dead letter queue and won't be processed again. Note that when we call `message.retry`, there's some RabbitMQ configuration already in place which sets a delay of 60s between retries. See the government-infrastructure repo for more information about this but essentially it works like so: 1. A message is processed on the `search_api_to_be_indexed` queue. An error is raised and we call `message.retry` (which actually calls `message.discard`) 2. This moves the message to the configured dead-letter exchange where it sits on a queue and has a TTL of 60 seconds. Nothing listens on this queue so it's never processed 3. After 60 seconds, it's automatically routed back to the `search_api_to_be_indexed` queue for re-processing
This job class is used in multiple places across the codebase to enqueue a Sidekiq job to amend a document in the index. We want to remove the Sidekiq dependency when processing jobs from RabbitMQ, which is one of the places this job is enqueued from. The way to do this with the least amount of friction is to keep the job class, but allow it to be configured to either retry the Sidekiq job or re-raise the error when caught. This means that for parts of the app that still want to use Sidekiq don't need any changes made, and continue to be able to enqueue and retry the job on Sidekiq, but for the RabbitMQ consumer we can pass in `reschedule_on_failure: true` and let the exception bubble up to where we can catch it and retry the RabbitMQ message.
In order to remove the Sidekiq dependency and move retrying logic to RabbitMQ, we need ot change this `ChangeNotificationProcessor` in a couple of ways. Firstly we call the `perform` method directly, rather than calling `perform_async` which will enqueue the job in Sidekiq. Calling the method directly will do the work synchronously, but means we can still call `perform_async` elsewhere in the codebase. Passing in `reschedule_on_failure: false` will re-raise an exception in the `AmendJob` class if one is raised. Omitting this argument will cause another Sidekiq job to be enqueued if an error is raised, which obviously don't want if we're removing this Sidekiq dependency. So instead we let the exception re-raise and the queue consumer code (`Indexer::MessageProcessor`) will rescue it and handle it appropriately.
`GovukMessageQueueConsumer::Consumer` now supports arguments for running the consumers concurrently, so we want to lean on that to hopefully improve performance.
This extends the existing `create_queues` Rake task to create and bind all the necessary exchanges and queues to allow automatic retrying on the `search_api_to_be_indexed` queue. This configuration reflects the actual production RabbitMQ setup which is defined in govuk-infrastructure. * Added a `search_api_to_be_indexed_retry_dlx` dead-letter exchange. This is where messages from the existing `search_api_to_be_indexed` queue go when they are discarded * In that exchange, we have a new `search_api_to_be_indexed_wait_to_retry` queue. This has a TTL of 30s configured on it. A message is put on that queue and nothing listens to the queue. After 30s, the message is automatically discarded and routed to the `search_api_to_be_indexed_discarded_dlx` dead-letter exchange * A new `search_api_to_be_indexed` is configured on the `search_api_to_be_indexed_discarded_dlx` exchange. Because this has the same name as the original `search_api_to_be_indexed` queue, the existing consumer (processed by `Indexer::MessageProcessor`) is already listening to queues of that name, so it'll pick the message up for retrying.
0793bed
to
fc880b0
Compare
kevindew
approved these changes
Mar 25, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
https://trello.com/c/U9M9Og06/2276
This removes the logic to enqueue Sidekiq jobs for messages pulled from the
search_api_govuk_index
queue and instead processes them synchronously.It adds retrying logic to the consumer to retry a message 5 times, with a 60s delay between retries that is already configured on RabbitMQ (alphagov/govuk-infrastructure#1848).
It also configures the consumer to use multiple threads for concurrent processing, as opposed to the serial processing it does now.
Rather than extract logic out of the
AmendJob
class, I decided to just call it synchronously instead. If other parts of the app didn't use that class to enqueue Sidekiq jobs, I'd have made it a non-Sidekiq class instead, but it feels like doing it this way is the path of least resistance.See commit messages for more detail.