Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable Sidekiq processing on search_api_govuk_index queue #3216

Merged
merged 5 commits into from
Mar 26, 2025

Conversation

jackbot
Copy link
Contributor

@jackbot jackbot commented Mar 24, 2025

https://trello.com/c/U9M9Og06/2276

This removes the logic to enqueue Sidekiq jobs for messages pulled from the search_api_govuk_index queue and instead processes them synchronously.

It adds retrying logic to the consumer to retry a message 5 times, with a 60s delay between retries that is already configured on RabbitMQ (alphagov/govuk-infrastructure#1848).

It also configures the consumer to use multiple threads for concurrent processing, as opposed to the serial processing it does now.

Rather than extract logic out of the AmendJob class, I decided to just call it synchronously instead. If other parts of the app didn't use that class to enqueue Sidekiq jobs, I'd have made it a non-Sidekiq class instead, but it feels like doing it this way is the path of least resistance.

See commit messages for more detail.

@jackbot jackbot changed the title Disable Sidekiq processing on "search_api_govuk_index" queue Disable Sidekiq processing on search_api_govuk_index queue Mar 24, 2025
@jackbot jackbot requested a review from kevindew March 24, 2025 13:47
Copy link
Member

@kevindew kevindew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one. Just a couple of minor comments, but this looks great and I'm very much hoping we don't hit issues when we ship it.

jackbot added 5 commits March 25, 2025 13:54
The `Indexer::MesageProcessor` class handles processing RabbitMQ
messages specifically pulled from the `search_api_to_be_indexed` queue.

This commit reworks the class so that it retries message processing up
to 5 times if an error is raised.

Previously the class handled two errors:

* If a `ProcessingError` is raised, we simply report the exception and
  stop trying to process. However looking in the codebase, it seems like
  this `ProcessingError` is defined but never actually raised anywhere.
  So this rescue block wouldn't be executed
* If a `StandardError` is raised, we sleep for 1 second and retry the
  message indefinitely. There's no logic to break out of this retrying
  loop

This change just rescues from `StandardError` and will inspect the
message to see how many times it's been retried already. If it's <= 5,
we retry, otherwise we report the error and ack the message which puts
it on a dead letter queue and won't be processed again.

Note that when we call `message.retry`, there's some RabbitMQ
configuration already in place which sets a delay of 60s between
retries. See the government-infrastructure repo for more information
about this but essentially it works like so:

1. A message is processed on the `search_api_to_be_indexed` queue. An error
   is raised and we call `message.retry` (which actually calls `message.discard`)
2. This moves the message to the configured dead-letter exchange where
   it sits on a queue and has a TTL of 60 seconds. Nothing listens on
   this queue so it's never processed
3. After 60 seconds, it's automatically routed back to the
   `search_api_to_be_indexed` queue for re-processing
This job class is used in multiple places across the codebase to enqueue
a Sidekiq job to amend a document in the index.

We want to remove the Sidekiq dependency when processing jobs from
RabbitMQ, which is one of the places this job is enqueued from.

The way to do this with the least amount of friction is to keep the job
class, but allow it to be configured to either retry the Sidekiq job or
re-raise the error when caught.

This means that for parts of the app that still want to use Sidekiq
don't need any changes made, and continue to be able to enqueue and
retry the job on Sidekiq, but for the RabbitMQ consumer we can pass in
`reschedule_on_failure: true` and let the exception bubble up to where
we can catch it and retry the RabbitMQ message.
In order to remove the Sidekiq dependency and move retrying logic to
RabbitMQ, we need ot change this `ChangeNotificationProcessor` in a
couple of ways.

Firstly we call the `perform` method directly, rather than calling
`perform_async` which will enqueue the job in Sidekiq. Calling the
method directly will do the work synchronously, but means we can still
call `perform_async` elsewhere in the codebase.

Passing in `reschedule_on_failure: false` will re-raise an exception in
the `AmendJob` class if one is raised. Omitting this argument will cause
another Sidekiq job to be enqueued if an error is raised, which
obviously don't want if we're removing this Sidekiq dependency.

So instead we let the exception re-raise and the queue consumer code
(`Indexer::MessageProcessor`) will rescue it and handle it
appropriately.
`GovukMessageQueueConsumer::Consumer` now supports arguments for running
the consumers concurrently, so we want to lean on that to hopefully
improve performance.
This extends the existing `create_queues` Rake task to create and bind
all the necessary exchanges and queues to allow automatic retrying on
the `search_api_to_be_indexed` queue.

This configuration reflects the actual production RabbitMQ setup which
is defined in govuk-infrastructure.

* Added a `search_api_to_be_indexed_retry_dlx` dead-letter exchange.
  This is where messages from the existing `search_api_to_be_indexed`
  queue go when they are discarded
* In that exchange, we have a new
  `search_api_to_be_indexed_wait_to_retry` queue. This has a TTL of 30s
  configured on it. A message is put on that queue and nothing listens
  to the queue. After 30s, the message is automatically discarded and
  routed to the `search_api_to_be_indexed_discarded_dlx` dead-letter
  exchange
* A new `search_api_to_be_indexed` is configured on the
  `search_api_to_be_indexed_discarded_dlx` exchange. Because this has
  the same name as the original `search_api_to_be_indexed` queue, the
  existing consumer (processed by `Indexer::MessageProcessor`) is
  already listening to queues of that name, so it'll pick the message up
  for retrying.
@jackbot jackbot force-pushed the remove-sidekiq-search_api_to_be_indexed branch from 0793bed to fc880b0 Compare March 25, 2025 13:54
@jackbot jackbot requested a review from kevindew March 25, 2025 13:55
Copy link
Member

@kevindew kevindew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one

@jackbot jackbot merged commit cdfedcd into main Mar 26, 2025
6 checks passed
@jackbot jackbot deleted the remove-sidekiq-search_api_to_be_indexed branch March 26, 2025 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants