
Use retry library #641


Merged

mkysel merged 2 commits into main from mkysel/retry on Mar 17, 2025

Conversation

@mkysel (Collaborator) commented on Mar 17, 2025

Summary by CodeRabbit

  • Chores

    • Updated dependency management by consolidating and updating key components for improved maintenance and compatibility.
  • Refactor

    • Enhanced connection stability with more robust error handling and an exponential retry mechanism for smoother operations.

@mkysel requested a review from a team as a code owner on March 17, 2025 at 19:53
coderabbitai bot (Contributor) commented on Mar 17, 2025

Walkthrough

The pull request updates dependency management and error handling. In the go.mod file, several new dependencies are added while a redundant require block is removed to consolidate declarations. In the pkg/sync/syncWorker.go file, the subscribeToNodeRegistration method is modified to replace manual delay logic with an exponential backoff mechanism provided by the backoff package. The new approach encapsulates the connection logic in an operation function and leverages backoff.Retry for a cleaner, more robust retry mechanism.
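As a rough illustration of that pattern, here is a minimal sketch assuming the v5 backoff API (context passed directly to backoff.Retry, operation returning a value and an error); connectOnce and every value below are hypothetical stand-ins, not the actual syncWorker code:

package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"github.com/cenkalti/backoff/v5"
)

// connectOnce is a hypothetical stand-in for one connection attempt.
func connectOnce(ctx context.Context) error {
	// Dial the node, open the subscription stream, etc.
	return errors.New("node unreachable")
}

func subscribeWithRetry(ctx context.Context) error {
	expBackoff := backoff.NewExponentialBackOff()
	expBackoff.InitialInterval = 1 * time.Second

	// The operation wraps a single attempt; returning an error tells
	// backoff.Retry to wait for the next exponential interval and try again.
	operation := func() (string, error) {
		if err := connectOnce(ctx); err != nil {
			return "", err
		}
		return "connected", nil
	}

	// backoff.Retry keeps calling operation until it succeeds, the retry
	// policy gives up, or ctx is canceled.
	_, err := backoff.Retry(ctx, operation, backoff.WithBackOff(expBackoff))
	return err
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	fmt.Println(subscribeWithRetry(ctx))
}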

Changes

File(s) and change summary:

  • go.mod: Added new dependencies: github.com/cenkalti/backoff/v5, github.com/golang-jwt/jwt/v5, github.com/grpc-ecosystem/go-grpc-prometheus, github.com/testcontainers/testcontainers-go, github.com/Masterminds/semver/v3, and github.com/docker/docker; removed the redundant require block to consolidate declarations.
  • pkg/sync/syncWorker.go: Enhanced subscribeToNodeRegistration with an error-handling and retry mechanism using backoff.Retry and an encapsulated connection-logic function, replacing the manual delay-and-doubling logic.

Sequence Diagram(s)

sequenceDiagram
    participant SW as syncWorker
    participant OP as ConnectionLogic
    participant BO as backoff.Retry
    participant LOG as Logger

    SW->>BO: Invoke backoff.Retry with operation
    BO->>OP: Execute connection attempt
    alt Successful Connection
        OP-->>BO: Return success
        BO-->>SW: Exit retry loop
    else Failed Connection
        OP-->>BO: Return error
        BO->>LOG: Log error details
        BO->>OP: Retry connection after delay
    end

Possibly related PRs

  • Use testcontainers in upgrade tests #603: That PR also uses the backoff package for error handling and connection-retry logic in the subscribeToNodeRegistration method, which relates to this PR's addition of new dependencies, including github.com/cenkalti/backoff/v5.

Warning

There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure.

🔧 golangci-lint (1.62.2)

Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24)
Failed executing command with error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24)

Tip

⚡🧪 Multi-step agentic review comment chat (experimental)
  • We're introducing multi-step agentic chat in review comments. This experimental feature enhances review discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments.
    - To enable this feature, set early_access to true in the settings.
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
go.mod (1)

12-12: Check gRPC integration for Prometheus.
github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0 should be compatible with gRPC v1.71.0, but confirm no known mismatch or required config changes to enable metrics collection.
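For context, a minimal sketch (an assumption, not taken from xmtpd) of how go-grpc-prometheus metrics are commonly wired into a gRPC server alongside a Prometheus /metrics endpoint:

package main

import (
	"net/http"

	grpc_prometheus "github.com/grpc-ecosystem/go-grpc-prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"google.golang.org/grpc"
)

func main() {
	// Attach the interceptors so every RPC is counted and timed.
	server := grpc.NewServer(
		grpc.UnaryInterceptor(grpc_prometheus.UnaryServerInterceptor),
		grpc.StreamInterceptor(grpc_prometheus.StreamServerInterceptor),
	)

	// Register service implementations on server here, then initialize the
	// per-method metrics and (optionally) latency histograms.
	grpc_prometheus.Register(server)
	grpc_prometheus.EnableHandlingTimeHistogram()

	// Expose everything collected so far on /metrics.
	http.Handle("/metrics", promhttp.Handler())
	_ = http.ListenAndServe(":9090", nil)
}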

pkg/sync/syncWorker.go (1)

195-205: Refine error handling within the deferred function.
Using a defer func() { if err != nil {...} }() inside operation captures the outer err variable. This works, but it can be confusing because the final value of err is logged right before the function returns. Consider either logging errors inline or introducing a distinctly named local error variable so the deferred log is clearly scoped to this attempt:

-operation := func() (string, error) {
-   defer func() {
-       if err != nil {
-           s.log.Error("Error connecting to node. Retrying...", zap.Error(err))
-       }
-   }()
+operation := func() (string, error) {
+   var opErr error
+   defer func() {
+       if opErr != nil {
+           s.log.Error("Error connecting to node. Retrying...", zap.Error(opErr))
+       }
+   }()
    ...
-   conn, err = s.connectToNode(*node)
+   conn, opErr = s.connectToNode(*node)
    ...
}

Also applies to: 207-209, 211-215, 217-230
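To illustrate the concern above, a small self-contained example (hypothetical, not the syncWorker code) contrasting a deferred closure that captures the surrounding err with one that uses a distinctly named local error:

package main

import (
	"errors"
	"fmt"
)

// attempt captures the named return err in its deferred closure, so the log
// reflects whatever err holds at return time; correct, but easy to misread.
func attempt() (err error) {
	defer func() {
		if err != nil {
			fmt.Println("retrying after error:", err)
		}
	}()
	err = errors.New("dial failed")
	return err
}

// attemptScoped uses a separately named opErr, making it obvious which
// assignments the deferred log can observe.
func attemptScoped() error {
	var opErr error
	defer func() {
		if opErr != nil {
			fmt.Println("retrying after error:", opErr)
		}
	}()
	opErr = errors.New("dial failed")
	return opErr
}

func main() {
	_ = attempt()
	_ = attemptScoped()
}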

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cc88300 and 1a527e4.

⛔ Files ignored due to path filters (1)
  • go.sum is excluded by !**/*.sum
📒 Files selected for processing (2)
  • go.mod (2 hunks)
  • pkg/sync/syncWorker.go (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Test (Node)
  • GitHub Check: Build pre-baked anvil-xmtpd
  • GitHub Check: Lint-Go
🔇 Additional comments (5)
go.mod (2)

6-8: Pinning new dependencies looks good.
These additions—particularly cenkalti/backoff/v5—directly support a new retry mechanism. Ensure there's no version conflict with any existing references (e.g., older backoff versions marked as indirect in line 53).


10-10: Validate JWT dependency version.
github.com/golang-jwt/jwt/v5 v5.2.1 looks current, but consider verifying any known security advisories or bug reports to confirm no unresolved vulnerabilities in v5.2.1.

pkg/sync/syncWorker.go (3)

7-7: Imports align with new retry and metrics usage.
These newly added imports ("github.com/cenkalti/backoff/v5", "github.com/xmtp/xmtpd/pkg/metrics", as well as "io", "sync", and "time") appear consistent with the updated retry logic and streaming design.

Also applies to: 15-15, 24-26


192-194: Initialize maximum backoff parameters.
Currently, only InitialInterval is configured. Leaving MaxInterval or MaxElapsedTime at default might result in excessively long, unbounded retry intervals in certain failure modes. Consider capping these values:

 expBackoff := backoff.NewExponentialBackOff()
 expBackoff.InitialInterval = 1 * time.Second
+expBackoff.MaxInterval = 60 * time.Second      // Adjust as appropriate
+expBackoff.MaxElapsedTime = 0                 // 0 disables the total limit, or set a suitable time limit
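One hedge on the snippet above: if this module is on backoff v5 (as go.mod suggests), MaxElapsedTime appears to have moved from an ExponentialBackOff field to a Retry option, so a v5-shaped version of the same idea might look roughly like the sketch below. The WithMaxElapsedTime option and all durations here are assumptions to verify against the v5 documentation.

package main

import (
	"context"
	"errors"
	"fmt"
	"time"

	"github.com/cenkalti/backoff/v5"
)

func main() {
	expBackoff := backoff.NewExponentialBackOff()
	expBackoff.InitialInterval = 1 * time.Second
	expBackoff.MaxInterval = 60 * time.Second // cap the delay between attempts

	operation := func() (string, error) {
		return "", errors.New("still failing") // placeholder: always retryable
	}

	// In v5 the overall time budget is supplied on the Retry call rather than
	// on the backoff policy; a small bound here so the sketch terminates quickly.
	_, err := backoff.Retry(
		context.Background(),
		operation,
		backoff.WithBackOff(expBackoff),
		backoff.WithMaxElapsedTime(10*time.Second),
	)
	fmt.Println(err)
}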

232-238:

❓ Verification inconclusive

Clarify the infinite loop around backoff.Retry.
After operation eventually succeeds, backoff.Retry returns nil, causing the code to loop again and re-invoke the same retry flow indefinitely. If the intent is to re-subscribe in perpetuity whenever the stream closes, this is fine, but an inline comment clarifying that intent would help. Alternatively, if the retries should stop once a subscription has succeeded, consider breaking out of the loop:

 for {
-   _, err = backoff.Retry(registration.ctx, operation, backoff.WithBackOff(expBackoff))
-   if err != nil {
-       return
-   }
+   _, subErr := backoff.Retry(registration.ctx, operation, backoff.WithBackOff(expBackoff))
+   if subErr != nil {
+       return
+   }
+   // Optionally: break here if you do not intend to re-subscribe after a successful operation.
 }

Please verify that the intended behavior—whether continuous re-subscription or a single successful subscription—is clearly documented and matches your overall design.
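For reference, a sketch of the loop shape being discussed, with the intent documented inline; the identifiers mirror the review excerpt and this is an assumed reconstruction, not the file's actual contents:

package syncworker

import (
	"context"

	"github.com/cenkalti/backoff/v5"
)

// resubscribeLoop keeps the subscription alive. A nil error from
// backoff.Retry means a stream was opened and later closed cleanly, so the
// loop deliberately starts over to re-subscribe; it exits only when the
// context is canceled or the operation returns a permanent error.
func resubscribeLoop(ctx context.Context, operation func() (string, error), policy backoff.BackOff) {
	for {
		if _, err := backoff.Retry(ctx, operation, backoff.WithBackOff(policy)); err != nil {
			return // context canceled or permanent error: stop re-subscribing
		}
		// Stream ended normally: loop again to re-subscribe from scratch.
	}
}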

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (2)
pkg/sync/syncWorker.go (2)

192-194: Good implementation of exponential backoff.

Setting up the exponential backoff with an initial interval of 1 second is a good starting point. The code now uses a more standardized and robust approach for handling retries.

You might consider making the initial interval configurable through a constant or configuration parameter for better flexibility in different environments.


234-238: Consider returning a more descriptive error message.

The function returns an empty string as the first return value and a nil error when the stream disconnects. While this works correctly to reset the backoff, consider providing a more descriptive string for logging or debugging purposes.

-        return "", nil // try again from the start and reset backoff
+        return "Stream disconnected, resetting backoff", nil // try again from the start and reset backoff
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1a527e4 and f238185.

📒 Files selected for processing (1)
  • pkg/sync/syncWorker.go (3 hunks)
🔇 Additional comments (5)
pkg/sync/syncWorker.go (5)

7-7: Import of backoff library looks good.

The introduction of the github.com/cenkalti/backoff/v5 package is appropriate for implementing exponential backoff retry logic, which is a significant improvement over manual retry logic.


195-238: Well-structured operation function with proper resource management.

The operation function encapsulates the connection logic very well with several improvements:

  1. Proper resource cleanup using defer statements
  2. Clear error handling and logging
  3. Context cancellation check to prevent unnecessary retries
  4. Appropriate metrics tracking for connection success/failure

This approach significantly improves the readability and maintainability of the retry logic.


220-222: Good use of permanent error for context cancellation.

Using backoff.Permanent to stop retrying when the context is canceled is the correct approach. This ensures the retry mechanism stops gracefully when the operation should no longer be attempted.
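For reference, a minimal sketch of that pattern (an assumed shape; the real check lives inside the operation closure in syncWorker.go):

package syncworker

import (
	"context"

	"github.com/cenkalti/backoff/v5"
)

// ctxGuard shows the pattern: wrapping the context error in backoff.Permanent
// makes backoff.Retry return immediately instead of scheduling another attempt.
func ctxGuard(ctx context.Context) error {
	if err := ctx.Err(); err != nil {
		return backoff.Permanent(err)
	}
	return nil // safe to proceed with the connection attempt
}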


240-246: Clean implementation of the retry loop.

The retry loop using backoff.Retry with the configured exponential backoff is much cleaner and more robust than a manual implementation. The error handling is also appropriate, exiting the loop only when necessary (context cancellation or permanent errors).


242-242:

✅ Verification successful

Verify backoff.Retry usage with context.

The backoff.Retry function is correctly used with the context through the registration.ctx parameter. This ensures that the retry mechanism respects context cancellation and doesn't continue retrying indefinitely.


🌐 Web query:

How to properly use context cancellation with cenkalti/backoff v5?

💡 Result:

To properly handle context cancellation with cenkalti/backoff v5, use the library's context integration features in these key patterns:

1. Using Retry with Context

The modern approach uses Retry which natively handles context cancellation:

import (
    "context"
    "github.com/cenkalti/backoff/v5"
)

operation := func() error {
    // Your context-aware operation
    if err := ctx.Err(); err != nil {
        return err // Abort on context cancellation
    }
    return doWork()
}

err := backoff.Retry(operation,
    backoff.WithContext(
        backoff.NewExponentialBackOff(),
        ctx,
    ),
)

2. Manual Control with BackOffContext

For granular control in loops:

b := backoff.WithContext(
    backoff.NewExponentialBackOff(),
    ctx,
)

for {
    select {
    case <-ctx.Done():
        return ctx.Err()
    default:
        nextWait := b.NextBackOff()
        if nextWait == backoff.Stop {
            return errors.New("max retries exceeded")
        }
        time.Sleep(nextWait)
        
        err := operation()
        if err == nil {
            return nil
        }
    }
}

Key Features [1][5][13]

  • Automatic Cancellation: Wrapped backoffs via WithContext() automatically return Stop when context is cancelled
  • Early Exit: Always check ctx.Err() in operations for immediate cancellation
  • Resource Cleanup: Use defer cancel() in parent context to ensure proper cleanup

Common Patterns

Timeout Enforcement:

ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()

backoff.Retry(
    func() error { return ctxAwareOperation(ctx) },
    backoff.WithContext(backoff.NewExponentialBackOff(), ctx),
)

Graceful Shutdown:

b := backoff.NewExponentialBackOff()
bCtx := backoff.WithContext(b, ctx)

for !isShuttingDown() {
    next := bCtx.NextBackOff()
    if next == backoff.Stop {
        break // Context cancelled or max retries reached
    }
    time.Sleep(next)
    // Retry logic...
}

Important Notes

  • Operations should be context-aware themselves for best results [6][13]
  • Combine with WithMaxRetries() for hybrid timeout/retry limits [1][5]
  • The library handles channel management internally in Retry() [13]

For complex scenarios, consider wrapping the backoff policy early using WithContext() to ensure all retry logic respects cancellation signals.

Citations:


Context cancellation in backoff.Retry is correctly implemented

The use of backoff.Retry(registration.ctx, operation, backoff.WithBackOff(expBackoff)) in pkg/sync/syncWorker.go (line 242) properly integrates context handling. Passing registration.ctx as the first argument ensures that retries will stop as soon as the context is canceled, which aligns with the best practices described for cenkalti/backoff v5.

No changes are needed here.

@mkysel merged commit 59645fa into main on Mar 17, 2025
8 checks passed
@mkysel deleted the mkysel/retry branch on March 17, 2025 at 20:36
mkysel added a commit that referenced this pull request Mar 20, 2025
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

- **Chores**
- Updated dependency management by consolidating and updating key
components for improved maintenance and compatibility.

- **Refactor**
- Enhanced connection stability with more robust error handling and an
exponential retry mechanism for smoother operations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->