-
Notifications
You must be signed in to change notification settings - Fork 21
Add payer metrics #622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add payer metrics #622
Conversation
WalkthroughThe changes add new metric tracking across several modules. In the API payer package and blockchain nonce manager, calls to emit performance and operation metrics have been inserted into key methods. These modifications record durations, retry counts, message counts, and nonce values. Additionally, new Prometheus metrics collectors and emission functions are defined in the metrics package, extending observability without altering the core control flows. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant NodeCursorTracker
participant Metrics
Client->>NodeCursorTracker: BlockUntilDesiredCursorReached(desiredCursor)
NodeCursorTracker->>NodeCursorTracker: startTime = time.Now()
NodeCursorTracker-->>NodeCursorTracker: Wait until desired cursor is reached
NodeCursorTracker->>Metrics: EmitBlockUntilDesiredCursorReached(originatorId, duration)
NodeCursorTracker-->>Client: Return result
sequenceDiagram
participant Client
participant Service
participant Metrics
Client->>Service: publishToNodeWithRetry(...)
Service-->>Service: Attempt node publish + retry loop
alt Retry occurred
Service->>Metrics: EmitBanlistRetries(originatorId, retries)
end
Client->>Service: publishToNodes(...)
Service->>Metrics: EmitNodePublishDuration(originatorId, duration)
Service->>Metrics: EmitMessageOriginated(originatorId, messageCount)
Client->>Service: publishToBlockchain(...)
Service->>Metrics: EmitNodePublishDuration(originatorId, duration)
Service->>Metrics: EmitMessageOriginated(originatorId, messageCount)
sequenceDiagram
participant Caller
participant NonceManager
participant Metrics
Caller->>NonceManager: GetNonce()
NonceManager->>Metrics: EmitCurrentNonce(nonce)
NonceManager-->>Caller: Return NonceContext
Possibly related PRs
Suggested reviewers
Warning There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure. 🔧 golangci-lint (1.62.2)Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24) Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/api/payer/service.go (2)
329-331
: Consider using a more specific identifier for blockchain metricsWhile the metrics implementation is good, consider using a more descriptive name for the blockchain originator ID to distinguish these metrics from node-related metrics.
-metrics.EmitNodePublishDuration(desiredOriginatorId, time.Since(start).Seconds()) -metrics.EmitMessageOriginated(desiredOriginatorId, 1) +metrics.EmitBlockchainPublishDuration(desiredOriginatorId, time.Since(start).Seconds()) +metrics.EmitBlockchainMessageOriginated(desiredOriginatorId, 1)This would make it clearer in your metrics dashboard which publishes are going to the blockchain versus nodes.
233-234
: Consider implementing a helper function to reduce duplicationThe metrics emission code is repeated in both
publishToNodes
andpublishToBlockchain
methods. This could be extracted to a helper function to improve maintainability.+func (s *Service) emitPublishMetrics(originatorID uint32, start time.Time, messageCount int) { + metrics.EmitNodePublishDuration(originatorID, time.Since(start).Seconds()) + metrics.EmitMessageOriginated(originatorID, messageCount) +} func (s *Service) publishToNodes( // existing code... - metrics.EmitNodePublishDuration(originatorID, time.Since(start).Seconds()) - metrics.EmitMessageOriginated(originatorID, len(payerEnvelopes)) + s.emitPublishMetrics(originatorID, start, len(payerEnvelopes)) return resp.OriginatorEnvelopes, nil } func (s *Service) publishToBlockchain( // existing code... - metrics.EmitNodePublishDuration(desiredOriginatorId, time.Since(start).Seconds()) - metrics.EmitMessageOriginated(desiredOriginatorId, 1) + s.emitPublishMetrics(desiredOriginatorId, start, 1)Also applies to: 329-331
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
pkg/api/payer/nodeCursorTracker.go
(3 hunks)pkg/api/payer/service.go
(5 hunks)pkg/blockchain/nonceManager.go
(2 hunks)pkg/metrics/metrics.go
(1 hunks)pkg/metrics/payer.go
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: Push Docker Images to GitHub Packages (xmtpd)
- GitHub Check: Build pre-baked anvil-xmtpd
- GitHub Check: Test (Node)
🔇 Additional comments (16)
pkg/metrics/metrics.go (1)
75-79
: Good enhancement of metrics collectionThe addition of these new metrics enhances observability for payer operations, which should help with monitoring and debugging. This approach of adding metrics to the existing collection mechanism is clean and follows the established pattern.
pkg/metrics/payer.go (4)
8-19
: LGTM: Good implementation of the node publish duration metricThe histogram is properly defined with appropriate namespace, help text, and label for originator_id.
21-32
: LGTM: Well-structured cursor block time metricThe metric is properly defined with a descriptive name and help text. The implementation follows best practices by using a histogram with the originator_id label.
34-44
: LGTM: Appropriate use of gauge for nonce trackingGood choice using a gauge for the nonce value as it represents a single changing value rather than a distribution or cumulative count.
60-71
: LGTM: Well-implemented message counterThe counter for originated messages is properly implemented with appropriate labels and emission function.
pkg/blockchain/nonceManager.go (2)
3-10
: LGTM: Clean import additionThe metrics package is properly imported.
54-55
: LGTM: Good instrumentation of nonce generationThe metrics emission is placed in the right location - after the nonce is generated and logged, but before it's returned.
pkg/api/payer/nodeCursorTracker.go (3)
3-12
: LGTM: Clean import additionsThe necessary imports for metrics and time are added appropriately.
50-51
: LGTM: Simple and effective timing setupUsing time.Now() at the start of the method is the right approach for measuring duration.
105-108
: LGTM: Well-positioned metrics emissionThe metrics emission is correctly placed where the desired sequence ID is reached, and it properly uses time.Since(start).Seconds() to calculate the duration.
pkg/api/payer/service.go (6)
6-6
: Good addition of metrics package importAdding metrics capabilities to this service is a valuable enhancement for observability.
184-186
: Appropriate conditional metrics emission for banlist retriesThe conditional check ensures the metric is only emitted when retries actually occur, avoiding unnecessary metric noise for successful first attempts.
225-225
: Useful timing capture for node publish operationsCorrectly positioned timestamp capture to measure the duration of the node publishing operation.
233-234
: Good metrics instrumentation for node publishingThe metrics track both the duration of the publish operation and the number of messages originated, providing valuable performance and throughput insights.
268-269
: Appropriate timing capture for blockchain publish operationsCorrectly positioned timestamp capture at the beginning of the blockchain publishing flow.
184-186
:✅ Verification successful
Consider error handling for metrics emission
The metrics emission functions don't appear to have any error handling. Verify that failures in metrics recording won't impact the service's core functionality.
Let's check how the metrics package handles errors in the implementation of these functions:
Also applies to: 233-234, 329-331
🏁 Script executed:
#!/bin/bash # Check how metrics functions are implemented and if they handle errors grep -r "func Emit" pkg/metrics/ --include="*.go" -A 10Length of output: 3958
Metrics Emission: No additional error handling required
After verifying the implementation in the
pkg/metrics
package, the metrics emission functions (includingEmitBanlistRetries
) simply invoke Prometheus instrumentation methods (e.g.,.With()
,.Observe()
,.Add()
), which do not return errors. Their design is fire‑and‑forget and does not impact the service’s core functionality if a metrics recording failure were to occur. This pattern is consistent across similar metrics functions in the package.
8970bfe
to
ddf910b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
pkg/metrics/payer.go (3)
10-10
: Inconsistent naming convention in metric nameThe metric name
xmtp_payer__node_publish_duration_seconds
uses a double underscore between "payer" and "node", while other metrics in this file use a single underscore. Consider standardizing the naming convention across all metrics.- Name: "xmtp_payer__node_publish_duration_seconds", + Name: "xmtp_payer_node_publish_duration_seconds",
21-32
: Function name doesn't match metric name conventionThe function name
EmitBlockUntilDesiredCursorReached
doesn't reflect the new metric namexmtp_payer_read_own_commit_in_time_seconds
that was agreed upon in the discussion. Consider renaming the function to match the metric's purpose for better code clarity.-func EmitBlockUntilDesiredCursorReached(originatorId uint32, duration float64) { +func EmitReadOwnCommitDuration(originatorId uint32, duration float64) { cursorBlockTime.With(prometheus.Labels{"originator_id": strconv.Itoa(int(originatorId))}). Observe(duration) }
48-49
: Consider shortening the metric nameThe metric name
xmtp_payer_failed_attempts_to_publish_to_node_via_banlist
is quite verbose. Consider a more concise name that still captures the essence of what's being measured.- Name: "xmtp_payer_failed_attempts_to_publish_to_node_via_banlist", + Name: "xmtp_payer_banlist_publish_retries",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
pkg/api/payer/nodeCursorTracker.go
(3 hunks)pkg/api/payer/service.go
(5 hunks)pkg/blockchain/nonceManager.go
(2 hunks)pkg/metrics/metrics.go
(1 hunks)pkg/metrics/payer.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (4)
- pkg/metrics/metrics.go
- pkg/blockchain/nonceManager.go
- pkg/api/payer/nodeCursorTracker.go
- pkg/api/payer/service.go
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Push Docker Images to GitHub Packages (xmtpd-cli)
- GitHub Check: Push Docker Images to GitHub Packages (xmtpd)
- GitHub Check: Build pre-baked anvil-xmtpd
- GitHub Check: Test (Node)
🔇 Additional comments (3)
pkg/metrics/payer.go (3)
34-44
: LGTM: Well-implemented gauge metricThe currentNonce gauge implementation is clean and includes a helpful comment about thread safety. The naming and help text are clear and informative.
60-71
: LGTM: Well-implemented counter metricThe messagesOriginated counter implementation is straightforward and correctly uses the Add method for incrementing. The naming and help text are clear.
1-71
:✅ Verification successful
Missing metric registration with Prometheus registry
None of the defined metrics are registered with the Prometheus registry in this file. Ensure that these metrics are registered somewhere in the codebase, typically during application initialization.
🏁 Script executed:
#!/bin/bash # Verify if metrics are registered somewhere in the codebase # Look for metrics registration patterns in the codebase echo "Searching for prometheus.Register or prometheus.MustRegister patterns..." rg "prometheus\.(Must)?Register" -t goLength of output: 271
Metrics registration verified in the codebase
The metrics defined in
pkg/metrics/payer.go
are indeed registered through theregisterCollectors
function inpkg/metrics/metrics.go
. No changes are required.
Far from complete, but a step in the right direction.
Summary by CodeRabbit
New Features
Chores