-
Notifications
You must be signed in to change notification settings - Fork 21
More sync metrics #634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More sync metrics #634
Conversation
WalkthroughThis change refactors several metric emission functions and variable names across multiple packages by renaming generic metrics to more context-specific ones (e.g., "payer" and "indexer"). In addition, new synchronization metrics and a connection status counter have been introduced, along with improved error propagation in envelope processing. The underlying control flow remains largely unchanged aside from these naming updates and minor adjustments in error handling. Changes
Sequence Diagram(s)sequenceDiagram
participant SW as syncWorker
participant MSC as SyncConnectionsStatusCounter
participant VE as validateAndInsertEnvelope
participant IE as insertEnvelope
SW->>MSC: NewSyncConnectionsStatusCounter(nodeID)
SW->>VE: validateAndInsertEnvelope(envelope)
alt Validation Fails
SW->>MSC: MarkFailure()
else Validation Succeeds
SW->>IE: insertEnvelope(envelope)
alt Insertion Fails
SW->>MSC: MarkFailure()
else Insertion Succeeds
SW->>MSC: MarkSuccess()
end
end
SW->>MSC: Close()
Possibly related PRs
Suggested reviewers
Warning There were issues while running some tools. Please review the errors and either fix the tool’s configuration or disable the tool if it’s a critical failure. 🔧 golangci-lint (1.62.2)Error: can't load config: the Go language version (go1.23) used to build golangci-lint is lower than the targeted Go version (1.24) Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pkg/metrics/sync.go (2)
8-15
: Consider retaining the explanatory comment or providing doc comments for clarity.Line 8 points out that this metric "should be a counter, but it does not support set and we can not rely on memory state." To avoid confusion for fellow developers and maintainers, consider either elaborating on this in the function-level doc comments or linking to relevant Prometheus documentation. This helps justify why a gauge is used here instead of a counter.
48-54
: Unify connection-related metrics for better maintainability.You are currently managing multiple related metrics:
•syncOutgoingSyncConnections
(gauge)
•syncFailedOutgoingSyncConnections
(gauge)
•syncFailedOutgoingSyncConnectionCounter
(counter)Consider combining them into a single GaugeVec or CounterVec, potentially with a status label (e.g., "failed", "open") to minimize duplication and reduce the surface area for mistakes. This also improves maintainability and consistency.
Also applies to: 55-60, 62-68
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
pkg/api/payer/nodeCursorTracker.go
(1 hunks)pkg/api/payer/service.go
(3 hunks)pkg/blockchain/nonceManager.go
(1 hunks)pkg/blockchain/rpcLogStreamer.go
(2 hunks)pkg/metrics/indexer.go
(1 hunks)pkg/metrics/metrics.go
(1 hunks)pkg/metrics/payer.go
(2 hunks)pkg/metrics/sync.go
(1 hunks)pkg/sync/syncWorker.go
(10 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Push Docker Images to GitHub Packages (xmtpd-cli)
- GitHub Check: Test (Node)
- GitHub Check: Build pre-baked anvil-xmtpd
- GitHub Check: Push Docker Images to GitHub Packages (xmtpd)
🔇 Additional comments (33)
pkg/blockchain/nonceManager.go (1)
88-88
: LGTM: Renamed metric with contextual prefixThe metric has been appropriately renamed from
EmitCurrentNonce
toEmitPayerCurrentNonce
, which aligns with the PR objective of improving metric naming for better searchability.pkg/api/payer/nodeCursorTracker.go (1)
105-108
: LGTM: Renamed metric with payer context prefixThe metric has been successfully renamed from
EmitBlockUntilDesiredCursorReached
toEmitPayerBlockUntilDesiredCursorReached
while maintaining the same parameters, improving metric organization.pkg/blockchain/rpcLogStreamer.go (2)
198-198
: LGTM: Renamed metric with indexer context prefixThe metric has been appropriately renamed from
EmitCurrentBlock
toEmitIndexerCurrentBlock
, providing better context and searchability.
229-229
: LGTM: Renamed metric with indexer context prefixThe metric has been successfully renamed from
EmitNumLogsFound
toEmitIndexerNumLogsFound
, maintaining consistency with the indexer context.pkg/metrics/metrics.go (1)
71-84
: LGTM: Improved metric organization with contextual prefixesThe metrics registration has been updated to use consistently prefixed metric names:
- "indexer" prefix for blockchain indexing metrics
- "payer" prefix for payment-related metrics
- "sync" prefix for synchronization metrics
The changes align perfectly with the PR objectives to enhance sync metrics and rename metrics for better searchability. The new sync metrics will provide valuable insights into synchronization operations.
pkg/api/payer/service.go (3)
185-185
: Updated metric prefix for consistencyThe banlist retries metric has been renamed to use the
Payer
prefix, aligning with the PR's objective to facilitate better metric filtering.
233-234
: Renamed metrics with payer prefixThe node publish duration and message originated metrics have been updated with the
Payer
prefix, improving metric categorization.
329-330
: Standardized metrics in blockchain publishingConsistent with other changes, these metrics have been updated to use the
Payer
prefix, ensuring uniformity throughout the codebase.pkg/sync/syncWorker.go (11)
170-171
: Added connection status tracking metricsNew connection status counter was added to track the health of node connections for synchronization.
181-182
: Added metrics for failed node connectionsThis change tracks failure cases when a node isn't found in the registry, providing visibility into sync issues.
187-188
: Added metrics for invalid node configurationsNew metric tracking for nodes with invalid configuration, improving observability.
212-213
: Added metrics for connection failures during retriesConnection failures during retry attempts are now recorded in metrics, enhancing monitoring capabilities.
233-234
: Added success metrics for node connectionsNew metric tracking for successful connections, completing the connection lifecycle monitoring.
393-395
: Updated error handling approachThe code now explicitly ignores errors from
validateAndInsertEnvelope
to continue processing subsequent envelopes even when one fails.
416-422
: Improved error propagation with metricsThe function now returns errors and adds metrics tracking for originator error messages, enhancing observability.
439-441
: Added new sync metricsNew metrics for tracking last seen sequence IDs and received message counts have been added, improving sync monitoring.
463-466
: Improved fee calculation clarityFee calculation has been restructured for better readability, separating base and congestion fees.
482-488
: Enhanced error handling in envelope insertionThe function now returns errors instead of just logging them, allowing for better error propagation.
521-523
: Changed log level for duplicate envelopesChanged log level from warning to info for already inserted envelopes, as this is an expected case rather than a warning condition.
pkg/metrics/indexer.go (8)
10-16
: Renamed log streamer metrics with indexer prefixUpdated the logs counter variable and its metric name to include the
indexer
prefix, improving metric categorization and searchability.
18-24
: Renamed current block metric with indexer prefixBlock tracking gauge has been updated with an
indexer
prefix for consistency and better filtering.
26-33
: Renamed duration metrics with indexer prefixLog retrieval duration histogram now includes an
indexer
prefix for improved metric organization.
35-41
: Renamed request counter with indexer prefixRequest counter now has a consistent
indexer
prefix, aligning with other metrics in this module.
43-46
: Updated emission function to match renamed metricFunction name updated to reflect the new
indexer
prefix convention.
48-51
: Updated block tracking function to match renamed metricConsistent renaming pattern applied to the block tracking function.
53-56
: Updated duration emission function to match renamed metricDuration tracking function updated with the
indexer
prefix for consistency.
62-65
: Updated function call to use new metrics nameUpdated the measurement function to use the newly renamed metrics function.
pkg/metrics/payer.go (5)
8-19
: Renamed node publish metrics with payer prefixThe node publish duration histogram and its associated emission function have been updated with a
payer
prefix for better categorization.
21-32
: Renamed cursor block time metrics with payer prefixThe cursor block time histogram and its associated emission function have been updated with a
payer
prefix for consistent naming.
34-44
: Renamed nonce metrics with payer prefixThe current nonce gauge and its associated emission function have been updated with a
payer
prefix to align with other metrics.
55-58
: Renamed banlist retries function with payer prefixUpdated the banlist retries function to use the
payer
prefix, consistent with other metric changes.
60-71
: Renamed messages originated metrics with payer prefixThe messages originated counter and its associated emission function have been updated with a
payer
prefix for better filtering.pkg/metrics/sync.go (1)
70-117
:✅ Verification successful
Address concurrent access to struct fields if used across goroutines.
The
SyncConnectionsStatusCounter
struct maintains boolean fields (hasFailed
andhasSucceeded
) that get updated inMarkFailure()
,MarkSuccess()
, andClose()
. If the same instance is accessed concurrently by multiple goroutines, these fields and their corresponding gauge increments/decrements can lead to data races or inconsistent gauge states. Ensure that each instance ofSyncConnectionsStatusCounter
is used by only one goroutine at a time, or introduce a synchronization mechanism (e.g. a mutex) around these field updates.Below is a script that you can run to locate all uses of
MarkFailure
,MarkSuccess
, andClose
across the codebase, so you can confirm whether multiple goroutines may invoke them simultaneously:
🏁 Script executed:
#!/bin/bash # Search the repository for potential concurrent usages of SyncConnectionsStatusCounter methods. rg -A 5 -B 3 'MarkFailure|MarkSuccess|Close'Length of output: 65960
Re: Concurrency Safety of SyncConnectionsStatusCounter
After investigating repository usage, it appears that each instance of
SyncConnectionsStatusCounter
is created and used locally within a single goroutine (for example, inpkg/sync/syncWorker.go
). The methodsMarkFailure()
,MarkSuccess()
, andClose()
are invoked sequentially within their respective execution paths, so no concurrent access of its internal boolean fields is observed at this time.If a future change requires sharing an instance across multiple goroutines, please ensure to introduce a synchronization mechanism (e.g., a mutex) around these field updates and the corresponding gauge operations.
func (fc *SyncConnectionsStatusCounter) Close() { | ||
if fc.hasFailed { | ||
fc.hasFailed = false | ||
syncFailedOutgoingSyncConnections.Dec() | ||
} | ||
if fc.hasSucceeded { | ||
fc.hasSucceeded = false | ||
syncOutgoingSyncConnections.Dec() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider marking sync connections as closed once.
Repeated calls to Close()
could cause gauge inconsistencies if invoked more than once on the same instance. Because there is no "closed" field or check, a second call to Close()
might wrongly decrement gauges again if this code changes in the future. Consider adding a dedicated "closed" state to guard against double-closing or re-opening actions.
.. also rename all the metrics with a prefix for easier grepping <!-- This is an auto-generated comment: release notes by coderabbit.ai --> - **New Features** - Introduced enhanced tracking metrics for synchronization, enabling improved monitoring of connection performance and message flows. - **Refactor** - Standardized metrics naming across operations to offer clearer insights on system performance. - Refined error handling in synchronization processes for more reliable operations. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
.. also rename all the metrics with a prefix for easier grepping
Summary by CodeRabbit
New Features
Refactor