DAOS-17006 cart: Publish Mercury counters as metrics #15870

Merged
merged 3 commits into from
Feb 14, 2025

@mjmac
Contributor

mjmac commented Feb 7, 2025

When Mercury has been built with diagnostic RPC counters
enabled, CaRT will periodically republish the counters as DAOS
telemetry for consumption by monitoring infrastructure.

Change-Id: I3b0bcb260ad970798ac1cd838f8469c4cfbede55
Signed-off-by: Michael MacDonald <mjmac@google.com>
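
The behaviour described above could be modelled roughly as follows. This is an illustrative Python sketch, not the CaRT implementation: `Republisher`, `read_counters`, the metric name `rpc_req_sent`, and the one-second interval are all hypothetical stand-ins.

```python
from typing import Callable, Dict, Optional


class Republisher:
    """Hypothetical model: copy source counters into a metrics table on a timer."""

    def __init__(self, read_counters: Callable[[], Dict[str, int]],
                 interval_s: float, clock: Callable[[], float]) -> None:
        self.read_counters = read_counters  # stands in for querying Mercury
        self.interval_s = interval_s        # assumed republish period
        self.clock = clock                  # injected so the sketch is testable
        self.metrics: Dict[str, int] = {}   # stands in for DAOS telemetry
        self._last_run: Optional[float] = None

    def maybe_republish(self) -> bool:
        """Republish if the interval has elapsed; return True when it ran."""
        now = self.clock()
        if self._last_run is not None and now - self._last_run < self.interval_s:
            return False
        # Read-only copy of whatever the source currently reports.
        self.metrics.update(self.read_counters())
        self._last_run = now
        return True


# Usage with a fake source and a fake clock.
source = {"rpc_req_sent": 0}
fake_time = [0.0]
rep = Republisher(lambda: dict(source), interval_s=1.0, clock=lambda: fake_time[0])

ran_first = rep.maybe_republish()        # first call always publishes
source["rpc_req_sent"] = 5
fake_time[0] = 0.5
ran_early = rep.maybe_republish()        # too soon, skipped
early_value = rep.metrics["rpc_req_sent"]
fake_time[0] = 1.0
ran_later = rep.maybe_republish()        # interval elapsed, publishes
```

Injecting the clock keeps the sketch deterministic; a real progress loop would presumably use its own timer.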

github-actions bot commented Feb 7, 2025

Ticket title is 'Expose Mercury perf counters as DAOS metrics'
Status is 'In Review'
https://daosio.atlassian.net/browse/DAOS-17006

@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15870/1/testReport/

@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15870/2/testReport/

@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15870/2/testReport/

@mjmac mjmac changed the title from "WIP: republish mercury counters as engine metrics" to "DAOS-17006 cart: Publish Mercury counters as metrics" Feb 10, 2025
@mjmac
Contributor Author

mjmac commented Feb 10, 2025

@soumagne: FYI... Do you see any major problems with the approach? I think the additional overhead for this should be pretty small.

@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15870/3/testReport/

@mjmac mjmac force-pushed the mjmac/DAOS-17006 branch 2 times, most recently from 91c22aa to 310ee3e Compare February 10, 2025 21:44
@soumagne
Collaborator

That looks good to me, thanks! Right, there should not be any significant overhead.

When Mercury has been built with diagnostic RPC counters
enabled, CaRT will periodically republish the counters
as DAOS telemetry for consumption by monitoring
infrastructure.

Features: telemetry
Skip-nlt: true
Change-Id: I3b0bcb260ad970798ac1cd838f8469c4cfbede55
Signed-off-by: Michael MacDonald <mjmac@google.com>
@mjmac mjmac marked this pull request as ready for review February 11, 2025 13:40
@mjmac mjmac requested a review from frostedcmos February 11, 2025 13:40
@frostedcmos
Contributor

frostedcmos left a comment

Based on the Skype discussions, I was under the impression that we also require a 'reset counters' call to be performed after each query. Is the plan now different for how the counters will be consumed in the end?

One other comment inline.
LGTM otherwise.

@mjmac
Contributor Author

mjmac commented Feb 11, 2025

> Based on the Skype discussions, I was under the impression that we also require a 'reset counters' call to be performed after each query. Is the plan now different for how the counters will be consumed in the end?

We don't need to reset the counters. There was a misunderstanding about what was needed to re-export the Mercury diagnostics via DAOS metrics.
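
As a general illustration of why cumulative counters can be consumed without a reset (this is not taken from the PR's code): a monitoring consumer can recover per-interval activity by subtracting successive samples. The sample values and the `interval_deltas` helper below are hypothetical.

```python
from typing import List


def interval_deltas(samples: List[int]) -> List[int]:
    """Return the change between each pair of consecutive cumulative samples."""
    return [curr - prev for prev, curr in zip(samples, samples[1:])]


# Made-up cumulative readings taken at successive polls; the source is never reset.
samples = [100, 250, 400, 400]
deltas = interval_deltas(samples)
```

A flat stretch in the samples simply yields a zero delta, so an idle interval is still distinguishable from a busy one.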

frostedcmos previously approved these changes Feb 11, 2025
@daosbuild1
Collaborator

Test stage Functional on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15870/6/testReport/

  * convert active_rpcs metric to gauge
  * remove debug repub_count metrics from ftest

Features: telemetry
Skip-nlt: true
Change-Id: I44d2528f5d3fc55069cfc66f2fb387723d3b8c81
Signed-off-by: Michael MacDonald <mjmac@google.com>
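
For context on the first bullet: a count of currently active RPCs can presumably go down as well as up, which fits a gauge rather than a monotonic counter. A toy Python illustration of the distinction; `MonotonicCounter` and `Gauge` are hypothetical classes, not the DAOS telemetry API.

```python
class MonotonicCounter:
    """Value that may only increase, e.g. a total of completed requests."""

    def __init__(self) -> None:
        self.value = 0

    def inc(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a counter cannot decrease")
        self.value += n


class Gauge:
    """Value that is set to the latest reading and may rise or fall."""

    def __init__(self) -> None:
        self.value = 0

    def set(self, v: int) -> None:
        self.value = v


completed = MonotonicCounter()
completed.inc(3)

active = Gauge()
active.set(4)   # four requests in flight
active.set(1)   # three finished, one still in flight
```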
frostedcmos previously approved these changes Feb 11, 2025
Allow-unstable-test: true
Features: telemetry

Change-Id: I0232d0da8007374fd1d28d395c65544c7fa57bc1
Signed-off-by: Jeff Olivier <jeffolivier@google.com>
@daosbuild1
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15870/8/testReport/

@jolivier23 jolivier23 merged commit 2f1678f into google/2.6 Feb 14, 2025
68 of 72 checks passed
@jolivier23 jolivier23 deleted the mjmac/DAOS-17006 branch February 14, 2025 16:25
mjmac added a commit that referenced this pull request Feb 24, 2025
When Mercury has been built with diagnostic RPC counters
enabled, CaRT will periodically republish the counters
as DAOS telemetry for consumption by monitoring
infrastructure. NB: Requires Mercury > 2.4.0.

Change-Id: I0232d0da8007374fd1d28d395c65544c7fa57bc1
Signed-off-by: Michael MacDonald <mjmac@google.com>
Co-authored-by: Jeff Olivier <jeffolivier@google.com>
mjmac added a commit that referenced this pull request Feb 24, 2025
When Mercury has been built with diagnostic RPC counters
enabled, CaRT will periodically republish the counters
as DAOS telemetry for consumption by monitoring
infrastructure. NB: Requires Mercury > 2.4.0.

Change-Id: I0232d0da8007374fd1d28d395c65544c7fa57bc1
Signed-off-by: Michael MacDonald <mjmac@google.com>
Co-authored-by: Jeff Olivier <jeffolivier@google.com>
Co-authored-by: Nicholas Murphy <ncmurphy@google.com>
mjmac added a commit that referenced this pull request Feb 27, 2025
When Mercury has been built with diagnostic RPC counters
enabled, CaRT will periodically republish the counters
as DAOS telemetry for consumption by monitoring
infrastructure. NB: Requires Mercury > 2.4.0.

Signed-off-by: Michael MacDonald <mjmac@google.com>
Co-authored-by: Jeff Olivier <jeffolivier@google.com>
Co-authored-by: Nicholas Murphy <ncmurphy@google.com>
@mjmac mjmac mentioned this pull request Mar 18, 2025