Blazor - rendering metrics and tracing #61609

Open
wants to merge 17 commits into
base: main

pavelsavara
Member

@pavelsavara pavelsavara commented Apr 22, 2025

Better rendering metrics

  • new meter Microsoft.AspNetCore.Components
    aspnetcore.components.navigation.count - Total number of route changes.
    aspnetcore.components.event.duration - Duration of processing browser event asynchronously.
    aspnetcore.components.event.exceptions - Total number of exceptions during browser event processing.

  • new meter Microsoft.AspNetCore.Components.Lifecycle
    aspnetcore.components.update_parameters.duration - Duration of processing component parameters asynchronously.
    aspnetcore.components.update_parameters.exceptions - Total number of exceptions during processing of component parameters.
    aspnetcore.components.rendering.batch.duration - Duration of rendering batch.
    aspnetcore.components.rendering.batch.exceptions - Total number of exceptions during batch rendering.
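
For readers unfamiliar with System.Diagnostics.Metrics, the following is a minimal sketch of how instruments like the ones listed above could be declared and recorded. It is not the code in this PR: the class name, method names, units, and tag keys (`route`, `component.type`, `error.type`) are assumptions made for illustration; only the meter and instrument names come from the list above.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical sketch: class, method, unit, and tag names are illustrative only.
public sealed class ComponentsMetricsSketch : IDisposable
{
    private readonly Meter _meter = new("Microsoft.AspNetCore.Components");
    private readonly Counter<long> _navigationCount;
    private readonly Histogram<double> _eventDuration;
    private readonly Counter<long> _eventExceptions;

    public ComponentsMetricsSketch()
    {
        _navigationCount = _meter.CreateCounter<long>(
            "aspnetcore.components.navigation.count",
            unit: "{route}", // assumed unit
            description: "Total number of route changes.");

        _eventDuration = _meter.CreateHistogram<double>(
            "aspnetcore.components.event.duration",
            unit: "s",
            description: "Duration of processing browser event.");

        _eventExceptions = _meter.CreateCounter<long>(
            "aspnetcore.components.event.exceptions",
            unit: "{exception}", // assumed unit
            description: "Total number of exceptions during browser event processing.");
    }

    // One increment per route change; the tag key is an assumption.
    public void Navigated(string route) =>
        _navigationCount.Add(1, new KeyValuePair<string, object?>("route", route));

    // Durations are recorded in seconds to match the "s" unit.
    public void EventProcessed(TimeSpan elapsed, string componentType) =>
        _eventDuration.Record(elapsed.TotalSeconds,
            new KeyValuePair<string, object?>("component.type", componentType));

    public void EventFailed(Exception ex) =>
        _eventExceptions.Add(1,
            new KeyValuePair<string, object?>("error.type", ex.GetType().FullName));

    public void Dispose() => _meter.Dispose();
}
```

Measurements are only aggregated when something (for example an OpenTelemetry meter provider or a `MeterListener`) subscribes to the meter by name.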

Blazor activity tracing

  • new activity source Microsoft.AspNetCore.Components
  • Microsoft.AspNetCore.Components.OnCircuit: CIRCUIT {circuitId}
    • tags: circuit.id
    • links: HTTP activity
  • Microsoft.AspNetCore.Components.OnRoute: ROUTE {route} -> {componentType}
    • tags: circuit.id, component.type, route
    • links: HTTP trace, circuit trace
  • Microsoft.AspNetCore.Components.OnEvent: EVENT {attributeName} -> {componentType}.{methodName}
    • tags: circuit.id, component.type, component.method, attribute.name
    • links: HTTP trace, circuit trace, router trace
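
As a rough illustration of the event activity described above, here is a sketch of starting an activity with tags and links using `System.Diagnostics.ActivitySource`. The helper class and method are hypothetical and are not the PR's implementation; the activity name, display name format, and tag keys are taken from the list above.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sketch: helper names are illustrative, not the PR's code.
public static class ComponentsTracingSketch
{
    public static readonly ActivitySource Source = new("Microsoft.AspNetCore.Components");

    public static Activity? StartEventActivity(
        string circuitId, string componentType, string methodName, string attributeName,
        params ActivityContext[] linkedContexts)
    {
        // Links point at other traces (HTTP, circuit, route) without making them parents.
        var links = new List<ActivityLink>();
        foreach (var context in linkedContexts)
        {
            links.Add(new ActivityLink(context));
        }

        // StartActivity returns null when no listener samples this source.
        var activity = Source.StartActivity(
            "Microsoft.AspNetCore.Components.OnEvent",
            ActivityKind.Internal,
            parentContext: default,
            tags: null,
            links: links);
        if (activity is null)
        {
            return null;
        }

        activity.DisplayName = $"EVENT {attributeName} -> {componentType}.{methodName}";
        activity.SetTag("circuit.id", circuitId);
        activity.SetTag("component.type", componentType);
        activity.SetTag("component.method", methodName);
        activity.SetTag("attribute.name", attributeName);
        return activity;
    }
}
```

The caller is expected to dispose (stop) the returned activity once the event has been processed.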

image

image

image

builder.Services.ConfigureOpenTelemetryMeterProvider(meterProvider =>
{
    meterProvider.AddMeter("Microsoft.AspNetCore.Components");
    meterProvider.AddMeter("Microsoft.AspNetCore.Components.Server.Circuits");
});
builder.Services.ConfigureOpenTelemetryTracerProvider(tracerProvider =>
{
    tracerProvider.AddSource("Microsoft.AspNetCore.Components");
    //tracerProvider.AddSource("Microsoft.AspNetCore.SignalR.Server");
});

Feedback

TODO - Metrics need to be documented at https://learn.microsoft.com/en-us/aspnet/core/log-mon/metrics/built-in

Out of scope

  • WASM

Contributes to #53613
Contributes to #29846
Feedback for #61516

@pavelsavara pavelsavara force-pushed the blazor_metrics_feedback branch from 328a584 to cebb68e Compare April 23, 2025 18:01
# Conflicts:
#	src/Components/Components/src/PublicAPI.Unshipped.txt
@pavelsavara pavelsavara changed the title Blazor - rendering metrics - feedback Blazor - rendering metrics and tracing Apr 24, 2025
@JamesNK
Member

JamesNK commented Apr 24, 2025

You're adding a lot of metrics here. I think you should do some performance testing. There is performance overhead of metrics - they require some synchronization when incrementing counters and recording values.

Having many low level metrics could cause performance issues.
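
One common way to limit the overhead mentioned here is to skip the timing work entirely when nobody is listening to the instrument. The following is a sketch under that assumption, using the `Enabled` property that every `Instrument` exposes; the helper name is hypothetical and this is not a claim about what the PR does.

```csharp
using System;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Threading.Tasks;

// Hypothetical sketch: shows the Enabled guard, not the PR's actual code.
public static class TimingSketch
{
    public static async Task InvokeTimedAsync(Histogram<double> duration, Func<Task> handler)
    {
        // Fast path: no listener subscribed, so avoid timestamps and recording.
        if (!duration.Enabled)
        {
            await handler();
            return;
        }

        long start = Stopwatch.GetTimestamp();
        try
        {
            await handler();
        }
        finally
        {
            // Recorded in seconds; runs even when the handler throws.
            duration.Record(Stopwatch.GetElapsedTime(start).TotalSeconds);
        }
    }
}
```

`Stopwatch.GetElapsedTime` requires .NET 7 or later. The guard only removes the cost when metrics are off; the cost with an active listener still needs to be measured.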

@pavelsavara
Member Author

pavelsavara commented Apr 25, 2025

Having many low level metrics could cause performance issues.

I removed a few and kept only the most useful ones.
Only aspnetcore.components.parameters.duration is recorded per Blazor component.
It's async and executes customer's business logic.
If they have thousands of them they are in trouble anyway. And this will help them to figure it out.
The rest of them are per request, which should be OK.

I have 2 remaining issues

  • On the metrics/duration histogram, I'm unable to see any time bigger than 5 in the dashboard. I assume it is the 0.005s bucket, even though I added a 1s delay and verified that the stopwatch reported 1.1s TotalSeconds. I think there is some problem with aggregation or display? Or maybe all the other fast samples are drowning this one out?
  • On tracing/activities, I would like to link the HTTP activity from the place where SignalR creates the Blazor circuit. So I capture its Activity.Current.Context and use it later to AddLink() on my activity. In some cases it leads to the HTTP activity, but in many cases the HTTP activity is not in the dashboard at all. I'm thinking it could be sampling. I would like to skip the link if I know that the HTTP activity was not selected for sampling. But the HTTP activity always (on a dev machine without pressure) has .Recorded true and is sometimes missing from the dashboard anyway.
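
The "skip the link when the source activity was not sampled" idea from the second bullet could look roughly like the sketch below. The helper is hypothetical; it assumes .NET 9 or later, where `Activity.AddLink` is available, and it checks the `Recorded` flag on the captured `ActivityContext`.

```csharp
using System.Diagnostics;

// Hypothetical sketch of conditional linking; not the PR's code.
public static class LinkSketch
{
    // Returns true when a link to the captured context was added.
    public static bool LinkIfRecorded(Activity activity, ActivityContext captured)
    {
        // The Recorded flag reflects the sampling decision for the captured activity.
        if ((captured.TraceFlags & ActivityTraceFlags.Recorded) == 0)
        {
            return false;
        }

        activity.AddLink(new ActivityLink(captured));
        return true;
    }
}
```

Note that a set `Recorded` flag says nothing about whether the linked activity has already been stopped and exported.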

I would appreciate hints, many thanks! @noahfalk @JamesNK

@BrennanConroy
Member

  • I would like to link HTTP activity from the place where SignalR is creating the Blazor circuit. So I capture its Activity.Current.Context and use it later to AddLink() on my activity.

I don't know how Blazor circuits are created, but if it's from a Hub method then Activity.Current won't be the HTTP activity. We hop off the HTTP activity on purpose in SignalR:

// Hub invocation gets its parent from a remote source. Clear any current activity and restore it later.
var previousActivity = Activity.Current;
if (previousActivity != null)
{
    Activity.Current = null;
}

  • in many cases the HTTP activity is not in the dashboard at all

Is that because the HTTP request is still running? I don't think activities show up in the dashboard until they're stopped, and if you're using SignalR you're likely using a websocket request which is long running.

@pavelsavara
Member Author

I don't know how Blazor circuits are created, but if it's from a Hub method then Activity.Current won't be the HTTP activity. We hop off the HTTP activity on purpose in SignalR:

I'm capturing Activity.Current.Context in the ComponentHub constructor, which runs before any Activity changes in the SignalR layer, so I'm able to capture the HTTP Activity. The capture works: I test the Name and I get the TraceId just fine.

Is that because the HTTP request is still running? I don't think activities show up in the dashboard until they're stopped, and if you're using SignalR you're likely using a websocket request which is long running.

This is it, thank you @BrennanConroy !

@pavelsavara
Member Author

pavelsavara commented Apr 28, 2025

Long-running activities in Blazor are also a topic to discuss.

  • current circuit - this is an object in memory, with a SignalR connection. It could live for hours or even days.
  • current route - this is more logical than physical. But it makes sense to link click event activities to the current route.

I think we have 2 ways to deal with them:

  • keep them open for the whole long duration
    • should we install them back to Activity.Current on next HTTP/SignalR request for child Activity sake ?
    • Should we use parent relationship instead of link ? I understood the feedback in 29846 to mean that long running parent makes bad UX in the dashboard.
  • close them soon - at the end of the current event/click/navigation
    • this makes them immediately visible in dashboard/OTEL and linkable
    • but we don't capture the true duration or other sub-spans: disconnect, reconnect, close

Right now I have short+links implementation.

I guess developers use OTEL mostly in production and so even the long running traces would be recorded already.

But maybe developers also use it in the inner dev loop? In that case it would be great to have a "trace preview" for things that have started but not stopped yet, so that others don't get confused the same way I did.

@pavelsavara pavelsavara requested a review from samsp-msft April 28, 2025 08:21
@pavelsavara pavelsavara marked this pull request as ready for review April 28, 2025 09:51
@pavelsavara pavelsavara requested a review from a team as a code owner April 28, 2025 09:51
@samsp-msft
Member

Adding a general naming one here - Microsoft.AspNetCore.Components I would have no idea that this was for Blazor. I can understand why you may not want to directly put Blazor in the name, but maybe Microsoft.AspNetCore.UI or Microsoft.AspNetCore.UIComponents would be more obvious?

description: "Total number of exceptions during browser event processing.");

_parametersDuration = _meter.CreateHistogram(
"aspnetcore.components.parameters.duration",
Member

As a neophyte Blazor developer, I don't quite understand why you'd want metrics broken down to the level of parameters. I am probably misunderstanding what this represents?

Member Author

@pavelsavara pavelsavara Apr 29, 2025

Blazor "parameters" are properties of a component that can receive values from its parent component, marked with the [Parameter] attribute. They enable data to flow down from parent to child components. When Blazor parameters change, the component goes through a re-rendering cycle. I think it is a well-defined term.

The duration measured here is the act of parameter propagation and the user business logic that is triggered by it.

See also https://learn.microsoft.com/en-us/aspnet/core/blazor/components/cascading-values-and-parameters?view=aspnetcore-9.0

Member Author

@pavelsavara pavelsavara Apr 29, 2025

I think your feedback is that meaning of individual diagnostic instruments needs to be documented after we are done here.

cc @guardrex

Member

@noahfalk noahfalk Apr 29, 2025

Ideally an experienced Blazor developer should be able to make a good guess at the meaning of the metric given just its name. I'm fine to assume those devs understand the meaning of 'parameters'. As described above it sounded like 'parameters' is a noun that doesn't inherently have a notion of time duration associated with it? Perhaps we could name this something like aspnetcore.components.update_parameters.duration? I'm not sure if there is a better term than update that Blazor uses.

Member Author

aspnetcore.components.update_parameters.duration

Sounds good to me.

Member

I think the granularity here is probably too fine. I don't think we need to be tracking this on a per component/parameter basis - there could potentially be hundreds of those on a page. I would suggest that we focus on what the end user will see, which is that they take an action and that results in an update to the page. That will admittedly include a network round-trip, but understanding it from the server level is probably sufficient, as that is what is in the developer's control (unlike the network from the browser).

Member Author

@pavelsavara pavelsavara Apr 30, 2025

The user code that's running in the triggered events could make async HTTP calls or database calls. If it follows the SELECT N+1 anti-pattern, that would be visible here.

Those problems are currently not easy to diagnose, especially if the components are from different vendors or teams.

I think it's good to know which component was rendered when state changed. How many times and how long it took.

The action they could take based on this data, is to cache/redesign data acquisition in their components or reduce number of components or tree depth. Maybe also reduce percentage of cases that the sub-tree is re-rendered on data propagation.

Maybe we could have a separate meter called Microsoft.AspNetCore.Components.Lifecycle which has this finer granularity, and Microsoft.AspNetCore.Components could be for the big events.

@danroth27 thoughts ?

Member

I think it's good to know which component was rendered when state changed. How many times and how long it took.

I agree

@pavelsavara
Member Author

Adding a general naming one here - Microsoft.AspNetCore.Components I would have no idea that this was for Blazor.

Microsoft.AspNetCore.Components is the C# namespace in which all of Blazor has lived since forever.

@noahfalk
Member

Microsoft.AspNetCore.Components.OnCircuit: CIRCUIT {circuitId}
tags: circuit.id
links: HTTP activity
Microsoft.AspNetCore.Components.OnRoute: ROUTE {route} -> {componentType}
tags: circuit.id, component.type, route
links: HTTP trace, circuit trace
Microsoft.AspNetCore.Components.OnEvent: EVENT {attributeName} -> {componentType}.{methodName}
tags: circuit.id, component.type, component.method, attribute.name
links: HTTP trace, circuit trace, router trace

I'm a bit confused by the distributed tracing part of this. Partly that might be my limited experience with Blazor. I'm not sure what period of time is being measured by the different OnCircuit, OnRoute, OnEvent spans. For example I know what a Blazor circuit is but I don't know what 'OnCircuit' is measuring. Is this a span that measures the entire duration of 1 blazor circuit?

I'll probably have more questions once I understand what each of the spans represents.

@danroth27
Member

danroth27 commented Apr 29, 2025

aspnetcore.components.rendering.batch.duration - Duration of rendering batch.

I don't think we currently define in our public Blazor docs what a render batch is. The only public mention of render batches that I could find is the CircuitOptions.MaxBufferedUnacknowledgedRenderBatches property. So, it's not clear to me what this value represents or whether it is useful. Should we be measuring something else that is more directly correlated with the publicly documented component lifecycle? Or do we want to introduce the concept of a render batch in our docs?

aspnetcore.components.rendering.batch.exception - Total number of exceptions during batch rendering.

Again, since render batch isn't currently a publicly defined concept, should we be counting the number of exceptions per some other period?

aspnetcore.components.event.duration - Duration of processing browser event asynchronously.

I assume this is an average duration of all browser event handlers across the entire app regardless of render mode. That seems reasonable as a high-level view of the responsiveness of the app. But what does the "asynchronously" imply? Are synchronous event handlers not included in this metric?

aspnetcore.components.parameters.duration - Duration of processing component parameters asynchronously.

I'm not sure what's included in the "processing" of component parameters. Is this the duration of the OnParametersSet component lifecycle event?

aspnetcore.components.navigation.count - Total number of route changes.

Is this a total count of all page navigations across the entire app regardless of render mode? What would that be used for?

@noahfalk
Member

What would that be used for?

I was under the (perhaps mistaken?) impression that the scenarios using the metrics had already been looked at and appropriate metrics identified. If not, perhaps a good starting point is to identify what diagnostic questions we'd like users to be able to solve here. Usually I'd recommend:

  • P0 - Is my service healthy? (Most responses are successful/correct, latency is reasonable)
  • P1 - What is the load on my service? (How many requests/sec?)
  • P1 - How many resources is my service using? (Often CPU, memory, disk are already measured elsewhere, but maybe a service would have some specialized resource it consumes that needs to be measured?)
  • P2 - Why is my service unhealthy? Metrics can be used to narrow down or even root cause problems but beware the slippery slope of adding too many metrics that are only useful in ever more niche cases. It's not unreasonable to require logs or distributed tracing for problem diagnosis.

pavelsavara and others added 3 commits April 30, 2025 12:22
Co-authored-by: Noah Falk <noahfalk@users.noreply.github.com>
Co-authored-by: Noah Falk <noahfalk@users.noreply.github.com>
Co-authored-by: Noah Falk <noahfalk@users.noreply.github.com>
@pavelsavara
Member Author

pavelsavara commented Apr 30, 2025

P0-P2 - This is a useful angle, thanks!

P0 - Is my service healthy? (Most responses are successful/correct, latency is reasonable)

  • aspnetcore.components.event.exceptions - a click on which component causes what type of exception? Stats, not a trace; this doesn't tell you which sub-component failed.
  • aspnetcore.components.event.duration - a click on which component is slow?

P1 - What is the load on my service? (How many requests/sec?)

  • aspnetcore.components.circuits.count - how many sessions I processed today ?
  • aspnetcore.components.event.duration - this is also a counter, how many clicks I processed today ?
  • aspnetcore.components.circuits.duration - this is session duration, interesting for many other KPIs
  • aspnetcore.components.navigation.count - how many different Blazor pages people visited ? Route as tag.

P1 - How many resources is my service using?

  • aspnetcore.components.circuits.active_circuits - proxy to how much memory the sessions state holds ?
  • aspnetcore.components.circuits.connected_circuits - how many SignalR connections are open ?
  • Delta between the two above is about WebSocket disconnect/re-connect, network quality, browser tab going to sleep etc.

P2 - Why is my service unhealthy?

  • aspnetcore.components.parameters.exceptions - which component failed ? Exception type tag/stats, no trace.
  • aspnetcore.components.rendering.batch.exceptions - which component failed ?
  • aspnetcore.components.parameters.duration - which component makes my app rendering slow ?
  • aspnetcore.components.rendering.batch.duration - which component makes events slow ? Also how many UI elements were in the diff ?

Note, I also mention aspnetcore.components.circuits metrics which already landed before, but we can improve them too if you want.

@pavelsavara
Member Author

pavelsavara commented Apr 30, 2025

For example I know what a Blazor circuit is but I don't know what 'OnCircuit' is measuring. Is this a span that measures the entire duration of 1 blazor circuit?

This goes back to my questions about long running activities.
That activity is relatively short-lived at the moment, compared to the whole lifetime of the circuit. We can change that, with consequences for the inner dev loop.

We can definitely improve naming.

OnCircuit - represents the logical circuit (duration), but at the moment we stop it earlier.
OnRoute - represents the logical route/page in the app. Logically it should be active until you navigate elsewhere, but right now we stop it early. It links to the circuit (when Blazor is interactive).
OnEvent - in interactive Blazor, something was clicked. It typically happens inside a specific route and circuit, which are linked.

Right now, the short circuit and route activities mostly serve as something that click event activities could link to. For the context.

@pavelsavara
Member Author

pavelsavara commented Apr 30, 2025

aspnetcore.components.rendering.batch.duration - Duration of rendering batch.

I don't think we currently define in our public Blazor docs what a render batch is.

Maybe we just need to rename it? Anyway, this is more on the troubleshooting side, for a misbehaving component. Producing long diffs/batches leads to network traffic, latency and slow rendering.

As I suggested above, we could have separate namespace for it with separate opt-in.
Or we could drop it and circle back in the future.

aspnetcore.components.rendering.batch.exception - Total number of exceptions during batch rendering.
should we be counting the number of exceptions per some other period?

We also count exceptions per click/event. But I need to see if the exceptions from batch related problems would appear there.

aspnetcore.components.event.duration - Duration of processing browser event asynchronously.

I assume this is an average duration of all browser event handlers across the entire app regardless of render mode.

At the moment this works only for SignalR interactive. I think we could also make it work for form-submit.
Making it work for WASM means that we need to fix OTEL for WASM and implement some publishing for it.
It's out of scope for .NET 10.

what does the "asynchronously" imply?

I already renamed this and dropped "async". It means including your DB request or whatever async business logic.

Is this the duration of the OnParametersSet component lifecycle event?

Yes, or OnInitialized.

aspnetcore.components.navigation.count - Total number of route changes.

Is this a total count of all page navigations across the entire app regardless of render mode?

Except WASM.

What would that be used for?

It has the route pattern as tag/dimension that you can use as filter. It's more business oriented KPI. Which of my pages are hot ?

@pavelsavara
Member Author

pavelsavara commented Apr 30, 2025

Making the circuit/route activity/trace long-lived causes trouble with re-installing it into Activity.Current.
It also causes trouble displaying an hours-long activity with sub-spans in the UI.
So I think there is little benefit.

If we keep them short, maybe they should be literally 0ms long. Just a context anchor, grouping other traces.
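
A sketch of the "context anchor" idea: start an activity, tag it, stop it right away, and keep only its `ActivityContext` so that later event activities can link to it. The helper name is hypothetical; the duration is near zero rather than exactly 0ms, and the `circuit.id` tag is taken from the PR description.

```csharp
using System.Diagnostics;

// Hypothetical sketch of a short-lived anchor activity; not the PR's code.
public static class AnchorSketch
{
    // Returns the context of the anchor, or default when nobody samples the source.
    public static ActivityContext CreateAnchor(ActivitySource source, string name, string circuitId)
    {
        // "using" stops the activity when the method returns, so it is exported immediately.
        using Activity? anchor = source.StartActivity(name);
        anchor?.SetTag("circuit.id", circuitId);
        return anchor?.Context ?? default;
    }
}
```

The returned context can be stored on the circuit and passed to `new ActivityLink(context)` when later activities are created.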

Re Activity names: they are not very visible in the Aspire UI, and DisplayName prevails.

The circuit Activity/trace is created in the internal ComponentHub.StartCircuit, so maybe StartCircuit is a good name. There is also the public API CircuitHandler.OnCircuitOpenedAsync, but I dislike the Async postfix, and OnCircuitOpenedAsync also happens a short while later, being triggered by another message from JS. But OnCircuitOpened would be my second choice.

Route Activity/trace is created in Router.SetParametersAsync -> Router.Refresh. Not great names for the activity.
Maybe we can change it to OnRouteChanged

Regarding click/event: we already have the concept of an event. The activity should be active through the whole duration of DispatchEventAsync and would be the parent for any business logic and distributed HTTP client traces.
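
To illustrate the parenting behaviour described here, the sketch below keeps an activity open for the whole awaited handler. Because `Activity.Current` flows with the async context, any activity started inside the handler (for example by an HTTP client) would become its child. `DispatchAsync` is a hypothetical stand-in, not Blazor's `DispatchEventAsync`.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical sketch: wraps an async handler in an activity; not the PR's code.
public static class EventDispatchSketch
{
    public static async Task DispatchAsync(ActivitySource source, string displayName, Func<Task> handler)
    {
        // Null when no listener samples the source; the handler still runs.
        using Activity? activity = source.StartActivity("Microsoft.AspNetCore.Components.OnEvent");
        if (activity is not null)
        {
            activity.DisplayName = displayName;
        }

        try
        {
            // Activity.Current is this activity for the whole awaited call.
            await handler();
        }
        catch (Exception ex)
        {
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            throw;
        }
    }
}
```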

I would also like the event Activity to trigger for form submit, interop calls from JS, and enhanced navigation.

Maybe we can change it to Event or BrowserEvent ?

Labels
area-blazor Includes: Blazor, Razor Components
Projects
None yet
7 participants