runtime/pprof: support efficient accumulation of custom event count profiles #18454

josharian · 2016-12-28T21:49:40Z

I want to be able to gather pprof-esque information about instantaneous events that have occurred over the lifetime of my program, possibly with sampling for performance reasons.

This is similar to what the heap profile does, at least when used with alloc_space and alloc_objects: it tracks memory allocations over a long period.

The existing runtime/pprof custom profile API seems ill-suited to this. (Insofar as I understand it. See #18453.) One could accomplish it by inventing a unique key for each Add and never calling Remove. However, this will result in a giant, ever-growing map. It would be far more efficient to just keep a counter per pc, as many of the runtime-provided profiles do. It might be worth considering adding a different kind of custom profile geared more towards this use case.
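A minimal sketch of the "counter per stack" idea, for illustration only. The names stackCounter and Record and the fixed depth of 32 frames are assumptions, not runtime/pprof API. Memory grows with the number of distinct call stacks rather than with the number of events, unlike the unique-key-per-Add workaround.

```go
package main

import (
	"runtime"
	"sync"
)

// stackKey is a fixed-size array of PCs so it can be used as a map key.
// The depth of 32 is an arbitrary choice for this sketch.
type stackKey [32]uintptr

// stackCounter keeps one counter per distinct call stack instead of one
// map entry per event.
type stackCounter struct {
	mu     sync.Mutex
	counts map[stackKey]int64
}

func newStackCounter() *stackCounter {
	return &stackCounter{counts: make(map[stackKey]int64)}
}

// Record increments the counter for the caller's stack.
func (c *stackCounter) Record() {
	var key stackKey
	// Skip runtime.Callers (0) and Record (1) so the stack starts at Record's caller.
	runtime.Callers(2, key[:])
	c.mu.Lock()
	c.counts[key]++
	c.mu.Unlock()
}

// Len reports how many distinct stacks have been recorded.
func (c *stackCounter) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.counts)
}
```

A thousand events from the same call site cost a single map entry.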

I don't have a concrete proposal, since I haven't thought about this deeply. This issue is just intended to open discussion, particularly since pprof labels are coming for 1.9, and it might be worth considering how they interact with custom profiles--hopefully productively.

cc @matloob

rsc · 2017-01-09T22:05:00Z

@josharian Something like

// SetSampleRate sets the profile to record samples randomly with probability 1/n.
// It must be called before any calls to Add, Remove, or Event.
func (*Profile) SetSampleRate(n int)

// Event records that 'weight' events happened.
// For a simple event counter, weight should be 1.
// If events have different expenses, weight can vary according to the expense.
// A given Profile should be populated using Add/Remove or Event, but not both.
func (*Profile) Event(weight float64)

?

The comment text is bad but you get the idea. This would capture the kinds of things that we do for the mutex profile as well as basic counters.
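One possible reading of the sketched semantics, assuming each Event is kept with probability 1/n and a kept event's weight is multiplied by n so the stored totals stay unbiased estimates of the true totals. eventProfile, Total, and keying by a single caller PC are hypothetical simplifications, not a proposed implementation.

```go
package main

import (
	"math/rand"
	"runtime"
)

// eventProfile is a hypothetical stand-in for the proposed Profile methods.
// It is not safe for concurrent use.
type eventProfile struct {
	rate   int                 // keep each event with probability 1/rate
	rng    *rand.Rand
	totals map[uintptr]float64 // caller PC -> estimated total weight
}

func newEventProfile(seed int64) *eventProfile {
	return &eventProfile{rate: 1, rng: rand.New(rand.NewSource(seed)), totals: make(map[uintptr]float64)}
}

// SetSampleRate must be called before any calls to Event.
func (p *eventProfile) SetSampleRate(n int) {
	if n < 1 {
		n = 1 // assumption: non-positive rates mean "record everything"
	}
	p.rate = n
}

// Event keeps the event with probability 1/rate and, when kept, scales its
// weight by rate so the totals estimate the unsampled sum.
func (p *eventProfile) Event(weight float64) {
	if p.rate > 1 && p.rng.Intn(p.rate) != 0 {
		return
	}
	pc, _, _, ok := runtime.Caller(1) // 1 = Event's caller
	if !ok {
		return
	}
	p.totals[pc] += weight * float64(p.rate)
}

// Total returns the estimated total weight across all call sites.
func (p *eventProfile) Total() float64 {
	var sum float64
	for _, v := range p.totals {
		sum += v
	}
	return sum
}
```

With n=10 and 100000 unit-weight events, Total should land near 100000 while only about a tenth of the events do any bookkeeping.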

I'd like to make sure we get the other pprof changes through first, but this seems like a reasonable followup.

josharian · 2017-01-09T22:37:40Z

@rsc yes! SGTM.

josharian · 2017-01-30T03:13:51Z

Opinions about implementing this using a probabilistic data structure like count-min-sketch-based top-k?
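For reference, a minimal count-min sketch of the kind mentioned here (only the counting structure, without the top-k heap that would sit beside it). Width, depth, and the per-row FNV seeding are arbitrary choices for illustration. Estimates never undercount, but hash collisions can make them overcount.

```go
package main

import "hash/fnv"

// countMin is a minimal count-min sketch: depth rows of width counters.
type countMin struct {
	width int
	rows  [][]uint64
}

func newCountMin(width, depth int) *countMin {
	rows := make([][]uint64, depth)
	for i := range rows {
		rows[i] = make([]uint64, width)
	}
	return &countMin{width: width, rows: rows}
}

// index hashes key, prefixed with the row number as a seed, to a column.
func (c *countMin) index(row int, key string) int {
	h := fnv.New64a()
	h.Write([]byte{byte(row)})
	h.Write([]byte(key))
	return int(h.Sum64() % uint64(c.width))
}

// Add increments key's counter in every row.
func (c *countMin) Add(key string, n uint64) {
	for i, row := range c.rows {
		row[c.index(i, key)] += n
	}
}

// Estimate returns the minimum counter across rows, an upper bound on key's count.
func (c *countMin) Estimate(key string) uint64 {
	var est uint64
	for i, row := range c.rows {
		v := row[c.index(i, key)]
		if i == 0 || v < est {
			est = v
		}
	}
	return est
}
```

With a 1x1 sketch every key collides, which makes the overcounting visible: after adding "a" five times and "b" three times, both estimate 8.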

josharian · 2017-02-13T20:00:16Z

Ping for opinions. (Or if someone else is going to implement this, that's great, I'll just wait patiently.)

rsc · 2017-02-13T20:07:59Z

@josharian Sorry, looks like I dropped a bunch of GitHub notifications about two weeks ago.

You have more context here than I do, but I don't think a new data structure is required for the API I sketched. There's nothing about "top N" in the usual profiles; it's supposed to be a representative sample of the overall behavior, not just the "weighty" behavior.

I think all that is needed is a func shouldSample(rate int, weight float64) bool that determines whether the event is added to the profile (and then never removed). I believe that code (if not that precise function) already exists for deciding whether to sample a memory allocation. Then the profile itself is a map[stack]float64. I don't know that we have one of those in runtime/pprof right now, but we will once my pending work is done (moving that piece from the runtime to runtime/pprof).
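A sketch of the suggested helper with the signature given above. The body is an assumption: it keeps an event with probability 1 - exp(-weight/rate), the chance that a Poisson process with mean spacing rate places at least one point in an interval of length weight. As far as I know, the runtime's allocation sampler is instead a stateful countdown, so this is not that code. The stack type and record helper are hypothetical.

```go
package main

import (
	"math"
	"math/rand"
)

// stack is a hypothetical fixed-depth stack used as the profile's map key.
type stack [32]uintptr

// shouldSample reports whether an event of the given weight is added to the
// profile. rate <= 0 means "keep everything" (an assumption for this sketch).
func shouldSample(rate int, weight float64) bool {
	if rate <= 0 {
		return true
	}
	p := -math.Expm1(-weight / float64(rate)) // 1 - exp(-weight/rate), accurate for small ratios
	return rand.Float64() < p
}

// record adds a sampled event to a map[stack]float64 profile. Entries are only
// ever added to, never removed. Raw weights are stored; turning them into
// unbiased estimates would require scaling each by 1/p.
func record(prof map[stack]float64, rate int, stk stack, weight float64) {
	if shouldSample(rate, weight) {
		prof[stk] += weight
	}
}
```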

josharian · 2017-02-13T20:12:30Z

Hmm. I'll look again once your pending work is done.

Sajmani · 2018-06-29T14:34:35Z

@josharian Perhaps the code in x/time/rate might be useful for this:
https://godoc.org/golang.org/x/time/rate#NewLimiter

mmcloughlin · 2021-05-13T07:50:51Z

I just had a use case where I wished the Event() API described above existed.

This is an old proposal that was accepted but seems to have gone stale. Is there still an appetite for this?

It also seems this could be tackled in two parts: the Event() API could be implemented independently of sampling, I think?

josharian · 2021-05-21T23:38:08Z

I’m still interested in using it, but don’t have a need urgent enough to implement myself.

gopherbot · 2022-05-07T00:17:21Z

Change https://go.dev/cl/404697 mentions this issue: runtime/pprof: add counting profile and sampling

rhysh · 2022-05-11T16:32:32Z

I have some working code for this, but there's a lot in the details of the interface that isn't clear to me. (I don't think there's enough time in the current cycle to nail down good answers for these.)

The Add method takes skip int as its second argument, so I included that in the Event signature too.

The built-in profiles that accumulate value over time (heap, block, mutex) give pairs of values for each record: the total weight, and the number of events sampled. I think that this type of custom profile should give that same pair, especially because the internal sampling (when enabled) will make it hard for users to track consistent counts on their own.
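A sketch of what reporting the (count, weight) pair could look like when sampling is internal. The sampleRecord type is hypothetical, and the 1/(1 - exp(-weight/rate)) scale factor assumes Poisson-style sampling with mean spacing rate; it is similar in spirit to how sampled heap profiles are scaled back up, not a claim about the final design.

```go
package main

import "math"

// sampleRecord holds the two values reported per stack: an estimated event
// count and an estimated total weight.
type sampleRecord struct {
	Count  float64
	Weight float64
}

// addSampled adds one sampled event, scaled by the inverse of the probability
// that an event of this weight was sampled, so Count and Weight estimate the
// unsampled totals. rate <= 0 means every event was kept (scale 1).
func (r *sampleRecord) addSampled(weight float64, rate int) {
	scale := 1.0
	if rate > 0 {
		scale = 1 / -math.Expm1(-weight/float64(rate))
	}
	r.Count += scale
	r.Weight += weight * scale
}
```

Small events are scaled up heavily (a weight-1 event at rate 1000 stands for roughly 1000 events), while events much heavier than the rate are counted about once.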

The heap profile uses a Poisson process for sampling. That gives small events a fair chance of being sampled, even when they consistently come after huge events (which would reset the clock). Each event is sampled 0 or 1 times. That seems like an appropriate approach to use here. But the smallest heap event has weight 1 (with an argument for it being 4 or 8), and the typical rate is 512*1024. There's special handling when the rate is 0 (sample nothing) or 1 (sample everything).
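A sketch of a heap-profile-style sampler for discussion, assuming an exponentially distributed distance to the next sample (mean rate) that is counted down by each event's weight. poissonSampler is a hypothetical name; the special cases for rate 0 and 1 mirror the ones described above.

```go
package main

import "math/rand"

// poissonSampler samples each event 0 or 1 times. It assumes rate >= 0.
type poissonSampler struct {
	rate int
	rng  *rand.Rand
	next float64 // weight remaining before the next sample
}

func newPoissonSampler(rate int, seed int64) *poissonSampler {
	s := &poissonSampler{rate: rate, rng: rand.New(rand.NewSource(seed))}
	s.reset()
	return s
}

// reset draws a new exponentially distributed distance with mean rate.
func (s *poissonSampler) reset() {
	s.next = s.rng.ExpFloat64() * float64(s.rate)
}

// Sample reports whether an event of the given weight should be recorded.
func (s *poissonSampler) Sample(weight float64) bool {
	switch s.rate {
	case 0:
		return false // sample nothing
	case 1:
		return true // sample everything
	}
	if weight < s.next {
		s.next -= weight
		return false
	}
	// The event reaches the sample point; the clock resets and any excess
	// weight beyond the sample point is discarded.
	s.reset()
	return true
}
```

Because the exponential distribution is memoryless, a small event right after a huge one still has the same chance of being sampled as any other small event.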

But accepting a float64 for weight in custom Event profiles means the weight can be less than 1. If a user makes 100 calls to Event with weight 0.1, I'd expect a profile with rate=10 to collect 1 sample. Setting rate=2 would mean 5 samples. But if the profile has rate=1, should it collect 10 samples or be a special case to collect every (all 100) samples? Collecting every sample means a big discontinuity, which isn't the case for the built-in profiles (which use integers for weight, often several orders of magnitude larger than 1).
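The arithmetic in this example, written out under the assumption that the expected number of samples is the total weight divided by the rate r (a good approximation when each weight is small compared with r):

```latex
\mathbb{E}[N] \approx \frac{1}{r}\sum_{i=1}^{100} w_i = \frac{100 \times 0.1}{r} = \frac{10}{r},
\qquad r = 10 \Rightarrow 1, \quad r = 2 \Rightarrow 5, \quad r = 1 \Rightarrow 10
```

If r = 1 were instead special-cased to keep every call, the same workload would jump from about 5 samples at r = 2 to 100 samples at r = 1, which is the discontinuity in question.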

If rate=1, should an Event with weight=1 always be sampled? Using a Poisson process means the average spacing between samples will be rate, but Event calls are discrete and will be sampled either 0 or 1 times, leading to a lower average spacing (because the sampled Events can only be spaced by 1, 2, 3, etc). And how different should the sampling be between Event calls with weight=1 vs weight=0.999? Maybe it's appropriate to roll any excess sampling weight over to the next counter, though that could give an artificial bump to tiny samples that come right after large ones.
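To make the rollover idea concrete, a hypothetical countdown sampler that can either discard the excess when an event overshoots the sample point (reset) or subtract the excess from the next interval (rollover). The interval source is pluggable so a constant interval of 1 can be used for a deterministic illustration; real sampling would draw exponential intervals.

```go
package main

// countdown compares two overshoot policies. Each event is sampled at most
// once, however large its weight.
type countdown struct {
	interval func() float64 // draws the next sampling distance
	carry    bool           // true: roll excess weight into the next interval
	next     float64        // weight remaining before the next sample
}

func newCountdown(interval func() float64, carry bool) *countdown {
	return &countdown{interval: interval, carry: carry, next: interval()}
}

// Sample reports whether an event of the given weight is sampled.
func (c *countdown) Sample(weight float64) bool {
	if weight < c.next {
		c.next -= weight
		return false
	}
	excess := weight - c.next
	c.next = c.interval()
	if c.carry {
		// A large overshoot can push next below zero, forcing the
		// following small events to be sampled too.
		c.next -= excess
	}
	return true
}
```

With a constant interval of 1 and weights 3, 0.25, 0.25, 0.25, 0.25, the reset policy samples only the first and last events, while rollover also samples the two small events that immediately follow the weight-3 event: the "artificial bump" described above.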

Here are the docs I wrote, which describe what I think are decent answers to those (default of rate=0 means collect everything, rate>=1 means use Poisson):

// SetSampleRate sets the profile to record a fraction of samples based on rate.
// For a Profile populated using Add/Remove, each call to Add has weight 1. For
// a Profile populated using Event, the sampling algorithm uses the weight
// provided in each of those calls.
//
// When rate is 0, the Profile stores every sample. When rate is 1 or greater,
// the Profile collects samples based on their weight, with an average
// spacing of rate between samples.
//
// It must be called before any calls to Add, Remove, or Event.
func (p *Profile) SetSampleRate(rate int)

// Event records an event with the given weight.
//
// For a simple event counter, weight should be 1. If events have different
// expenses, weight can vary according to the expense.
//
// The skip parameter has the same meaning as Add's skip and controls where the
// stack trace begins.
//
// A given Profile should be populated using Add/Remove or Event, but not both.
func (p *Profile) Event(weight float64, skip int)
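A sketch of how a hypothetical type could follow these doc comments. eventProfile, record, and the 32-frame stack limit are assumptions; only the Event side is shown (no Add/Remove); and "Poisson" is modelled as an exponential countdown with mean rate. Each stack keeps both a sampled count and a total weight.

```go
package main

import (
	"math/rand"
	"runtime"
	"sync"
)

type stackKey [32]uintptr

type record struct {
	Count  int64   // number of sampled events
	Weight float64 // sum of their weights
}

// eventProfile is hypothetical; it is not part of runtime/pprof.
type eventProfile struct {
	mu      sync.Mutex
	rate    int
	next    float64 // weight remaining until the next sample (rate >= 1)
	rng     *rand.Rand
	started bool
	records map[stackKey]*record
}

func newEventProfile(seed int64) *eventProfile {
	return &eventProfile{rng: rand.New(rand.NewSource(seed)), records: make(map[stackKey]*record)}
}

// SetSampleRate must be called before any call to Event.
func (p *eventProfile) SetSampleRate(rate int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.started {
		panic("SetSampleRate called after Event")
	}
	if rate < 0 {
		panic("negative sample rate")
	}
	p.rate = rate
	p.next = p.rng.ExpFloat64() * float64(rate)
}

// Event records an event with the given weight. skip=0 starts the stack at
// Event's caller, matching Add's documented skip behavior.
func (p *eventProfile) Event(weight float64, skip int) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.started = true
	if p.rate >= 1 {
		if weight < p.next {
			p.next -= weight
			return
		}
		p.next = p.rng.ExpFloat64() * float64(p.rate)
	}
	var key stackKey
	runtime.Callers(skip+2, key[:]) // skip runtime.Callers and Event itself
	r := p.records[key]
	if r == nil {
		r = &record{}
		p.records[key] = r
	}
	r.Count++
	r.Weight += weight
}
```

With the default rate of 0, every call is stored, including fractional weights; with rate 100, roughly one call in a hundred of weight 1 is kept.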

josharian added the Proposal label Dec 28, 2016

josharian added this to the Proposal milestone Dec 28, 2016

josharian changed the title ~~proposal: runtime/pprof: support efficient accumulative custom profiles~~ proposal: runtime/pprof: support efficient accumulation custom profiles Dec 28, 2016

rsc changed the title ~~proposal: runtime/pprof: support efficient accumulation custom profiles~~ runtime/pprof: support efficient accumulation of custom event count profiles Feb 13, 2017

rsc added Proposal-Accepted and removed Proposal labels Feb 13, 2017

rsc modified the milestones: Go1.9, Proposal Feb 13, 2017

rsc assigned josharian Feb 13, 2017

bradfitz modified the milestones: Go1.10, Go1.9 Jun 7, 2017

bradfitz modified the milestones: Go1.10, Go1.11 Nov 28, 2017

ianlancetaylor modified the milestones: Go1.11, Unplanned Jun 28, 2018

ianlancetaylor added the NeedsFix label (the path to resolution is known, but the work has not been done) Jun 28, 2018

josharian mentioned this issue Jun 29, 2018

proposal: os: profile open file descriptors through pprof #16379

Closed
