Commit bd640c8

runtime: disable stack rescanning by default
With the hybrid barrier in place, we can now disable stack rescanning by default. This commit adds a "gcrescanstacks" GODEBUG variable that is off by default but can be set to re-enable STW stack rescanning. The plan is to leave this off but available in Go 1.8 for debugging and as a fallback.

With this change, worst-case mark termination time at GOMAXPROCS=12 *not* including time spent stopping the world (which is still unbounded) is reliably under 100 µs, with a 95%ile around 50 µs in every benchmark I tried (the go1 benchmarks, the x/benchmarks garbage benchmark, and the gcbench activegs and rpc benchmarks). Including time spent stopping the world usually adds about 20 µs to total STW time at GOMAXPROCS=12, but I've seen it add around 150 µs in these benchmarks when a goroutine takes time to reach a safe point (see issue #10958) or when stopping the world races with goroutine switches. At GOMAXPROCS=1, where this isn't an issue, worst-case STW is typically 30 µs.

The go-gcbench activegs benchmark is designed to stress large numbers of dirty stacks. This commit reduces 95%ile STW time for 500k dirty stacks by nearly three orders of magnitude, from 150 ms to 195 µs.

This has little effect on the throughput of the go1 benchmarks or the x/benchmarks benchmarks.

name         old time/op  new time/op  delta
XGarbage-12  2.31ms ± 0%  2.32ms ± 1%  +0.28%  (p=0.001 n=17+16)
XJSON-12     12.4ms ± 0%  12.4ms ± 0%  +0.41%  (p=0.000 n=18+18)
XHTTP-12     11.8µs ± 0%  11.8µs ± 1%    ~     (p=0.492 n=20+18)

It reduces the tail latency of the x/benchmarks HTTP benchmark:

name      old p50-time  new p50-time  delta
XHTTP-12    489µs ± 0%    491µs ± 1%  +0.54%  (p=0.000 n=20+18)

name      old p95-time  new p95-time  delta
XHTTP-12    957µs ± 1%    960µs ± 1%  +0.28%  (p=0.002 n=20+17)

name      old p99-time  new p99-time  delta
XHTTP-12   1.76ms ± 1%   1.64ms ± 1%  -7.20%  (p=0.000 n=20+18)

Comparing to the beginning of the hybrid barrier implementation ("runtime: parallelize STW mcache flushing") shows that the hybrid barrier trades a small performance impact for much better STW latency, as expected. The magnitude of the performance impact is generally small:

name                      old time/op  new time/op  delta
BinaryTree17-12            2.37s ± 1%   2.42s ± 1%  +2.04%  (p=0.000 n=19+18)
Fannkuch11-12              2.84s ± 0%   2.72s ± 0%  -4.00%  (p=0.000 n=19+19)
FmtFprintfEmpty-12        44.2ns ± 1%  45.2ns ± 1%  +2.20%  (p=0.000 n=17+19)
FmtFprintfString-12        130ns ± 1%   134ns ± 0%  +2.94%  (p=0.000 n=18+16)
FmtFprintfInt-12           114ns ± 1%   117ns ± 0%  +3.01%  (p=0.000 n=19+15)
FmtFprintfIntInt-12        176ns ± 1%   182ns ± 0%  +3.17%  (p=0.000 n=20+15)
FmtFprintfPrefixedInt-12   186ns ± 1%   187ns ± 1%  +1.04%  (p=0.000 n=20+19)
FmtFprintfFloat-12         251ns ± 1%   250ns ± 1%  -0.74%  (p=0.000 n=17+18)
FmtManyArgs-12             746ns ± 1%   761ns ± 0%  +2.08%  (p=0.000 n=19+20)
GobDecode-12              6.57ms ± 1%  6.65ms ± 1%  +1.11%  (p=0.000 n=19+20)
GobEncode-12              5.59ms ± 1%  5.65ms ± 0%  +1.08%  (p=0.000 n=17+17)
Gzip-12                    223ms ± 1%   223ms ± 1%  -0.31%  (p=0.006 n=20+20)
Gunzip-12                 38.0ms ± 0%  37.9ms ± 1%  -0.25%  (p=0.009 n=19+20)
HTTPClientServer-12       77.5µs ± 1%  78.9µs ± 2%  +1.89%  (p=0.000 n=20+20)
JSONEncode-12             14.7ms ± 1%  14.9ms ± 0%  +0.75%  (p=0.000 n=20+20)
JSONDecode-12             53.0ms ± 1%  55.9ms ± 1%  +5.54%  (p=0.000 n=19+19)
Mandelbrot200-12          3.81ms ± 0%  3.81ms ± 1%  +0.20%  (p=0.023 n=17+19)
GoParse-12                3.17ms ± 1%  3.18ms ± 1%    ~     (p=0.057 n=20+19)
RegexpMatchEasy0_32-12    71.7ns ± 1%  70.4ns ± 1%  -1.77%  (p=0.000 n=19+20)
RegexpMatchEasy0_1K-12     946ns ± 0%   946ns ± 0%    ~     (p=0.405 n=18+18)
RegexpMatchEasy1_32-12    67.2ns ± 2%  67.3ns ± 2%    ~     (p=0.732 n=20+20)
RegexpMatchEasy1_1K-12     374ns ± 1%   378ns ± 1%  +1.14%  (p=0.000 n=18+19)
RegexpMatchMedium_32-12    107ns ± 1%   107ns ± 1%    ~     (p=0.259 n=18+20)
RegexpMatchMedium_1K-12   34.2µs ± 1%  34.5µs ± 1%  +1.03%  (p=0.000 n=18+18)
RegexpMatchHard_32-12     1.77µs ± 1%  1.79µs ± 1%  +0.73%  (p=0.000 n=19+18)
RegexpMatchHard_1K-12     53.6µs ± 1%  54.2µs ± 1%  +1.10%  (p=0.000 n=19+19)
Template-12               61.5ms ± 1%  63.9ms ± 0%  +3.96%  (p=0.000 n=18+18)
TimeParse-12               303ns ± 1%   300ns ± 1%  -1.08%  (p=0.000 n=19+20)
TimeFormat-12              318ns ± 1%   320ns ± 0%  +0.79%  (p=0.000 n=19+19)
Revcomp-12 (*)             509ms ± 3%   504ms ± 0%    ~     (p=0.967 n=7+12)
[Geo mean]                54.3µs       54.8µs       +0.88%

(*) Revcomp is highly non-linear, so I only took samples with 2 iterations.

name         old time/op  new time/op  delta
XGarbage-12  2.25ms ± 0%  2.32ms ± 1%  +2.74%  (p=0.000 n=16+16)
XJSON-12     11.6ms ± 0%  12.4ms ± 0%  +6.81%  (p=0.000 n=18+18)
XHTTP-12     11.6µs ± 1%  11.8µs ± 1%  +1.62%  (p=0.000 n=17+18)

Updates #17503.

Updates #17099, since you can't have a rescan list bug if there's no rescan list. I'm not marking it as fixed, since gcrescanstacks can still be set to re-enable the rescan lists.

Change-Id: I6e926b4c2dbd4cd56721869d4f817bdbb330b851
Reviewed-on: https://go-review.googlesource.com/31766
Reviewed-by: Rick Hudson <rlh@golang.org>
1 parent 5380b22 commit bd640c8


4 files changed (+29, -2 lines)

src/runtime/extern.go

Lines changed: 5 additions & 0 deletions
@@ -57,6 +57,11 @@ It is a comma-separated list of name=val pairs setting these named variables:
 	gcstackbarrierall: setting gcstackbarrierall=1 installs stack barriers
 	in every stack frame, rather than in exponentially-spaced frames.
 
+	gcrescanstacks: setting gcrescanstacks=1 enables stack
+	re-scanning during the STW mark termination phase. This is
+	helpful for debugging if objects are being prematurely
+	garbage collected.
+
 	gcstoptheworld: setting gcstoptheworld=1 disables concurrent garbage collection,
 	making every garbage collection a stop-the-world event. Setting gcstoptheworld=2
 	also disables concurrent sweeping after the garbage collection finishes.
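GODEBUG is documented above as a comma-separated list of name=val pairs, so turning on gcrescanstacks for a program that already sets other GODEBUG options means merging into the existing value rather than overwriting it. A minimal sketch, assuming a hypothetical helper `setGODEBUG` (it is not part of any library); per the commit message the setting is meant to be available in Go 1.8, so it only has an effect on a runtime that recognizes it:

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// setGODEBUG is a hypothetical helper, not a standard-library function.
// It returns a copy of env whose GODEBUG entry contains name=val, keeping
// any other comma-separated settings and replacing an earlier value of name.
func setGODEBUG(env []string, name, val string) []string {
	const prefix = "GODEBUG="
	out := make([]string, 0, len(env)+1)
	var pairs []string
	for _, kv := range env {
		if !strings.HasPrefix(kv, prefix) {
			out = append(out, kv) // unrelated variable: copy unchanged
			continue
		}
		for _, p := range strings.Split(strings.TrimPrefix(kv, prefix), ",") {
			if p == "" || strings.HasPrefix(p, name+"=") {
				continue // drop empty fields and any old setting for name
			}
			pairs = append(pairs, p)
		}
	}
	pairs = append(pairs, name+"="+val)
	return append(out, prefix+strings.Join(pairs, ","))
}

func main() {
	// The resulting slice would typically be passed to a child process,
	// for example through the Env field of an os/exec Cmd.
	env := setGODEBUG(os.Environ(), "gcrescanstacks", "1")
	for _, kv := range env {
		if strings.HasPrefix(kv, "GODEBUG=") {
			fmt.Println(kv)
		}
	}
}
```

For a one-off run from a shell, prefixing the command with `GODEBUG=gcrescanstacks=1` achieves the same thing.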

src/runtime/mgc.go

Lines changed: 1 addition & 1 deletion
@@ -1600,7 +1600,7 @@ func gcMark(start_time int64) {
 	work.ndone = 0
 	work.nproc = uint32(gcprocs())
 
-	if work.full == 0 && work.nDataRoots+work.nBSSRoots+work.nSpanRoots+work.nStackRoots+work.nRescanRoots == 0 {
+	if debug.gcrescanstacks == 0 && work.full == 0 && work.nDataRoots+work.nBSSRoots+work.nSpanRoots+work.nStackRoots+work.nRescanRoots == 0 {
 		// There's no work on the work queue and no root jobs
 		// that can produce work, so don't bother entering the
 		// getfull() barrier.

src/runtime/mgcmark.go

Lines changed: 14 additions & 1 deletion
@@ -126,7 +126,7 @@ func gcMarkRootCheck() {
 
 	lock(&allglock)
 	// Check that stacks have been scanned.
-	if gcphase == _GCmarktermination {
+	if gcphase == _GCmarktermination && debug.gcrescanstacks > 0 {
 		for i := 0; i < len(allgs); i++ {
 			gp := allgs[i]
 			if !(gp.gcscandone && gp.gcscanvalid) && readgstatus(gp) != _Gdead {
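This hunk limits the mark-termination stack check in gcMarkRootCheck to runs with gcrescanstacks set. Presumably this is because, with rescanning off, queueRescan clears gcscanvalid without queuing anything, so a goroutine whose stack changed after its scan would fail the check even though the hybrid barrier no longer requires a rescan. The sketch below models only the predicate visible in the hunk; the `goroutine` type, the `phase` constants, and `unscannedStacks` are hypothetical stand-ins, and it simply reports offenders because the hunk is cut off before showing what the runtime does with them:

```go
package main

import "fmt"

// phase is a hypothetical stand-in for the runtime's gcphase values.
type phase int

const (
	phaseMark phase = iota
	phaseMarkTermination
)

// goroutine is a hypothetical stand-in for the runtime's g.
type goroutine struct {
	id        int
	scanDone  bool // models gp.gcscandone
	scanValid bool // models gp.gcscanvalid
	dead      bool // models readgstatus(gp) == _Gdead
}

// unscannedStacks returns the IDs of goroutines that are not dead and do
// not have a completed, still-valid stack scan. Like the patched check, it
// only looks during mark termination and only when rescanning is enabled.
func unscannedStacks(p phase, rescanStacks int32, gs []goroutine) []int {
	var bad []int
	if p == phaseMarkTermination && rescanStacks > 0 {
		for _, gp := range gs {
			if !(gp.scanDone && gp.scanValid) && !gp.dead {
				bad = append(bad, gp.id)
			}
		}
	}
	return bad
}

func main() {
	gs := []goroutine{
		{id: 1, scanDone: true, scanValid: true},
		{id: 2, scanDone: true, scanValid: false}, // stack changed after its scan
		{id: 3, dead: true},
	}
	fmt.Println(unscannedStacks(phaseMarkTermination, 1, gs)) // [2]
	fmt.Println(unscannedStacks(phaseMarkTermination, 0, gs)) // []
}
```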
@@ -888,6 +888,15 @@ func scanframeworker(frame *stkframe, cache *pcvalueCache, gcw *gcWork) {
 // gp.gcscanvalid. The caller must own gp and ensure that gp isn't
 // already on the rescan list.
 func queueRescan(gp *g) {
+	if debug.gcrescanstacks == 0 {
+		// Clear gcscanvalid to keep assertions happy.
+		//
+		// TODO: Remove gcscanvalid entirely when we remove
+		// stack rescanning.
+		gp.gcscanvalid = false
+		return
+	}
+
 	if gcphase == _GCoff {
 		gp.gcscanvalid = false
 		return
@@ -917,6 +926,10 @@ func queueRescan(gp *g) {
 // dequeueRescan removes gp from the stack rescan list, if gp is on
 // the rescan list. The caller must own gp.
 func dequeueRescan(gp *g) {
+	if debug.gcrescanstacks == 0 {
+		return
+	}
+
 	if gp.gcRescan == -1 {
 		return
 	}
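With gcrescanstacks at 0, queueRescan only clears gcscanvalid and dequeueRescan returns immediately, so the rescan list is never populated. The hunk shows gp.gcRescan == -1 meaning "not on the list", which suggests an index-based list. The sketch below models that shape under stated assumptions: `G`, `rescanList`, and their fields are hypothetical, there is no locking, and removal by moving the last element into the vacated slot is an assumed O(1) technique rather than something shown in the hunk:

```go
package main

import "fmt"

// G is a hypothetical stand-in for the runtime's g. rescanIdx plays the
// role of gp.gcRescan: the G's index in the list, or -1 if absent.
type G struct {
	id        int
	rescanIdx int
	scanValid bool
}

// rescanList is a hypothetical model of a rescan list. enabled stands in
// for debug.gcrescanstacks != 0.
type rescanList struct {
	enabled bool
	list    []*G
}

// queue marks gp's scan as stale and, only if rescanning is enabled,
// appends gp. The caller must ensure gp is not already on the list.
func (r *rescanList) queue(gp *G) {
	gp.scanValid = false
	if !r.enabled {
		return // nothing will rescan gp; just record that its scan is stale
	}
	gp.rescanIdx = len(r.list)
	r.list = append(r.list, gp)
}

// dequeue removes gp if present by moving the last element into gp's slot.
func (r *rescanList) dequeue(gp *G) {
	if !r.enabled || gp.rescanIdx == -1 {
		return
	}
	last := r.list[len(r.list)-1]
	r.list[gp.rescanIdx] = last
	last.rescanIdx = gp.rescanIdx // harmless if last == gp; reset just below
	gp.rescanIdx = -1
	r.list = r.list[:len(r.list)-1]
}

func main() {
	r := &rescanList{enabled: true}
	gs := []*G{
		{id: 1, rescanIdx: -1, scanValid: true},
		{id: 2, rescanIdx: -1, scanValid: true},
		{id: 3, rescanIdx: -1, scanValid: true},
	}
	for _, gp := range gs {
		r.queue(gp)
	}
	r.dequeue(gs[0]) // G 3 moves into G 1's old slot
	var ids []int
	for _, gp := range r.list {
		ids = append(ids, gp.id)
	}
	fmt.Println(ids) // [3 2]
}
```

Setting `enabled` to false reproduces the new default: `queue` only invalidates the scan and the list stays empty.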

src/runtime/runtime1.go

Lines changed: 9 additions & 0 deletions
@@ -321,6 +321,7 @@ var debug struct {
 	gcshrinkstackoff  int32
 	gcstackbarrieroff int32
 	gcstackbarrierall int32
+	gcrescanstacks    int32
 	gcstoptheworld    int32
 	gctrace           int32
 	invalidptr        int32
@@ -340,6 +341,7 @@ var dbgvars = []dbgVar{
 	{"gcshrinkstackoff", &debug.gcshrinkstackoff},
 	{"gcstackbarrieroff", &debug.gcstackbarrieroff},
 	{"gcstackbarrierall", &debug.gcstackbarrierall},
+	{"gcrescanstacks", &debug.gcrescanstacks},
 	{"gcstoptheworld", &debug.gcstoptheworld},
 	{"gctrace", &debug.gctrace},
 	{"invalidptr", &debug.invalidptr},
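The new dbgvars entry is what ties the name "gcrescanstacks" in GODEBUG to debug.gcrescanstacks. A minimal sketch of that table-driven lookup, assuming a two-field subset of the debug struct (`debugVars` and `parseGODEBUG` are hypothetical names). It uses strings.Cut and strconv.Atoi (Go 1.18 or later); the runtime cannot import those packages and uses its own helpers, so its handling of malformed values may differ. This sketch ignores unknown names and skips values that fail to parse:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// debugVars is a hypothetical subset of the runtime's debug struct.
var debugVars struct {
	gcrescanstacks int32
	gctrace        int32
}

// dbgVar pairs a GODEBUG name with the variable it sets, following the
// shape of the dbgvars table in the diff.
type dbgVar struct {
	name  string
	value *int32
}

var dbgvars = []dbgVar{
	{"gcrescanstacks", &debugVars.gcrescanstacks},
	{"gctrace", &debugVars.gctrace},
}

// parseGODEBUG applies a comma-separated list of name=val pairs.
func parseGODEBUG(s string) {
	for _, field := range strings.Split(s, ",") {
		name, val, ok := strings.Cut(field, "=")
		if !ok {
			continue // no "=": not a name=val pair
		}
		n, err := strconv.Atoi(val)
		if err != nil {
			continue // non-numeric value: leave the variable unchanged
		}
		for _, v := range dbgvars {
			if v.name == name {
				*v.value = int32(n)
			}
		}
	}
}

func main() {
	parseGODEBUG("gcrescanstacks=1,gctrace=1,unknown=5")
	fmt.Println(debugVars.gcrescanstacks, debugVars.gctrace) // 1 1
}
```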
@@ -386,6 +388,13 @@ func parsedebugvars() {
 	setTraceback(gogetenv("GOTRACEBACK"))
 	traceback_env = traceback_cache
 
+	if debug.gcrescanstacks == 0 {
+		// Without rescanning, there's no need for stack
+		// barriers.
+		debug.gcstackbarrieroff = 1
+		debug.gcstackbarrierall = 0
+	}
+
 	if debug.gcstackbarrierall > 0 {
 		firstStackBarrierOffset = 0
 	}
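The parsedebugvars change makes gcrescanstacks override the stack-barrier settings; as the source comment puts it, without rescanning there is no need for stack barriers. A sketch of that rule as a pure function over a hypothetical `settings` struct (a subset of the runtime's debug fields; `normalize` is also a made-up name):

```go
package main

import "fmt"

// settings is a hypothetical subset of the runtime's debug struct.
type settings struct {
	gcrescanstacks    int32
	gcstackbarrieroff int32
	gcstackbarrierall int32
}

// normalize applies the rule from the diff: when rescanning is off, stack
// barriers are forced off, overriding whatever GODEBUG requested.
func normalize(s settings) settings {
	if s.gcrescanstacks == 0 {
		s.gcstackbarrieroff = 1
		s.gcstackbarrierall = 0
	}
	return s
}

func main() {
	// gcstackbarrierall=1 alone is overridden because rescanning is off.
	fmt.Printf("%+v\n", normalize(settings{gcstackbarrierall: 1}))
	// With gcrescanstacks=1 the barrier settings are left as requested.
	fmt.Printf("%+v\n", normalize(settings{gcrescanstacks: 1, gcstackbarrierall: 1}))
}
```

Assuming GODEBUG has already been parsed when this block runs (the parsing loop is outside the hunk), one consequence is that GODEBUG=gcstackbarrierall=1 has no effect on its own after this commit; it only takes effect together with gcrescanstacks=1.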
