Commit a108b28
runtime: implement GC pacer redesign
This change implements the GC pacer redesign outlined in #44167 and the accompanying design document, behind a GOEXPERIMENT flag that is on by default.

In addition to adding the new pacer, this CL also includes code to track and account for stack and globals scan work in the pacer and in the assist credit system.

The new pacer also deviates slightly from the document in that it increases the bound on the minimum trigger ratio from 0.6 (scaled by GOGC) to 0.7. The logic behind this change is that the new pacer much more consistently hits the goal (good!) leading to slightly less frequent GC cycles, but _longer_ ones (in this case, bad!). It turns out that the cost of having the GC on hurts throughput significantly (per byte of memory used), though tail latencies can improve by up to 10%! To be conservative, this change moves the value to 0.7 where there is a small improvement to both throughput and latency, given the memory use.

Because the new pacer accounts for the two most significant sources of scan work after heap objects, it is now also safer to reduce the minimum heap size without leading to very poor amortization. This change thus decreases the minimum heap size to 512 KiB, which corresponds to the fact that the runtime has around 200 KiB of scannable globals always there, up-front, providing a baseline.

Benchmark results: https://perf.golang.org/search?q=upload:20211001.6

tile38's KNearest benchmark shows a memory increase, but throughput (and latency) per byte of memory used is better.

gopher-lua showed an increase in both CPU time and memory usage, but subsequent attempts to reproduce this behavior are inconsistent. Sometimes the overall performance is better, sometimes it's worse. This suggests that the benchmark is fairly noisy in a way not captured by the benchmarking framework itself.

biogo-igor is the only benchmark to show a significant performance loss. This benchmark exhibits a very high GC rate, with relatively little work to do in each cycle. The idle mark workers are quite active. In the new pacer, mark phases are longer, mark assists are fewer, and some of that time in mark assists has shifted to idle workers. Linux perf indicates that the difference in CPU time can be mostly attributed to write-barrier slow path related calls, which in turn indicates that the write barrier being on for longer is the primary culprit. This also explains the memory increase, as a longer mark phase leads to more memory allocated black, surviving an extra cycle and contributing to the heap goal.

For #44167.

Change-Id: I8ac7cfef7d593e4a642c9b2be43fb3591a8ec9c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/309869
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
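The trigger-ratio change above is simple arithmetic: a lower bound of 0.7, scaled by GOGC. The sketch below illustrates only that described bound; the function name and the GOGC/100 scaling are assumptions for illustration, not the runtime's actual pacer code, which is considerably more involved.

```go
package main

import "fmt"

// minTriggerRatio sketches the bound described in the commit message:
// the minimum trigger ratio is 0.7 (raised from 0.6), scaled by GOGC.
// Illustrative only; not the runtime's real pacer logic.
func minTriggerRatio(gogc int) float64 {
	return 0.7 * float64(gogc) / 100
}

func main() {
	fmt.Println(minTriggerRatio(100)) // default GOGC=100
	fmt.Println(minTriggerRatio(200)) // doubling GOGC doubles the bound
}
```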
1 parent 988efd5 commit a108b28

9 files changed, +680 -103 lines

src/internal/goexperiment/exp_pacerredesign_off.go

+9
(generated file; diff not rendered by default)

src/internal/goexperiment/exp_pacerredesign_on.go

+9
(generated file; diff not rendered by default)

src/internal/goexperiment/flags.go

+6
@@ -83,4 +83,10 @@ type Flags struct {
 	// Requires wrappers (to do ABI translation), and reflect (so
 	// reflection calls use registers).
 	RegabiArgs bool
+
+	// PacerRedesign enables the new GC pacer in the runtime.
+	//
+	// Details regarding the new pacer may be found at
+	// https://golang.org/design/44167-gc-pacer-redesign
+	PacerRedesign bool
 }
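Because the experiment is on by default, it can be toggled at build time through GOEXPERIMENT. The lowercase experiment name below is inferred from the generated exp_pacerredesign_*.go file names and the usual GOEXPERIMENT naming convention (a "no" prefix disables a flag), so treat the exact spelling as an assumption:

```shell
# Disable the redesigned pacer (on by default with this change):
GOEXPERIMENT=nopacerredesign go build ./...

# Force it on explicitly:
GOEXPERIMENT=pacerredesign go build ./...
```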

src/runtime/export_test.go

+3-1
@@ -1290,7 +1290,9 @@ type GCControllerReviseDelta struct {
 func (c *GCController) Revise(d GCControllerReviseDelta) {
 	c.heapLive += uint64(d.HeapLive)
 	c.heapScan += uint64(d.HeapScan)
-	c.scanWork += d.HeapScanWork + d.StackScanWork + d.GlobalsScanWork
+	c.heapScanWork.Add(d.HeapScanWork)
+	c.stackScanWork.Add(d.StackScanWork)
+	c.globalsScanWork.Add(d.GlobalsScanWork)
 	c.revise()
 }
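The Revise change replaces the single scanWork aggregate with three per-source atomic counters, so the pacer can account for heap, stack, and globals scan work separately. A minimal sketch of that accounting pattern follows; the types and names are illustrative stand-ins, not the runtime's internals:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// controller mirrors the shape of the change: scan work is tracked
// per source rather than as one aggregate. Illustrative names only.
type controller struct {
	heapScanWork    atomic.Int64
	stackScanWork   atomic.Int64
	globalsScanWork atomic.Int64
}

// reviseDelta carries per-source increments, like GCControllerReviseDelta.
type reviseDelta struct {
	HeapScanWork    int64
	StackScanWork   int64
	GlobalsScanWork int64
}

func (c *controller) revise(d reviseDelta) {
	c.heapScanWork.Add(d.HeapScanWork)
	c.stackScanWork.Add(d.StackScanWork)
	c.globalsScanWork.Add(d.GlobalsScanWork)
}

// totalScanWork recovers the old single-counter view when needed.
func (c *controller) totalScanWork() int64 {
	return c.heapScanWork.Load() + c.stackScanWork.Load() + c.globalsScanWork.Load()
}

func main() {
	var c controller
	c.revise(reviseDelta{HeapScanWork: 1 << 20, StackScanWork: 256 << 10, GlobalsScanWork: 200 << 10})
	fmt.Println(c.totalScanWork())
}
```

Splitting the counters costs two extra atomic adds per revision but lets the pacer weight each work source independently.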

src/runtime/mgc.go

+2
@@ -1084,6 +1084,8 @@ func gcMarkTermination(nextTriggerRatio float64) {
 		print(" ms cpu, ",
 			work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ",
 			work.heapGoal>>20, " MB goal, ",
+			gcController.stackScan>>20, " MB stacks, ",
+			gcController.globalsScan>>20, " MB globals, ",
 			work.maxprocs, " P")
 		if work.userForced {
 			print(" (forced)")

src/runtime/mgcmark.go

+84-33
@@ -8,6 +8,7 @@ package runtime
 
 import (
 	"internal/goarch"
+	"internal/goexperiment"
 	"runtime/internal/atomic"
 	"runtime/internal/sys"
 	"unsafe"
@@ -151,20 +152,28 @@ var oneptrmask = [...]uint8{1}
 //
 // Preemption must be disabled (because this uses a gcWork).
 //
+// Returns the amount of GC work credit produced by the operation.
+// If flushBgCredit is true, then that credit is also flushed
+// to the background credit pool.
+//
 // nowritebarrier is only advisory here.
 //
 //go:nowritebarrier
-func markroot(gcw *gcWork, i uint32) {
+func markroot(gcw *gcWork, i uint32, flushBgCredit bool) int64 {
 	// Note: if you add a case here, please also update heapdump.go:dumproots.
+	var workDone int64
+	var workCounter *atomic.Int64
 	switch {
 	case work.baseData <= i && i < work.baseBSS:
+		workCounter = &gcController.globalsScanWork
 		for _, datap := range activeModules() {
-			markrootBlock(datap.data, datap.edata-datap.data, datap.gcdatamask.bytedata, gcw, int(i-work.baseData))
+			workDone += markrootBlock(datap.data, datap.edata-datap.data, datap.gcdatamask.bytedata, gcw, int(i-work.baseData))
 		}
 
 	case work.baseBSS <= i && i < work.baseSpans:
+		workCounter = &gcController.globalsScanWork
 		for _, datap := range activeModules() {
-			markrootBlock(datap.bss, datap.ebss-datap.bss, datap.gcbssmask.bytedata, gcw, int(i-work.baseBSS))
+			workDone += markrootBlock(datap.bss, datap.ebss-datap.bss, datap.gcbssmask.bytedata, gcw, int(i-work.baseBSS))
 		}
 
 	case i == fixedRootFinalizers:
@@ -184,6 +193,7 @@ func markroot(gcw *gcWork, i uint32) {
 
 	default:
 		// the rest is scanning goroutine stacks
+		workCounter = &gcController.stackScanWork
 		var gp *g
 		if work.baseStacks <= i && i < work.baseEnd {
 			// N.B. Atomic read of allglen in gcMarkRootPrepare
@@ -230,7 +240,7 @@ func markroot(gcw *gcWork, i uint32) {
 			if gp.gcscandone {
 				throw("g already scanned")
 			}
-			scanstack(gp, gcw)
+			workDone += scanstack(gp, gcw)
 			gp.gcscandone = true
 			resumeG(stopped)
@@ -239,13 +249,24 @@ func markroot(gcw *gcWork, i uint32) {
 			}
 		})
 	}
+	if goexperiment.PacerRedesign {
+		if workCounter != nil && workDone != 0 {
+			workCounter.Add(workDone)
+			if flushBgCredit {
+				gcFlushBgCredit(workDone)
+			}
+		}
+	}
+	return workDone
 }
 
 // markrootBlock scans the shard'th shard of the block of memory [b0,
 // b0+n0), with the given pointer mask.
 //
+// Returns the amount of work done.
+//
 //go:nowritebarrier
-func markrootBlock(b0, n0 uintptr, ptrmask0 *uint8, gcw *gcWork, shard int) {
+func markrootBlock(b0, n0 uintptr, ptrmask0 *uint8, gcw *gcWork, shard int) int64 {
 	if rootBlockBytes%(8*goarch.PtrSize) != 0 {
 		// This is necessary to pick byte offsets in ptrmask0.
 		throw("rootBlockBytes must be a multiple of 8*ptrSize")
@@ -256,7 +277,7 @@ func markrootBlock(b0, n0 uintptr, ptrmask0 *uint8, gcw *gcWork, shard int) {
 	// These tests are written to avoid any possible overflow.
 	off := uintptr(shard) * rootBlockBytes
 	if off >= n0 {
-		return
+		return 0
 	}
 	b := b0 + off
 	ptrmask := (*uint8)(add(unsafe.Pointer(ptrmask0), uintptr(shard)*(rootBlockBytes/(8*goarch.PtrSize))))
@@ -267,6 +288,7 @@ func markrootBlock(b0, n0 uintptr, ptrmask0 *uint8, gcw *gcWork, shard int) {
 
 	// Scan this shard.
 	scanblock(b, n, ptrmask, gcw, nil)
+	return int64(n)
 }
 
 // markrootFreeGStacks frees stacks of dead Gs.
@@ -681,6 +703,13 @@ func gcFlushBgCredit(scanWork int64) {
 
 // scanstack scans gp's stack, greying all pointers found on the stack.
 //
+// For goexperiment.PacerRedesign:
+// Returns the amount of scan work performed, but doesn't update
+// gcController.stackScanWork or flush any credit. Any background credit produced
+// by this function should be flushed by its caller. scanstack itself can't
+// safely flush because it may result in trying to wake up a goroutine that
+// was just scanned, resulting in a self-deadlock.
+//
 // scanstack will also shrink the stack if it is safe to do so. If it
 // is not, it schedules a stack shrink for the next synchronous safe
 // point.
@@ -690,7 +719,7 @@ func gcFlushBgCredit(scanWork int64) {
 //
 //go:nowritebarrier
 //go:systemstack
-func scanstack(gp *g, gcw *gcWork) {
+func scanstack(gp *g, gcw *gcWork) int64 {
 	if readgstatus(gp)&_Gscan == 0 {
 		print("runtime:scanstack: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", hex(readgstatus(gp)), "\n")
 		throw("scanstack - bad status")
@@ -701,7 +730,7 @@ func scanstack(gp *g, gcw *gcWork) {
 		print("runtime: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", readgstatus(gp), "\n")
 		throw("mark - bad status")
 	case _Gdead:
-		return
+		return 0
 	case _Grunning:
 		print("runtime: gp=", gp, ", goid=", gp.goid, ", gp->atomicstatus=", readgstatus(gp), "\n")
 		throw("scanstack: goroutine not stopped")
@@ -713,6 +742,15 @@ func scanstack(gp *g, gcw *gcWork) {
 		throw("can't scan our own stack")
 	}
 
+	// stackSize is the amount of work we'll be reporting.
+	//
+	// We report the total stack size, more than we scan,
+	// because this number needs to line up with gcControllerState's
+	// stackScan and scannableStackSize fields.
+	//
+	// See the documentation on those fields for more information.
+	stackSize := gp.stack.hi - gp.stack.lo
+
 	if isShrinkStackSafe(gp) {
 		// Shrink the stack if not much of it is being used.
 		shrinkstack(gp)
@@ -852,6 +890,7 @@ func scanstack(gp *g, gcw *gcWork) {
 	if state.buf != nil || state.cbuf != nil || state.freeBuf != nil {
 		throw("remaining pointer buffers")
 	}
+	return int64(stackSize)
 }
 
 // Scan a stack frame: local variables and function arguments/results.
@@ -984,7 +1023,7 @@ func gcDrain(gcw *gcWork, flags gcDrainFlags) {
 	flushBgCredit := flags&gcDrainFlushBgCredit != 0
 	idle := flags&gcDrainIdle != 0
 
-	initScanWork := gcw.scanWork
+	initScanWork := gcw.heapScanWork
 
 	// checkWork is the scan work before performing the next
 	// self-preempt check.
@@ -1007,7 +1046,7 @@ func gcDrain(gcw *gcWork, flags gcDrainFlags) {
 			if job >= work.markrootJobs {
 				break
 			}
-			markroot(gcw, job)
+			markroot(gcw, job, flushBgCredit)
 			if check != nil && check() {
 				goto done
 			}
@@ -1046,14 +1085,14 @@ func gcDrain(gcw *gcWork, flags gcDrainFlags) {
 		// Flush background scan work credit to the global
 		// account if we've accumulated enough locally so
 		// mutator assists can draw on it.
-		if gcw.scanWork >= gcCreditSlack {
-			atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
+		if gcw.heapScanWork >= gcCreditSlack {
+			gcController.heapScanWork.Add(gcw.heapScanWork)
 			if flushBgCredit {
-				gcFlushBgCredit(gcw.scanWork - initScanWork)
+				gcFlushBgCredit(gcw.heapScanWork - initScanWork)
 				initScanWork = 0
 			}
-			checkWork -= gcw.scanWork
-			gcw.scanWork = 0
+			checkWork -= gcw.heapScanWork
+			gcw.heapScanWork = 0
 
 			if checkWork <= 0 {
 				checkWork += drainCheckThreshold
@@ -1066,12 +1105,12 @@ func gcDrain(gcw *gcWork, flags gcDrainFlags) {
 
 done:
 	// Flush remaining scan work credit.
-	if gcw.scanWork > 0 {
-		atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
+	if gcw.heapScanWork > 0 {
+		gcController.heapScanWork.Add(gcw.heapScanWork)
 		if flushBgCredit {
-			gcFlushBgCredit(gcw.scanWork - initScanWork)
+			gcFlushBgCredit(gcw.heapScanWork - initScanWork)
 		}
-		gcw.scanWork = 0
+		gcw.heapScanWork = 0
 	}
 }
@@ -1095,10 +1134,10 @@ func gcDrainN(gcw *gcWork, scanWork int64) int64 {
 
 	// There may already be scan work on the gcw, which we don't
 	// want to claim was done by this call.
-	workFlushed := -gcw.scanWork
+	workFlushed := -gcw.heapScanWork
 
 	gp := getg().m.curg
-	for !gp.preempt && workFlushed+gcw.scanWork < scanWork {
+	for !gp.preempt && workFlushed+gcw.heapScanWork < scanWork {
 		// See gcDrain comment.
 		if work.full == 0 {
 			gcw.balance()
@@ -1117,13 +1156,13 @@ func gcDrainN(gcw *gcWork, scanWork int64) int64 {
 
 		if b == 0 {
 			// Try to do a root job.
-			//
-			// TODO: Assists should get credit for this
-			// work.
 			if work.markrootNext < work.markrootJobs {
 				job := atomic.Xadd(&work.markrootNext, +1) - 1
 				if job < work.markrootJobs {
-					markroot(gcw, job)
+					work := markroot(gcw, job, false)
+					if goexperiment.PacerRedesign {
+						workFlushed += work
+					}
 					continue
 				}
 			}
@@ -1134,25 +1173,25 @@ func gcDrainN(gcw *gcWork, scanWork int64) int64 {
 		scanobject(b, gcw)
 
 		// Flush background scan work credit.
-		if gcw.scanWork >= gcCreditSlack {
-			atomic.Xaddint64(&gcController.scanWork, gcw.scanWork)
-			workFlushed += gcw.scanWork
-			gcw.scanWork = 0
+		if gcw.heapScanWork >= gcCreditSlack {
+			gcController.heapScanWork.Add(gcw.heapScanWork)
+			workFlushed += gcw.heapScanWork
+			gcw.heapScanWork = 0
 		}
 	}
 
 	// Unlike gcDrain, there's no need to flush remaining work
 	// here because this never flushes to bgScanCredit and
 	// gcw.dispose will flush any remaining work to scanWork.
 
-	return workFlushed + gcw.scanWork
+	return workFlushed + gcw.heapScanWork
 }
 
 // scanblock scans b as scanobject would, but using an explicit
 // pointer bitmap instead of the heap bitmap.
 //
 // This is used to scan non-heap roots, so it does not update
-// gcw.bytesMarked or gcw.scanWork.
+// gcw.bytesMarked or gcw.heapScanWork.
 //
 // If stk != nil, possible stack pointers are also reported to stk.putPtr.
 //go:nowritebarrier
@@ -1282,7 +1321,7 @@ func scanobject(b uintptr, gcw *gcWork) {
 		}
 	}
 	gcw.bytesMarked += uint64(n)
-	gcw.scanWork += int64(i)
+	gcw.heapScanWork += int64(i)
 }
 
 // scanConservative scans block [b, b+n) conservatively, treating any
@@ -1521,7 +1560,19 @@ func gcmarknewobject(span *mspan, obj, size, scanSize uintptr) {
 
 	gcw := &getg().m.p.ptr().gcw
 	gcw.bytesMarked += uint64(size)
-	gcw.scanWork += int64(scanSize)
+	if !goexperiment.PacerRedesign {
+		// The old pacer counts newly allocated memory toward
+		// heapScanWork because heapScan is continuously updated
+		// throughout the GC cycle with newly allocated memory. However,
+		// these objects are never actually scanned, so we need
+		// to account for them in heapScanWork here, "faking" their work.
+		// Otherwise the pacer will think it's always behind, potentially
+		// by a large margin.
+		//
+		// The new pacer doesn't care about this because it ceases to update
+		// heapScan once a GC cycle starts, effectively snapshotting it.
+		gcw.heapScanWork += int64(scanSize)
+	}
 }
 
 // gcMarkTinyAllocs greys all active tiny alloc blocks.
