
btf: split string table ahead of time #637

Merged · 1 commit · Apr 23, 2022

Conversation

@lmb (Collaborator) commented Apr 22, 2022

We currently read the string table into one big byte slice, and
allocate substrings in Lookup. This forces us to hang on to one
large allocation for the string table, and many small allocations
for substrings. Calling Lookup for the same offset also allocates
on each invocation.

Instead, split the string table into substrings at load time. In
Lookup we can then use a binary search to find the correct string.
As a result, both overall memory and allocations drop:

name            old time/op    new time/op    delta
ParseVmlinux-4    76.6ms ±12%    73.9ms ± 4%     ~     (p=0.343 n=4+4)

name            old alloc/op   new alloc/op   delta
ParseVmlinux-4    92.9MB ± 0%    84.3MB ± 0%   -9.22%  (p=0.029 n=4+4)

name            old allocs/op  new allocs/op  delta
ParseVmlinux-4      786k ± 0%      693k ± 0%  -11.87%  (p=0.029 n=4+4)

To keep the speed roughly the same I had to "vendor" sort.Search and
specialise it for uint32, since the function was prominent in profiles.
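The approach described above might be sketched roughly as follows. This is an illustrative reconstruction, not the actual cilium/ebpf code: the type and function names (stringTable, newStringTable, search) and error messages are hypothetical. It splits a NUL-separated BTF string blob once at load time into parallel offset/string slices, and resolves lookups with a uint32-specialised binary search instead of sort.Search's closure-based one:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// stringTable holds the table split ahead of time: offsets is sorted,
// and strings[i] is the string starting at offsets[i].
type stringTable struct {
	offsets []uint32
	strings []string
}

// newStringTable splits a raw BTF-style string section (NUL-terminated
// strings back to back, beginning with an empty string) at load time.
func newStringTable(raw []byte) (*stringTable, error) {
	if len(raw) == 0 || raw[len(raw)-1] != 0 {
		return nil, errors.New("string table is not NUL terminated")
	}
	st := &stringTable{}
	for off := 0; off < len(raw); {
		i := bytes.IndexByte(raw[off:], 0)
		st.offsets = append(st.offsets, uint32(off))
		st.strings = append(st.strings, string(raw[off:off+i]))
		off += i + 1
	}
	return st, nil
}

// search is a binary search specialised for uint32, standing in for the
// "vendored" sort.Search mentioned above: same algorithm, but without
// the per-comparison closure call.
func (st *stringTable) search(offset uint32) int {
	i, j := 0, len(st.offsets)
	for i < j {
		h := int(uint(i+j) >> 1)
		if st.offsets[h] < offset {
			i = h + 1
		} else {
			j = h
		}
	}
	return i
}

// Lookup returns the string starting at offset. It allocates nothing
// per call: the substrings were materialised once in newStringTable.
func (st *stringTable) Lookup(offset uint32) (string, error) {
	i := st.search(offset)
	if i == len(st.offsets) || st.offsets[i] != offset {
		return "", fmt.Errorf("offset %d is not the start of a string", offset)
	}
	return st.strings[i], nil
}

func main() {
	st, err := newStringTable([]byte("\x00int\x00char\x00"))
	if err != nil {
		panic(err)
	}
	s, _ := st.Lookup(1)
	fmt.Println(s) // int
	s, _ = st.Lookup(5)
	fmt.Println(s) // char
}
```

Repeated lookups for the same offset now return the same interned string, which is where the allocation savings in the benchmark come from.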

@lmb (Collaborator, Author) commented Apr 22, 2022

cc @paulcacheux, this was the idea I mentioned, please take a look.

@lmb force-pushed the btf-string-table-scanner branch from ca287d9 to b92d30c on April 22, 2022 at 19:15
@lmb (Collaborator, Author) commented Apr 22, 2022

I also tried replacing the offsets and strings slices with a map[uint32]string, but it's slower:

name            old time/op    new time/op    delta
ParseVmlinux-4    74.3ms ± 5%    82.4ms ± 3%  +10.87%  (p=0.029 n=4+4)

name            old alloc/op   new alloc/op   delta
ParseVmlinux-4    84.3MB ± 0%    87.8MB ± 0%   +4.16%  (p=0.029 n=4+4)

name            old allocs/op  new allocs/op  delta
ParseVmlinux-4      693k ± 0%      693k ± 0%     ~     (p=0.943 n=4+4)
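The rejected alternative might look roughly like this (again an illustrative sketch, not code from the PR; mapStringTable is a hypothetical name). Lookup becomes a single hash probe, but as the benchmark shows it ends up slower and larger than binary search over sorted slices, plausibly because of hashing cost and the map's per-bucket overhead versus two compact, cache-friendly slices:

```go
package main

import "fmt"

// mapStringTable indexes the pre-split strings by offset in a map
// instead of sorted parallel slices.
type mapStringTable map[uint32]string

// Lookup is a hash probe rather than a binary search.
func (st mapStringTable) Lookup(offset uint32) (string, error) {
	s, ok := st[offset]
	if !ok {
		return "", fmt.Errorf("offset %d is not the start of a string", offset)
	}
	return s, nil
}

func main() {
	// Equivalent of splitting "\x00int\x00char\x00" into a map.
	st := mapStringTable{0: "", 1: "int", 5: "char"}
	s, _ := st.Lookup(5)
	fmt.Println(s) // char
}
```

Note that allocs/op is unchanged between the two variants (the substrings are allocated either way); only time and total bytes differ.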

@paulcacheux (Contributor) commented:

This looks great, thanks for working on this

We currently read the string table into one big byte slice, and
allocate substrings in Lookup. This forces us to hang on to one
large allocation for the string table, and many small allocations
for substrings. Calling Lookup for the same offset also allocates
on each invocation.

Instead, split the string table into substrings at load time. In
Lookup we can then use a binary search to find the correct string.
As a result, both overall memory and allocations drop:

    name            old time/op    new time/op    delta
    ParseVmlinux-4    57.4ms ± 2%    59.4ms ± 1%   +3.37%  (p=0.029 n=4+4)

    name            old alloc/op   new alloc/op   delta
    ParseVmlinux-4    55.3MB ± 0%    46.7MB ± 0%  -15.49%  (p=0.029 n=4+4)

    name            old allocs/op  new allocs/op  delta
    ParseVmlinux-4      786k ± 0%      693k ± 0%  -11.87%  (p=0.029 n=4+4)

To keep the speed roughly the same I had to "vendor" sort.Search and
specialise it for uint32, since the function was prominent in profiles.
@lmb force-pushed the btf-string-table-scanner branch from b92d30c to 2dafb62 on April 23, 2022 at 10:52
@lmb merged commit 0cb94e8 into cilium:master on Apr 23, 2022
@lmb deleted the btf-string-table-scanner branch on April 23, 2022 at 11:04