
btf: split string table ahead of time #637

Merged · 1 commit · Apr 23, 2022

Conversation

@lmb (Collaborator) commented Apr 22, 2022

We currently read the string table into one big byte slice, and
allocate substrings in Lookup. This forces us to hang on to one
large allocation for the string table, and many small allocations
for substrings. Calling Lookup for the same offset also allocates
on each invocation.

Instead, split the string table into substrings at load time. In
Lookup we can then use a binary search to find the correct string.
As a result, both overall memory and allocations drop:

name            old time/op    new time/op    delta
ParseVmlinux-4    76.6ms ±12%    73.9ms ± 4%     ~     (p=0.343 n=4+4)

name            old alloc/op   new alloc/op   delta
ParseVmlinux-4    92.9MB ± 0%    84.3MB ± 0%   -9.22%  (p=0.029 n=4+4)

name            old allocs/op  new allocs/op  delta
ParseVmlinux-4      786k ± 0%      693k ± 0%  -11.87%  (p=0.029 n=4+4)

To keep the speed roughly the same I had to "vendor" sort.Search and
specialise it for uint32, since the function was prominent in profiles.
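The approach described above might be sketched roughly as follows. This is an illustrative reconstruction, not the actual cilium/ebpf code: the type and function names (stringTable, newStringTable, search) and error messages are hypothetical. It splits a NUL-separated BTF string blob once at load time into parallel offset/string slices, and resolves lookups with a uint32-specialised binary search instead of sort.Search's closure-based one:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// stringTable holds the table split ahead of time: offsets is sorted,
// and strings[i] is the string starting at offsets[i].
type stringTable struct {
	offsets []uint32
	strings []string
}

// newStringTable splits a raw BTF-style string section (NUL-terminated
// strings back to back, beginning with an empty string) at load time.
func newStringTable(raw []byte) (*stringTable, error) {
	if len(raw) == 0 || raw[len(raw)-1] != 0 {
		return nil, errors.New("string table is not NUL terminated")
	}
	st := &stringTable{}
	for off := 0; off < len(raw); {
		i := bytes.IndexByte(raw[off:], 0)
		st.offsets = append(st.offsets, uint32(off))
		st.strings = append(st.strings, string(raw[off:off+i]))
		off += i + 1
	}
	return st, nil
}

// search is a binary search specialised for uint32, standing in for the
// "vendored" sort.Search mentioned above: same algorithm, but without
// the per-comparison closure call.
func (st *stringTable) search(offset uint32) int {
	i, j := 0, len(st.offsets)
	for i < j {
		h := int(uint(i+j) >> 1)
		if st.offsets[h] < offset {
			i = h + 1
		} else {
			j = h
		}
	}
	return i
}

// Lookup returns the string starting at offset. It allocates nothing
// per call: the substrings were materialised once in newStringTable.
func (st *stringTable) Lookup(offset uint32) (string, error) {
	i := st.search(offset)
	if i == len(st.offsets) || st.offsets[i] != offset {
		return "", fmt.Errorf("offset %d is not the start of a string", offset)
	}
	return st.strings[i], nil
}

func main() {
	st, err := newStringTable([]byte("\x00int\x00char\x00"))
	if err != nil {
		panic(err)
	}
	s, _ := st.Lookup(1)
	fmt.Println(s) // int
	s, _ = st.Lookup(5)
	fmt.Println(s) // char
}
```

Repeated lookups for the same offset now return the same interned string, which is where the allocation savings in the benchmark come from.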

@lmb (Collaborator, Author) commented Apr 22, 2022

cc @paulcacheux, this was the idea I mentioned, please take a look.

@lmb force-pushed the btf-string-table-scanner branch from ca287d9 to b92d30c on April 22, 2022 at 19:15
@lmb (Collaborator, Author) commented Apr 22, 2022

I also tried replacing the offsets and strings slices with a map[uint32]string, but it's slower:

name            old time/op    new time/op    delta
ParseVmlinux-4    74.3ms ± 5%    82.4ms ± 3%  +10.87%  (p=0.029 n=4+4)

name            old alloc/op   new alloc/op   delta
ParseVmlinux-4    84.3MB ± 0%    87.8MB ± 0%   +4.16%  (p=0.029 n=4+4)

name            old allocs/op  new allocs/op  delta
ParseVmlinux-4      693k ± 0%      693k ± 0%     ~     (p=0.943 n=4+4)
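The rejected alternative might look roughly like this (again an illustrative sketch, not code from the PR; mapStringTable is a hypothetical name). Lookup becomes a single hash probe, but as the benchmark shows it ends up slower and larger than binary search over sorted slices, plausibly because of hashing cost and the map's per-bucket overhead versus two compact, cache-friendly slices:

```go
package main

import "fmt"

// mapStringTable indexes the pre-split strings by offset in a map
// instead of sorted parallel slices.
type mapStringTable map[uint32]string

// Lookup is a hash probe rather than a binary search.
func (st mapStringTable) Lookup(offset uint32) (string, error) {
	s, ok := st[offset]
	if !ok {
		return "", fmt.Errorf("offset %d is not the start of a string", offset)
	}
	return s, nil
}

func main() {
	// Equivalent of splitting "\x00int\x00char\x00" into a map.
	st := mapStringTable{0: "", 1: "int", 5: "char"}
	s, _ := st.Lookup(5)
	fmt.Println(s) // char
}
```

Note that allocs/op is unchanged between the two variants (the substrings are allocated either way); only time and total bytes differ.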

@paulcacheux (Contributor) commented:

This looks great, thanks for working on this

We currently read the string table into one big byte slice, and
allocate substrings in Lookup. This forces us to hang on to one
large allocation for the string table, and many small allocations
for substrings. Calling Lookup for the same offset also allocates
on each invocation.

Instead, split the string table into substrings at load time. In
Lookup we can then use a binary search to find the correct string.
As a result, both overall memory and allocations drop:

    name            old time/op    new time/op    delta
    ParseVmlinux-4    57.4ms ± 2%    59.4ms ± 1%   +3.37%  (p=0.029 n=4+4)

    name            old alloc/op   new alloc/op   delta
    ParseVmlinux-4    55.3MB ± 0%    46.7MB ± 0%  -15.49%  (p=0.029 n=4+4)

    name            old allocs/op  new allocs/op  delta
    ParseVmlinux-4      786k ± 0%      693k ± 0%  -11.87%  (p=0.029 n=4+4)

To keep the speed roughly the same I had to "vendor" sort.Search and
specialise it for uint32, since the function was prominent in profiles.
@lmb force-pushed the btf-string-table-scanner branch from b92d30c to 2dafb62 on April 23, 2022 at 10:52
@lmb merged commit 0cb94e8 into cilium:master on Apr 23, 2022
@lmb deleted the btf-string-table-scanner branch on April 23, 2022 at 11:04