Skip to content

Postpone freeing a tracker entry, add a reference counter #1270

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

ldorau
Copy link
Contributor

@ldorau ldorau commented Apr 15, 2025

Description

Postpone freeing a tracker entry until it is really removed from the tracker.

Fixes: #1233

Checklist

  • Code compiles without errors locally
  • All tests pass locally
  • CI workflows execute properly

@ldorau ldorau changed the title Postpone freeing a tracker entry until it is really removed from tracker Postpone freeing a tracker entry until it is really removed from the tracker Apr 15, 2025
Copy link

Compute Benchmarks run (with params: --compare 'Baseline_PVC'):
https://github.com/oneapi-src/unified-memory-framework/actions/runs/14473372398

Copy link

Compute Benchmarks run ( --compare 'Baseline_PVC'):
https://github.com/oneapi-src/unified-memory-framework/actions/runs/14473372398
Job status: failure. Test status: failure.

Summary

(Emphasized values are the best results)

Improved 40 (threshold 2.00%)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider> 1796.990000 ns 4972.830 ns 176.73%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider 1174.930000 ns 2564.380 ns 118.26%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> 318.467000 ns 647.568 ns 103.34%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy 1508.560000 ns 3025.960 ns 100.59%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> 395.143000 ns 767.476 ns 94.23%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy 12691.600000 ns 24625.100 ns 94.03%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy 323.984000 ns 624.879 ns 92.87%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy 1582.200000 ns 3037.880 ns 92.00%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> 625.572000 ns 1194.930 ns 91.01%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy 1674.610000 ns 3150.710 ns 88.15%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc 450.721000 ns 838.137 ns 85.95%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy 13023.400000 ns 24169.300 ns 85.58%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> 5798.090000 ns 10613.300 ns 83.05%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> 417.284000 ns 762.820 ns 82.81%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> 656.280000 ns 1197.650 ns 82.49%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy 14085.800000 ns 24818.100 ns 76.19%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> 313.102000 ns 539.665 ns 72.36%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy 330.768000 ns 568.249 ns 71.80%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> 2655.420000 ns 4540.970 ns 71.01%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> 340.066000 ns 580.669 ns 70.75%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy 371.110000 ns 632.053 ns 70.31%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc 504.080000 ns 852.465 ns 69.11%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc 222.220000 ns 365.370 ns 64.42%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider> 17325.100000 ns 28297.400 ns 63.33%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> 367.259000 ns 599.047 ns 63.11%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy 365.715000 ns 591.936 ns 61.86%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider 16056.600000 ns 25706.600 ns 60.10%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy 210.554000 ns 334.039 ns 58.65%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy 221.357000 ns 350.589 ns 58.38%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> 1040.670000 ns 1637.570 ns 57.36%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc 181.914000 ns 282.643 ns 55.37%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> 23592.200000 ns 36260.200 ns 53.70%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> 221.747000 ns 340.160 ns 53.40%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> 355.287000 ns 543.490 ns 52.97%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> 238.835000 ns 360.015 ns 50.74%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> 1121.910000 ns 1670.930 ns 48.94%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc 135.488000 ns 170.436 ns 25.79%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc 159.699000 ns 199.856 ns 25.15%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc 172.138000 ns 204.384 ns 18.73%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc 149.274000 ns 172.393 ns 15.49%
Regressed 6 (threshold 2.00%)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> 4677.440 ns 752.146000 ns -83.92%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc 196.851 ns 88.994000 ns -54.79%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc 189.812 ns 88.265900 ns -53.50%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc 101.247 ns 64.327600 ns -36.46%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc 95.715 ns 63.410600 ns -33.75%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> 35133.300 ns 30418.400000 ns -13.42%

Performance change in benchmark groups

UMF
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy 323.984000 ns 624.879 ns 92.87%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy 1582.200000 ns 3037.880 ns 92.00%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> 313.102000 ns 539.665 ns 72.36%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> 1040.670000 ns 1637.570 ns 57.36%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc 181.914000 ns 282.643 ns 55.37%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc 189.812 ns 88.265900 ns -53.50%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> 4677.440 ns 752.146000 ns -83.92%
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy 12691.600000 ns 24625.100 ns 94.03%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy 371.110000 ns 632.053 ns 70.31%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc 222.220000 ns 365.370 ns 64.42%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> 355.287000 ns 543.490 ns 52.97%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> 1121.910000 ns 1670.930 ns 48.94%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> 35133.300 ns 30418.400000 ns -13.42%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc 196.851 ns 88.994000 ns -54.79%
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> 625.572000 ns 1194.930 ns 91.01%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy 1674.610000 ns 3150.710 ns 88.15%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc 450.721000 ns 838.137 ns 85.95%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy 330.768000 ns 568.249 ns 71.80%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> 2655.420000 ns 4540.970 ns 71.01%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> 340.066000 ns 580.669 ns 70.75%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc 159.699000 ns 199.856 ns 25.15%
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> 656.280000 ns 1197.650 ns 82.49%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy 14085.800000 ns 24818.100 ns 76.19%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc 504.080000 ns 852.465 ns 69.11%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> 367.259000 ns 599.047 ns 63.11%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy 365.715000 ns 591.936 ns 61.86%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> 23592.200000 ns 36260.200 ns 53.70%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc 172.138000 ns 204.384 ns 18.73%
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> 318.467000 ns 647.568 ns 103.34%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy 1508.560000 ns 3025.960 ns 100.59%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> 395.143000 ns 767.476 ns 94.23%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy 210.554000 ns 334.039 ns 58.65%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> 221.747000 ns 340.160 ns 53.40%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc 135.488000 ns 170.436 ns 25.79%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc 95.715 ns 63.410600 ns -33.75%
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy 13023.400000 ns 24169.300 ns 85.58%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> 5798.090000 ns 10613.300 ns 83.05%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> 417.284000 ns 762.820 ns 82.81%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy 221.357000 ns 350.589 ns 58.38%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> 238.835000 ns 360.015 ns 50.74%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc 149.274000 ns 172.393 ns 15.49%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc 101.247 ns 64.327600 ns -36.46%
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider> 1796.990000 ns 4972.830 ns 176.73%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider 1174.930000 ns 2564.380 ns 118.26%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider> 17325.100000 ns 28297.400 ns 63.33%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider 16056.600000 ns 25706.600 ns 60.10%
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider> 0.000000 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider 0.000000 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider> 0.000000 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider 0.000000 % -
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> 0.064949 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> 30.607300 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> 60.801600 % -
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> 0.016245 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> 30.589100 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> 55.478400 % -
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> 26.002600 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> 64.508100 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> 62.411100 % -
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> 25.507300 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> 60.786800 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> 54.893400 % -
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> 23.636700 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> 85.085300 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> 85.085300 % -
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> 20.038400 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> 84.818400 % -
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> 88.068200 % -
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
Benchmark This PR Baseline_PVC Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc 185.427000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> 4683.950000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> 1012.640000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> 317.499000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy 1599.570000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc 193.304000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy 326.985000 ns -
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
Benchmark This PR Baseline_PVC Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc 230.601000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> 35481.000000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> 1152.200000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> 387.822000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy 12994.000000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc 204.267000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy 389.277000 ns -
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
Benchmark This PR Baseline_PVC Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc 450.370000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> 2627.180000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> 622.544000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> 329.937000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy 1645.940000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc 158.885000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy 326.537000 ns -
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
Benchmark This PR Baseline_PVC Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc 508.403000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> 24059.100000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> 669.748000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> 382.320000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy 13026.500000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc 167.881000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy 360.961000 ns -
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
Benchmark This PR Baseline_PVC Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc 136.707000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> 325.845000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> 388.318000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> 217.319000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy 1510.160000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc 92.599300 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy 206.031000 ns -
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
Benchmark This PR Baseline_PVC Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc 146.831000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> 6542.910000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> 413.041000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> 227.619000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy 12405.600000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc 104.040000 ns -
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy 223.409000 ns -
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> 50.000000 % -
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> 99.990400 % -
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> 99.960900 % -
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> 20.000000 % -
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> 99.990300 % -
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> 99.962800 % -
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> 97.184200 % -
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> 99.992500 % -
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> 99.989800 % -
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> 90.346000 % -
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> 99.991800 % -
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> 99.987700 % -
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> 99.536100 % -
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> 99.996800 % -
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> 99.992800 % -
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
Benchmark This PR Baseline_PVC Change
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> 98.763000 % -
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> 99.996800 % -
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> 99.994200 % -
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 (6)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 glibc - 366.197000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 jemalloc_pool<os_provider> - 1672.840000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 scalable_pool<os_provider> - 549.361000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 umfProxy - 53569.500000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 jemalloc - 89.618300 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 tbbProxy - 632.770000 ns
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 (6)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 glibc - 366.670000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 jemalloc_pool<os_provider> - 1674.080000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 scalable_pool<os_provider> - 544.330000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 umfProxy - 78951.900000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 jemalloc - 89.786300 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 tbbProxy - 638.970000 ns
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 (6)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 glibc - 845.858000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 jemalloc_pool<os_provider> - 1198.180000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 scalable_pool<os_provider> - 597.241000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 umfProxy - 53934.300000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 jemalloc - 204.593000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 tbbProxy - 584.623000 ns
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 (6)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 glibc - 855.042000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 jemalloc_pool<os_provider> - 1202.610000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 scalable_pool<os_provider> - 603.377000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 umfProxy - 79284.100000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 jemalloc - 203.389000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 tbbProxy - 594.468000 ns
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 (6)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 glibc - 173.740000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 jemalloc_pool<os_provider> - 764.419000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 scalable_pool<os_provider> - 362.542000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 umfProxy - 53422.200000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 jemalloc - 64.340700 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 tbbProxy - 352.649000 ns
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 (6)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 glibc - 173.294000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 jemalloc_pool<os_provider> - 763.153000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 scalable_pool<os_provider> - 359.356000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 umfProxy - 78621.600000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 jemalloc - 64.383900 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 tbbProxy - 354.263000 ns

Details

Benchmark details contain too many chars to display

@ldorau
Copy link
Contributor Author

ldorau commented Apr 15, 2025

"Exception: The directory /home/test-user/bench_workdir_umf exists but is not a benchmark work directory."
See: https://github.com/oneapi-src/unified-memory-framework/actions/runs/14473372398/job/40592824780
@lplewa @lukaszstolarczuk Could you fix it?

@ldorau ldorau force-pushed the Postpone_freeing_a_tracker_entry_until_it_is_really_removed_from_tracker branch 4 times, most recently from c23a254 to fa661de Compare April 17, 2025 12:53
@ldorau ldorau changed the title Postpone freeing a tracker entry until it is really removed from the tracker [WIP] Postpone freeing a tracker entry, add a ref count Apr 17, 2025
@ldorau
Copy link
Contributor Author

ldorau commented Apr 17, 2025

The last commit accde78 is not finished yet !

@ldorau ldorau force-pushed the Postpone_freeing_a_tracker_entry_until_it_is_really_removed_from_tracker branch from fa661de to accde78 Compare April 17, 2025 12:57
@vinser52
Copy link
Contributor

What is the purpose of this PR, could you please describe the scenario and how these changes help?

@lplewa
Copy link
Contributor

lplewa commented Apr 18, 2025

What is the purpose of this PR, could you please describe the scenario and how these changes help?

We have a race, between insert and delete of entry in tracker

image
Top screen in thread one, bottom screen is thread two. Numbers shows order of the operations.

@vinser52
Copy link
Contributor

But how is it possible that one thread has some pointer which belongs to the entry in the memory tracker and another thread removes that entry from the tracker? The first thing that came to my mind is the following:

  1. T1 allocates memory, and the corresponding region is put into the tracker
  2. T2 frees the memory allocated by T1, and the corresponding region is removed from the tracker
  3. T1 is still trying to use the memory which is already been freed by T2.

But it is an ill-formed client's code. What is the real scenario?

@lplewa
Copy link
Contributor

lplewa commented Apr 23, 2025

Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found.

@vinser52
Copy link
Contributor

Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found.

That is exactly my question. How such a scenario might happen: if T1 has a pointer P, that belongs to some region R, and T2 removes this region R.

@ldorau
Copy link
Contributor Author

ldorau commented Apr 28, 2025

Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found.

That is exactly my question. How such a scenario might happen: if T1 has a pointer P, that belongs to some region R, and T2 removes this region R.

@vinser52 @lplewa The explanation is that the pointer P (of T1) does not belong to the region R (of T2).

This is the real scenario:
a) thread T2 allocates memory region R (for example: 0x100-0x300 - address 0x100 and size 0x200) and adds it to the tracker (this is the only entry in the tracker yet) and gets preempted ...
b) thread T1 allocates memory of size 0x500 and receives pointer P (address 0x400, size 0x500 - range 0x400-0x900) and tries to add it to the tracker, so it calls umfMemoryTrackerAdd(), so critnib_find(FIND_LE) finds and returns the only region R existing in the tracker R: 0x100-0x300 (step 1 on the picture) and T1 gets preempted (step 2 on the picture) ...
c) thread T2 removes the region R from the tracker (step 3 on the picture), frees its memory (step 4 on the picture) and T2 gets preempted ...
d) thread T1 tries to read a size of the already freed region R (step 5 on the picture) and it crashes ...

@vinser52
Copy link
Contributor

Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found.

That is exactly my question. How such a scenario might happen: if T1 has a pointer P, that belongs to some region R, and T2 removes this region R.

@vinser52 @lplewa The explanation is that the pointer P (of T1) does not belong to the region R (of T2).

This is the real scenario: a) thread T2 allocates memory region R (for example: 0x100-0x300 - address 0x100 and size 0x200) and adds it to the tracker (this is the only entry in the tracker yet) and gets preempted ... b) thread T1 allocates memory of size 0x500 and receives pointer P (address 0x400, size 0x500 - range 0x400-0x900) and tries to add it to the tracker, so it calls umfMemoryTrackerAdd(), so critnib_find(FIND_LE) finds and returns the only region R existing in the tracker R: 0x100-0x300 (step 1 on the picture) and gets preempted (step 2 on the picture) ... c) thread T2 removes the region R from the tracker (step 3 on the picture), frees its memory (step 4 on the picture) and gets preempted ... d) thread T1 tries to read a size of the already freed region R (step 5 on the picture) and crashes ...

Thank you, @ldorau, for the clear description of the scenario. That makes sense.

@ldorau ldorau force-pushed the Postpone_freeing_a_tracker_entry_until_it_is_really_removed_from_tracker branch 6 times, most recently from 947a237 to a2f62b5 Compare May 6, 2025 07:34
@ldorau ldorau force-pushed the Postpone_freeing_a_tracker_entry_until_it_is_really_removed_from_tracker branch 5 times, most recently from 65c4126 to f0a0881 Compare May 7, 2025 09:40
@ldorau ldorau requested a review from lplewa May 12, 2025 09:52
@ldorau ldorau force-pushed the Postpone_freeing_a_tracker_entry_until_it_is_really_removed_from_tracker branch 4 times, most recently from 2dc59f3 to 4cce135 Compare May 13, 2025 10:17
@ldorau ldorau force-pushed the Postpone_freeing_a_tracker_entry_until_it_is_really_removed_from_tracker branch 2 times, most recently from dd0a103 to aa366df Compare May 13, 2025 13:49
ldorau added 2 commits May 13, 2025 20:52
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
@ldorau ldorau force-pushed the Postpone_freeing_a_tracker_entry_until_it_is_really_removed_from_tracker branch from aa366df to f3c555f Compare May 13, 2025 18:52
@bratpiorka
Copy link
Contributor

@lplewa please review

Add a reference counter and critnib_release() function.

The following 4 functions:
- critnib_remove(),
- critnib_get(),
- critnib_find_le() and
- critnib_find()

return a reference (void *ref) to the returned value,
that MUST be released by calling critnib_release()
when it is no longer used and can be freed
using the cb_free_leaf() callback.

Fixes: oneapi-src#1233

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
@ldorau ldorau force-pushed the Postpone_freeing_a_tracker_entry_until_it_is_really_removed_from_tracker branch from f3c555f to c8b4737 Compare May 14, 2025 11:20
Copy link

Compute Benchmarks run (with params: --compare 'Baseline_PVC'):
https://github.com/oneapi-src/unified-memory-framework/actions/runs/15045823044

Copy link

Compute Benchmarks run ( --compare 'Baseline_PVC'):
https://github.com/oneapi-src/unified-memory-framework/actions/runs/15045823044
Job status: success. Test status: success.

Failures

Name Failure
umf-benchmark Benchmark run failure: Command '['/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark', '--benchmark_format=csv', '--benchmark_filter=^jemalloc_pool<os_provider>/peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4$']' died with <Signals.SIGSEGV: 11>.

Summary

(Emphasized values are the best results)

Improved 2 (threshold 2.00%)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy 12164.200000 ns 12647.100 ns 3.97%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc 187.435000 ns 193.480 ns 3.23%
Regressed 11 (threshold 2.00%)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy 1646.060 ns 1574.480000 ns -4.35%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy 329.011 ns 315.617000 ns -4.07%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy 1755.560 ns 1690.190000 ns -3.72%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy 13578.200 ns 13073.800000 ns -3.71%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy 1553.960 ns 1496.810000 ns -3.68%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy 227.521 ns 221.170000 ns -2.79%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy 12919.000 ns 12559.600000 ns -2.78%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy 1544.590 ns 1504.390000 ns -2.60%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy 12858.900 ns 12525.100000 ns -2.60%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy 13072.600 ns 12769.400000 ns -2.32%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc 174.781 ns 170.738000 ns -2.31%

Performance change in benchmark groups

UMF
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy 328.804000 ns 333.006 ns 1.28%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc 189.766000 ns 190.840 ns 0.57%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy 1646.060 ns 1574.480000 ns -4.35%
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy 13019.900000 ns 13187.900 ns 1.29%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy 378.043 ns 373.724000 ns -1.14%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc 209.408 ns 205.971000 ns -1.64%
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc 163.133 ns 163.003000 ns -0.08%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy 332.262 ns 331.895000 ns -0.11%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy 1691.350 ns 1675.930000 ns -0.91%
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy 367.719000 ns 368.443 ns 0.20%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc 174.781 ns 170.738000 ns -2.31%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy 12858.900 ns 12525.100000 ns -2.60%
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc 94.083200 ns 95.116 ns 1.10%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy 210.777 ns 210.628000 ns -0.07%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy 1553.960 ns 1496.810000 ns -3.68%
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
Benchmark This PR Baseline_PVC Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy 12164.200000 ns 12647.100 ns 3.97%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc 97.298 ns 96.802900 ns -0.51%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy 227.521 ns 221.170000 ns -2.79%
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
Benchmark This PR Baseline_PVC Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc 187.435000 ns 193.480 ns 3.23%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy 1621.540 ns 1600.770000 ns -1.28%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy 329.011 ns 315.617000 ns -4.07%
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
Benchmark This PR Baseline_PVC Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc 203.334000 ns 204.438 ns 0.54%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy 391.455 ns 390.035000 ns -0.36%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy 13072.600 ns 12769.400000 ns -2.32%
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
Benchmark This PR Baseline_PVC Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc 159.787000 ns 160.033 ns 0.15%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy 324.810 ns 321.573000 ns -1.00%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy 1755.560 ns 1690.190000 ns -3.72%
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
Benchmark This PR Baseline_PVC Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy 361.206 ns 359.364000 ns -0.51%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc 171.032 ns 169.215000 ns -1.06%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy 13578.200 ns 13073.800000 ns -3.71%
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
Benchmark This PR Baseline_PVC Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy 206.195000 ns 207.213 ns 0.49%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc 94.475900 ns 94.611 ns 0.14%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy 1544.590 ns 1504.390000 ns -2.60%
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
Benchmark This PR Baseline_PVC Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc 103.738 ns 102.602000 ns -1.10%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy 223.097 ns 220.533000 ns -1.15%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy 12919.000 ns 12559.600000 ns -2.78%

Details

Benchmark details - environment, command...
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libjemalloc.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libjemalloc.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libjemalloc.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libjemalloc.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libjemalloc.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libjemalloc.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libjemalloc.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libjemalloc.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libjemalloc.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libjemalloc.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libjemalloc.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libjemalloc.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy

Command:

/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv

Environment Variables:

LD_PRELOAD=libtbbmalloc_proxy.so

@ldorau
Copy link
Contributor Author

ldorau commented May 16, 2025

@lplewa please review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AddressSanitizer: unknown-crash in umfMemoryTrackerAdd()
4 participants