Postpone freeing a tracker entry, add a reference counter #1270

ldorau · 2025-04-15T15:28:35Z

Description

Postpone freeing a tracker entry until it is really removed from the tracker.

Checklist

Code compiles without errors locally
All tests pass locally
CI workflows execute properly

github-actions · 2025-04-15T15:29:37Z

Compute Benchmarks run (with params: --compare 'Baseline_PVC'):
https://github.com/oneapi-src/unified-memory-framework/actions/runs/14473372398

github-actions · 2025-04-15T15:30:15Z

Compute Benchmarks run ( --compare 'Baseline_PVC'):
https://github.com/oneapi-src/unified-memory-framework/actions/runs/14473372398
Job status: failure. Test status: failure.

Summary

(Emphasized values are the best results)

Improved 40 (threshold 2.00%)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider>	1796.990000 ns	4972.830 ns	176.73%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider	1174.930000 ns	2564.380 ns	118.26%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider>	318.467000 ns	647.568 ns	103.34%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy	1508.560000 ns	3025.960 ns	100.59%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider>	395.143000 ns	767.476 ns	94.23%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy	12691.600000 ns	24625.100 ns	94.03%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy	323.984000 ns	624.879 ns	92.87%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy	1582.200000 ns	3037.880 ns	92.00%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider>	625.572000 ns	1194.930 ns	91.01%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy	1674.610000 ns	3150.710 ns	88.15%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc	450.721000 ns	838.137 ns	85.95%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy	13023.400000 ns	24169.300 ns	85.58%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider>	5798.090000 ns	10613.300 ns	83.05%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider>	417.284000 ns	762.820 ns	82.81%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider>	656.280000 ns	1197.650 ns	82.49%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy	14085.800000 ns	24818.100 ns	76.19%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider>	313.102000 ns	539.665 ns	72.36%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy	330.768000 ns	568.249 ns	71.80%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider>	2655.420000 ns	4540.970 ns	71.01%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider>	340.066000 ns	580.669 ns	70.75%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy	371.110000 ns	632.053 ns	70.31%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc	504.080000 ns	852.465 ns	69.11%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc	222.220000 ns	365.370 ns	64.42%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider>	17325.100000 ns	28297.400 ns	63.33%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider>	367.259000 ns	599.047 ns	63.11%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy	365.715000 ns	591.936 ns	61.86%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider	16056.600000 ns	25706.600 ns	60.10%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy	210.554000 ns	334.039 ns	58.65%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy	221.357000 ns	350.589 ns	58.38%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider>	1040.670000 ns	1637.570 ns	57.36%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc	181.914000 ns	282.643 ns	55.37%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider>	23592.200000 ns	36260.200 ns	53.70%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider>	221.747000 ns	340.160 ns	53.40%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider>	355.287000 ns	543.490 ns	52.97%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider>	238.835000 ns	360.015 ns	50.74%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider>	1121.910000 ns	1670.930 ns	48.94%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc	135.488000 ns	170.436 ns	25.79%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc	159.699000 ns	199.856 ns	25.15%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc	172.138000 ns	204.384 ns	18.73%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc	149.274000 ns	172.393 ns	15.49%

Regressed 6 (threshold 2.00%)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider>	4677.440 ns	752.146000 ns	-83.92%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc	196.851 ns	88.994000 ns	-54.79%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc	189.812 ns	88.265900 ns	-53.50%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc	101.247 ns	64.327600 ns	-36.46%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc	95.715 ns	63.410600 ns	-33.75%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider>	35133.300 ns	30418.400000 ns	-13.42%

Performance change in benchmark groups

UMF

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy	323.984000 ns	624.879 ns	92.87%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy	1582.200000 ns	3037.880 ns	92.00%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider>	313.102000 ns	539.665 ns	72.36%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider>	1040.670000 ns	1637.570 ns	57.36%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc	181.914000 ns	282.643 ns	55.37%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc	189.812 ns	88.265900 ns	-53.50%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider>	4677.440 ns	752.146000 ns	-83.92%

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy	12691.600000 ns	24625.100 ns	94.03%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy	371.110000 ns	632.053 ns	70.31%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc	222.220000 ns	365.370 ns	64.42%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider>	355.287000 ns	543.490 ns	52.97%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider>	1121.910000 ns	1670.930 ns	48.94%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider>	35133.300 ns	30418.400000 ns	-13.42%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc	196.851 ns	88.994000 ns	-54.79%

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider>	625.572000 ns	1194.930 ns	91.01%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy	1674.610000 ns	3150.710 ns	88.15%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc	450.721000 ns	838.137 ns	85.95%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy	330.768000 ns	568.249 ns	71.80%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider>	2655.420000 ns	4540.970 ns	71.01%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider>	340.066000 ns	580.669 ns	70.75%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc	159.699000 ns	199.856 ns	25.15%

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider>	656.280000 ns	1197.650 ns	82.49%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy	14085.800000 ns	24818.100 ns	76.19%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc	504.080000 ns	852.465 ns	69.11%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider>	367.259000 ns	599.047 ns	63.11%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy	365.715000 ns	591.936 ns	61.86%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider>	23592.200000 ns	36260.200 ns	53.70%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc	172.138000 ns	204.384 ns	18.73%

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider>	318.467000 ns	647.568 ns	103.34%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy	1508.560000 ns	3025.960 ns	100.59%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider>	395.143000 ns	767.476 ns	94.23%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy	210.554000 ns	334.039 ns	58.65%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider>	221.747000 ns	340.160 ns	53.40%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc	135.488000 ns	170.436 ns	25.79%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc	95.715 ns	63.410600 ns	-33.75%

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy	13023.400000 ns	24169.300 ns	85.58%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider>	5798.090000 ns	10613.300 ns	83.05%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider>	417.284000 ns	762.820 ns	82.81%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy	221.357000 ns	350.589 ns	58.38%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider>	238.835000 ns	360.015 ns	50.74%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc	149.274000 ns	172.393 ns	15.49%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc	101.247 ns	64.327600 ns	-36.46%

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider>	1796.990000 ns	4972.830 ns	176.73%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider	1174.930000 ns	2564.380 ns	118.26%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider>	17325.100000 ns	28297.400 ns	63.33%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider	16056.600000 ns	25706.600 ns	60.10%

Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)

Benchmark	This PR	Baseline_PVC	Change
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider>	0.000000 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider	0.000000 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider>	0.000000 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider	0.000000 %	-

Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)

Benchmark	This PR	Baseline_PVC
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider>	0.064949 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider>	30.607300 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider>	60.801600 %	-

Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)

Benchmark	This PR	Baseline_PVC
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider>	0.016245 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider>	30.589100 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider>	55.478400 %	-

Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)

Benchmark	This PR	Baseline_PVC
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider>	26.002600 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider>	64.508100 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider>	62.411100 %	-

Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)

Benchmark	This PR	Baseline_PVC
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider>	25.507300 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider>	60.786800 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider>	54.893400 %	-

Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)

Benchmark	This PR	Baseline_PVC
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider>	23.636700 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider>	85.085300 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider>	85.085300 %	-

Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)

Benchmark	This PR	Baseline_PVC
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider>	20.038400 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider>	84.818400 %	-
FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider>	88.068200 %	-

Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)

Benchmark	This PR	Baseline_PVC
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc	185.427000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider>	4683.950000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider>	1012.640000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider>	317.499000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy	1599.570000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc	193.304000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy	326.985000 ns	-

Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)

Benchmark	This PR	Baseline_PVC
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc	230.601000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider>	35481.000000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider>	1152.200000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider>	387.822000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy	12994.000000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc	204.267000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy	389.277000 ns	-

Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)

Benchmark	This PR	Baseline_PVC
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc	450.370000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider>	2627.180000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider>	622.544000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider>	329.937000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy	1645.940000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc	158.885000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy	326.537000 ns	-

Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)

Benchmark	This PR	Baseline_PVC
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc	508.403000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider>	24059.100000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider>	669.748000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider>	382.320000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy	13026.500000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc	167.881000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy	360.961000 ns	-

Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)

Benchmark	This PR	Baseline_PVC
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc	136.707000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider>	325.845000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider>	388.318000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider>	217.319000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy	1510.160000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc	92.599300 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy	206.031000 ns	-

Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)

Benchmark	This PR	Baseline_PVC
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc	146.831000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider>	6542.910000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider>	413.041000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider>	227.619000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy	12405.600000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc	104.040000 ns	-
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy	223.409000 ns	-

Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)

Benchmark	This PR	Baseline_PVC
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider>	50.000000 %	-
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider>	99.990400 %	-
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider>	99.960900 %	-

Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)

Benchmark	This PR	Baseline_PVC
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider>	20.000000 %	-
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider>	99.990300 %	-
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider>	99.962800 %	-

Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)

Benchmark	This PR	Baseline_PVC
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider>	97.184200 %	-
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider>	99.992500 %	-
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider>	99.989800 %	-

Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)

Benchmark	This PR	Baseline_PVC
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider>	90.346000 %	-
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider>	99.991800 %	-
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider>	99.987700 %	-

Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)

Benchmark	This PR	Baseline_PVC
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider>	99.536100 %	-
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider>	99.996800 %	-
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider>	99.992800 %	-

Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)

Benchmark	This PR	Baseline_PVC
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider>	98.763000 %	-
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider>	99.996800 %	-
FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider>	99.994200 %	-

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 (6)

Benchmark	This PR	Baseline_PVC
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 glibc	-	366.197000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 jemalloc_pool<os_provider>	-	1672.840000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 scalable_pool<os_provider>	-	549.361000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 umfProxy	-	53569.500000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 jemalloc	-	89.618300 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 tbbProxy	-	632.770000 ns

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 (6)

Benchmark	This PR	Baseline_PVC
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 glibc	-	366.670000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 jemalloc_pool<os_provider>	-	1674.080000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 scalable_pool<os_provider>	-	544.330000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 umfProxy	-	78951.900000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 jemalloc	-	89.786300 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 tbbProxy	-	638.970000 ns

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 (6)

Benchmark	This PR	Baseline_PVC
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 glibc	-	845.858000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 jemalloc_pool<os_provider>	-	1198.180000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 scalable_pool<os_provider>	-	597.241000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 umfProxy	-	53934.300000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 jemalloc	-	204.593000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 tbbProxy	-	584.623000 ns

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 (6)

Benchmark	This PR	Baseline_PVC
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 glibc	-	855.042000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 jemalloc_pool<os_provider>	-	1202.610000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 scalable_pool<os_provider>	-	603.377000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 umfProxy	-	79284.100000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 jemalloc	-	203.389000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 tbbProxy	-	594.468000 ns

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 (6)

Benchmark	This PR	Baseline_PVC
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 glibc	-	173.740000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 jemalloc_pool<os_provider>	-	764.419000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 scalable_pool<os_provider>	-	362.542000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 umfProxy	-	53422.200000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 jemalloc	-	64.340700 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 tbbProxy	-	352.649000 ns

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 (6)

Benchmark	This PR	Baseline_PVC
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 glibc	-	173.294000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 jemalloc_pool<os_provider>	-	763.153000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 scalable_pool<os_provider>	-	359.356000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 umfProxy	-	78621.600000 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 jemalloc	-	64.383900 ns
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 tbbProxy	-	354.263000 ns

Details

Benchmark details contain too many chars to display

ldorau · 2025-04-15T15:32:20Z

"Exception: The directory /home/test-user/bench_workdir_umf exists but is not a benchmark work directory."
See: https://github.com/oneapi-src/unified-memory-framework/actions/runs/14473372398/job/40592824780
@lplewa @lukaszstolarczuk Could you fix it?

ldorau · 2025-04-17T12:56:55Z

The last commit accde78 is not finished yet !

vinser52 · 2025-04-17T14:39:18Z

What is the purpose of this PR, could you please describe the scenario and how these changes help?

lplewa · 2025-04-18T13:09:58Z

What is the purpose of this PR, could you please describe the scenario and how these changes help?

We have a race, between insert and delete of entry in tracker

Top screen in thread one, bottom screen is thread two. Numbers shows order of the operations.

vinser52 · 2025-04-22T09:53:16Z

But how is it possible that one thread has some pointer which belongs to the entry in the memory tracker and another thread removes that entry from the tracker? The first thing that came to my mind is the following:

T1 allocates memory, and the corresponding region is put into the tracker
T2 frees the memory allocated by T1, and the corresponding region is removed from the tracker
T1 is still trying to use the memory which is already been freed by T2.

But it is an ill-formed client's code. What is the real scenario?

lplewa · 2025-04-23T09:58:42Z

Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found.

vinser52 · 2025-04-28T10:23:35Z

Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found.

That is exactly my question. How such a scenario might happen: if T1 has a pointer P, that belongs to some region R, and T2 removes this region R.

ldorau · 2025-04-28T20:20:37Z

Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found.

That is exactly my question. How such a scenario might happen: if T1 has a pointer P, that belongs to some region R, and T2 removes this region R.

@vinser52 @lplewa The explanation is that the pointer P (of T1) does not belong to the region R (of T2).

This is the real scenario:
a) thread T2 allocates memory region R (for example: 0x100-0x300 - address 0x100 and size 0x200) and adds it to the tracker (this is the only entry in the tracker yet) and gets preempted ...
b) thread T1 allocates memory of size 0x500 and receives pointer P (address 0x400, size 0x500 - range 0x400-0x900) and tries to add it to the tracker, so it calls umfMemoryTrackerAdd(), so critnib_find(FIND_LE) finds and returns the only region R existing in the tracker R: 0x100-0x300 (step 1 on the picture) and T1 gets preempted (step 2 on the picture) ...
c) thread T2 removes the region R from the tracker (step 3 on the picture), frees its memory (step 4 on the picture) and T2 gets preempted ...
d) thread T1 tries to read a size of the already freed region R (step 5 on the picture) and it crashes ...

vinser52 · 2025-04-29T08:53:07Z

Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found.

That is exactly my question. How such a scenario might happen: if T1 has a pointer P, that belongs to some region R, and T2 removes this region R.

@vinser52 @lplewa The explanation is that the pointer P (of T1) does not belong to the region R (of T2).

This is the real scenario: a) thread T2 allocates memory region R (for example: 0x100-0x300 - address 0x100 and size 0x200) and adds it to the tracker (this is the only entry in the tracker yet) and gets preempted ... b) thread T1 allocates memory of size 0x500 and receives pointer P (address 0x400, size 0x500 - range 0x400-0x900) and tries to add it to the tracker, so it calls umfMemoryTrackerAdd(), so critnib_find(FIND_LE) finds and returns the only region R existing in the tracker R: 0x100-0x300 (step 1 on the picture) and gets preempted (step 2 on the picture) ... c) thread T2 removes the region R from the tracker (step 3 on the picture), frees its memory (step 4 on the picture) and gets preempted ... d) thread T1 tries to read a size of the already freed region R (step 5 on the picture) and crashes ...

Thank you, @ldorau, for the clear description of the scenario. That makes sense.

src/critnib/critnib.c

src/critnib/critnib.h

src/pool/pool_disjoint.c

src/provider/provider_tracking.c

src/critnib/critnib.c

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>

src/critnib/critnib.c

bratpiorka · 2025-05-14T07:06:58Z

@lplewa please review

Add a reference counter and critnib_release() function. The following 4 functions: - critnib_remove(), - critnib_get(), - critnib_find_le() and - critnib_find() return a reference (void *ref) to the returned value, that MUST be released by calling critnib_release() when it is no longer used and can be freed using the cb_free_leaf() callback. Fixes: oneapi-src#1233 Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>

github-actions · 2025-05-15T13:09:53Z

Compute Benchmarks run (with params: --compare 'Baseline_PVC'):
https://github.com/oneapi-src/unified-memory-framework/actions/runs/15045823044

github-actions · 2025-05-15T13:20:30Z

Compute Benchmarks run ( --compare 'Baseline_PVC'):
https://github.com/oneapi-src/unified-memory-framework/actions/runs/15045823044
Job status: success. Test status: success.

Failures

Name	Failure
umf-benchmark	Benchmark run failure: Command '['/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark', '--benchmark_format=csv', '--benchmark_filter=^jemalloc_pool<os_provider>/peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4$']' died with <Signals.SIGSEGV: 11>.

Summary

(Emphasized values are the best results)

Improved 2 (threshold 2.00%)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy	12164.200000 ns	12647.100 ns	3.97%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc	187.435000 ns	193.480 ns	3.23%

Regressed 11 (threshold 2.00%)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy	1646.060 ns	1574.480000 ns	-4.35%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy	329.011 ns	315.617000 ns	-4.07%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy	1755.560 ns	1690.190000 ns	-3.72%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy	13578.200 ns	13073.800000 ns	-3.71%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy	1553.960 ns	1496.810000 ns	-3.68%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy	227.521 ns	221.170000 ns	-2.79%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy	12919.000 ns	12559.600000 ns	-2.78%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy	1544.590 ns	1504.390000 ns	-2.60%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy	12858.900 ns	12525.100000 ns	-2.60%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy	13072.600 ns	12769.400000 ns	-2.32%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc	174.781 ns	170.738000 ns	-2.31%

Performance change in benchmark groups

UMF

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy	328.804000 ns	333.006 ns	1.28%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc	189.766000 ns	190.840 ns	0.57%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy	1646.060 ns	1574.480000 ns	-4.35%

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy	13019.900000 ns	13187.900 ns	1.29%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy	378.043 ns	373.724000 ns	-1.14%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc	209.408 ns	205.971000 ns	-1.64%

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc	163.133 ns	163.003000 ns	-0.08%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy	332.262 ns	331.895000 ns	-0.11%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy	1691.350 ns	1675.930000 ns	-0.91%

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy	367.719000 ns	368.443 ns	0.20%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc	174.781 ns	170.738000 ns	-2.31%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy	12858.900 ns	12525.100000 ns	-2.60%

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc	94.083200 ns	95.116 ns	1.10%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy	210.777 ns	210.628000 ns	-0.07%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy	1553.960 ns	1496.810000 ns	-3.68%

Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)

Benchmark	This PR	Baseline_PVC	Change
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy	12164.200000 ns	12647.100 ns	3.97%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc	97.298 ns	96.802900 ns	-0.51%
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy	227.521 ns	221.170000 ns	-2.79%

Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)

Benchmark	This PR	Baseline_PVC	Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc	187.435000 ns	193.480 ns	3.23%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy	1621.540 ns	1600.770000 ns	-1.28%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy	329.011 ns	315.617000 ns	-4.07%

Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)

Benchmark	This PR	Baseline_PVC	Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc	203.334000 ns	204.438 ns	0.54%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy	391.455 ns	390.035000 ns	-0.36%
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy	13072.600 ns	12769.400000 ns	-2.32%

Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)

Benchmark	This PR	Baseline_PVC	Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc	159.787000 ns	160.033 ns	0.15%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy	324.810 ns	321.573000 ns	-1.00%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy	1755.560 ns	1690.190000 ns	-3.72%

Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)

Benchmark	This PR	Baseline_PVC	Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy	361.206 ns	359.364000 ns	-0.51%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc	171.032 ns	169.215000 ns	-1.06%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy	13578.200 ns	13073.800000 ns	-3.71%

Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)

Benchmark	This PR	Baseline_PVC	Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy	206.195000 ns	207.213 ns	0.49%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc	94.475900 ns	94.611 ns	0.14%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy	1544.590 ns	1504.390000 ns	-2.60%

Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)

Benchmark	This PR	Baseline_PVC	Change
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc	103.738 ns	102.602000 ns	-1.10%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy	223.097 ns	220.533000 ns	-1.15%
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy	12919.000 ns	12559.600000 ns	-2.78%

Details

Benchmark details - environment, command...

multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy