Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.mem.indexOfSentinel() vectorized path violates pointer aliasing rules #23184

Open
alexrp opened this issue Mar 10, 2025 · 0 comments
Open
Labels
bug Observed behavior contradicts documented or intended behavior standard library This issue involves writing Zig code for the standard library.
Milestone

Comments

@alexrp
Copy link
Member

alexrp commented Mar 10, 2025

zig/lib/std/mem.zig

Lines 1097 to 1142 in 8e0a4ca

// The below branch assumes that reading past the end of the buffer is valid, as long
// as we don't read into a new page. This should be the case for most architectures
// which use paged memory, however should be confirmed before adding a new arch below.
.aarch64, .x86, .x86_64 => if (std.simd.suggestVectorLength(T)) |block_len| {
const page_size = std.heap.page_size_min;
const block_size = @sizeOf(T) * block_len;
const Block = @Vector(block_len, T);
const mask: Block = @splat(sentinel);
comptime assert(std.heap.page_size_min % @sizeOf(Block) == 0);
assert(page_size % @sizeOf(Block) == 0);
// First block may be unaligned
const start_addr = @intFromPtr(&p[i]);
const offset_in_page = start_addr & (page_size - 1);
if (offset_in_page <= page_size - @sizeOf(Block)) {
// Will not read past the end of a page, full block.
const block: Block = p[i..][0..block_len].*;
const matches = block == mask;
if (@reduce(.Or, matches)) {
return i + std.simd.firstTrue(matches).?;
}
i += @divExact(std.mem.alignForward(usize, start_addr, block_size) - start_addr, @sizeOf(T));
} else {
@branchHint(.unlikely);
// Would read over a page boundary. Per-byte at a time until aligned or found.
// 0.39% chance this branch is taken for 4K pages at 16b block length.
//
// An alternate strategy is to do read a full block (the last in the page) and
// mask the entries before the pointer.
while ((@intFromPtr(&p[i]) & (block_size - 1)) != 0) : (i += 1) {
if (p[i] == sentinel) return i;
}
}
assert(std.mem.isAligned(@intFromPtr(&p[i]), block_size));
while (true) {
const block: *const Block = @ptrCast(@alignCast(p[i..][0..block_len]));
const matches = block.* == mask;
if (@reduce(.Or, matches)) {
return i + std.simd.firstTrue(matches).?;
}
i += block_len;
}
},

I think this is pretty clearly a violation of LLVM's pointer aliasing rules. LLVM could break this code whenever and would be fully justified in doing so.

@alexrp alexrp added bug Observed behavior contradicts documented or intended behavior standard library This issue involves writing Zig code for the standard library. labels Mar 10, 2025
@alexrp alexrp added this to the 0.15.0 milestone Mar 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Observed behavior contradicts documented or intended behavior standard library This issue involves writing Zig code for the standard library.
Projects
None yet
Development

No branches or pull requests

1 participant