
Correctly calculate access chains for compound types #233


Merged: 9 commits, Apr 21, 2025

Conversation

@LegNeato (Collaborator) commented Apr 12, 2025

Fixes #46.

This was extremely painful and I am still not sure I understand the code completely and all the various cases.

@schell , @Firestar99 , please try it in your projects and see if I broke anything!

@Firestar99 (Member)

My stuff still works with this branch (+ nonuniform patches) 🎉

@schell (Contributor) commented Apr 18, 2025

My project compiled just fine and passes tests 👍 . I also successfully built a shader with [Default::default(); N] in it.

Awesome work!

@schell (Contributor) left a comment

LGTM, but take my review with a grain of salt 🧂: I never got a chance to dive in very deep, though I have tested the results.

@LegNeato (Collaborator, Author)

@eddyb said he wanted to take a look at this so I'll wait a bit before merging.

@LegNeato (Collaborator, Author)

Ok, I am just going to land it and we can fix forward or revert if it isn't correct.

@LegNeato merged commit 2f915a0 into Rust-GPU:main Apr 21, 2025
7 checks passed
@LegNeato deleted the fixregression branch April 21, 2025 02:45
@eddyb (Collaborator) left a comment

I was late on this and I'm not even sure I should leave this review.

I've got mixed feelings about it - I don't mind all the tracing usage and the comments, but it turns out that there are fewer functional changes than I expected - in fact, just one, and it doesn't make sense to me.

There's a non-trivial chance this is a case of "two wrongs make a right", or more specifically something else ends up reacting better to what seems to be invalid SPIR-V (AFAICT).

Now, to be fair, I finally looked at this because of:

And the simplest workaround for that seems to be "run the GEP merge logic twice, using a loop, and the second time use pointercast first which can generate its own GEP via recover_access_chain_from_offset" - so there's a chance the "byte GEP" stuff hits a similar weird combo.

In fact, funnily enough, I think we could've always done that pointercast before the merge, because of the if ty == original_pointee_ty that currently gates merging: if that condition holds, the pointercast is a noop anyway.
(EDIT: yupp, that works, and it amounts to moving just two lets slightly earlier in the function)

I should try to figure out the overall "decision trees" for maybe_inbounds_gep and other recover_access_chain_from_offset callers, then reduce duplicated work and handle all possible cases that can be statically resolved.
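The two-pass idea above ("run the GEP merge logic twice, using a loop, and the second time use pointercast first") can be sketched abstractly. This is a hypothetical helper, not rust-gpu code: `try_merge` and `pointercast` here are toy stand-ins for the real merge logic and the cast that can recover its own access chain.

```rust
// Abstract sketch of the retry loop: attempt the merge directly; if it does
// not apply, normalize the pointer with a pointercast and attempt once more.
fn merge_twice<P>(
    mut ptr: P,
    try_merge: impl Fn(&P) -> Option<P>,
    pointercast: impl Fn(&P) -> P,
) -> Option<P> {
    for attempt in 0..2 {
        if let Some(merged) = try_merge(&ptr) {
            return Some(merged);
        }
        if attempt == 0 {
            // Second chance: cast first, then retry the merge.
            ptr = pointercast(&ptr);
        }
    }
    None
}

fn main() {
    // Toy stand-in: the merge only "applies" to even values, and the cast
    // normalizes by adding 1, so 3 fails, becomes 4, then merges to 40.
    let merged = merge_twice(3, |p| (p % 2 == 0).then(|| p * 10), |p| p + 1);
    assert_eq!(merged, Some(40));
}
```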

Comment on lines -440 to -441
// FIXME(eddyb) this isn't efficient, `recover_access_chain_from_offset`
// could instead be doing all the extra digging itself.
@eddyb (Collaborator):

I am not sure what this comment meant exactly, but I suspect it still applies to the loop being a loop, and not a single call to recover_access_chain_from_offset.

return Some((indices, ty));
}
}
}

#[instrument(level = "trace", skip(self), fields(ty = ?self.debug_type(ty), ptr, combined_indices = ?combined_indices.iter().map(|x| (self.debug_type(x.ty), x.kind)).collect::<Vec<_>>(), is_inbounds))]
@eddyb (Collaborator):

Surprised rustfmt didn't make this multi-line; I feel like it would be more readable if each entry in fields was on its own line.

Comment on lines +693 to +699
// Special case: If we started with a byte GEP (`is_byte_gep` is true) and
// we are currently indexing into a byte type, the result is still a byte type.
// This prevents errors if `indices` contains non-zero values when `ty` is u8/i8.
SpirvType::Integer(8, signedness) if is_byte_gep => {
// Define the resulting byte type (as it might not exist yet).
SpirvType::Integer(8, signedness).def(self.span(), self)
}
@eddyb (Collaborator):

I don't see how this could be reached - indices can only dig into structs/arrays, and only a bug elsewhere would allow indices.len() > 0 when doing byte offsetting.

Locally removing this case does not fail any tests, so at the very least none were added that could hit it.

@LegNeato (Collaborator, Author):

Oh weird, I added it specifically because byte indexing was broken in tests. Perhaps a later change fixed this and I didn't notice.

// Search the current function's instructions...
// FIXME(eddyb) this could get ridiculously expensive, at the very least
// it could use `.rev()`, hoping the base pointer was recently defined?
let search_result = {
@eddyb (Collaborator):

Why the extra variable? It just inflates the diff needlessly.

@LegNeato (Collaborator, Author):

I probably did this for logging, which looks like it was removed.

Comment on lines +780 to +804
// Determine the result type for the `AccessChain` instruction we might
// emit. By default, use the `final_spirv_ptr_type` strictly calculated
// earlier from `ty` and `indices`.
//
// If this is a byte GEP *and* the recovered type happens to be a byte
// type, we can use the pointer type derived from the *recovered* type
// (`base_pointee_ty`). This helps preserve type information when
// recovery works for byte addressing.
let result_wrapper_type = if !is_byte_gep
|| matches!(self.lookup_type(base_pointee_ty), SpirvType::Integer(8, _))
{
trace!(
"Using strictly calculated type for wrapper: {}",
// Use type based on input `ty` + `indices`
self.debug_type(calculated_pointee_type)
);
final_spirv_ptr_type
} else {
trace!(
"Byte GEP allowing recovered type for wrapper: {}",
// Use type based on recovery result
self.debug_type(base_pointee_ty)
);
self.type_ptr_to(base_pointee_ty)
};
@eddyb (Collaborator):

This looks like the only addition to the "constant offset" path and I don't understand it.

You have to use final_spirv_ptr_type (i.e. pointer-to-calculated_pointee_type) because that's what the type has to be.

When indices is empty, that does mean that pointer-to-base_pointee_ty can be obtained and it has the right offset, even if it's not pointing to a ty/calculated_pointee_type (which are equal in the empty indices case) - but there are two issues there:

  • "ty is NOT byte, OR base_pointee_ty is byte" doesn't seem like it matches the comment, nor can it relate to "do we have the desired pointer, even if of the wrong type?"
    • maybe the actual issue here is that the "then" and "else" sides got flipped for no good reason, so the condition ends up negated, and should be is_byte_gep && ...?
  • the caller definitely expects a value of type final_spirv_ptr_type, so if the access chain inherently has a different type, it must be cast to the right type (which would create a LogicalPtrCast, and hopefully not create downstream issues)
    • to explain "inherently" a bit further: an access chain always has only one valid output type, given the input pointer type and the number of indices (and value, when dealing with struct fields), and we can't just put in a different type and have valid SPIR-V

My bad, I kept missing the let result_pointee_type = if indices.is_empty() { in the old code, now it makes sense that there was something there (but I've kept my original comment for posterity).

That's exactly the condition I was describing earlier, where indices being empty determines what the correct type is. However, I suppose the old code was lacking the pointercast I mentioned above.
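The "one valid output type" point above can be illustrated with a simplified, hypothetical model of composite types (not rust-gpu's actual `SpirvType`): given the input pointee type and the indices, walking the type tree fully determines the result type, and a different number of indices yields a different (possibly invalid) result.

```rust
// Minimal sketch: the result type of an access chain is obtained by digging
// into the pointee type once per index.
#[derive(Debug, PartialEq)]
enum Ty {
    Int,             // a scalar: no inner structure to index into
    Array(Box<Ty>),  // homogeneous element type (length elided)
    Struct(Vec<Ty>), // per-field types: the index selects a field
}

fn access_chain_result<'a>(mut ty: &'a Ty, indices: &[usize]) -> Option<&'a Ty> {
    for &i in indices {
        ty = match ty {
            Ty::Array(elem) => &**elem,
            Ty::Struct(fields) => fields.get(i)?,
            Ty::Int => return None, // indexing past a scalar is invalid SPIR-V
        };
    }
    Some(ty)
}

fn main() {
    let s = Ty::Struct(vec![Ty::Int, Ty::Array(Box::new(Ty::Int))]);
    assert_eq!(access_chain_result(&s, &[1, 0]), Some(&Ty::Int));
    // One extra index would have to dig into a scalar: no valid result type.
    assert_eq!(access_chain_result(&s, &[1, 0, 0]), None);
}
```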

@eddyb (Collaborator):

This keeps issue-46 passing, but I will admit I don't know why it's valid:

@@ -785,8 +778,9 @@
                 // type, we can use the pointer type derived from the *recovered* type
                 // (`base_pointee_ty`). This helps preserve type information when
                 // recovery works for byte addressing.
-                let result_wrapper_type = if !is_byte_gep
-                    || matches!(self.lookup_type(base_pointee_ty), SpirvType::Integer(8, _))
+                let result_wrapper_type = if !(is_byte_gep
+                    && !matches!(self.lookup_type(base_pointee_ty), SpirvType::Integer(8, _))
+                    && indices.is_empty())
                 {
                     trace!(
                         "Using strictly calculated type for wrapper: {}",
@@ -828,13 +822,17 @@
                     );
                     // Emit a single AccessChain using the original pointer `ptr_id` and the fully combined index list.
                     // Note: We don't pass `ptr_base_index` here because its effect is incorporated into `base_indices`.
-                    return self.emit_access_chain(
+                    let mut final_ptr = self.emit_access_chain(
                         result_wrapper_type, // The chosen result pointer type
                         ptr_id,              // The original base pointer ID
                         None,                // No separate base index needed
                         combined_indices,    // The combined structured + original indices
                         is_inbounds,         // Preserve original inbounds request
                     );
+                    if result_wrapper_type != final_spirv_ptr_type {
+                        final_ptr = self.pointercast(final_ptr, final_spirv_ptr_type);
+                    }
+                    return final_ptr;
                 } else {
                     // Recovery happened, but the recovered type `base_pointee_ty` doesn't match the input `ty`,
                     // AND there are more `indices` to process. Using the `base_indices` derived from

I suppose the trick is that is_byte_gep requires indices.is_empty() (as bytes have no inner structure to further index), so:

  • my && indices.is_empty() is redundant (explains why nothing broke with your version)
  • you strengthened the condition for using base_pointee_ty, meaning final_spirv_ptr_type is used even in cases where it's incorrect to do so (and a cast should have been used instead)
    • generating the cast to final_spirv_ptr_type doesn't help, which means... what exactly? that two wrongs make a right?

Comment on lines +969 to +970
// HACK(eddyb): Workaround for potential upstream issues where pointers might lack precise type info.
// FIXME(eddyb): Ideally, this should use untyped memory features if available/necessary.
@eddyb (Collaborator):

These should've stayed next to the pointercast - the new comments might not even make sense, because the cast is only due to the mismatch - anyway, it's all a historical mess, and lots of concepts don't fit the new (much more untyped) approach, sadly.

let search_result = {
let emit = self.emit();
let module = emit.module_ref();
emit.selected_function().and_then(|func_idx| {
@eddyb (Collaborator):

This accounts for selected_function() being None - that should never be possible when you are in a method of a builder that will emit instructions into a selected block, not just function.

Comment on lines 903 to +910
if let Some((original_ptr, mut original_indices)) = maybe_original_access_chain {
// Transform the following:
// OpAccessChain original_ptr [a, b, c]
// OpPtrAccessChain ptr base [d, e, f]
// into
// OpAccessChain original_ptr [a, b, c + base, d, e, f]
// to remove the need for OpPtrAccessChain
let last = original_indices.last_mut().unwrap();
*last = self
.add(last.with_type(ptr_base_index.ty), ptr_base_index)
.def(self);
original_indices.extend(indices);
return self.emit_access_chain(
result_type,
original_ptr,
None,
original_indices,
is_inbounds,
);
trace!("has original access chain, attempting to merge GEPs");

// Check if merging is possible. Requires:
// 1. The original AccessChain had at least one index.
// 2. The *last* index of the original AccessChain is a constant.
// 3. The *first* index (`ptr_base_index`) of the *current* GEP is a constant.
// Merging usually involves adding these two constant indices.
@eddyb (Collaborator):

Wait, you replaced a potentially-runtime addition which allows array indexing GEPs to merge, with a constant-only addition?

How could that help? If they were constants, the outcome is unchanged, you're still using add, and if they were not constants, not merging now means it hits the worst path and I can't think of any reason why that would result in better recovery downstream.


Changing this back to what it was before doesn't impact tests, btw, so that's another instance of that.

I am almost certain now that the weird !is_byte_gep || ... condition is the only actual change in this PR that impacted tests, and that's the one I don't understand yet.
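The merge being discussed (fold the OpPtrAccessChain's base into the last index of the preceding OpAccessChain) can be sketched with plain integers. This is a hypothetical helper, not the builder code: in the real code the indices are SPIR-V value ids and the addition is an emitted OpIAdd, which is exactly why it also works for runtime (non-constant) values.

```rust
// Sketch of rewriting:
//   OpAccessChain original_ptr [a, b, c]
//   OpPtrAccessChain ptr base [d, e, f]
// into:
//   OpAccessChain original_ptr [a, b, c + base, d, e, f]
fn merge_access_chains(original: &[u32], base: u32, new: &[u32]) -> Vec<u32> {
    let (&last, init) = original.split_last().expect("original chain is non-empty");
    let mut merged = init.to_vec();
    merged.push(last + base);      // c + base
    merged.extend_from_slice(new); // d, e, f
    merged
}

fn main() {
    // [a=0, b=1, c=2] merged with base=3 and [d=4, e=5]:
    assert_eq!(merge_access_chains(&[0, 1, 2], 3, &[4, 5]), vec![0, 1, 5, 4, 5]);
}
```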

@LegNeato (Collaborator, Author) commented May 7, 2025

@eddyb feel free to fix or back out whatever you want!

The new code flow made more sense to me, and I confirmed a bunch of stuff with traces in various contexts. But I am still grappling with the underlying model and how best to apply it. I've paged this diff out of my brain (finally!), so I'd have to play around with it again to give reasons for the choices I made. I will say this was something like the 5th revision/attempt, so it wasn't as if I had an overarching strategy... I just tried to understand the code's overall goal and make it work without thinking about the best way to do things. It also wasn't clear where best to make changes... there are a lot of moving pieces in this code that all rely on each other. (rakes, as you say!)

Feel free to just ignore this and focus on qptr if you want, as it is a bit of a side quest that will be obsoleted by your work (AFAIK).

Successfully merging this pull request may close these issues.

Regression on main when allocating arrays of arrays