Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix missed exploration of edges in constant propagation #1635

Merged
merged 4 commits into from
Sep 8, 2024

Conversation

luigifusco
Copy link
Contributor

There is a bug related to the missed exploration of interstate edges during constant propagation in case a loop body has a conditional assignment. The reverse DFS yields parent-node pairs and analyzes only the edge connecting the two. The DFS will yield a certain node only once, while the assumption in the code is that the uniqueness is enforced on the parent-node pair. This results in only one outgoing interstate edge per body state being visited, leading to mistakes in the common case of conditional assignments (which result in two outgoing edges performing different assignments). If the visited edge does not perform an assignment or assigns the initialization value, the symbol will be wrongly interpreted as a constant and replaced in downstream states.

A short reproducing example is:

N = dace.symbol('N', dace.int64)

@dace.program
def program(in_arr: dace.bool[N], arr: dace.bool[N]):
    check = False
    for i in range(N):
        if in_arr[i]:
            check = True
        else:
            check = False
    for i in dace.map[0:N]:
        arr[i] = check

sdfg = program.to_sdfg(simplify=True)
sdfg.save('bug.sdfg')

# "arr[i] = check" will be replaced by "arr[i] = False"

The fix makes sure all interstate edges are visited at least once.

@luigifusco luigifusco force-pushed the constant_propagation_fix branch from 99c1f78 to dd62030 Compare September 2, 2024 15:29
Copy link
Collaborator

@phschaad phschaad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but please add the reproducer or a derivative thereof as a regression test to our CI test suite. Make sure it fails (reproduces) without the fix and passes with it. Thanks!

@tbennun
Copy link
Collaborator

tbennun commented Sep 3, 2024

I second @phschaad's comment about the test. I also think that there may be issues with this code concerning loop guards (which will have another incoming edge from the end/latch of a loop, thereby adding more new_symbols that are not really new yet).

Also, how do we know that this process works for multiple chained input edges going into the current node? This only propagates one step backwards. Ideally every state should traverse all the way back to a root node (or a dominator).

@luigifusco
Copy link
Contributor Author

This version ignores unvisited states but iterates until no change is made to the symbol definitions. Having more than one pass is the correct way of doing this. In general a loop body needs to be visited at least twice (it needs to propagate its definitions into itself).

An optimization of my code would be to explicitly detect a loop (there is an edge with yet undefined result at its source) -> perform a sub-constant-propagation on the loop body only (going recursively if nested loops are found) and then continue with the normal propagation. This however would complicate the code a lot more (requiring changes into the topological sort function to start from arbitrary states). Do you think that is needed?

@tbennun
Copy link
Collaborator

tbennun commented Sep 3, 2024

Maybe the loop-specific behavior is not needed, especially when control flow blocks are raised from the code.
We need to take a closer look at the algorithm before approving. I want to see if the algorithm is sound.
Do you know how many passes are done on an SDFG? Is there a considerable slowdown on larger graphs (@acalotoiu maybe check that)?

Thanks for changing it!

@luigifusco
Copy link
Contributor Author

checking velocity_tendencies_simplified it completes in 5 iterations taking about 2.3 seconds total. This number should be related to the maximum loop depth (you need multiple passes to bring the definition/correction out).

@tbennun
Copy link
Collaborator

tbennun commented Sep 4, 2024

checking velocity_tendencies_simplified it completes in 5 iterations taking about 2.3 seconds total. This number should be related to the maximum loop depth (you need multiple passes to bring the definition/correction out).

I have no idea what this means with relation to the existing code. How much longer is it than before?

@luigifusco
Copy link
Contributor Author

Testing a bit better. Before ~2.1s, now ~2.6s. I only change the collect_constants part. The way this scales depending on the number of iterations is not linear as successive iterations will perform fewer dictionary operations. Considering this function only, 1 iter -> 1.7s, 5 iter -> 2.4s, 10 iter -> 3s. The SDFG is part of ICON and is made of 291 nodes and 385 edges.

@tbennun tbennun merged commit 7210cb6 into master Sep 8, 2024
0 of 10 checks passed
@tbennun tbennun deleted the constant_propagation_fix branch September 8, 2024 00:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants