Skip to content

feat[next][dace]: Filter unused connectivity tables #1502

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 26, 2024

Conversation

edopao
Copy link
Contributor

@edopao edopao commented Mar 20, 2024

During SDFG construction, only include the connectivity tables that are used by the stencil program. By convention, DaCe will not remove unnecessary data dependencies from program arguments.

Before this PR:
Screenshot 2024-03-20 at 08 50 30

with this PR:
Screenshot 2024-03-20 at 08 50 49

@havogt
Copy link
Contributor

havogt commented Mar 20, 2024

This is mainly a cosmetic change, right?

@edopao
Copy link
Contributor Author

edopao commented Mar 20, 2024

It might have some impact, indeed. If we reduce the number of arguments, we reduce the overhead of sorting and mapping the program arguments to the SDFG binary interface. Besides, we could test this simplified interface in the fused SDFG prototype and see if it solves the problem of closure propagation (for some reason, the connectivity tables are not automatically parsed by the dace frontend: one theory is that the dace frontend is expecting all tables with the current SDFG, while the diffusion modules is passing only the tables that are actually used by the stencil).

@edopao
Copy link
Contributor Author

edopao commented Mar 20, 2024

Moreover, a simpler SDFG seems to trigger a different optimization strategy. The resulting SDFG is different for some stencils.

@havogt
Copy link
Contributor

havogt commented Mar 20, 2024

Interesting, this should probably also be fixed in DaCe then?

@edopao
Copy link
Contributor Author

edopao commented Mar 20, 2024

Adding @philip-paul-mueller and @kotsaloscv so they are aware of this discussion. My understanding is that the non-transient arrays allocated in the outermost SDFG are supposed to be program arguments. You can also see this in the other way: you define the program interface by allocating non-transient arrays in the outermost SDFG. Should DaCe then silently modify the program interface, in case some of the arguments are not used? This we can discuss with the DaCe team, but I see some risks for confusion there. What DaCe does already is to remove data dependencies between a nested SDFG and its parent.

@philip-paul-mueller
Copy link
Contributor

Sorry for the wall of text.

What do you mean by 'unnecessary data dependencies from program arguments' exactly?
Do you mean that the unused connectivity tables are not removed or that the global memory is first copied in a transient (2 left most inputs; I can not read their names in the image).

If it is the first one, then DaCe will never touch global memory, so it will by design (or rather convention) not remove them.

If it is the second, then I have experienced the same issue in the past, when I looked at advenction stencil 13, i.e.

def adv13(A, B):
	A -= B
	return A

I can not read the names of the names of the access nodes, but the following assumes that the inputs, that have these intermediate transient, are also used as output.
My current understanding, i.e. working hypothesis, for this behaviour is, that it is done to "ensure correctness", with respect to the guarantees that the __restrict__ keywords in C++ has to uphold.

In my above example it is trivial to see that there is not issue, since everything is done pointwise, but I am sure that determining this in general is hard, thus DaCe play it safe by adding some transient (if you want I can give you some C example that illustrates it).

Since we porting ICON stencils, we knew from the get go that this will never happen.
For that reason I wrote a transformation to get rid of it.

@edopao
Copy link
Contributor Author

edopao commented Mar 20, 2024

We were discussing here the first case you described, that the unused connectivity tables are not removed.

@philip-paul-mueller
Copy link
Contributor

philip-paul-mueller commented Mar 20, 2024

Okay then the explanation is: They are global hence they are not removed.

@edopao Some transformations have a permissive flag that allows them to do more extreme stuff, maybe enabling this could do the job.

@havogt
Copy link
Contributor

havogt commented Mar 20, 2024

I totally agree with you, @edopao. They should not be removed, because they are the interface. But unused parameters should not lead to different optimizations. That's the part that should be fixed, no?

@edopao
Copy link
Contributor Author

edopao commented Mar 20, 2024

Maybe I used the word optimization inappropriately. I should have used transformations.
And also, this PR is not only pruning the program arguments, but at each SDFG nesting-level it is removing unnecessary non-transient arrays. This second kind of pruning was already done by DaCe, but maybe at a different stage in the simplify pass.
There is no guarantee for commutativity of SDFG transformations: the pruning of unnecessary non-transient arrays on nested SDFGs at early stage could be the trigger for different transformation results. The differences are small, at least at SDFG representation level.

Without this PR:
Screenshot 2024-03-20 at 12 36 14

With this PR:
Screenshot 2024-03-20 at 12 38 04

Also, there is a DaCe bug which shows up in icon4py stencils and that prevents this PR from being merged:
dace.sdfg.validation.InvalidSDFGEdgeError: Memlet data does not match source or destination data nodes) (at state lambda_291f6607757fd22e_body, edge __s0_n26None_n13IN__var_455[0] (_var_463:None -...

@philip-paul-mueller
Copy link
Contributor

philip-paul-mueller commented Mar 20, 2024

I thought about it a bit more and came to the conclusion that at least from DaCe's perspective they are "used", at least in a sense.
If you think of a nested SDFG as a glorified tasklet, then the connectivity tables are all used, as input to the "tasklet" in the outermost SDFG.
To know that they are unused DaCe would have to decent into the nested SDFG and look if the connectors are used on the inside.
And I am pretty sure that there is no DaCe transformation that does this. Actually, there seems to be one that does this.

@edopao could you post an example where the removal triggered a different optimization path?
Based on our discussion this morning I guess you are using auto optimize.
As an information for you, I noticed in the past that auto optimize is not necessaraly idempotent, i.e. applying it again again changed the SDFG.
In that regard it does not really surprises me that a different route is chosen.

@edopao
Copy link
Contributor Author

edopao commented Mar 22, 2024

@philip-paul-mueller I forgot to answer you, the example of different transformation result is shown above, in my previous comment, without and with this PR. I agree with your comment.

@edopao edopao marked this pull request as ready for review March 26, 2024 14:03
Copy link
Contributor

@philip-paul-mueller philip-paul-mueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve it, but please see my comment to to filter_neighbor_table().

@edopao edopao merged commit 375878c into GridTools:main Mar 26, 2024
31 checks passed
@edopao edopao deleted the dace-remove_unused_connectivities branch March 26, 2024 20:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants