Refactor of GT4Py-DaCe bridge to expose all control-flow to Dace #53

FlorianDeconinck · 2024-07-10T13:24:46Z

The current bridge between gt4py.cartesian and DaCe uses a concept of expansion for stencils. This means that stencils, once they got through the entire GT4Py workflow, are pickled and shoved down a library node. Then the expansion mechanism turns those into a single map/tasklet per Horizontal Computation. This means that a lot of control-flow (mask, while loop, etc.) is hidden within from DaCe, leading to some optimization power being hidden.

We need to undo this and describe everything back to DaCe.

Expansion code lives around src/gt4py/cartesian/gtc/dace/expansion/*.py in GT4Py

Parent: GEOS-ESM/SMT-Nebulae#31

NB: This is a requirement for DaCe AD future features to kick in

Expose more code to DaCe by exposing the mask/while to the SDFG instead of keeping them in the Tasklet
ALTERNATIVELY: subtask further if the refactor work is bigger

The text was updated successfully, but these errors were encountered:

FlorianDeconinck · 2024-07-16T18:23:28Z

Below is a small example showing the above problem

from gt4py.cartesian.gtscript import (
    computation,
    interval,
    PARALLEL,
)
from ndsl.boilerplate import get_factories_single_tile_orchestrated_cpu
from ndsl.constants import X_DIM, Y_DIM, Z_DIM
from ndsl.dsl.typing import FloatField
from ndsl import StencilFactory, orchestrate
import numpy as np

domain = (3, 3, 4)

stcil_fctry, ijk_qty_fctry = get_factories_single_tile_orchestrated_cpu(domain[0], domain[1], domain[2], 0)

def small_conditional(
    in_field: FloatField,
    out_field: FloatField
):
    with computation(PARALLEL), interval(...):
        if in_field > 5.0 and in_field < 20:
            out_field = in_field


class DaCeGT4Py_Bridge:
    def __init__(self, stencil_factory: StencilFactory):
        orchestrate(
            obj=self,
            config=stencil_factory.config.dace_config)
        self.stencil = stcil_fctry.from_dims_halo(
            func=small_conditional,
            compute_dims=[X_DIM, Y_DIM, Z_DIM],
        )

    def __call__(self, in_field: FloatField, out_field: FloatField):
        self.stencil(in_field, out_field)



if __name__ == "__main__":
    I = np.arange(domain[0]*domain[1]*domain[2], dtype=np.float64).reshape(domain)
    O = np.zeros(domain)

    bridge = DaCeGT4Py_Bridge(stcil_fctry)
    bridge(I, O)

    print(f"Input:\n{I}\n")
    print(f"Output:\n{O}\n")

This generate the following SDFG (under dace:cpu)

where the tasklet code carries the conditional:

mask_140660478452496_gen_0: dace.bool_
mask_140660478452496_gen_0 = ((gtIN__in_field > dace.float64(5.0)) and (gtIN__in_field < dace.float64(dace.int64(20))))
if mask_140660478452496_gen_0:
    gtOUT__out_field = gtIN__in_field

We want this conditional to be extracted before the tasklet. See dace.sdfg.SDFG:add_loop code to see how to dynamically create an if condition using the guard mechanism

FlorianDeconinck · 2024-08-05T13:27:27Z

Known native construct that are folded inside the Tasklet instead of described to DaCe:

conditional
while loops

GT4Py-DaCe uses an "expansion" mechanism. The stencils are dace.library_nodes while the code is being parsed (orchestration). Then expansion of those nodes is triggered and unrolls the stencil into an SDFG compatible description. It's within this code that the decision to write all code has tasklet is finalized. Refactor might need to impact pre-expansion as well in the DaceIRBuilder

`MaskInlining` will be a key feature of the improved gt4py / DaCe bridge as envisioned in GEOS-ESM/NDSL#53, which will propagate more control flow elements from gt4py to DaCe. We thus want to make sure if `MaskInlining` itself is posing any problems. We tracked the git history of this explicit exclude back to PR #300, which not only added the npir backend, but also refactored how pipeline stages are defined. Previous to this PR optimization passes to be specified explicitly and the PR changed this to an exclude list. It looks like the `MaskInlining` pass was added after the list of optimization passes was formed and thus excluded (to keep the list constant). From this history we thus don't expect `MaskInlining` to break anything. Co-authored-by: Roman Cattaneo <>

…on and other cleanups (#1630) Being new to the codebase and in preparation of the [GT4Py-DaCe bridge refactor](GEOS-ESM/NDSL#53) I was reading a lot of code. Some things that caught my eye are summarized in this PR. Bigger changes include 1. Moving `daceir.py` into the `dace/` backend folder. This follows the convention of all other backends who have their IR inside the backend folder. (last commit) 2. Readability improvements in `gtir_to_oir.py` (the "Cleanup visit_*" commits): In my opinion, there's no point in shortening "statement" to "stmt". I wouldn't go as far as renaming the classes. For the renamed local variables, I think this adds a lot to readability. 3. `visit_While` in `gtir_to_oir.py` was having an extra mask statement around the `while` loop (in case a mask was defined. I inlined the potential `mask` with the condition of `while` loop. 4. In `gtir_to_oir.py` don't translate every body statement into a single `MaskStmt`. Instead, create one (or two incase of `else`) `MaskStmt` with multiple statements in the `body` (which is already typed as a list of statements). Tested locally by running the test suite. All tests passing. Co-authored-by: Roman Cattaneo <>

romanc · 2024-11-05T14:40:54Z

Latest work is in https://github.com/romanc/gt4py/romanc/bridge-on-top-of-cleanups, which is re-based on top of the cleanups in PR GridTools/gt4py#1724 and should only contain changes necessary for the new bridge and no cleanups anymore.

The new bridge is currently pending on the following two DaCe issues:

Dead Dataflow Elimination type hints do not work with all types spcl/dace#1710
Scalar to symbol promotion fails if scalar is read in another state spcl/dace#1727

Working around those two, we still (only sometimes) have an issue with writes (in k) inside a k-loop (i.e. with computation FORWARD / BACKWARD). To be investigated further at a later point.

## Description This PR is split off the work for the new GT4Py - DaCe bridge, which should allow to expose control flow statements (`if` and `while`) to DaCe to better use DaCe's analytics capabilities. This PR is concerned with adding type hints and generally improving code readability. Main parts are - `daceir_builder.py`: early returns and renamed variable - `sdfg_builder.py`: type hints and early returns - `tasklet_codegen.py`: type hints and early returns `TaskletCodegen` was given `sdfg_ctx`, which wasn't used. That parameter was thus removed. Parent issue: GEOS-ESM/NDSL#53 ## Requirements - [x] All fixes and/or new features come with corresponding tests. Assumed to be covered by existing tests. - [ ] Important design decisions have been documented in the approriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md) folder. N/A --------- Co-authored-by: Roman Cattaneo <>

romanc · 2024-12-04T22:06:58Z

Latest work is still in https://github.com/romanc/gt4py/romanc/bridge-on-top-of-cleanups, which can be rebased on top of mainline gt4py once GridTools/gt4py#1752 to get a clearer picture.

Main ideas in that branch are the following

Introduce oir.CodeBlock to (recursively) break down oir.HorizontalExecutions into smaller pieces, which are then evaluated in a dcir.Tasklet.
Replace dcir.MaskStmt / dcir.While with dcir.Condition and dcir.WhileLoop (names aren't perfect and could be renamed again)
Add support for if statements and while loops in the state machine of sdfg_builder, i.e. add_condition and add_while on SDFGContext together with visit_Condition and visit_WhileLoop in the visitor (for the new dcir nodes).
Tasklets might define temporaries (scalars and/or tmp fields) in one tasklet and re-use them in others. We thus opt to "export" all temporaries and then "re-import" them again if needed. A DaCe pass called "DeadDataflowElimination" will take care of removing unnecessary "exports".
Memlets will need to be generated per tasklet at a more fine grained level compared to now where memlets are calculated just once per horizontal execution. Commit GridTools/gt4py@19f59a3 outlines the necessary changes. One thing I forgot in this PR is the fact that both, Conditions and WhileLoops generate tasklets on the fly in sdfg_builder. They will also need to calculate their memlets at this point.

…ble k offsets (#1882)  ## Description We figured that DaCe backends are currently missing support for casting in variable k offsets. This PR - adds a codegen test with a cast in a variable k offset - adds a node validator for the DaCe backends complaining about missing for support. - adds an `xfail` test for the node validator This should be fixed down the road. Here's the issue #1881 to keep track. The PR also has two smaller and unrelated commits - 741c448 increases test coverage with another codgen test that has a couple of read after write access patterns which were breaking the "new bridge" (see GEOS-ESM/NDSL#53). - e98ddc5 just forwards all keyword arguments when visiting offsets. I don't think this was a problem until now, but it's best practice to forward everything. ## Requirements - [x] All fixes and/or new features come with corresponding tests. - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md) folder. N/A --------- Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com> Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>

## Description In preparation for PR #1894, pull out some refactors and cleanups. Notable in this PR are the changes to `src/gt4py/cartesian/gtc/dace/oir_to_dace.py` - visit `stencil.vertical_loops` directly instead of calling `generic_visit` (simplification since there's nothing else to visit) - rename library nodes from `f"{sdfg_name}_computation_{id(node)}"` to `f"{sdfg_name}_vloop_{counter}_{node.loop_order}_{id(node)}"`. This adds a bit more information (because `sdfg_name` is the same for all library nodes) and thus simplifies debugging workflows. Related issue: GEOS-ESM/NDSL#53 ## Requirements - [x] All fixes and/or new features come with corresponding tests. covered by existing test suite - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md) folder. N/A --------- Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

## Description This PR refactors the GT4Py/DaCe bridge to expose control flow elements (`if` statements and `while` loops) to DaCe. Previously, the whole contents of a vertical loop was put in one big Tasklet. With this PR, that Tasklet is broken apart in case control flow is found such that control flow is visible in the SDFG. This allows DaCe to better analyze code and will be crucial in future (within the current milestone) performance optimization work. The main ideas in this PR are the following 1. Introduce `oir.CodeBlock` to recursively break down `oir.HorizontalExecution`s into smaller pieces that are either code flow or evaluated in (smaller) Tasklets. 2. Introduce `dcir.Condition`and `dcir.WhileLoop` to represent if statements and while loops that are translated into SDFG states. We keep the current `dcir.MaskStmt` / `dcir.While` for if statements / while loops inside horizontal regions, which aren't yet exposed to DaCe (see #1900). 3. Add support for `if` statements and `while` loops in the state machine of `sdfg_builder.py` 4. We are breaking up vertical loops inside stencils in multiple Tasklets. It might thus happen that we write a "local" scalar in one Tasklet and read it in another Tasklet (downstream). We thus create output connectors for all scalar writes in a Tasklet and input connectors for all reads (unless previously written in the same Tasklet). 5. Memlets can't be generated per horizontal execution anymore and need to be more fine grained. `TaskletAccessInfoCollector` does this work for us, duplicating some logic in `AccessInfoCollector`. A refactor task has been logged to fix/re-evaluate this later. This PR depends on the following (downstream) DaCe fixes - spcl/dace#1954 - spcl/dace#1955 which have been merged by now. Follow-up issues - unrelated changes have been moved to #1895 - #1896 - #1898 - #1900 Related issue: GEOS-ESM/NDSL#53 ## Requirements - [x] All fixes and/or new features come with corresponding tests. Added new tests and increased coverage of horizontal regions with PRs #1807 and #1851. - [ ] Important design decisions have been documented in the appropriate ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md) folder. Docs are [in our knowledge base](https://geos-esm.github.io/SMT-Nebulae/technical/backend/dace-bridge/) for now. Will be ported. --------- Co-authored-by: Roman Cattaneo <> Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>

romanc · 2025-03-24T16:28:21Z

Main chunk of this work was merged with PR GridTools/gt4py#1894. The PR pull this version of GT4Py into NDSL is NOAA-GFDL#119, which is expected to be merged in the coming days.

Follow-up issues include

Given that, let's close this issue and treat potential fall-out (nothing big is expected since pace and pyFV3 test suites both pass) in follow-up issues.

FlorianDeconinck assigned romanc Aug 2, 2024

romanc mentioned this issue Sep 2, 2024

refactor[cartesian]: Enable MaskInlining in gt4yp - DaCe bridge GridTools/gt4py#1624

Merged

3 tasks

romanc mentioned this issue Sep 5, 2024

refactor[cartesian]: readability improvements in gtir -> oir conversion and other cleanups GridTools/gt4py#1630

Merged

3 tasks

romanc mentioned this issue Nov 1, 2024

DeadDataflowElimination: Fix type hint annotation for booleans spcl/dace#1721

Closed

romanc mentioned this issue Nov 5, 2024

refactor[catesian]: Type hints and code redability improvements GridTools/gt4py#1724

Merged

2 tasks

FlorianDeconinck mentioned this issue Dec 12, 2024

[CI] Unhook Pace testing post Control Flow Regions merge spcl/dace#1826

Open

2 tasks

romanc mentioned this issue Feb 20, 2025

fix[cartesian, dace]: warn about missing support for casting in variable k offsets GridTools/gt4py#1882

Merged

2 tasks

This was referenced Mar 3, 2025

refactor[cartesian]: Dace backend: expose control flow GridTools/gt4py#1894

Merged

refactor[cartesian]: gt4py/dace bridge cleanup GridTools/gt4py#1895

Merged

romanc mentioned this issue Mar 21, 2025

Update GT4Py and DaCe to bring in refactored GT4Py/DaCe bridge that exposes control flow NOAA-GFDL/NDSL#119

Merged

7 tasks

romanc closed this as completed Mar 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor of GT4Py-DaCe bridge to expose all control-flow to Dace #53

Refactor of GT4Py-DaCe bridge to expose all control-flow to Dace #53

FlorianDeconinck commented Jul 10, 2024

FlorianDeconinck commented Jul 16, 2024

FlorianDeconinck commented Aug 5, 2024

romanc commented Nov 5, 2024

romanc commented Dec 4, 2024

romanc commented Mar 24, 2025

Refactor of GT4Py-DaCe bridge to expose all control-flow to Dace #53

Refactor of GT4Py-DaCe bridge to expose all control-flow to Dace #53

Comments

FlorianDeconinck commented Jul 10, 2024

FlorianDeconinck commented Jul 16, 2024

FlorianDeconinck commented Aug 5, 2024

romanc commented Nov 5, 2024

romanc commented Dec 4, 2024

romanc commented Mar 24, 2025