Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor of GT4Py-DaCe bridge to expose all control-flow to Dace #53

Closed
2 tasks
FlorianDeconinck opened this issue Jul 10, 2024 · 5 comments · Fixed by NOAA-GFDL/NDSL#119
Closed
2 tasks
Assignees

Comments

@FlorianDeconinck
Copy link
Collaborator

The current bridge between gt4py.cartesian and DaCe uses a concept of expansion for stencils. This means that stencils, once they got through the entire GT4Py workflow, are pickled and shoved down a library node. Then the expansion mechanism turns those into a single map/tasklet per Horizontal Computation. This means that a lot of control-flow (mask, while loop, etc.) is hidden within from DaCe, leading to some optimization power being hidden.

We need to undo this and describe everything back to DaCe.

Expansion code lives around src/gt4py/cartesian/gtc/dace/expansion/*.py in GT4Py

Parent: GEOS-ESM/SMT-Nebulae#31

NB: This is a requirement for DaCe AD future features to kick in


  • Expose more code to DaCe by exposing the mask/while to the SDFG instead of keeping them in the Tasklet
  • ALTERNATIVELY: subtask further if the refactor work is bigger
@FlorianDeconinck
Copy link
Collaborator Author

Below is a small example showing the above problem

from gt4py.cartesian.gtscript import (
    computation,
    interval,
    PARALLEL,
)
from ndsl.boilerplate import get_factories_single_tile_orchestrated_cpu
from ndsl.constants import X_DIM, Y_DIM, Z_DIM
from ndsl.dsl.typing import FloatField
from ndsl import StencilFactory, orchestrate
import numpy as np

domain = (3, 3, 4)

stcil_fctry, ijk_qty_fctry = get_factories_single_tile_orchestrated_cpu(domain[0], domain[1], domain[2], 0)

def small_conditional(
    in_field: FloatField,
    out_field: FloatField
):
    with computation(PARALLEL), interval(...):
        if in_field > 5.0 and in_field < 20:
            out_field = in_field


class DaCeGT4Py_Bridge:
    def __init__(self, stencil_factory: StencilFactory):
        orchestrate(
            obj=self,
            config=stencil_factory.config.dace_config)
        self.stencil = stcil_fctry.from_dims_halo(
            func=small_conditional,
            compute_dims=[X_DIM, Y_DIM, Z_DIM],
        )

    def __call__(self, in_field: FloatField, out_field: FloatField):
        self.stencil(in_field, out_field)



if __name__ == "__main__":
    I = np.arange(domain[0]*domain[1]*domain[2], dtype=np.float64).reshape(domain)
    O = np.zeros(domain)

    bridge = DaCeGT4Py_Bridge(stcil_fctry)
    bridge(I, O)

    print(f"Input:\n{I}\n")
    print(f"Output:\n{O}\n")

This generate the following SDFG (under dace:cpu)

Image

where the tasklet code carries the conditional:

mask_140660478452496_gen_0: dace.bool_
mask_140660478452496_gen_0 = ((gtIN__in_field > dace.float64(5.0)) and (gtIN__in_field < dace.float64(dace.int64(20))))
if mask_140660478452496_gen_0:
    gtOUT__out_field = gtIN__in_field

We want this conditional to be extracted before the tasklet. See dace.sdfg.SDFG:add_loop code to see how to dynamically create an if condition using the guard mechanism

@FlorianDeconinck
Copy link
Collaborator Author

Known native construct that are folded inside the Tasklet instead of described to DaCe:

  • conditional
  • while loops

GT4Py-DaCe uses an "expansion" mechanism. The stencils are dace.library_nodes while the code is being parsed (orchestration). Then expansion of those nodes is triggered and unrolls the stencil into an SDFG compatible description. It's within this code that the decision to write all code has tasklet is finalized. Refactor might need to impact pre-expansion as well in the DaceIRBuilder

egparedes pushed a commit to GridTools/gt4py that referenced this issue Sep 4, 2024
`MaskInlining` will be a key feature of the improved gt4py / DaCe bridge
as envisioned in GEOS-ESM/NDSL#53, which will
propagate more control flow elements from gt4py to DaCe. We thus want to
make sure if `MaskInlining` itself is posing any problems. We tracked
the git history of this explicit exclude back to PR
#300, which not only added the
npir backend, but also refactored how pipeline stages are defined.
Previous to this PR optimization passes to be specified explicitly and
the PR changed this to an exclude list. It looks like the `MaskInlining`
pass was added after the list of optimization passes was formed and thus
excluded (to keep the list constant). From this history we thus don't
expect `MaskInlining` to break anything.

Co-authored-by: Roman Cattaneo <>
egparedes pushed a commit to GridTools/gt4py that referenced this issue Sep 11, 2024
…on and other cleanups (#1630)

Being new to the codebase and in preparation of the [GT4Py-DaCe bridge
refactor](GEOS-ESM/NDSL#53) I was reading a
lot of code. Some things that caught my eye are summarized in this PR.
Bigger changes include

1. Moving `daceir.py` into the `dace/` backend folder. This follows the
convention of all other backends who have their IR inside the backend
folder. (last commit)
2. Readability improvements in `gtir_to_oir.py` (the "Cleanup visit_*"
commits): In my opinion, there's no point in shortening "statement" to
"stmt". I wouldn't go as far as renaming the classes. For the renamed
local variables, I think this adds a lot to readability.
3. `visit_While` in `gtir_to_oir.py` was having an extra mask statement
around the `while` loop (in case a mask was defined. I inlined the
potential `mask` with the condition of `while` loop.
4. In `gtir_to_oir.py` don't translate every body statement into a
single `MaskStmt`. Instead, create one (or two incase of `else`)
`MaskStmt` with multiple statements in the `body` (which is already
typed as a list of statements).

Tested locally by running the test suite. All tests passing.

Co-authored-by: Roman Cattaneo <>
@romanc
Copy link

romanc commented Nov 5, 2024

Latest work is in https://github.com/romanc/gt4py/romanc/bridge-on-top-of-cleanups, which is re-based on top of the cleanups in PR GridTools/gt4py#1724 and should only contain changes necessary for the new bridge and no cleanups anymore.

The new bridge is currently pending on the following two DaCe issues:

Working around those two, we still (only sometimes) have an issue with writes (in k) inside a k-loop (i.e. with computation FORWARD / BACKWARD). To be investigated further at a later point.

FlorianDeconinck pushed a commit to GridTools/gt4py that referenced this issue Nov 15, 2024
## Description

This PR is split off the work for the new GT4Py - DaCe bridge, which
should allow to expose control flow statements (`if` and `while`) to
DaCe to better use DaCe's analytics capabilities. This PR is concerned
with adding type hints and generally improving code readability. Main
parts are

- `daceir_builder.py`: early returns and renamed variable
- `sdfg_builder.py`: type hints and early returns
- `tasklet_codegen.py`: type hints and early returns

`TaskletCodegen` was given `sdfg_ctx`, which wasn't used. That parameter
was thus removed.

Parent issue: GEOS-ESM/NDSL#53

## Requirements

- [x] All fixes and/or new features come with corresponding tests.
  Assumed to be covered by existing tests.
- [ ] Important design decisions have been documented in the approriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
  N/A

---------

Co-authored-by: Roman Cattaneo <>
@romanc
Copy link

romanc commented Dec 4, 2024

Latest work is still in https://github.com/romanc/gt4py/romanc/bridge-on-top-of-cleanups, which can be rebased on top of mainline gt4py once GridTools/gt4py#1752 to get a clearer picture.

Main ideas in that branch are the following

  1. Introduce oir.CodeBlock to (recursively) break down oir.HorizontalExecutions into smaller pieces, which are then evaluated in a dcir.Tasklet.
  2. Replace dcir.MaskStmt / dcir.While with dcir.Condition and dcir.WhileLoop (names aren't perfect and could be renamed again)
  3. Add support for if statements and while loops in the state machine of sdfg_builder, i.e. add_condition and add_while on SDFGContext together with visit_Condition and visit_WhileLoop in the visitor (for the new dcir nodes).
  4. Tasklets might define temporaries (scalars and/or tmp fields) in one tasklet and re-use them in others. We thus opt to "export" all temporaries and then "re-import" them again if needed. A DaCe pass called "DeadDataflowElimination" will take care of removing unnecessary "exports".
  5. Memlets will need to be generated per tasklet at a more fine grained level compared to now where memlets are calculated just once per horizontal execution. Commit GridTools/gt4py@19f59a3 outlines the necessary changes. One thing I forgot in this PR is the fact that both, Conditions and WhileLoops generate tasklets on the fly in sdfg_builder. They will also need to calculate their memlets at this point.

FlorianDeconinck added a commit to GridTools/gt4py that referenced this issue Feb 21, 2025
…ble k offsets (#1882)

<!--
Delete this comment and add a proper description of the changes
contained in this PR. The text here will be used in the commit message
since the approved PRs are always squash-merged. The preferred format
is:

- PR Title: <type>[<scope>]: <one-line-summary>

    <type>:
- build: Changes that affect the build system or external dependencies
        - ci: Changes to our CI configuration files and scripts
        - docs: Documentation only changes
        - feat: A new feature
        - fix: A bug fix
        - perf: A code change that improves performance
- refactor: A code change that neither fixes a bug nor adds a feature
        - style: Changes that do not affect the meaning of the code
        - test: Adding missing tests or correcting existing tests

    <scope>: cartesian | eve | next | storage
    # ONLY if changes are limited to a specific subsystem

- PR Description:

Description of the main changes with links to appropriate
issues/documents/references/...
-->

## Description

We figured that DaCe backends are currently missing support for casting
in variable k offsets. This PR

- adds a codegen test with a cast in a variable k offset
- adds a node validator for the DaCe backends complaining about missing
for support.
- adds an `xfail` test for the node validator

This should be fixed down the road. Here's the issue
#1881 to keep track.

The PR also has two smaller and unrelated commits

- 741c448 increases test coverage with
another codgen test that has a couple of read after write access
patterns which were breaking the "new bridge" (see
GEOS-ESM/NDSL#53).
- e98ddc5 just forwards all keyword
arguments when visiting offsets. I don't think this was a problem until
now, but it's best practice to forward everything.

## Requirements

- [x] All fixes and/or new features come with corresponding tests.
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
  N/A

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
Co-authored-by: Florian Deconinck <deconinck.florian@gmail.com>
romanc added a commit to GridTools/gt4py that referenced this issue Mar 4, 2025
## Description

In preparation for PR #1894, pull
out some refactors and cleanups. Notable in this PR are the changes to
`src/gt4py/cartesian/gtc/dace/oir_to_dace.py`

- visit `stencil.vertical_loops` directly instead of calling
`generic_visit` (simplification since there's nothing else to visit)
- rename library nodes from `f"{sdfg_name}_computation_{id(node)}"` to
`f"{sdfg_name}_vloop_{counter}_{node.loop_order}_{id(node)}"`. This adds
a bit more information (because `sdfg_name` is the same for all library
nodes) and thus simplifies debugging workflows.

Related issue: GEOS-ESM/NDSL#53

## Requirements

- [x] All fixes and/or new features come with corresponding tests.
  covered by existing test suite
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
  N/A

---------

Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
romanc added a commit to GridTools/gt4py that referenced this issue Mar 18, 2025
## Description

This PR refactors the GT4Py/DaCe bridge to expose control flow elements
(`if` statements and `while` loops) to DaCe. Previously, the whole
contents of a vertical loop was put in one big Tasklet. With this PR,
that Tasklet is broken apart in case control flow is found such that
control flow is visible in the SDFG. This allows DaCe to better analyze
code and will be crucial in future (within the current milestone)
performance optimization work.

The main ideas in this PR are the following

1. Introduce `oir.CodeBlock` to recursively break down
`oir.HorizontalExecution`s into smaller pieces that are either code flow
or evaluated in (smaller) Tasklets.
2. Introduce `dcir.Condition`and `dcir.WhileLoop` to represent if
statements and while loops that are translated into SDFG states. We keep
the current `dcir.MaskStmt` / `dcir.While` for if statements / while
loops inside horizontal regions, which aren't yet exposed to DaCe (see
#1900).
3. Add support for `if` statements and `while` loops in the state
machine of `sdfg_builder.py`
4. We are breaking up vertical loops inside stencils in multiple
Tasklets. It might thus happen that we write a "local" scalar in one
Tasklet and read it in another Tasklet (downstream). We thus create
output connectors for all scalar writes in a Tasklet and input
connectors for all reads (unless previously written in the same
Tasklet).
5. Memlets can't be generated per horizontal execution anymore and need
to be more fine grained. `TaskletAccessInfoCollector` does this work for
us, duplicating some logic in `AccessInfoCollector`. A refactor task has
been logged to fix/re-evaluate this later.

This PR depends on the following (downstream) DaCe fixes

- spcl/dace#1954
- spcl/dace#1955

which have been merged by now.

Follow-up issues

- unrelated changes have been moved to
#1895
- #1896
- #1898
- #1900

Related issue: GEOS-ESM/NDSL#53

## Requirements

- [x] All fixes and/or new features come with corresponding tests.
Added new tests and increased coverage of horizontal regions with PRs
#1807 and #1851.
- [ ] Important design decisions have been documented in the appropriate
ADR inside the [docs/development/ADRs/](docs/development/ADRs/Index.md)
folder.
Docs are [in our knowledge
base](https://geos-esm.github.io/SMT-Nebulae/technical/backend/dace-bridge/)
for now. Will be ported.

---------

Co-authored-by: Roman Cattaneo <>
Co-authored-by: Roman Cattaneo <1116746+romanc@users.noreply.github.com>
@romanc
Copy link

romanc commented Mar 24, 2025

Main chunk of this work was merged with PR GridTools/gt4py#1894. The PR pull this version of GT4Py into NDSL is NOAA-GFDL#119, which is expected to be merged in the coming days.

Follow-up issues include

Given that, let's close this issue and treat potential fall-out (nothing big is expected since pace and pyFV3 test suites both pass) in follow-up issues.

@romanc romanc closed this as completed Mar 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants