Add checkpointer tests to CircleCI #336

Merged
merged 53 commits into from
Oct 18, 2022
Commits
beefed4
fix variable mismatch
elynnwu Aug 30, 2022
7ad0486
reinitialize dycore to reset temp storages
elynnwu Sep 1, 2022
e873534
Merge branch 'main' into fix/checkpointer-validation
elynnwu Sep 1, 2022
8119251
Merge branch 'main' into fix/checkpointer-validation
elynnwu Sep 8, 2022
6788d77
fix var name
elynnwu Sep 9, 2022
70c044e
fvdynamics-out tracers in compute domain only
elynnwu Sep 9, 2022
deef15c
update data version
elynnwu Sep 9, 2022
2c0fbdc
revert comment
elynnwu Sep 12, 2022
f91c22e
fix driver subset
elynnwu Sep 12, 2022
534a887
clean up translate test
elynnwu Sep 13, 2022
ab5992f
Merge branch 'main' into fix/checkpointer-validation
elynnwu Sep 13, 2022
9154218
add comment
elynnwu Sep 14, 2022
5842e3e
Merge branch 'main' into current
elynnwu Sep 14, 2022
3a33976
Bump to 8.1.3
elynnwu Sep 15, 2022
cd8907b
add physics and driver savepoint tests
elynnwu Sep 16, 2022
36573ac
specify experiement name
elynnwu Sep 16, 2022
e0beac8
add python version to caching
elynnwu Sep 16, 2022
7dbb1b9
separate test, increase threshold for circleci
elynnwu Sep 16, 2022
ef9a475
update test names
elynnwu Sep 16, 2022
889e19f
update config
elynnwu Sep 16, 2022
04cc70a
add first attempt at unified savepoints target
mcgibbon Sep 16, 2022
52d89ad
Merge branch 'feature/add-circleci-physics-driver-test' of github.com…
mcgibbon Sep 16, 2022
89df5ce
attempt to put most of the test logic in a command
mcgibbon Sep 16, 2022
e36a10a
fix config to validate
mcgibbon Sep 16, 2022
343a1c2
fix test_driver, lint, and dycore_savepoints_mpi
mcgibbon Sep 16, 2022
74d0e1d
add remaining circleci plans
mcgibbon Sep 16, 2022
1941476
activate venv correctly in driver test
mcgibbon Sep 16, 2022
0ead115
use independent keys for gt caching
mcgibbon Sep 16, 2022
c6011ca
downgrade serial savepoint tests to medium resource type
mcgibbon Sep 16, 2022
3e9d080
add h5netcdf to requirement
elynnwu Sep 19, 2022
42cfc9b
specify xarray engine
elynnwu Sep 19, 2022
d877737
fix bug in checkpointer test, add more savepoints, update thresholds
mcgibbon Sep 19, 2022
d4949e4
Merge branch 'feature/add-circleci-physics-driver-test' into feature/…
mcgibbon Sep 19, 2022
2d207f0
add circleci execution of checkpoint tests, label as savepoint tests,…
mcgibbon Sep 19, 2022
859d2a5
fix config validation
mcgibbon Sep 19, 2022
e743e62
define google application credentials env var for top level savepoint…
mcgibbon Sep 19, 2022
915e6e3
use machine worker for test_savepoints
mcgibbon Sep 19, 2022
516f60f
Merge branch 'main' into feature/checkpointer_circleci
mcgibbon Sep 19, 2022
c79eec4
run the 54rank test on an xlarge worker
mcgibbon Sep 19, 2022
f96cb35
delete duplicated test definitions
mcgibbon Sep 19, 2022
e0e6e79
fix path for savepoint tests
mcgibbon Sep 19, 2022
6f88f0b
move directory to match moved path in last commit
mcgibbon Sep 19, 2022
4becd03
remove driver 54rank compiled test from CircleCI
mcgibbon Sep 20, 2022
8589707
increase driver_savepoints_mpi resource type to xlarge
mcgibbon Sep 20, 2022
19fbcf1
Merge branch 'main' into feature/checkpointer_circleci
mcgibbon Sep 20, 2022
fcecc1a
checkpointer test works with remapping and tracer advection
mcgibbon Sep 21, 2022
92bb68e
Merge branch 'main' into feature/checkpointer_circleci
mcgibbon Sep 23, 2022
bdda2bf
Merge branch 'main' into feature/checkpointer_circleci
elynnwu Sep 26, 2022
2b01cfb
Merge branch 'main' into feature/checkpointer_circleci
elynnwu Oct 4, 2022
37d7430
delete unused variable in remapping-In
elynnwu Oct 7, 2022
0e99d11
fix orchestration test
elynnwu Oct 18, 2022
ef9efc7
fix checkpointer with orch
elynnwu Oct 18, 2022
4405849
lint
elynnwu Oct 18, 2022
25 changes: 23 additions & 2 deletions .circleci/config.yml
@@ -97,6 +97,7 @@ commands:
description: "make target to run"
type: enum
enum:
- "test_savepoint"
- "savepoint_tests"
- "savepoint_tests_mpi"
- "physics_savepoint_tests"
@@ -231,7 +232,7 @@ jobs:
driver_savepoints_mpi:
machine:
image: ubuntu-2004:202111-02
-resource_class: large
+resource_class: xlarge
parameters:
backend:
description: "gt4py backend"
@@ -402,7 +403,7 @@ jobs:
test_mpi_54rank:
docker:
- image: cimg/python:3.8
-resource_class: large
+resource_class: xlarge
working_directory: ~/repo
steps:
- checkout
@@ -416,6 +417,20 @@ jobs:
. venv/bin/activate
mpirun -n 54 --oversubscribe --mca btl_vader_single_copy_mechanism none python3 -m mpi4py -m pytest tests/mpi_54rank

test_savepoint:
machine:
image: ubuntu-2004:202111-02
resource_class: large
environment:
GOOGLE_APPLICATION_CREDENTIALS: /tmp/key.json
steps:
- checkout
- make_savepoints:
backend: numpy
experiment: c12_6ranks_standard
target: test_savepoint
num_ranks: 6

test_notebooks:
docker:
- image: gcr.io/vcm-ml/pace_notebook_examples
@@ -528,6 +543,12 @@ workflows:
filters:
tags:
only: /^v.*/
- test_savepoint:
context:
- GCLOUD_ENCODED_KEY
filters:
tags:
only: /^v.*/
- test_util:
filters:
tags:
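
The `filters` stanza added for `test_savepoint` uses the tag pattern `/^v.*/`. A minimal Python sketch of which tag names such a pattern accepts, assuming the filter is applied as a full-string match; the tag names below are made up for illustration and `tag_matches` is a hypothetical helper:

```python
import re

# Pattern copied from the workflow filter, without the surrounding slashes.
TAG_PATTERN = re.compile(r"^v.*")


def tag_matches(tag: str) -> bool:
    """Return True if the whole tag name matches the filter pattern."""
    return TAG_PATTERN.fullmatch(tag) is not None


# Hypothetical tag names, for illustration only.
for tag in ["v8.1.3", "v0.1.0-rc1", "release-1", "8.1.3"]:
    print(tag, tag_matches(tag))
```

Only names that begin with `v` pass, so a tag such as `release-1` would not satisfy this filter.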
6 changes: 5 additions & 1 deletion Makefile
@@ -103,7 +103,7 @@ test_util:
$(MAKE) -C util test; \
fi

-savepoint_tests: build
+savepoint_tests: build ## dycore-only savepoint tests
TARGET=dycore $(MAKE) get_test_data
$(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "$(SAVEPOINT_SETUP) && cd $(PACE_PATH) && pytest --data_path=$(EXPERIMENT_DATA_RUN)/dycore/ $(TEST_ARGS) $(FV3CORE_THRESH_ARGS) $(PACE_PATH)/fv3core/tests/savepoint"

@@ -130,6 +130,10 @@ physics_savepoint_tests_mpi: build
test_main: build
$(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "$(SAVEPOINT_SETUP) && cd $(PACE_PATH) && pytest $(TEST_ARGS) $(PACE_PATH)/tests/main"

test_savepoint: ## top level savepoint tests
TARGET=dycore $(MAKE) get_test_data
$(CONTAINER_CMD) $(CONTAINER_FLAGS) bash -c "$(SAVEPOINT_SETUP) && cd $(PACE_PATH) && $(MPIRUN_CALL) python -m pytest --data_path=$(EXPERIMENT_DATA_RUN)/dycore/ $(TEST_ARGS) $(PACE_PATH)/tests/savepoint"

test_mpi_54rank:
mpirun -n 54 $(MPIRUN_ARGS) python3 -m mpi4py -m pytest tests/mpi_54rank
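
The new `test_savepoint` target passes `--data_path=...` to pytest. A custom flag like this has to be registered with pytest before it can be used. A minimal sketch of how a `conftest.py` could do that with pytest's `pytest_addoption` hook; the default value, help text, and the `get_data_path` helper are assumptions, not taken from the repository:

```python
def pytest_addoption(parser):
    """Register a --data_path option (pytest calls this hook from conftest.py)."""
    parser.addoption(
        "--data_path",
        action="store",
        default="./",  # assumed default, for illustration
        help="directory containing serialized savepoint data",
    )


def get_data_path(config) -> str:
    """Hypothetical helper: read the option back from a pytest Config object."""
    return config.getoption("data_path")
```

Inside a test or fixture, the same value would typically be reached through `request.config`.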

2 changes: 1 addition & 1 deletion driver/examples/configs/baroclinic_c12_orch_cpu.yaml
@@ -13,7 +13,7 @@ performance_config:
nx_tile: 12
nz: 79
dt_atmos: 225
-minutes: 1
+minutes: 5
layout:
- 1
- 1
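
For scale, the two `minutes` values can be compared against `dt_atmos: 225`. A quick arithmetic sketch, assuming `minutes` sets the total run length and `dt_atmos` is in seconds; how the driver actually derives or rounds its step count is not shown in this diff, and `whole_steps` is a hypothetical helper:

```python
DT_ATMOS_SECONDS = 225  # from the config; assumed to be in seconds


def whole_steps(minutes: int, dt_seconds: int = DT_ATMOS_SECONDS) -> int:
    """Number of complete timesteps that fit in the given run length."""
    return (minutes * 60) // dt_seconds


for minutes in (1, 5):
    run_seconds = minutes * 60
    ratio = run_seconds / DT_ATMOS_SECONDS
    print(f"{minutes} min = {run_seconds} s, {ratio:.2f} timesteps, "
          f"{whole_steps(minutes)} whole")
```

Under these assumptions a one-minute run is shorter than a single 225-second timestep, while a five-minute run covers one full timestep.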
128 changes: 126 additions & 2 deletions fv3core/pace/fv3core/stencils/fv_dynamics.py
@@ -22,7 +22,7 @@
from pace.fv3core.stencils.neg_adj3 import AdjustNegativeTracerMixingRatio
from pace.fv3core.stencils.remapping import LagrangianToEulerian
from pace.stencils.c2l_ord import CubedToLatLon
-from pace.util import Timer
+from pace.util import X_DIM, Y_DIM, Z_INTERFACE_DIM, Timer
from pace.util.grid import DampingCoefficients, GridData
from pace.util.mpi import MPI
from pace.util.quantity import Quantity
@@ -179,6 +179,34 @@ def __init__(
dace_compiletime_args=["state", "tag"],
)

orchestrate(
obj=self,
config=stencil_factory.config.dace_config,
method_to_orchestrate="_checkpoint_remapping_in",
dace_compiletime_args=[
"state",
],
)

orchestrate(
obj=self,
config=stencil_factory.config.dace_config,
method_to_orchestrate="_checkpoint_remapping_out",
dace_compiletime_args=["state"],
)

orchestrate(
obj=self,
config=stencil_factory.config.dace_config,
method_to_orchestrate="_checkpoint_tracer_advection_in",
dace_compiletime_args=["state"],
)
orchestrate(
obj=self,
config=stencil_factory.config.dace_config,
method_to_orchestrate="_checkpoint_tracer_advection_out",
dace_compiletime_args=["state"],
)
# nested and stretched_grid are options in the Fortran code which we
# have not implemented, so they are hard-coded here.
self.call_checkpointer = checkpointer is not None
@@ -234,7 +262,11 @@ def __init__(

# Build advection stencils
self.tracer_advection = tracer_2d_1l.TracerAdvection(
-stencil_factory, tracer_transport, self.grid_data, comm, self.tracers
+stencil_factory,
+tracer_transport,
+self.grid_data,
+comm,
+self.tracers,
)
self._ak = grid_data.ak
self._bk = grid_data.bk
@@ -316,6 +348,7 @@ def __init__(
NQ,
self._pfull,
tracers=self.tracers,
checkpointer=checkpointer,
)

full_xyz_spec = grid_indexing.get_quantity_halo_spec(
@@ -353,6 +386,93 @@ def _checkpoint_fvdynamics(self, state: DycoreState, tag: str):
qvapor=state.qvapor,
)

def _checkpoint_remapping_in(
self,
state: DycoreState,
):
if self.call_checkpointer:
self.checkpointer(
"Remapping-In",
pt=state.pt,
delp=state.delp,
delz=state.delz,
peln=state.peln.transpose(
[X_DIM, Z_INTERFACE_DIM, Y_DIM]
), # [x, z, y] fortran data
u=state.u,
v=state.v,
w=state.w,
ua=state.ua,
va=state.va,
cappa=self._cappa,
pkz=state.pkz,
pk=state.pk,
pe=state.pe.transpose(
[X_DIM, Z_INTERFACE_DIM, Y_DIM]
), # [x, z, y] fortran data
phis=state.phis,
te_2d=self._te0_2d,
ps=state.ps,
wsd=self._wsd,
omga=state.omga,
dp1=self._dp1,
)

def _checkpoint_remapping_out(
self,
state: DycoreState,
):
if self.call_checkpointer:
self.checkpointer(
"Remapping-Out",
pt=state.pt,
delp=state.delp,
delz=state.delz,
peln=state.peln.transpose(
[X_DIM, Z_INTERFACE_DIM, Y_DIM]
), # [x, z, y] fortran data
u=state.u,
v=state.v,
w=state.w,
cappa=self._cappa,
pkz=state.pkz,
pk=state.pk,
pe=state.pe.transpose(
[X_DIM, Z_INTERFACE_DIM, Y_DIM]
), # [x, z, y] fortran data
te_2d=self._te0_2d,
omga=state.omga,
dp1=self._dp1,
)

def _checkpoint_tracer_advection_in(
self,
state: DycoreState,
):
if self.call_checkpointer:
self.checkpointer(
"Tracer2D1L-In",
dp1=self._dp1,
mfxd=state.mfxd,
mfyd=state.mfyd,
cxd=state.cxd,
cyd=state.cyd,
)

def _checkpoint_tracer_advection_out(
self,
state: DycoreState,
):
if self.call_checkpointer:
self.checkpointer(
"Tracer2D1L-Out",
dp1=self._dp1,
mfxd=state.mfxd,
mfyd=state.mfyd,
cxd=state.cxd,
cyd=state.cyd,
)

def step_dynamics(
self,
state: DycoreState,
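
The four `_checkpoint_*` methods above call `self.checkpointer` with a savepoint name followed by keyword arguments. A minimal sketch of a callable with that shape, useful for seeing which savepoints fire and with which variables. `RecordingCheckpointer` is a hypothetical name, not part of `pace.util`, and the real `pace.util.Checkpointer` interface may differ:

```python
from typing import Any, Dict, List, Tuple


class RecordingCheckpointer:
    """Hypothetical stand-in that logs each savepoint call it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def __call__(self, savepoint_name: str, **kwargs: Any) -> None:
        # Store the name and the variables passed at this savepoint.
        self.calls.append((savepoint_name, dict(kwargs)))

    def savepoint_names(self) -> List[str]:
        return [name for name, _ in self.calls]


checkpointer = RecordingCheckpointer()
# Placeholder values; the real calls pass model fields.
checkpointer("Tracer2D1L-In", dp1=[0.0], mfxd=[0.0])
checkpointer("Tracer2D1L-Out", dp1=[0.0], mfxd=[0.0])
print(checkpointer.savepoint_names())
```

Because the savepoint name is the only positional argument, a callable like this can accept any set of variables without knowing them in advance.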
@@ -449,6 +569,7 @@ def _compute(self, state: DycoreState, timer: pace.util.Timer):
if __debug__:
log_on_rank_0("Remapping")
with timer.clock("Remapping"):
self._checkpoint_remapping_in(state)
self._lagrangian_to_eulerian_obj(
self.tracer_storages,
state.pt,
@@ -483,6 +604,7 @@ def _compute(self, state: DycoreState, timer: pace.util.Timer):
self._timestep / self._k_split,
self._timestep,
)
self._checkpoint_remapping_out(state)
if last_step:
da_min: float = self._get_da_min()
self.post_remap(
@@ -519,6 +641,7 @@ def _dyn(
if __debug__:
log_on_rank_0("TracerAdvection")
with timer.clock("TracerAdvection"):
self._checkpoint_tracer_advection_in(state)
self.tracer_advection(
tracers,
self._dp1,
@@ -528,6 +651,7 @@ def _dyn(
state.cyd,
self._timestep / self._k_split,
)
self._checkpoint_tracer_advection_out(state)

def post_remap(
self,
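
In the remapping checkpoints, `peln` and `pe` are transposed to `[X_DIM, Z_INTERFACE_DIM, Y_DIM]`, which the diff's comment describes as the `[x, z, y]` order of the Fortran data. A small sketch of that reordering in plain NumPy rather than `pace.util.Quantity`, assuming the in-memory order is `[x, y, z_interface]` (an assumption; array sizes are arbitrary):

```python
import numpy as np

nx, ny, nz_interface = 3, 4, 5  # arbitrary small sizes for illustration
# Assumed in-memory layout: [x, y, z_interface].
peln = np.arange(nx * ny * nz_interface).reshape(nx, ny, nz_interface)

# Reorder to [x, z_interface, y], the order the diff's comment attributes
# to the Fortran reference data.
peln_xzy = np.transpose(peln, axes=(0, 2, 1))

print(peln.shape, "->", peln_xzy.shape)
```

Element `[i, j, k]` of the original array ends up at `[i, k, j]` after the transpose, so the values are unchanged and only the axis order differs.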
8 changes: 7 additions & 1 deletion fv3core/pace/fv3core/stencils/remapping.py
@@ -1,4 +1,4 @@
-from typing import Dict
+from typing import Dict, Optional

from gt4py.gtscript import (
__INLINED,
@@ -15,6 +15,7 @@

import pace.dsl.gt4py_utils as utils
import pace.fv3core.stencils.moist_cv as moist_cv
import pace.util
from pace.dsl.dace.orchestration import orchestrate
from pace.dsl.stencil import StencilFactory
from pace.dsl.typing import FloatField, FloatFieldIJ, FloatFieldK
@@ -286,12 +287,17 @@ def __init__(
nq,
pfull,
tracers: Dict[str, Quantity],
checkpointer: Optional[pace.util.Checkpointer] = None,
):
orchestrate(
obj=self,
config=stencil_factory.config.dace_config,
dace_compiletime_args=["tracers"],
)
self._checkpointer = checkpointer
# this is only computed in init because Dace does not yet support
# this operation
self._call_checkpointer = checkpointer is not None
grid_indexing = stencil_factory.grid_indexing
if config.kord_tm >= 0:
raise NotImplementedError("map ppm, untested mode where kord_tm >= 0")
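
The constructor above stores `checkpointer is not None` as a boolean once, and the accompanying comment attributes this to DaCe not yet supporting the operation. A plain-Python sketch of that precomputed-flag pattern, with hypothetical class and savepoint names and no DaCe involved:

```python
from typing import Any, Callable, Optional


class Remapper:
    """Hypothetical class illustrating the precomputed-flag pattern."""

    def __init__(self, checkpointer: Optional[Callable[..., Any]] = None) -> None:
        self._checkpointer = checkpointer
        # Evaluate the None check once here, so later calls only test a bool.
        self._call_checkpointer = checkpointer is not None

    def __call__(self, value: float) -> float:
        if self._call_checkpointer:
            self._checkpointer("Example-In", value=value)
        return value * 2.0


seen = []
with_cp = Remapper(lambda name, **kwargs: seen.append(name))
without_cp = Remapper()
with_cp(1.0)
without_cp(1.0)
print(seen)
```

Only the instance constructed with a checkpointer records a savepoint; the other skips the call entirely.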