Writing out restart files in stretched-grid simulations #481

Open
mcdonh1718 opened this issue Mar 12, 2025 · 2 comments
Labels
category: Bug (Something isn't working) · topic: Restart Files (Related to GCHP restart files) · topic: Stretched Grid (Specific to stretched grid simulation)

Comments

@mcdonh1718

Your name

Helena McDonald

Your affiliation

MIT

What happened? What did you expect to happen?

I'm running a stretched-grid simulation with base resolution C180 and a stretch factor of 30 (so approx. 0.02 deg resolution), centered on Florida. Generating the initial restart file goes fine, though I did have to give the longitude coordinates in [0,360] rather than [-180,180] space. The simulation reads this input file without problems, but I can't chain simulations together: the output restart file stores the longitude coordinate slightly incorrectly, so using it as the next run's input throws a 'factories not equal' error. From the initial restart file:

Image

From the simulation write out:
Image

This isn't hard to fix myself by opening and editing the netCDF file (and the restart files work after this edit), but it is irritating.
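For anyone hitting the same mismatch, here is a minimal sketch of how the two longitude coordinates could be compared before patching the file. It is plain Python with made-up sample values; `wrap_lon` and `max_lon_mismatch` are hypothetical helper names, and reading or writing the netCDF file itself (for example with xarray or netCDF4) is deliberately left out.

```python
def wrap_lon(lon_deg):
    """Map a longitude in degrees onto [0, 360)."""
    return lon_deg % 360.0


def max_lon_mismatch(ref_lons, out_lons):
    """Largest absolute difference (degrees) between two longitude lists,
    measured the short way around the circle."""
    if len(ref_lons) != len(out_lons):
        raise ValueError("coordinate arrays differ in length")
    worst = 0.0
    for ref, out in zip(ref_lons, out_lons):
        diff = abs(wrap_lon(ref) - wrap_lon(out))
        diff = min(diff, 360.0 - diff)  # handle wrap-around at 0/360
        worst = max(worst, diff)
    return worst


# Illustrative values only, not taken from the actual restart files.
initial = [279.50, 279.56, 279.62]
written = [279.50, 279.56000001, 279.62]
print("max mismatch (deg):", max_lon_mismatch(initial, written))
```

A nonzero result here would be consistent with the behavior described above, though the exact tolerance at which the model rejects the file is not something this sketch knows.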

Similarly, some simulations don't write out restart files at all: the model finishes generating the collection files and the log file shows the component time-use breakdown, but the run then gets stuck converting the placeholder gcchem-internal-checkpoint file to a restart file. See logfile:

Image

This produces the following errors in my Slurm log, but they never kill the run; it only ends when my cluster kills the job at the time limit.

<img width="609" alt="Image" src="https://github.com/user-attachments/assets/a1d97073-836f-436c-a5c5-e9e2a72aea7b">

What are the steps to reproduce the bug?

Generated a restart file with --stretched-grid --stretch-factor 30 --target-latitude 28.56 --target-longitude 279.56. Enabled NEI2016_MONMEAN in HEMCO_Config. Ran a 3-hour simulation starting 07 01 1200z from the restart file I generated using gridspec and ESMF.
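As a rough cross-check of the grid numbers above, here is a small sketch. It assumes the common rule of thumb that a CN cubed-sphere grid has about 90/N degree spacing and that the stretch factor refines the target face by roughly that factor; the function name is hypothetical and the result is an estimate, not something the model reports.

```python
def approx_target_resolution_deg(cs_res, stretch_factor):
    """Estimate grid spacing (degrees) on the target face of a stretched grid.

    Assumes ~90/N degrees for an unstretched CN grid, refined by the
    stretch factor at the target point.
    """
    return 90.0 / (cs_res * stretch_factor)


# C180 with a stretch factor of 30, as in this report.
print(round(approx_target_resolution_deg(180, 30), 4))  # 0.0167
```

This agrees with the "approx 0.02 deg" figure quoted in the description.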

Please attach any relevant configuration and log files.

config:
ExtData.txt
gchp-20190701_1200z-log.txt
HEMCO_Config.txt
HISTORY.txt
setCommonRunSettings.txt

slurm files:
gchp_run.txt
slurm-488693-out.txt

What GCHP version were you using?

14.3.1

What environment were you running GCHP on?

Local cluster

What compiler and version were you using?

gcc 9.4.0

What MPI library and version were you using?

OpenMPI 4.0.3

Will you be addressing this bug yourself?

No

Additional information

No response

@mcdonh1718 mcdonh1718 added the category: Bug Something isn't working label Mar 12, 2025
@lizziel lizziel self-assigned this Mar 17, 2025
@lizziel
Contributor

lizziel commented Mar 18, 2025

Hi @mcdonh1718, this is a bug that was fixed in 14.4.0. You can either update to that version or manually update your MAPL version by merging in main (let me know if you need help figuring out how to do this). The fix is in the PR linked from the issue report at #409.

@lizziel
Contributor

lizziel commented Mar 18, 2025

Regarding the output restart file write issue, is there a pattern to when the write takes a long time and hangs?

@lizziel lizziel added topic: Stretched Grid Specific to stretched grid simulation topic: Restart Files Related to GCHP restart files labels Mar 18, 2025