GCHP crashes immediately in MAPL_CapGridComp.F90 #479

Open
Xinying331 opened this issue Mar 3, 2025 · 1 comment
@Xinying331
Your name

Xinying Wang

Your affiliation

Rutgers University

What happened? What did you expect to happen?

I'm running a GCHP simulation with the default collection on 1 node with 6 cores, and it crashed immediately.
Failed run output:

[Image: failed run output]

What are the steps to reproduce the bug?

The content and location of the error message are exactly the same as those in #8 and #443 (comment). I followed the suggestions in those tickets: I reinstalled and recompiled the ESMF package and made sure ESMF_COMM points to openmpi instead of mpiuni. However, the error persists.

Image

Any insights or comments would be greatly appreciated. Thank you so much for your help and time!

Best,
Xinying

Please attach any relevant configuration and log files.

gchp-GNU.env.txt
run-20250303_1626.log.txt
run.sh.txt

I'm also attaching the compile and install logs for ESMF:
compile.log
install.log

What GCHP version were you using?

14.4.3

What environment were you running GCHP on?

Local cluster

What compiler and version were you using?

gcc-12.1

What MPI library and version were you using?

openmpi-4.1.6

Will you be addressing this bug yourself?

Yes, but I will need some help

Additional information

No response

@Xinying331 Xinying331 added the category: Bug Something isn't working label Mar 3, 2025
@lizziel lizziel self-assigned this Mar 17, 2025
@lizziel
Contributor

lizziel commented Mar 18, 2025

Hi @Xinying331, the model is crashing at this section of MAPL based on the traceback message with filename and line number:

    call ESMF_VMGet(cap%vm, petcount=npes, mpicommunicator=comm, rc=status)
    _VERIFY(status)
    _ASSERT(CoresPerNode <= npes, 'something impossible happened')

One option is to print CoresPerNode and npes at this point in the code, right before the _ASSERT call. Before doing that, check what you set for CoresPerNode in setCommonRunSettings.sh and compare it against your run script to make sure they are consistent. Also check whether there are any messages in the log file allPEs.log.
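
As a rough illustration of the consistency check described above, here is a minimal Python sketch. It is not part of GCHP or MAPL. The setting names `TOTAL_CORES` and `NUM_CORES_PER_NODE`, the helper functions, and the example values are all assumptions made for illustration; substitute whatever your setCommonRunSettings.sh and run script actually contain.

```python
import re


def parse_int_settings(script_text, names):
    """Return {name: int} for simple `NAME=123` assignments in shell script text."""
    found = {}
    for name in names:
        pattern = rf"^[ \t]*{re.escape(name)}=(\d+)\b"
        match = re.search(pattern, script_text, re.MULTILINE)
        if match:
            found[name] = int(match.group(1))
    return found


def check_core_settings(cores_per_node, total_cores, launched_ranks):
    """List mismatches between the configured core counts and the MPI launch."""
    problems = []
    if cores_per_node > launched_ranks:
        # Same condition as the failing check: _ASSERT(CoresPerNode <= npes, ...)
        problems.append(
            f"cores per node ({cores_per_node}) exceeds MPI processes ({launched_ranks})"
        )
    if total_cores != launched_ranks:
        problems.append(
            f"configured total cores ({total_cores}) differs from MPI processes ({launched_ranks})"
        )
    return problems


# Hypothetical settings text; the variable names are assumptions, not read
# from the files attached to this issue.
example_settings = """
TOTAL_CORES=24
NUM_CORES_PER_NODE=24
"""

settings = parse_int_settings(example_settings, ["TOTAL_CORES", "NUM_CORES_PER_NODE"])
# launched_ranks stands for the process count requested in the run script,
# for example the value passed to the MPI launcher.
for problem in check_core_settings(
    settings["NUM_CORES_PER_NODE"], settings["TOTAL_CORES"], launched_ranks=6
):
    print(problem)
```

If the configuration asks for more cores per node than the number of MPI processes actually launched, the first check reports it, which is the same condition the `_ASSERT` enforces.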
