
Multi-node / multi core tests + python testing. #7

Open
fdmalone opened this issue May 9, 2020 · 9 comments

@fdmalone

fdmalone commented May 9, 2020

This might be easier to set up in-house, given our lack of access and the fact that the CI machines only run on single nodes.

Running with ncores > 1 on CPU looks suspicious.

@fdmalone fdmalone self-assigned this May 9, 2020
@mmorale3
Owner

This is very easy to break. Without testing we are flying blind.

@fdmalone
Author

Regarding this, it seems LLNL allows CI if the repo is mirrored to GitLab. We could then trigger LC builds with a comment (like they do at Oak Ridge). Figuring this out seems like the most consistent option, since we could add multi-node tests to the main qmcpack repo, where they would live on permanently. I imagine these would only have to be triggered for our PRs.

Alternatively, I can cook something up that we run ourselves, possibly through a cron job.

@mmorale3
Owner

We should at least start with something in-house at LLNL.
We can modify your build scripts to run the unit tests on 2 nodes.

@fdmalone
Author

Isn't the distribution over cores/nodes controlled by the input file?

@mmorale3
Owner

Some unit tests are set up to run with nnodes>1 if they are launched on more than 1 node.
Look at the unit test wfn_fac_distributed, for example. There is another one in Propagator.
Passing these unit tests catches most of the issues.
We can extend these tests or set up longer runs later on.
Getting these tested regularly would be a big first step.
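
As a rough illustration of launching a unit test on 2 nodes, here is a minimal sketch of a helper that builds the launch command. It assumes a Slurm-style launcher (`srun -N <nodes> -n <tasks>`); the helper name `build_launch_command` and the test binary name are hypothetical and not taken from the repo or the build scripts.

```python
import shlex


def build_launch_command(test_binary, nodes=2, tasks_per_node=2, launcher="srun"):
    """Return the argv list for launching one unit test across several nodes.

    Assumes a Slurm system where `srun -N <nodes> -n <tasks>` starts the job;
    other launchers would need their own flags.
    """
    if nodes < 1 or tasks_per_node < 1:
        raise ValueError("nodes and tasks_per_node must be positive")
    total_tasks = nodes * tasks_per_node
    return [launcher, "-N", str(nodes), "-n", str(total_tasks), test_binary]


# Hypothetical binary name, used only for illustration.
cmd = build_launch_command("./test_afqmc_wfn_fac_distributed", nodes=2, tasks_per_node=2)
print(shlex.join(cmd))
```

Returning a list rather than a string means the command can be handed straight to `subprocess.run` without shell quoting issues.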

@fdmalone
Author

Ok. I need to add a cmake function and split some of the files.

@mmorale3
Owner

Actually, all the unit tests will run with multiple nodes.
They'll just repeat tests serially if they are not distributed tests.
You'll just get a lot of repeated chatter.
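
To picture the repeated chatter, here is a toy sketch that mimics the behaviour serially, with no real MPI involved. A test that ignores its rank does the same work on every rank, while a distributed test splits the work. All function names here are hypothetical stand-ins, not the actual unit tests.

```python
def run_test_on_ranks(test, nranks):
    """Call `test(rank, nranks)` once per rank and collect the printed lines.

    This is a serial stand-in for an MPI launch: no real communication happens.
    """
    return [test(rank, nranks) for rank in range(nranks)]


def serial_test(rank, nranks):
    # Ignores rank and nranks, so every rank redoes the full check.
    return "serial check: summed 0..7 = %d" % sum(range(8))


def distributed_test(rank, nranks):
    # Each rank sums only its own strided slice of the work.
    return "rank %d partial sum = %d" % (rank, sum(range(rank, 8, nranks)))


serial_lines = run_test_on_ranks(serial_test, 2)
distributed_lines = run_test_on_ranks(distributed_test, 2)
print(serial_lines)       # two identical lines: the repeated chatter
print(distributed_lines)  # one distinct line per rank
```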

@fdmalone
Author

fdmalone commented Jun 9, 2020

I've added a multi-node/multi-core testing script in /usr/gapps/afqmc/codes/testing, with benchmark data in /usr/workspace/afqmc/testing. I'll set the tests up to submit on lassen/quartz, either twice a week or nightly. Currently, the complex double (kpoint) build appears to be failing on lassen.

@fdmalone
Author

They will also track the timing of the runs.
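
For readers without access to the script, the general idea of checking a run against benchmark data while logging its timing might look like the sketch below. This is not the actual script: the function names, the tolerance, the case label, and the numbers are all made up for illustration.

```python
import csv
import io
import time


def check_against_benchmark(value, reference, tolerance):
    """Return True when `value` agrees with `reference` to within `tolerance`."""
    return abs(value - reference) <= tolerance


def timed_call(func, *args):
    """Run `func(*args)` and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def record_timing(stream, label, seconds, passed):
    """Append one CSV row so run times can be compared across submissions."""
    csv.writer(stream).writerow([label, "%.6f" % seconds, "pass" if passed else "fail"])


# Stand-in for a real run; the numbers below are made up for illustration.
def fake_run():
    return -1.2345


energy, elapsed = timed_call(fake_run)
passed = check_against_benchmark(energy, reference=-1.2340, tolerance=1e-3)
log = io.StringIO()  # a real script would append to a file instead
record_timing(log, "hypothetical_case", elapsed, passed)
print(log.getvalue().strip())
```

Keeping the timing next to the pass/fail status in one row makes it easy to spot a run that still passes but has slowed down.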


2 participants