DAOS-17094 test: auto storage config for daos_server_restart #16050

Merged
merged 1 commit into release/2.6 from kccain/daos_17094_testfix_rel2p6 on Mar 6, 2025

@kccain (Contributor) commented Mar 6, 2025

This change replaces a test's hard-coded tier 0 `scm class: ram` configuration with `storage: auto` (an ftest abstraction). On functional hardware clusters with PMEM, this steers the test to the correct `class: dcpm`. The pool size is also increased to avoid the pool create failure that would otherwise occur in the new configuration:
"requested SCM capacity is too small".

Before this change, `scm class: ram` was used on clusters with PMEM, which led to Argobots ULT stack overflows and to segmentation faults in the Argobots memory pool allocation logic (ABTI_mem_pool_alloc).
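
To illustrate the difference between the old and new configuration, here is a minimal Python sketch of how an `auto` tier 0 setting could be resolved. It assumes, for illustration only, that `auto` maps to `class: dcpm` when PMEM is detected and to `class: ram` otherwise; `resolve_tier0` and `HARD_CODED_TIER0` are hypothetical names and this is not the ftest implementation.

```python
from typing import Dict

# Hypothetical illustration only; not the DAOS ftest code.
# Before the change: tier 0 SCM was hard-coded to RAM, even on PMEM clusters.
HARD_CODED_TIER0: Dict[str, str] = {"class": "ram"}


def resolve_tier0(storage: str, host_has_pmem: bool) -> Dict[str, str]:
    """Return a tier 0 config dict for a storage setting (hypothetical).

    "auto" picks the class from detected hardware; any other value is
    treated as an explicit class name and passed through unchanged.
    """
    if storage == "auto":
        # Assumption: PMEM present -> dcpm, otherwise fall back to ram.
        return {"class": "dcpm" if host_has_pmem else "ram"}
    return {"class": storage}


if __name__ == "__main__":
    # On a functional hardware cluster with PMEM, "auto" selects dcpm,
    # whereas the old hard-coded config always used ram.
    print(resolve_tier0("auto", host_has_pmem=True))  # {'class': 'dcpm'}
    print(HARD_CODED_TIER0)  # {'class': 'ram'}
```

The point of the abstraction is that the test YAML no longer encodes a hardware assumption; the class is chosen per cluster rather than fixed in the test file.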

Skip-unit-tests: true
Skip-fault-injection-test: true
Skip-func-hw-test-medium-md-on-ssd: false
Test-tag: DaosServerTest
Test-repeat: 7

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

github-actions bot commented Mar 6, 2025

Ticket title is 'server/daos_server_restart.py:DaosServerTest.test_engine_restart - segmentation fault in ABTI_mem_pool_alloc'
Status is 'In Progress'
Labels: 'ci_2.6_daily,ci_master_daily,daily_test,scrubbed_2.8'
https://daosio.atlassian.net/browse/DAOS-17094

Signed-off-by: Kenneth Cain <kenneth.cain@hpe.com>
@kccain force-pushed the kccain/daos_17094_testfix_rel2p6 branch from b84f7e2 to 181ebfe on March 6, 2025 16:43
@kccain added the unclean-cherry-pick label Mar 6, 2025
@kccain added the forced-landing label Mar 6, 2025
@kccain requested a review from daltonbohning March 6, 2025 20:43
@kccain requested a review from a team March 6, 2025 21:16
@kccain marked this pull request as ready for review March 6, 2025 21:17
@kccain requested review from a team as code owners March 6, 2025 21:17
@phender merged commit 1f73816 into release/2.6 Mar 6, 2025 (50 checks passed)
@phender deleted the kccain/daos_17094_testfix_rel2p6 branch March 6, 2025 21:18
@mjmac mentioned this pull request Mar 18, 2025
Labels
  • forced-landing: The PR has known failures or has intentionally reduced testing, but should still be landed.
  • unclean-cherry-pick: Indicates that a cherry-pick had merge conflicts that needed resolving.
3 participants