DAOS-17161 rebuild: cannot retry infinitely for dsc_obj_fetch timeout #16083

Merged
daltonbohning merged 1 commit into master from lxz/rb_fetch_timeout on Mar 19, 2025

Conversation

liuxuezhao

Check "tls->mpt_fini" in the dsc_obj_fetch timeout retry path and abort the retry when it is set, so that rebuild cannot retry forever and hang.

Test-tag: test_ec_online_rebuild_mdtest
Test-repeat: 3
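
For readers without the diff at hand, the "tls->mpt_fini" check described above can be pictured with a minimal sketch. This is not the code from this PR: every name prefixed with `sketch_` or `SKETCH_` is hypothetical, the error values are placeholders rather than DAOS error codes, and the fetch callback merely stands in for dsc_obj_fetch. It only assumes that the retry loop re-checks a per-thread finalization flag after each timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder error values; hypothetical, not the DAOS error codes. */
#define SKETCH_ERR_TIMEDOUT (-1)
#define SKETCH_ERR_SHUTDOWN (-2)

/* Hypothetical stand-in for the thread-local state that holds mpt_fini. */
struct sketch_tls {
	bool mpt_fini; /* set when the owning context is being finalized */
};

/* Hypothetical fetch callback standing in for dsc_obj_fetch. */
typedef int (*sketch_fetch_fn)(void *arg);

/*
 * Retry the fetch while it times out, but give up as soon as the
 * finalization flag is observed, so the loop cannot spin forever.
 */
static int
sketch_fetch_with_retry(struct sketch_tls *tls, sketch_fetch_fn fetch, void *arg)
{
	int rc;

	assert(tls != NULL && fetch != NULL);
	for (;;) {
		rc = fetch(arg);
		if (rc != SKETCH_ERR_TIMEDOUT)
			return rc; /* success or a non-retryable error */
		if (tls->mpt_fini)
			return SKETCH_ERR_SHUTDOWN; /* abort instead of retrying */
	}
}

/* Test double: always times out, and raises mpt_fini after N calls. */
struct sketch_fake {
	struct sketch_tls *tls;
	int                calls;
	int                fini_after;
};

static int
sketch_fake_fetch(void *arg)
{
	struct sketch_fake *f = arg;

	f->calls++;
	if (f->calls >= f->fini_after)
		f->tls->mpt_fini = true;
	return SKETCH_ERR_TIMEDOUT;
}
```

Without the flag check, a fetch that keeps timing out would keep this loop running indefinitely. With it, the loop ends on the first timeout seen after finalization has started.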

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@liuxuezhao liuxuezhao requested review from a team as code owners March 12, 2025 10:10
Ticket title is 'erasurecode/online_rebuild_mdtest.py:EcodOnlineRebuildMdtest.test_ec_online_rebuild_mdtest - test timeout caused by pool query Pool child isn't found'
Status is 'In Progress'
Labels: 'ci_2.6_daily,daily_test,scrubbed_2.8'
https://daosio.atlassian.net/browse/DAOS-17161

Check "tls->mpt_fini" to abort retry to avoid hang.

Signed-off-by: Xuezhao Liu <xuezhao.liu@hpe.com>
@liuxuezhao liuxuezhao force-pushed the lxz/rb_fetch_timeout branch from 8d40740 to cb5886e Compare March 14, 2025 06:26
@liuxuezhao liuxuezhao requested review from liw, NiuYawei and kccain and removed request for a team March 17, 2025 02:59
@liuxuezhao liuxuezhao requested a review from a team March 18, 2025 09:56
@liuxuezhao

@daos-stack/daos-gatekeeper the "[Trivy scan]" failure is unrelated to this PR; all other tests passed.

@daltonbohning daltonbohning merged commit 61dcef1 into master Mar 19, 2025
56 of 63 checks passed
@daltonbohning daltonbohning deleted the lxz/rb_fetch_timeout branch March 19, 2025 16:09
liuxuezhao added a commit that referenced this pull request Mar 20, 2025
Check "tls->mpt_fini" to abort retry to avoid hang (#16083)

several other backport commits -
caf1a25 - DAOS-17142 rebuild: exit rebuild_tgt_status_check_ult when RPT stale (#15994)
9712130 - DAOS-15847 rebuild: refine one case's err handling (#15943)
c86aa7b - DAOS-16170 cart: refine corpc fail handling for CRT_RPC_FLAG_CO_FAILOUT (#15572)

Signed-off-by: Xuezhao Liu <xuezhao.liu@hpe.com>
liuxuezhao added a commit that referenced this pull request Mar 20, 2025
Check "tls->mpt_fini" to abort retry to avoid hang (#16083)

several other backport commits -
caf1a25 - rebuild: exit rebuild_tgt_status_check_ult when RPT stale (#15994)
9712130 - rebuild: refine one case's err handling (#15943)
c86aa7b - cart: refine corpc fail handling for CRT_RPC_FLAG_CO_FAILOUT (#15572)

Signed-off-by: Xuezhao Liu <xuezhao.liu@hpe.com>