DAOS-16908 object: add client-side target compound rpc pinging on update retry #16093

Draft: wants to merge 9 commits into base feature/firewall from dev/karthj/firewall-simplification-compound-rpc

karthjyojay
Contributor

A transaction return code of DER_RECONNECT means that at least one of the packed operations failed because the server could not establish a connection to a client. When this happens, the client should ping the server. Because we don't know which sub-request produced the DER_RECONNECT, we retry all targets across all update operations in the compound RPC.
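
The flow described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: every `sketch_`/`SKETCH_` name is hypothetical, the error value is a placeholder rather than the real DER_RECONNECT, and pinging each target only once (deduplicating across sub-requests) is a design choice of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for illustration only; the real DER_RECONNECT is defined by DAOS. */
#define SKETCH_DER_RECONNECT (-9001)
#define SKETCH_MAX_TGTS      64

/* Hypothetical view of one update operation packed into the compound RPC. */
struct sketch_sub_req {
	const uint32_t *tgt_ids; /* targets touched by this update */
	size_t          tgt_nr;
};

/* Hypothetical ping callback: returns 0 on success, non-zero on failure. */
typedef int (*sketch_ping_fn)(uint32_t tgt_id, void *arg);

/* Gather the unique target ids across all sub-requests; -1 if out[] is too small. */
static int
sketch_collect_targets(const struct sketch_sub_req *reqs, size_t req_nr,
		       uint32_t *out, size_t out_cap)
{
	size_t nr = 0;

	for (size_t i = 0; i < req_nr; i++) {
		for (size_t j = 0; j < reqs[i].tgt_nr; j++) {
			uint32_t id   = reqs[i].tgt_ids[j];
			bool     seen = false;

			for (size_t k = 0; k < nr; k++) {
				if (out[k] == id) {
					seen = true;
					break;
				}
			}
			if (seen)
				continue;
			if (nr == out_cap)
				return -1;
			out[nr++] = id;
		}
	}
	return (int)nr;
}

/*
 * If tx_rc is the reconnect error, ping every target of every sub-request,
 * since the failing sub-request is unknown. Sets *retry to true only when all
 * pings succeeded. Any other tx_rc is passed through unchanged.
 */
static int
sketch_ping_on_reconnect(int tx_rc, const struct sketch_sub_req *reqs, size_t req_nr,
			 sketch_ping_fn ping, void *arg, bool *retry)
{
	uint32_t tgts[SKETCH_MAX_TGTS];
	int      nr;

	assert(ping != NULL && retry != NULL);
	*retry = false;
	if (tx_rc != SKETCH_DER_RECONNECT)
		return tx_rc;

	nr = sketch_collect_targets(reqs, req_nr, tgts, SKETCH_MAX_TGTS);
	if (nr < 0)
		return tx_rc; /* too many targets for this sketch: report the original error */

	for (int i = 0; i < nr; i++) {
		int rc = ping(tgts[i], arg);

		if (rc != 0)
			return rc;
	}
	*retry = true;
	return 0;
}

/* Demo callback: counts pings through an int pointed to by arg. */
static int
sketch_count_ping(uint32_t tgt_id, void *arg)
{
	(void)tgt_id;
	(*(int *)arg)++;
	return 0;
}
```

A caller would check `*retry` after the call and resubmit the whole compound RPC only when it is true. Pinging every target is more work than strictly necessary, but it matches the reasoning above: the return code does not identify the failing sub-request.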

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

This change adds logic that pings all targets involved in the object retry.
When the retry function gets an error indicating that the server could not reach the client,
the update pings the relevant targets to establish a connection so that the update can be
retried.

Signed-off-by: Yokesh Jayakumar <karthj@google.com>
Previously, the unit test failed with an error saying that HG_Finalize could not
complete because the bulk handle had not been freed. The cause was an incorrect
early return.

Signed-off-by: Yokesh Jayakumar <karthj@google.com>
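
The kind of bug described in this commit message, and one common way to avoid it, could be sketched as below. The actual code path is not shown in this conversation, so this is an assumption-laden illustration: the handle type and all `sketch_` names are hypothetical stand-ins and not Mercury or DAOS APIs. A counter of live handles plays the role of the check that made HG_Finalize complain.

```c
#include <stdlib.h>

/* Number of stand-in handles created but not yet freed. */
static int sketch_live_handles;

/* Hypothetical stand-in for a bulk handle; not a Mercury type. */
struct sketch_bulk {
	int unused;
};

static struct sketch_bulk *
sketch_bulk_create(void)
{
	struct sketch_bulk *b = malloc(sizeof(*b));

	if (b != NULL)
		sketch_live_handles++;
	return b;
}

static void
sketch_bulk_free(struct sketch_bulk *b)
{
	if (b == NULL)
		return;
	free(b);
	sketch_live_handles--;
}

/*
 * step_rc stands in for the result of an intermediate step. On failure the
 * function jumps to the shared cleanup label instead of returning directly,
 * so the handle is released on every path.
 */
static int
sketch_update(int step_rc)
{
	struct sketch_bulk *bulk = sketch_bulk_create();
	int                 rc;

	if (bulk == NULL)
		return -1;

	rc = step_rc;
	if (rc != 0)
		goto out; /* a plain "return rc;" here would leak the handle */

	/* ... remaining work that uses the handle would go here ... */
out:
	sketch_bulk_free(bulk);
	return rc;
}
```

With a single exit label, `sketch_live_handles` is back to zero after both the success path and the failure path.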

Ticket title is 'Modify DAOS to use new mercury changes to implement improved firewall handling'
Status is 'Open'
https://daosio.atlassian.net/browse/DAOS-16908

@daosbuild1
Collaborator

Test stage Build on Leap 15.5 with Intel-C and TARGET_PREFIX completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16093/1/execution/node/307/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16093/1/execution/node/292/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16093/1/execution/node/369/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16093/1/execution/node/261/log

@karthjyojay karthjyojay force-pushed the dev/karthj/firewall-simplification-compound-rpc branch from f0f8003 to 25bb7eb Compare March 13, 2025 20:46
@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16093/2/execution/node/323/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16093/2/execution/node/320/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16093/2/execution/node/319/log

@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16093/2/execution/node/359/log

@karthjyojay karthjyojay force-pushed the dev/karthj/firewall-simplification-compound-rpc branch from 25bb7eb to 31d0dc9 Compare March 13, 2025 21:05
@karthjyojay karthjyojay force-pushed the dev/karthj/firewall-simplification branch 3 times, most recently from 60cee9a to 3493494 Compare March 18, 2025 03:30
Base automatically changed from dev/karthj/firewall-simplification to feature/firewall March 19, 2025 19:44
@daosbuild1
Collaborator

Test stage Build RPM on EL 8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16093/4/execution/node/306/log

@daosbuild1
Collaborator

Test stage Build RPM on Leap 15.5 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16093/4/execution/node/403/log

@daosbuild1
Collaborator

Test stage Build RPM on EL 9 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16093/4/execution/node/390/log

@daosbuild1
Collaborator

Test stage Build DEB on Ubuntu 20.04 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16093/4/execution/node/443/log

2 participants