TL/UCP: transition to barrier for sync for onesided a2a #1096

Open
wants to merge 1 commit into
base: master

wfaderhold21
Collaborator

wfaderhold21 commented Mar 17, 2025

What

Switch the onesided alltoall from a pSync array with atomic increments to the TL/UCP barrier for synchronization.

Why?

There are two reasons for the switch: (1) the knomial barrier scales better and performs better than atomic increments, and (2) with the current scheme, processes can leave the alltoall collective before remote writes have completed. In addition, once PR #1070 is merged, this change allows the algorithm to be used with memory handles.
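To make the scaling argument in (1) concrete, here is a rough back-of-the-envelope model, not the TL/UCP implementation. It assumes the atomic scheme has each rank increment the pSync counter of every other rank, and that a radix-k knomial exchange needs about ceil(log_k(N)) rounds with (k - 1) peers per round, ignoring any extra steps for team sizes that are not a power of the radix. All function names are hypothetical and exist only for this sketch.

```c
#include <stddef.h>

/* Hypothetical helper: rounds needed by a radix-k knomial exchange over
 * nranks ranks, i.e. the smallest r such that radix^r >= nranks.
 * Assumption: extra steps for non-power-of-radix team sizes are ignored. */
static size_t knomial_rounds(size_t nranks, size_t radix)
{
    size_t rounds = 0;
    size_t span   = 1;

    if (radix < 2) {
        return 0; /* invalid radix, nothing to model */
    }
    while (span < nranks) {
        span *= radix;
        rounds++;
    }
    return rounds;
}

/* Approximate sync messages one rank sends in the knomial model:
 * (radix - 1) peers in each round. */
static size_t knomial_msgs_per_rank(size_t nranks, size_t radix)
{
    return knomial_rounds(nranks, radix) * (radix - 1);
}

/* Sync operations one rank issues in the assumed atomic-increment model:
 * one increment on the counter of every other rank. */
static size_t atomic_incs_per_rank(size_t nranks)
{
    return nranks > 0 ? nranks - 1 : 0;
}
```

Under these assumptions, a 1024-rank team with radix 4 gives `knomial_msgs_per_rank(1024, 4) == 15` versus `atomic_incs_per_rank(1024) == 1023`, so the per-rank synchronization cost grows logarithmically rather than linearly with team size.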

@swx-jenkins3

Can one of the admins verify this patch?

janjust force-pushed the topic/a2a-barrier branch from 96449db to eaa8091 on March 26, 2025 17:06
@janjust
Collaborator

janjust commented Mar 26, 2025

@wfaderhold21 didn't we say we were also going to change the test to reflect oshmem behavior?
edit: nvm, I just realized it's the other PR

3 participants