-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GraphBolt][CUDA] Pipelined sampling optimization #7039
Merged
frozenbugs
merged 72 commits into
dmlc:master
from
mfbalin:gb_cuda_pipelined_sampling_optimization
Feb 5, 2024
Merged
[GraphBolt][CUDA] Pipelined sampling optimization #7039
frozenbugs
merged 72 commits into
dmlc:master
from
mfbalin:gb_cuda_pipelined_sampling_optimization
Feb 5, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
To trigger regression tests:
|
e8ae274
to
7d46316
Compare
f3dd0b3
to
ba18b7f
Compare
frozenbugs
reviewed
Feb 4, 2024
frozenbugs
reviewed
Feb 4, 2024
Overall LGTM, questions on python/dgl/graphbolt/impl/neighbor_sampler.py. |
frozenbugs
approved these changes
Feb 4, 2024
@Rhett-Ying Does the CI have a problem? |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
We overlap the UVA graph accesses with the rest of the computations by launching them in the UVA stream. We use a separate thread becase the
index_select_csc
is blocking, unlikeindex_select
, which is nonblocking. I suspect the GIL in python is making multithreading inefficient a little bit, so more performance can be unlocked in the future after python with no GIL is available, see https://peps.python.org/pep-0703/#gpu-heavy-workloads-require-multi-core-processingI am getting %10-%20 end-to-end speedup for
sample_layer_neighbor
.sample_neighbor
is currently not getting much speedup because it does not need to access the whole indices tensor, which we are fetching through this optimization. Will need to specialize forsample_neighbor
later.This optimization will be the core of us providing the users a way to define their own custom samplers using
dgl.sparse
. My next PR will show how to implement LADIES while still taking advantage of the GraphBolt optimizations.Checklist
Please feel free to remove inapplicable items for your PR.
Changes