Get rid of unneeded version checks #2765

Open · kshyatt wants to merge 1 commit into master from ksh/cublas_version
Conversation

@kshyatt (Member) commented Apr 29, 2025

According to the release notes, CUDA 11.4 ships with CUBLAS 11.6.5, so these checks for `version() < 11` aren't reachable.
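The kind of guard being removed can be sketched as follows (a minimal illustration, not the actual CUDA.jl code; it assumes `CUBLAS.version()` returns a `VersionNumber`):

```julia
# Sketch of the pattern this PR removes. With CUDA 11.4 as the minimum
# supported toolkit, the bundled CUBLAS is 11.6.5, so the branch below
# can never be taken:
cublas_version = v"11.6.5"   # what CUDA 11.4 ships
if cublas_version < v"11"
    error("unreachable: legacy CUBLAS 10.x code path")
end
```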

@kshyatt kshyatt requested a review from maleadt April 29, 2025 16:59
@kshyatt kshyatt added the cuda libraries Stuff about CUDA library wrappers. label Apr 29, 2025
@maleadt maleadt force-pushed the ksh/cublas_version branch from ac8f3e1 to ca96e09 Compare April 30, 2025 07:23
@github-actions (bot) left a comment


CUDA.jl Benchmarks

| Benchmark suite | Current: ca96e09 | Previous: 5645ddc | Ratio |
|---|---|---|---|
| latency/precompile | 42854033083 ns | 48881022269.5 ns | 0.88 |
| latency/ttfp | 7148038767 ns | 7292511536 ns | 0.98 |
| latency/import | 3387489703 ns | 3494844990 ns | 0.97 |
| integration/volumerhs | 9624264.5 ns | 9618776.5 ns | 1.00 |
| integration/byval/slices=1 | 147256 ns | 147140 ns | 1.00 |
| integration/byval/slices=3 | 425877 ns | 425273 ns | 1.00 |
| integration/byval/reference | 145298 ns | 145049 ns | 1.00 |
| integration/byval/slices=2 | 286513 ns | 286349 ns | 1.00 |
| integration/cudadevrt | 103694 ns | 103459 ns | 1.00 |
| kernel/indexing | 14436 ns | 14269 ns | 1.01 |
| kernel/indexing_checked | 14861 ns | 14648 ns | 1.01 |
| kernel/occupancy | 777.5090909090909 ns | 720.354609929078 ns | 1.08 |
| kernel/launch | 2285.5555555555557 ns | 2197.3333333333335 ns | 1.04 |
| kernel/rand | 18461 ns | 14675 ns | 1.26 |
| array/reverse/1d | 19857 ns | 19825.5 ns | 1.00 |
| array/reverse/2d | 24929 ns | 25322 ns | 0.98 |
| array/reverse/1d_inplace | 10796 ns | 10664 ns | 1.01 |
| array/reverse/2d_inplace | 12244 ns | 12288 ns | 1.00 |
| array/copy | 21381 ns | 21417 ns | 1.00 |
| array/iteration/findall/int | 158440 ns | 160202.5 ns | 0.99 |
| array/iteration/findall/bool | 139170 ns | 140858 ns | 0.99 |
| array/iteration/findfirst/int | 153951 ns | 153807 ns | 1.00 |
| array/iteration/findfirst/bool | 154987 ns | 154741 ns | 1.00 |
| array/iteration/scalar | 72845 ns | 72443 ns | 1.01 |
| array/iteration/logical | 216325 ns | 219718.5 ns | 0.98 |
| array/iteration/findmin/1d | 41346.5 ns | 41952 ns | 0.99 |
| array/iteration/findmin/2d | 94176 ns | 94212 ns | 1.00 |
| array/reductions/reduce/1d | 35853 ns | 36691.5 ns | 0.98 |
| array/reductions/reduce/2d | 41462 ns | 49814 ns | 0.83 |
| array/reductions/mapreduce/1d | 34029 ns | 34861 ns | 0.98 |
| array/reductions/mapreduce/2d | 41368 ns | 41414 ns | 1.00 |
| array/broadcast | 21097 ns | 20872 ns | 1.01 |
| array/copyto!/gpu_to_gpu | 12972 ns | 13686 ns | 0.95 |
| array/copyto!/cpu_to_gpu | 214298 ns | 211802 ns | 1.01 |
| array/copyto!/gpu_to_cpu | 283270 ns | 245699 ns | 1.15 |
| array/accumulate/1d | 109578 ns | 109690 ns | 1.00 |
| array/accumulate/2d | 80665 ns | 80775 ns | 1.00 |
| array/construct | 1281.8 ns | 1289.6 ns | 0.99 |
| array/random/randn/Float32 | 43925 ns | 44847 ns | 0.98 |
| array/random/randn!/Float32 | 25118 ns | 26739.5 ns | 0.94 |
| array/random/rand!/Int64 | 27236 ns | 27115 ns | 1.00 |
| array/random/rand!/Float32 | 8871 ns | 8873 ns | 1.00 |
| array/random/rand/Int64 | 30222 ns | 38396 ns | 0.79 |
| array/random/rand/Float32 | 13438 ns | 13462 ns | 1.00 |
| array/permutedims/4d | 61819 ns | 61480 ns | 1.01 |
| array/permutedims/2d | 55360 ns | 55970.5 ns | 0.99 |
| array/permutedims/3d | 56269 ns | 56435 ns | 1.00 |
| array/sorting/1d | 2766536 ns | 2776551 ns | 1.00 |
| array/sorting/by | 3354483 ns | 3368660 ns | 1.00 |
| array/sorting/2d | 1082927.5 ns | 1086044 ns | 1.00 |
| cuda/synchronization/stream/auto | 1025.9 ns | 1062.2 ns | 0.97 |
| cuda/synchronization/stream/nonblocking | 7550.6 ns | 6529.8 ns | 1.16 |
| cuda/synchronization/stream/blocking | 798.6632653061224 ns | 804.8736842105263 ns | 0.99 |
| cuda/synchronization/context/auto | 1182.1 ns | 1171.2 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 8394.6 ns | 6701.6 ns | 1.25 |
| cuda/synchronization/context/blocking | 904.4 ns | 921.8888888888889 ns | 0.98 |

This comment was automatically generated by a workflow using github-action-benchmark.

@maleadt (Member) commented Apr 30, 2025

Hm, it looks like the dist-upgrade of gpuci broke some things, and c1009e6 isn't sufficient as a workaround. It will have to wait until next week before I have time to investigate, though...
