CUDA runtime error in cublasLtMatmul, CUBLAS_STATUS_EXECUTION_FAILED #700

WhiteDoveBuct · 2023-12-19T11:49:46Z

build

python build.py \
--model_dir /AIED-data/xxx/Llama-2-70b-hf/ \
--dtype float16 \
--remove_input_padding \
--use_gpt_attention_plugin float16 \
--enable_context_fmha \
--use_gemm_plugin float16 \
--output_dir /AIED-data/xxx/trt_engines/Llama-2-70b-hf-32/ \
--world_size 8 \
--tp_size 4 \
--pp_size 2 \
--max_batch_size 32 \
--max_input_len 1024 \
--max_output_len 3072 \
--parallel_build \
--use_rmsnorm_plugin float16 \
--use_inflight_batching \
--use_fused_mlp \
--paged_kv_cache
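The parallelism and length flags above have to agree with each other and with the `mpirun -n 8` launch used in the benchmark. Below is a minimal sketch of that consistency check. It makes three assumptions: world_size must equal tp_size × pp_size, mpirun must start exactly world_size ranks, and Llama-2's context window is 4096 tokens. The helper name `check_parallel_config` is hypothetical and is not part of TensorRT-LLM.

```python
# Hypothetical sanity check for the build.py / mpirun settings in this issue.
# Assumptions: world_size == tp_size * pp_size, mpirun launches world_size
# ranks, and Llama-2 has a 4096-token context window.

LLAMA2_CONTEXT_LEN = 4096  # max_position_embeddings in the Llama-2 configs


def check_parallel_config(world_size: int, tp_size: int, pp_size: int,
                          mpi_ranks: int, max_input_len: int,
                          max_output_len: int) -> list[str]:
    """Return a list of problems; an empty list means the settings agree."""
    problems = []
    if tp_size * pp_size != world_size:
        problems.append(
            f"tp_size * pp_size = {tp_size * pp_size}, but world_size = {world_size}")
    if mpi_ranks != world_size:
        problems.append(
            f"mpirun -n {mpi_ranks} does not match world_size = {world_size}")
    total_len = max_input_len + max_output_len
    if total_len > LLAMA2_CONTEXT_LEN:
        problems.append(
            f"max_input_len + max_output_len = {total_len} exceeds "
            f"the {LLAMA2_CONTEXT_LEN}-token context")
    return problems


if __name__ == "__main__":
    # Values copied from the build and benchmark commands in this issue.
    issues = check_parallel_config(world_size=8, tp_size=4, pp_size=2,
                                   mpi_ranks=8, max_input_len=1024,
                                   max_output_len=3072)
    print(issues or "consistent")
```

With the values from this issue, the function returns an empty list. That is because 4 × 2 = 8 ranks, and 1024 + 3072 = 4096 tokens fills the assumed context exactly.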

benchmark

# Each entry is batch_size:input_len:output_len; bash array elements are space-separated (a trailing comma becomes part of the element)
in_out_sizes=("1:1024:3072" "2:1024:3072" "4:1024:3072" "8:1024:3072" "16:1024:3072" "32:1024:3072")
for in_out in "${in_out_sizes[@]}"
do
    batch_size=$(echo "$in_out" | awk -F':' '{ print $1 }')
    # --input_output_len takes "input_len,output_len", e.g. 1024,3072
    in_out_dims=$(echo "$in_out" | awk -F':' '{ print $2","$3 }')
    echo "BS: $batch_size, ISL/OSL: $in_out_dims"

    # -n must match the engine's world_size (8)
    mpirun -n 8 --allow-run-as-root --oversubscribe \
        ./cpp/build/benchmarks/gptSessionBenchmark \
        --model llama \
        --engine_dir /AIED-data/xxx/trt_engines/Llama-2-70b-hf-32 \
        --warm_up 1 \
        --batch_size "$batch_size" \
        --duration 0 \
        --num_runs 5 \
        --input_output_len "$in_out_dims"

done
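The loop depends on splitting each `batch:isl:osl` entry on `:`. Here is a small Python sketch of the same parsing, for illustration only. The function name `split_in_out` is hypothetical. It shows why the array entries need to be space-separated: writing `"8:1024:3072",` in a bash array keeps the comma inside the element, so the comma leaks into the output-length field.

```python
def split_in_out(entry: str) -> tuple[int, str]:
    """Mimic the awk calls: field 1 is the batch size, and fields 2 and 3
    form the "input_len,output_len" string echoed as ISL/OSL in the log."""
    fields = entry.split(":")
    batch_size = int(fields[0])
    in_out_dims = f"{fields[1]},{fields[2]}"
    return batch_size, in_out_dims


if __name__ == "__main__":
    # A clean, space-separated array element.
    print(split_in_out("8:1024:3072"))   # (8, '1024,3072')
    # The element bash produces from `"8:1024:3072",`: the comma sticks.
    print(split_in_out("8:1024:3072,"))  # (8, '1024,3072,')
```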

error log

[1702983381.525127] [AI-99-141-release:95101:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525298] [AI-99-141-release:95102:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525336] [AI-99-141-release:95098:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525344] [AI-99-141-release:95097:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525349] [AI-99-141-release:95095:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525352] [AI-99-141-release:95100:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525353] [AI-99-141-release:95099:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525355] [AI-99-141-release:95096:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
[BENCHMARK] batch_size 1 input_length 1024 output_length 3072 latency(ms) 138983.22 tokensPerSec 22.10
BS: 2, ISL/OSL: 1024,3072
[1702984696.791064] [AI-99-141-release:13206:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.791721] [AI-99-141-release:13205:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.791813] [AI-99-141-release:13209:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.792377] [AI-99-141-release:13204:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.792585] [AI-99-141-release:13208:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.792633] [AI-99-141-release:13203:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.792756] [AI-99-141-release:13210:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.792766] [AI-99-141-release:13207:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
what(): [TensorRT-LLM][ERROR] CUDA runtime error in cublasLtMatmul(getCublasLtHandle(), mOperationDesc, alpha, A, mADesc, B, mBDesc, beta, C, mCDesc, C, mCDesc, (hasAlgo ? (&algo) : NULL), mCublasWorkspace, workspaceSize, mStream): CUBLAS_STATUS_EXECUTION_FAILED (/code/tensorrt_llm/cpp/tensorrt_llm/common/cublasMMWrapper.cpp:140)
1 0x7f5e902009ce /code/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.9(+0xac9ce) [0x7f5e902009ce]
2 0x7f5e90254dc6 /code/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.9(+0x100dc6) [0x7f5e90254dc6]
3 0x7f5e9025519b /code/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.9(+0x10119b) [0x7f5e9025519b]
4 0x7f5e902262d1 /code/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.9(+0xd22d1) [0x7f5e902262d1]
5 0x7f5e90226bba tensorrt_llm::plugins::GemmPlugin::enqueue(nvinfer1::PluginTensorDesc const*, nvinfer1::PluginTensorDesc const*, void const* const*, void* const*, void*, CUstream_st*) + 266
6 0x7f5e46d3cba9 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10cdba9) [0x7f5e46d3cba9]
7 0x7f5e46d126af /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10a36af) [0x7f5e46d126af]
8 0x7f5e46d14320 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10a5320) [0x7f5e46d14320]
9 0x7f5ed5ee787f tensorrt_llm::runtime::GptSession::executeGenerationStep(int, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<int, std::allocator<int> > const&, tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager*, std::vector<bool, std::allocator<bool> >&) + 1903
10 0x7f5ed5ee912e tensorrt_llm::runtime::GptSession::generateBatched(std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, tensorrt_llm::runtime::SamplingConfig const&, std::function<void (int, bool)> const&) + 3070
11 0x7f5ed5eeb18b tensorrt_llm::runtime::GptSession::generate(tensorrt_llm::runtime::GenerationOutput&, tensorrt_llm::runtime::GenerationInput const&, tensorrt_llm::runtime::SamplingConfig const&) + 7003
12 0x5556de223dff ./cpp/build/benchmarks/gptSessionBenchmark(+0x19dff) [0x5556de223dff]
13 0x7f5e8fcfad90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f5e8fcfad90]
14 0x7f5e8fcfae40 __libc_start_main + 128
15 0x5556de225ef5 ./cpp/build/benchmarks/gptSessionBenchmark(+0x1bef5) [0x5556de225ef5]
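The batch-size-1 result that completed before the crash is internally consistent. The sketch below reproduces the logged numbers under two assumptions that are not taken from the gptSessionBenchmark source. First, latency is the average wall time per run. Second, tokensPerSec is batch_size × output_length / latency. Both function names are hypothetical.

```python
def per_run_latency_ms(total_duration_s: float, num_runs: int) -> float:
    """Average wall time of one run, in milliseconds."""
    return total_duration_s / num_runs * 1000.0


def tokens_per_sec(batch_size: int, output_len: int, latency_ms: float) -> float:
    """Generated tokens per second for one run of the whole batch."""
    return batch_size * output_len / (latency_ms / 1000.0)


if __name__ == "__main__":
    # Figures from the BS=1 log: 5 runs in 694.92 s, latency(ms) 138983.22.
    print(round(per_run_latency_ms(694.92, 5)))           # 138984
    print(round(tokens_per_sec(1, 3072, 138983.22), 2))   # 22.1
```

The per-run figure of about 138984 ms agrees with the logged 138983.22 ms, within the rounding of the 694.92 s total. Dividing 3072 generated tokens by about 139 s gives the reported 22.10 tokens/s.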


byshiue · 2023-12-25T08:55:07Z

From the error

[1702984696.792585] [AI-99-141-release:13208:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device

it looks like an issue with your device. Could you try running on another device?
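Below is one way to inspect the device before retrying. The inotify_add_watch(2) man page says that ENOSPC ("No space left on device") from that call means one of two things: the user's limit on inotify watches was reached, or the kernel could not allocate a needed resource. The sketch therefore reports two values: the free space in the temp directory and the per-user watch limit in /proc/sys/fs/inotify/max_user_watches. It assumes a Linux host. The helper names `read_proc_int` and `tmp_diagnostics` are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional


def read_proc_int(path: str) -> Optional[int]:
    """Read an integer from a /proc file; None if it is missing (non-Linux)."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def tmp_diagnostics(tmp_dir: str = "/tmp") -> dict:
    """Report free space in tmp_dir and the per-user inotify watch limit."""
    usage = shutil.disk_usage(tmp_dir)
    return {
        "tmp_free_gib": usage.free / 2**30,
        # Per-user cap on inotify watches; hitting it makes
        # inotify_add_watch fail with ENOSPC.
        "max_user_watches": read_proc_int("/proc/sys/fs/inotify/max_user_watches"),
    }


if __name__ == "__main__":
    for key, value in tmp_diagnostics().items():
        print(f"{key}: {value}")
```

If /tmp is nearly full, disk space is the likely cause. If /tmp has plenty of free space but max_user_watches is small, the watch limit is the more likely cause.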

WhiteDoveBuct · 2023-12-25T08:55:26Z (via email)

Hello, I will get back to you as soon as possible! Thank you!

WhiteDoveBuct · 2024-11-18T03:27:07Z (via email)

Hello, I will get back to you as soon as possible! Thank you!

byshiue self-assigned this Dec 25, 2023

byshiue added the "triaged" label (Issue has been triaged by maintainers) Dec 25, 2023

hello-11 closed this as completed Nov 18, 2024
