build
python build.py \
--model_dir /AIED-data/xxx/Llama-2-70b-hf/ \
--dtype float16 \
--remove_input_padding \
--use_gpt_attention_plugin float16 \
--enable_context_fmha \
--use_gemm_plugin float16 \
--output_dir /AIED-data/xxx/trt_engines/Llama-2-70b-hf-32/ \
--world_size 8 \
--tp_size 4 \
--pp_size 2 \
--max_batch_size 32 \
--max_input_len 1024 \
--max_output_len 3072 \
--parallel_build \
--use_rmsnorm_plugin float16 \
--use_inflight_batching \
--use_fused_mlp \
--paged_kv_cache
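A side note on the parallel layout (an editorial sketch, not part of the original report): TensorRT-LLM shards the model across world_size ranks, and world_size is expected to equal tp_size × pp_size, which holds here (4 × 2 = 8); the mpirun -n 8 in the benchmark below launches one process per rank. A minimal shell check, using hypothetical variable names rather than anything read from build.py:

```bash
# Hypothetical sanity check mirroring the build flags above:
# world_size must equal tp_size * pp_size, with one MPI rank per engine shard.
TP_SIZE=4
PP_SIZE=2
WORLD_SIZE=8
if [ "$((TP_SIZE * PP_SIZE))" -ne "$WORLD_SIZE" ]; then
    echo "world_size ($WORLD_SIZE) must equal tp_size * pp_size ($((TP_SIZE * PP_SIZE)))" >&2
    exit 1
fi
```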
benchmark
in_out_sizes=("1:1024:3072" "2:1024:3072" "4:1024:3072" "8:1024:3072" "16:1024:3072" "32:1024:3072")
for in_out in "${in_out_sizes[@]}"
do
    # Each entry is batch_size : input_length : output_length.
    batch_size=$(echo $in_out | awk -F':' '{ print $1 }')
    in_out_dims=$(echo $in_out | awk -F':' '{ print $2","$3 }')
    echo "BS: $batch_size, ISL/OSL: $in_out_dims"

    mpirun -n 8 --allow-run-as-root --oversubscribe \
        ./cpp/build/benchmarks/gptSessionBenchmark \
        --model llama \
        --engine_dir /AIED-data/xxx/trt_engines/Llama-2-70b-hf-32 \
        --warm_up 1 \
        --batch_size $batch_size \
        --duration 0 \
        --num_runs 5 \
        --input_output_len $in_out_dims
done
error log
[1702983381.525127] [AI-99-141-release:95101:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525298] [AI-99-141-release:95102:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525336] [AI-99-141-release:95098:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525344] [AI-99-141-release:95097:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525349] [AI-99-141-release:95095:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525352] [AI-99-141-release:95100:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525353] [AI-99-141-release:95099:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702983381.525355] [AI-99-141-release:95096:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
Benchmarking done. Iteration: 5, duration: 694.92 sec.
[BENCHMARK] batch_size 1 input_length 1024 output_length 3072 latency(ms) 138983.22 tokensPerSec 22.10
BS: 2, ISL/OSL: 1024,3072
[1702984696.791064] [AI-99-141-release:13206:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.791721] [AI-99-141-release:13205:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.791813] [AI-99-141-release:13209:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.792377] [AI-99-141-release:13204:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.792585] [AI-99-141-release:13208:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.792633] [AI-99-141-release:13203:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.792756] [AI-99-141-release:13210:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
[1702984696.792766] [AI-99-141-release:13207:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
  what():  [TensorRT-LLM][ERROR] CUDA runtime error in cublasLtMatmul(getCublasLtHandle(), mOperationDesc, alpha, A, mADesc, B, mBDesc, beta, C, mCDesc, C, mCDesc, (hasAlgo ? (&algo) : NULL), mCublasWorkspace, workspaceSize, mStream): CUBLAS_STATUS_EXECUTION_FAILED (/code/tensorrt_llm/cpp/tensorrt_llm/common/cublasMMWrapper.cpp:140)
1  0x7f5e902009ce /code/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.9(+0xac9ce) [0x7f5e902009ce]
2  0x7f5e90254dc6 /code/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.9(+0x100dc6) [0x7f5e90254dc6]
3  0x7f5e9025519b /code/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.9(+0x10119b) [0x7f5e9025519b]
4  0x7f5e902262d1 /code/tensorrt_llm/cpp/build/tensorrt_llm/plugins/libnvinfer_plugin_tensorrt_llm.so.9(+0xd22d1) [0x7f5e902262d1]
5  0x7f5e90226bba tensorrt_llm::plugins::GemmPlugin::enqueue(nvinfer1::PluginTensorDesc const*, nvinfer1::PluginTensorDesc const*, void const* const*, void* const*, void*, CUstream_st*) + 266
6  0x7f5e46d3cba9 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10cdba9) [0x7f5e46d3cba9]
7  0x7f5e46d126af /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10a36af) [0x7f5e46d126af]
8  0x7f5e46d14320 /usr/local/tensorrt/lib/libnvinfer.so.9(+0x10a5320) [0x7f5e46d14320]
9  0x7f5ed5ee787f tensorrt_llm::runtime::GptSession::executeGenerationStep(int, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<int, std::allocator<int> > const&, tensorrt_llm::batch_manager::kv_cache_manager::KVCacheManager*, std::vector<bool, std::allocator<bool> >&) + 1903
10 0x7f5ed5ee912e tensorrt_llm::runtime::GptSession::generateBatched(std::vector<tensorrt_llm::runtime::GenerationOutput, std::allocator<tensorrt_llm::runtime::GenerationOutput> >&, std::vector<tensorrt_llm::runtime::GenerationInput, std::allocator<tensorrt_llm::runtime::GenerationInput> > const&, tensorrt_llm::runtime::SamplingConfig const&, std::function<void (int, bool)> const&) + 3070
11 0x7f5ed5eeb18b tensorrt_llm::runtime::GptSession::generate(tensorrt_llm::runtime::GenerationOutput&, tensorrt_llm::runtime::GenerationInput const&, tensorrt_llm::runtime::SamplingConfig const&) + 7003
12 0x5556de223dff ./cpp/build/benchmarks/gptSessionBenchmark(+0x19dff) [0x5556de223dff]
13 0x7f5e8fcfad90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f5e8fcfad90]
14 0x7f5e8fcfae40 __libc_start_main + 128
15 0x5556de225ef5 ./cpp/build/benchmarks/gptSessionBenchmark(+0x1bef5) [0x5556de225ef5]
byshiue replied:

From the error

[1702984696.792585] [AI-99-141-release:13208:f] vfs_fuse.c:281 UCX ERROR inotify_add_watch(/tmp) failed: No space left on device

it looks like an issue with your device. Could you try on another device?
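As an editorial sketch (not from the original thread): on Linux, inotify_add_watch failing with "No space left on device" (ENOSPC) usually means either that the per-user inotify watch or instance limit has been exhausted, or that the filesystem holding the watched path is genuinely full. The commands below are standard coreutils/sysctl invocations for checking both on the affected node; the paths and suggested limit value are only examples:

```bash
# Check whether /tmp is actually out of space or out of inodes.
df -h /tmp
df -i /tmp

# Check the kernel inotify limits; ENOSPC from inotify_add_watch often means
# these limits are exhausted rather than the disk being full.
sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances

# If the limits are the problem, raising them (as root) is one option, e.g.:
# sysctl -w fs.inotify.max_user_watches=524288
```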