
[BUG] Error was raised when importing model in v1.0.x #40

Open
sdjksdafji opened this issue Jan 17, 2022 · 12 comments
Labels
bug Something isn't working

Comments

sdjksdafji commented Jan 17, 2022

Describe the bug
A CUDA error is raised when importing models. This issue only happens with BMInf 1.0.x; BMInf 0.0.5 runs successfully for me. Any help would be appreciated. Thanks.

Minimal steps to reproduce
I tried the following on both WSL2 Ubuntu 20.04 with a GTX 3080 (16 GB) and native Ubuntu 18.04 with a GTX 1070 (8 GB):

conda create --name bminfnew python=3.8
conda activate bminfnew
conda install cudatoolkit=11.3
pip install bminf==1.0.1

Then run

import bminf
cpm2 = bminf.models.CPM2()

Expected behavior
Start downloading the model.

Screenshots

Python 3.8.12 (default, Oct 12 2021, 13:49:34) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bminf
>>> cpm2 = bminf.models.CPM2()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/models/cpm2.py", line 55, in __init__
    SizeLimitedAllocator( self._cudaAlloc.allocate( dynamic_memory ))
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/core/allocators/cuda.py", line 20, in allocate
    ptr = cudart.cudaMalloc(nbytes).value
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/base.py", line 94, in wrapper
    return f(*args, **kwargs)
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/cudart.py", line 375, in cudaMalloc
    checkCUDAStatus(cuda.cudaMalloc(ctypes.byref(ptr), size))
  File "/home/mira/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/cudart.py", line 327, in checkCUDAStatus
    raise RuntimeError("CUDA Runtime Error: %s" % cudaGetErrorString(error))
RuntimeError: CUDA Runtime Error: out of memory

Environment:
Tried with various CUDA versions, including 10.2, 11.0, and 11.3.

a710128 added the bug label Jan 17, 2022
a710128 (Collaborator) commented Jan 17, 2022

BMInf will request 512 MB of memory before loading the model. From your screenshot, it seems that the error happens there. I'm going to spend some time trying to reproduce this error.
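To check whether that bare 512 MB allocation succeeds on an affected machine independently of BMInf, here is a minimal standalone sketch. The `probe_cuda_malloc` helper is hypothetical (not part of BMInf or cpm_kernels) and assumes `libcudart` is discoverable by the dynamic loader:

```python
import ctypes
import ctypes.util

def probe_cuda_malloc(nbytes):
    """Return True/False for whether a raw cudaMalloc of `nbytes` succeeds,
    or None if the CUDA runtime library cannot be loaded at all."""
    libname = ctypes.util.find_library("cudart")
    if libname is None:
        return None
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None
    # cudaError_t cudaMalloc(void **devPtr, size_t size); 0 == cudaSuccess
    lib.cudaMalloc.argtypes = [ctypes.POINTER(ctypes.c_void_p), ctypes.c_size_t]
    ptr = ctypes.c_void_p()
    if lib.cudaMalloc(ctypes.byref(ptr), nbytes) != 0:
        return False
    lib.cudaFree(ptr)
    return True

# BMInf's up-front dynamic-memory reservation is 512 MB.
print(probe_cuda_malloc(512 * 1024 * 1024))
```

If this prints False, the failure is reproducible outside of BMInf and points at the environment rather than the library.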

sdjksdafji (Author) commented:

@a710128 Thanks for the quick response. Please keep me updated.

a710128 (Collaborator) commented Jan 19, 2022

@sdjksdafji

I ran the examples with my GTX 1070 on Windows and everything turned out fine. Could the conda environment be causing some effect?
Also, have you tried running generate_cpm1.py?

sdjksdafji (Author) commented:

@a710128 I tried to import all 3 models. Surprisingly, CPM1 is fine. It started downloading after model = bminf.models.CPM1(). However, CPM2 and EVA reported the same CUDA OOM error.

Seems like your env is Windows. Could you try it under Linux?

a710128 (Collaborator) commented Jan 19, 2022

> @a710128 I tried to import all 3 models. Surprisingly, CPM1 is fine. It started downloading after model = bminf.models.CPM1(). However, CPM2 and EVA reported the same CUDA OOM error.
>
> Seems like your env is Windows. Could you try it under Linux?

I've tested it under Ubuntu 20.04 using a 1080 Ti, a 2080 Ti, and a V100, and it works fine.

sdjksdafji (Author) commented:

@a710128 Could you share your installation script and CUDA version?

a710128 (Collaborator) commented Jan 19, 2022

I'm confused: importing CPM1 and importing CPM2 run almost the same code, yet importing CPM2 errors at line 55:

SizeLimitedAllocator( self._cudaAlloc.allocate( dynamic_memory ))

CPM1

class CPM1:
    def __init__(self,
            device_idx : Optional[int] = None,
            dynamic_memory : int = 512 * 1024 * 1024,
            memory_limit : Optional[int] = None,
            version : Optional[str] = None
        ) -> None:
        if version is None:
            version = LATEST_VERSION
        if version not in SUPPORTED_VERSION and not version.startswith("file://"):
            raise RuntimeError("CPM1 version %s is not supported (requires %s)" % (version, SUPPORTED_VERSION))
        config = CPM1Configuration()
        config.MODEL_NAME = version
        if device_idx is None:
            device_idx = cudart.cudaGetDevice()
        config.DEVICE = device_idx
        config.MEMORY_LIMIT = memory_limit
        self.device = Device(config.DEVICE)
        self._cudaAlloc = CUDAAllocator(config.DEVICE)
        self._ctx = Context([config.DEVICE], [
            SizeLimitedAllocator(self._cudaAlloc.allocate(dynamic_memory))
        ])

CPM2

class CPM2:
    def __init__(self,
            device_idx : Optional[int] = None,
            dynamic_memory : int = 512 * 1024 * 1024, # 512MB
            memory_limit : Optional[int] = None,
            version : Optional[str] = None
        ) -> None:
        if version is None:
            version = LATEST_VERSION
        if version not in SUPPORTED_VERSION and not version.startswith("file://"):
            raise RuntimeError("CPM2 version %s is not supported (requires %s)" % (version, SUPPORTED_VERSION))
        config = CPM2Configuration()
        config.MODEL_NAME = version
        if device_idx is None:
            device_idx = cudart.cudaGetDevice()
        config.DEVICE = device_idx
        config.MEMORY_LIMIT = memory_limit
        self.device = Device(config.DEVICE)
        self._cudaAlloc = CUDAAllocator(config.DEVICE)
        self._ctx = Context([config.DEVICE], [
            SizeLimitedAllocator( self._cudaAlloc.allocate( dynamic_memory ))
        ])

a710128 (Collaborator) commented Jan 19, 2022

> @a710128 Could you share your installation script and CUDA version?

pip install bminf
CUDA 11.1

sdjksdafji (Author) commented:

@a710128 Actually, the previous logs don't match my latest runs. The error comes from the T5 model file, during the first init of a pinned decoder layer:
code: self.dec_layers[i].init_data(pinned=True)
This would explain why CPM1 works fine but EVA and CPM2 do not.

Here is my script:

import bminf
from cpm_kernels.library import cudart

print(bminf.__version__)

print(cudart.cudaGetDeviceCount())
print(cudart.cudaRuntimeGetVersion())
print(cudart.cudaDriverGetVersion())

cpm2 = bminf.models.CPM2()

And the output is:

1.0.1
1
10020
11060
Traceback (most recent call last):
  File "/home/sdjksdafji/repo/mira/bminf-backend/debug.py", line 10, in <module>
    cpm2 = bminf.models.CPM2()
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/models/cpm2.py", line 57, in __init__
    self._model = T5Model(config)
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/arch/t5/model.py", line 107, in __init__
    self.dec_layers[i].init_data(pinned=True)
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/core/layer.py", line 158, in init_data
    ptr = cudart.cudaMallocHost(self.nbytes)
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/base.py", line 94, in wrapper
    return f(*args, **kwargs)
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/cudart.py", line 385, in cudaMallocHost
    checkCUDAStatus(cuda.cudaMallocHost(ctypes.byref(ptr), size))
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/cudart.py", line 327, in checkCUDAStatus
    raise RuntimeError("CUDA Runtime Error: %s" % cudaGetErrorString(error))
RuntimeError: CUDA Runtime Error: out of memory

Process finished with exit code 1

a710128 added a commit that referenced this issue Jan 22, 2022

a710128 mentioned this issue Jan 25, 2022
a710128 (Collaborator) commented Jan 25, 2022

@sdjksdafji Try BMInf 1.0.2

sdjksdafji (Author) commented:

@a710128 Thanks for the fix. I tried 1.0.2. The import works fine for me but the inference does not. Here is the latest error:

>>> import bminf

>>> cpm2 = bminf.models.CPM2()
Downloading cpm2.1-new/checkpoint.pt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11.3G/11.3G [32:51<00:00, 5.73MiB/s]
Downloading cpm2.1-new/vocab.txt: 160kiB [00:00, 2.72MiB/s]
>>> text = "北京环球度假区相关负责人介绍,北京环球影城指定单日门票将采用<span>制度,即推出淡季日、平季日、旺季日和特定日门票。<span>价格为418元,<span>价格为528元,<span>价格为638元,<span>价格为<span>元。北京环球度假区将提供90天滚动价格日历,以方便游客提前规划行程。"
>>> for result in cpm2.fill_blank(text, 
...     top_p=1.0,
...     top_n=5, 
...     temperature=0.5,
...     frequency_penalty=0,
...     presence_penalty=0
... ):
...     value = result["text"]
...     text = text.replace("<span>", "\033[0;32m" + value + "\033[0m", 1)
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/models/cpm2.py", line 245, in fill_blank
    for token in res:
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/models/cpm2.py", line 135, in _gen_iter
    self._model.encode(
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/arch/t5/model.py", line 195, in encode
    layer.forward(
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/layers/transformer_block.py", line 34, in forward
    self.self_attn.forward(ctx, x_mid, x_mid, mask, position_bias, x_mid)
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/layers/attention.py", line 42, in forward
    self.project_q.forward(ctx, hidden_q, h_q)
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/bminf/layers/linear.py", line 43, in forward
    ck.gemm_int8(
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/kernels/gemm.py", line 172, in gemm_int8
    cublaslt.cublasLtMatmul(
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/base.py", line 94, in wrapper
    return f(*args, **kwargs)
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/cublaslt.py", line 137, in cublasLtMatmul
    checkCublasStatus(cublasLt.cublasLtMatmul(
  File "/home/sdjksdafji/miniconda3/envs/bminfnew/lib/python3.8/site-packages/cpm_kernels/library/cublaslt.py", line 98, in checkCublasStatus
    raise RuntimeError("CUBLAS error: {}".format(
RuntimeError: CUBLAS error: CUBLAS_STATUS_EXECUTION_FAILED

BTW, I have some questions regarding the fix. Seems like the actual fix here is to use the non-CUDA pinned numpy array if the CUDA malloc operation fails. Even if it worked, would it affect the inference performance? I assume now the computation happens on the CPU instead of the GPU, right? Instead of a fix, this sounds to me like a workaround that sacrifices performance. Shall we try to figure out the root cause of the failed CUDA malloc? My 3080 has 16 GB of GPU memory, so the OOM error definitely does not make sense.

a710128 (Collaborator) commented Feb 9, 2022

> Seems like the actual fix here is to use the non-CUDA pinned numpy array if the CUDA malloc operation fails. Even if it worked, would it affect the inference performance? I assume now the computation happens on the CPU instead of the GPU, right?

Even if a non-CUDA pinned numpy array is used, the computation still happens on the GPU. The difference is that non-pinned memory spends more time transferring data from the CPU to the GPU.

> Shall we try to figure out the root cause of the failed CUDA malloc?

I think the root cause of the failed memory requests is some system limitation: some operating systems limit the total size of pinned memory.
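On Linux, one such limit worth inspecting is RLIMIT_MEMLOCK ("max locked memory", the same value shown by `ulimit -l`). This is a generic standard-library sketch, not BMInf code, and whether the CUDA driver's pinned allocations are actually accounted against this limit is platform-dependent (WSL2 in particular has its own pinned-memory constraints):

```python
import resource

# Pinned (page-locked) host memory can be capped by the OS. On Linux the
# relevant knob is RLIMIT_MEMLOCK; cudaMallocHost may fail with
# "out of memory" even when plenty of GPU and host RAM is free if the
# driver's pinning runs into such a cap.
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

def fmt(limit):
    # RLIM_INFINITY means "no limit"
    return "unlimited" if limit == resource.RLIM_INFINITY else "%d bytes" % limit

print("locked-memory limit: soft=%s hard=%s" % (fmt(soft), fmt(hard)))
```

A small soft limit here would be consistent with cudaMallocHost failing while plain cudaMalloc succeeds.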
