chatglm-6b-int4 fails to run on CPU, while chatglm-6b runs fine. The main runtime errors are as follows:
Traceback (most recent call last):
File "C:\Users\Azure/.cache\huggingface\modules\transformers_modules\chatglm_6b_int_4\quantization.py", line 18, in <module>
from cpm_kernels.kernels.base import LazyKernelCModule, KernelFunction, round_up
File "C:\Users\Azure\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\cpm_kernels\__init__.py", line 1, in <module>
from . import library
File "C:\Users\Azure\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\cpm_kernels\library\__init__.py", line 2, in <module>
from . import cuda
File "C:\Users\Azure\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\cpm_kernels\library\cuda.py", line 7, in <module>
cuda = Lib.from_lib("cuda", ctypes.WinDLL("nvcuda.dll"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.11_3.11.1008.0_x64__qbz5n2kfra8p0\Lib\ctypes\__init__.py", line 376, in __init__
self._handle = _dlopen(self._name, mode)
^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: Could not find module 'nvcuda.dll' (or one of its dependencies). Try using the full path with constructor syntax.
Message: 'Failed to load cpm_kernels:'
Arguments: (FileNotFoundError("Could not find module 'nvcuda.dll' (or one of its dependencies). Try using the full path with constructor syntax."),)
No compiled kernel found.
Compiling kernels : C:\Users\Azure\.cache\huggingface\modules\transformers_modules\chatglm_6b_int_4\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\Azure\.cache\huggingface\modules\transformers_modules\chatglm_6b_int_4\quantization_kernels_parallel.c -shared -o C:\Users\Azure\.cache\huggingface\modules\transformers_modules\chatglm_6b_int_4\quantization_kernels_parallel.so
Kernels compiled : C:\Users\Azure\.cache\huggingface\modules\transformers_modules\chatglm_6b_int_4\quantization_kernels_parallel.so
Cannot load cpu kernel, don't use quantized model on cpu.
Cannot load cuda kernel, quantization failed.
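For reference, the nvcuda.dll failure at the top of the log can be reproduced in isolation: cpm_kernels loads the CUDA driver library via ctypes at import time, so on a machine with no NVIDIA driver the load raises FileNotFoundError. A minimal sketch of that load path (the helper name try_load_library is mine, not part of cpm_kernels):

```python
import ctypes

def try_load_library(name):
    """Attempt to load a shared library the way cpm_kernels does at import;
    return the handle, or None if the library is absent."""
    # cpm_kernels uses ctypes.WinDLL on Windows; fall back to CDLL elsewhere.
    loader = getattr(ctypes, "WinDLL", ctypes.CDLL)
    try:
        return loader(name)
    except OSError:  # FileNotFoundError on Windows is a subclass of OSError
        return None

# On a CPU-only machine this returns None instead of a handle, which is
# the condition behind "Failed to load cpm_kernels" in the log above.
handle = try_load_library("nvcuda.dll")
```

On its own this part of the log is expected on a CPU-only machine; the actual failure is the later "Cannot load cpu kernel" step.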
My file layout is as follows
Expected Behavior
Unable to run the int4 model on CPU: No compiled kernel found. Cannot load cpu kernel. Cannot load cuda kernel.
Steps To Reproduce
Download https://github.com/THUDM/ChatGLM-6B to ChatGLM-6B.
Download https://huggingface.co/THUDM/chatglm-6b-int4 to ChatGLM-6B\chatglm_6b_int_4.
Modify cli_demo.py:
...
tokenizer = AutoTokenizer.from_pretrained("chatglm_6b_int_4", trust_remote_code=True)
model = AutoModel.from_pretrained("chatglm_6b_int_4", trust_remote_code=True).float()
...
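Since the log shows the kernel compiling to a .so but then "Cannot load cpu kernel", it may help to check separately whether the compiled file in the cache is actually loadable. A small diagnostic sketch (the function name and the commented-out path are illustrative, not part of ChatGLM's API):

```python
import ctypes
import os

def check_kernel(path):
    """Report whether a compiled quantization kernel exists and can be loaded."""
    if not os.path.exists(path):
        return "missing"
    try:
        ctypes.CDLL(path)
        return "loadable"
    except OSError:
        # Exists but cannot be loaded, e.g. built by an incompatible toolchain.
        return "unloadable"

# kernel_path = os.path.expanduser(
#     r"~\.cache\huggingface\modules\transformers_modules\chatglm_6b_int_4"
#     r"\quantization_kernels_parallel.so")
# print(check_kernel(kernel_path))
```

If the file exists but is "unloadable", the gcc toolchain is a likely suspect; for Windows CPU inference of the quantized model, the ChatGLM-6B README reportedly recommends installing TDM-GCC with OpenMP support so that the compiled kernel can be loaded by ctypes.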
Environment
- OS: Windows 11 Insider Preview 25336
- Python: 3.11
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) : False
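The blank Transformers/PyTorch fields above can be filled in with a short script; a hedged sketch that only reports packages actually installed (package names in the default list are assumptions about what is relevant here):

```python
import platform
from importlib import metadata
from importlib.util import find_spec

def collect_env(packages=("transformers", "torch", "cpm-kernels")):
    """Gather version info for the issue's Environment section."""
    report = {"python": platform.python_version(), "os": platform.platform()}
    for pkg in packages:
        try:
            report[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            report[pkg] = "not installed"
    # Only query CUDA availability if torch is importable.
    if find_spec("torch") is not None:
        import torch
        report["cuda_available"] = torch.cuda.is_available()
    return report

for key, value in collect_env().items():
    print(f"{key}: {value}")
```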
Anything else?
https://www.datalearner.com/blog/1051680925189690