Set CUDA_VISIBLE_DEVICES in the script to the GPU indices you want to use (e.g. 0,1,2,3).
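For example, a minimal sketch, assuming the variable is set from Python rather than exported in the shell (either works, as long as it happens before torch initializes CUDA):

```python
import os

# Must be set before torch initializes CUDA, otherwise it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"

import torch

# Only the listed GPUs are visible, and they are renumbered from 0,
# so cuda:0 inside this process maps to the first index in the list.
print(torch.cuda.device_count())  # -> 4
```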
That works, thanks!
One small puzzle, though: it seems GPU 4 can't be included, and no more than four GPUs can be specified; triggering either condition raises ValueError: 130004 is not in list. In the end, specifying 1,2,3,5 worked fine. (See the sketch after the traceback below for how the failing lookup arises.)
Full traceback:
ValueError: Caught ValueError in replica 4 on device 4.
Original Traceback (most recent call last):
  File "/root/anaconda3/envs/chatGLM/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 64, in _worker
    output = module(*input, **kwargs)
  File "/root/anaconda3/envs/chatGLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/modeling_chatglm.py", line 1190, in forward
    transformer_outputs = self.transformer(
  File "/root/anaconda3/envs/chatGLM/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/modeling_chatglm.py", line 936, in forward
    attention_mask = self.get_masks(
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/modeling_chatglm.py", line 682, in get_masks
    context_lengths = [seq.tolist().index(self.config.bos_token_id) for seq in input_ids]
  File "/root/.cache/huggingface/modules/transformers_modules/chatglm-6b/modeling_chatglm.py", line 682, in <listcomp>
    context_lengths = [seq.tolist().index(self.config.bos_token_id) for seq in input_ids]
ValueError: 130004 is not in list
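For context, a minimal sketch of how this ValueError arises (the helper name and example tensors are made up for illustration; only the failing list comprehension is taken from modeling_chatglm.py's get_masks). One plausible cause is that nn.DataParallel splits the batch across replicas, and a replica ends up with a sequence slice that does not contain the BOS token, so list.index() raises:

```python
import torch

BOS_TOKEN_ID = 130004  # ChatGLM-6B's bos_token_id, per the traceback above

def context_lengths(input_ids: torch.Tensor) -> list:
    # The failing line from get_masks: it assumes every sequence
    # in the batch contains the BOS token.
    return [seq.tolist().index(BOS_TOKEN_ID) for seq in input_ids]

# A sequence that contains BOS works:
print(context_lengths(torch.tensor([[5, 6, BOS_TOKEN_ID, 7]])))  # -> [2]

# A sequence without BOS reproduces the error, e.g. if a replica
# receives a slice whose sequences lack the BOS token:
try:
    context_lengths(torch.tensor([[5, 6, 7, 8]]))
except ValueError as e:
    print(e)  # -> 130004 is not in list
```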