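[THUDM/ChatGLM-6B][BUG/Help] Fine-tuned the model following the official P-Tuning documentation, then loaded the original model together with the fine-tuned checkpoint, but got a RuntimeError mentioning BFloat16

Hello, I fine-tuned the model following the official documentation, but the following error appears after loading it: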

RuntimeError: mixed dtype (CPU): expect input to have scalar type of BFloat16

    import os
    import torch
    from transformers import AutoConfig, AutoModel

    # CHECKPOINT_PATH is the P-Tuning output directory (defined elsewhere)
    config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True, pre_seq_len=128)
    model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True)
    prefix_state_dict = torch.load(os.path.join(CHECKPOINT_PATH, "pytorch_model.bin"))
    # Keep only the prefix-encoder weights and strip the module prefix from their keys
    new_prefix_state_dict = {}
    for k, v in prefix_state_dict.items():
        if k.startswith("transformer.prefix_encoder."):
            new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
    model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

Environment

- OS:centos7
- Python:3.9
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

grep-w
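The error message in the report suggests that tensors of different dtypes met inside a single CPU operation. A minimal diagnostic sketch follows, assuming the loaded `model` exposes PyTorch's standard `named_parameters()`. The helper name `summarize_param_placement` is hypothetical and is not part of ChatGLM-6B or Transformers. It counts parameters per (dtype, device) pair, so more than one key in the result points to mixed precision or mixed placement.

```python
from collections import Counter


def summarize_param_placement(named_params):
    """Count parameters per (dtype, device) pair.

    `named_params` is any iterable of (name, tensor-like) pairs, for example
    model.named_parameters(); each value only needs .dtype and .device.
    """
    counts = Counter()
    for _name, param in named_params:
        # str() keeps the keys printable and comparable across objects
        counts[(str(param.dtype), str(param.device))] += 1
    return dict(counts)
```

It could be called as `summarize_param_placement(model.named_parameters())` right after the prefix encoder weights are loaded.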

Did you fine-tune on the CPU?

duzx16

> Did you fine-tune on the CPU?

I fine-tuned on a 16 GB V100, but without quantization.

grep-w

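I hit the same error here. I trained with the minimum-VRAM parameters from https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning. Both approaches described under "Model Deployment" produce the same error.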

fireice009

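> I hit the same error here. I trained with the minimum-VRAM parameters from https://github.com/THUDM/ChatGLM-6B/tree/main/ptuning. Both approaches described under "Model Deployment" produce the same error.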

Later I tried CPU inference and it worked. The code is as follows:

    # model = AutoModel.from_pretrained("model", config=config, trust_remote_code=True).half().cuda()
    # Load in float32 on the CPU instead of half precision on the GPU
    model = AutoModel.from_pretrained("model", config=config, trust_remote_code=True).float()
    prefix_state_dict = torch.load(os.path.join("output", "checkpoint-20000", "pytorch_model.bin"))
    new_prefix_state_dict = {}
    for k, v in prefix_state_dict.items():
        if k.startswith("transformer.prefix_encoder."):
            new_prefix_state_dict[k[len("transformer.prefix_encoder."):]] = v
    model.transformer.prefix_encoder.load_state_dict(new_prefix_state_dict)

grep-w
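The key-renaming step used when loading the checkpoint can be isolated into a small helper. This is a sketch only: `extract_prefix_encoder_state` is a hypothetical name, and the check for an empty result is an added assumption about what is useful, not something the official script does. It keeps only keys that begin with `transformer.prefix_encoder.` and fails loudly when none are found, instead of passing an empty or mangled dict to `load_state_dict`.

```python
PREFIX = "transformer.prefix_encoder."


def extract_prefix_encoder_state(state_dict, prefix=PREFIX):
    """Return only the entries under `prefix`, with the prefix stripped."""
    stripped = {
        key[len(prefix):]: value
        for key, value in state_dict.items()
        if key.startswith(prefix)
    }
    if not stripped:
        # Nothing matched: the file is probably not a P-Tuning checkpoint
        raise KeyError(f"no keys start with {prefix!r}")
    return stripped
```

The result can then be passed to `model.transformer.prefix_encoder.load_state_dict(...)` as in the snippets in this thread.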

> Later I tried CPU inference and it worked.

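CPU inference is much slower, though. With parameters trained on a 100,000-example corpus, a 10-core Xeon 8255C running at full load takes about 20 seconds per example on average.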

fireice009

Adding model.half().cuda() before model.eval() fixes it, although some of the inference output is garbled.

Mylszd

Append .half().cuda() to this line:

    model = AutoModel.from_pretrained("THUDM/chatglm-6b", config=config, trust_remote_code=True).half().cuda()

Aleluya009
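The suggestions in this thread amount to giving every weight a single dtype on a single device before inference. A minimal sketch of that choice follows, assuming `model` is a PyTorch module. `place_for_inference` is a hypothetical helper name, and the `cuda_available` flag would normally come from `torch.cuda.is_available()`.

```python
def place_for_inference(model, cuda_available):
    """Cast and move `model` so its weights share one dtype and one device."""
    if cuda_available:
        # GPU path suggested in this thread: half precision on CUDA
        model = model.half().cuda()
    else:
        # CPU path reported to work in this thread: full precision
        model = model.float()
    # Switch to evaluation mode once placement is settled
    return model.eval()
```

It would be called after the prefix encoder weights are loaded, for example `model = place_for_inference(model, torch.cuda.is_available())`. Whether this also avoids the garbled output mentioned above is not established in the thread.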
