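[THUDM/ChatGLM-6B][BUG/Help] Why doesn't the README include a tutorial for multi-GPU P-tuning? Even one sentence explaining what to expect would help. This problem has been confusing a lot of people.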

You'll have to figure this one out yourself.

cywjava

It's a new project, so the docs aren't complete yet.

bookug

+1. I can only find multi-GPU deployment in the README, nothing on parallel training.

ztfmars

PRE_SEQ_LEN=128
LR=2e-2
MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --include localhost:2,3,4,5 \
    --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file AdvertiseGen/train.json \
    --test_file AdvertiseGen/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path /home/cser/hugo/ChatGLM-6B/chatglm-6b \
    --output_dir ./output/adgen-chatglm-6b-ft-$LR \
    --overwrite_output_dir \
    --max_source_length 64 \
    --max_target_length 64 \
    --per_device_train_batch_size 32 \
    --per_device_eval_batch_size 32 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --num_train_epochs 10 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
    --fp16
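For orientation, here is a rough sketch of how the batch-related flags in this command combine, assuming the usual data-parallel setup in which each GPU named in `--include` runs one process that takes `--per_device_train_batch_size` samples per micro-step. `parse_include` and `global_batch_size` are hypothetical helpers for illustration and are not part of main.py or DeepSpeed.

```python
from __future__ import annotations


def parse_include(include: str) -> list[int]:
    """Turn a single-host spec like 'localhost:2,3,4,5' into GPU ids.

    Hypothetical helper: it only handles the one-host 'host:id,id,...'
    form, not multi-host specs.
    """
    _host, _, ids = include.partition(":")
    return [int(i) for i in ids.split(",") if i]


def global_batch_size(per_device: int, num_processes: int, grad_accum: int) -> int:
    # Each data-parallel process consumes `per_device` samples per micro-step;
    # gradients are averaged across processes and accumulated over
    # `grad_accum` micro-steps before a single optimizer update.
    return per_device * num_processes * grad_accum


gpus = parse_include("localhost:2,3,4,5")
print(gpus)                                  # [2, 3, 4, 5]
print(global_batch_size(32, len(gpus), 1))   # 128
```

So with four GPUs at 32 samples each, one optimizer step sees 128 samples. On fewer GPUs, raising `--gradient_accumulation_steps` is the usual way to keep the same effective batch (for example 32 × 1 GPU × 4 steps is also 128).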

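It should work. This is how I run it on four GPUs. The key is passing --pre_seq_len so that main.py takes the P-tuning branch. See this part of main.py: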

if model_args.pre_seq_len is not None:
    # P-tuning v2: cast the model to fp16, but keep the prefix encoder in fp32
    model = model.half()
    model.transformer.prefix_encoder.float()
else:
    # Full fine-tuning: keep all weights in fp32
    model = model.float()
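A back-of-the-envelope sketch of why this branch matters for memory. It assumes ChatGLM-6B's config values (28 layers, hidden size 4096) and that, with `prefix_projection=False`, the prefix encoder is a single embedding table of shape `(pre_seq_len, num_layers * 2 * hidden_size)`, i.e. one key and one value vector per layer per prefix position. These are assumptions about the model code, so check them against your copy of `modeling_chatglm.py`; `precision_plan` and `prefix_encoder_params` are hypothetical helpers.

```python
from __future__ import annotations

# Assumed ChatGLM-6B config values; verify against the checkpoint's config.json.
NUM_LAYERS = 28
HIDDEN_SIZE = 4096


def precision_plan(pre_seq_len: int | None) -> dict[str, str]:
    """Mirror the dtype choice made by the main.py branch above."""
    if pre_seq_len is not None:
        # P-tuning v2: whole model cast to fp16, prefix encoder back to fp32.
        return {"model": "fp16", "prefix_encoder": "fp32"}
    # Full fine-tuning: everything stays in fp32.
    return {"model": "fp32"}


def prefix_encoder_params(pre_seq_len: int,
                          num_layers: int = NUM_LAYERS,
                          hidden_size: int = HIDDEN_SIZE) -> int:
    # Assumes prefix_projection=False: one embedding row per prefix position,
    # holding a key and a value vector (2 * hidden_size) for every layer.
    return pre_seq_len * num_layers * 2 * hidden_size


print(precision_plan(128))         # {'model': 'fp16', 'prefix_encoder': 'fp32'}
print(prefix_encoder_params(128))  # 29360128
```

Under these assumptions, PRE_SEQ_LEN=128 gives about 29M parameters in the prefix encoder (exactly 112 MiB in fp32), while the remaining ~6B weights, assumed frozen in this mode, sit in fp16.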

HuuY

I tried your method and got the error shown above. Why is that?

MathamPollard

You didn't put a \ at the end of each argument line, did you?

HuuY

GitHub automatically stripped the line-continuation backslashes when I pasted the command. Add them back yourself.
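One way to sidestep lost line continuations entirely is to keep the arguments as a list and render them as a single, correctly quoted command line. This is a minimal sketch using the standard library's `shlex.join` (Python 3.8+). The flags are copied from the command earlier in this thread; `build_command` is a hypothetical helper, and a fixed port stands in for the random `shuf` port.

```python
from __future__ import annotations

import shlex


def build_command(gpus: str, master_port: int, pre_seq_len: int, lr: str,
                  model_path: str) -> list[str]:
    """Hypothetical helper returning the thread's deepspeed command as argv."""
    return [
        "deepspeed", "--include", f"localhost:{gpus}",
        "--master_port", str(master_port), "main.py",
        "--deepspeed", "deepspeed.json",
        "--do_train",
        "--train_file", "AdvertiseGen/train.json",
        "--test_file", "AdvertiseGen/dev.json",
        "--prompt_column", "content",
        "--response_column", "summary",
        "--overwrite_cache",
        "--model_name_or_path", model_path,
        "--output_dir", f"./output/adgen-chatglm-6b-ft-{lr}",
        "--overwrite_output_dir",
        "--max_source_length", "64",
        "--max_target_length", "64",
        "--per_device_train_batch_size", "32",
        "--per_device_eval_batch_size", "32",
        "--gradient_accumulation_steps", "1",
        "--predict_with_generate",
        "--num_train_epochs", "10",
        "--logging_steps", "10",
        "--save_steps", "1000",
        "--learning_rate", lr,
        "--pre_seq_len", str(pre_seq_len),
        "--fp16",
    ]


cmd = build_command("2,3,4,5", 29500, 128, "2e-2",
                    "/home/cser/hugo/ChatGLM-6B/chatglm-6b")
# A single line with shell-safe quoting, so there are no backslashes to lose.
print(shlex.join(cmd))
```

You could also pass `cmd` straight to `subprocess.run(cmd, check=True)`, which skips shell parsing altogether.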

HuuY

@jby20180901 Did you manage to solve this problem?

summer-silence

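@HuuY Hi, this looks like a good approach. I'd like to ask: this runs multi-GPU training in DeepSpeed mode, so can I just make the changes directly in the original train.sh? In other issues, many people say that simply setting CUDA_VISIBLE_DEVICES=1,2,3 in train.sh also works, but when I do that, it takes far more time and resources than a single-GPU run. Could you give me some guidance?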
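A hedged sketch of one structural difference between the two setups. Setting `CUDA_VISIBLE_DEVICES` only controls which GPUs a process can see; it does not start one process per GPU. Assuming stock Hugging Face `Trainer` behaviour, a single process that sees several GPUs falls back to `torch.nn.DataParallel` and scales its batch by the number of visible GPUs, whereas a launcher such as `deepspeed` starts one process per GPU. The helper names are hypothetical, and the code only models the batch split; it does not measure speed.

```python
from __future__ import annotations

from collections.abc import Mapping


def visible_gpus(env: Mapping[str, str]) -> list[str]:
    # CUDA_VISIBLE_DEVICES is a comma-separated list of device ids; it limits
    # what a process can see but does not start any extra processes.
    raw = env.get("CUDA_VISIBLE_DEVICES", "")
    return [d.strip() for d in raw.split(",") if d.strip()]


def launch_summary(mode: str, env: Mapping[str, str], per_device: int) -> dict[str, int]:
    """Hypothetical model of how samples are split in the two launch modes."""
    n_visible = len(visible_gpus(env))
    if mode == "single_process":      # e.g. `python main.py ...` from train.sh
        processes, gpus_per_process = 1, n_visible
    elif mode == "one_per_gpu":       # e.g. `deepspeed ...`
        processes, gpus_per_process = n_visible, 1
    else:
        raise ValueError(f"unknown launch mode: {mode}")
    # Simplified Trainer-style sizing: a process driving several GPUs via
    # DataParallel loads per_device * n_gpu samples and scatters them.
    batch_per_process = per_device * max(1, gpus_per_process)
    return {
        "processes": processes,
        "batch_per_process": batch_per_process,
        "samples_per_step": processes * batch_per_process,
    }


env = {"CUDA_VISIBLE_DEVICES": "1,2,3"}
print(launch_summary("single_process", env, 1))
# {'processes': 1, 'batch_per_process': 3, 'samples_per_step': 3}
print(launch_summary("one_per_gpu", env, 1))
# {'processes': 3, 'batch_per_process': 1, 'samples_per_step': 3}
```

Both modes push the same number of samples through each step. The difference is that the single-process mode funnels everything through one Python process, and `torch.nn.DataParallel` re-replicates the model onto every GPU on each forward pass. For that reason PyTorch's documentation recommends the one-process-per-GPU pattern instead.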

summer-silence
