I simply changed the training script to the following; the main change is `CUDA_VISIBLE_DEVICES=0,1,2,3`:
```sh
PRE_SEQ_LEN=128
LR=2e-2

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 main.py \
    --do_train \
    --train_file junshi/full_train.json \
    --validation_file junshi/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path ../../chatglm_finetuning/data/chatglm-6b \
    --output_dir output/testtest-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 128 \
    --max_target_length 256 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --num_train_epochs 1 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
```
With this it runs out of GPU memory (CUDA out of memory), but single-GPU training does not OOM. Each GPU is an RTX 3090 with 24 GB.
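For reference, a simple diagnostic I would try here (my own sketch, not part of the original report): watch per-GPU memory while the run starts. If GPU 0 fills up much faster than the others, the Trainer is probably wrapping the model in `torch.nn.DataParallel` (which replicates the model and gathers outputs on GPU 0) rather than running true multi-process data parallelism, and that extra overhead on GPU 0 could explain why four visible GPUs OOM while one does not.

```sh
# Diagnostic sketch: refresh nvidia-smi once a second while training starts;
# memory growth concentrated on GPU 0 is a typical sign of DataParallel
# replication/gathering overhead rather than per-process sharding.
watch -n 1 nvidia-smi
```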
Then I saw someone in the issues using the following, saying it enables multi-GPU training:
```sh
PRE_SEQ_LEN=128
LR=2e-2
MASTER_PORT=$(shuf -n 1 -i 10000-65535)

deepspeed --include localhost:4,5,6,7 --master_port $MASTER_PORT main.py \
    --deepspeed deepspeed.json \
    --do_train \
    --train_file junshi/full_train.json \
    --validation_file junshi/dev.json \
    --prompt_column content \
    --response_column summary \
    --overwrite_cache \
    --model_name_or_path ../../chatglm_finetuning/data/chatglm-6b \
    --output_dir output/testtest-$PRE_SEQ_LEN-$LR \
    --overwrite_output_dir \
    --max_source_length 128 \
    --max_target_length 256 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --predict_with_generate \
    --num_train_epochs 1 \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate $LR \
    --pre_seq_len $PRE_SEQ_LEN \
```
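The command passes `--deepspeed deepspeed.json`, so a config file has to exist next to main.py (the ptuning directory ships one). In case it helps, below is a minimal sketch of what such a file can look like; `deepspeed.example.json` is a name I made up, and the values are my assumptions rather than the repo's exact settings, so you would pass it explicitly with `--deepspeed deepspeed.example.json` if you use it.

```sh
# Sketch only (assumed values, not the repo's shipped file): a minimal ZeRO
# stage-2 / fp16 DeepSpeed config that defers the micro batch size and the fp16
# switch to the Hugging Face Trainer via "auto".
cat > deepspeed.example.json <<'EOF'
{
  "train_micro_batch_size_per_gpu": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "initial_scale_power": 16,
    "loss_scale_window": 1000,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "zero_optimization": {
    "stage": 2,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": false,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true
  }
}
EOF
```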
But when I run it, I get this error:
```
usage: deepspeed [-h] [-H HOSTFILE] [-i INCLUDE] [-e EXCLUDE] [--num_nodes NUM_NODES]
                 [--min_elastic_nodes MIN_ELASTIC_NODES] [--max_elastic_nodes MAX_ELASTIC_NODES]
                 [--num_gpus NUM_GPUS] [--master_port MASTER_PORT] [--master_addr MASTER_ADDR]
                 [--launcher LAUNCHER] [--launcher_args LAUNCHER_ARGS] [--module] [--no_python]
                 [--no_local_rank] [--no_ssh_check] [--force_multi] [--save_pid]
                 [--enable_each_rank_log ENABLE_EACH_RANK_LOG] [--autotuning {tune,run}]
                 [--elastic_training] [--bind_cores_to_rank] [--bind_core_list BIND_CORE_LIST]
                 user_script ...
deepspeed: error: the following arguments are required: user_script, user_args
train3.sh: line 6: --master_port: command not found
train3.sh: line 7: --deepspeed: command not found
train3.sh: line 9: --do_train: command not found
```
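The log itself hints at the cause: bash is treating `--master_port`, `--deepspeed`, and `--do_train` as stand-alone commands, which is what happens when a line-continuation backslash is missing or is followed by trailing whitespace after copy-pasting the script. A quick check (my own sketch, assuming the script is saved as train3.sh as in the log):

```sh
# Sketch: flag continuation backslashes followed by trailing whitespace, which
# silently breaks the continuation and makes bash run the next line as a
# separate command ("--deepspeed: command not found", etc.).
grep -nE '\\[[:space:]]+$' train3.sh
```

Every continued line, including the `deepspeed ... main.py` line itself, needs to end in a bare `\`.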
Could you publish a detailed tutorial for multi-GPU P-tuning v2 training?
Environment
- OS: Ubuntu 18.04
- Python: 3.10.9
- Transformers: 4.29.2
- PyTorch: 1.12.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True
- CUDA version: 11.3