Multi-GPU usage question
For inference, multi-GPU can be written like this: model = load_model_on_gpus("./THUDM/chatglm-6b", num_gpus=2)
But for training there is an extra config=config argument, so what should I do? model = AutoModel.from_pretrained(model_args.model_name_or_path, config=config, trust_remote_code=True) # Can the multi-GPU approach from inference be used here?
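One possible approach (a sketch, not confirmed against the ChatGLM repo) is to build a device map by hand, similar to what load_model_on_gpus does internally, and pass it to from_pretrained together with config=config. The module names below (transformer.word_embeddings, transformer.layers.N, lm_head) are assumptions modeled on ChatGLM-6B's layout; verify them against your actual checkpoint before relying on this.

```python
# Sketch: spread a 28-layer ChatGLM-style model across num_gpus.
# Module names here are ASSUMPTIONS based on ChatGLM-6B's structure;
# check them with `model.named_modules()` on your own checkpoint.
def make_device_map(num_gpus: int, num_layers: int = 28) -> dict:
    # Keep embeddings, the final layernorm, and the head on GPU 0.
    device_map = {
        "transformer.word_embeddings": 0,
        "transformer.final_layernorm": 0,
        "lm_head": 0,
    }
    per_gpu = (num_layers + num_gpus - 1) // num_gpus  # ceil division
    for i in range(num_layers):
        device_map[f"transformer.layers.{i}"] = min(i // per_gpu, num_gpus - 1)
    return device_map

# Hypothetical usage (requires the `accelerate` package to be installed
# for transformers to honor device_map):
# model = AutoModel.from_pretrained(
#     model_args.model_name_or_path,
#     config=config,
#     trust_remote_code=True,
#     device_map=make_device_map(num_gpus=2),
# )
```

Note this splits the model for memory, not for data-parallel training; whether gradients flow correctly through a sharded model depends on your training setup.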
Error: RuntimeError: CUDA out of memory. Tried to allocate 192.00 MiB (GPU 0; 11.17 GiB total capacity; 10.24 GiB already allocated; 171.69 MiB free; 10.71 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
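The error message itself suggests one mitigation for allocator fragmentation: setting max_split_size_mb via the PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch of applying that suggestion, where 128 MiB is just an example value, not a tuned recommendation:

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be set before torch initializes CUDA,
# so do this before `import torch` anywhere in the process (or export it
# in the shell before launching the training script).
# 128 is an example split size in MiB, not a tuned recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```

This only reduces fragmentation overhead; it will not help if the model genuinely does not fit on one 11 GiB card.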
How much GPU memory does training actually need? I have two 22 GB cards, but right now it seems only one of them is being used. I tried changing it like this, but that didn't work either: