[THUDM/ChatGLM-6B][BUG/Help] As the number of dialogue turns grows, is there any way to prevent or reduce the increase in ChatGLM's GPU memory usage?

Current Behavior

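"After 2 to 3 turns of dialogue, GPU memory usage is about 10 GB with 8-bit quantization and only 6 GB with 4-bit quantization. As the number of turns increases, memory consumption grows accordingly."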

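I am using ChatGLM to build a role-playing application and have run into a similar problem. After each turn, ChatGLM's GPU memory usage grows by anywhere from a few hundred MB to 1 or 2 GB.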

Thanks!

Steps To Reproduce

Just chat with ChatGLM normally.

Environment

- OS: Ubuntu 20.04
- Python: 3.10
- Transformers: 4.26.1
- PyTorch: 1.12
- CUDA Support: True

jasonliu119

Hi, I have the same problem. Has it been solved?

superyoman

If 10 people ask questions concurrently, wouldn't ChatGLM's GPU memory usage grow by 10 to 20 GB each time?

qq516249940

Just limit the dialogue history that is passed in as input.
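A minimal sketch of what limiting the history could look like, assuming the history is a list of (query, response) pairs as in the ChatGLM-6B README's `model.chat(tokenizer, query, history=...)` usage. The helper name `trim_history` and the `max_turns` parameter are hypothetical and not part of the ChatGLM API.

```python
from typing import List, Tuple

# Assumed shape: one (query, response) pair per dialogue turn.
History = List[Tuple[str, str]]


def trim_history(history: History, max_turns: int = 3) -> History:
    """Return a copy holding only the most recent `max_turns` turns.

    A shorter history means a shorter prompt, which bounds how far
    memory use can grow with the number of turns.
    """
    if max_turns <= 0:
        # Single-turn mode: the model sees no earlier context at all.
        return []
    return list(history[-max_turns:])


# Hypothetical usage (requires a loaded model and tokenizer):
# response, history = model.chat(tokenizer, query,
#                                history=trim_history(history, max_turns=3))
```

Passing `max_turns=0` gives single-turn behavior, at the cost of the model forgetting everything said earlier.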

finlay-liu

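Just limit the dialogue history that is passed in as input.

The thing is, I never set history. Does GLM append the dialogue history by default? If so, how do I restrict it to single-turn dialogue without storing any history?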

superyoman

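Just limit the dialogue history that is passed in as input.

The thing is, I never set history. Does GLM append the dialogue history by default? If so, how do I restrict it to single-turn dialogue without storing any history?

Take a look at the underlying API; that is where history is used.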

finlay-liu

chatglm_llm.py

    def generatorAnswer(self, prompt: str, history: List[List[str]] = [], streaming: bool = False):
        history = []  # force history to an empty list
        if streaming:
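One possible reason history can accumulate even when the caller never passes it, offered as an assumption since the thread does not show the rest of the function: a mutable default such as `history=[]` is created once and shared by every call that omits the argument. The sketch below illustrates this in plain Python; `answer_shared` and `answer_fresh` are hypothetical names, and the "reply" string stands in for a model response.

```python
from typing import List, Optional


def answer_shared(prompt: str, history: List[List[str]] = []) -> int:
    # The default list is built once, when the function is defined,
    # so turns appended here persist across calls that omit `history`.
    history.append([prompt, "reply"])
    return len(history)


def answer_fresh(prompt: str, history: Optional[List[List[str]]] = None) -> int:
    # A None default gives every call its own new list, and copying
    # avoids mutating a list the caller passed in.
    history = [] if history is None else list(history)
    history.append([prompt, "reply"])
    return len(history)


print(answer_shared("a"), answer_shared("b"))  # 1 2
print(answer_fresh("a"), answer_fresh("b"))    # 1 1
```

Rebinding `history = []` at the top of the function, as in the snippet above, also sidesteps the shared default, but it discards any history the caller deliberately passes in.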

shleo

mark

aleimu

api.py has a torch_gc function.
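A minimal sketch of what a helper like `torch_gc` typically does, assuming PyTorch; the actual implementation in api.py may differ. It asks PyTorch's caching allocator to release unused cached blocks after a request finishes. The boolean return value and the import guard are additions for illustration.

```python
try:
    import torch
except ImportError:  # lets the sketch load on machines without PyTorch
    torch = None


def torch_gc() -> bool:
    """Release cached, unused GPU memory. Returns True if CUDA was cleaned."""
    if torch is None or not torch.cuda.is_available():
        return False
    # Return cached blocks that no live tensor is using to the driver.
    torch.cuda.empty_cache()
    # Clean up memory held for CUDA IPC handles that are no longer in use.
    torch.cuda.ipc_collect()
    return True


# Typical placement: call once after each generation request completes.
cleaned = torch_gc()
```

This only frees memory that PyTorch has cached but is no longer using. Memory needed for a long history in the current request is still required, so it complements limiting the history rather than replacing it.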

umbraclet16

Answer