[THUDM/ChatGLM-6B] Batch inference API with support for high concurrency

2024-05-10

Adapted from evaluate.py. It accepts requests from multiple threads and keeps a request pool: once enough requests accumulate (MAX_BATCH_SIZE) or enough time has passed (MAX_WAIT_TIME), it runs batched inference, which greatly speeds things up. You can tune the MAX_BATCH_SIZE and MAX_WAIT_TIME hyperparameters as needed.
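
For illustration, a minimal sketch of that mechanism, assuming a simple asyncio design (the names, defaults, and structure below are assumptions for illustration, not the actual api_batch.py): requests enter a pool, and a background worker flushes the pool into one batched model call once MAX_BATCH_SIZE requests have accumulated or the oldest request has waited MAX_WAIT_TIME.

```python
import asyncio
import time

MAX_BATCH_SIZE = 8    # flush once this many requests have accumulated
MAX_WAIT_TIME = 0.1   # ...or once the oldest request has waited this long (seconds)

class RequestPool:
    """Collects requests and resolves them in batches (illustrative sketch).

    Note: construct this inside the running event loop (e.g. a startup hook);
    on Python <= 3.9, asyncio primitives bind to the loop that exists at
    construction time.
    """

    def __init__(self, batch_infer_fn):
        self.batch_infer_fn = batch_infer_fn   # callable: list[str] -> list[str]
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, prompt: str) -> str:
        # Each request carries a Future that the batch worker will resolve.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((prompt, fut))
        return await fut

    async def worker(self):
        while True:
            prompt, fut = await self.queue.get()   # block until the first request
            prompts, futures = [prompt], [fut]
            deadline = time.monotonic() + MAX_WAIT_TIME
            # Keep collecting until the batch is full or the deadline passes.
            while len(prompts) < MAX_BATCH_SIZE:
                timeout = deadline - time.monotonic()
                if timeout <= 0:
                    break
                try:
                    prompt, fut = await asyncio.wait_for(self.queue.get(), timeout)
                except asyncio.TimeoutError:
                    break
                prompts.append(prompt)
                futures.append(fut)
            # Run the blocking model call off the event loop.
            results = await asyncio.get_running_loop().run_in_executor(
                None, self.batch_infer_fn, prompts)
            for f, r in zip(futures, results):
                f.set_result(r)
```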

Answers

Which GPU model and how much VRAM are you using? Doesn't this run out of memory?

HL0718

It shouldn't. I'm using an 80 GB A800 and can push the batch to around 300. Adjust MAX_BATCH_SIZE to your own GPU; with less VRAM, set it lower.

PS: This approach does increase VRAM use, but the inference speedup is huge, so the payoff is clear. On the same one thousand samples, batched inference finishes in about 30 seconds (batch=100), while fully serial takes around 500 seconds. Batched uses only a few times more VRAM than serial.

The samples in your batches must not be very long. In my test, with a sequence length of 2048, an 80 GB A100 supports a batch size of at most 32.

My data is indeed short. But with data that long, fully serial would cost even more time. Try comparing the speed against the serial baseline.

There is indeed an improvement: serial takes about 80 s, while the batched form takes about 18 s.

Right, that shows batching does help. Did you run it with the code I provided?

Not yet; we don't currently have a high-concurrency use case.

Multiple concurrent requests raise an error: RuntimeError: Task <Task pending name='Task-7' coro=<RequestResponseCycle.run_asgi() running at /home/nrp/anaconda3/lib/python3.9/site-packages/uvicorn/protocols/http/httptools_impl.py:436> cb=[set.discard()]> got Future attached to a different loop

How did you deploy the server side? This code only needs to be served with FastAPI.
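
For reference, a minimal FastAPI wiring of a pool like the sketch above (again illustrative; `fake_batch_infer` is a stub standing in for the real model call, and `RequestPool` is the class from the earlier sketch). Creating the pool and its worker inside uvicorn's startup hook keeps every future on the same event loop that serves the requests:

```python
import asyncio
from fastapi import FastAPI
import uvicorn

app = FastAPI()
pool = None  # created inside uvicorn's event loop; see the startup hook below

def fake_batch_infer(prompts):
    # Stand-in for a real batched model call.
    return [f"echo: {p}" for p in prompts]

@app.on_event("startup")
async def start_worker():
    global pool
    pool = RequestPool(fake_batch_infer)   # RequestPool from the sketch above
    asyncio.create_task(pool.worker())

@app.post("/")
async def handle(prompt: str):
    return {"response": await pool.submit(prompt)}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
```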

I only have 16 GB of VRAM, barely enough to deploy the fp16 model. If I switch to the int4 weights and run your code on top, could I then test this high-concurrency capability?

You can give it a try, but the batch size probably can't be very large.
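
For reference, loading the quantized checkpoint follows the usual ChatGLM-6B pattern from the project README (verify against the current model card; `THUDM/chatglm-6b-int4` is the published int4 repository):

```python
from transformers import AutoModel, AutoTokenizer

# The int4 weights need substantially less VRAM than fp16 (~13 GB),
# freeing headroom for a larger batch on a 16 GB card.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).half().cuda()
model = model.eval()
```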

I am using FastAPI with your code; requesting from two pages at the same time produced the error above.

Multi-threaded testing works fine on my end. Please post the complete error message and I'll take a look.

I'm using the chatglm2-6b model, and the response quality degrades with batched inference. Has anyone run into this?

A small suggestion: you could set the batch size based on the average or total text length within the batch and the available VRAM, as in the sketch below.
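
A minimal sketch of that suggestion (the per-token memory constant is a placeholder to calibrate on your own GPU, not a measured value):

```python
import torch

def dynamic_batch_size(lengths, mb_per_token=0.5, max_batch=300):
    """Cap the batch by the pending prompts' average length and free VRAM.

    mb_per_token is a placeholder; measure it for your model and sequence
    lengths before relying on this heuristic.
    """
    free_bytes, _total = torch.cuda.mem_get_info()   # free VRAM on current device
    free_mb = free_bytes / 2**20
    avg_len = sum(lengths) / max(len(lengths), 1)
    per_sample_mb = max(avg_len * mb_per_token, 1e-6)
    return max(1, min(max_batch, int(free_mb // per_sample_mb)))
```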

```
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py", line 428, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/fastapi/applications.py", line 282, in __call__
    await super().__call__(scope, receive, send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/fastapi/routing.py", line 241, in app
    raw_response = await run_endpoint_function(
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/fastapi/routing.py", line 167, in run_endpoint_function
    return await dependant.call(**values)
  File "/root/chatglm/ChatGLM-6B/ptuning/api_batch.py", line 97, in handle_data
    return await data_processor.process_data(prompt)
  File "/root/chatglm/ChatGLM-6B/ptuning/api_batch.py", line 80, in process_data
    await self.wait_for_result(data)
  File "/root/chatglm/ChatGLM-6B/ptuning/api_batch.py", line 66, in wait_for_result
    await self.event.wait()
  File "/root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/asyncio/locks.py", line 226, in wait
    await fut
RuntimeError: Task <Task pending name='Task-8' coro=<RequestResponseCycle.run_asgi() running at /root/anaconda3/envs/chatglm-py39-cu117-torch1131/lib/python3.9/site-packages/uvicorn/protocols/http/h11_impl.py:428> cb=[set.discard()]> got Future attached to a different loop
```

Issuing multiple requests returns this error.

How are you issuing the multiple requests? From multiple threads?
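
For what it's worth, a likely cause of this particular error (an inference from the traceback, not confirmed in the thread): the `asyncio.Event` awaited in `wait_for_result` was created before uvicorn's event loop started, e.g. at module import time. On Python 3.9 and earlier, asyncio primitives bind to the loop that exists when they are constructed, so awaiting them from a different loop raises exactly "got Future attached to a different loop". A minimal sketch of a fix, creating the loop-bound object inside the request handler (the names are hypothetical and only loosely mirror the traceback's `data_processor`):

```python
import asyncio

class DataProcessor:
    """Hypothetical sketch; not the api_batch.py from this thread."""

    def __init__(self):
        self.pending = []   # (prompt, future) pairs waiting for the next batch

    async def process_data(self, prompt):
        # Create the future inside the handler, where uvicorn's loop is
        # running, instead of creating an asyncio.Event at import time.
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        self.pending.append((prompt, fut))
        return await fut    # resolved by the batch worker

    @staticmethod
    def resolve(fut, result, loop):
        # If inference runs in a background thread, hand results back safely.
        loop.call_soon_threadsafe(fut.set_result, result)
```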