Merge pull request #2562 from kevincheng2/develop
[LLM] update v1.2 images
juncaipeng authored Nov 21, 2024
2 parents 3bb05ac + 203f3ae commit fd44c00
Showing 2 changed files with 4 additions and 4 deletions.
4 changes: 2 additions & 2 deletions llm/README.md
````diff
@@ -15,9 +15,9 @@
 # Mount the model files
 export MODEL_PATH=${PWD}/Llama-3-8B-A8W8C8
-docker run --gpus all --shm-size 5G --network=host \
+docker run --gpus all --shm-size 5G --network=host --privileged --cap-add=SYS_PTRACE \
   -v ${MODEL_PATH}:/models/ \
-  -dit registry.baidubce.com/paddlepaddle/fastdeploy:llm-serving-cuda123-cudnn9-v1.0 \
+  -dit registry.baidubce.com/paddlepaddle/fastdeploy:llm-serving-cuda123-cudnn9-v1.2 \
   bash -c 'export USE_CACHE_KV_INT8=1 && cd /opt/output/Serving && bash start_server.sh; exec bash'
 ```
````
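For readers skimming the diff, the full launch command from llm/README.md after this change is assembled below. This is a hedged sketch: it only builds and echoes the command string rather than running it (Docker, a GPU, and a local `Llama-3-8B-A8W8C8` model directory are assumptions carried over from the README).

```shell
# Updated launch command after this commit: the image tag moves to v1.2 and
# the container gains --privileged and --cap-add=SYS_PTRACE. Echoed, not run.
MODEL_PATH=${PWD}/Llama-3-8B-A8W8C8
CMD="docker run --gpus all --shm-size 5G --network=host --privileged --cap-add=SYS_PTRACE -v ${MODEL_PATH}:/models/ -dit registry.baidubce.com/paddlepaddle/fastdeploy:llm-serving-cuda123-cudnn9-v1.2 bash -c 'export USE_CACHE_KV_INT8=1 && cd /opt/output/Serving && bash start_server.sh; exec bash'"
echo "$CMD"
```

Printing the command first makes it easy to review the flags (notably the new `--privileged` and `--cap-add=SYS_PTRACE`) before pasting it into a GPU host's shell.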

4 changes: 2 additions & 2 deletions llm/docs/FastDeploy_usage_tutorial.md
```diff
@@ -144,7 +144,7 @@ health endpoint (checks whether the model is ready for inference)
 from fastdeploy_client.chatbot import ChatBot
 hostname = "127.0.0.1"  # hostname where the service is deployed
-port = 8000  # the GRPC_PORT configured for the service
+port = 8811  # the GRPC_PORT configured for the service
 chatbot = ChatBot(hostname=hostname, port=port)
@@ -153,7 +153,7 @@ result = chatbot.generate("你好", topp=0.8, max_dec_len=128, timeout=120)
 print(result)
 # Streaming interface
-chatbot = ChatBot(hostname=hostname, port=port, model_id=model_id, mode=mode)
+chatbot = ChatBot(hostname=hostname, port=port)
 stream_result = chatbot.stream_generate("你好", max_dec_len=128, timeout=120)
 for res in stream_result:
     print(res)
```
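The port change above matters because the client only connects if it uses the same GRPC_PORT the serving container was configured with. A minimal sketch of resolving that port with a fallback, assuming GRPC_PORT is exposed as an environment variable (an assumption, not something this diff confirms; 8811 is the tutorial's value after this commit):

```shell
# Resolve the gRPC port for the ChatBot client: prefer the GRPC_PORT
# environment variable (assumed to match the server config), else fall
# back to the tutorial's default of 8811.
PORT="${GRPC_PORT:-8811}"
echo "Connect the ChatBot client to 127.0.0.1:${PORT}"
```

Deriving the client port from one shared setting avoids exactly the mismatch this commit fixes (client hard-coded to 8000 while the server listened on 8811).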
