# OneFlow Backend for Triton Inference Server

Currently, we have implemented a OneFlow backend for the Triton Inference Server that enables model serving. A tutorial on how to export a model and deploy it is linked below, and you can also follow the steps here to get started.
- Download and save the model (a hedged export sketch follows this list)

  ```bash
  cd examples/resnet50/
  python3 export_model.py
  ```
- Launch the Triton server

  ```bash
  cd ../..  # back to the root of the serving repository
  docker run --rm --runtime=nvidia --network=host -v $(pwd)/examples:/models \
    oneflowinc/oneflow-serving
  curl -v localhost:8000/v2/health/ready  # readiness check
  ```
- Send an image and get a prediction (see the client sketch below this list)

  ```bash
  pip3 install tritonclient[all]
  cd examples/resnet50/
  curl -o cat.jpg https://images.pexels.com/photos/156934/pexels-photo-156934.jpeg
  python3 client.py --image cat.jpg
  ```
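For reference, here is a rough sketch of what the export step could look like, assuming OneFlow's `nn.Graph` API, `flow.save` for graph serialization, and the `flowvision` ResNet-50. The shipped `examples/resnet50/export_model.py` is the authoritative version; the save path and dummy input used here are assumptions.

```python
# Hedged sketch of a model export script (run from examples/resnet50/);
# the real export_model.py in the repository is authoritative.
import oneflow as flow
from flowvision.models import resnet50


class ResNet50Graph(flow.nn.Graph):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def build(self, x):
        return self.model(x)


model = resnet50(pretrained=True)
model.eval()

graph = ResNet50Graph(model)
dummy = flow.ones(1, 3, 224, 224)  # run once so the graph is compiled/traced
graph(dummy)

# Triton expects <model-name>/<version>/ inside the mounted repository;
# "1/model" is an assumed layout matching the docker mount above.
flow.save(graph, "1/model")
```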
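Conceptually, the `client.py` call above boils down to something like the following `tritonclient` sketch. The tensor names `INPUT_0`/`OUTPUT_0`, the `resnet50` model name, and the ImageNet-style preprocessing are assumptions and may differ from the shipped script.

```python
import numpy as np
import tritonclient.http as httpclient
from PIL import Image

# Connect to the Triton HTTP endpoint started above and check readiness.
client = httpclient.InferenceServerClient(url="localhost:8000")
assert client.is_server_ready()

# Preprocess: resize to 224x224, scale to [0, 1], normalize, NCHW layout.
image = Image.open("cat.jpg").convert("RGB").resize((224, 224))
data = np.asarray(image, dtype=np.float32) / 255.0
data = (data - np.array([0.485, 0.456, 0.406], dtype=np.float32)) / \
    np.array([0.229, 0.224, 0.225], dtype=np.float32)
data = data.transpose(2, 0, 1)[np.newaxis, ...].astype(np.float32)  # (1, 3, 224, 224)

# Tensor names come from the model configuration; INPUT_0/OUTPUT_0 are assumptions.
inputs = [httpclient.InferInput("INPUT_0", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)
outputs = [httpclient.InferRequestedOutput("OUTPUT_0")]

result = client.infer(model_name="resnet50", inputs=inputs, outputs=outputs)
scores = result.as_numpy("OUTPUT_0")
print("predicted class id:", int(scores.argmax()))
```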
More documentation:

- Tutorial (Chinese)
- Build
- Model Configuration
- OneFlow Cookies: Serving (Chinese)
- OneFlow Cookies: Serving (English)
- Command Line Tool: oneflow-serving
The current version of OneFlow does not support concurrent execution of multiple instances of the same model. You can launch multiple containers (which is easy with Kubernetes) to work around this limitation, as in the sketch below.
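For illustration, a client could spread requests over several such containers. The sketch below assumes two independently launched oneflow-serving containers reachable at the listed endpoints; the second port is hypothetical, not something the image sets up for you.

```python
import itertools
import tritonclient.http as httpclient

# Assumed endpoints: two separately launched oneflow-serving containers,
# e.g. published on different host ports. Adjust to your deployment.
ENDPOINTS = ["localhost:8000", "localhost:8001"]
_clients = itertools.cycle([httpclient.InferenceServerClient(url=u) for u in ENDPOINTS])


def infer(model_name, inputs, outputs=None):
    """Round-robin requests across containers, since a single server
    cannot run multiple instances of one OneFlow model concurrently."""
    return next(_clients).infer(model_name=model_name, inputs=inputs, outputs=outputs)
```

With Kubernetes, a Deployment with several replicas behind a Service achieves the same effect without any client-side routing.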