1.0.0 Release Note
We are excited to announce the release of ⚡️FastDeploy 1.0.0! 🎉 FastDeploy delivers high-performance deployment for over 150 AI models from PaddlePaddle and the open-source community on multiple hardware platforms, providing developers with a simple, easy-to-use, and highly efficient end-to-end deployment experience for all scenarios.
Multiple Inference Backends and Hardware Support
FastDeploy supports inference deployment on multiple hardware platforms with different backends. Each backend module can be flexibly compiled and integrated according to the developer's needs; to build it yourself, refer to the FastDeploy compilation documentation.
Backend | Platform | Model Format | Supported Hardware |
---|---|---|---|
Paddle Inference | Linux(x64)/Windows(x64) | Paddle | x86 CPU/NVIDIA GPU/Jetson/GraphCore IPU |
Paddle Lite | Linux(aarch64/armhf)/Android | Paddle | Arm CPU/Kunlun R200/RV1126 |
Poros | Linux(x64) | TorchScript | x86 CPU/NVIDIA GPU |
OpenVINO | Linux(x64)/Windows(x64)/OSX(x86) | Paddle/ONNX | x86 CPU/Intel GPU |
TensorRT | Linux(x64/aarch64)/Windows(x64) | Paddle/ONNX | NVIDIA GPU/Jetson |
ONNX Runtime | Linux(x64/aarch64)/Windows(x64)/OSX(x86/arm64) | Paddle/ONNX | x86 CPU/Arm CPU/NVIDIA GPU |
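As a hedged illustration of how one model switches between these backends, the minimal Python sketch below uses FastDeploy's `RuntimeOption` API; the PP-YOLOE file paths and the test image are placeholders, not files shipped with this release.

```python
# A minimal sketch: pick hardware and backend through RuntimeOption,
# then run the same exported PP-YOLOE model unchanged.
import cv2
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu()          # or option.use_cpu() for x86/Arm CPU targets
option.use_trt_backend()  # alternatives: use_ort_backend(), use_openvino_backend(),
                          # use_paddle_infer_backend(), use_lite_backend()

# Placeholder paths to an exported PP-YOLOE model.
model = fd.vision.detection.PPYOLOE(
    "ppyoloe/model.pdmodel",
    "ppyoloe/model.pdiparams",
    "ppyoloe/infer_cfg.yml",
    runtime_option=option)

im = cv2.imread("test.jpg")
print(model.predict(im))
```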
In addition, FastDeploy supports deploying models to web pages and mini programs based on Paddle.js; see Web Deployment for more details.
AI Model End-to-end Inference Support
FastDeploy supports end-to-end deployment of models from the following PaddlePaddle suites:
- PaddleOCR deployment tutorial
- PaddleDetection deployment tutorial
- PaddleSeg deployment tutorial
- PaddleClas deployment tutorial
- PaddleGAN deployment tutorial
Beyond the PaddlePaddle suites, FastDeploy also supports popular deep learning models from the open-source community. Over 150 models are supported in release 1.0; the table below lists some of the key ones. Refer to the deployment examples for more details.
Task | Supported Models |
---|---|
Classification | ResNet/MobileNet/PP-LCNet/YOLOv5-Clas and other series models |
Object Detection | PP-YOLOE/PicoDet/RCNN/YOLOv5/YOLOv6/YOLOv7/YOLOX/NanoDet and other series models |
Segmentation | PP-LiteSeg/PP-HumanSeg/DeepLabv3p/UNet and other series models |
Image/Video Matting | PP-Matting/PP-Mattingv2/ModNet/RobustVideoMatting |
OCR | PP-OCRv2/PP-OCRv3 |
Video Super-Resolution | PP-MSVSR/BasicVSR/EDVR |
Object Tracking | PP-Tracking |
Pose/Keypoint Detection | PP-TinyPose/HeadPose-FSANet |
Face Alignment | PFLD/FaceLandmark1000/PIPNet and other series models |
Face Detection | RetinaFace/UltraFace/YOLOv5-Face/SCRFD and other series models |
Face Recognition | ArcFace/CosFace/PartialFC/VPL/AdaFace and other series models |
Text-to-Speech | PaddleSpeech Streaming Speech Synthesis Model |
Semantic Representation | PaddleNLP ERNIE 3.0 Tiny series models |
Information Extraction | PaddleNLP Universal Information Extraction UIE model |
Text-to-Image Generation | Stable Diffusion |
High Performance Serving Deployment
FastDeploy provides a high-performance serving system for AI models based on Triton Inference Server, supporting fast service-based deployment of Paddle/ONNX models on different hardware and backends (a minimal client sketch follows the links below).
- FastDeploy Serving image preparation
- Serving deployment process description
- PaddleClas models service-based deployment
- PaddleDetection models service-based deployment
- PaddleOCR models service-based deployment
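Since the serving system speaks the standard Triton protocol, a deployed model can be queried with the stock Triton client. The sketch below is illustrative only: the model name `"yolov5"`, the tensor names `"images"`/`"output"`, and the input shape are placeholders that must match your model repository's `config.pbtxt`.

```python
# A hedged client sketch using the standard Triton HTTP client.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Dummy pre-processed input; shape and dtype must match the served model.
data = np.random.rand(1, 3, 640, 640).astype(np.float32)
inputs = [httpclient.InferInput("images", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)

response = client.infer(model_name="yolov5", inputs=inputs)
print(response.as_numpy("output"))  # raw output tensor, placeholder name
```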
Tool Components
PaddleSlim Auto Compression Toolkit
FastDeploy provides a one-click quantization tool based on PaddleSlim that compresses and accelerates models with near-lossless accuracy via the following command.
```bash
fastdeploy compress --config_path=./configs/detection/yolov5s_quant.yaml \
                    --method='PTQ' --save_dir='./yolov5s_ptq_model/'
```
FastDeploy has completed adaptation testing of quantized models against the following backends:
Hardware / Inference Backend | ONNX Runtime | Paddle Inference | TensorRT | Paddle Inference + TensorRT | Paddle Lite |
---|---|---|---|---|---|
CPU | Supported | Supported | - | - | Supported |
GPU | - | - | Supported | Supported | - |
RV1126 | - | - | - | - | Supported |
The following table compares accuracy and performance before and after auto compression: accuracy is virtually lossless, while performance improves by up to 400%.
For more details and usage of the one-click quantization tool, see FastDeploy one-click quantization.
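The quantized model loads through the same API as the original one. A minimal sketch follows, assuming the `--save_dir` output above contains the usual `model.pdmodel`/`model.pdiparams` pair; the backend choice and test image are illustrative.

```python
# A minimal sketch: deploy the PTQ-quantized YOLOv5s produced above.
import cv2
import fastdeploy as fd

option = fd.RuntimeOption()
option.use_gpu()
option.use_trt_backend()  # TensorRT runs the quantized (INT8) kernels on NVIDIA GPU

model = fd.vision.detection.YOLOv5(
    "yolov5s_ptq_model/model.pdmodel",    # assumed files inside --save_dir
    "yolov5s_ptq_model/model.pdiparams",
    runtime_option=option,
    model_format=fd.ModelFormat.PADDLE)   # quantized output is a Paddle model

print(model.predict(cv2.imread("test.jpg")))
```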
Model Conversion
To facilitate deployment of models from other frameworks, FastDeploy ships with X2Paddle conversion capabilities. After installing FastDeploy, the following command converts a model in one step so it can then be deployed with FastDeploy.
```bash
fastdeploy convert --framework onnx --model yolov5s.onnx --save_dir yolov5s_paddle_model
```
For more usage details, see FastDeploy Model Conversion.
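The converted directory can be loaded directly through the runtime API. A minimal sketch follows, assuming the standard `model.pdmodel`/`model.pdiparams` layout written by the command above.

```python
# A minimal sketch: open the converted Paddle model and inspect its inputs.
import fastdeploy as fd

option = fd.RuntimeOption()
option.set_model_path("yolov5s_paddle_model/model.pdmodel",
                      "yolov5s_paddle_model/model.pdiparams")

runtime = fd.Runtime(option)
for i in range(runtime.num_inputs()):
    print(runtime.get_input_info(i).name)  # input tensor names of the model
```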
End-to-end Deployment Performance Optimization
FastDeploy focuses on end-to-end deployment experience and performance for every model. In version 1.0, FastDeploy made the following end-to-end optimizations:
- Server side: pre-processing steps are fused to reduce memory allocation overhead and computation
- Mobile side: integration of FlyCV, a high-performance image processing library developed by Baidu's vision team
Combined with the advantages of FastDeploy's multi-backend support, the end-to-end performance of all models is significantly improved over the original deployment code. The following table shows test data for some of the models.
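On mobile/Arm targets, the FlyCV path can be toggled at runtime. The sketch below assumes a FastDeploy package compiled with FlyCV support; the `fd.vision.enable_flycv()` binding is an assumption modeled on the C++ `EnableFlyCV()` switch and may differ in your build.

```python
# A hedged sketch: route image pre-processing through FlyCV instead of OpenCV.
# Assumption: the Python binding below mirrors C++ fastdeploy::vision::EnableFlyCV()
# and only takes effect in packages built with FlyCV enabled.
import fastdeploy as fd

fd.vision.enable_flycv()  # call once, before constructing any vision models
```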
Thanks to the following developers for their contributions to FastDeploy! Contributors List
@leiqing1 @jiangjiajun @DefTruth @joey12300 @felixhjh @ziqi-jin @yunyaoXYY @wjj19950828 @heliqi @ZeyuChen @ChaoII @Zheng-Bicheng @wang-xinyu @HexToString @yeliang2258 @WinterGeng @LDOUBLEV @rainyfly @czr-gc @chenqianhe @kiddyjinjin @Zeref996 @TrellixVulnTeam @D-DanielYang @totorolin @hguandl @ChrisKong93 @Xiue233 @jm12138 @triple-Mu @yingshengBD @GodIsBoom @PatchTester @onecatcn