
FastDeploy 1.0.0

Released by @jiangjiajun · 28 Nov 06:20 · d38aa45

1.0.0 Release Note

We are excited to announce the release of ⚡️FastDeploy 1.0.0! 🎉 FastDeploy delivers high-performance, end-to-end deployment of over 150 AI models from PaddlePaddle and the open source community on multiple hardware platforms, giving developers a simple, easy-to-use, and highly efficient deployment experience across scenarios.

Multiple Inference Backend and Hardware Support

FastDeploy supports inference deployment on multiple hardware platforms through different backends. Each backend module can be compiled and integrated flexibly according to the developer's needs; refer to the FastDeploy compilation documentation for details.

| Backend | Platform | Model Format | Supported Hardware |
| --- | --- | --- | --- |
| Paddle Inference | Linux(x64)/Windows(x64) | Paddle | x86 CPU/NVIDIA GPU/Jetson/GraphCore IPU |
| Paddle Lite | Linux(aarch64/armhf)/Android | Paddle | Arm CPU/Kunlun R200/RV1126 |
| Poros | Linux(x64) | TorchScript | x86 CPU/NVIDIA GPU |
| OpenVINO | Linux(x64)/Windows(x64)/OSX(x86) | Paddle/ONNX | x86 CPU/Intel GPU |
| TensorRT | Linux(x64/aarch64)/Windows(x64) | Paddle/ONNX | NVIDIA GPU/Jetson |
| ONNX Runtime | Linux(x64/aarch64)/Windows(x64)/OSX(x86/arm64) | Paddle/ONNX | x86 CPU/Arm CPU/NVIDIA GPU |
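
As a sketch of how a backend from the table is selected at runtime, assuming the Python RuntimeOption API (method names follow the 1.0-era Python bindings):

```python
import fastdeploy as fd

# Each use_*_backend() call corresponds to a backend row in the table above.
option = fd.RuntimeOption()
option.use_gpu(0)           # target NVIDIA GPU 0; use_cpu() targets x86 CPU
option.use_trt_backend()    # TensorRT; alternatives include use_ort_backend(),
                            # use_openvino_backend(), and use_paddle_backend()
```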

In addition, FastDeploy supports deploying models in web pages and mini programs based on Paddle.js; see Web Deployment for more details.

AI Model End-to-end Inference Support

FastDeploy supports end-to-end deployment of models from PaddlePaddle's official model suites.

Beyond the PaddlePaddle suites, FastDeploy also supports popular deep learning models from the open source community. Over 150 models are supported in release 1.0; the table below lists some of the key ones. Refer to the deployment examples for more details.

| Task | Supported Models |
| --- | --- |
| Classification | ResNet/MobileNet/PP-LCNet/YOLOv5-Clas and other model series |
| Object Detection | PP-YOLOE/PicoDet/RCNN/YOLOv5/YOLOv6/YOLOv7/YOLOX/NanoDet and other model series |
| Semantic Segmentation | PP-LiteSeg/PP-HumanSeg/DeepLabv3p/UNet and other model series |
| Image/Video Matting | PP-Matting/PP-Mattingv2/ModNet/RobustVideoMatting |
| OCR | PP-OCRv2/PP-OCRv3 |
| Video Super-Resolution | PP-MSVSR/BasicVSR/EDVR |
| Object Tracking | PP-Tracking |
| Pose/Keypoint Detection | PP-TinyPose/HeadPose-FSANet |
| Face Alignment | PFLD/FaceLandmark1000/PIPNet and other model series |
| Face Detection | RetinaFace/UltraFace/YOLOv5-Face/SCRFD and other model series |
| Face Recognition | ArcFace/CosFace/PartialFC/VPL/AdaFace and other model series |
| Text-to-Speech | PaddleSpeech streaming speech synthesis models |
| Semantic Representation | PaddleNLP ERNIE 3.0 Tiny series models |
| Information Extraction | PaddleNLP Universal Information Extraction (UIE) model |
| Text-to-Image Generation | Stable Diffusion |
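
To illustrate what end-to-end means here (pre-processing, inference, and post-processing behind a single predict call), a minimal sketch with one of the listed detectors; the model paths are placeholders for an exported PP-YOLOE model:

```python
import cv2
import fastdeploy as fd

# Load an exported PP-YOLOE model; predict() runs the whole pipeline.
model = fd.vision.detection.PPYOLOE(
    "ppyoloe/model.pdmodel",
    "ppyoloe/model.pdiparams",
    "ppyoloe/infer_cfg.yml")

im = cv2.imread("test.jpg")
result = model.predict(im)

# Draw detections above a confidence threshold onto the image.
vis = fd.vision.vis_detection(im, result, score_threshold=0.5)
cv2.imwrite("vis.jpg", vis)
```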

High Performance Serving Deployment

FastDeploy provides a high-performance serving system for AI models based on Triton Inference Server. It supports serving deployment of Paddle/ONNX models on different hardware and backends.
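
For illustration, a client talks to a FastDeploy serving instance the same way it would to any Triton server. A minimal sketch using the tritonclient package; the model name, tensor names, and shape below are placeholders, not a FastDeploy contract:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to the Triton-based FastDeploy server (Triton's default HTTP port).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder input: one 640x640 image batch; real tensor names and shapes
# come from the server's model configuration.
data = np.random.rand(1, 3, 640, 640).astype(np.float32)
inp = httpclient.InferInput("images", list(data.shape), "FP32")
inp.set_data_from_numpy(data)
out = httpclient.InferRequestedOutput("output0")

# "yolov5" stands in for a model registered in the server's model repository.
result = client.infer(model_name="yolov5", inputs=[inp], outputs=[out])
print(result.as_numpy("output0").shape)
```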

Auto Compression and Model Conversion

PaddleSlim Auto Compression Toolkit

FastDeploy provides a one-click quantization tool based on PaddleSlim. The following command performs near-lossless compression and acceleration of a model:

fastdeploy compress --config_path=./configs/detection/yolov5s_quant.yaml \
                    --method='PTQ' --save_dir='./yolov5s_ptq_model/'  

FastDeploy has verified quantized models against the following backends:

| Hardware / Inference Backend | ONNX Runtime | Paddle Inference | TensorRT | Paddle Inference + TensorRT | Paddle Lite |
| --- | --- | --- | --- | --- | --- |
| CPU | Supported | Supported | - | - | Supported |
| GPU | - | - | Supported | Supported | - |
| RV1126 | - | - | - | - | Supported |

The table below compares accuracy and performance before and after auto compression: accuracy is virtually lossless, while performance improves by 100% to 400%.

[Table: accuracy and performance comparison before and after auto compression]

For more details and usage of the one-click quantization tool, see FastDeploy one-click quantization.
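
The compressed model then loads through the same deployment APIs as the original. A minimal sketch, assuming the ./yolov5s_ptq_model/ output from the compress command above (the file names inside the directory are assumed) and the Paddle-format YOLOv5 loader from the Python API:

```python
import fastdeploy as fd
from fastdeploy import ModelFormat

# Quantized models pair well with TensorRT on GPU.
option = fd.RuntimeOption()
option.use_gpu(0)
option.use_trt_backend()

# Load the PTQ output directory produced by `fastdeploy compress` above.
model = fd.vision.detection.YOLOv5(
    "./yolov5s_ptq_model/model.pdmodel",
    "./yolov5s_ptq_model/model.pdiparams",
    runtime_option=option,
    model_format=ModelFormat.PADDLE)
```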

Model Conversion

To support deployment of models from other frameworks, FastDeploy integrates X2Paddle conversion capabilities. After installing FastDeploy, the following command converts a model, which can then be deployed with FastDeploy:

fastdeploy convert --framework onnx --model yolov5s.onnx --save_dir yolov5s_paddle_model

For more information on how to use it, see FastDeploy Model Conversion.
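
For reference, the same conversion can be driven from Python through X2Paddle's own API (a sketch, assuming the x2paddle package is installed; the CLI above wraps this capability):

```python
# Convert an ONNX model to Paddle format directly via X2Paddle.
from x2paddle.convert import onnx2paddle

onnx2paddle("yolov5s.onnx", "yolov5s_paddle_model")
```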

End-to-end Deployment Performance Optimisation

FastDeploy focuses on end-to-end deployment experience and performance for every model. Version 1.0 includes the following end-to-end optimisations:

  • Server side: pre-processing operations are fused to reduce memory allocation overhead and computation
  • Mobile side: integration of FlyCV, the high-performance image processing library developed by Baidu's vision team (see the sketch after this list)
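
On builds compiled with FlyCV support, switching image pre-processing from OpenCV to FlyCV is a one-line opt-in; a minimal sketch, assuming the enable_flycv() helper in the Python vision module:

```python
import fastdeploy as fd

# Opt in to FlyCV for image pre-processing; takes effect for models
# created afterwards, on builds compiled with FlyCV support.
fd.vision.enable_flycv()
```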

Combined with FastDeploy's multi-backend support, the end-to-end performance of all models improves significantly over the original deployment code. The table below shows test data for some of the models.

[Table: end-to-end performance test data for selected models]

Thanks to the following developers for their contributions to FastDeploy! Contributors List
@leiqing1 @jiangjiajun @DefTruth @joey12300 @felixhjh @ziqi-jin @yunyaoXYY @wjj19950828 @heliqi @ZeyuChen @ChaoII @Zheng-Bicheng @wang-xinyu @HexToString @yeliang2258 @WinterGeng @LDOUBLEV @rainyfly @czr-gc @chenqianhe @kiddyjinjin @Zeref996 @TrellixVulnTeam @D-DanielYang @totorolin @hguandl @ChrisKong93 @Xiue233 @jm12138 @triple-Mu @yingshengBD @GodIsBoom @PatchTester @onecatcn