OpenVINO Stable Diffusion (with LoRA) C++ Image Generation Pipeline

This pure C++ text-to-image pipeline runs Stable Diffusion v1.5 with the LMS Discrete Scheduler through the native OpenVINO C++ API and supports both static and dynamic model inference. It also supports LoRA weights in safetensors format and tokenization via OpenVINO Tokenizers, which is enabled by loading openvino_tokenizers into ov::Core. The sample uses diffusers for image generation and imwrite for saving .bmp images. This demo has been tested on Windows and Unix platforms. There is also a Jupyter notebook that provides an example of image generation in Python.

Note

This tutorial assumes that the current working directory is <openvino.genai repo>/image_generation/stable_diffusion_1_5/cpp/ and all paths are relative to this folder.

Step 1: Prepare Build Environment

Prerequisites:

C++ Packages:

Prepare a Python environment and install dependencies:

conda create -n openvino_sd_cpp python==3.10
conda activate openvino_sd_cpp
conda install -c conda-forge openvino=2024.2.0 c-compiler cxx-compiler git make cmake
# Ensure that Conda standard libraries are used
conda env config vars set LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH

Step 2: Obtain Stable Diffusion Model

  1. Install dependencies to import models from HuggingFace:

    git submodule update --init
    # Reactivate Conda environment after installing dependencies and setting env vars
    conda activate openvino_sd_cpp
    python -m pip install -r ../../requirements.txt
    python -m pip install ../../../thirdparty/openvino_tokenizers/[transformers]
  2. Download the model from Hugging Face and convert it to OpenVINO IR via the optimum-intel CLI.

    Example models to download:

    Example command for downloading dreamlike-art/dreamlike-anime-1.0 model and exporting it with FP16 precision:

    optimum-cli export openvino --model dreamlike-art/dreamlike-anime-1.0 --task stable-diffusion --weight-format fp16 models/dreamlike_anime_1_0_ov/FP16

    You can also choose a different precision and export an FP32 or INT8 model.

    Refer to the official websites for 🤗 Optimum and optimum-intel for more details.

    If https://huggingface.co/ is down, the script won't be able to download the model.

Note

The pipeline currently supports batch size 1 only, i.e. a static model with input shape (1, 3, 512, 512).

(Optional) Enable LoRA Weights with Safetensors

Low-Rank Adaptation (LoRA) is a technique introduced to reduce the cost of fine-tuning diffusion models and Large Language Models (LLMs). In the case of Stable Diffusion fine-tuning, LoRA can be applied to the cross-attention layers that relate the image representations to the prompts that describe them.

LoRA weights can be enabled for the UNet model of the Stable Diffusion pipeline to generate images with different styles.

In this sample LoRA weights are used in safetensors format. Safetensors is a serialization format developed by Hugging Face that is specifically designed for efficiently storing and loading large tensors. It provides a lightweight and efficient way to serialize tensors, making it easier to store and load machine learning models.
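As a rough illustration of the file layout, a safetensors file begins with an 8-byte little-endian length, followed by a JSON header of that many bytes that lists each tensor's name, dtype, shape and data offsets; the raw tensor bytes come after the header. The sketch below only extracts that header from an in-memory buffer. The function names are hypothetical, and this is not the code in safetensors.h that the sample uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads the 8-byte little-endian header length at the start of the buffer.
// (Hypothetical helper for illustration; not part of safetensors.h.)
std::uint64_t read_header_size(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() >= 8 && "buffer must hold the 8-byte length prefix");
    std::uint64_t n = 0;
    // Walk from the most significant byte (index 7) down to index 0.
    for (int i = 7; i >= 0; --i) {
        n = (n << 8) | bytes[static_cast<std::size_t>(i)];
    }
    return n;
}

// Returns the JSON header text that follows the length prefix.
std::string read_header_json(const std::vector<std::uint8_t>& bytes) {
    const std::uint64_t n = read_header_size(bytes);
    assert(bytes.size() - 8 >= n && "buffer is shorter than the declared header");
    return std::string(bytes.begin() + 8,
                       bytes.begin() + 8 + static_cast<std::ptrdiff_t>(n));
}
```

A real loader would then parse the JSON and use the recorded offsets to locate each tensor in the data section; that part is omitted here.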

The LoRA safetensors model is loaded via safetensors.h. The layer names and weights are processed with the Eigen library and inserted into the SD models with ov::pass::MatcherPass; see common/diffusers/src/lora.cpp.
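Conceptually, applying a LoRA layer amounts to adding a scaled low-rank product to a base weight matrix: W' = W + alpha * (up * down). The following is a minimal sketch of that arithmetic using plain std::vector rather than Eigen. The Matrix type and merge_lora function are hypothetical names for illustration, the alpha parameter is assumed to play the role of the -a, --alpha option, and any additional per-layer scaling stored in a LoRA file is not modeled. It is not the implementation in lora.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix; a hypothetical helper type for this sketch only.
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;  // rows * cols values

    float& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Merges a low-rank update into a base weight in place: W += alpha * (up * down).
// weight is (out x in), up is (out x rank), down is (rank x in).
void merge_lora(Matrix& weight, const Matrix& up, const Matrix& down, float alpha) {
    assert(up.rows == weight.rows && down.cols == weight.cols);
    assert(up.cols == down.rows);
    for (std::size_t i = 0; i < weight.rows; ++i) {
        for (std::size_t j = 0; j < weight.cols; ++j) {
            float delta = 0.0f;
            // Dot product of row i of up with column j of down.
            for (std::size_t k = 0; k < up.cols; ++k) {
                delta += up.at(i, k) * down.at(k, j);
            }
            weight.at(i, j) += alpha * delta;
        }
    }
}
```

With alpha set to 0 the base weights are left unchanged, and larger values blend in more of the LoRA style.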

Various LoRA models are available on https://civitai.com/tag/lora and on HuggingFace, so you can choose your own LoRA model in safetensors format. For example, you can use the LoRA soulcard model. Download the LoRA safetensors model and put it into the models directory. When running the built sample, provide the path to the LoRA model with the -l, --loraPath argument.

Step 3: Build the SD Application

conda activate openvino_sd_cpp
cmake -DCMAKE_BUILD_TYPE=Release -S . -B build
cmake --build build --parallel

Step 4: Run Pipeline

./build/stable_diffusion [-p <posPrompt>] [-n <negPrompt>] [-s <seed>] [--height <output image>] [--width <output image>] [-d <device>] [-r <readNPLatent>] [-l <lora.safetensors>] [-a <alpha>] [-h <help>] [-m <modelPath>] [-t <modelType>] [--guidanceScale <guidanceScale>] [--dynamic]

Usage:
  stable_diffusion [OPTION...]
  • -p, --posPrompt arg Initial positive prompt for SD (default: "cyberpunk cityscape like Tokyo New York with tall buildings at dusk golden hour cinematic lighting")
  • -n, --negPrompt arg The prompt to guide the image generation away from. Ignored when not using guidance (--guidanceScale is less than 1) (default: "")
  • -d, --device arg AUTO, CPU, or GPU. Doesn't apply to Tokenizer model, OpenVINO Tokenizers can be inferred on a CPU device only (default: CPU)
  • --step arg Number of diffusion steps (default: 20)
  • -s, --seed arg Random seed used to generate the latent (default: 42)
  • --guidanceScale arg A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality (default: 7.5)
  • --num arg Number of output images (default: 1)
  • --height arg Height of output image (default: 512)
  • --width arg Width of output image (default: 512)
  • -c, --useCache Use model caching
  • -r, --readNPLatent Read numpy generated latents from file
  • -m, --modelPath arg Specify path of SD model IR (default: ../models/dreamlike_anime_1_0_ov)
  • -t, --type arg Specify the type of SD model IRs (FP32, FP16 or INT8) (default: FP16)
  • --dynamic Specify the model input shape to use dynamic shape
  • -l, --loraPath arg Specify path of lora file. (*.safetensors). (default: )
  • -a, --alpha arg alpha for lora (default: 0.75)
  • -h, --help Print usage

Note

The tokenizer model will always be loaded to CPU: OpenVINO Tokenizers can be inferred on a CPU device only.

Examples

Positive prompt: cyberpunk cityscape like Tokyo New York with tall buildings at dusk golden hour cinematic lighting

Negative prompt: (empty, check the Notes for details)

To read a numpy-generated latent from file instead of generating it with the C++ standard library, and so align the results with the Python pipeline, use the -r, --readNPLatent argument.

  • Generate an image without LoRA: ./build/stable_diffusion -r

  • Generate an image with the soulcard LoRA: ./build/stable_diffusion -r -l path/to/soulcard.safetensors

  • Generate different size image with dynamic model (C++ lib generated latent): ./build/stable_diffusion -m ./models/dreamlike_anime_1_0_ov -t FP16 --dynamic --height 448 --width 704

Notes

Generation quality is sensitive to the negative prompt and to random latent generation. Latents generated in C++ with MT19937 differ from those produced by numpy.random.randn(), so use -r, --readNPLatent to align the results with Python (the latent file is for 512x512 output images only).
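To make the difference concrete, here is a minimal sketch of seeded latent generation with the C++ standard library. It assumes the usual Stable Diffusion v1.5 latent layout of 4 channels at 1/8 of the image resolution and uses std::normal_distribution. The function name make_latent is hypothetical, and whether the sample draws its samples exactly this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Samples an initial latent of shape (1, 4, height / 8, width / 8) from N(0, 1).
// (Hypothetical helper; the sample's own generation code may differ.)
std::vector<float> make_latent(std::uint32_t seed, std::size_t height, std::size_t width) {
    assert(height % 8 == 0 && width % 8 == 0);
    const std::size_t count = 4 * (height / 8) * (width / 8);
    std::mt19937 gen(seed);  // Mersenne Twister (MT19937) engine
    std::normal_distribution<float> normal(0.0f, 1.0f);
    std::vector<float> latent(count);
    for (float& v : latent) {
        v = normal(gen);
    }
    return latent;
}
```

The same seed reproduces the same latent within one build, but the way std::normal_distribution turns raw engine output into normal samples is implementation-defined, so the values need not match numpy even for an identical seed. Reading the numpy latent from file sidesteps this.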

Guidance Scale

Guidance scale controls how closely the generated image follows the prompt. A higher guidance scale makes the model follow the prompt more strictly; a lower guidance scale gives the model more creative freedom. guidance_scale increases adherence to the conditional signal that guides the generation (text, in this case) as well as overall sample quality. This technique is also known as classifier-free guidance.
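In classifier-free guidance, each denoising step combines two noise predictions, one conditioned on the positive prompt and one on the negative (or empty) prompt, as uncond + scale * (text - uncond). The sketch below shows that combination element-wise; apply_guidance and its parameter names are hypothetical and are not taken from the sample's source.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Classifier-free guidance: uncond + scale * (text - uncond), element-wise.
// noise_uncond: prediction for the negative (or empty) prompt.
// noise_text:   prediction for the positive prompt.
std::vector<float> apply_guidance(const std::vector<float>& noise_uncond,
                                  const std::vector<float>& noise_text,
                                  float guidance_scale) {
    assert(noise_uncond.size() == noise_text.size());
    std::vector<float> out(noise_text.size());
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = noise_uncond[i] + guidance_scale * (noise_text[i] - noise_uncond[i]);
    }
    return out;
}
```

At a scale of exactly 1 the expression reduces to the positive-prompt prediction alone, so the negative prompt has no influence; it starts to matter only as the scale grows beyond 1.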

Negative Prompt

To improve image generation quality, the model supports negative prompting. Technically, the positive prompt steers the diffusion toward the images associated with it, while the negative prompt steers the diffusion away from it. In other words, the negative prompt declares concepts that are undesired in the generated image. For example, if we want a colorful and bright image, a grayscale image is a result we want to avoid, so "gray scale" can be used as the negative prompt. The positive and negative prompts are on an equal footing: you can always use one with or without the other. More explanation of how it works can be found in this article.

Note

Negative prompting is applicable only when the guidance scale is greater than 1.

LoRA Weights Enabling

Refer to the OpenVINO blog to get more information on enabling LoRA weights.