pharmapsychotic committed on
Commit 0e13e08
1 Parent(s): c58b854

Updated README

Files changed (2)
  1. README.md +22 -10
  2. examples.jpg +0 -0
README.md CHANGED
@@ -1,9 +1,11 @@
 ---
 license: openrail++
+base_model: stabilityai/stable-diffusion-xl-base-1.0
 language:
 - en
 tags:
 - stable-diffusion
+- stable-diffusion-xl
 - tensorrt
 - text-to-image
 ---
@@ -13,19 +15,29 @@ tags:
 ### Introduction
 This repository contains Stable Diffusion XL 1.0 ONNX models compatible with TensorRT.
 
-Source models:
-- [SDXL base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
-- [SDXL refiner 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0)
+See [SDXL base 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) and [SDXL refiner 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-refiner-1.0) for the source models.
 
+![examples](./examples.jpg)
 
-### Benchmark
 
-Timings at 1024x1024
-| | CLIP | UNet 40 steps | VAE decode | Pipline | Throughput |
-|------|---------|---------------|------------|-------------|--------------|
-| A10 | 8.98 ms | 12576.53 ms | 0.00 ms | 12588.26 ms | 0.08 image/s |
-| A100 | 5.99 ms | 3358.87 ms | 0.00 ms | 3367.04 ms | 0.30 image/s |
-| H100 | 4.70 ms | 1772.29 ms | 0.00 ms | 1779.01 ms | 0.56 image/s |
+### Performance Comparison
+
+Timings for 30 steps at 1024x1024
+
+| Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement |
+|-------------|--------------------------|-----------------------------|------------------------|
+| A10 | 9399 ms | 8160 ms | ~13% |
+| A100 | 3704 ms | 2742 ms | ~26% |
+| H100 | 2496 ms | 1471 ms | ~41% |
+
+Image throughput for 30 steps
+
+| Accelerator | Baseline (non-optimized) | NVIDIA TensorRT (optimized) | Percentage improvement |
+|-------------|--------------------------|-----------------------------|------------------------|
+| A10 | 0.10 images/sec | 0.12 images/sec | ~20% |
+| A100 | 0.27 images/sec | 0.36 images/sec | ~33% |
+| H100 | 0.40 images/sec | 0.68 images/sec | ~70% |
+
 
 ### Model Description
 
examples.jpg ADDED
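The two "Percentage improvement" columns in the new Performance Comparison section run in opposite directions. For timings, lower is better, so the figure is the share of baseline time saved. For throughput, higher is better, so it is the gain over the baseline rate. The following is a minimal Python sketch that recomputes the rounded percentages from the table values. The dictionary names and helper functions are illustrative only and are not part of this repository.

```python
"""Recompute the 'Percentage improvement' columns of the README's
Performance Comparison tables (values copied from this commit)."""

# Latency for 30 steps at 1024x1024, in ms: (baseline, TensorRT)
TIMINGS_MS = {
    "A10": (9399, 8160),
    "A100": (3704, 2742),
    "H100": (2496, 1471),
}

# Image throughput for 30 steps, in images/sec: (baseline, TensorRT)
THROUGHPUT_IPS = {
    "A10": (0.10, 0.12),
    "A100": (0.27, 0.36),
    "H100": (0.40, 0.68),
}


def latency_reduction_pct(baseline_ms: float, optimized_ms: float) -> float:
    """Lower is better: percentage of the baseline time that is saved."""
    return (baseline_ms - optimized_ms) / baseline_ms * 100


def throughput_gain_pct(baseline_ips: float, optimized_ips: float) -> float:
    """Higher is better: percentage increase over the baseline rate."""
    return (optimized_ips - baseline_ips) / baseline_ips * 100


if __name__ == "__main__":
    for gpu, (base, trt) in TIMINGS_MS.items():
        print(f"{gpu}: latency reduced by ~{latency_reduction_pct(base, trt):.0f}%")
    for gpu, (base, trt) in THROUGHPUT_IPS.items():
        print(f"{gpu}: throughput increased by ~{throughput_gain_pct(base, trt):.0f}%")
```

Because one column measures a reduction and the other an increase, the two percentages for the same GPU differ in magnitude. For example, H100 shows ~41% for time but ~70% for throughput.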