

Stable Diffusion can also compress images: smaller than JPEG, clearer to the naked eye, but don’t try faces



The free and open-source Stable Diffusion has been put to a new use:

This time, it is compressing images.

Stable Diffusion can not only compress an image to a smaller file size, but also visually outperform JPEG and WebP.

Starting from the same original image, the Stable Diffusion version retains more detail and shows fewer compression artifacts.

But Matthias Bühlmann (let’s call him Brother MB), the software engineer who used Stable Diffusion to compress images, also pointed out that the approach has obvious limitations.

It does not handle faces and text well, and sometimes, after the image is decoded and expanded back, it produces features that do not exist in the original image.

For example (the result can be startling):

Left: the original image. Right: the image after Stable Diffusion compression and decompression.

That said, back to the main question:

How does Stable Diffusion compress images?

To explain how Stable Diffusion compresses images, let’s start with some important working principles of Stable Diffusion.

Stable Diffusion is a special type of diffusion model called a latent diffusion model (Latent Diffusion).

Unlike a standard diffusion model, Latent Diffusion runs the diffusion process in a lower-dimensional latent space instead of in the actual pixel space.

In other words, the latent-space representation is a compressed map with lower resolution but higher precision.

Note that an image's resolution and precision are two different things. Resolution describes how much data (how many pixels) an image contains, while precision reflects how closely each value approximates the true value; in the example below, precision is given as the number of channels times the bits per channel.

Take this photo of a camel's head as an example: the original image is 768 KB, with a resolution of 512×512 and a precision of 3×8 bits.

After Stable Diffusion compresses it to 4.98 KB, the resolution drops to 64×64 and the precision rises to 4×32 bits.

Visually, the image reconstructed from Stable Diffusion's compressed representation is not much different from the original.
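To see where these numbers come from, here is a small arithmetic sketch. It assumes the Stable Diffusion v1 VAE, which downsamples each spatial dimension by 8 and produces 4 latent channels; the function names are illustrative only. Stored as 32-bit floats, the 64×64×4 latent alone takes 64 KiB, so getting down to roughly 5 KB relies on the further quantization steps described below.

```python
# Raw storage cost of an image versus a Stable Diffusion latent, in bytes.
# Assumption: SD v1 VAE with 8x spatial downsampling and 4 latent channels.

def raw_bytes(height: int, width: int, channels: int, bits_per_value: int) -> int:
    """Uncompressed size of a height x width x channels array."""
    return height * width * channels * bits_per_value // 8


def latent_shape(height: int, width: int, factor: int = 8, channels: int = 4) -> tuple:
    """Shape of the VAE latent for an image of the given size."""
    return (height // factor, width // factor, channels)


image_size = raw_bytes(512, 512, 3, 8)    # 8-bit RGB image
h, w, c = latent_shape(512, 512)
latent_size = raw_bytes(h, w, c, 32)      # float32 latent

print((h, w, c))                   # (64, 64, 4)
print(image_size // 1024, "KiB")   # 768 KiB
print(latent_size // 1024, "KiB")  # 64 KiB
```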

More specifically, Stable Diffusion's latent diffusion model has three main components:

a VAE (Variational Auto Encoder), a U-Net, and a text encoder.

In this image-compression experiment, however, the text encoder is not used.

Playing the main role is the VAE, which consists of two parts: an encoder and a decoder.

The encoder maps an image from image space to a latent-space representation, and the decoder maps that representation back to an image.

Brother MB found that the VAE decoder is very robust to quantization of the latent representation.

By scaling, clamping, and remapping the latent representation, and thereby quantizing it from floating point to 8-bit unsigned integers, you can get a compressed image with little distortion:

First, quantize the latents to 8-bit unsigned integers; the data size becomes 64×64×4×8 bits = 16 kB (the original image is 512×512×3×8 bits = 768 kB).
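A minimal NumPy sketch of the clamp-and-remap quantization follows. The clamp range `LATENT_MIN`/`LATENT_MAX` is a hypothetical value chosen for illustration, not the range Brother MB used, and the random array merely stands in for a real VAE latent.

```python
import numpy as np

# Hypothetical clamp range; the real range of Stable Diffusion latents depends
# on the model and on any scaling applied before quantization.
LATENT_MIN, LATENT_MAX = -5.0, 5.0


def quantize_latents(latents: np.ndarray, lo: float = LATENT_MIN, hi: float = LATENT_MAX) -> np.ndarray:
    """Clamp float latents to [lo, hi] and linearly remap them to uint8 codes 0..255."""
    clamped = np.clip(latents, lo, hi)
    scaled = (clamped - lo) / (hi - lo)          # now in [0, 1]
    return np.round(scaled * 255).astype(np.uint8)


def dequantize_latents(q: np.ndarray, lo: float = LATENT_MIN, hi: float = LATENT_MAX) -> np.ndarray:
    """Map uint8 codes back to float32 values in [lo, hi] for the VAE decoder."""
    return (q.astype(np.float32) / 255) * (hi - lo) + lo


rng = np.random.default_rng(0)
latents = rng.normal(0.0, 1.0, size=(64, 64, 4)).astype(np.float32)  # stand-in latent

q = quantize_latents(latents)
restored = dequantize_latents(q)
print(q.dtype, q.shape, q.nbytes)  # uint8 (64, 64, 4) 16384

# Within the clamp range, the round-trip error is at most half a quantization step.
step = (LATENT_MAX - LATENT_MIN) / 255
max_err = np.abs(restored - np.clip(latents, LATENT_MIN, LATENT_MAX)).max()
print(max_err <= step / 2 + 1e-4)
```

Clamping first trades a little error on rare outliers for finer resolution across the bulk of the values.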

Then palettization and dithering further reduce the data to about 5 kB while also improving the reconstruction quality.
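The sketch below illustrates one way palettization plus dithering could work on a 64×64×4 uint8 latent. Two choices here are assumptions, not necessarily Brother MB's: the 256-entry palette of 4-D vectors is built with a few rounds of plain k-means, and the dithering is Floyd-Steinberg error diffusion. Storing one 8-bit index per latent pixel plus the palette gives 4096 + 1024 = 5120 bytes, in line with the roughly 5 kB figure.

```python
import numpy as np


def build_palette(pixels: np.ndarray, k: int = 256, iters: int = 10, seed: int = 0) -> np.ndarray:
    """Pick k representative 4-D vectors with a few rounds of plain k-means."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        # Squared distance from every pixel to every center: shape (n, k).
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return np.clip(np.round(centers), 0, 255).astype(np.uint8)


def dither_to_palette(latent: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Floyd-Steinberg: map each pixel to its nearest palette index and push
    the residual error onto neighbours that have not been visited yet."""
    h, w, _ = latent.shape
    work = latent.astype(np.float64)
    pal = palette.astype(np.float64)
    indices = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            old = work[y, x]
            idx = int(((pal - old) ** 2).sum(axis=1).argmin())
            indices[y, x] = idx
            err = old - pal[idx]
            if x + 1 < w:
                work[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    work[y + 1, x - 1] += err * 3 / 16
                work[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    work[y + 1, x + 1] += err * 1 / 16
    return indices


rng = np.random.default_rng(0)
latent_u8 = rng.integers(0, 256, size=(64, 64, 4), dtype=np.uint8)  # stand-in quantized latent

palette = build_palette(latent_u8.reshape(-1, 4))
indices = dither_to_palette(latent_u8, palette)
restored = palette[indices]  # (64, 64, 4) uint8 array handed back to the decoder

print(indices.nbytes + palette.nbytes)  # 5120
```

Dithering spreads each pixel's palette error onto its neighbours, so the average over a region stays closer to the original values than plain nearest-palette mapping would.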

As a rigorous programmer, Brother MB did not rely on the naked eye alone; he also measured image quality quantitatively.

However, judged by two important image-quality metrics, PSNR (peak signal-to-noise ratio) and SSIM (structural similarity), Stable Diffusion's compression results are not much better than JPEG and WebP.
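For reference, PSNR compares two images through their mean squared error: PSNR = 10·log10(MAX² / MSE), where MAX is 255 for 8-bit images. Below is a minimal NumPy version; the synthetic test images are purely illustrative. SSIM, which compares local means, variances, and covariances over sliding windows, is more involved and is omitted here.

```python
import numpy as np


def psnr(reference: np.ndarray, test: np.ndarray, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10 * np.log10(max_value ** 2 / mse))


original = np.full((512, 512, 3), 128, dtype=np.uint8)
noisy = original.copy()
noisy[::2] += 10  # every other row off by 10, so MSE = 50
print(round(psnr(original, noisy), 2))  # 31.14
```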

In addition, when the latent representation is decoded back up to the original resolution, the main features of the image remain visible, but the VAE also fills in high-resolution detail for these pixel values.

In plain terms, the reconstructed image often differs from the original and contains many newly generated, bizarre features.

Let’s review this image again:

Although using Stable Diffusion to compress images does have many problems, in Brother MB's words, the results are still amazing and very promising.

Brother MB has now put the relevant code on Google Colab, so interested readers can take a closer look.



This article is from the WeChat public account Qubit (ID: QbitAI). Author: Alex
