Tuesday, May 28, 2024

Tensor Quantization: The Untold Story | by Dhruv Matani | Sep, 2023



Let’s begin with a very simple example and build it up.

Example-1: Symmetric uint8 quantization

Let’s say we wish to map the floating point range [0.0 .. 1000.0] to the quantized range [0 .. 255]. The range [0 .. 255] is the set of values that can fit in an unsigned 8-bit integer.

To perform this transformation, we want to rescale the floating point range so that the following is true:

Floating point 0.0 = Quantized 0

Floating point 1000.0 = Quantized 255

This is called symmetric quantization because the floating point 0.0 maps to quantized 0.

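Hence, we define a scale, which is equal to

scale = (xmax - xmin) / (qmax - qmin)

Where,

xmin = 0.0 and xmax = 1000.0 are the limits of the floating point range
qmin = 0 and qmax = 255 are the limits of the quantized range

In this case, scale = 1000.0 / 255 ≈ 3.9215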

To convert from a floating point value to a quantized value, we can simply divide the floating point value by the scale. For example, the floating point value 500.0 corresponds to the quantized value 500.0 / 3.9215 ≈ 127.5, which is then rounded to an integer in the quantized range.
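The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration: the function names `compute_scale` and `quantize_symmetric` are made up for this sketch, and the rounding and clamping behavior is an assumption, since the text above only says to divide by the scale.

```python
def compute_scale(x_min, x_max, q_min, q_max):
    """Width of the floating point range covered by one quantized step."""
    return (x_max - x_min) / (q_max - q_min)


def quantize_symmetric(x, scale, q_min=0, q_max=255):
    """Divide by the scale, round, and clamp into [q_min, q_max]."""
    # Assumption: Python's built-in round(), which sends exact ties to the
    # nearest even integer. Real quantizers may use a different rounding mode.
    q = round(x / scale)
    return max(q_min, min(q_max, q))


scale = compute_scale(0.0, 1000.0, 0, 255)
q_low = quantize_symmetric(0.0, scale)      # lower end of the float range
q_high = quantize_symmetric(1000.0, scale)  # upper end of the float range
q_mid = quantize_symmetric(500.0, scale)    # lands near the middle of [0 .. 255]
print(scale, q_low, q_high, q_mid)
```

Note that 500.0 sits almost exactly halfway between two quantized values, so whether it ends up as 127 or 128 depends on the rounding mode and on floating point error.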

In this simple example, the 0.0 of the floating point range maps exactly to the 0 in the quantized range. This is called symmetric quantization. Let’s see what happens when this isn’t the case.

Example-2: Affine uint8 quantization

Let’s say we wish to map the floating point range [-20.0 .. 1000.0] to the quantized range [0 .. 255].

In this case, we have a different scaling factor since our xmin is different.

scale = (1000.0 - (-20.0)) / (255 - 0) = 1020.0 / 255 = 4.0

Let’s see what the floating point number 0.0 is represented by in the quantized range if we apply the scaling factor to 0.0:

0.0 / 4.0 = 0

Well, this doesn’t quite seem right since, according to the mapping we set up above, we’d have expected the floating point value -20.0 to map to the quantized value 0.

This is where the concept of zero-point comes in. The zero-point acts as a bias for shifting the scaled floating point value and corresponds to the value in the quantized range that represents the floating point value 0.0. In our case, the scaled representation of -20.0 is -20.0 / 4.0 = -5, and the zero-point is the negative of that, which is -(-5) = 5. The zero-point is always the negative of the scaled representation of the minimum floating point value, since the minimum will always be negative or zero. We’ll find out more about why this is the case in the section that explains example 4.

Whenever we quantize a value, we always add the zero-point to the scaled value to get the actual quantized value in the valid quantization range. If we wish to quantize the value -20.0, we compute it as the scaled value of -20.0 plus the zero-point, which is -5 + 5 = 0. Hence, quantized(-20.0, scale=4, zp=5) = 0.
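Here is a minimal sketch of the affine scheme just described, assuming round-to-nearest and clamping to the quantized range. The helper names (`affine_params`, `quantize`, `dequantize`) are hypothetical and chosen for this illustration only.

```python
def affine_params(x_min, x_max, q_min, q_max):
    """Return (scale, zero_point) for mapping [x_min, x_max] onto [q_min, q_max]."""
    scale = (x_max - x_min) / (q_max - q_min)
    # The zero-point shifts the scaled minimum onto q_min. For q_min = 0 this
    # is simply the negative of the scaled minimum, as described in the text.
    zero_point = q_min - round(x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, q_min, q_max):
    """Scale, round, add the zero-point, then clamp (clamping is an assumption)."""
    q = round(x / scale) + zero_point
    return max(q_min, min(q_max, q))


def dequantize(q, scale, zero_point):
    """Approximate inverse: remove the zero-point, then multiply by the scale."""
    return (q - zero_point) * scale


scale, zp = affine_params(-20.0, 1000.0, 0, 255)
print(scale, zp)                            # 4.0 5
print(quantize(-20.0, scale, zp, 0, 255))   # 0
print(quantize(0.0, scale, zp, 0, 255))     # 5
print(quantize(1000.0, scale, zp, 0, 255))  # 255
print(dequantize(zp, scale, zp))            # 0.0
```

The last line shows the property that matters most here: the quantized value equal to the zero-point converts back to exactly 0.0.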

Example-3: Affine int8 quantization

What happens if our quantized range is a signed 8-bit integer instead of an unsigned 8-bit integer? Well, the range is now [-128 .. 127].

In this case, -20.0 in the float range maps to -128 in the quantized range, and 1000.0 in the float range maps to 127 in the quantized range.

The way we calculate the zero-point is to compute it as if the quantized range were [0 .. 255] and then offset it by -128, so the zero-point in the new range is

5 - 128 = -123

Hence, the zero-point for the new range is -123.
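The offsetting step can be checked with a short sketch. It follows the recipe in the text (compute the uint8 zero-point, then shift it by -128); the function names are invented for this illustration, and round-to-nearest plus clamping are assumptions.

```python
def zero_point_uint8(x_min, x_max):
    """Zero-point for the quantized range [0 .. 255]."""
    scale = (x_max - x_min) / 255
    return -round(x_min / scale)


def zero_point_int8(x_min, x_max):
    """Same zero-point, shifted by -128 into the range [-128 .. 127]."""
    return zero_point_uint8(x_min, x_max) - 128


def quantize_int8(x, x_min, x_max):
    """Quantize x to int8 using the shifted zero-point."""
    scale = (x_max - x_min) / 255  # 256 levels in both ranges, so the scale is unchanged
    q = round(x / scale) + zero_point_int8(x_min, x_max)
    return max(-128, min(127, q))


print(zero_point_int8(-20.0, 1000.0))        # -123
print(quantize_int8(-20.0, -20.0, 1000.0))   # -128
print(quantize_int8(0.0, -20.0, 1000.0))     # -123
print(quantize_int8(1000.0, -20.0, 1000.0))  # 127
```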

So far, we’ve looked at examples where the floating point range includes the value 0.0. In the next set of examples, we’ll take a look at what happens when the floating point range doesn’t include the value 0.0.

The importance of 0.0

Why is it important for the floating point value 0.0 to be represented in the floating point range?

When using a padded convolution, we expect the border pixels to be padded using the value 0.0 in the most common case. Hence, it’s important for 0.0 to be represented in the floating point range. Similarly, if the value X is going to be used for padding in your network, you need to make sure that the value X is represented in the floating point range and that quantization is aware of this.

Example-4: The untold story (skewed floating point range)

Now, let’s take a look at what happens if 0.0 isn’t part of the floating point range.

In this example, we’re trying to quantize the floating point range [40.0 .. 1000.0] into the quantized range [0 .. 255].

Since we can’t represent the value 0.0 in the floating point range, we need to extend the lower limit of the range to 0.0. The extended range is [0.0 .. 1000.0], so the scale is again 1000.0 / 255 ≈ 3.9215.

Some part of the quantized range is now wasted. To determine how much, let’s compute the quantized value that the floating point value 40.0 maps to.

40.0 / 3.9215 ≈ 10.2, which rounds to 10

Hence, we’re wasting the range [0 .. 9] in the quantized range, which is about 3.92% of the range. This could significantly affect the model’s accuracy post-quantization.
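The wasted portion can be computed for any floating point range with a sketch like the one below. The function name `wasted_codes` and its return layout are assumptions made for this illustration; the fraction is taken relative to 255 so that it matches the figure quoted above.

```python
def wasted_codes(x_min, x_max, q_min=0, q_max=255):
    """Return (lowest used quantized value, number of unused low values, unused fraction)."""
    # Extend the float range so that it always contains 0.0.
    ext_min = min(x_min, 0.0)
    ext_max = max(x_max, 0.0)
    scale = (ext_max - ext_min) / (q_max - q_min)
    zero_point = q_min - round(ext_min / scale)
    # Quantized value that the real minimum maps to.
    lowest_used = round(x_min / scale) + zero_point
    wasted = lowest_used - q_min
    return lowest_used, wasted, wasted / (q_max - q_min)


lowest, wasted, fraction = wasted_codes(40.0, 1000.0)
print(lowest, wasted, f"{fraction:.2%}")  # 10 10 3.92%
```

This sketch only counts values wasted at the low end, which is the case discussed here (a range that lies entirely above 0.0).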

This skewing is necessary if we wish to make sure that the value 0.0 in the floating point range can be represented in the quantized range.

Another reason for including the value 0.0 in the floating point range is that it is very valuable to be able to efficiently check whether a quantized value represents 0.0 in the floating point range. Think of operators such as ReLU, which clip all values below 0.0 in the floating point range to 0.0.

It is important for us to be able to represent the zero-point using the same data type (signed or unsigned int8) as the quantized values. This allows us to perform these comparisons quickly and efficiently.
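To make the ReLU point concrete, here is a rough sketch of how the clipping could be done directly on quantized values. It assumes the input and output share the same scale and zero-point (scale = 4.0 and zero-point = 5, as in example 2); the function names are hypothetical.

```python
def quantized_relu(q_values, zero_point):
    """ReLU on quantized values: anything below the zero-point represents a
    negative float, so it is clipped up to the zero-point (which represents 0.0)."""
    return [max(q, zero_point) for q in q_values]


def dequantize(q, scale, zero_point):
    """Convert a quantized value back to floating point."""
    return (q - zero_point) * scale


scale, zp = 4.0, 5
q_in = [0, 3, 5, 130, 255]  # represents -20.0, -8.0, 0.0, 500.0, 1000.0
q_out = quantized_relu(q_in, zp)
print(q_out)                                     # [5, 5, 5, 130, 255]
print([dequantize(q, scale, zp) for q in q_out])  # [0.0, 0.0, 0.0, 500.0, 1000.0]
```

The whole operation is one integer comparison per element, which is only possible because the zero-point is an integer of the same type as the quantized values.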

Next, let’s take a look at how activation normalization helps with model quantization. We’ll specifically focus on how the standardization of the activation values allows us to use the entire quantized range effectively.


