Ridged mirror

Q is a fixed point number format where the number of fractional bits (and optionally the number of integer bits) is specified. For example, a Q15 number has 15 fractional bits; a Q1.14 number has 1 integer bit and 14 fractional bits. Q format is often used in hardware that does not have a floating-point unit and in applications that require constant resolution.

Characteristics

Q format numbers are notionally fixed-point numbers, but Q format is only a convention for interpreting stored bits: the values are stored and operated upon as regular binary numbers (i.e. signed integers), which allows standard integer hardware/ALU to perform rational number calculations. The number of integer bits, the number of fractional bits, and the underlying word size are chosen by the programmer on an application-specific basis, depending on the range and resolution needed for the numbers. The machine itself remains oblivious to the notional fixed-point representation being employed; it merely performs integer arithmetic the way it knows how. Ensuring that the computational results are valid in the Q format representation is the responsibility of the programmer.

The Q notation is written as Qm.n, where:

Q designates that the number is in the Q format notation — the Texas Instruments representation for signed fixed-point numbers (the "Q" being reminiscent of the standard symbol for the set of rational numbers).
m is the number of bits set aside to designate the two's complement integer portion of the number, exclusive of the sign bit (therefore if m is not specified it is taken as zero).
n is the number of bits used to designate the fractional portion of the number, i.e. the number of bits to the right of the binary point. (If n = 0, the Q numbers are integers — the degenerate case).

Note that the most significant bit is always designated as the sign bit (the number is stored as a two's complement number) in order to allow standard arithmetic-logic hardware to manipulate Q numbers. Representing a signed fixed-point data type in Q format therefore always requires m+n+1 bits to account for the sign bit. Hence the smallest machine word size required to accommodate a Qm.n number is m+n+1, with the Q number left justified in the machine word.

For a given Qm.n format, using an m+n+1 bit signed integer container with n fractional bits:

its range is [-2^m, 2^m - 2^-n]
its resolution is 2^-n

For example, a Q14.1 format number:

requires 14+1+1 = 16 bits
its range is [-2¹⁴, 2¹⁴ - 2⁻¹] = [-16384.0, +16383.5], stored as the 16-bit two's complement integers 0x8000 (lowest), 0x8001 … 0xFFFF, 0x0000, 0x0001 … 0x7FFE, 0x7FFF (highest)
its resolution is 2⁻¹ = 0.5
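
The Q14.1 figures above can be checked with a few lines of C. This is only a minimal sketch: the helper names `q_word_bits`, `q_min`, `q_max` and `q_resolution` are hypothetical (not part of any standard library), and m and n are assumed small enough that 2^m and 2^n fit in a 64-bit integer.

```c
#include <stdint.h>

/* Hypothetical helpers for a signed Qm.n format (illustrative names only). */

/* Smallest word size in bits: m integer bits, n fractional bits, 1 sign bit. */
int q_word_bits(int m, int n) { return m + n + 1; }

/* Resolution: 2^-n. */
double q_resolution(int n) { return 1.0 / (double)((int64_t)1 << n); }

/* Lower bound of the range: -2^m. */
double q_min(int m) { return -(double)((int64_t)1 << m); }

/* Upper bound of the range: 2^m - 2^-n. */
double q_max(int m, int n) { return (double)((int64_t)1 << m) - q_resolution(n); }
```

Calling these with m = 14 and n = 1 reproduces the word size, range, and resolution listed for Q14.1.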

Unlike floating point numbers, the resolution of Q numbers will remain constant over the entire range.

Conversion

Float to Q

To convert a number from floating point to Qm.n format:

Multiply the floating point number by 2ⁿ
Round to the nearest integer
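
The two steps above might be sketched in C as follows. The function name `float_to_q` is hypothetical, the tie rule (halves rounded away from zero) is an assumption because the steps do not specify one, and no range check is made, so the caller must ensure the value fits the chosen format.

```c
#include <stdint.h>

/* Hypothetical helper: convert x to a Q format with n fractional bits.
   Assumes the scaled value fits in int32_t; no range check or saturation. */
int32_t float_to_q(double x, int n)
{
    /* Step 1: multiply by 2^n. */
    double scaled = x * (double)((int64_t)1 << n);

    /* Step 2: round to the nearest integer.
       Ties go away from zero (an assumed tie rule). */
    return (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
}
```

For instance, `float_to_q(1.5, 8)` gives 384, the stored value of 1.5 in Q8.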

Q to float

To convert a number from Qm.n format to floating point:

Convert the number to floating point as if it were an integer
Multiply by 2⁻ⁿ
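
A matching sketch for the reverse direction, again with a hypothetical function name (`q_to_float`). Dividing by 2ⁿ is used here in place of multiplying by 2⁻ⁿ; the two are equivalent.

```c
#include <stdint.h>

/* Hypothetical helper: interpret the stored integer q as a value
   with n fractional bits. */
double q_to_float(int32_t q, int n)
{
    /* Convert as if it were an integer, then scale by 2^-n. */
    return (double)q / (double)((int64_t)1 << n);
}
```

For instance, `q_to_float(384, 8)` gives 1.5.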

Math operations

Q numbers are a ratio of two integers: the numerator is kept in storage, the denominator is equal to 2ⁿ.

Consider the following example:

The Q8 denominator equals 2⁸ = 256

1.5 equals 384/256

384 is stored, 256 is inferred because it is a Q8 number.

If the Q number's base is to be maintained (n remains constant), the math operations must keep the denominator constant. The following formulas show math operations on two general Q numbers with stored numerators $N_{1}$ and $N_{2}$ and common denominator $d$.

$$
\begin{aligned}
\frac{N_{1}}{d} + \frac{N_{2}}{d} & = \frac{N_{1} + N_{2}}{d} \\
\frac{N_{1}}{d} - \frac{N_{2}}{d} & = \frac{N_{1} - N_{2}}{d} \\
\left( \frac{N_{1}}{d} \times \frac{N_{2}}{d} \right) \times d & = \frac{N_{1} \times N_{2}}{d} \\
\left( \frac{N_{1}}{d} / \frac{N_{2}}{d} \right) / d & = \frac{N_{1} / N_{2}}{d}
\end{aligned}
$$
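
As a worked illustration in Q8 (so $d = 256$), take 1.5 and 2.5, with stored numerators $N_{1} = 384$ and $N_{2} = 640$; the value 2.5 is chosen here purely as an example. In each case the result is re-expressed over the same denominator $d$, so the stored numerator is $N_{1} \times N_{2} / d$ for the product and $N_{1} \times d / N_{2}$ for the quotient, rounded to an integer:

$$
\begin{aligned}
1.5 \times 2.5 & = \frac{384}{256} \times \frac{640}{256} = \frac{384 \times 640 / 256}{256} = \frac{960}{256} = 3.75 \\
1.5 / 2.5 & = \frac{384}{256} / \frac{640}{256} = \frac{384 \times 256 / 640}{256} = \frac{153.6}{256} \approx \frac{154}{256} \approx 0.6016
\end{aligned}
$$

The product is exact, while the quotient (true value 0.6) is off by less than the Q8 resolution of $2^{-8}$.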

Because the denominator is a power of two, the multiplication by $d$ can be implemented as an arithmetic shift to the left and the division by $d$ as an arithmetic shift to the right; on many processors shifts are faster than multiplication and division.

To maintain accuracy, the intermediate multiplication and division results must be held at double width (for example, a 32-bit intermediate for 16-bit operands), and care must be taken in rounding the intermediate result before converting back to the desired Q number.

Using C, the operations are as follows (here, Q refers to the number of fractional bits):

Addition

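int16_t a, b, result;  // fixed-width type from <stdint.h>; all three share the same Q format
result = a + b;        // denominators match, so the stored integers add directly (overflow is not checked)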

Subtraction

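int16_t a, b, result;  // fixed-width type from <stdint.h>; all three share the same Q format
result = a - b;        // denominators match, so the stored integers subtract directly (overflow is not checked)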

Multiplication

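#include <stdint.h>

// precomputed value: half of one unit in the last place, used for rounding
#define K   (1 << (Q - 1))

int16_t  a, b, result;
int32_t  temp;               // twice the operand width, so the product cannot overflow
temp = (int32_t)a * (int32_t)b; // result type is operand's type, so widen before multiplying
// Rounding; mid values are rounded up
temp += K;
// Correct by dividing by base.
// Right-shifting a negative value is implementation-defined in C; an arithmetic shift is assumed.
result = (int16_t)(temp >> Q);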

Division

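int16_t  a, b, result;       // fixed-width types from <stdint.h>; b must be nonzero
int32_t  temp;
// pre-multiply by the base (for Q = 8 this upscales a to Q16 so that the result will be in Q8 format)
temp = (int32_t)a * (1 << Q); // multiply rather than shift: left-shifting a negative value is undefined in C
// So the result will be rounded: add half the divisor when temp and b have the same sign,
// otherwise subtract it, so that mid values are rounded away from zero.
if ((temp >= 0) == (b >= 0)) {
    temp += b / 2;
} else {
    temp -= b / 2;
}
result = (int16_t)(temp / b);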
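
For experimentation, the multiplication and division fragments can be wrapped into callable functions. The following is a minimal sketch under stated assumptions rather than a reference implementation: the names `q_mul` and `q_div` are hypothetical, Q is fixed at 8, operands are 16-bit with a 32-bit intermediate, the compiler is assumed to implement `>>` on negative values as an arithmetic shift, and the division adjusts its rounding term by sign (a choice made for this sketch). Results that do not fit in 16 bits are not detected.

```c
#include <stdint.h>

#define Q 8                  /* number of fractional bits (assumed for this sketch) */
#define K (1 << (Q - 1))     /* rounding constant: half of one unit in the last place */

/* Hypothetical helper: Q8 multiply using a 32-bit intermediate. */
int16_t q_mul(int16_t a, int16_t b)
{
    int32_t temp = (int32_t)a * (int32_t)b;  /* widen before multiplying */
    temp += K;                               /* mid values are rounded up */
    /* Divide by the base; assumes an arithmetic right shift for negative temp. */
    return (int16_t)(temp >> Q);
}

/* Hypothetical helper: Q8 divide using a 32-bit intermediate; b must be nonzero. */
int16_t q_div(int16_t a, int16_t b)
{
    int32_t temp = (int32_t)a * (1 << Q);    /* pre-multiply by the base */
    /* Add half the divisor when the signs match, subtract it otherwise,
       so that mid values are rounded away from zero. */
    if ((temp >= 0) == (b >= 0)) {
        temp += b / 2;
    } else {
        temp -= b / 2;
    }
    return (int16_t)(temp / b);
}
```

With the Q8 values 384 (1.5) and 640 (2.5), `q_mul(384, 640)` returns 960 (3.75) and `q_div(384, 640)` returns 154 (about 0.6016).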

External links

Fixed Point Representation And Fractional Math (Note: the accuracy of the article is in dispute; see discussion.)
