| [[File:OpenArena-Rocket.jpg|thumb|300px|right|Lighting and reflection calculations (shown here in the [[first-person shooter]] ''[[OpenArena]]'') use the fast inverse square root code to compute [[Angle of incidence|angles of incidence]] and reflection.]]
| | |
| '''Fast inverse square root''' (sometimes referred to as '''Fast InvSqrt()''' or by the [[hexadecimal]] constant '''0x5f3759df''') is a method of calculating x<sup>−½</sup>, the [[Multiplicative inverse|reciprocal]] (or multiplicative inverse) of a [[square root]] for a 32-bit [[floating point]] number in [[Single precision floating-point format#IEEE 754 single precision binary floating-point format: binary32|IEEE 754 floating point format]]. The algorithm was probably developed at [[Silicon Graphics]] in the early 1990s, and an implementation appeared in 1999 in the ''[[Quake III Arena]]'' source code, but the method did not appear on public forums such as [[Usenet]] until 2002 or 2003.<ref name="Beyond3D" /> At the time, the primary advantage of the algorithm came from avoiding [[computationally expensive]] floating point operations in favor of integer operations. Inverse square roots are used to compute [[angles of incidence]] and [[Reflection (computer graphics)|reflection]] for [[lighting]] and [[shading]] in [[computer graphics]].
| |
| | |
| The algorithm accepts a 32-bit floating point number as the input and stores a halved value for later use. Then, treating the bits representing the floating point number as a 32-bit integer, a [[logical shift]] right of one bit is performed and the result subtracted from the "magic constant" [[Hexadecimal#Representing hexadecimal|0x]]5f3759df. This is the first approximation of the inverse square root of the input. Treating the bits again as floating point it runs one iteration of [[Newton's method]] to return a more precise approximation. This computes an approximation of the inverse square root of a floating point number approximately four times faster than [[floating point arithmetic|floating point division]].
| |
| | |
| The algorithm was originally attributed to [[John D. Carmack|John Carmack]], but an investigation showed that the code had deeper roots in both the hardware and software side of computer graphics. Adjustments and alterations passed through both Silicon Graphics and [[3dfx Interactive]], with Gary Tarolli's implementation for the [[SGI Indigo]] as the earliest known use. It is not known how the constant was originally derived, though investigation has shed some light on possible methods.
| |
| | |
| ==Motivation==
| |
| [[File:Surface normal.png|thumb|right|[[Surface normal]]s are used extensively in lighting and shading calculations, requiring the calculation of norms for vectors. A field of vectors normal to a surface is shown here.]]
| |
| The inverse square root of a floating point number is used in calculating a [[Unit vector|normalized vector]].{{sfntag|Blinn|2003|p=130}} Since a [[3D graphics]] program uses these normalized vectors to determine lighting and [[Lambert's cosine law|reflection]], millions of these calculations must be done per second. Before the creation of specialized hardware to handle [[Transform, clipping, and lighting|transform and lighting]], software computations could be slow. Specifically, when the code was developed in the early 1990s, most floating point processing power lagged behind the speed of integer processing.<ref name="Beyond3D" />
| |
| | |
| To normalize a vector, the length of the vector is determined by calculating its [[Euclidean norm]]: the square root of the sum of squares of the [[vector components]]. When each component of the vector is divided by that length, the new vector will be a [[unit vector]] pointing in the same direction.
| |
| | |
| :<math>\|\boldsymbol{v}\| = \sqrt{v_1^2+v_2^2+v_3^2}</math> is the Euclidean norm of the vector, analogous to the calculation of the [[Euclidean distance]] between two points in [[Euclidean space]].
| |
| | |
| :<math>\boldsymbol{\hat{v}} = \boldsymbol{v} / \|\boldsymbol{v}\|</math> is the normalized (unit) vector. Using <math>x</math> to represent <math>v_1^2+v_2^2+v_3^2</math>,
| |
| :<math>\boldsymbol{\hat{v}} = \boldsymbol{v} / \sqrt{x}</math>, which relates the unit vector to the inverse square root of the sum of the squared components.
| |
| | |
| ''Quake III Arena'' used the fast inverse square root algorithm to accelerate graphics computation, but the algorithm has since been implemented in some dedicated hardware [[vertex shader]]s using [[field-programmable gate array]]s (FPGA).{{sfntag|Middendorf|2007|pp=155-164}}
| |
| | |
| ==Overview of the code==
| |
| The following code is the fast inverse square root implementation from ''[[Quake III Arena]]'', stripped of [[C preprocessor]] directives, but including the exact original comment text:<ref name="quakesrc" /><!--
| |
| YES, THE ORIGINAL SOURCE CODE INCLUDES "fuck" IN THE COMMENTS. THIS IS A DIRECT QUOTE. See [[WP:CENSOR]] before trying to remove it. -->
| |
<source lang="c">
float Q_rsqrt( float number )
{
    long i;
    float x2, y;
    const float threehalfs = 1.5F;

    x2 = number * 0.5F;
    y  = number;
    i  = * ( long * ) &y;                       // evil floating point bit level hacking
    i  = 0x5f3759df - ( i >> 1 );               // what the fuck?
    y  = * ( float * ) &i;
    y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
//  y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed

    return y;
}
</source>
| |
| | |
| To compute the inverse square root, software of the period would first produce an approximation for <math>x^{-1/2}</math> and then refine it with a numerical method until it came within an acceptable error range of the actual result. [[Methods of computing square roots|Common software methods]] in the early 1990s drew the first approximation from a [[lookup table]].{{sfntag|Eberly|2001|p=504}} This bit of code proved faster than table lookups and approximately four times faster than regular floating point division.{{sfntag|Lomont|2003|p=1}} Some precision was lost, but the loss was offset by significant gains in performance.{{sfntag|McEniry|2007|p=1}} The algorithm was designed with the [[IEEE 754-1985]] 32-bit floating point specification in mind, but investigation by Chris Lomont and later Charles McEniry showed that it could be implemented in other floating point specifications.
| |
| | |
| The advantages in speed offered by the fast inverse square root [[Kludge#In computer science|kludge]] came from treating the longword<ref group=note>Use of the type <code>long</code> reduces the portability of this code on modern systems. For the code to execute properly, <code>sizeof(long)</code> must be 4 bytes, otherwise negative outputs may result. Under many modern 64-bit systems, <code>sizeof(long)</code> is 8 bytes.</ref> containing the floating point number as an integer, shifting it right by one bit, and subtracting the result from a specific constant, '''0x5f3759df'''. The purpose of the constant is not immediately clear to someone viewing the code, so, like other such constants found in code, it is often called a "[[Magic number (programming)#Unnamed numerical constants|magic number]]".<ref name="Beyond3D" />{{sfntag|Lomont|2003|p=3}}{{sfntag|McEniry|2007|p=2, 16}}{{sfntag|Eberly|2002|p=2}} This integer subtraction and bit shift results in a longword which, when treated as a floating point number, is a rough approximation for the inverse square root of the input number. One iteration of Newton's method is performed to gain some precision, and the code is finished. The algorithm generates reasonably accurate results using a unique first approximation for [[Newton's method]]; however, it is much slower and less accurate than the [[Streaming SIMD Extensions|SSE]] instruction <code>rsqrtss</code> on x86 processors, which was also released in 1999.<ref name="ruskin" />
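The portability concern in the note on <code>long</code> can be sidestepped by using a fixed-width integer type and copying the bits with <code>memcpy</code>. The following is a minimal sketch under that assumption, not the ''Quake III Arena'' code; the name <code>rsqrt_portable</code> is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch of a portable variant; rsqrt_portable is a hypothetical name. */
static float rsqrt_portable(float number)
{
    /* The bit trick only makes sense if float is a 32-bit type. */
    assert(sizeof(float) == sizeof(uint32_t));

    const float x2 = number * 0.5f;
    float y = number;
    uint32_t i;

    memcpy(&i, &y, sizeof i);             /* reinterpret the float's bits as an integer */
    i = UINT32_C(0x5f3759df) - (i >> 1);  /* same magic constant as the listing */
    memcpy(&y, &i, sizeof y);             /* reinterpret the integer's bits as a float */

    y = y * (1.5f - x2 * y * y);          /* one Newton iteration */
    return y;
}
```

Using <code>uint32_t</code> keeps the integer exactly 32 bits wide regardless of the platform's <code>long</code>, and <code>memcpy</code> avoids dereferencing a pointer of the wrong type.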
| |
| | |
| ===Aliasing from floating point to integer and back===
| |
| [[File:Float w significand.svg|right]]
| |
| Breaking this down requires some understanding of how a floating point number is stored. A floating point number represents a [[rational number]] expressed in three portions across [[Single precision|32 bits]]. The example decimal value of 0.15625 is 0.00101 in binary, which is normalised by multiplying by a suitable power of two so as to have a leading "one" bit: 1.01×2<sup>−3</sup>, which is to say an exponent of −3 and a mantissa of 1.01. Since all normalised numbers have a leading bit of one, in this format it is ''not'' represented in storage but is ''implicit'', so that the stored mantissa is M = .01. In the storage form, the first bit is the sign bit: ''0'' for positive numbers and ''1'' for negative numbers. The next 8 bits form the exponent, which is [[Exponent bias|biased]]: the field is stored as an unsigned value from 0 to 255, and subtracting the bias of 127 gives an exponent in the range −127 to 128, so that the required exponent of −3 is represented as E = 124 (= −3 + 127). The [[significand]] comprises the remaining 23 bits and represents the significant digits of the number stored. This is expressed as <math>x = (-1)^{\mathrm{Sign}}(1 + M)2^{E - B}</math> where the bias <math>B=127</math>.{{sfntag|McEniry|2007|p=2}} So, for the example, <math>0.15625_{10} = (-1)^{0}(1 + .01_2)2^{124 - 127}</math>.
| |
| The value of the exponent determines whether the significand (referred to as the mantissa by Lomont 2003 and McEniry 2007) represents a fraction or an integer.{{sfntag|Hennessey|Patterson|1998|p=276}}
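The three stored fields described above can be extracted with a few shifts and masks. The sketch below assumes a 32-bit IEEE 754 <code>float</code>; the type <code>float_fields</code> and the function <code>split_float</code> are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three stored fields of a binary32 value. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, stored with the bias of 127 applied */
    uint32_t mantissa;  /* 23 bits, implicit leading one not stored */
} float_fields;

/* Split a float into its stored sign, exponent and mantissa fields. */
static float_fields split_float(float value)
{
    uint32_t bits;
    assert(sizeof value == sizeof bits);
    memcpy(&bits, &value, sizeof bits);

    float_fields f;
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For the example value 0.15625 this yields a sign of 0, a stored exponent of 124 and a mantissa field whose only set bit is bit 21.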
| |
| | |
| <div class="thumb tleft"><div style="width:20em;">
| |
| {|style="width:100%; margin:0;" cellspacing="0"
| |
| |- style="text-align:center;"
| |
| | style="background:#def;"| sign bit || style="text-align:center;" colspan="9"|
| |
| |-
| |
| | style="text-align:center; width:2em; background:#def; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-left:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-right:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em;"| '''=''' || style="text-align:right; width:2em;"| '''127'''
| |
| |-
| |
| | style="text-align:center; width:2em; background:#def; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-left:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-right:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em;"| '''=''' || style="text-align:right; width:2em;"| '''2'''
| |
| |-
| |
| | style="text-align:center; width:2em; background:#def; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-left:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-right:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em;"| '''=''' || style="text-align:right; width:2em;"| '''1'''
| |
| |-
| |
| | style="text-align:center; width:2em; background:#def; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-left:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-right:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em;"| '''=''' || style="text-align:right; width:2em;"| '''0'''
| |
| |-
| |
| | style="text-align:center; width:2em; background:#def; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-left:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-right:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em;"| '''=''' || style="text-align:right; width:2em;"| '''−1'''
| |
| |-
| |
| | style="text-align:center; width:2em; background:#def; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-left:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-right:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em;"| '''=''' || style="text-align:right; width:2em;"| '''−2'''
| |
| |-
| |
| | style="text-align:center; width:2em; background:#def; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-left:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-right:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em;"| '''=''' || style="text-align:right; width:2em;"| '''−127'''
| |
| |-
| |
| | style="text-align:center; width:2em; background:#def; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-left:1px solid #aaa;"| '''1'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em; border-top:1px solid #aaa; border-bottom:1px solid #aaa; border-right:1px solid #aaa;"| '''0'''
| |
| | style="text-align:center; width:2em;"| '''=''' || style="text-align:right; width:2em;"| '''−128'''
| |
| |}
| |
| {{caption|8-bit two's-complement integers}}
| |
| </div></div>
| |
| A positive, signed [[Integer (computer science)|integer]] represented in a [[two's complement]] system has a first bit of ''0'', followed by a binary representation of the value. [[Aliasing (computing)|Aliasing]] the floating point number as a two's complement integer gives an integer value of <math>I = E \times 2^{23} + M</math>, where ''I'' is the integer value of the number, ''E'' is the exponent as stored (with bias applied and interpreted as an ''un''signed eight-bit number), and ''M'' is the significand as stored (with the leading one omitted) and interpreted as an integer, not a fractional number. Since the inverse square root function deals only with positive numbers, the floating point sign bit (''Sign'') will always be ''0'', and so the resulting signed integer is also positive.
| |
| | |
| Notice in particular that, because of the implicit leading-one bit and the exponent field, the aliasing produces an integer ''very different'' from the result of a float-to-integer value conversion. For the example, E = 124 and, as an integer, M becomes 2<sup>21</sup> because its only "on" bit is the one for power 21. That is, in binary it is 01000000000000000000000'''.''' not '''.'''01000000000000000000000. Thus <math>I = 124 \times 2^{23} + 2^{21}</math>, which is 1,040,187,392 + 2,097,152 or 1,042,284,544.
| |
| This "conversion" or rather, reinterpretation is done immediately, without the need for the complex adjustments that a non-zero exponent would require. Then follow a few likewise computationally inexpensive operations to obtain the initial estimate that will be improved via an application of Newton's method.
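The difference between reinterpreting a float's bits and converting its value can be seen side by side in a short sketch. It assumes a 32-bit IEEE 754 <code>float</code>; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpretation (aliasing): copy the 32 bits unchanged into an integer. */
static int32_t alias_float_as_int(float value)
{
    int32_t bits;
    assert(sizeof value == sizeof bits);
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Value conversion: the usual cast, which truncates toward zero. */
static int32_t convert_float_to_int(float value)
{
    return (int32_t)value;
}
```

For 0.15625 the aliased integer is 1,042,284,544, in agreement with the arithmetic above, whereas the value conversion simply gives 0.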
| |
| | |
| The first of these, the 1-bit right shift, divides the integer by two,{{sfntag|Hennessey|Patterson|1998|p=305}} and the result is then subtracted from a magic integer and re-aliased back to a floating point number to become the initial value. Notice that this shift halves the value in what will again be the mantissa field (but ''not'' the value ''of'' the mantissa, because the implicit '''1''' remains: 1.(x) becomes 1.(x/2), not (1.x)/2), halves the value in what will again be the exponent field (treating it as unsigned), and also shifts the low-order exponent field bit 23 (zero in the example) into the high-order end of the mantissa field (bit 22). The implicit-on bit of the floating point interpretation is not involved during these integer operations, as follows:
| |
| | |
 Seeeeeeee(1)mmmmmmmmmmmmmmmmmmmmmmm                  1 sign bit, 8 exponent bits, (implicit bit), 23 mantissa bits.
 001111100   01000000000000000000000   1,042,284,544  The example value, ''i''
 000111110   00100000000000000000000     521,142,272  Shift right one: ''i/2''
 010111110   01101110101100111011111   1,597,463,007  The magic number = 5F3759DF
 010000000   01001110101100111011111   1,076,320,735  The result of ''5F3759DF - i/2''
| |
| | |
| Which is recast as a floating point number: the sign bit is still zero, the exponent field has E = 128 and the mantissa field has '''.'''01001110101100111011111 in binary, which is 2578911/2<sup>23</sup> or 0.3074301481247, so that the result is <math>y = (1 + 0.3074301481247)2^{128 - 127} \approx 2.6149</math>.
| |
| :<math>\frac{1}{\sqrt{0.15625}} = 2.5298221281347</math>, so this is a good first approximation. One application of the iteration gives 2.5255 and a second 2.529811. A third would exhaust the precision of the variable.
| |
| | |
| If instead the input were doubled to 0.3125, the mantissa would be the same but the exponent would be one higher, thus turning on bit 23, which after the halving of the integer value would turn up in the mantissa field (as bit 22); with the implicit '''1''' bit this results in an initial value of 1.807 as an approximation to 1.788854, and the first iteration improves this to 1.788564.
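The two worked examples can be reproduced by isolating the shift-and-subtract step, without the Newton iteration. This is a sketch assuming a 32-bit IEEE 754 <code>float</code>; <code>rsqrt_initial_guess</code> is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Initial estimate only: alias to integer, shift, subtract from the
 * magic constant, and alias back to float. No Newton step is applied. */
static float rsqrt_initial_guess(float x)
{
    uint32_t i;
    float y;
    assert(sizeof x == sizeof i);

    memcpy(&i, &x, sizeof i);             /* float bits as integer */
    i = UINT32_C(0x5f3759df) - (i >> 1);  /* magic constant minus i/2 */
    memcpy(&y, &i, sizeof y);             /* integer bits as float */
    return y;
}
```

Called with 0.15625 it returns a value close to 2.615, and with 0.3125 a value close to 1.807, matching the figures derived by hand.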
| |
| | |
| ===The "magic number"===
| |
| {| class="wikitable" style="float:right;"
| |
| |-
| |
| ! S(ign)
| |
| ! E(xponent)
| |
| ! M(antissa)
| |
| |-
| |
| | ''1 bit''
| |
| | ''b bits''
| |
| | ''(n-1-b) bits''
| |
| |-
| |
| | colspan="3" style="text-align:center;"|''n bits''{{sfntag|McEniry|2007|p=3}}
| |
| |}
| |
| The selection of '''0x5f3759df''' as a constant prompted much of the original speculation surrounding the fast inverse square root function. In an attempt to determine how a programmer might have originally determined that constant as a mechanism to approximate the inverse square root, Charles McEniry first determined how the choice of any constant ''R'' could give a first approximation for the inverse square root. Recalling the integer and floating point comparison from above, note that <math>x</math>, our floating point number, is <math>\scriptstyle x=(1+m_x)2^{e_x}</math> and <math>I_x</math>, our integer value for that same number, is <math>I_x=E_xL+M_x</math>.<ref group=note>Floating point numbers are normalized—the significand is expressed as a number <math>m_x\in[0,1)</math>. See {{cite journal|author = David Goldberg|title = What Every Computer Scientist Should Know About Floating-Point Arithmetic|journal = [[ACM Computing Surveys]]|date=March 1991|volume = 23|issue = 1|pages = 5–48|doi = 10.1145/103162.103163}} for further explanation.</ref> These identities introduce a few new elements, which are simply restatements of values for the exponent and significand.
| |
| :<math>\textstyle m_x=\frac{M_x}{L}</math> where <math>L=2^{n-1-b}</math>.
| |
| :<math>e_x=E_x-B</math> where <math>B=2^{b-1}-1</math>.
| |
| The illustration from McEniry 2007 proceeds:
| |
| :<math>y=\frac{1}{\sqrt{x}}</math>
| |
| :<math>\log_2{(y)}=-\frac{1}{2}\log_2{(x)}</math>
| |
| :<math>\log_2(1+m_y)+e_y=-\frac{1}{2}\log_2{(1+m_x)}-\frac{1}{2}e_x</math>
| |
| taking the [[binary logarithm]] or <math>\log_2</math> of both sides. The binary logarithm is the [[inverse function]] of <math>f(n)=2^n</math> and makes the multiplied terms in the floating point numbers ''x'' and ''y'' [[Logarithmic identity#Using simpler operations|reduce to addition]]. The relationship between <math>\log_2{(x)}</math> and <math>\log_2{(x^{-1/2})}</math> is linear, allowing an equation to be constructed which can express ''x'' and <math>y_0</math> (the input and first approximation) as a [[linear combination]] of terms.{{sfntag|McEniry|2007|p=2}} McEniry introduces a term <math>\sigma</math> which serves as a correction when approximating <math>\scriptstyle \log_2{(1+x)}</math> in an intermediate step toward approximating ''R''.<ref group=note>Lomont 2003 approaches the determination of ''R'' in a different fashion, splitting R into <math>R_1</math> and <math>R_2</math> for the significand and exponent bits of ''R''.</ref> Since <math>\scriptstyle \log_2{(1+x)}\approx {x}</math> for <math>\scriptstyle 0\le x< 1</math>, the approximation <math>\scriptstyle \log_2{(1+x)}\cong x+\sigma</math> can now be defined, where ''x'' here stands for either of the fractions <math>m_x</math> or <math>m_y</math>. This definition affords a first approximation of the binary logarithm. For our purposes, <math>\sigma</math> is a [[real number]] bounded by [0,1/3]; for an ''R'' equal to '''0x5f3759df''', <math>\sigma=0.0450461875791687011756</math>.{{sfntag|McEniry|2007|p=3}}
| |
| :<math>m_y+\sigma+e_y=-\frac{1}{2}m_x-\frac{1}{2}\sigma-\frac{1}{2}e_x</math>
| |
| Using the identities for <math>M_x</math>, <math>E_x</math>, <math>B</math> and <math>L</math>:
| |
| :<math>M_y+(E_y-B)L=-\frac{3}{2}\sigma{L}-\frac{1}{2}M_x-\frac{1}{2}(E_x-B)L</math>
| |
| Rearranging of terms leads to:
| |
| :<math>E_yL+M_y=\frac{3}{2}(B-\sigma)L-\frac{1}{2}(E_xL+M_x)</math>
| |
| The integer value of a strictly positive floating point number ''x'' is <math>I_x=E_xL+M_x</math>. This gives an expression for the integer value of ''y'' (where <math>\textstyle y=\frac{1}{\sqrt{x}}</math>, our first approximation for the inverse square root) in terms of the integer components of ''x''. Specifically,
| |
| :<math>I_y=E_yL+M_y=R-\frac{1}{2}(E_xL+M_x)=R-\frac{1}{2}I_x</math> where <math>R=\frac{3}{2}(B-\sigma)L</math>.
| |
| The equation <math>\scriptstyle I_y=R-\frac{1}{2}I_x</math> is the line <code>i = 0x5f3759df - (i>>1);</code> in '''Fast InvSqrt()''': the integer approximation for <math>y_0</math> is the integer value for ''x'', shifted to the right by one bit and subtracted from ''R''.{{sfntag|McEniry|2007|p=3}} McEniry's proof here shows only that a constant ''R'' can be used to approximate the integer value of the inverse square root of a floating point number. It does not prove that ''R'' assumes the value used in the code itself.
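The relation <math>R=\frac{3}{2}(B-\sigma)L</math> can be evaluated numerically in either direction. The sketch below is only an illustration of that formula for a format with ''n'' total bits and ''b'' exponent bits; the function names are hypothetical, and the arithmetic is done in double precision.

```c
#include <assert.h>
#include <stdint.h>

/* R = (3/2)(B - sigma)L, with L = 2^(n-1-b) and B = 2^(b-1) - 1. */
static double magic_from_sigma(double sigma, int n, int b)
{
    assert(b >= 2 && n > b + 1 && n <= 64);
    double L = (double)(UINT64_C(1) << (n - 1 - b));
    double B = (double)((UINT64_C(1) << (b - 1)) - 1);
    return 1.5 * (B - sigma) * L;
}

/* The same relation solved for sigma: sigma = B - R / ((3/2)L). */
static double sigma_from_magic(double R, int n, int b)
{
    assert(b >= 2 && n > b + 1 && n <= 64);
    double L = (double)(UINT64_C(1) << (n - 1 - b));
    double B = (double)((UINT64_C(1) << (b - 1)) - 1);
    return B - R / (1.5 * L);
}
```

For the 32-bit format (''n'' = 32, ''b'' = 8), passing the value of <math>\sigma</math> quoted above gives a result close to, though not necessarily exactly equal to, the integer value of '''0x5f3759df''' (1,597,463,007).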
| |
| | |
| A relatively simple explanation for how a bit shift and a subtraction operation using the expected value of ''R'' results in division of the exponent ''E'' by negative two can be found in Chris Lomont's paper exploring the constant. As an example, for <math>10000=10^4</math>, a division of the exponent by −2 would produce <math>10000^{-1/2}=10^{-2}=1/100</math>. Since the exponent is biased, the true value of the exponent (here ''e'') is <math>e=E-127</math>, making the value of the biased result <math>-e/2+127</math>.{{sfntag|Hennessey|Patterson|1998|p=278, 282}} Subtracting the integer from ''R'' (the "magic" number '''0x5f3759df''') forces the least significant bit of the exponent to carry into the significand and when returned to floating point notation, outputs a floating point number very close to the inverse square root of the input. The specific value of ''R'' was chosen to minimize the expected error in division of the exponent as well as the expected error in shifting the significand. '''0xbe''' represents an integer value which minimizes the expected error resulting from division of the floating point exponent through bit shift operations—notably the value of '''0xbe''' shifted one to the right is '''0x5f''', the first digits in the magic number ''R''.{{sfntag|Lomont|2003|p=5}}
| |
| | |
| ===Accuracy===
| |
| [[File:Invsqrt0-10000.svg|thumb|right|A graph showing the difference between the heuristic Fast Inverse Square Root and the inversion of square root supplied by libstdc.{{citation needed|date=April 2012}} (Note the logarithmic scale on both axes.)]]
| |
| As noted above, the approximation is surprisingly accurate. The graph on the right plots the error of the function (that is, the error of the approximation after it has been improved by running one iteration of Newton's method) for inputs starting at 0.01. At that input the standard library gives 10.0 as a result while InvSqrt() gives 9.982522, a difference of 0.017479, or 0.175%. The absolute error only drops from then on, while the relative error stays within the same bounds across all orders of magnitude.
| |
| | |
| ==Newton's method==
| |
| {{Main|Newton's method}}
| |
| After performing those integer operations, the algorithm once again treats the longword as a floating point number (<code>x = *(float*)&i;</code>) and performs a floating point multiplication operation (<code>x = x*(1.5f - xhalf*x*x);</code>). The floating point operation represents a single iteration of Newton's method of finding roots for a given equation. For this example,
| |
| | |
| :<math>y=\frac{1}{\sqrt{x}}</math> is the inverse square root, or, as a function of ''y'',
| |
| | |
| :<math>f(y)=\frac{1}{y^2}-x=0</math>.
| |
| | |
| :As <math>y_{n+1} = y_{n} - \frac{f(y_n)}{f'(y_n)}</math> represents a general expression of Newton's method with <math>\, y_n</math> as the current approximation,
| |
| | |
| :<math>y_{n+1} = \frac{y_{n}(3-xy_n^2)}{2}</math> is the particularized expression where <math>f(y)=\frac{1}{y^2}-x</math> and <math>f'(y)=\frac{-2}{y^3}</math>.
| |
| | |
| :Hence <code>x = x*(1.5f - xhalf*x*x);</code> is the same as <math>\, y_{n+1} = y_{n}\left(1.5-\frac{xy_n^2}{2}\right) = \frac{y_{n}(3-xy_n^2)}{2}</math>.
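The iteration derived above can be written on its own, separately from the bit manipulation. This is a minimal sketch in double precision with hypothetical function names; it assumes a starting guess reasonably close to the root.

```c
/* One Newton step for f(y) = 1/y^2 - x:
   y_{n+1} = y_n * (1.5 - (x / 2) * y_n^2). */
double newton_step(double x, double y)
{
    return y * (1.5 - 0.5 * x * y * y);
}

/* Apply the step n times, starting from the guess y0. */
double newton_inv_sqrt(double x, double y0, int n)
{
    double y = y0;
    for (int k = 0; k < n; k++)
        y = newton_step(x, y);
    return y;
}
```

For example, <code>newton_inv_sqrt(4.0, 0.3, 6)</code> starts from the rough guess 0.3 and approaches 0.5, the inverse square root of 4.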
| |
| | |
| The first approximation is generated above through the integer operations and input into the last two lines of the function. Repeated iterations of the algorithm, using the output of the function (<math>y_{n+1}</math>) as the input of the next iteration, cause the algorithm to [[Rate of convergence|converge]] on the root with increasing precision.{{sfntag|McEniry|2007|p=6}} For the purposes of the [[id Tech 3|''Quake III'' engine]], only one iteration was used. A second iteration remained in the code but was [[Comment out|commented out]].{{sfntag|Eberly|2002|p=2}}
| |
| | |
| ==History and investigation==
| |
| [[File:John Carmack E3 2006.jpg|thumb|[[John Carmack]], co-founder of id Software, is commonly associated with the code, though he actually did not write it.]]
| |
| The source code for ''Quake III'' was not released until [[QuakeCon#2005|QuakeCon 2005]], but copies of the fast inverse square root code appeared on [[Usenet]] and other forums as early as 2002 or 2003.<ref name="Beyond3D" /> Initial speculation pointed to John Carmack as the probable author of the code, but he demurred and suggested it was written by Terje Mathisen, an accomplished assembly programmer who had previously helped id Software with ''Quake'' optimization. Mathisen had written an implementation of a similar bit of code in the late 1990s, but the original authors proved to be much further back in the history of 3D computer graphics, with Gary Tarolli's implementation for the SGI Indigo as a possible earliest known use. Rys Sommefeldt concluded that the original algorithm was devised by Greg Walsh at [[Ardent Computer]] in consultation with [[Cleve Moler]] of [[MATLAB]] fame.<ref name="Beyond3Dp2" /> Moler learned about this trick from code written by Velvel Kahan and K.C. Ng at Berkeley around 1986 (see the comment section at the end of fdlibm code for sqrt<ref name="fdlibm" />).<ref name="MolerResp" /> [[Jim Blinn]] also demonstrated a simple approximation of the inverse square root in a 1997 column for ''[[List of Institute of Electrical and Electronics Engineers publications|IEEE Computer Graphics and Applications]]''.{{sfntag|Blinn|1997|pp=80–84}}
| |
| | |
| It is not known precisely how the exact value for the magic number was determined. Chris Lomont developed a function to minimize [[approximation error]] by choosing the magic number ''R'' over a range. He first computed the optimal constant for the linear approximation step as '''0x5f37642f''', close to '''0x5f3759df''', but this new constant gave slightly less accuracy after one iteration of Newton's method.{{sfntag|Lomont|2003|p=10}} Lomont then searched for a constant optimal even after one and two Newton iterations and found '''0x5f375a86''', which is more accurate than the original at every iteration stage.{{sfntag|Lomont|2003|p=10}} He concluded by asking whether the exact value of the original constant was chosen through derivation or [[trial and error]].{{sfntag|Lomont|2003|pp=10–11}} Lomont gave '''0x5fe6ec85e7de30da''' as the "magic number" for the 64-bit IEEE 754 double type, but Matthew Robertson later showed the exact value to be '''0x5fe6eb50c7b537a9'''.<ref name="robertson" /> Charles McEniry performed a similar but more sophisticated optimization over likely values for ''R''. His initial [[Brute-force search|brute force]] search resulted in the same constant that Lomont determined.{{sfntag|McEniry|2007|pp=11–12}} When he attempted to find the constant through [[Bisection method|weighted bisection]], the specific value of ''R'' used in the function occurred, leading McEniry to believe that the constant may have originally been derived through "bisecting to a given tolerance".{{sfntag|McEniry|2007|p=16}}
| |
| | |
| ==See also==
| |
| * [[Methods of computing square roots#Approximations that depend on IEEE representation|Methods of computing square roots: Approximations that depend on IEEE representation]]
| |
| | |
| ==Notes==
| |
| <references group=note/>
| |
| | |
| == References ==
| |
| {{reflist
| |
| |2
| |
| |30em
| |
| |refs=
| |
| <ref name="fdlibm">{{cite web|url=http://www.netlib.org/fdlibm/e_sqrt.c|title=sqrt implementation in fdlibm}}</ref>
| |
| <ref name="MolerResp">{{cite web|url=http://blogs.mathworks.com/cleve/2012/06/19/symplectic-spacewar/#comment-13}}</ref>
| |
| <ref name="Beyond3D">{{cite web|url=http://www.beyond3d.com/content/articles/8/|title=Origin of Quake3's Fast InvSqrt()|last=Sommefeldt|first=Rys|date=November 29, 2006|work=Beyond3D|accessdate=2009-02-12}}</ref>
| |
| <ref name="quakesrc">{{cite web|url=ftp://ftp.idsoftware.com/idstuff/source/quake3-1.32b-source.zip|title=quake3-1.32b/code/game/q_math.c|work=[[Quake III Arena]]|publisher=[[id Software]]|accessdate=2010-11-15}}</ref>
| |
| <ref name="Beyond3Dp2">{{cite web|url=http://www.beyond3d.com/content/articles/15/|title=Origin of Quake3's Fast InvSqrt() - Part Two|accessdate=2008-04-19|author=Sommefeldt, Rys|date=2006-12-19|publisher=Beyond3D}}</ref>
| |
| <ref name="ruskin">{{cite web|url=http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/ |title=Timing square root |work=Some Assembly Required |first=Elan |last=Ruskin |date=2009-10-16 |accessdate=2010-09-13}}</ref>
| |
| <ref name="robertson">{{cite web|url=http://shelfflag.com/rsqrt.pdf|title=A Brief History of InvSqrt|author=Matthew Robertson|date=2012-04-24|publisher=UNBSJ}}</ref>
| |
| }}
| |
| | |
| === Documents ===
| |
| {{harvfoot|3}}
| |
| | |
| *{{cite journal
| |
| |ref=harv
| |
| |last=Blinn
| |
| |first= Jim
| |
| |title=Floating Point Tricks
| |
| |journal=Computer Graphics & Applications, IEEE
| |
| |date=July 1997
| |
| |volume=17
| |
| |issue=4
| |
| |doi=10.1109/38.595279
| |
| |pages=80–84}}
| |
| *{{cite book
| |
| |last=Blinn
| |
| |ref=harv
| |
| |first=Jim
| |
| |title=Jim Blinn's Corner: Notation, Notation, Notation
| |
| |publisher=Morgan Kaufmann
| |
| |year=2003
| |
| |isbn=1-55860-860-5}}
| |
| *{{cite book
| |
| |ref=harv
| |
| |last=Eberly
| |
| |first=David
| |
| |title=3D Game Engine Design
| |
| |publisher=Morgan Kaufmann
| |
| |year=2001
| |
| |isbn=978-1-55860-593-0}}
| |
| *{{Cite journal
| |
| | ref = harv
| |
| | last = Eberly
| |
| | first = David
| |
| | title = Fast Inverse Square Root
| |
| |publisher=Geometric Tools
| |
| | year = 2002
| |
| | url = http://www.geometrictools.com/Documentation/FastInverseSqrt.pdf
| |
| | accessdate = 2009-03-22}}
| |
| *{{cite book
| |
| |ref={{harvid|Hennessey|Patterson|1998}}
| |
| |last=Hennessy
| |
| |first=John
| |
| |coauthors=Patterson, David A.
| |
| |title=Computer Organization and Design
| |
| |publisher=Morgan Kaufmann Publishers
| |
| |location=San Francisco, CA
| |
| |year=1998
| |
| |edition=2nd
| |
| |isbn=978-1-55860-491-9
| |
| |url=http://books.google.com/?id=7-ZQAAAAMAAJ}}
| |
| <!-- no reference exists in article-->
| |
| *{{cite journal
| |
| |ref=harv
| |
| |last=Kushner
| |
| |first=David
| |
| |date=August 2002
| |
| |title=The wizardry of Id
| |
| |journal=IEEE Spectrum
| |
| |volume=39
| |
| |issue=8
| |
| |pages=42–47
| |
| |doi=10.1109/MSPEC.2002.1021943}}
| |
| *{{cite web
| |
| |ref=harv
| |
| |url=http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
| |
| |title=Fast Inverse Square Root
| |
| |last=Lomont
| |
| |first=Chris
| |
| |date=February 2003
| |
| |accessdate=2009-02-13}}
| |
| *{{cite web
| |
| |ref=harv
| |
| |url=http://www.daxia.com/bibis/upload/406Fast_Inverse_Square_Root.pdf
| |
| |title=The Mathematics Behind the Fast Inverse Square Root Function Code
| |
| |last=McEniry
| |
| |first=Charles
| |
| |date=August 2007
| |
| |accessdate=2009-02-13}}
| |
| *{{cite conference
| |
| | ref = {{harvid|Middendorf|2007}}
| |
| | last =Middendorf
| |
| | first =Lars
| |
| | coauthors = Mühlbauer, Felix; Umlauf, Georg; Bobda, Christophe
| |
| | date = June 1, 2007
| |
| | title = Embedded Vertex Shader in FPGA
| |
| | conference = [[International Federation for Information Processing|IFIP]] TC10 Working Conference:International Embedded Systems Symposium (IESS)
| |
| | booktitle = Embedded System Design: Topics, Techniques and Trends
| |
| | editor = Rettberg, Achim
| |
| | others = et al.
| |
| | publisher = Springer
| |
| | location = [[Irvine, California]]
| |
| | isbn = 978-0-387-72257-3
| |
| }}
| |
| <!-- no reference exists in article-->
| |
| *{{cite web
| |
| |archiveurl=http://web.archive.org/web/20090215020337/http://www.hackszine.com/blog/archive/2008/12/quakes_fast_inverse_square_roo.html
| |
| |url=http://www.hackszine.com/blog/archive/2008/12/quakes_fast_inverse_square_roo.html
| |
| |title=Quake's fast inverse square root
| |
| |last=Striegel
| |
| |first=Jason
| |
| |date=December 4, 2008
| |
| |work=Hackszine
| |
| |publisher=[[O'Reilly Media]]
| |
| |archivedate=2009-02-15
| |
| |accessdate=2013-01-07}}
| |
| | |
| ==External links==
| |
| *[http://shelfflag.com/rsqrt.pdf A Brief History of InvSqrt] by Matthew Robertson
| |
| *[http://blog.quenta.org/2012/09/0x5f3759df.html 0x5f3759df], further investigations into accuracy and generalizability of the algorithm by Christian Plesner Hansen
| |
| *[http://www.beyond3d.com/content/articles/8/ Origin of Quake3's Fast InvSqrt()]
| |
| *[http://www.idsoftware.com/business/techdownloads/ Quake III Arena source code]
| |
| *{{cite web|url=http://www.codemaestro.com/reviews/9|title=Magical Square Root Implementation In Quake III|last=Margolin|first=Tomer|date=27 August 2005|work=CodeMaestro|publisher=The Coding Experience}}
| |
| | |
| {{Quake}}
| |
| {{Id Software}}
| |
| | |
| {{good article}}
| |
| | |
| [[Category:Quake]]
| |
| [[Category:Source code]]
| |
| [[Category:Root-finding algorithms]]
| |
| | |
| {{Link GA|zh}}
| |