Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Initial draft of documentation on floating point, IEEE 754 binary64, and the ieee754 and decimal extensions. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA3-256: |
448d185aacd48af25ef2bbe8f0219de2 |
User & Date: | drh 2020-07-14 00:20:05.293 |
Context
2020-07-15
| ||
01:42 | Update the change log. Typo fixes in the floating-point documentation. (check-in: 846fd3a3f7 user: drh tags: trunk) | |
2020-07-14
| ||
00:20 | Initial draft of documentation on floating point, IEEE 754 binary64, and the ieee754 and decimal extensions. (check-in: 448d185aac user: drh tags: trunk) | |
2020-07-09
| ||
15:12 | Fix minor typo in the transaction documentation. (check-in: fd1f6c62f7 user: drh tags: trunk) | |
Changes
Added pages/floatingpoint.in.
> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 | <title>Floating Pointer Numbers</title> <table_of_contents> <h1>How SQLite Stores Numbers</h1> <p> SQLite stores integer values in the 64-bit <a href="https://en.wikipedia.org/wiki/Two%27s_complement">twos-complement</a> format. This gives a storage range of -9223372036854775808 to +9223372036854775807, inclusive. Integers within this range are exact. <p> So-called "REAL" or floating point values are stored in the <a href="https://en.wikipedia.org/wiki/IEEE_754">IEEE 754</a> <a href="https://en.wikipedia.org/wiki/Double-precision_floating-point_format">Binary-64</a> format. This gives a range of positive values between approximately 1.7976931348623157e+308 and 4.9406564584124654e-324 with an equivalent range of negative values. A binary64 can also be 0.0 (and -0.0), postive and negative infinity and "NaN" or "Not-a-Number". Floating point values approximate. <p> Pay close attention to the last sentence in the previous paragraph: <blockquote><b> Floating point values are approximate ← <u>Always</u> remember this! </b></blockquote> <p> If you need an exact answer, you should not use binary64 floating-point values, in SQLite or in any other product. This is not an SQLite limitation. It is a mathematical limitation inherient in the design of floating-point numbers. <h2>Floating-Point Accuracy</h2> <p> SQLite promises to preserve the 15 most significant digits of a floating point value. However, it makes no guarantees about the accuracy of computations on floating point values, as no such guarantees are possible. Performing math on floating-point values introduces error. For example, consider what happens if you attempt to subtract two floating-point numbers of similar magnitude: <blockquote> <table border="0" cellpadding="0" cellspacing="0"> <tr><td align="right">1152693165.1106291898</td></tr> <tr><td align="right">-1152693165.1106280772</td></tr> <tr><td><hr> <tr><td align="right">0.0000011126 </table> </blockquote> <p>The result shown above (0.0000011126) is the correct answer. But if you do this computation using binary64 floating-point, the answer you get is 0.00000095367431640625 - an error of about 14%. If you do many similar computations as part of your program, the errors add up so that your final result might be completely meaningless. <p>The error arises because only about the first 15 significant digits of each number are stored accurately, and first the difference between the two numbers being subtracted is in the 16th digit. <h2>Floating Point Numbers</h2> <p> The binary64 floating-point format uses 64 bits per number. Hence there are 1.845e+19 different possible floating point values. On the other hand there are infinitely many real numbers in the range of 1.7977e+308 and 4.9407e-324. It follows then that binary64 cannot possibly represent all possible real numbers within that range. Approximations are required. <p> An IEEE 754 floating-point is an integer multiplied by a power of two: <blockquote> <big>M × 2<sup><small>E</small></sup></big> </blockquote> <p>The M value is the "mantissa" and E is the "exponent". Both M and E are integers. <p>For Binary64, M is a 53-bit integer and E is an 11-bit integer that represents a range of values between -1074 and +972, inclusive. <p><i>(NB: The usual description of IEEE 754 is more complex, and it is important to understand the added complexity if you really want to appreciate the details, merits, and limitations of IEEE 754. However, the integer description shown here, while not exactly right, is easier to understand and is sufficient for the purposes of this article.)</i></p> <h3>Unrepresentable numbers</h3> <p>Not ever decimal number with fewer than 16 significant digits can be represented exactly as a binary64 number. In fact, most decimal numbers with digits to the right of the decimal point lack an exact binary64 equivalent. For example, if you have a database column that is intended to hold an item price in dollars and cents, the only cents value that can be exactly represented are 0.00, 0.25, 0.50, and 0.75. Any other numbers to the right of the decimal point result in an approximatation. If you provide a "price" value of 47.49, that number will be represented in binary64 as: <blockquote> 6683623321994527 × 2<sup><small>-47</small></sup> </blockquote> <p>Which works out to be: <blockquote> 47.49000000000000198951966012828052043914794921875 </blockquote> <p>That number is very close to 47.49, but it is not exact. It is a little too big. But if we reduce M by one to 6683623321994526 so that we have the next smaller possible binary64 value, we get: <blockquote> 47.4899999999999948840923025272786617279052734375 </blockquote> <p> This second number is too small. The first number is closer to the desired value of 47.49, so that is the one that gets used. But it is not exact. Most decimal values work this way in IEEE 754. Remember the key point we made above: <blockquote><b> Floating point values are approximate. </b></blockquote> <p>If you remember nothing else about floating-point values, please don't forget this one key idea. <h3>Is it close enough?</h3> <p>15 significant digits of precision are sufficient for most computations. For example, if "47.49" represents the price, and inflation is running at 2% per year, then the price is going up by about 0.0000000301 dollars per second. The error in the recorded value of 47.49 represents about 65 nanoseconds worth of inflation. So, if price of 47.49 is exact when you enter it, then value actually stored (47.4900000000000019895196601282805204391479492187) will be exact about 65 nanoseconds later, due to inflation, and will be too small long before you get your finger off of the Enter key. Surely that level of precision is sufficient for most purposes? <h1>Extensions For Dealing With Floating Point Numbers</h1> <tcl>hd_fragment ieee754ext {ieee754 extension}</tcl> <h2>The ieee754.c Extension</h2> <p>The ieee754 extension converts a floating point number between its binary64 representation and the M×2<sup><small>E</small></sup> format. In other words in the expression: <blockquote> <big>F = M × 2<sup><small>E</small></sup></big> </blockquote> <p>The ieee754 extension converts between F and (M,E) and back again. <p>The ieee754 extension is not part of the [amalgamation], but it is included by default in the [CLI]. If you want to include the ieee754 extension in your application, you will need to compile and load it separately. <tcl>hd_fragment ieee754 {ieee754() function}</tcl> <h3>The ieee754() function</h3> <p>The ieee754(F) SQL function takes a single floating-point argument as its input and returns a string that looks like this: <blockquote> 'ieee754(M,E)' </blockquote> <p>Except that the M and E are replaced by the mantissa and exponent of the floating point number. For example: <codeblock> sqlite> .mode box sqlite> SELECT ieee754(47.49) AS x; ┌───────────────────────────────┐ │ x │ ├───────────────────────────────┤ │ ieee754(6683623321994527,-47) │ └───────────────────────────────┘ </codeblock> <p> Going in the other direction, the 2-argument version of ieee754() tags the M and E values and converts them into the corresponding F value: <codeblock> sqlite> select ieee754(6683623321994527,-47) as x; ┌───────┐ │ x │ ├───────┤ │ 47.49 │ └───────┘ </codeblock> <tcl>hd_fragment ieee754m {ieee754_mantissa() function} \ {ieee754_exponent() function} </tcl> <h3>The ieee754_mantissa() and ieee754_exponent() functions</h3> <p>The text output of the one-argument form of ieee754() is great for human readability, but it awkward to use as part of a larger expression. Hence The ieee754_mantissa() and ieee754_exponent() routines were added to return respectively, the M and E values corresponding to their single argument F value. For example: <codeblock> sqlite> .mode box sqlite> SELECT ieee754_mantissa(47.49) AS M, ieee754_exponent(47.49) AS E; ┌──────────────────┬─────┐ │ M │ E │ ├──────────────────┼─────┤ │ 6683623321994527 │ -47 │ └──────────────────┴─────┘ </codeblock> <tcl>hd_fragment ieee754b {ieee754_from_blob() function} \ {ieee754_to_blob() function} </tcl> <h3>The ieee754_from_blob() and ieee754_to_blob() functions</h3> <p>The ieee754_to_blob(F) SQL function converts the floating point number F into an 8-byte BLOB that is the big-endian binary64 encoding of that number. The ieee754_from_blob(B) function goes the other way, converting an 8-byte blob into the floating-point value that the binary64 encoding represents. <p>So, for example, if you read <a href='https://en.wikipedia.org/wiki/Double-precision_floating-point_format'>on Wikipedia</a> that the encoding for the minimum positive binary64 value is 0x0000000000000001, then you can find the corresponding floating point value like this: <codeblock> sqlite> .mode box sqlite> SELECT ieee754_from_blob(x'0000000000000001') AS F; ┌───────────────────────┐ │ F │ ├───────────────────────┤ │ 4.94065645841247e-324 │ └───────────────────────┘ </codeblock> <p>Or go the other way: <codeblock> sqlite> .mode box sqlite> SELECT quote(ieee754_to_blob(4.94065645841247e-324)) AS binary64; ┌─────────────────────┐ │ binary64 │ ├─────────────────────┤ │ X'0000000000000001' │ └─────────────────────┘ </codeblock> <tcl>hd_fragment decext {decimal extension}</tcl> <h2>The decimal.c Extension</h2> <p>The decimal extension provides arbitrary-precision decimal arithmetic on numbers stored as text strings. Because the numbers are stored to arbitrary precision and as text, no approximations are needed. Computations can be done exactly. <p>The decimal extension is not (currently) part of the SQLite [amalgamation]. However, it is included in the [CLI]. <p>There are three math functions available: <p> <ul> <li> decimal_add(A,B) <li> decimal_sub(A,B) <li> decimal_mul(A,B) </ul> </p> <p>These functions respectively add, substract, and multiply their arguments and return a new text string that is the decimal representation of the result. There is no division operator at this time. <p>Use the decimal_cmp(A,B) to compare to decimal values. The result will be negative, zero, or positive if A is less than, equal to, or greater than B, respectively. <p>The decimal_sum(X) function is an aggregate, like the built-in [sum() aggregate function] except that decimal_sum() computes its result to arbitrary precision and is therefore precise. <p>Finally, the decimal extension provides the "decimal" collating sequences that compares decimal text strings in numeric order. <h1>Techniques</h1> <p> The following SQL illustrates how to use the ieee754 and decimal extensions compute the exact decimal equivalent for a binary64 floating-point number. <codeblock> -- The pow2 table will hold all the necessary powers of two. CREATE TABLE pow2(x INTEGER PRIMARY KEY, v TEXT); WITH RECURSIVE c(x,v) AS ( VALUES(0,'1') UNION ALL SELECT x+1, decimal_mul(v,'2') FROM c WHERE x+1<=971 ) INSERT INTO pow2(x,v) SELECT x, v FROM c; WITH RECURSIVE c(x,v) AS ( VALUES(-1,'0.5') UNION ALL SELECT x-1, decimal_mul(v,'0.5') FROM c WHERE x-1>=-1075 ) INSERT INTO pow2(x,v) SELECT x, v FROM c; -- This query finds the decimal representation of each value in the "c" table. WITH c(n) AS (VALUES(47.49)) ---------------^^^^^---- Replace with whatever you want SELECT decimal_mul(ieee754_mantissa(c.n),pow2.v) FROM pow2, c WHERE pow2.x=ieee754_exponent(c.n); </codeblock> |