Documentation Source Text

Check-in [448d185aac]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Initial draft of documentation on floating point, IEEE 754 binary64, and the ieee754 and decimal extensions.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 448d185aacd48af25ef2bbe8f0219de28d2c1409b65676f0017e327a6146612c
User & Date: drh 2020-07-14 00:20:05.293
Context
2020-07-15
01:42
Update the change log. Typo fixes in the floating-point documentation. (check-in: 846fd3a3f7 user: drh tags: trunk)
2020-07-14
00:20
Initial draft of documentation on floating point, IEEE 754 binary64, and the ieee754 and decimal extensions. (check-in: 448d185aac user: drh tags: trunk)
2020-07-09
15:12
Fix minor typo in the transaction documentation. (check-in: fd1f6c62f7 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Added pages/floatingpoint.in.














































































































































































































































































































































































































































































































































































































































































>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
<title>Floating Pointer Numbers</title>

<table_of_contents>

<h1>How SQLite Stores Numbers</h1>

<p>
SQLite stores integer values in the 64-bit 
<a href="https://en.wikipedia.org/wiki/Two%27s_complement">twos-complement</a>
format.
This gives a storage range of -9223372036854775808 to +9223372036854775807,
inclusive.  Integers within this range are exact.

<p>
So-called "REAL" or floating point values are stored in the
<a href="https://en.wikipedia.org/wiki/IEEE_754">IEEE 754</a>
<a href="https://en.wikipedia.org/wiki/Double-precision_floating-point_format">Binary-64</a>
format.
This gives a range of positive values between approximately
1.7976931348623157e+308 and 4.9406564584124654e-324 with an equivalent
range of negative values.  A binary64 can also be 0.0 (and -0.0), postive
and negative infinity and "NaN" or "Not-a-Number".  Floating point
values approximate.

<p>
Pay close attention to the last sentence in the previous paragraph:
<blockquote><b>
Floating point values are approximate &larr; <u>Always</u> remember this!
</b></blockquote>

<p>
If you need an exact answer, you should not use binary64 floating-point
values, in SQLite or in any other product.  This is not an SQLite limitation.
It is a mathematical limitation inherient in the design of floating-point numbers.

<h2>Floating-Point Accuracy</h2>

<p>
SQLite promises to preserve the 15 most significant digits of a floating
point value.  However, it makes no guarantees about the accuracy of
computations on floating point values, as no such guarantees are possible.
Performing math on floating-point values introduces error.
For example, consider what happens if you attempt to subtract two floating-point
numbers of similar magnitude:

<blockquote>
<table border="0" cellpadding="0" cellspacing="0">
<tr><td align="right">1152693165.1106291898</td></tr>
<tr><td align="right">-1152693165.1106280772</td></tr>
<tr><td><hr>
<tr><td align="right">0.0000011126
</table>
</blockquote>

<p>The result shown above (0.0000011126) is the correct answer.  But if you
do this computation using binary64 floating-point, the answer you get is
0.00000095367431640625 - an error of about 14%.  If you do many similar
computations as part of your program, the errors add up so that your final
result might be completely meaningless.

<p>The error arises because only about the first 15 significant digits of
each number are stored accurately, and first the difference between the two numbers
being subtracted is in the 16th digit.  

<h2>Floating Point Numbers</h2>

<p>
The binary64 floating-point format uses 64 bits per number.  Hence there
are 1.845e+19 different possible floating point values.  On the other hand
there are infinitely many real numbers in the range of 
1.7977e+308 and 4.9407e-324.  It follows then that binary64 cannot possibly
represent all possible real numbers within that range.  Approximations are
required.

<p>
An IEEE 754 floating-point is an integer multiplied by a power
of two:

<blockquote>
<big>M &times 2<sup><small>E</small></sup></big>
</blockquote>

<p>The M value is the "mantissa" and E is the "exponent".  Both
M and E are integers.

<p>For Binary64, M is a 53-bit integer and E is an 11-bit integer that
represents a range of values between -1074 and +972, inclusive.

<p><i>(NB:  The usual description of IEEE 754 is more complex, and it is important
to understand the added complexity if you really want to appreciate the details,
merits, and limitations of IEEE 754.  However, the integer description shown
here, while not exactly right, is easier to understand and is sufficient for
the purposes of this article.)</i></p>

<h3>Unrepresentable numbers</h3>

<p>Not ever decimal number with fewer than 16 significant digits can be
represented exactly as a binary64 number.  In fact, most decimal numbers
with digits to the right of the decimal point lack an exact binary64
equivalent.  For example, if you have a database column that is intended
to hold an item price in dollars and cents, the only cents value that
can be exactly represented are 0.00, 0.25, 0.50, and 0.75.  Any other
numbers to the right of the decimal point result in an approximatation.
If you provide a "price" value of 47.49, that number will be represented
in binary64 as:

<blockquote>
6683623321994527 &times; 2<sup><small>-47</small></sup>
</blockquote>

<p>Which works out to be:

<blockquote>
47.49000000000000198951966012828052043914794921875
</blockquote>

<p>That number is very close to 47.49, but it is not exact.  It is a little
too big.  But if we reduce M by one to 6683623321994526 so that we have the
next smaller possible binary64 value, we get:

<blockquote>
47.4899999999999948840923025272786617279052734375
</blockquote>


<p>
This second number is too small.
The first number is closer to the desired value of 47.49, so that is the
one that gets used.  But it is not exact.  Most decimal values work this
way in IEEE 754.  Remember the key point we made above:

<blockquote><b>
Floating point values are approximate.
</b></blockquote>

<p>If you remember nothing else about floating-point values, 
please don't forget this one key idea.

<h3>Is it close enough?</h3>

<p>15 significant digits of precision are sufficient for most computations.
For example, if "47.49" represents the price, and inflation is running
at 2% per year, then the price is going up by about 0.0000000301 dollars per
second.  The error in the recorded value of 47.49 represents about 65 nanoseconds
worth of inflation.  So, if price of 47.49 is exact
when you enter it, then value actually stored
(47.4900000000000019895196601282805204391479492187) will be exact about
65 nanoseconds later, due to inflation, and will be too small long before you
get your finger off of the Enter key.
Surely that level of precision is sufficient for most purposes?

<h1>Extensions For Dealing With Floating Point Numbers</h1>

<tcl>hd_fragment ieee754ext {ieee754 extension}</tcl>
<h2>The ieee754.c Extension</h2>

<p>The ieee754 extension converts a floating point number between its
binary64 representation and the M&times;2<sup><small>E</small></sup> format.
In other words in the expression:

<blockquote>
<big>F = M &times 2<sup><small>E</small></sup></big>
</blockquote>

<p>The ieee754 extension converts between F and (M,E) and back again.

<p>The ieee754 extension is not part of the [amalgamation], but it is included
by default in the [CLI].  If you want to include the ieee754 extension in your
application, you will need to compile and load it separately.

<tcl>hd_fragment ieee754 {ieee754() function}</tcl>
<h3>The ieee754() function</h3>

<p>The ieee754(F) SQL function takes a single floating-point argument
as its input and returns a string that looks like this:

<blockquote>
'ieee754(M,E)'
</blockquote>

<p>Except that the M and E are replaced by the mantissa and exponent of the
floating point number.  For example:

<codeblock>
sqlite> .mode box
sqlite> SELECT ieee754(47.49) AS x;
┌───────────────────────────────┐
│               x               │
├───────────────────────────────┤
│ ieee754(6683623321994527,-47) │
└───────────────────────────────┘
</codeblock>

<p>
Going in the other direction, the 2-argument version of ieee754() tags
the M and E values and converts them into the corresponding F value:

<codeblock>
sqlite> select ieee754(6683623321994527,-47) as x;
┌───────┐
│   x   │
├───────┤
│ 47.49 │
└───────┘
</codeblock>

<tcl>hd_fragment ieee754m {ieee754_mantissa() function} \
   {ieee754_exponent() function} </tcl>
<h3>The ieee754_mantissa() and ieee754_exponent() functions</h3>

<p>The text output of the one-argument form of ieee754() is great for human
readability, but it awkward to use as part of a larger expression.  Hence
The ieee754_mantissa() and ieee754_exponent() routines were added to return
respectively, the M and E values corresponding to their single argument F
value.
For example:

<codeblock>
sqlite> .mode box
sqlite> SELECT ieee754_mantissa(47.49) AS M, ieee754_exponent(47.49) AS E;
┌──────────────────┬─────┐
│        M         │  E  │
├──────────────────┼─────┤
│ 6683623321994527 │ -47 │
└──────────────────┴─────┘
</codeblock>

<tcl>hd_fragment ieee754b {ieee754_from_blob() function} \
   {ieee754_to_blob() function} </tcl>
<h3>The ieee754_from_blob() and ieee754_to_blob() functions</h3>

<p>The ieee754_to_blob(F) SQL function converts the floating point number F
into an 8-byte BLOB that is the big-endian binary64 encoding of that number.
The ieee754_from_blob(B) function goes the other way, converting an 8-byte
blob into the floating-point value that the binary64 encoding represents.

<p>So, for example, if you read
<a href='https://en.wikipedia.org/wiki/Double-precision_floating-point_format'>on
Wikipedia</a> that the encoding for the minimum positive binary64 value is
0x0000000000000001, then you can find the corresponding floating point value
like this:

<codeblock>
sqlite> .mode box
sqlite> SELECT ieee754_from_blob(x'0000000000000001') AS F;
┌───────────────────────┐
│           F           │
├───────────────────────┤
│ 4.94065645841247e-324 │
└───────────────────────┘
</codeblock>

<p>Or go the other way:

<codeblock>
sqlite> .mode box
sqlite> SELECT quote(ieee754_to_blob(4.94065645841247e-324)) AS binary64;
┌─────────────────────┐
│      binary64       │
├─────────────────────┤
│ X'0000000000000001' │
└─────────────────────┘
</codeblock>

<tcl>hd_fragment decext {decimal extension}</tcl>
<h2>The decimal.c Extension</h2>

<p>The decimal extension provides arbitrary-precision decimal arithmetic on
numbers stored as text strings.  Because the numbers are stored to arbitrary
precision and as text, no approximations are needed.  Computations can be
done exactly.

<p>The decimal extension is not (currently) part of the SQLite [amalgamation].
However, it is included in the [CLI].

<p>There are three math functions available:

<p>
<ul>
<li> decimal_add(A,B)
<li> decimal_sub(A,B)
<li> decimal_mul(A,B)
</ul>
</p>

<p>These functions respectively add, substract, and multiply their arguments
and return a new text string that is the decimal representation of the result.
There is no division operator at this time.

<p>Use the decimal_cmp(A,B) to compare to decimal values.  The result will
be negative, zero, or positive if A is less than, equal to, or greater than B,
respectively.

<p>The decimal_sum(X) function is an aggregate, like the built-in
[sum() aggregate function] except that decimal_sum() computes its result
to arbitrary precision and is therefore precise.

<p>Finally, the decimal extension provides the "decimal" collating sequences
that compares decimal text strings in numeric order.

<h1>Techniques</h1>

<p>
The following SQL illustrates how to use the ieee754 and decimal
extensions compute the exact decimal equivalent
for a binary64 floating-point number.

<codeblock>
-- The pow2 table will hold all the necessary powers of two.
CREATE TABLE pow2(x INTEGER PRIMARY KEY, v TEXT);
WITH RECURSIVE c(x,v) AS (
  VALUES(0,'1')
  UNION ALL
  SELECT x+1, decimal_mul(v,'2') FROM c WHERE x+1&lt;=971
) INSERT INTO pow2(x,v) SELECT x, v FROM c;
WITH RECURSIVE c(x,v) AS (
  VALUES(-1,'0.5')
  UNION ALL
  SELECT x-1, decimal_mul(v,'0.5') FROM c WHERE x-1&gt;=-1075
) INSERT INTO pow2(x,v) SELECT x, v FROM c;

-- This query finds the decimal representation of each value in the "c" table.
WITH c(n) AS (VALUES(47.49))
      ---------------^^^^^---- Replace with whatever you want
SELECT decimal_mul(ieee754_mantissa(c.n),pow2.v)
  FROM pow2, c WHERE pow2.x=ieee754_exponent(c.n);
</codeblock>