Documentation Source Text

Check-in [555bf82e10]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Update the speed-and-size spreadsheet and the cpu.html page. Also make minor tweaks to the omitted.html page.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 555bf82e10e0f92754f0d1f23192264af91fe89de7c4eb43e4f468081f988dd5
User & Date: drh 2018-11-24 19:14:44.437
Context
2018-11-24
20:24
Minor enhancements and updates to various documents. (check-in: 9e12c649c9 user: drh tags: trunk)
19:14
Update the speed-and-size spreadsheet and the cpu.html page. Also make minor tweaks to the omitted.html page. (check-in: 555bf82e10 user: drh tags: trunk)
2018-11-21
21:46
Improvements to the althttpd documentation. (check-in: 44a1928f55 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to images/cpu-usage.jpg.

cannot compute difference between binary files

Changes to misc/speed-size-graph.ods.

cannot compute difference between binary files

Changes to pages/cpu.in.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
<title>Measuring and Reducing CPU Usage in SQLite</title>
<tcl>hd_keywords {CPU cycles used} {CPU performance measurement}</tcl>

<table_of_contents>

<h1>Overview</h1>

<p>The graph below shows the number of CPU cycles used by SQLite on a
standard workload, for all versions of SQLite going back about 9 years.
Recent versions of SQLite use less than a third of the CPU cycles 
compared to older versions.

<p>
This article describes how the SQLite developers measure CPU usage,
what those measurements actually mean, and the techniques used by
SQLite developers on their continuing quest to further reduce the
CPU usage of the SQLite library.
</p>

<center>
<hr>
<div class="imgcontainer">
<img src="./images/cpu-usage.jpg"></img></div><br>
Measured using cachegrind on Ubuntu 16.04 on x64 with gcc 5.4.0 and -Os.<br>
Values normalized so that version 3.23.1 is 100%.
<hr>
</center>

<h1>Measuring Performance</h1>

<p>In brief, the CPU performance of SQLite is measured as follows:









|
|














<







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

25
26
27
28
29
30
31
<title>Measuring and Reducing CPU Usage in SQLite</title>
<tcl>hd_keywords {CPU cycles used} {CPU performance measurement}</tcl>

<table_of_contents>

<h1>Overview</h1>

<p>The graph below shows the number of CPU cycles used by SQLite on a
standard workload, for versions of SQLite going back about 10 years.
Recent versions of SQLite use about one third as many the CPU cycles
compared to older versions.

<p>
This article describes how the SQLite developers measure CPU usage,
what those measurements actually mean, and the techniques used by
SQLite developers on their continuing quest to further reduce the
CPU usage of the SQLite library.
</p>

<center>
<hr>
<div class="imgcontainer">
<img src="./images/cpu-usage.jpg"></img></div><br>
Measured using cachegrind on Ubuntu 16.04 on x64 with gcc 5.4.0 and -Os.<br>

<hr>
</center>

<h1>Measuring Performance</h1>

<p>In brief, the CPU performance of SQLite is measured as follows:

118
119
120
121
122
123
124

125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
script compiles the speedtest1.c program, runs it under cachegrind,
processes the cachegrind output using the
[https://sqlite.org/src/file/tool/cg_anno.tcl | cg_anno.tcl] TCL
script, then saves the results in a series of text files.
Typical output from the speed-check.sh script looks like this:

<blockquote><pre>

==16429== I   refs:      <b><font color="red">1,291,005,499</font></b>
==16429== I1  misses:       24,688,182
==16429== LLi misses:            5,027
==16429== I1  miss rate:          1.91%
==16429== LLi miss rate:          0.00%
==16429== 
==16429== D   refs:        663,242,182  (418,445,823 rd   + 244,796,359 wr)
==16429== D1  misses:        5,958,032  (  3,902,273 rd   +   2,055,759 wr)
==16429== LLd misses:           45,636  (     14,803 rd   +      30,833 wr)
==16429== D1  miss rate:           0.8% (        0.9%     +         0.8%  )
==16429== LLd miss rate:           0.0% (        0.0%     +         0.0%  )
==16429== 
==16429== LL refs:          30,646,214  ( 28,590,455 rd   +   2,055,759 wr)
==16429== LL misses:            50,663  (     19,830 rd   +      30,833 wr)
==16429== LL miss rate:            0.0% (        0.0%     +         0.0%  )
   text	   data	    bss	    dec	    hex	filename
 466711	   6256	   1864	 <b><font color="red">474831</font></b>	  73ecf	sqlite3.o
 199979  914462 7026217 sqlite3.c
</pre></blockquote>

<p>The important parts of the output (the parts that the developers pay
the most attention to) are shown in red.
Basically, the developers want to know the size of the compiled SQLite
library and how many CPU cycles were needed to run the performance test.

<p>The output from the 
[https://sqlite.org/src/file/tool/cg_anno.tcl | cg_anno.tcl] script
shows the number of CPU cycles spent on each line of code.
The report is approximately 80,000 lines long.  The following is a brief
snippet taken from the middle of the report to show what it looks like:

<hr>
<blockquote><pre>
         .  SQLITE_PRIVATE int sqlite3BtreeNext(BtCursor *pCur, int *pRes){
         .    MemPage *pPage;
         .    assert( cursorOwnsBtShared(pCur) );
         .    assert( pRes!=0 );
         .    assert( *pRes==0 || *pRes==1 );
         .    assert( pCur-&gt;skipNext==0 || pCur-&gt;eState!=CURSOR_VALID );







>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|

|
|













<







117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155

156
157
158
159
160
161
162
script compiles the speedtest1.c program, runs it under cachegrind,
processes the cachegrind output using the
[https://sqlite.org/src/file/tool/cg_anno.tcl | cg_anno.tcl] TCL
script, then saves the results in a series of text files.
Typical output from the speed-check.sh script looks like this:

<blockquote><pre>
==8683== 
==8683== I   refs:      <font color="red">1,060,925,768</font>
==8683== I1  misses:       23,731,246
==8683== LLi misses:            5,176
==8683== I1  miss rate:          2.24%
==8683== LLi miss rate:          0.00%
==8683== 
==8683== D   refs:        557,686,925  (361,828,925 rd   + 195,858,000 wr)
==8683== D1  misses:        5,067,063  (  3,544,278 rd   +   1,522,785 wr)
==8683== LLd misses:           57,958  (     16,067 rd   +      41,891 wr)
==8683== D1  miss rate:           0.9% (        1.0%     +         0.8%  )
==8683== LLd miss rate:           0.0% (        0.0%     +         0.0%  )
==8683== 
==8683== LL refs:          28,798,309  ( 27,275,524 rd   +   1,522,785 wr)
==8683== LL misses:            63,134  (     21,243 rd   +      41,891 wr)
==8683== LL miss rate:            0.0% (        0.0%     +         0.0%  )
   text	   data	    bss	    dec	    hex	filename
 523044	   8240	   1976	 <font color="red">533260</font>	  8230c	sqlite3.o
 220507 1007870 7769352 sqlite3.c
</pre></blockquote>

<p>The important parts of the output (the parts that the developers pay
the most attention to) are shown in red.
Basically, the developers want to know the size of the compiled SQLite
library and how many CPU cycles were needed to run the performance test.

<p>The output from the 
[https://sqlite.org/src/file/tool/cg_anno.tcl | cg_anno.tcl] script
shows the number of CPU cycles spent on each line of code.
The report is approximately 80,000 lines long.  The following is a brief
snippet taken from the middle of the report to show what it looks like:


<blockquote><pre>
         .  SQLITE_PRIVATE int sqlite3BtreeNext(BtCursor *pCur, int *pRes){
         .    MemPage *pPage;
         .    assert( cursorOwnsBtShared(pCur) );
         .    assert( pRes!=0 );
         .    assert( *pRes==0 || *pRes==1 );
         .    assert( pCur-&gt;skipNext==0 || pCur-&gt;eState!=CURSOR_VALID );
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
   728,110    if( pPage-&gt;leaf ){
         .      return SQLITE_OK;
         .    }else{
     3,117      return moveToLeftmost(pCur);
         .    }
   721,876  }
</pre></blockquote>
<hr>

<p>
The numbers on the left are the CPU cycle counts for that line of code,
of course.

<p>
The cg_anno.tcl script removes extraneous details from the default 







<







172
173
174
175
176
177
178

179
180
181
182
183
184
185
   728,110    if( pPage-&gt;leaf ){
         .      return SQLITE_OK;
         .    }else{
     3,117      return moveToLeftmost(pCur);
         .    }
   721,876  }
</pre></blockquote>


<p>
The numbers on the left are the CPU cycle counts for that line of code,
of course.

<p>
The cg_anno.tcl script removes extraneous details from the default 
Changes to pages/omitted.in.
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33

feature {RIGHT and FULL OUTER JOIN} {
  LEFT OUTER JOIN is implemented, but not RIGHT OUTER JOIN or
  FULL OUTER JOIN.
}

feature {Complete ALTER TABLE support} {
  Only the RENAME TABLE and ADD COLUMN variants of the 
  ALTER TABLE command are supported.  Other kinds of ALTER TABLE operations
  such as
  DROP COLUMN, ALTER COLUMN, ADD CONSTRAINT, and so forth are omitted.
}

feature {Complete trigger support} {
  FOR EACH ROW triggers are supported but not FOR EACH STATEMENT







|







19
20
21
22
23
24
25
26
27
28
29
30
31
32
33

feature {RIGHT and FULL OUTER JOIN} {
  LEFT OUTER JOIN is implemented, but not RIGHT OUTER JOIN or
  FULL OUTER JOIN.
}

feature {Complete ALTER TABLE support} {
  Only the RENAME TABLE, ADD COLUMN, and RENAME COLUMN variants of the 
  ALTER TABLE command are supported.  Other kinds of ALTER TABLE operations
  such as
  DROP COLUMN, ALTER COLUMN, ADD CONSTRAINT, and so forth are omitted.
}

feature {Complete trigger support} {
  FOR EACH ROW triggers are supported but not FOR EACH STATEMENT