Documentation Source Text

Check-in [34514986f4]


Overview
Comment:More mutation testing documentation.
SHA1: 34514986f4fb70d5862e4805be3c93e47bcf6b99
User & Date: drh 2016-09-05 17:18:23.099
Context
2016-09-06
00:32
Fix typo on the homepage. (check-in: 3f158f149c user: drh tags: trunk)
2016-09-05
17:18
More mutation testing documentation. (check-in: 34514986f4 user: drh tags: trunk)
16:51
In the testing document, use <codeblock> instead of <pre> and add a section on mutation testing. (check-in: 15b43e7cdb user: drh tags: trunk)
Changes
Changes to pages/testing.in.
indeterminate behavior in the SQLite code (and hence a bug), 
or a bug in the compiler.
Note that SQLite has, over the previous decade, encountered bugs
in each of GCC, Clang, and MSVC.  Compiler bugs, while rare, do happen,
which is why it is so important to test the code in an as-delivered
configuration.

<tcl>hd_fragment mutationtests</tcl>
<h2>Mutation testing</h2>

<p>Using gcov (or similar) to show that every branch instruction is taken
at least once in both directions is a good measure of test suite quality.
But even better is showing that every branch instruction makes
a difference in the output.  In other words, we want to show
not only that every branch instruction both jumps and falls through but also
that every branch is doing useful work and that the test suite is able
to detect and verify that work.  When a branch is found that does not
make a difference in the output, that suggests either that the code associated
with the branch can be removed (reducing the size of the library and perhaps
making it run faster) or that the test suite is inadequately testing the
feature that the branch implements.

<p>SQLite strives to verify that every branch instruction makes a difference
using [https://en.wikipedia.org/wiki/Mutation_testing|mutation testing].
A script first compiles the SQLite source code into assembly language
(using, for example, the -S option to gcc).  Then the script steps through
the generated assembly language and, one by one, changes each branch 
instruction into either an unconditional jump or a no-op, compiles the 
result, and verifies that the test suite catches the mutation.

<p>
Unfortunately, SQLite contains many branch instructions that
indeterminate behavior in the SQLite code (and hence a bug), 
or a bug in the compiler.
Note that SQLite has, over the previous decade, encountered bugs
in each of GCC, Clang, and MSVC.  Compiler bugs, while rare, do happen,
which is why it is so important to test the code in an as-delivered
configuration.

<tcl>hd_fragment mutationtests {mutation testing}</tcl>
<h2>Mutation testing</h2>

<p>Using gcov (or similar) to show that every branch instruction is taken
at least once in both directions is a good measure of test suite quality.
But even better is showing that every branch instruction makes
a difference in the output.  In other words, we want to show
not only that every branch instruction both jumps and falls through but also
that every branch is doing useful work and that the test suite is able
to detect and verify that work.  When a branch is found that does not
make a difference in the output, that suggests either that the code associated
with the branch can be removed (reducing the size of the library and perhaps
making it run faster) or that the test suite is inadequately testing the
feature that the branch implements.

<p>SQLite strives to verify that every branch instruction makes a difference
using [https://en.wikipedia.org/wiki/Mutation_testing|mutation testing].
[mutation test script|A script]
first compiles the SQLite source code into assembly language
(using, for example, the -S option to gcc).  Then the script steps through
the generated assembly language and, one by one, changes each branch 
instruction into either an unconditional jump or a no-op, compiles the 
result, and verifies that the test suite catches the mutation.
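<p>To make the mutation step concrete, here is a minimal Python sketch that scans assembly text and, for each conditional branch, produces one mutant where the branch always jumps and one where it is replaced by a no-op. It assumes x86-64 output in the default GNU assembler syntax, as produced by <tt>gcc -S</tt>, where conditional jumps are mnemonics such as <tt>jne</tt> and <tt>je</tt> and the unconditional jump is <tt>jmp</tt>. The <tt>mutants()</tt> function is hypothetical and is not the script SQLite actually uses; other architectures would need different patterns.

```python
import re

# Assumed x86-64 GNU-syntax pattern for a conditional jump such as
# "\tjne\t.L5": any "j..." mnemonic except the unconditional "jmp"
# (and variants like "jmpq"), followed by a single operand.
COND_JUMP = re.compile(r"^(\s*)(j(?!mp)[a-z]+)\s+(\S+)\s*$")


def mutants(asm_lines):
    """Yield (line_index, kind, mutated_lines) for each conditional branch.

    Every conditional branch produces two mutants: one where the branch
    always jumps ("jump") and one where it never jumps ("no-op").
    The input list is not modified; each mutant is a fresh list.
    """
    for i, line in enumerate(asm_lines):
        m = COND_JUMP.match(line)
        if m is None:
            continue
        indent, _mnemonic, target = m.groups()
        before, after = asm_lines[:i], asm_lines[i + 1:]
        yield i, "jump", before + [f"{indent}jmp\t{target}"] + after
        yield i, "no-op", before + [f"{indent}nop"] + after


if __name__ == "__main__":
    listing = [
        "\tcmpl\t$0, %edi",
        "\tjne\t.L5",
        "\tmovl\t$1, %eax",
        "\tjmp\t.L3",
    ]
    # Show the replacement instruction for each mutant of the one "jne".
    for index, kind, mutated in mutants(listing):
        print(index, kind, repr(mutated[index]))
```

Each mutated listing would then be assembled, linked, and run under the test suite; any mutant that passes marks a branch worth examining.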

<p>
Unfortunately, SQLite contains many branch instructions that
Changes to pages/th3.in.
<title>TH3</title>
<tcl>hd_keywords {TH3}</tcl>

<fancy_format>
<h1>Overview</h1>

<p>SQLite Test Harness #3 (hereafter "TH3") is one of
[three test harnesses] used for testing SQLite.
TH3 meets the following objectives:</p>

<ul>
<title>TH3</title>
<tcl>hd_keywords {TH3}</tcl>

<table_of_contents>
<h1>Overview</h1>

<p>SQLite Test Harness #3 (hereafter "TH3") is one of
[three test harnesses] used for testing SQLite.
TH3 meets the following objectives:</p>

<ul>
language - a reimplementation of parts of the TCL language in a 
more portable form that would compile and run on SymbianOS, and 
that was sufficient to run the SQLite tests.  TH1
did not survive as a standard testing tool for SQLite,
but it did find continued service as a
scripting language used to customize the 
[http://www.fossil-scm.org/|Fossil] version control system.
There was also a "Test Harness #2", all traces of which have been
lost.  TH3 was the third attempt.

<p>At about that same time, some avionics manufacturers were
expressing interest in SQLite, which prompted the SQLite developers
to design TH3 to support the rigorous testing standards of
[https://en.wikipedia.org/wiki/DO-178B|DO-178B].

<p>The first code for TH3 was laid down on 2008-09-25.
language - a reimplementation of parts of the TCL language in a 
more portable form that would compile and run on SymbianOS, and 
that was sufficient to run the SQLite tests.  TH1
did not survive as a standard testing tool for SQLite,
but it did find continued service as a
scripting language used to customize the 
[http://www.fossil-scm.org/|Fossil] version control system.
There was also a "Test Harness #2", which was an attempt to
create a simple scripting language using operator prefix notation
to drive tests.  TH3 was the third attempt.

<p>At about that same time, some avionics manufacturers were
expressing interest in SQLite, which prompted the SQLite developers
to design TH3 to support the rigorous testing standards of
[https://en.wikipedia.org/wiki/DO-178B|DO-178B].

<p>The first code for TH3 was laid down on 2008-09-25.
<p>The TH3 program generator is a TCL script named "<tt>mkth3.tcl</tt>".
To generate a test program, one has merely to run this script and supply
the names of files containing test modules and configurations on the
command line.  Test modules are files that use the "<tt>.test</tt>" suffix
and configurations are files that use the "<tt>.cfg</tt>" suffix.  A
typical invocation of mkth3.tcl might look something like the following:</p>

<blockquote><pre>
tclsh mkth3.tcl *.test *.cfg &gt;testprog1.c
</pre></blockquote>

<p>The output from the mkth3.tcl script is a C program that contains
everything needed to run the tests except for
the SQLite library itself.  The generated test program contains 
implementations for all of the support interfaces used by the test
modules and it contains the <tt>main()</tt> routine that drives the
tests.  To convert the test program into a working executable, simply
compile it against SQLite:</p>

<blockquote><pre>
cc -o testprog1 testprog1.c sqlite3.c
</pre></blockquote>

<p>The compilation step shown immediately above is merely representative.
In a working installation, one would normally want
to specify optimization parameters and compile-time switches on the
compiler command line.</p>

<p>For testing on embedded systems, the mkth3.tcl script and the compiler
steps shown above are performed on an ordinary workstation using
a cross-compiler, and the resulting test program is then
transferred onto the device to be run.</p>

<p>Once the test program is generated, it is run with no arguments to
perform the tests.  Progress information as well as error diagnostics
appear on standard output.  (Alternative output arrangements can be made
using a compile-time option for embedded devices that lack a standard
output channel.) The program returns zero if there are no
errors and non-zero if any problems were detected.</p>

<p>Typical output from a single TH3 test program run looks like this:

<blockquote><pre>
With SQLite 3.8.11 2015-05-15 04:13:15 56ef98a04765c34c1c2f3ed7a6f03a732f3b886e
-DSQLITE_COVERAGE_TEST
-DSQLITE_NO_SYNC
-DSQLITE_SYSTEM_MALLOC
-DSQLITE_THREADSAFE=1
Config-begin c1.
Begin c1.pager08
<p>The TH3 program generator is a TCL script named "<tt>mkth3.tcl</tt>".
To generate a test program, one has merely to run this script and supply
the names of files containing test modules and configurations on the
command line.  Test modules are files that use the "<tt>.test</tt>" suffix
and configurations are files that use the "<tt>.cfg</tt>" suffix.  A
typical invocation of mkth3.tcl might look something like the following:</p>

<codeblock>
tclsh mkth3.tcl *.test *.cfg &gt;testprog1.c
</codeblock>

<p>The output from the mkth3.tcl script is a C program that contains
everything needed to run the tests except for
the SQLite library itself.  The generated test program contains 
implementations for all of the support interfaces used by the test
modules and it contains the <tt>main()</tt> routine that drives the
tests.  To convert the test program into a working executable, simply
compile it against SQLite:</p>

<codeblock>
cc -o testprog1 testprog1.c sqlite3.c
</codeblock>

<p>The compilation step shown immediately above is merely representative.
In a working installation, one would normally want
to specify optimization parameters and compile-time switches on the
compiler command line.</p>

<p>For testing on embedded systems, the mkth3.tcl script and the compiler
steps shown above are performed on an ordinary workstation using
a cross-compiler, and the resulting test program is then
transferred onto the device to be run.</p>

<p>Once the test program is generated, it is run with no arguments to
perform the tests.  Progress information as well as error diagnostics
appear on standard output.  (Alternative output arrangements can be made
using a compile-time option for embedded devices that lack a standard
output channel.) The program returns zero if there are no
errors and non-zero if any problems were detected.</p>

<p>Typical output from a single TH3 test program run looks like this:

<codeblock>
With SQLite 3.8.11 2015-05-15 04:13:15 56ef98a04765c34c1c2f3ed7a6f03a732f3b886e
-DSQLITE_COVERAGE_TEST
-DSQLITE_NO_SYNC
-DSQLITE_SYSTEM_MALLOC
-DSQLITE_THREADSAFE=1
Config-begin c1.
Begin c1.pager08
Config-begin wal1.
Begin wal1.wal37
End wal1.wal37
Config-end wal1. TH3 memory used: 100961
All 226 VDBE coverage points reached
th3: 0 errors out of 1442264 tests in 213.741 seconds. 64-bit little-endian
th3: SQLite 3.8.11 2015-05-15 04:13:15 56ef98a04765c34c1c2f3ed7a6f03a732f3b886e
</pre></blockquote>

<p>The output begins with a report of the [SQLITE_SOURCE_ID]
(cross-checked against [sqlite3_sourceid()]) for the
SQLite under test and the compile-time options used as reported
by [sqlite3_compileoption_get()].  The output concludes with a summary
of the test results and a repeat of the [SQLITE_SOURCE_ID].  If any
errors are detected, additional lines detail the problem.  The error
reporting lines always begin with a single space character so that they
can be quickly extracted from large output files using:

<blockquote><pre>
grep "&#94; "
</pre></blockquote>

<p>The default output shows the beginning and end of each configuration
and test module combination.  In the example above "c1" and "64k" are
configurations and "pager08", "build33", "orderby01", etc. are test modules.
Compile-time and run-time options are available to increase or decrease
the amount of output.
The output can be increased by showing each test case within each
Config-begin wal1.
Begin wal1.wal37
End wal1.wal37
Config-end wal1. TH3 memory used: 100961
All 226 VDBE coverage points reached
th3: 0 errors out of 1442264 tests in 213.741 seconds. 64-bit little-endian
th3: SQLite 3.8.11 2015-05-15 04:13:15 56ef98a04765c34c1c2f3ed7a6f03a732f3b886e
</codeblock>

<p>The output begins with a report of the [SQLITE_SOURCE_ID]
(cross-checked against [sqlite3_sourceid()]) for the
SQLite under test and the compile-time options used as reported
by [sqlite3_compileoption_get()].  The output concludes with a summary
of the test results and a repeat of the [SQLITE_SOURCE_ID].  If any
errors are detected, additional lines detail the problem.  The error
reporting lines always begin with a single space character so that they
can be quickly extracted from large output files using:

<codeblock>
grep "&#94; "
</codeblock>
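<p>Where grep is unavailable, the same filtering can be done in a few lines of script. The Python sketch below collects the error-report lines (those beginning with a space) and, assuming the summary line always follows the format of the sample output shown earlier, extracts the error count, test count, and run time from it. The <tt>scan_th3_output()</tt> function is hypothetical, and the summary pattern is inferred from a single example rather than from any specification of TH3's output.

```python
import re

# Summary-line format inferred from the sample output, e.g.
# "th3: 0 errors out of 1442264 tests in 213.741 seconds. 64-bit little-endian"
# "errors?" also accepts a singular "error", which is an assumption.
SUMMARY = re.compile(r"^th3: (\d+) errors? out of (\d+) tests in ([\d.]+) seconds")


def scan_th3_output(text):
    """Return (error_lines, summary) for the output of one TH3 run.

    error_lines holds every line that begins with a space, the same lines
    a grep for a leading space would select.  summary is a dict parsed
    from the "th3: ... out of ... tests" line, or None if none was found.
    """
    lines = text.splitlines()
    error_lines = [line for line in lines if line.startswith(" ")]
    summary = None
    for line in lines:
        m = SUMMARY.match(line)
        if m:
            summary = {
                "errors": int(m.group(1)),
                "tests": int(m.group(2)),
                "seconds": float(m.group(3)),
            }
    return error_lines, summary
```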

<p>The default output shows the beginning and end of each configuration
and test module combination.  In the example above "c1" and "64k" are
configurations and "pager08", "build33", "orderby01", etc. are test modules.
Compile-time and run-time options are available to increase or decrease
the amount of output.
The output can be increased by showing each test case within each
<p>The TH3 repository also includes the "multitest.tcl" script, another
TCL script used to automate TH3 testing on workstations.  Multitest.tcl
automatically compiles SQLite, then
runs ./th3make repeatedly with a variety of alignments, and captures
the output in a succinct summary screen.  A typical multitest.tcl run
generates output that looks like this:

<blockquote><pre>
file mkdir sqlite3bld
cd sqlite3bld
exec sh /home/drh/sqlite/sqlite/configure
file copy -force config.h ../config.h
exec make clean sqlite3.c
file rename sqlite3.c ../sqlite3.c
aa4f0f90c9c77424943e026a2ecee4a6c7f9e0d3  ../sqlite3.c
<p>The TH3 repository also includes the "multitest.tcl" script, another
TCL script used to automate TH3 testing on workstations.  Multitest.tcl
automatically compiles SQLite, then
runs ./th3make repeatedly with a variety of alignments, and captures
the output in a succinct summary screen.  A typical multitest.tcl run
generates output that looks like this:

<codeblock>
file mkdir sqlite3bld
cd sqlite3bld
exec sh /home/drh/sqlite/sqlite/configure
file copy -force config.h ../config.h
exec make clean sqlite3.c
file rename sqlite3.c ../sqlite3.c
aa4f0f90c9c77424943e026a2ecee4a6c7f9e0d3  ../sqlite3.c
t37: fast.rc alignment6.rc..................................... Ok   (00:11:15)
t34: fast.rc alignment3.rc sqlite3udl.c........................ Ok   (00:23:05)
t38: fast.rc alignment7.rc..................................... Ok   (00:12:26)
t39: fast.rc -fsanitize=undefined.............................. Ok   (00:24:15)
*******************************************************************************
0 failures on 35 th3makes and 171555634 tests in (05:08:31) 3 cores on bella
SQLite 3.14.1 2016-08-11 13:08:14 34aed3a318a413fd180604365546c1f530d1c60c
</pre></blockquote>

<p>As can be seen above, a single run
of multitest.tcl invokes th3make dozens of times and takes between 12 and 24
CPU hours.  The middle section of the output shows the arguments to each
individual th3make run and the result and elapsed time for that th3make.
All build products and output for the separate th3make runs are
captured in subdirectories for post-test analysis.
t37: fast.rc alignment6.rc..................................... Ok   (00:11:15)
t34: fast.rc alignment3.rc sqlite3udl.c........................ Ok   (00:23:05)
t38: fast.rc alignment7.rc..................................... Ok   (00:12:26)
t39: fast.rc -fsanitize=undefined.............................. Ok   (00:24:15)
*******************************************************************************
0 failures on 35 th3makes and 171555634 tests in (05:08:31) 3 cores on bella
SQLite 3.14.1 2016-08-11 13:08:14 34aed3a318a413fd180604365546c1f530d1c60c
</codeblock>

<p>As can be seen above, a single run
of multitest.tcl invokes th3make dozens of times and takes between 12 and 24
CPU hours.  The middle section of the output shows the arguments to each
individual th3make run and the result and elapsed time for that th3make.
All build products and output for the separate th3make runs are
captured in subdirectories for post-test analysis.
The SQLite developers 
are committed to maintaining 100% branch coverage and MC/DC for all 
future releases of SQLite.</p>

<p>The cov1 test set used to obtain 100% branch test coverage is only a
subset of the tests currently implemented using TH3.  New test modules are
added on a regular basis.</p>
<h1>TH3 License</h1>

<p>SQLite itself is in the <a href="copyright.html">public domain</a> and
can be used for any purpose.  But TH3 is proprietary and requires a license.
</p>
The SQLite developers 
are committed to maintaining 100% branch coverage and MC/DC for all 
future releases of SQLite.</p>

<p>The cov1 test set used to obtain 100% branch test coverage is only a
subset of the tests currently implemented using TH3.  New test modules are
added on a regular basis.</p>

<tcl>hd_fragment muttest {mutation test script}</tcl>
<h1>Mutation Testing</h1>

<p>The TH3 source tree contains a script named
"mutation-test.tcl" that automates the process of
[mutation testing].

<p>The mutation-test.tcl script takes care of all of the details for
running a mutation test:

<ol>
<li> The script compiles the TH3 test harness into machine code ("th3.o") if
     necessary.
<li> The script compiles the sqlite3.c source file into assembly language
     ("sqlite3.s") if necessary.
<li> The script loops through instructions in the assembly language file
     to locate branch operations.
     <ol type="a">
     <li>The script makes a copy of the original sqlite3.s file.
     <li>The copy is edited to change the branch instruction into either
         a no-op or an unconditional jump.
     <li>The copy of sqlite3.s is assembled into sqlite3.o and then linked
         against th3.o to generate the "th3" executable.
     <li>The "th3" binary is run and the output checked for errors.
     </ol>
<li> The script shows progress for each cycle of the previous step and then
     displays a summary of "survivors" at the end.  A "survivor" is a
     mutation that was not detected by TH3.
</ol>
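<p>The build-and-run portion of one cycle can be sketched as follows. This Python version is an illustration only: it assumes that th3.o already exists in the working directory, that a C compiler is available as <tt>cc</tt>, and that a failing run is signalled by a non-zero exit status, as described for TH3 test programs generally. The function names are hypothetical; a real link step may also need platform-specific libraries (for example <tt>-lpthread</tt>, <tt>-ldl</tt>, or <tt>-lm</tt>); and counting a mutant that fails to build as "not surviving" is a choice made here, not necessarily what mutation-test.tcl does.

```python
import subprocess
from pathlib import Path


def mutant_survives(mutated_asm, workdir, run=subprocess.run):
    """Build and run one mutant; return True if TH3 did not detect it.

    Assumes th3.o is already present in workdir and "cc" is on the PATH.
    The run parameter defaults to subprocess.run and exists only so the
    sketch can be exercised without a compiler.
    """
    workdir = Path(workdir)
    (workdir / "sqlite3.s").write_text(mutated_asm)
    build_steps = [
        ["cc", "-c", "sqlite3.s", "-o", "sqlite3.o"],  # assemble the mutant
        ["cc", "-o", "th3", "th3.o", "sqlite3.o"],     # link against TH3
    ]
    for cmd in build_steps:
        if run(cmd, cwd=workdir).returncode != 0:
            # Design choice: a mutant that does not build is not a survivor.
            return False
    # A TH3 program returns zero only if no errors were detected.
    return run(["./th3"], cwd=workdir).returncode == 0


def find_survivors(labelled_mutants, workdir, run=subprocess.run):
    """Return the labels of all (label, asm_text) mutants TH3 missed."""
    return [
        label
        for label, asm in labelled_mutants
        if mutant_survives(asm, workdir, run)
    ]
```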

<p>Mutation testing can be slow: each test can take up to 5
minutes on a fast workstation, there are two tests for each
branch instruction, and there are over 20,000 branch instructions.  Efforts are
made to expedite operation.  For example, TH3 is compiled in such a
way that it exits as soon as it finds the first error, and because many
of the mutations are easily detected, many cycles happen in only
a few seconds.  Nevertheless, the mutation-test.tcl script includes
command-line options to limit the range of code lines tested so that
mutation testing only needs to be performed on blocks of code that
have recently changed.
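<p>The figures above show why limiting the range matters: with more than 20,000 branch instructions and two mutants each, a full run builds and tests at least 40,000 variants, and if every cycle took the full 5 minutes that would be about 200,000 minutes, or roughly 139 days of CPU time. One possible way to restrict a run to recently changed code is sketched below in Python. It assumes the assembly was generated with debugging information (for example <tt>gcc -S -g</tt>) so that it contains <tt>.loc</tt> directives mapping instructions back to source lines, and it ignores the file-number field, which is reasonable for a single source file such as sqlite3.c. The function name and approach are hypothetical and do not describe the actual command-line options of mutation-test.tcl.

```python
import re

# ".loc FILENO LINE [COLUMN ...]" directives map code back to source lines.
LOC = re.compile(r"^\s*\.loc\s+\d+\s+(\d+)")
# Assumed x86-64 conditional jumps: any "j..." mnemonic except "jmp" variants.
COND_JUMP = re.compile(r"^\s*j(?!mp)[a-z]+\s+\S+\s*$")


def branches_in_range(asm_lines, first_line, last_line):
    """Return indexes of conditional branches whose source line is in range.

    An instruction's source line is taken from the most recent .loc
    directive; branches seen before any .loc are skipped.  The file-number
    field is ignored, so a single source file is assumed.
    """
    current = None
    selected = []
    for i, line in enumerate(asm_lines):
        m = LOC.match(line)
        if m:
            current = int(m.group(1))
        elif COND_JUMP.match(line) and current is not None:
            if first_line <= current <= last_line:
                selected.append(i)
    return selected
```

Only the selected branches would then be mutated, which turns a run over the whole library into one proportional to the size of the recent change.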

<h1>TH3 License</h1>

<p>SQLite itself is in the <a href="copyright.html">public domain</a> and
can be used for any purpose.  But TH3 is proprietary and requires a license.
</p>