Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Updates to the About document. |
---|---|
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
1e99bb882867dce45ebc29f2e6187f7b |
User & Date: | drh 2008-11-30 01:58:52.000 |
Context
2008-11-30
| ||
13:09 | Fix typos and poor wording in the about.wiki document. check-in: f30534fbb0 user: drh tags: trunk | |
01:58 | Updates to the About document. check-in: 1e99bb8828 user: drh tags: trunk | |
00:37 | First sample prototype script with its generator. check-in: c7bfa7e3cf user: drh tags: trunk | |
Changes
Changes to about.wiki.
1 2 | <h1>About Sqllogictest</h1> | | > > | > | | | | > > > | | > > | < | | | | | | | | | > | | | > > | | | > | | | | | | > | | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 | <h1>About Sqllogictest</h1> Sqllogictest is a program designed to verify that an SQL database engine computes correct results to queries by comparing the results to identical queries submitted to other SQL database engines. Sqllogictest was originally designed to test [http://www.sqlite.org/ | SQLite], but it is database engine neutral and can just as easily be used to test other database products. Sqllogictest is concerned only with correct results. No attention is paid to performance, optimal use of indices, disk and memory usage, transactional behavior, or concurrancy and locking issues. The sqllogictest program seeks to answer just one question: <blockquote><i> Does the database engine compute the correct answer. </i></blockquote> Every SQL database engine has test vectors used to validate its operation. These manually generated test vectors are important. But generating test vectors is tedious, since the correct solutions must be computed and verifed by hand. The sqllogictest program is designed to sidestep this tedium by using independently developed database engines generate the reference test results automatically. This allows millions of test vectors to be producted by simple scripts, which in turn provides much more thorough and complete testing of the database engine. <h2>Operation</h2> The sqllogictest program is driven by test scripts containing SQL statements and queries and, sometimes, query results. A test script that omits the results is called a "prototype script". A test script that includes results is a "full script". The sqllogictest program operates in two modes: test script completion mode and test script validation mode. In test script completion mode, the sqllogictest program reads a prototype script and runs the statements and queries against a reference database engine. The output is a full script that is a copy of the prototype script with result inserted. In validation mode, the sqllogictest program reads a full script and runs the statements and queries contained therein against a database engine under test. The results received back from the database engine are compared against the results in the full script to validate the output of the database engine. For example, to verify that SQLite gets the same answer as MySQL on a particular set of queries, one might execute commands as follows. First complete the prototype script using MySQL as the reference database engine: <blockquote><pre> sqllogictest -engine MySQL prototype.test >full.test </pre></blockquote> Then verify the results using SQLite: <blockquote><pre> sqllogictest -verify -engine SQLite full.test </pre></blockquote> The second command will display any discrepencies between the output generated by SQLite and the reference data that was generated by MySQL in the first command. Notice that the default mode of operation for sqllogictest is completion mode. The -verify command-line option is used to activate validation mode. Another approach to validation is to run the competion step separately for each database engine and save the output in separate test scripts. Then compare the two test scripts using a file comparison utility. For example: <blockquote><pre> sqllogictest -engine MySQL prototype.test >full-1.test sqllogictest -engine SQLite prototype.test >full-2.test diff full-1.test full-2.test </pre></blockquote> Note that in completion mode, the sqllogictest program will ignore any result value contained in its input test script. Or, in other words, it is safe to pass a full test script into sqllogictest running in completion mode. So the previous test could have been run this way: <blockquote><pre> sqllogictest -engine MySQL prototype.test >full-1.test sqllogictest -engine SQLite full-1.test >full-2.test diff full-1.test full-2.test </pre></blockquote> The long-term vision of the sqllogictest project is to amass a huge collection of prototype scripts. Many of the prototype scripts will be very large, being automatically generated by some dynamic language like Tcl or Perl. Prototype scripts will try queries using all kinds and combinations of constraints, column orders, joins, indexes, and so forth in order to exercize as many paths through the logic of the database engine as possible. <h2>Test-Script Format</h2> Test scripts are line-oriented ASCII text files. No provision is made for Unicode; the purpose of sqllogictest is to test the query and join logic of the database engine, not its support for localization and internationalization. Test scripts consist of zero or more records. A record represents either a single statement or query. Each record is separated from its neighbors by one or more blank line. Records are evaluated in order, starting from the beginning of the script and working toward the end. Lines of the test script that begin with the sharp character ("#", ASCII code 35) are comment lines and are ignored. Comment lines are not considered blank lines and cannot be used to separate records. Comments typically occur at the beginning of a record, but they are allowed to occur in the middle of a record. Comments that occurs in the middle of an SQL statement are stripped from the statement prior to the statement being sent to the database engine for evaluation. Comments are logically removed from the script by a preprocessor. Hence, when we speak of the "first line of a record" we really mean the "first non-comment line of a record". Each record is either a statement or a query. A statement is an SQL command that is to be evaluated but from which we do not expect to get results (other than success or failure). A statement might be a CREATE TABLE or an INSERT or an UPDATE or a DROP INDEX. A query is an SQL command from which we expect to receive results. The result set |
︙ | ︙ | |||
148 149 150 151 152 153 154 | of thinking about completion mode is that it copies the script from input to output, replacing all "----" lines and subsequent result values with the actual results from running the query. The <type-string> argument to the query statement is a short string that specifies the number of result columns and the expected datatype of each result column. There is one character in the <type-string> | | | | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 | of thinking about completion mode is that it copies the script from input to output, replacing all "----" lines and subsequent result values with the actual results from running the query. The <type-string> argument to the query statement is a short string that specifies the number of result columns and the expected datatype of each result column. There is one character in the <type-string> for each result column. The characters codes are "T" for a text result, "I" for an integer result, and "R" for a floating-point result. The <sort-mode> argument is optional. If included, it must be one of "nosort", "rowsort", or "valuesort". The default is "nosort". In nosort mode, the results appear in exactly the order in which they were received from the database engine. The nosort mode should only be used on queries that have an ORDER BY clause or which only have a single row of result, since otherwise the order of results is undefined and might vary from one database engine to another. The "rowsort" mode gathers all output from the database engine then sorts it by rows on the client side. Sort comparisons use strcmp() on the rendered ASCII text representation of the values. Hence, "9" sorts after "10", not before. The "valuesort" mode works like rowsort except that it does not honor row groupings. Each individual result value is sorted on its own. In the results section, integer values are rendered as if by printf("%d"). Floating point values are rendered as if by printf("%.3f"). NULL values are rendered as "NULL". Empty strings are rendered as "(empty)". Within non-empty strings, all control characters and unprintable characters are rendered as "@". <h2>Suggestions For Generating Test-Scripts</h2> When sqllogictest runs a test script, it begins with a completely empty database. So the first few records of any test script will typically be CREATE statements of various kinds and expecially CREATE TABLE statements. In order to maximize the portability of scripts across database engines, it is suggested that test scripts stick to the basic CREATE TABLE syntax. Use only a few common datatypes such as: * INTEGER * VARCHAR(30) * REAL Remember, the purpose of sqllogictest is to validate the logic behind the evaluation of SQL statements, not the ability to handle extreme values. So keep content in a reasonable range: small integers, short strings, and floating point numbers that use only the most significant bits of an a 32-bit IEEE float. After creating one or more tables and populating them with test data, use a dynamic language (TCL, Perl, Python, Ruby) to implement a templating scheme that will generate thousands or millions of separate queries. Use a pseudo-random number generator (PRNG) to fill in the templates at random. Seed the PRNG with a constant at the beginning of the dynamic-language program so that rerunning the program will generate the same test-script every time. In the dynamic-language programs that generate test scripts, it is useful to have subroutines that compute elements such as the following: * Randomly permute the elements of a result set. * Choose a random subset of columns in a table to be updated. * Generate a random WHERE clause. * Generate a random string literal of some maximum length. * Generate a random identifier which is not a keyword. Use LIMIT and OFFSET clauses to keep results set sizes under control. It is recommented that most queries use an ORDER BY clause so that the order of values in the output is deterministic. Of course, it is also desirable to have some test cases that omit ORDER BY. In those cases use either the "rowsort" or "valuesort" modifiers at the beginning of the query record to ensure that the output appears in the same order on all database engines. A typical test script will begin with some CREATE statements followed by some INSERT statements to add initial data. This is followed by thousands of randomly generate UPDATE, DELETE, and INSERT statements. Several SELECT statements typical follow each UPDATE, DELETE, or INSERT in order to verify that the content of the database is as expected. It is useful to includes some NULL values in the initial data in order to test the NULL handling logic. Be careful, however, in that different database engines interpret NULLs in a UNIQUE constraint differently. SQLite, PostgreSQL, Oracle, MySQL, and Firebird do it one way. Informix, DB2, and MS-SQL do it another. So while NULLs are useful test values, it will be best to avoid using NULLs in UNIQUE columns. One might have a large set of DELETE, INSERT, SELECT, and UPDATE statements that are repeated multiple times, but with various CREATE INDEX and DROP INDEX statements in between each iteration. Such tests seeks to prove that the same results appear regardless of whether or not indices are present. Remember, the purpose of sqllogictest is to verify that the database engine gets correct results, not that it makes effective use of indices. |