Overview
Comment: | Add the tokenizer requirements file. Updates to system requirements. |
---|---|
SHA1: | 92de1a1982b1f3578134e016857bcadf |
User & Date: | drh 2008-08-06 22:02:13.000 |
Context
2008-08-07
01:30 | Tweaks to tokenizer requirements. Add an outline for syntax requirements. (check-in: f2f70f988e user: drh tags: trunk)
2008-08-06
22:02 | Add the tokenizer requirements file. Updates to system requirements. (check-in: 92de1a1982 user: drh tags: trunk)
2008-08-05
18:37 | Last minute updatest to the documentation before 3.6.1. (check-in: 57f8360ad3 user: drh tags: trunk)
Changes
Changes to pages/sysreq.in.
︙ | ︙ | |||
<h2>1.0 SQLite is a translator from SQL into low-level disk I/O</h2>

<tcl>
sysreq S10000 {} {
  SQLite is an SQL database engine.  And the fundamental task of
  every SQL database engine is to translate the abstract SQL statements
  readily understood by humans into sequences of I/O operations readily
  understood by computer hardware.  This requirement expresses the
  essence of SQLite.
} {
  The SQLite library shall translate high-level SQL statements into
  low-level I/O calls to persistent storage.
}

sysreq S10100 S10000 {
  SQL is one of the world's most widely known programming languages,
  but it is also one of the most ill-defined.  There are various SQL
  standards documents available.  However the SQL standards documents are
  obtuse to the point of being incomprehensible.  And the standards
  allow for so much "implementation defined" behavior that no
  two SQL database engines understand exactly the same language.</p>

  <p>SQLite does not attempt to obtain strict compliance with any
  one of the various SQL standards.
  Instead, SQLite tries to be as compatible as possible with other SQL
︙ | ︙ | |||
sysreq S50300 S50000 {
  An SQLite database file can be freely moved between machines with
  different operating systems, different processors, different size integers,
  and different byte orders.  The same database file should work on any
  machine.
} {
  SQLite database files shall be processor and byte-order independent.
}
</tcl>

<h2>6.0 Introspection</h2>

<tcl>
sysreq S60000 {} {
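As an illustration of what requirement S50300 implies for a reader of the file, the sketch below decodes one header field without relying on the host byte order. It assumes the published SQLite 3 file format, in which the page size is stored as a 2-byte big-endian integer at byte offset 16; the function name `read_page_size` is hypothetical and not part of any SQLite API.

```python
import struct

def read_page_size(path):
    """Return the page size recorded in an SQLite 3 database file header."""
    with open(path, "rb") as f:
        header = f.read(100)          # the file header occupies 100 bytes
    if header[:16] != b"SQLite format 3\x00":
        raise ValueError("not an SQLite 3 database file")
    # ">H" forces big-endian decoding, so the result is the same on
    # little-endian and big-endian machines.
    (raw,) = struct.unpack(">H", header[16:18])
    # In current versions of the format a stored value of 1 means 65536.
    return 65536 if raw == 1 else raw
```

Because the decoding names its byte order explicitly, the same code gives the same answer regardless of the processor that wrote the file or the one that reads it.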
︙ | ︙ | |||
sysreq S70100 S70000 {
  Some applications (for example
  <a href="http://www.cvstrac.org/">CVSTrac</a> and
  <a href="http://www.fossil-scm.org/">Fossil</a>) will run SELECT
  statements entered by anonymous users on the internet.  Such
  applications want to be able to guarantee that a hostile user does
  not access restricted tables (such as the PASSWORD column of the
  USER table) or modify the database in any way.  SQLite supports the
  ability to analyze an arbitrary SQL statement to ensure that it does
  not perform undesired operations.
} {
  The SQLite library shall provide the application means by which the
  application can test and enforce compliance with database access
  policies for any particular SQL statement.
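The kind of access policy described in S70100 can be sketched with the authorizer hook. The example uses Python's standard `sqlite3` module, whose `Connection.set_authorizer` wraps the C-level `sqlite3_set_authorizer` interface. The `user` table, its `password` column, and the helper names `read_only_policy` and `run_untrusted` are assumptions made for illustration only.

```python
import sqlite3

def read_only_policy(action, arg1, arg2, db_name, source):
    """Hypothetical policy: allow plain SELECTs, hide user.password."""
    if action == sqlite3.SQLITE_SELECT:
        return sqlite3.SQLITE_OK
    if action == sqlite3.SQLITE_READ:
        # For SQLITE_READ, arg1 is the table name and arg2 the column name.
        if arg1.lower() == "user" and arg2.lower() == "password":
            return sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_OK
    # Everything else (INSERT, DELETE, PRAGMA, function calls, ...) is denied.
    return sqlite3.SQLITE_DENY

def run_untrusted(conn, sql):
    """Install the policy on conn (it stays installed) and run one statement."""
    conn.set_authorizer(read_only_policy)
    return conn.execute(sql).fetchall()
```

With this policy installed, a statement that reads `user.password` or modifies the database is rejected when it is compiled, so `run_untrusted` raises an `sqlite3.Error` before any row is touched.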
︙ | ︙ |
Added pages/tokenreq.in.
<title>SQLite Tokenizer Requirements</title>

<h1>Requirements For The SQLite Tokenizer</h1>

<p>When processing SQL statements, SQLite (as does every other SQL
database engine) breaks the SQL statement up into tokens which are
then forwarded to the parser component.  SQL statements are split
into tokens by the "tokenizer" component of SQLite.
This document specifies requirements that precisely define the
operation of the SQLite tokenizer.</p>

<h2>Character classes</h2>

<p>SQL statements are composed of Unicode characters.
Specific individual characters may be described using a notation
consisting of the character "u" followed by four hexadecimal digits.
For example, the lower-case letter "a" can be expressed as "u0061"
and the dollar sign can be expressed as "u0024".
For notational convenience, the following character classes are defined:</p>

<blockquote>
<dl>
<dt><b>WHITESPACE</b></dt>
<dd>One of these five characters: u0009, u000a, u000c, u000d, or u0020</dd>

<dt><b>ALPHABETIC</b></dt>
<dd>Any of the characters in the range u0041 through u005a (letters
"A" through "Z") or in the range u0061 through u007a (letters
"a" through "z") or the character u005f ("_") or any other character
larger than u007f.</dd>

<dt><b>NUMERIC</b></dt>
<dd>Any of the characters in the range u0030 through u0039 (digits
"0" through "9")</dd>

<dt><b>ALPHANUMERIC</b></dt>
<dd>Any character which is either ALPHABETIC or NUMERIC</dd>

<dt><b>HEXADECIMAL</b></dt>
<dd>Any NUMERIC character or a character in the range
u0041 through u0046 ("A" through "F") or in the range
u0061 through u0066 ("a" through "f")</dd>

<dt><b>SPECIAL</b></dt>
<dd>Any character which is not WHITESPACE, ALPHABETIC, or NUMERIC</dd>
</dl>
</blockquote>

<h2>Token requirements</h2>

<tcl>
proc tokenreq {id derivedfrom explaination text} {
  hd_fragment $id $id
  set dlist {}
  foreach d $derivedfrom {
    append dlist <$d>
  }
  hd_requirement $id $text$dlist
  if {[string length $explaination]} {
    hd_resolve "<p>$explaination</p>"
  }
  hd_puts "<blockquote><b>$id:</b>"
  hd_resolve $text
  hd_puts {</blockquote>}
}

tokenreq H41010 {} {
  Processing is left-to-right.  This seems obvious, but it needs to be
  explicitly stated.
} {
  SQLite shall divide input SQL text into tokens working from left to
  right.
}
tokenreq H41020 {} {
  The standard practice in SQL, as with most context-free grammar based
  programming languages, is to resolve ambiguities in tokenizing by
  selecting the option that results in the longest tokens.
} {
  At each step in the SQL tokenization process, SQLite shall extract the
  longest possible token from the remaining input text.
}
</tcl>

<h3>Whitespace tokens</h3>

<tcl>
tokenreq H41100 {} {
  Whitespace has the usual definition.
} {
  SQLite shall recognize a sequence of one or more WHITESPACE characters
  as a WHITESPACE token.
}
tokenreq H41110 {} {
  An SQL comment is "--" through the end of line and is understood
  as whitespace.
} {
  SQLite shall recognize as a WHITESPACE token the two-character sequence "--"
  (u002d, u002d) followed by any sequence of non-zero characters up through
  and including the first u000a character or until end of input.
}
tokenreq H41120 {} {
  A C-style comment "/*...*/" is also recognized as whitespace.
} {
  SQLite shall recognize as a WHITESPACE token the two-character sequence "/*"
  (u002f, u002a) followed by any sequence of zero or more
  non-zero characters through the first "*/" (u002a, u002f) sequence or
  until end of input.
}
</tcl>

<h3>Identifier tokens</h3>

<tcl>
tokenreq H41130 {} {
  Identifiers follow the usual rules with the exception that SQLite
  allows the dollar-sign symbol in the interior of an identifier.
  The dollar-sign is for compatibility with Microsoft SQL-Server and
  is not part of the SQL standard.
} {
  SQLite shall recognize as an ID token any sequence of characters
  that begins with an ALPHABETIC character and continues with zero or more
  ALPHANUMERIC characters and/or "$" (u0024) characters and which is
  not a keyword token.
}
tokenreq H41140 {} {
  Identifiers can be arbitrary character strings within square brackets.
  This feature is also for compatibility with Microsoft SQL-Server
  and not a part of the SQL standard.
} {
  SQLite shall recognize as an ID token any sequence of non-zero
  characters that begins with "[" (u005b) and continues through the first
  "]" (u005d) character.
}
tokenreq H41150 {} {
  The standard way of quoting SQL identifiers is to use double-quotes.
} {
  SQLite shall recognize as an ID token any sequence of characters
  that begins with a double-quote (u0022), is followed by zero or
  more non-zero characters and/or pairs of double-quotes (u0022)
  and terminates with a double-quote (u0022) that
  is not part of a pair.
}
tokenreq H41160 {} {
  MySQL allows identifiers to be quoted using the grave accent character.
  SQLite supports this for interoperability.
} {
  SQLite shall recognize as an ID token any sequence of characters
  that begins with a grave accent (u0060), is followed by zero or
  more non-zero characters and/or pairs of grave accents (u0060)
  and terminates with a grave accent (u0060) that
  is not part of a pair.
}
</tcl>

<h3>Literals</h3>

<tcl>
tokenreq H41200 {} {
  This is the usual definition of string literals for SQL.
  SQL uses the classic Pascal string literal format.
} {
  SQLite shall recognize as a STRING token a sequence of characters
  that begins with a single-quote (u0027), is followed by zero or
  more non-zero characters and/or pairs of single-quotes (u0027)
  and terminates with a single-quote (u0027) that
  is not part of a pair.
}
tokenreq H41210 {} {
  Blob literals are similar to string literals except that they
  begin with a single "X" character and contain hexadecimal data.
} {
  SQLite shall recognize as a BLOB token an upper or lower-case "X"
  (u0058 or u0078) followed by a single-quote (u0027) followed by
  a number of HEXADECIMAL characters that is a multiple of two and
  terminated by a single-quote (u0027).
}
tokenreq H41220 {} {
  Integer literals are a string of digits.  The plus or minus sign
  that might optionally precede an integer is not part of the integer
  token.
} {
  SQLite shall recognize as an INTEGER token any sequence of
  one or more NUMERIC characters.
}
tokenreq H41230 {} {
  An "exponentiation suffix" is defined to be an upper or lower case "E"
  (u0045 or u0065) followed by one or more NUMERIC characters.  The
  "E" and the NUMERIC characters may optionally be separated by a
  plus-sign (u002b) or a minus-sign (u002d).  An exponentiation suffix
  is part of the definition of a FLOAT token:
} {
  SQLite shall recognize as a FLOAT token a sequence of one or more NUMERIC
  characters together with zero or one period (u002e) and followed by
  an exponentiation suffix.
}
tokenreq H41240 {} {} {
  SQLite shall recognize as a FLOAT token a sequence of one or more NUMERIC
  characters that includes exactly one period (u002e) character.
}
</tcl>

<h3>Variables</h3>

<tcl>
tokenreq H42010 {} {
  Variables are used as placeholders in SQL statements for constant
  values that are to be bound at start-time.
} {
  SQLite shall recognize as a VARIABLE token a question-mark (u003f)
  followed by zero or more NUMERIC characters.
}
tokenreq H42020 {} {
  A "parameter name" is defined to be a sequence of one or more
  characters that consists of ALPHANUMERIC characters and/or
  dollar-signs (u0024) intermixed with pairs of colons (u003a) and
  optionally followed by any sequence of non-zero, non-WHITESPACE
  characters enclosed in parentheses (u0028 and u0029).
} {
  SQLite shall recognize as a VARIABLE token one of the characters
  at-sign (u0040), dollar-sign (u0024), or colon (u003a) followed
  by a parameter name.
}
tokenreq H42030 {} {} {
  SQLite shall recognize as a VARIABLE token the sharp-sign (u0023)
  followed by a parameter name that does not begin with a
  NUMERIC character.
}
tokenreq H42040 {} {
  The REGISTER token is a special token used in certain unusual
  circumstances.
} {
  SQLite shall recognize as a REGISTER token a sharp-sign (u0023)
  followed by one or more NUMERIC characters.
}
</tcl>

<h3>Operator tokens</h3>

<p>The following sequences of special characters are recognized as
tokens:</p>

<tcl>
set id 41400
foreach {charseq tname} {
  -  MINUS
  (  LP
  )  RP
  ;  SEMI
  +  PLUS
  *  STAR
  /  SLASH
  %  REM
  =  EQ
  ==  EQ
  <=  LE
  <>  NE
  <<  LSHIFT
  <  LT
  >=  GE
  >>  RSHIFT
  >  GT
  !=  NE
  ,  COMMA
  &  BITAND
  ~  BITNOT
  |  BITOR
  ||  CONCAT
  .  DOT
} {
  incr id 3
  set n [string length $charseq]
  set body "  SQLite shall recognize the $n-character sequence "
  append body "\"$charseq\""
  set sep " ("
  for {set i 0} {$i<$n} {incr i} {
    set c [string index $charseq $i]
    scan $c %c x
    append body [format ${sep}u%04x $x]
    set sep " "
  }
  append body ") as token $tname"
  tokenreq H$id {} {} $body
}
</tcl>

<h3>Keyword tokens</h3>

<p>The following keywords are recognized as distinct tokens:</p>

<tcl>
set id 41500
foreach {charseq tname} {
  ABORT  ABORT
  ADD  ADD
  AFTER  AFTER
  ALL  ALL
  ALTER  ALTER
  ANALYZE  ANALYZE
  AND  AND
  AS  AS
  ASC  ASC
  ATTACH  ATTACH
  AUTOINCREMENT  AUTOINCR
  BEFORE  BEFORE
  BEGIN  BEGIN
  BETWEEN  BETWEEN
  BY  BY
  CASCADE  CASCADE
  CASE  CASE
  CAST  CAST
  CHECK  CHECK
  COLLATE  COLLATE
  COLUMN  COLUMNKW
  COMMIT  COMMIT
  CONFLICT  CONFLICT
  CONSTRAINT  CONSTRAINT
  CREATE  CREATE
  CROSS  JOIN_KW
  CURRENT_DATE  CTIME_KW
  CURRENT_TIME  CTIME_KW
  CURRENT_TIMESTAMP  CTIME_KW
  DATABASE  DATABASE
  DEFAULT  DEFAULT
  DEFERRED  DEFERRED
  DEFERRABLE  DEFERRABLE
  DELETE  DELETE
  DESC  DESC
  DETACH  DETACH
  DISTINCT  DISTINCT
  DROP  DROP
  END  END
  EACH  EACH
  ELSE  ELSE
  ESCAPE  ESCAPE
  EXCEPT  EXCEPT
  EXCLUSIVE  EXCLUSIVE
  EXISTS  EXISTS
  EXPLAIN  EXPLAIN
  FAIL  FAIL
  FOR  FOR
  FOREIGN  FOREIGN
  FROM  FROM
  FULL  JOIN_KW
  GLOB  LIKE_KW
  GROUP  GROUP
  HAVING  HAVING
  IF  IF
  IGNORE  IGNORE
  IMMEDIATE  IMMEDIATE
  IN  IN
  INDEX  INDEX
  INITIALLY  INITIALLY
  INNER  JOIN_KW
  INSERT  INSERT
  INSTEAD  INSTEAD
  INTERSECT  INTERSECT
  INTO  INTO
  IS  IS
  ISNULL  ISNULL
  JOIN  JOIN
  KEY  KEY
  LEFT  JOIN_KW
  LIKE  LIKE_KW
  LIMIT  LIMIT
  MATCH  MATCH
  NATURAL  JOIN_KW
  NOT  NOT
  NOTNULL  NOTNULL
  NULL  NULL
  OF  OF
  OFFSET  OFFSET
  ON  ON
  OR  OR
  ORDER  ORDER
  OUTER  JOIN_KW
  PLAN  PLAN
  PRAGMA  PRAGMA
  PRIMARY  PRIMARY
  QUERY  QUERY
  RAISE  RAISE
  REFERENCES  REFERENCES
  REGEXP  LIKE_KW
  REINDEX  REINDEX
  RENAME  RENAME
  REPLACE  REPLACE
  RESTRICT  RESTRICT
  RIGHT  JOIN_KW
  ROLLBACK  ROLLBACK
  ROW  ROW
  SELECT  SELECT
  SET  SET
  TABLE  TABLE
  TEMP  TEMP
  TEMPORARY  TEMP
  THEN  THEN
  TO  TO
  TRANSACTION  TRANSACTION
  TRIGGER  TRIGGER
  UNION  UNION
  UNIQUE  UNIQUE
  UPDATE  UPDATE
  USING  USING
  VACUUM  VACUUM
  VALUES  VALUES
  VIEW  VIEW
  VIRTUAL  VIRTUAL
  WHEN  WHEN
  WHERE  WHERE
} {
  incr id 3
  set n [string length $charseq]
  set body "  SQLite shall recognize the $n-character sequence "
  append body "\"$charseq\" in any combination of upper and lower case"
  append body " letters as the keyword token $tname"
  tokenreq H$id {} {} $body
}
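The interaction between the left-to-right rule (H41010) and the longest-token rule (H41020) can be sketched in a few lines of Python. This is only an illustration of the requirements, not SQLite's own tokenizer, which is written in C. It covers a subset of the token classes (no BLOB, VARIABLE, or REGISTER tokens and no bracket or grave-accent identifiers), ignores the non-zero-character restriction, and uses an abbreviated keyword table. The names `tokenize`, `PATTERNS`, `OPERATORS`, and `KEYWORDS` are assumptions introduced here.

```python
import re

# Operator table copied from the "Operator tokens" list.
OPERATORS = {
    "-": "MINUS", "(": "LP", ")": "RP", ";": "SEMI", "+": "PLUS",
    "*": "STAR", "/": "SLASH", "%": "REM", "=": "EQ", "==": "EQ",
    "<=": "LE", "<>": "NE", "<<": "LSHIFT", "<": "LT", ">=": "GE",
    ">>": "RSHIFT", ">": "GT", "!=": "NE", ",": "COMMA", "&": "BITAND",
    "~": "BITNOT", "|": "BITOR", "||": "CONCAT", ".": "DOT",
}

# Abbreviated keyword table (assumption: only a few entries are listed).
KEYWORDS = {"SELECT": "SELECT", "FROM": "FROM", "WHERE": "WHERE",
            "LEFT": "JOIN_KW", "LIKE": "LIKE_KW"}

ALPHA = r"A-Za-z_\u0080-\U0010ffff"          # the ALPHABETIC class
OP_RE = "|".join(re.escape(op)
                 for op in sorted(OPERATORS, key=len, reverse=True))

PATTERNS = [(kind, re.compile(rx)) for kind, rx in [
    ("WHITESPACE", r"[\t\n\f\r ]+"),                            # H41100
    ("WHITESPACE", r"--[^\n]*\n?"),                             # H41110
    ("WHITESPACE", r"/\*[\s\S]*?(?:\*/|\Z)"),                   # H41120
    ("ID", rf"[{ALPHA}][0-9${ALPHA}]*"),                        # H41130
    ("ID", r'"(?:[^"]|"")*"'),                                  # H41150
    ("STRING", r"'(?:[^']|'')*'"),                              # H41200
    ("INTEGER", r"[0-9]+"),                                     # H41220
    ("FLOAT", r"(?:[0-9]+\.?[0-9]*|\.[0-9]+)[eE][+-]?[0-9]+"),  # H41230
    ("FLOAT", r"[0-9]+\.[0-9]*|\.[0-9]+"),                      # H41240
    ("OP", OP_RE),
]]

def tokenize(sql):
    """Split sql into (token_name, text) pairs, left to right (H41010)."""
    tokens, pos = [], 0
    while pos < len(sql):
        kind, text = None, ""
        # H41020: try every pattern and keep the longest match.
        for candidate, rx in PATTERNS:
            m = rx.match(sql, pos)
            if m and len(m.group()) > len(text):
                kind, text = candidate, m.group()
        if kind is None:
            raise ValueError(f"unrecognized input at offset {pos}")
        if kind == "OP":
            kind = OPERATORS[text]
        elif kind == "ID" and text.upper() in KEYWORDS:
            kind = KEYWORDS[text.upper()]     # keywords are case-insensitive
        tokens.append((kind, text))
        pos += len(text)
    return tokens
```

For example, `tokenize("x<=.5")` yields an ID, an LE, and a FLOAT token, because `<=` is preferred over `<` and `.5` over `.`; likewise `--` at the start of a comment wins over a single MINUS.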