Documentation Source Text

Check-in [92de1a1982]

Overview
Comment:Add the tokenizer requirements file. Updates to system requirements.
SHA1: 92de1a1982b1f3578134e016857bcadf5f5899a8
User & Date: drh 2008-08-06 22:02:13.000
Context
2008-08-07
01:30
Tweaks to tokenizer requirements. Add an outline for syntax requirements. (check-in: f2f70f988e user: drh tags: trunk)
2008-08-06
22:02
Add the tokenizer requirements file. Updates to system requirements. (check-in: 92de1a1982 user: drh tags: trunk)
2008-08-05
18:37
Last minute updatest to the documentation before 3.6.1. (check-in: 57f8360ad3 user: drh tags: trunk)
Changes
Changes to pages/sysreq.in.
Before (lines 40-64):

<h2>1.0 SQLite is a translator from SQL into low-level disk I/O</h2>

<tcl>
sysreq S10000 {} {
  SQLite is an SQL database engine.  And the fundamental task of
  every SQL database engine it to translate the abstract SQL statements
  readily understood by humans, into sequences of I/O operations readily
  understood by computer hardware.

} {
  The SQLite library shall translate high-level SQL statements into
  low-level I/O calls to persistent storage.
}

sysreq S10100 S10000 {
  SQL is one of the worlds most widely known programming languages,
  but it is also one of the most ill-defined.  There are various SQL
  standards documents available.  But the SQL standards documents are 
  obtuse to the point of being incomprehensible.  And the standards 
  allow for so much "implementation defined" behavior that there exist
  two SQL database engines understand exactly the same language.</p>
  
  <p>SQLite does not attempt to obtain strict compliance with any
  one of the various SQL standards.
  Instead, SQLite tries to be as compatible as possible with other SQL

After (lines 40-65):

<h2>1.0 SQLite is a translator from SQL into low-level disk I/O</h2>

<tcl>
sysreq S10000 {} {
  SQLite is an SQL database engine.  And the fundamental task of
  every SQL database engine it to translate the abstract SQL statements
  readily understood by humans into sequences of I/O operations readily
  understood by computer hardware.  This requirement expresses the
  essesence of SQLite.
} {
  The SQLite library shall translate high-level SQL statements into
  low-level I/O calls to persistent storage.
}

sysreq S10100 S10000 {
  SQL is one of the worlds most widely known programming languages,
  but it is also one of the most ill-defined.  There are various SQL
  standards documents available.  However the SQL standards documents are 
  obtuse to the point of being incomprehensible.  And the standards 
  allow for so much "implementation defined" behavior that there exist
  two SQL database engines understand exactly the same language.</p>
  
  <p>SQLite does not attempt to obtain strict compliance with any
  one of the various SQL standards.
  Instead, SQLite tries to be as compatible as possible with other SQL

Before (lines 540-554):

sysreq S50300 S50000 {
  An SQLite database file can be freely moved between machine
  with different operating systems, different processors,
  different size integers, and different byte orders.  The same
  database file should work on any machine.
} {
  SQLite database files shall by processor and byte-order independent.
}

</tcl>
<h2>6.0 Introspection</h2>
<tcl>

sysreq S60000 {} {

After (lines 541-555):

sysreq S50300 S50000 {
  An SQLite database file can be freely moved between machine
  with different operating systems, different processors,
  different size integers, and different byte orders.  The same
  database file should work on any machine.
} {
  SQLite database files shall be processor and byte-order independent.
}

</tcl>
<h2>6.0 Introspection</h2>
<tcl>

sysreq S60000 {} {

Before (lines 644-658):

sysreq S70100 S70000 {
  Some applications (for example
  <a href="http://www.cvstrac.org/">CVSTrac</a> and
  <a href="http://www.fossil-scm.org/">Fossil</a>) will run SELECT
  statements entered by anonymous users on the internet.  Such 
  applications want to be able to guarantee that a hostile users does
  not access restricted tables (such as the password column of the user
  table) or modify the database in any way.  SQLite supports the ability
  to analyze an arbitrary SQL statement to insure that it does not
  perform undesired operations.
} {
  The SQLite library shall provide the application means by which the
  application can test and enforce compliance with database access
  policies for any particular SQL statement.

After (lines 645-659):

sysreq S70100 S70000 {
  Some applications (for example
  <a href="http://www.cvstrac.org/">CVSTrac</a> and
  <a href="http://www.fossil-scm.org/">Fossil</a>) will run SELECT
  statements entered by anonymous users on the internet.  Such 
  applications want to be able to guarantee that a hostile users does
  not access restricted tables (such as the PASSWORD column of the USER
  table) or modify the database in any way.  SQLite supports the ability
  to analyze an arbitrary SQL statement to insure that it does not
  perform undesired operations.
} {
  The SQLite library shall provide the application means by which the
  application can test and enforce compliance with database access
  policies for any particular SQL statement.
Added pages/tokenreq.in.

<title>SQLite Tokenizer Requirements</title>

<h1>Requirements For The SQLite Tokenizer</h1>

<p>When processing SQL statements, SQLite (as does every other SQL
database engine) breaks the SQL statement up into tokens which are
then forwarded to the parser component.  SQL statements are split
into tokens by the "tokenizer" component of SQLite. This document specifies
requirements that precisely define the operation of the SQLite tokenizer.</p>

<h2>Character classes</h2>

<p>SQL statements are composed of Unicode characters.  Specific
individual characters may be described using a notation consisting of
the character "u" followed by four hexadecimal digits.  For
example, the lower-case letter "a" can be expressed as "u0061"
and the dollar sign can be expressed as "u0024". 
For notational convenience, the following character classes are
defined:</p>

<blockquote>
<dl>
<dt><b>WHITESPACE</b></dt>
<dd>One of these five characters:  u0009, u000a, u000c, u000d, or u0020</dd>

<dt><b>ALPHABETIC</b></dt>
<dd>Any of the characters in the range u0041 through u005a (letters "A" 
    through "Z") or in the range u0061 through u007a (letters "a" through
    "z") or the character u005f ("_") or any other character larger than
    u007f.</dd>

<dt><b>NUMERIC</b></dt>
<dd>Any of the characters in the range u0030 through u0039 (digits "0"
    through "9")</dd>

<dt><b>ALPHANUMERIC</b></dt>
<dd>Any character which is either ALPHABETIC or NUMERIC</dd>

<dt><b>HEXADECIMAL</b></dt>
<dd>Any NUMERIC character or a character in the range u0041 through u0046
    ("A" through "F") or in the range u0061 through u0066 ("a" through "f")
</dd>


<dt><b>SPECIAL</b></dt>
<dd>Any character which is not WHITESPACE, ALPHABETIC, or NUMERIC</dd>
</dl>
</blockquote>
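
<p>The character classes above can be written as simple predicates. The following is a minimal sketch, not SQLite's implementation; the function names are hypothetical and each function expects a single-character string.</p>

```python
# Sketch of the character classes defined above. The function names are
# illustrative and do not come from the SQLite sources.
WHITESPACE_CHARS = {"\u0009", "\u000a", "\u000c", "\u000d", "\u0020"}


def is_whitespace(ch: str) -> bool:
    """One of the five WHITESPACE characters."""
    return ch in WHITESPACE_CHARS


def is_alphabetic(ch: str) -> bool:
    """A-Z, a-z, underscore, or any character larger than u007f."""
    return ("A" <= ch <= "Z") or ("a" <= ch <= "z") or ch == "_" or ord(ch) > 0x7F


def is_numeric(ch: str) -> bool:
    """The digits 0 through 9 only."""
    return "0" <= ch <= "9"


def is_alphanumeric(ch: str) -> bool:
    return is_alphabetic(ch) or is_numeric(ch)


def is_hexadecimal(ch: str) -> bool:
    return is_numeric(ch) or ("A" <= ch <= "F") or ("a" <= ch <= "f")


def is_special(ch: str) -> bool:
    """Anything that is not WHITESPACE, ALPHABETIC, or NUMERIC."""
    return not (is_whitespace(ch) or is_alphabetic(ch) or is_numeric(ch))
```

<p>Under these definitions the vertical tab (u000b) is SPECIAL rather than WHITESPACE, and every character above u007f counts as ALPHABETIC.</p>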

<h2>Token requirements</h2>

<tcl>
# Emit one tokenizer requirement: an optional explanation paragraph
# followed by the requirement text in a blockquote.
proc tokenreq {id derivedfrom explanation text} {
  hd_fragment $id $id
  set dlist {}
  foreach d $derivedfrom {
    append dlist <$d>
  }
  hd_requirement $id $text$dlist
  if {[string length $explanation]} {
    hd_resolve "<p>$explanation</p>"
  }
  hd_puts "<blockquote><b>$id:</b>"
  hd_resolve $text
  hd_puts {</blockquote>}
}

tokenreq H41010 {} {
  Processing is left-to-right.  This seems obvious, but it needs to be
  explicitly stated.
} {
  SQLite shall divide input SQL text into tokens working from left to
  right.
}

tokenreq H41020 {} {
  The standard practice in SQL, as with most context-free grammar based
  programming languages, is to resolve ambiguities in tokenizing by
  selecting the option that results in the longest tokens.
} {
  At each step in the SQL tokenization process, SQLite shall extract
  the longest possible token from the remaining input text.
}

</tcl>
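
<p>Requirements H41010 and H41020 together describe a left-to-right, longest-match scan. The sketch below illustrates the idea on a handful of operator spellings only; the operator subset and the function names are assumptions made for the example, not part of SQLite.</p>

```python
# Longest-match, left-to-right splitting over a small operator subset.
# The list and the function names are illustrative only.
OPERATORS = ["<", "<=", "<>", "<<", ">", ">=", ">>", "=", "==", "|", "||", "!="]


def longest_operator(text: str, pos: int = 0):
    """Return the longest operator that starts at text[pos], or None."""
    best = None
    for op in OPERATORS:
        if text.startswith(op, pos) and (best is None or len(op) > len(best)):
            best = op
    return best


def split_operators(text: str):
    """Split a string made only of operators, working from left to right."""
    tokens, pos = [], 0
    while pos < len(text):
        op = longest_operator(text, pos)
        if op is None:
            raise ValueError(f"no operator at offset {pos}")
        tokens.append(op)
        pos += len(op)
    return tokens
```

<p>Because the longest candidate wins at each step, a two-character operator is never split into two one-character tokens.</p>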
<h3>Whitespace tokens</h3>
<tcl>

tokenreq H41100 {} {
  Whitespace has the usual definition.
} {
  SQLite shall recognize a sequence of one or more WHITESPACE characters
  as a WHITESPACE token.
}

tokenreq H41110 {} {
  An SQL comment is "--" through the end of line and is understood as
  whitespace.
} {
  SQLite shall recognize as a WHITESPACE token the two-character sequence "--" 
  (u002d, u002d) followed by any sequence of non-zero characters up through and
  including the first u000a character or until end of input.
}

tokenreq H41120 {} {
  A C-style comment "/*...*/" is also recognized as white-space.
} {
  SQLite shall recognize as a WHITESPACE token the two-character sequence "/*"
  (u002f, u002a) followed by any sequence of zero or more 
  non-zero characters through and including the first "*/" (u002a, u002f) sequence or 
  until end of input.
}

</tcl>
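
<p>The three whitespace rules (H41100, H41110, H41120) can be sketched as one function that returns the length of the WHITESPACE token at a given offset. This is an illustration under stated assumptions: the function name is hypothetical, and "non-zero characters" is read as meaning that a u0000 character ends the scan.</p>

```python
WS = "\t\n\f\r "  # u0009, u000a, u000c, u000d, u0020


def whitespace_token_length(text: str, pos: int = 0) -> int:
    """Length of the WHITESPACE token starting at text[pos], or 0 if none."""
    n = len(text)
    if text.startswith("--", pos):
        # SQL comment: runs through the first newline or to end of input.
        i = pos + 2
        while i < n and text[i] != "\x00":
            i += 1
            if text[i - 1] == "\n":
                break
        return i - pos
    if text.startswith("/*", pos):
        # C-style comment: runs through the first "*/" or to end of input.
        i = pos + 2
        while i < n and text[i] != "\x00":
            if text.startswith("*/", i):
                return i + 2 - pos
            i += 1
        return i - pos
    # Plain run of WHITESPACE characters.
    i = pos
    while i < n and text[i] in WS:
        i += 1
    return i - pos
```

<p>An unterminated comment is still a single WHITESPACE token that extends to the end of the input, as both comment requirements state.</p>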
<h3>Identifier tokens</h3>
<tcl>

tokenreq H41130 {} {
  Identifiers follow the usual rules with the exception that SQLite
  allows the dollar-sign symbol in the interior of an identifier.
  The dollar-sign is for compatibility with Microsoft SQL-Server
  and is not part of the SQL standard.
} {
  SQLite shall recognize as an ID token 
  any sequence of characters that begins with
  an ALPHABETIC character and continues with zero or more
  ALPHANUMERIC characters and/or "$" (u0024) characters and which is
  not a keyword token.
}

tokenreq H41140 {} {
  Identifiers can be arbitrary character strings within square brackets.
  This feature is also for compatibility with Microsoft SQL-Server
  and not a part of the SQL standard.
} {
  SQLite shall recognize as an ID token
  any sequence of non-zero characters that begins with "&#91;" (u005b) and
  continues through the first "&#93;" (u005d) character.
}

tokenreq H41150 {} {
  The standard way of quoting SQL identifiers is to use double-quotes.
} {
  SQLite shall recognize as an ID token
  any sequence of characters
  that begins with a double-quote (u0022), is followed by zero or
  more non-zero characters and/or pairs of double-quotes (u0022)
  and terminates with a double-quote (u0022) that
  is not part of a pair.
}

tokenreq H41160 {} {
  MySQL allows identifiers to be quoted using the grave accent character.
  SQLite supports this for interoperability.
} {
  SQLite shall recognize as an ID token
  any sequence of characters
  that begins with a grave accent (u0060), is followed by zero or
  more non-zero characters and/or pairs of grave accents (u0060)
  and terminates with a grave accent (u0060) that
  is not part of a pair.
}

</tcl>
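
<p>The double-quote and grave-accent rules (H41150 and H41160) share one shape: the token ends at the first quote character that is not part of a doubled pair. A minimal sketch follows; the function name is hypothetical, and returning 0 for an unterminated token is a choice made for this example rather than something the requirements specify.</p>

```python
def quoted_token_length(text: str, pos: int = 0) -> int:
    """Length of the quoted token starting at text[pos].

    text[pos] is taken as the quote character. Returns 0 when no closing
    quote is found (a convention of this sketch only).
    """
    quote = text[pos]
    n = len(text)
    i = pos + 1
    while i < n and text[i] != "\x00":
        if text[i] == quote:
            if i + 1 < n and text[i + 1] == quote:
                i += 2  # a doubled quote stays inside the token
                continue
            return i + 1 - pos  # an unpaired quote terminates the token
        i += 1
    return 0
```

<p>The same scan works for any single quote character, so one routine can serve both identifier quoting styles.</p>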
<h3>Literals</h3>
<tcl>

tokenreq H41200 {} {
  This is the usual definition of string literals for SQL.
  SQL uses the classic Pascal string literal format.
} {
  SQLite shall recognize as a STRING token a sequence of characters
  that begins with a single-quote (u0027), is followed by zero or
  more non-zero characters and/or pairs of single-quotes (u0027)
  and terminates with a single-quote (u0027) that
  is not part of a pair.
}

tokenreq H41210 {} {
  Blob literals are similar to string literals except that they
  begin with a single "X" character and contain hexadecimal data.
} {
  SQLite shall recognize as a BLOB token an upper or lower-case "X"
  (u0058 or u0078) followed by a single-quote (u0027) followed by
  a number of HEXADECIMAL characters that is a multiple of two and
  terminated by a single-quote (u0027).
}

tokenreq H41220 {} {
  Integer literals are a string of digits.  The plus or minus sign
  that might optionally precede an integer is not part of the integer
  token.
} {
  SQLite shall recognize as an INTEGER token any sequence of
  one or more NUMERIC characters.
}

tokenreq H41230 {} {
  An "exponentiation suffix" is defined to be an upper or lower
  case "E" (u0045 or u0065) followed by one or more NUMERIC
  characters.  The "E" and the NUMERIC characters may optionally
  be separated by a plus-sign (u002b) or a minus-sign (u002d).
  An exponentiation suffix is part of the definition of a FLOAT
  token:
} {
  SQLite shall recognize as a FLOAT token a sequence of one
  or more NUMERIC characters together with zero or one period
  (u002e) and followed by an exponentiation suffix.
}
tokenreq H41240 {} {} {
  SQLite shall recognize as a FLOAT token a sequence of one
  or more NUMERIC characters that includes exactly one period
  (u002e) character.
}

</tcl>
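
<p>The INTEGER and FLOAT rules (H41220 through H41240) can be approximated with regular expressions. The sketch below is an assumption-laden illustration: the names are hypothetical, and it only classifies the numeric prefix at a given offset without checking what follows it.</p>

```python
import re

# Exponentiation suffix: "E" or "e", an optional sign, then digits.
_EXP = r"[eE][+-]?[0-9]+"
# FLOAT: digits with exactly one period (suffix optional), or digits
# with no period followed by a suffix.
_FLOAT = re.compile(rf"(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:{_EXP})?|[0-9]+{_EXP}")
_INTEGER = re.compile(r"[0-9]+")


def numeric_token(text: str, pos: int = 0):
    """Return (token_type, lexeme) for the number at text[pos], or None."""
    m = _FLOAT.match(text, pos)
    if m:
        return ("FLOAT", m.group())
    m = _INTEGER.match(text, pos)
    if m:
        return ("INTEGER", m.group())
    return None
```

<p>A leading sign is never part of the numeric token, which matches the explanation given for integer literals.</p>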
<h3>Variables</h3>
<tcl>

tokenreq H42010 {} {
  Variables are used as placeholders in SQL statements for constant
  values that are to be bound at start-time.
} {
  SQLite shall recognize as a VARIABLE token a question-mark (u003f)
  followed by zero or more NUMERIC characters.
}

tokenreq H42020 {} {
  A "parameter name" is defined to be a sequence of one or more
  characters that consists of
  ALPHANUMERIC characters and/or dollar-signs (u0024) intermixed with
  pairs of colons (u003a) and optionally followed by any sequence
  of non-zero, non-WHITESPACE characters enclosed in parentheses
  (u0028 and u0029).
} {
  SQLite shall recognize as a VARIABLE token one of the characters
  at-sign (u0040), dollar-sign (u0024), or colon (u003a) followed
  by a parameter name.
}
tokenreq H42030 {} {} {
  SQLite shall recognize as a VARIABLE token the sharp-sign (u0023)
  followed by a parameter name that does not begin with a
  NUMERIC character.
}

tokenreq H42040 {} {
  The REGISTER token is a special token used in certain unusual
  circumstances.
} {
  SQLite shall recognize as a REGISTER token a sharp-sign (u0023)
  followed by one or more NUMERIC characters.
}

</tcl>
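
<p>Two of the rules above are simple enough to sketch directly: the question-mark form of VARIABLE (H42010) and the REGISTER token (H42040). The example deliberately leaves out the "parameter name" forms of H42020 and H42030, and the function name is hypothetical.</p>

```python
import re

_VARIABLE = re.compile(r"\?[0-9]*")  # H42010: "?" plus zero or more digits
_REGISTER = re.compile(r"#[0-9]+")   # H42040: "#" plus one or more digits


def placeholder_token(text: str, pos: int = 0):
    """Return (token_type, lexeme) for a "?" or "#" token, or None."""
    for kind, pattern in (("VARIABLE", _VARIABLE), ("REGISTER", _REGISTER)):
        m = pattern.match(text, pos)
        if m:
            return (kind, m.group())
    return None
```

<p>A bare sharp-sign with no digits after it matches neither pattern in this sketch.</p>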
<h3>Operator tokens</h3>

<p>The following sequences of special characters are recognized as
tokens:</p>
<tcl>

set id 41400
foreach {charseq tname} {
  - MINUS
  ( LP
  ) RP
  ; SEMI
  + PLUS
  * STAR
  / SLASH
  % REM
  = EQ
  == EQ
  <= LE
  <> NE
  << LSHIFT
  < LT
  >= GE
  >> RSHIFT
  > GT
  != NE
  , COMMA
  & BITAND
  ~ BITNOT
  | BITOR
  || CONCAT
  . DOT
} {
  incr id 3
  set n [string length $charseq]
  set body " SQLite shall recognize the $n-character sequence "
  append body "\"$charseq\""
  set sep " ("
  for {set i 0} {$i<$n} {incr i} {
    set c [string index $charseq $i]
    scan $c %c x
    append body [format ${sep}u%04x $x]
    set sep " "
  }
  append body ") as token $tname"
  tokenreq H$id {} {} $body
}

</tcl>
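
<p>To see what the loop above generates, the same logic can be mirrored outside the documentation build. This is a sketch that follows the Tcl as shown (identifiers start at 41400 and advance by 3); the function name is hypothetical and the leading space of the Tcl body string is omitted.</p>

```python
def operator_requirements(pairs, first_id: int = 41400, step: int = 3):
    """Build (requirement_id, text) pairs the way the Tcl loop does."""
    reqs = []
    req_id = first_id
    for charseq, tname in pairs:
        req_id += step
        # Each character is shown as "u" plus four hexadecimal digits.
        codes = " ".join(f"u{ord(c):04x}" for c in charseq)
        body = (
            f"SQLite shall recognize the {len(charseq)}-character sequence "
            f'"{charseq}" ({codes}) as token {tname}'
        )
        reqs.append((f"H{req_id}", body))
    return reqs
```

<p>For example, the first two entries of a list beginning with MINUS and LP would receive the identifiers H41403 and H41406.</p>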
<h3>Keyword tokens</h3>

<p>The following keywords are recognized as distinct tokens:</p>
<tcl>

set id 41500
foreach {charseq tname} {
   ABORT ABORT
   ADD ADD
   AFTER AFTER
   ALL ALL
   ALTER ALTER
   ANALYZE ANALYZE
   AND AND
   AS AS
   ASC ASC
   ATTACH ATTACH
   AUTOINCREMENT AUTOINCR
   BEFORE BEFORE
   BEGIN BEGIN
   BETWEEN BETWEEN
   BY BY
   CASCADE CASCADE
   CASE CASE
   CAST CAST
   CHECK CHECK
   COLLATE COLLATE
   COLUMN COLUMNKW
   COMMIT COMMIT
   CONFLICT CONFLICT
   CONSTRAINT CONSTRAINT
   CREATE CREATE
   CROSS JOIN_KW
   CURRENT_DATE CTIME_KW
   CURRENT_TIME CTIME_KW
   CURRENT_TIMESTAMP CTIME_KW
   DATABASE DATABASE
   DEFAULT DEFAULT
   DEFERRED DEFERRED
   DEFERRABLE DEFERRABLE
   DELETE DELETE
   DESC DESC
   DETACH DETACH
   DISTINCT DISTINCT
   DROP DROP
   END END
   EACH EACH
   ELSE ELSE
   ESCAPE ESCAPE
   EXCEPT EXCEPT
   EXCLUSIVE EXCLUSIVE
   EXISTS EXISTS
   EXPLAIN EXPLAIN
   FAIL FAIL
   FOR FOR
   FOREIGN FOREIGN
   FROM FROM
   FULL JOIN_KW
   GLOB LIKE_KW
   GROUP GROUP
   HAVING HAVING
   IF IF
   IGNORE IGNORE
   IMMEDIATE IMMEDIATE
   IN IN
   INDEX INDEX
   INITIALLY INITIALLY
   INNER JOIN_KW
   INSERT INSERT
   INSTEAD INSTEAD
   INTERSECT INTERSECT
   INTO INTO
   IS IS
   ISNULL ISNULL
   JOIN JOIN
   KEY KEY
   LEFT JOIN_KW
   LIKE LIKE_KW
   LIMIT LIMIT
   MATCH MATCH
   NATURAL JOIN_KW
   NOT NOT
   NOTNULL NOTNULL
   NULL NULL
   OF OF
   OFFSET OFFSET
   ON ON
   OR OR
   ORDER ORDER
   OUTER JOIN_KW
   PLAN PLAN
   PRAGMA PRAGMA
   PRIMARY PRIMARY
   QUERY QUERY
   RAISE RAISE
   REFERENCES REFERENCES
   REGEXP LIKE_KW
   REINDEX REINDEX
   RENAME RENAME
   REPLACE REPLACE
   RESTRICT RESTRICT
   RIGHT JOIN_KW
   ROLLBACK ROLLBACK
   ROW ROW
   SELECT SELECT
   SET SET
   TABLE TABLE
   TEMP TEMP
   TEMPORARY TEMP
   THEN THEN
   TO TO
   TRANSACTION TRANSACTION
   TRIGGER TRIGGER
   UNION UNION
   UNIQUE UNIQUE
   UPDATE UPDATE
   USING USING
   VACUUM VACUUM
   VALUES VALUES
   VIEW VIEW
   VIRTUAL VIRTUAL
   WHEN WHEN
   WHERE WHERE
} {
  incr id 3
  set n [string length $charseq]
  set body " SQLite shall recognize the $n-character sequence "
  append body "\"$charseq\" in any combination of upper and lower case "
  append body "letters as the keyword token $tname"
  tokenreq H$id {} {} $body
}