Documentation Source Text

Check-in [df95e4e1ec]

Overview
Comment: Fix typos in documentation.
SHA1: df95e4e1ec75dcd2a224abdb1189f7e99896d584
User & Date: drh 2013-10-16 23:58:48
Context
2013-10-17 12:54  Add news for the 3.8.1 release. Other final documentation tweaks prior to release. check-in: ca4b03a1f2 user: drh tags: trunk
2013-10-16 23:58  Fix typos in documentation. check-in: df95e4e1ec user: drh tags: trunk
2013-10-15 19:08  Added the fts4aux languageid column. Fix typos in famous.html check-in: cd0e9d17ed user: drh tags: trunk
Changes

Changes to pages/fts3.in.

Before (lines 1376-1390, 1769-1783, 2073-2087):
    When set to the value "fts3", the matchinfo option reduces the amount of
    information stored by FTS4 with the consequence that the "l" option of
    [matchinfo()] is no longer available.

  <tr><td>notindexed<td> 
    This option is used to specify the name of a column for which data is
    not indexed. Values stored in columns that are not indexed are not
    matched by MATCH queries. Nor are they recognized by auxilliary functions.
    A single CREATE VIRTUAL TABLE statement may have any number of notindexed 
    options.

  <tr><td>order<td>
    <tcl>hd_fragment fts4order {FTS4 order option}</tcl>
    ^The "order" option may be set to either "DESC" or "ASC" (in upper or
    lower case). ^If it is set to "DESC", then FTS4 stores its data in such
................................................................................
  <i>-- are tokenized and added to the inverted index.</i>
  CREATE VIRTUAL TABLE t1 USING fts4(c1, c2, c3, c4, notindexed=c1, notindexed=c3);
</codeblock>

<p>
  Values stored in unindexed columns are not eligible to match MATCH 
  operators. The do not influence the results of the offsets() or matchinfo()
  auxilliary functions. Nor will the snippet() function ever return a
  snippet based on a value stored in an unindexed column.

<tcl>hd_fragment fts4prefix {FTS4 prefix option}</tcl>
<h2 tags="fts4 prefix option">The prefix= option</h2>

<p>
  ^The FTS4 prefix option causes FTS to index term prefixes of specified lengths
................................................................................
  case folding according to rules in Unicode Version 6.1 and it recognizes
  unicode space and punctuation characters and uses those to separate tokens.
  The simple tokenizer only does case folding of ASCII characters and only
  recognizes ASCII space and punctuation characters as token separators.

<p>
  By default, "unicode61" also removes all diacritics from Latin script
  characters. This behaviour can be overriden by adding the tokenizer argument
  "remove_diacritics=0". For example:

<codeblock>
    <i>-- Create tables that remove diacritics from Latin script characters</i>
    <i>-- as part of tokenization.</i>
    CREATE VIRTUAL TABLE txt1 USING fts4(tokenize=unicode61);
    CREATE VIRTUAL TABLE txt2 USING fts4(tokenize=unicode61 "remove_diacritics=1");

After (lines 1376-1390, 1769-1783, 2073-2087):
    When set to the value "fts3", the matchinfo option reduces the amount of
    information stored by FTS4 with the consequence that the "l" option of
    [matchinfo()] is no longer available.

  <tr><td>notindexed<td> 
    This option is used to specify the name of a column for which data is
    not indexed. Values stored in columns that are not indexed are not
    matched by MATCH queries. Nor are they recognized by auxiliary functions.
    A single CREATE VIRTUAL TABLE statement may have any number of notindexed 
    options.

  <tr><td>order<td>
    <tcl>hd_fragment fts4order {FTS4 order option}</tcl>
    ^The "order" option may be set to either "DESC" or "ASC" (in upper or
    lower case). ^If it is set to "DESC", then FTS4 stores its data in such
................................................................................
  <i>-- are tokenized and added to the inverted index.</i>
  CREATE VIRTUAL TABLE t1 USING fts4(c1, c2, c3, c4, notindexed=c1, notindexed=c3);
</codeblock>

<p>
  Values stored in unindexed columns are not eligible to match MATCH 
  operators. The do not influence the results of the offsets() or matchinfo()
  auxiliary functions. Nor will the snippet() function ever return a
  snippet based on a value stored in an unindexed column.

<tcl>hd_fragment fts4prefix {FTS4 prefix option}</tcl>
<h2 tags="fts4 prefix option">The prefix= option</h2>

<p>
  ^The FTS4 prefix option causes FTS to index term prefixes of specified lengths
................................................................................
  case folding according to rules in Unicode Version 6.1 and it recognizes
  unicode space and punctuation characters and uses those to separate tokens.
  The simple tokenizer only does case folding of ASCII characters and only
  recognizes ASCII space and punctuation characters as token separators.

<p>
  By default, "unicode61" also removes all diacritics from Latin script
  characters. This behaviour can be overridden by adding the tokenizer argument
  "remove_diacritics=0". For example:

<codeblock>
    <i>-- Create tables that remove diacritics from Latin script characters</i>
    <i>-- as part of tokenization.</i>
    CREATE VIRTUAL TABLE txt1 USING fts4(tokenize=unicode61);
    CREATE VIRTUAL TABLE txt2 USING fts4(tokenize=unicode61 "remove_diacritics=1");

Changes to pages/lang.in.

Before (lines 2154-2168, 3126-3146):
  between 0.0 and 1.0, inclusive.)^
  ^The likelihood(X) function is a no-op that the code generator
  optimizes away so that it consumes no CPU cycles during run-time
  (that is, during calls to [sqlite3_step()]).
  ^The purpose of the likelihood(X,Y) function is to provide a hint
  to the query planner that the argument X is a boolean that is
  true with a probability of approximately Y.
  ^(The [unlikely(X)] fucntion is short-hand for likelihood(X,0.0625).)^
}

funcdef {load_extension(X) load_extension(X,Y)} {} {
  ^The load_extension(X,Y) function loads [SQLite extensions] out of the shared
  library file named X using the entry point Y.  ^The result of load_extension()
  is always a NULL.  ^If Y is omitted then the default entry point name is used.
  ^The load_extension() function raises an exception if the extension fails to
................................................................................
the join operations are processed in order from left to right. In other 
words, the FROM clause (A join-op-1 B join-op-2 C) is computed as 
((A join-op-1 B) join-op-2 C).)^

<tcl>hd_fragment crossjoin {treats the CROSS JOIN operator specially}</tcl>
<p><b>Side note: Special handling of CROSS JOIN.</b>
^There is no difference between the "INNER JOIN", "JOIN" and "," join
operators. They are completely interchangable in SQLite.
^(The "CROSS JOIN" join operator produces the same result as the 
"INNER JOIN", "JOIN" and "," operators)^, but is 
<a href=optoverview.html#crossjoin>handled differently by the query
optimizer</a> in that it prevents the query optimizer from reordering
the tables in the join.  An application programmer can use the CROSS JOIN 
operator to directly influense the algorithm that is chosen to implement
the SELECT statement.  Avoid using CROSS JOIN except in specific situations 
where manual control of the query optimizer is desired.  Avoid using
CROSS JOIN early in the development of an application as doing so is
a <a href="http://c2.com/cgi/wiki?PrematureOptimization">premature
optimization</a>.  The special handling of CROSS JOIN is an SQLite-specific
feature and is not a part of standard SQL.

After (lines 2154-2168, 3126-3146):
  between 0.0 and 1.0, inclusive.)^
  ^The likelihood(X) function is a no-op that the code generator
  optimizes away so that it consumes no CPU cycles during run-time
  (that is, during calls to [sqlite3_step()]).
  ^The purpose of the likelihood(X,Y) function is to provide a hint
  to the query planner that the argument X is a boolean that is
  true with a probability of approximately Y.
  ^(The [unlikely(X)] function is short-hand for likelihood(X,0.0625).)^
}

funcdef {load_extension(X) load_extension(X,Y)} {} {
  ^The load_extension(X,Y) function loads [SQLite extensions] out of the shared
  library file named X using the entry point Y.  ^The result of load_extension()
  is always a NULL.  ^If Y is omitted then the default entry point name is used.
  ^The load_extension() function raises an exception if the extension fails to
................................................................................
the join operations are processed in order from left to right. In other 
words, the FROM clause (A join-op-1 B join-op-2 C) is computed as 
((A join-op-1 B) join-op-2 C).)^

<tcl>hd_fragment crossjoin {treats the CROSS JOIN operator specially}</tcl>
<p><b>Side note: Special handling of CROSS JOIN.</b>
^There is no difference between the "INNER JOIN", "JOIN" and "," join
operators. They are completely interchangeable in SQLite.
^(The "CROSS JOIN" join operator produces the same result as the 
"INNER JOIN", "JOIN" and "," operators)^, but is 
<a href=optoverview.html#crossjoin>handled differently by the query
optimizer</a> in that it prevents the query optimizer from reordering
the tables in the join.  An application programmer can use the CROSS JOIN 
operator to directly influence the algorithm that is chosen to implement
the SELECT statement.  Avoid using CROSS JOIN except in specific situations 
where manual control of the query optimizer is desired.  Avoid using
CROSS JOIN early in the development of an application as doing so is
a <a href="http://c2.com/cgi/wiki?PrematureOptimization">premature
optimization</a>.  The special handling of CROSS JOIN is an SQLite-specific
feature and is not a part of standard SQL.
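
A minimal sketch of the equivalence described in the CROSS JOIN side note, using Python's standard sqlite3 module. The tables `a` and `b`, their rows, and the helper `join_rows` are hypothetical names chosen for illustration. The sketch only checks that all four join operators return the same rows; it does not inspect the table order chosen by the query optimizer, which is where CROSS JOIN differs.

```python
import sqlite3


def join_rows(join_op):
    """Run one two-table query with the given join operator; return sorted rows."""
    con = sqlite3.connect(":memory:")
    # Hypothetical demo tables, not taken from the documentation.
    con.executescript("""
        CREATE TABLE a(id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE b(a_id INTEGER, tag TEXT);
        INSERT INTO a VALUES (1, 'one'), (2, 'two');
        INSERT INTO b VALUES (1, 'x'), (1, 'y'), (2, 'z');
    """)
    # Only the join operator varies; the WHERE constraint is the same each time.
    sql = f"SELECT a.name, b.tag FROM a {join_op} b WHERE a.id = b.a_id"
    rows = sorted(con.execute(sql).fetchall())
    con.close()
    return rows


# Collect the result set produced by each operator.
results = {op: join_rows(op) for op in ("INNER JOIN", "JOIN", ",", "CROSS JOIN")}
```

Sorting the rows before comparing keeps the check independent of whatever loop order the optimizer picks for each query.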
       

Changes to pages/rtree.in.

Before (lines 129-143):
<blockquote><pre>
INSERT INTO demo_index VALUES(
    1,                   -- Primary key -- SQLite.org headquarters
    -80.7749, -80.7747,  -- Longitude range
    35.3776, 35.3778     -- Latitude range
);
INSERT INTO demo_index VALUES(
    2,                   -- NC 12th Congressional Distrinct in 2010
    -81.0, -79.6,
    35.0, 36.2
);
</pre></blockquote>)^

<p>
The entries above might represent (for example) a bounding box around

After (lines 129-143):
<blockquote><pre>
INSERT INTO demo_index VALUES(
    1,                   -- Primary key -- SQLite.org headquarters
    -80.7749, -80.7747,  -- Longitude range
    35.3776, 35.3778     -- Latitude range
);
INSERT INTO demo_index VALUES(
    2,                   -- NC 12th Congressional District in 2010
    -81.0, -79.6,
    35.0, 36.2
);
</pre></blockquote>)^

<p>
The entries above might represent (for example) a bounding box around

Changes to pages/spellfix1.in.

Before (lines 211-225, 429-443, 476-503, 582-590):
rows.  This value is an integer which is the number of
of words examined using the edit-distance algorithm to
find the top matches that are ultimately displayed.  This
value is for diagnostic use only.

<dt><p><b>soundslike</b><dd>
(HIDDEN)  When inserting vocabulary entries, this field
can be set to an spelling that matches what the word
sounds like.  See the DEALING WITH UNUSUAL AND DIFFICULT
SPELLINGS section below for details.

<dt><p><b>command</b><dd>
(HIDDEN)  The value of the "command" column is always NULL.  However,
applications can insert special strings into the "command" column in order
to provoke certain behaviors in the spellfix1 virtual table.
................................................................................
<dt><p><b>editdist3(P,W)<br>editdist2(P,W,L)<br>editdist3(T)</b><dd>
These routines provide direct access to the version of the Wagner
edit-distance function that allows for application-defined weights
on edit operations.  The first two forms of this function compare
pattern P against word W and return the edit distance.  In the first
function, the langid is assumed to be 0 and in the second, the
langid is given by the L parameter.  The third form of this function
reloads edit distance coefficience from the table named by T.

<dt><p><b>spellfix1_editdist(P,W)</b><dd>
This routine provides access to the built-in Wagner edit-distance
function that uses default, fixed costs.  The value returned is
the edit distance needed to transform W into P.

<dt><p><b>spellfix1_phonehash(X)</b><dd>
................................................................................

<ul>
<li><p>It works with unicode (UTF8) text.

<li><p>A table of insertion, deletion, and substitution costs can be 
       provided by the application.

<li><p>Multi-character insertsions, deletions, and substitutions can be
       enumerated in the cost table.
</ul>

<h2>The editdist3 COST table</h2>

<p>To program the costs of editdist3, create a table such as the following:

<blockquote><pre>
CREATE TABLE editcost(
  iLang INT,   -- The language ID
  cFrom TEXT,  -- Convert text from this
  cTo   TEXT,  -- Convert text into this
  iCost INT    -- The cost of doing the conversionnn
);
</pre></blockquote>

<p>The cost table can be named anything you want - it does not have to be
called "editcost".  And the table can contain additional columns.
The only requirement is that the
table must contain the four columns show above, with exactly the names shown.
................................................................................
with weights and the weight table changes, simply rerun the single-argument
form of editdist3() to reload revised coefficients.  Note that the 
edit distance
weights used by the editdist3() SQL function are independent from the
weights used by the spellfix1 virtual table.

<p>The second and third forms return the computed edit distance between strings
'string1' and "string2'.  In the second form, an language id of 0 is used.
The language id is specified in the third form.

After (lines 211-225, 429-443, 476-503, 582-590):
rows.  This value is an integer which is the number of
of words examined using the edit-distance algorithm to
find the top matches that are ultimately displayed.  This
value is for diagnostic use only.

<dt><p><b>soundslike</b><dd>
(HIDDEN)  When inserting vocabulary entries, this field
can be set to a spelling that matches what the word
sounds like.  See the DEALING WITH UNUSUAL AND DIFFICULT
SPELLINGS section below for details.

<dt><p><b>command</b><dd>
(HIDDEN)  The value of the "command" column is always NULL.  However,
applications can insert special strings into the "command" column in order
to provoke certain behaviors in the spellfix1 virtual table.
................................................................................
<dt><p><b>editdist3(P,W)<br>editdist2(P,W,L)<br>editdist3(T)</b><dd>
These routines provide direct access to the version of the Wagner
edit-distance function that allows for application-defined weights
on edit operations.  The first two forms of this function compare
pattern P against word W and return the edit distance.  In the first
function, the langid is assumed to be 0 and in the second, the
langid is given by the L parameter.  The third form of this function
reloads edit distance coefficients from the table named by T.

<dt><p><b>spellfix1_editdist(P,W)</b><dd>
This routine provides access to the built-in Wagner edit-distance
function that uses default, fixed costs.  The value returned is
the edit distance needed to transform W into P.

<dt><p><b>spellfix1_phonehash(X)</b><dd>
................................................................................

<ul>
<li><p>It works with unicode (UTF8) text.

<li><p>A table of insertion, deletion, and substitution costs can be 
       provided by the application.

<li><p>Multi-character insertions, deletions, and substitutions can be
       enumerated in the cost table.
</ul>

<h2>The editdist3 COST table</h2>

<p>To program the costs of editdist3, create a table such as the following:

<blockquote><pre>
CREATE TABLE editcost(
  iLang INT,   -- The language ID
  cFrom TEXT,  -- Convert text from this
  cTo   TEXT,  -- Convert text into this
  iCost INT    -- The cost of doing the conversion
);
</pre></blockquote>

<p>The cost table can be named anything you want - it does not have to be
called "editcost".  And the table can contain additional columns.
The only requirement is that the
table must contain the four columns show above, with exactly the names shown.
................................................................................
with weights and the weight table changes, simply rerun the single-argument
form of editdist3() to reload revised coefficients.  Note that the 
edit distance
weights used by the editdist3() SQL function are independent from the
weights used by the spellfix1 virtual table.

<p>The second and third forms return the computed edit distance between strings
'string1' and "string2'.  In the second form, a language id of 0 is used.
The language id is specified in the third form.
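
The cost-table requirement described above (any table name, extra columns allowed, four columns with exactly the names shown) can be sketched with Python's standard sqlite3 module. The table name `my_costs`, the extra `note` column, the sample row, and the helper `has_cost_columns` are all hypothetical. The spellfix1 extension is not loaded here, so this only checks the table shape and never calls editdist3().

```python
import sqlite3

# The four column names the cost table must contain, per the text above.
REQUIRED = {"iLang", "cFrom", "cTo", "iCost"}


def has_cost_columns(con, table):
    """Return True if `table` has all four required cost-table columns."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    cols = {row[1] for row in con.execute(f"PRAGMA table_info({table})")}
    return REQUIRED <= cols


con = sqlite3.connect(":memory:")
# Hypothetical table: a different name and one extra column are both allowed.
con.execute(
    "CREATE TABLE my_costs(iLang INT, cFrom TEXT, cTo TEXT, iCost INT, note TEXT)"
)
# Hypothetical substitution cost for language 0; the value 50 is arbitrary.
con.execute("INSERT INTO my_costs VALUES (0, 'a', 'e', 50, 'illustrative row')")
ok = has_cost_columns(con, "my_costs")

# With the spellfix1 extension loaded, the single-argument form would be:
#   SELECT editdist3('my_costs');
```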