Documentation Source Text

Check-in [95c97dad08]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Update rbu.in with more details on using RBU with FTS tables and the sqldiff tool.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 95c97dad08e9c12766517b2c0310d8e9c2ab1a6a
User & Date: dan 2015-08-01 18:13:19.759
Context
2015-08-11
15:47
Enhancements to the arguments in favor of using SQLite as an application file format. Added the undoredo.html document. (check-in: 1e1b1d8101 user: drh tags: trunk)
2015-08-01
18:13
Update rbu.in with more details on using RBU with FTS tables and the sqldiff tool. (check-in: 95c97dad08 user: dan tags: trunk)
2015-07-31
18:32
Fix the link to the first mailing list archive. (check-in: 319676d3b8 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/fancyformat.tcl.
274
275
276
277
278
279
280
281
282
283
284


285
286
287
288
289
290
291
      }
      catch { unset D(nonumber) }
      catch { unset D(notoc) }

      # If there is a "tags" attribute, then add an [hd_fragment] command
      # to the output.
      if {[info exists D(tags)]} {
        append G(doc) "<tcl>[list set ::hd(fragment) $D(id)]</tcl>"
        foreach t [split $D(tags) ,] {
          append G(doc) "<tcl>[list hd_keywords [string trim $t]]</tcl>"
        }


        unset D(tags)
      }

      append G(doc) [formattag $tag [array get D]]
      append G(doc) "$HN "
    }








|
|
|
<
>
>







274
275
276
277
278
279
280
281
282
283

284
285
286
287
288
289
290
291
292
      }
      catch { unset D(nonumber) }
      catch { unset D(notoc) }

      # If there is a "tags" attribute, then add an [hd_fragment] command
      # to the output.
      if {[info exists D(tags)]} {
        #append G(doc) "<tcl>[list set ::hd(fragment) $D(id)]</tcl>"
        #foreach t [split $D(tags) ,] {
        #  append G(doc) "<tcl>[list hd_keywords [string trim $t]]</tcl>"

        #}
        append G(doc) "<tcl>[list hd_fragment $D(id) $D(tags)]</tcl>"
        unset D(tags)
      }

      append G(doc) [formattag $tag [array get D]]
      append G(doc) "$HN "
    }

Changes to pages/fts3.in.
1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
1592
1593
1594
1595
</ul>

<p>
  Because the indexed documents themselves are usually much larger than 
  the full-text index, the content option can be used to achieve 
  significant space savings.

<h3> Contentless FTS4 Tables </h3>

<p>
  In order to create an FTS4 table that does not store a copy of the indexed
  documents at all, the content option should be set to an empty string.
  For example, the following SQL creates such an FTS4 table with three
  columns - "a", "b", and "c":








|







1581
1582
1583
1584
1585
1586
1587
1588
1589
1590
1591
1592
1593
1594
1595
</ul>

<p>
  Because the indexed documents themselves are usually much larger than 
  the full-text index, the content option can be used to achieve 
  significant space savings.

<h3 tags="contentless fts4 tables"> Contentless FTS4 Tables </h3>

<p>
  In order to create an FTS4 table that does not store a copy of the indexed
  documents at all, the content option should be set to an empty string.
  For example, the following SQL creates such an FTS4 table with three
  columns - "a", "b", and "c":

1637
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
<p>
  Errors related to attempting to retrieve column values other than docid
  are runtime errors that occur within sqlite3_step(). In some cases, for
  example if the MATCH expression in a SELECT query matches zero rows, there
  may be no error at all even if a statement does refer to column values 
  other than docid.

<h3> External Content FTS4 Tables </h3>

<p>
  An "external content" FTS4 table is similar to a contentless table, except
  that if evaluation of a query requires the value of a column other than 
  docid, FTS4 attempts to retrieve that value from a table (or view, or 
  virtual table) nominated by the user (hereafter referred to as the "content
  table"). The FTS4 module never writes to the content table, and writing







|







1637
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
<p>
  Errors related to attempting to retrieve column values other than docid
  are runtime errors that occur within sqlite3_step(). In some cases, for
  example if the MATCH expression in a SELECT query matches zero rows, there
  may be no error at all even if a statement does refer to column values 
  other than docid.

<h3 tags="external content fts4 tables"> External Content FTS4 Tables </h3>

<p>
  An "external content" FTS4 table is similar to a contentless table, except
  that if evaluation of a query requires the value of a column other than 
  docid, FTS4 attempts to retrieve that value from a table (or view, or 
  virtual table) nominated by the user (hereafter referred to as the "content
  table"). The FTS4 module never writes to the content table, and writing
2427
2428
2429
2430
2431
2432
2433

2434
2435
2436
2437
2438
2439
2440
<p>
  This section describes at a high-level the way the FTS module stores its
  index and content in the database. It is <b>not necessary to read or 
  understand the material in this section in order to use FTS</b> in an 
  application. However, it may be useful to application developers attempting 
  to analyze and understand FTS performance characteristics, or to developers 
  contemplating enhancements to the existing FTS feature set.


<tcl>hd_fragment *shadowtab {FTS shadow tables} {shadow tables}</tcl>
<h2 tags="shadowtabs">Shadow Tables</h2>
<p>
  For each FTS virtual table in a database, three to five real (non-virtual) tables
  are created to store the underlying data.  These real tables are called "shadow tables".
  The real tables are named "%_content",







>







2427
2428
2429
2430
2431
2432
2433
2434
2435
2436
2437
2438
2439
2440
2441
<p>
  This section describes at a high-level the way the FTS module stores its
  index and content in the database. It is <b>not necessary to read or 
  understand the material in this section in order to use FTS</b> in an 
  application. However, it may be useful to application developers attempting 
  to analyze and understand FTS performance characteristics, or to developers 
  contemplating enhancements to the existing FTS feature set.
</p>

<tcl>hd_fragment *shadowtab {FTS shadow tables} {shadow tables}</tcl>
<h2 tags="shadowtabs">Shadow Tables</h2>
<p>
  For each FTS virtual table in a database, three to five real (non-virtual) tables
  are created to store the underlying data.  These real tables are called "shadow tables".
  The real tables are named "%_content",
Changes to pages/rbu.in.
76
77
78
79
80
81
82













83
84
85










86
87
88
89
90
91
92
93
94
95
96
97
98
99

<h2>Preparing An RBU Update File</h2>

<p>All changes to be applied by RBU are stored in a separate SQLite database
called the "RBU database".  The database that is to be modified is called
the "target database".














<p>
For each table in the target database, the RBU database should contain a table
named "data_&lt;<i>target-table-name</i>&gt;" with 










the all the same columns as the
target table, plus one additional column named "rbu_control".
The data_% table should have no PRIMARY KEY or UNIQUE constraints, but
each column should have the same type as the corresponding column in
the target database.
The rbu_control column should have no type at all. For example, if
the target database contains:

<tcl>CODE {
CREATE TABLE t1(a INTEGER PRIMARY KEY, b TEXT, c UNIQUE);
}</tcl>

<p>Then the RBU database should contain:








>
>
>
>
>
>
>
>
>
>
>
>
>


|
>
>
>
>
>
>
>
>
>
>
|
|
|
|
|
<
|







76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113

114
115
116
117
118
119
120
121

<h2>Preparing An RBU Update File</h2>

<p>All changes to be applied by RBU are stored in a separate SQLite database
called the "RBU database".  The database that is to be modified is called
the "target database".

<p>For each table in the target database that will be modified by the update,
a corresponding table is created within the RBU database. The RBU database
table schema is not the same as that of the target database, but is derived
from it as [RBU Database Tables | described below].

<p>The RBU database table contains a single row for each target database 
row inserted, updated or deleted by the update. Populating the RBU database
tables is described in [RBU Database Contents | the following section].
</p>

<tcl>hd_fragment database_tables {RBU Database Tables}</tcl>
<h3>The RBU Database Schema</h3>

<p>
For each table in the target database, the RBU database should contain a table
named "data&lt;<i>integer</i>&gt;_&lt;<i>target-table-name</i>&gt;" where
&lt;<i>target-table-name</i>&gt; is the name of the table in the target
database and &lt;<i>integer</i>&gt; is any sequence of zero or more numeric
characters (0-9). Tables within the RBU database are processed in order by 
name (from smallest to largest according to the BINARY collation sequence),
so the order in which target tables are updated is influenced by the selection 
of the &lt;<i>integer</i>&gt; portion of the data_% table name. While this can
be useful when using RBU to update 
[RBU FTS3/4 Tables | certain types of virtual tables], there is normally no
reason to use anything other than an empty string in place of
&lt;<i>integer</i>&gt;.

<p>The data_% table must have all the same columns as the target table, plus
one additional column named "rbu_control". The data_% table should have no
PRIMARY KEY or UNIQUE constraints, but each column should have the same type as
the corresponding column in the target database. The rbu_control column should

have no type at all. For example, if the target database contains:

<tcl>CODE {
CREATE TABLE t1(a INTEGER PRIMARY KEY, b TEXT, c UNIQUE);
}</tcl>

<p>Then the RBU database should contain:

135
136
137
138
139
140
141



142
143
144
145
146
147
148

<p>Either of the following input table schemas may be used:

<tcl>CODE {
CREATE TABLE data_ft1(a, b, langid, rbu_rowid, rbu_control);
CREATE TABLE data_ft1(a, b, rbu_rowid, rbu_control);
}</tcl>




<p>For each row to INSERT into the target database as part of the RBU 
update, the corresponding data_% table should contain a single record
with the "rbu_control" column set to contain integer value 0. The
other columns should be set to the values that make up the new record 
to insert. 








>
>
>







157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173

<p>Either of the following input table schemas may be used:

<tcl>CODE {
CREATE TABLE data_ft1(a, b, langid, rbu_rowid, rbu_control);
CREATE TABLE data_ft1(a, b, rbu_rowid, rbu_control);
}</tcl>

<tcl>hd_fragment database_contents {RBU Database Contents}</tcl>
<h3>RBU Database Contents</h3>

<p>For each row to INSERT into the target database as part of the RBU 
update, the corresponding data_% table should contain a single record
with the "rbu_control" column set to contain integer value 0. The
other columns should be set to the values that make up the new record 
to insert. 

177
178
179
180
181
182
183




184



185


186










187
188
189




190

191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206

<p>is represented by the data_t1 row created by:

<tcl>CODE {
INSERT INTO data_t1(a, b, c, rbu_control) VALUES(4, NULL, 'usa', '..x');
}</tcl>





<p>Instead of an 'x' character, characters of the rbu_control value specified



for UPDATEs may also be set to 'd'. In this case, instead of updating the


target table with the value stored in the corresponding data_% column, the










user-defined SQL function "rbu_delta()" is invoked and the result stored in
the target table column. rbu_delta() is invoked with two arguments - the
original value currently stored in the target table column and the 




value specified in the data_xxx table.


<p>For example, this row:

<tcl>CODE {
INSERT INTO data_t1(a, b, c, rbu_control) VALUES(4, NULL, 'usa', '..d');
}</tcl>


<p>is similar to an UPDATE statement such as: 

<tcl>CODE {
UPDATE t1 SET c = rbu_delta(c, 'usa') WHERE a = 4;
}</tcl>

<p>If the target database table is a virtual table or a table with no PRIMARY
KEY, the rbu_control value should not include a character corresponding 







>
>
>
>
|
>
>
>
|
>
>
|
>
>
>
>
>
>
>
>
>
>
|
|
|
>
>
>
>
|
>







|
<







202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247

248
249
250
251
252
253
254

<p>is represented by the data_t1 row created by:

<tcl>CODE {
INSERT INTO data_t1(a, b, c, rbu_control) VALUES(4, NULL, 'usa', '..x');
}</tcl>

<p>If RBU is used to update a large BLOB value within a target database, it
may be be more efficient to store a patch or delta that can be used to modify
the existing BLOB instead of an entirely new value within the RBU database. 
RBU allows deltas to be specified in two ways:

<ul>
  <li> In the "fossil delta" format - the format used for blob deltas by the 
  <a href=http://fossil-scm.org>Fossil source-code management system</a>, or

  <li> In a custom format defined by the RBU application.
</ul>

<p> The fossil delta format may only be used to update BLOB values. Instead
of storing the new BLOB within the data_% table, the fossil delta is stored
instead. And instead of specifying an 'x' as part of the ota_control string
for the column to be updated, an 'f' character is stored. When processing
an 'f' update, RBU loads the original BLOB data from disk, applies the fossil
delta to it and stores the results back into the database file. The RBU
databases generated by [sqldiff --rbu] make use of fossil deltas wherever
doing so would save space in the RBU database.

<p> To use a custom delta format, the RBU application must register a
user-defined SQL function named "rbu_delta" before beginning to process the
update. rbu_delta() will be invoked with two arguments - the original value
stored in the target table column and the delta value provided as part of
the RBU update. It should return the result of applying the delta to the
original value. To use the custom delta function, the character of the
rbu_control value corresponding to the target column to update must be
set to 'd' instead of 'x'. Then, instead of updating the target table with the
value stored in the corresponding data_% column, RBU invokes the user-defined
SQL function "rbu_delta()" and the store in the target table column.

<p>For example, this row:

<tcl>CODE {
INSERT INTO data_t1(a, b, c, rbu_control) VALUES(4, NULL, 'usa', '..d');
}</tcl>

<p>causes RBU to update the target database table in a way similar to:


<tcl>CODE {
UPDATE t1 SET c = rbu_delta(c, 'usa') WHERE a = 4;
}</tcl>

<p>If the target database table is a virtual table or a table with no PRIMARY
KEY, the rbu_control value should not include a character corresponding 
214
215
216
217
218
219
220
221
222
223
224
225
226































































































































227
228
229
230
231
232
233

<p>causes a result similar to:

<tcl>CODE {
UPDATE ft1 SET b = 'usa' WHERE rowid = 12;
}</tcl>

<p>The data_xxx tables themselves should have no PRIMARY KEY declarations.
However, RBU is more efficient if reading the rows in from each data_xxx
table in "rowid" order is roughly the same as reading them sorted by
the PRIMARY KEY of the corresponding target database table. In other 
words, rows should be sorted using the destination table PRIMARY KEY 
fields before they are inserted into the data_xxx tables.
































































































































<h2>C/C++ Interface</h2>

<p>Enable the RBU extension by compiling the [amalgamation] with the
[SQLITE_ENABLE_RBU] compile-time option.

<p>The RBU extension interface allows an application to apply an RBU update 







|
|



|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408

<p>causes a result similar to:

<tcl>CODE {
UPDATE ft1 SET b = 'usa' WHERE rowid = 12;
}</tcl>

<p>The data_% tables themselves should have no PRIMARY KEY declarations.
However, RBU is more efficient if reading the rows in from each data_%
table in "rowid" order is roughly the same as reading them sorted by
the PRIMARY KEY of the corresponding target database table. In other 
words, rows should be sorted using the destination table PRIMARY KEY 
fields before they are inserted into the data_% tables.

<tcl>hd_fragment fts4_tables {RBU FTS3/4 Tables}</tcl>
<h3>Using RBU with FTS3/4 Tables</h3>

<p>Usually, an [FTS3 | FTS3 or FTS4] table is an example of a virtual table 
with a rowid that works like a PRIMARY KEY. So, for the following FTS4 tables:

<tcl>CODE {
CREATE VIRTUAL TABLE ft1 USING fts4(addr, text);
CREATE VIRTUAL TABLE ft2 USING fts4;             -- implicit "content" column
}</tcl>

<p>The data_% tables may be created as follows:

<tcl>CODE {
CREATE TABLE data_ft1 USING fts4(addr, text, rbu_rowid, rbu_control);
CREATE TABLE data_ft2 USING fts4(content, rbu_rowid, rbu_control);
}</tcl>

<p>And populated as if the target table were an ordinary SQLite table with no
explicit PRIMARY KEY columns.

<p>[contentless fts4 tables | Contentless FTS4 tables] are handled similarly,
except that any attempt to update or delete rows will cause an error when
applying the update.

<p>[external content fts4 tables | External content FTS4 tables] may also be 
updated using RBU. In this case the user is required to configure the RBU
database so that the same set of UPDATE, DELETE and INSERT operations are
applied to the FTS4 index as to the underlying content table. As for all
updates of external content FTS4 tables, the user is also required to ensure
that any UPDATE or DELETE operations are applied to the FTS4 index before
they are applied to the underlying content table (refer to FTS4 documentation
for a detailed explanation). In RBU, this is done by ensuring that the name
of the data_% table used to write to the FTS4 table sorts before the name
of the data_% table used to update the underlying content table using the
[BINARY] collation sequence. In order to avoid duplicating data within the
RBU database, an SQL view may be used in place of one of the data_% tables.
For example, for the target database schema:

<tcl>CODE {
  CREATE TABLE ccc(addr, text);
  CREATE VIRTUAL TABLE ccc_fts USING fts4(addr, text, content=ccc);
}</tcl>

<p>
  The following RBU database schema may be used: 

<tcl>CODE {
  CREATE TABLE data_ccc(addr, text, rbu_rowid, rbu_control);
  CREATE VIEW data0_ccc_fts AS SELECT * FROM data_ccc;
}</tcl>
 
<p>
  The data_ccc table may then be populated as normal with the updates intended
  for target database table ccc. The same updates will be read by RBU from
  the data0_ccc_fts view and applied to FTS table ccc_fts. Because
  "data0_ccc_fts" is smalle than "data_ccc", the FTS table will be updated
  first, as required.

<p>
  Cases in which the underlying content table has an explicit INTEGER PRIMARY
  KEY column are slightly more difficult, as the text values stored in the
  ota_control column are slightly different for the FTS index and its
  underlying content table. For the underlying content table, a character
  must be included in any ota_control text values for the explicit IPK, but
  for the FTS table itself, which has an implicit rowid, it should not. This
  is inconvenient, but can be solved using a more complicated view, as follows:

<tcl>CODE {
  -- Target database schema
  CREATE TABLE ddd(i INTEGER PRIMARY KEY, k TEXT);
  CREATE VIRTUAL TABLE ddd_fts USING fts4(k, content=ddd);

  -- RBU database schema
  CREATE TABLE data_ccc(i, k, rbu_control);
  CREATE VIEW data0_ccc_fts AS SELECT i AS rbu_rowid, k, CASE 
    WHEN rbu_control IN (0,1) THEN rbu_control ELSE substr(rbu_control, 2) END
  FROM data_ccc;
}</tcl>

<p>
  The substr() function in the SQL view above returns the text of the
  rbu_control argument with the first character (the one corresponding to
  column "i", which is not required by the FTS table) removed.

<tcl>hd_fragment sqldiff {sqldiff --rbu}</tcl>
<h3>Automatically Generating RBU Updates with sqldiff</h3>

<p>
  As of SQLite version 3.8.12, the [sqldiff] utility is able to generate
  RBU databases representing the difference between two databases with
  identical schemas. For example, the following command:

<tcl>CODE {
sqldiff --rbu t1.db t2.db
}</tcl>

<p>
  Outputs an SQL script to create an RBU database which, if used to update
  database t1.db, patches it so that its contents are identical to that of
  database t2.db.

<p>
  By default, sqldiff attempts to process all non-virtual tables within
  the two databases provided to it. If any table appears in one database
  but not the other, or if any table has a slightly different schema in
  one database it is an error. The "--table" option may be useful if this
  causes a problem
  
<p>
  Virtual tables are ignored by default by sqldiff. However, it is possible 
  to explicitly create an RBU data_% table for a virtual table that features
  a rowid that functions like a primary key using a command such as:

<tcl>CODE {
sqldiff --rbu --table &lt;<i>virtual-table-name</i>&gt; t1.db t2.db
}</tcl>

<p>
  Unfortunately, even though virtual tables are ignored by default, any
  [FTS shadow tables | underlying database tables] that they create in order to
  store data within the database are not, and [sqldiff] will include add these
  to any RBU database. For this reason, users attempting to use sqldiff to
  create RBU updates to apply to target databases with one or more virtual
  tables will likely have to run sqldiff using the --table option separately
  for each table to update in the target database.

<h2>C/C++ Interface</h2>

<p>Enable the RBU extension by compiling the [amalgamation] with the
[SQLITE_ENABLE_RBU] compile-time option.

<p>The RBU extension interface allows an application to apply an RBU update 
279
280
281
282
283
284
285




<p>If an update is only partially applied to the target database by the
time sqlite3rbu_close() is called, state information is saved 
within the state database if it exists, or otherwise in the RBU database. 
This allows subsequent processes to automatically
resume the RBU update from where it left off.
If state information is stored in the RBU database, it can be removed
by dropping all tables whose names begin with "rbu_".











>
>
>
>
454
455
456
457
458
459
460
461
462
463
464
<p>If an update is only partially applied to the target database by the
time sqlite3rbu_close() is called, state information is saved 
within the state database if it exists, or otherwise in the RBU database. 
This allows subsequent processes to automatically
resume the RBU update from where it left off.
If state information is stored in the RBU database, it can be removed
by dropping all tables whose names begin with "rbu_".

<p>For more details, refer to the comments in 
<a href=http://sqlite.org/src/doc/trunk/ext/rbu/sqlite3rbu.h>header file
sqlite3ota.h<a>.