SQLite
Artifact Content
Not logged in

Artifact d7cc99350c10134f358fe1a8997d9225b3f712b2:


     1  /*
     2  ** 2014 August 30
     3  **
     4  ** The author disclaims copyright to this source code.  In place of
     5  ** a legal notice, here is a blessing:
     6  **
     7  **    May you do good and not evil.
     8  **    May you find forgiveness for yourself and forgive others.
     9  **    May you share freely, never taking more than you give.
    10  **
    11  *************************************************************************
    12  **
    13  ** This file contains the public interface for the RBU extension. 
    14  */
    15  
    16  /*
    17  ** SUMMARY
    18  **
    19  ** Writing a transaction containing a large number of operations on 
    20  ** b-tree indexes that are collectively larger than the available cache
    21  ** memory can be very inefficient. 
    22  **
    23  ** The problem is that in order to update a b-tree, the leaf page (at least)
    24  ** containing the entry being inserted or deleted must be modified. If the
    25  ** working set of leaves is larger than the available cache memory, then a 
    26  ** single leaf that is modified more than once as part of the transaction 
    27  ** may be loaded from or written to the persistent media multiple times.
    28  ** Additionally, because the index updates are likely to be applied in
    29  ** random order, access to pages within the database is also likely to be in 
    30  ** random order, which is itself quite inefficient.
    31  **
    32  ** One way to improve the situation is to sort the operations on each index
    33  ** by index key before applying them to the b-tree. This leads to an IO
    34  ** pattern that resembles a single linear scan through the index b-tree,
    35  ** and all but guarantees each modified leaf page is loaded and stored 
    36  ** exactly once. SQLite uses this trick to improve the performance of
    37  ** CREATE INDEX commands. This extension allows it to be used to improve
    38  ** the performance of large transactions on existing databases.
    39  **
    40  ** Additionally, this extension allows the work involved in writing the 
    41  ** large transaction to be broken down into sub-transactions performed 
    42  ** sequentially by separate processes. This is useful if the system cannot 
    43  ** guarantee that a single update process will run for long enough to apply 
    44  ** the entire update, for example because the update is being applied on a 
    45  ** mobile device that is frequently rebooted. Even after the writer process 
    46  ** has committed one or more sub-transactions, other database clients continue
    47  ** to read from the original database snapshot. In other words, partially 
    48  ** applied transactions are not visible to other clients. 
    49  **
    50  ** "RBU" stands for "Resumable Bulk Update". As in a large database update
    51  ** transmitted via a wireless network to a mobile device. A transaction
    52  ** applied using this extension is hence refered to as an "RBU update".
    53  **
    54  **
    55  ** LIMITATIONS
    56  **
    57  ** An "RBU update" transaction is subject to the following limitations:
    58  **
    59  **   * The transaction must consist of INSERT, UPDATE and DELETE operations
    60  **     only.
    61  **
    62  **   * INSERT statements may not use any default values.
    63  **
    64  **   * UPDATE and DELETE statements must identify their target rows by 
    65  **     non-NULL PRIMARY KEY values. Rows with NULL values stored in PRIMARY
    66  **     KEY fields may not be updated or deleted. If the table being written 
    67  **     has no PRIMARY KEY, affected rows must be identified by rowid.
    68  **
    69  **   * UPDATE statements may not modify PRIMARY KEY columns.
    70  **
    71  **   * No triggers will be fired.
    72  **
    73  **   * No foreign key violations are detected or reported.
    74  **
    75  **   * CHECK constraints are not enforced.
    76  **
    77  **   * No constraint handling mode except for "OR ROLLBACK" is supported.
    78  **
    79  **
    80  ** PREPARATION
    81  **
    82  ** An "RBU update" is stored as a separate SQLite database. A database
    83  ** containing an RBU update is an "RBU database". For each table in the 
    84  ** target database to be updated, the RBU database should contain a table
    85  ** named "data_<target name>" containing the same set of columns as the
    86  ** target table, and one more - "rbu_control". The data_% table should 
    87  ** have no PRIMARY KEY or UNIQUE constraints, but each column should have
    88  ** the same type as the corresponding column in the target database.
    89  ** The "rbu_control" column should have no type at all. For example, if
    90  ** the target database contains:
    91  **
    92  **   CREATE TABLE t1(a INTEGER PRIMARY KEY, b TEXT, c UNIQUE);
    93  **
    94  ** Then the RBU database should contain:
    95  **
    96  **   CREATE TABLE data_t1(a INTEGER, b TEXT, c, rbu_control);
    97  **
    98  ** The order of the columns in the data_% table does not matter.
    99  **
   100  ** Instead of a regular table, the RBU database may also contain virtual
   101  ** tables or view named using the data_<target> naming scheme. 
   102  **
   103  ** Instead of the plain data_<target> naming scheme, RBU database tables 
   104  ** may also be named data<integer>_<target>, where <integer> is any sequence
   105  ** of zero or more numeric characters (0-9). This can be significant because
   106  ** tables within the RBU database are always processed in order sorted by 
   107  ** name. By judicious selection of the the <integer> portion of the names
   108  ** of the RBU tables the user can therefore control the order in which they
   109  ** are processed. This can be useful, for example, to ensure that "external
   110  ** content" FTS4 tables are updated before their underlying content tables.
   111  **
   112  ** If the target database table is a virtual table or a table that has no
   113  ** PRIMARY KEY declaration, the data_% table must also contain a column 
   114  ** named "rbu_rowid". This column is mapped to the tables implicit primary 
   115  ** key column - "rowid". Virtual tables for which the "rowid" column does 
   116  ** not function like a primary key value cannot be updated using RBU. For 
   117  ** example, if the target db contains either of the following:
   118  **
   119  **   CREATE VIRTUAL TABLE x1 USING fts3(a, b);
   120  **   CREATE TABLE x1(a, b)
   121  **
   122  ** then the RBU database should contain:
   123  **
   124  **   CREATE TABLE data_x1(a, b, rbu_rowid, rbu_control);
   125  **
   126  ** All non-hidden columns (i.e. all columns matched by "SELECT *") of the
   127  ** target table must be present in the input table. For virtual tables,
   128  ** hidden columns are optional - they are updated by RBU if present in
   129  ** the input table, or not otherwise. For example, to write to an fts4
   130  ** table with a hidden languageid column such as:
   131  **
   132  **   CREATE VIRTUAL TABLE ft1 USING fts4(a, b, languageid='langid');
   133  **
   134  ** Either of the following input table schemas may be used:
   135  **
   136  **   CREATE TABLE data_ft1(a, b, langid, rbu_rowid, rbu_control);
   137  **   CREATE TABLE data_ft1(a, b, rbu_rowid, rbu_control);
   138  **
   139  ** For each row to INSERT into the target database as part of the RBU 
   140  ** update, the corresponding data_% table should contain a single record
   141  ** with the "rbu_control" column set to contain integer value 0. The
   142  ** other columns should be set to the values that make up the new record 
   143  ** to insert. 
   144  **
   145  ** If the target database table has an INTEGER PRIMARY KEY, it is not 
   146  ** possible to insert a NULL value into the IPK column. Attempting to 
   147  ** do so results in an SQLITE_MISMATCH error.
   148  **
   149  ** For each row to DELETE from the target database as part of the RBU 
   150  ** update, the corresponding data_% table should contain a single record
   151  ** with the "rbu_control" column set to contain integer value 1. The
   152  ** real primary key values of the row to delete should be stored in the
   153  ** corresponding columns of the data_% table. The values stored in the
   154  ** other columns are not used.
   155  **
   156  ** For each row to UPDATE from the target database as part of the RBU 
   157  ** update, the corresponding data_% table should contain a single record
   158  ** with the "rbu_control" column set to contain a value of type text.
   159  ** The real primary key values identifying the row to update should be 
   160  ** stored in the corresponding columns of the data_% table row, as should
   161  ** the new values of all columns being update. The text value in the 
   162  ** "rbu_control" column must contain the same number of characters as
   163  ** there are columns in the target database table, and must consist entirely
   164  ** of 'x' and '.' characters (or in some special cases 'd' - see below). For 
   165  ** each column that is being updated, the corresponding character is set to
   166  ** 'x'. For those that remain as they are, the corresponding character of the
   167  ** rbu_control value should be set to '.'. For example, given the tables 
   168  ** above, the update statement:
   169  **
   170  **   UPDATE t1 SET c = 'usa' WHERE a = 4;
   171  **
   172  ** is represented by the data_t1 row created by:
   173  **
   174  **   INSERT INTO data_t1(a, b, c, rbu_control) VALUES(4, NULL, 'usa', '..x');
   175  **
   176  ** Instead of an 'x' character, characters of the rbu_control value specified
   177  ** for UPDATEs may also be set to 'd'. In this case, instead of updating the
   178  ** target table with the value stored in the corresponding data_% column, the
   179  ** user-defined SQL function "rbu_delta()" is invoked and the result stored in
   180  ** the target table column. rbu_delta() is invoked with two arguments - the
   181  ** original value currently stored in the target table column and the 
   182  ** value specified in the data_xxx table.
   183  **
   184  ** For example, this row:
   185  **
   186  **   INSERT INTO data_t1(a, b, c, rbu_control) VALUES(4, NULL, 'usa', '..d');
   187  **
   188  ** is similar to an UPDATE statement such as: 
   189  **
   190  **   UPDATE t1 SET c = rbu_delta(c, 'usa') WHERE a = 4;
   191  **
   192  ** Finally, if an 'f' character appears in place of a 'd' or 's' in an 
   193  ** ota_control string, the contents of the data_xxx table column is assumed
   194  ** to be a "fossil delta" - a patch to be applied to a blob value in the
   195  ** format used by the fossil source-code management system. In this case
   196  ** the existing value within the target database table must be of type BLOB. 
   197  ** It is replaced by the result of applying the specified fossil delta to
   198  ** itself.
   199  **
   200  ** If the target database table is a virtual table or a table with no PRIMARY
   201  ** KEY, the rbu_control value should not include a character corresponding 
   202  ** to the rbu_rowid value. For example, this:
   203  **
   204  **   INSERT INTO data_ft1(a, b, rbu_rowid, rbu_control) 
   205  **       VALUES(NULL, 'usa', 12, '.x');
   206  **
   207  ** causes a result similar to:
   208  **
   209  **   UPDATE ft1 SET b = 'usa' WHERE rowid = 12;
   210  **
   211  ** The data_xxx tables themselves should have no PRIMARY KEY declarations.
   212  ** However, RBU is more efficient if reading the rows in from each data_xxx
   213  ** table in "rowid" order is roughly the same as reading them sorted by
   214  ** the PRIMARY KEY of the corresponding target database table. In other 
   215  ** words, rows should be sorted using the destination table PRIMARY KEY 
   216  ** fields before they are inserted into the data_xxx tables.
   217  **
   218  ** USAGE
   219  **
   220  ** The API declared below allows an application to apply an RBU update 
   221  ** stored on disk to an existing target database. Essentially, the 
   222  ** application:
   223  **
   224  **     1) Opens an RBU handle using the sqlite3rbu_open() function.
   225  **
   226  **     2) Registers any required virtual table modules with the database
   227  **        handle returned by sqlite3rbu_db(). Also, if required, register
   228  **        the rbu_delta() implementation.
   229  **
   230  **     3) Calls the sqlite3rbu_step() function one or more times on
   231  **        the new handle. Each call to sqlite3rbu_step() performs a single
   232  **        b-tree operation, so thousands of calls may be required to apply 
   233  **        a complete update.
   234  **
   235  **     4) Calls sqlite3rbu_close() to close the RBU update handle. If
   236  **        sqlite3rbu_step() has been called enough times to completely
   237  **        apply the update to the target database, then the RBU database
   238  **        is marked as fully applied. Otherwise, the state of the RBU 
   239  **        update application is saved in the RBU database for later 
   240  **        resumption.
   241  **
   242  ** See comments below for more detail on APIs.
   243  **
   244  ** If an update is only partially applied to the target database by the
   245  ** time sqlite3rbu_close() is called, various state information is saved 
   246  ** within the RBU database. This allows subsequent processes to automatically
   247  ** resume the RBU update from where it left off.
   248  **
   249  ** To remove all RBU extension state information, returning an RBU database 
   250  ** to its original contents, it is sufficient to drop all tables that begin
   251  ** with the prefix "rbu_"
   252  **
   253  ** DATABASE LOCKING
   254  **
   255  ** An RBU update may not be applied to a database in WAL mode. Attempting
   256  ** to do so is an error (SQLITE_ERROR).
   257  **
   258  ** While an RBU handle is open, a SHARED lock may be held on the target
   259  ** database file. This means it is possible for other clients to read the
   260  ** database, but not to write it.
   261  **
   262  ** If an RBU update is started and then suspended before it is completed,
   263  ** then an external client writes to the database, then attempting to resume
   264  ** the suspended RBU update is also an error (SQLITE_BUSY).
   265  */
   266  
   267  #ifndef _SQLITE3RBU_H
   268  #define _SQLITE3RBU_H
   269  
   270  #include "sqlite3.h"              /* Required for error code definitions */
   271  
   272  #ifdef __cplusplus
   273  extern "C" {
   274  #endif
   275  
   276  typedef struct sqlite3rbu sqlite3rbu;
   277  
   278  /*
   279  ** Open an RBU handle.
   280  **
   281  ** Argument zTarget is the path to the target database. Argument zRbu is
   282  ** the path to the RBU database. Each call to this function must be matched
   283  ** by a call to sqlite3rbu_close(). When opening the databases, RBU passes
   284  ** the SQLITE_CONFIG_URI flag to sqlite3_open_v2(). So if either zTarget
   285  ** or zRbu begin with "file:", it will be interpreted as an SQLite 
   286  ** database URI, not a regular file name.
   287  **
   288  ** If the zState argument is passed a NULL value, the RBU extension stores 
   289  ** the current state of the update (how many rows have been updated, which 
   290  ** indexes are yet to be updated etc.) within the RBU database itself. This
   291  ** can be convenient, as it means that the RBU application does not need to
   292  ** organize removing a separate state file after the update is concluded. 
   293  ** Or, if zState is non-NULL, it must be a path to a database file in which 
   294  ** the RBU extension can store the state of the update.
   295  **
   296  ** When resuming an RBU update, the zState argument must be passed the same
   297  ** value as when the RBU update was started.
   298  **
   299  ** Once the RBU update is finished, the RBU extension does not 
   300  ** automatically remove any zState database file, even if it created it.
   301  **
   302  ** By default, RBU uses the default VFS to access the files on disk. To
   303  ** use a VFS other than the default, an SQLite "file:" URI containing a
   304  ** "vfs=..." option may be passed as the zTarget option.
   305  **
   306  ** IMPORTANT NOTE FOR ZIPVFS USERS: The RBU extension works with all of
   307  ** SQLite's built-in VFSs, including the multiplexor VFS. However it does
   308  ** not work out of the box with zipvfs. Refer to the comment describing
   309  ** the zipvfs_create_vfs() API below for details on using RBU with zipvfs.
   310  */
   311  sqlite3rbu *sqlite3rbu_open(
   312    const char *zTarget, 
   313    const char *zRbu,
   314    const char *zState
   315  );
   316  
   317  /*
   318  ** Internally, each RBU connection uses a separate SQLite database 
   319  ** connection to access the target and rbu update databases. This
   320  ** API allows the application direct access to these database handles.
   321  **
   322  ** The first argument passed to this function must be a valid, open, RBU
   323  ** handle. The second argument should be passed zero to access the target
   324  ** database handle, or non-zero to access the rbu update database handle.
   325  ** Accessing the underlying database handles may be useful in the
   326  ** following scenarios:
   327  **
   328  **   * If any target tables are virtual tables, it may be necessary to
   329  **     call sqlite3_create_module() on the target database handle to 
   330  **     register the required virtual table implementations.
   331  **
   332  **   * If the data_xxx tables in the RBU source database are virtual 
   333  **     tables, the application may need to call sqlite3_create_module() on
   334  **     the rbu update db handle to any required virtual table
   335  **     implementations.
   336  **
   337  **   * If the application uses the "rbu_delta()" feature described above,
   338  **     it must use sqlite3_create_function() or similar to register the
   339  **     rbu_delta() implementation with the target database handle.
   340  **
   341  ** If an error has occurred, either while opening or stepping the RBU object,
   342  ** this function may return NULL. The error code and message may be collected
   343  ** when sqlite3rbu_close() is called.
   344  **
   345  ** Database handles returned by this function remain valid until the next
   346  ** call to any sqlite3rbu_xxx() function other than sqlite3rbu_db().
   347  */
   348  sqlite3 *sqlite3rbu_db(sqlite3rbu*, int bRbu);
   349  
   350  /*
   351  ** Do some work towards applying the RBU update to the target db. 
   352  **
   353  ** Return SQLITE_DONE if the update has been completely applied, or 
   354  ** SQLITE_OK if no error occurs but there remains work to do to apply
   355  ** the RBU update. If an error does occur, some other error code is 
   356  ** returned. 
   357  **
   358  ** Once a call to sqlite3rbu_step() has returned a value other than
   359  ** SQLITE_OK, all subsequent calls on the same RBU handle are no-ops
   360  ** that immediately return the same value.
   361  */
   362  int sqlite3rbu_step(sqlite3rbu *pRbu);
   363  
   364  /*
   365  ** Force RBU to save its state to disk.
   366  **
   367  ** If a power failure or application crash occurs during an update, following
   368  ** system recovery RBU may resume the update from the point at which the state
   369  ** was last saved. In other words, from the most recent successful call to 
   370  ** sqlite3rbu_close() or this function.
   371  **
   372  ** SQLITE_OK is returned if successful, or an SQLite error code otherwise.
   373  */
   374  int sqlite3rbu_savestate(sqlite3rbu *pRbu);
   375  
   376  /*
   377  ** Close an RBU handle. 
   378  **
   379  ** If the RBU update has been completely applied, mark the RBU database
   380  ** as fully applied. Otherwise, assuming no error has occurred, save the
   381  ** current state of the RBU update appliation to the RBU database.
   382  **
   383  ** If an error has already occurred as part of an sqlite3rbu_step()
   384  ** or sqlite3rbu_open() call, or if one occurs within this function, an
   385  ** SQLite error code is returned. Additionally, *pzErrmsg may be set to
   386  ** point to a buffer containing a utf-8 formatted English language error
   387  ** message. It is the responsibility of the caller to eventually free any 
   388  ** such buffer using sqlite3_free().
   389  **
   390  ** Otherwise, if no error occurs, this function returns SQLITE_OK if the
   391  ** update has been partially applied, or SQLITE_DONE if it has been 
   392  ** completely applied.
   393  */
   394  int sqlite3rbu_close(sqlite3rbu *pRbu, char **pzErrmsg);
   395  
   396  /*
   397  ** Return the total number of key-value operations (inserts, deletes or 
   398  ** updates) that have been performed on the target database since the
   399  ** current RBU update was started.
   400  */
   401  sqlite3_int64 sqlite3rbu_progress(sqlite3rbu *pRbu);
   402  
403 /* 404 ** Obtain permyriadage (permyriadage is to 10000 as percentage is to 100) 405 ** progress indications for the two stages of an RBU update. This API may 406 ** be useful for driving GUI progress indicators and similar. 407 ** 408 ** An RBU update is divided into two stages: 409 ** 410 ** * Stage 1, in which changes are accumulated in an oal/wal file, and 411 ** * Stage 2, in which the contents of the wal file are copied into the 412 ** main database. 413 ** 414 ** The update is visible to non-RBU clients during stage 2. During stage 1 415 ** non-RBU reader clients may see the original database. 416 ** 417 ** If this API is called during stage 2 of the update, output variable 418 ** (*pnOne) is set to 10000 to indicate that stage 1 has finished and (*pnTwo) 419 ** to a value between 0 and 10000 to indicate the permyriadage progress of 420 ** stage 2. A value of 5000 indicates that stage 2 is half finished, 421 ** 9000 indicates that it is 90% finished, and so on. 422 ** 423 ** If this API is called during stage 1 of the update, output variable 424 ** (*pnTwo) is set to 0 to indicate that stage 2 has not yet started. The 425 ** value to which (*pnOne) is set depends on whether or not the RBU 426 ** database contains an "rbu_count" table. The rbu_count table, if it 427 ** exists, must contain the same columns as the following: 428 ** 429 ** CREATE TABLE rbu_count(tbl TEXT PRIMARY KEY, cnt INTEGER) WITHOUT ROWID; 430 ** 431 ** There must be one row in the table for each source (data_xxx) table within 432 ** the RBU database. The 'tbl' column should contain the name of the source 433 ** table. The 'cnt' column should contain the number of rows within the 434 ** source table. 435 ** 436 ** If the rbu_count table is present and populated correctly and this 437 ** API is called during stage 1, the *pnOne output variable is set to the 438 ** permyriadage progress of the same stage. If the rbu_count table does 439 ** not exist, then (*pnOne) is set to -1 during stage 1. If the rbu_count 440 ** table exists but is not correctly populated, the value of the *pnOne 441 ** output variable during stage 1 is undefined. 442 */ 443 void sqlite3rbu_bp_progress(sqlite3rbu *pRbu, int *pnOne, int *pnTwo);
444 445 /* 446 ** Create an RBU VFS named zName that accesses the underlying file-system 447 ** via existing VFS zParent. Or, if the zParent parameter is passed NULL, 448 ** then the new RBU VFS uses the default system VFS to access the file-system. 449 ** The new object is registered as a non-default VFS with SQLite before 450 ** returning. 451 ** 452 ** Part of the RBU implementation uses a custom VFS object. Usually, this 453 ** object is created and deleted automatically by RBU. 454 ** 455 ** The exception is for applications that also use zipvfs. In this case, 456 ** the custom VFS must be explicitly created by the user before the RBU 457 ** handle is opened. The RBU VFS should be installed so that the zipvfs 458 ** VFS uses the RBU VFS, which in turn uses any other VFS layers in use 459 ** (for example multiplexor) to access the file-system. For example, 460 ** to assemble an RBU enabled VFS stack that uses both zipvfs and 461 ** multiplexor (error checking omitted): 462 ** 463 ** // Create a VFS named "multiplex" (not the default). 464 ** sqlite3_multiplex_initialize(0, 0); 465 ** 466 ** // Create an rbu VFS named "rbu" that uses multiplexor. If the 467 ** // second argument were replaced with NULL, the "rbu" VFS would 468 ** // access the file-system via the system default VFS, bypassing the 469 ** // multiplexor. 470 ** sqlite3rbu_create_vfs("rbu", "multiplex"); 471 ** 472 ** // Create a zipvfs VFS named "zipvfs" that uses rbu. 473 ** zipvfs_create_vfs_v3("zipvfs", "rbu", 0, xCompressorAlgorithmDetector); 474 ** 475 ** // Make zipvfs the default VFS. 476 ** sqlite3_vfs_register(sqlite3_vfs_find("zipvfs"), 1); 477 ** 478 ** Because the default VFS created above includes a RBU functionality, it 479 ** may be used by RBU clients. Attempting to use RBU with a zipvfs VFS stack 480 ** that does not include the RBU layer results in an error. 481 ** 482 ** The overhead of adding the "rbu" VFS to the system is negligible for 483 ** non-RBU users. There is no harm in an application accessing the 484 ** file-system via "rbu" all the time, even if it only uses RBU functionality 485 ** occasionally. 486 */ 487 int sqlite3rbu_create_vfs(const char *zName, const char *zParent); 488 489 /* 490 ** Deregister and destroy an RBU vfs created by an earlier call to 491 ** sqlite3rbu_create_vfs(). 492 ** 493 ** VFS objects are not reference counted. If a VFS object is destroyed 494 ** before all database handles that use it have been closed, the results 495 ** are undefined. 496 */ 497 void sqlite3rbu_destroy_vfs(const char *zName); 498 499 #ifdef __cplusplus 500 } /* end of the 'extern "C"' block */ 501 #endif 502 503 #endif /* _SQLITE3RBU_H */