#
# Run this Tcl script to generate the sqlite.html file.
#
set rcsid {$Id: arch.tcl,v 1.16 2004/10/10 17:24:54 drh Exp $}
source common.tcl
header {Architecture of SQLite}
puts {

The Architecture Of SQLite

Introduction

[Figure: Block Diagram Of SQLite]

This document describes the architecture of the SQLite library. The information here is useful to those who want to understand or modify the inner workings of SQLite.

A block diagram of SQLite shows its main components and how they interrelate. The text that follows gives a quick overview of each of these components.

This document describes SQLite version 3.0. Version 2.8 and earlier are similar, but the details differ.

Interface

Much of the public interface to the SQLite library is implemented by functions found in the main.c, legacy.c, and vdbeapi.c source files, though some routines are placed in other files so that they can access data structures with file scope. The sqlite3_get_table() routine is implemented in table.c, sqlite3_mprintf() is found in printf.c, and sqlite3_complete() is in tokenize.c. The Tcl interface is implemented by tclsqlite.c. More information on the C interface to SQLite is available separately.

To avoid name collisions with other software, all external symbols in the SQLite library begin with the prefix sqlite3. Those symbols that are intended for external use (in other words, those symbols which form the API for SQLite) begin with sqlite3_.

Tokenizer

When a string containing SQL statements is to be executed, the interface passes that string to the tokenizer. The job of the tokenizer is to break the original string up into tokens and pass those tokens one by one to the parser. The tokenizer is hand-coded in C in the file tokenize.c.

Note that in this design, the tokenizer calls the parser. People who are familiar with YACC and BISON may be used to doing things the other way around, with the parser calling the tokenizer. The author of SQLite has done it both ways and finds that things generally work out better when the tokenizer calls the parser. YACC has it backwards.
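
A minimal C sketch of this push-style arrangement follows. The token codes, the `Token` struct, `tokenize()`, and the `count_parser()` stand-in are all hypothetical. The real tokenize.c recognizes the full SQL lexicon and feeds a Lemon-generated parser rather than a counting callback.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical token codes; the real parser defines its own set. */
enum { TK_ID = 1, TK_INTEGER, TK_PUNCT, TK_EOF };

typedef struct Token {
  const char *z;   /* Start of the token text (not NUL-terminated) */
  int n;           /* Number of bytes in the token */
} Token;

/* The parser is a callback that receives one token per call. */
typedef void (*ParserFn)(void *pParser, int tokenType, Token tok);

/* Break zSql into tokens and push each one to the parser.
** Whitespace is skipped; a final TK_EOF tells the parser the input ended.
** Returns the number of tokens delivered, not counting TK_EOF. */
static int tokenize(const char *zSql, ParserFn xParse, void *pParser){
  int nToken = 0;
  const char *z = zSql;
  Token tok;
  while( *z ){
    int type;
    if( isspace((unsigned char)*z) ){ z++; continue; }
    tok.z = z;
    if( isalpha((unsigned char)*z) || *z=='_' ){
      while( isalnum((unsigned char)*z) || *z=='_' ) z++;
      type = TK_ID;
    }else if( isdigit((unsigned char)*z) ){
      while( isdigit((unsigned char)*z) ) z++;
      type = TK_INTEGER;
    }else{
      z++;                       /* Any other single character */
      type = TK_PUNCT;
    }
    tok.n = (int)(z - tok.z);
    xParse(pParser, type, tok);  /* The tokenizer calls the parser */
    nToken++;
  }
  tok.z = z;
  tok.n = 0;
  xParse(pParser, TK_EOF, tok);
  return nToken;
}

/* A stand-in "parser" that only tallies the tokens it receives. */
typedef struct Tally { int nId, nInt, nPunct, sawEof; } Tally;

static void count_parser(void *pParser, int tokenType, Token tok){
  Tally *p = (Tally*)pParser;
  (void)tok;
  switch( tokenType ){
    case TK_ID:      p->nId++;       break;
    case TK_INTEGER: p->nInt++;      break;
    case TK_PUNCT:   p->nPunct++;    break;
    case TK_EOF:     p->sawEof = 1;  break;
  }
}
```

Calling `tokenize("SELECT a, 42 FROM t;", count_parser, &tally)` delivers seven tokens and then `TK_EOF`. The parser sees only one token per call, so it must keep its own state between calls. That matches how a Lemon-generated parser is driven.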

Parser

The parser is the piece that assigns meaning to tokens based on their context. The parser for SQLite is generated using the Lemon LALR(1) parser generator. Lemon does the same job as YACC/BISON, but it uses a different input syntax which is less error-prone. Lemon also generates a parser which is reentrant and thread-safe. In addition, Lemon supports non-terminal destructors, so the generated parser does not leak memory when it encounters syntax errors. The source file that drives Lemon is parse.y.
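
The C sketch below shows what non-terminal destructors buy. It is not Lemon output: `ParseStack`, the symbol codes, and `stack_unwind_on_error()` are hypothetical. Each symbol on the parser stack may own heap memory. When a syntax error forces the parser to discard its stack, the destructor registered for each symbol releases that memory instead of leaking it. In Lemon, that registration is a `%destructor` declaration in the grammar file.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical grammar symbols that carry heap-allocated values. */
enum { SYM_EXPR = 0, SYM_IDLIST = 1, N_SYM = 2 };

typedef void (*Destructor)(void *);

/* Counts released values so the effect can be observed. */
static int nFreed = 0;
static void free_value(void *p){ free(p); nFreed++; }

/* One destructor per symbol kind, in the spirit of a Lemon %destructor. */
static const Destructor aDestructor[N_SYM] = { free_value, free_value };

typedef struct StackEntry { int sym; void *value; } StackEntry;

typedef struct ParseStack {
  StackEntry a[16];
  int n;              /* Number of entries in use */
} ParseStack;

static void stack_push(ParseStack *p, int sym, void *value){
  assert( p->n < 16 );
  p->a[p->n].sym = sym;
  p->a[p->n].value = value;
  p->n++;
}

/* On a syntax error, pop every entry and run its symbol's destructor
** so that partially built values are not leaked. */
static void stack_unwind_on_error(ParseStack *p){
  while( p->n > 0 ){
    StackEntry *e = &p->a[--p->n];
    if( aDestructor[e->sym] ) aDestructor[e->sym](e->value);
  }
}
```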

Because Lemon is not normally found on development machines, its complete source code (just one C file) is included in the "tool" subdirectory of the SQLite distribution. Documentation on Lemon is in the "doc" subdirectory of the distribution.

Code Generator

After the parser assembles tokens into complete SQL statements, it calls the code generator to produce virtual machine code that will do the work the SQL statements request. The code generator spans many files: attach.c, auth.c, build.c, delete.c, expr.c, insert.c, pragma.c, select.c, trigger.c, update.c, vacuum.c, and where.c. Most of the serious magic happens in these files. expr.c handles code generation for expressions. where.c handles code generation for WHERE clauses on SELECT, UPDATE, and DELETE statements. The files attach.c, delete.c, insert.c, select.c, trigger.c, update.c, and vacuum.c handle code generation for the SQL statements with the same names. (Each of these files calls routines in expr.c and where.c as necessary.) All other SQL statements are coded out of build.c. The auth.c file implements the functionality of sqlite3_set_authorizer().
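
The sketch below gives a rough idea of the kind of work expr.c does. It walks a hypothetical expression tree and emits instructions for a stack machine: operands first, then the operator. The `ExprNode` and `Program` types and the opcode names are simplified stand-ins assumed for this example, not SQLite's actual definitions.

```c
#include <assert.h>

/* Hypothetical, simplified opcodes for a stack machine. */
enum { OP_Integer, OP_Add, OP_Multiply };

typedef struct Op { int opcode; int p1, p2, p3; } Op;

/* A program under construction. */
typedef struct Program { Op aOp[32]; int nOp; } Program;

/* Hypothetical expression tree: an integer leaf or a binary operator. */
typedef struct ExprNode {
  int op;                          /* '+', '*', or 0 for an integer leaf */
  int iValue;                      /* Value when op==0 */
  struct ExprNode *pLeft, *pRight; /* Operands when op!=0 */
} ExprNode;

static void add_op(Program *v, int opcode, int p1, int p2, int p3){
  assert( v->nOp < 32 );
  v->aOp[v->nOp].opcode = opcode;
  v->aOp[v->nOp].p1 = p1;
  v->aOp[v->nOp].p2 = p2;
  v->aOp[v->nOp].p3 = p3;
  v->nOp++;
}

/* Emit code that leaves the value of p on top of the VM stack.
** A post-order walk: both operands are coded before the operator. */
static void expr_code(Program *v, const ExprNode *p){
  if( p->op==0 ){
    add_op(v, OP_Integer, p->iValue, 0, 0);
    return;
  }
  expr_code(v, p->pLeft);
  expr_code(v, p->pRight);
  add_op(v, p->op=='+' ? OP_Add : OP_Multiply, 0, 0, 0);
}
```

For `2 + 3*4`, `expr_code()` emits Integer 2, Integer 3, Integer 4, Multiply, Add. That is the order a stack machine needs to evaluate the expression.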

Virtual Machine

The program generated by the code generator is executed by the virtual machine. Additional information about the virtual machine is available separately. To summarize, the virtual machine implements an abstract computing engine specifically designed to manipulate database files. The machine has a stack which is used for intermediate storage. Each instruction contains an opcode and up to three additional operands.
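
A stack-based machine of this shape can be sketched in a few dozen lines of C. The opcodes and the `VmOp` struct are hypothetical and handle only integers. The real vdbe.c has a far larger instruction set, including instructions that manipulate database files.

```c
#include <assert.h>

/* Hypothetical opcodes; each instruction has an opcode and up to
** three operands, P1 through P3. */
enum { OP_Integer, OP_Add, OP_Multiply, OP_Halt };

typedef struct VmOp { int opcode; int p1, p2, p3; } VmOp;

/* Execute aOp[0..nOp-1] and return the value left on top of the stack.
** The stack holds intermediate results between instructions. */
static int vm_run(const VmOp *aOp, int nOp){
  int aStack[64];
  int nStack = 0;
  int pc;
  for(pc=0; pc<nOp; pc++){
    const VmOp *pOp = &aOp[pc];
    switch( pOp->opcode ){
      case OP_Integer:        /* Push the integer P1 */
        assert( nStack < 64 );
        aStack[nStack++] = pOp->p1;
        break;
      case OP_Add:            /* Pop two values, push their sum */
        assert( nStack >= 2 );
        aStack[nStack-2] += aStack[nStack-1];
        nStack--;
        break;
      case OP_Multiply:       /* Pop two values, push their product */
        assert( nStack >= 2 );
        aStack[nStack-2] *= aStack[nStack-1];
        nStack--;
        break;
      case OP_Halt:           /* Stop; the loop test ends execution */
        pc = nOp;
        break;
    }
  }
  assert( nStack > 0 );
  return aStack[nStack-1];
}
```

Running the program Integer 2, Integer 3, Integer 4, Multiply, Add, Halt leaves 14 on top of the stack.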

The virtual machine itself is entirely contained in a single source file, vdbe.c. The virtual machine also has its own header files: vdbe.h, which defines an interface between the virtual machine and the rest of the SQLite library, and vdbeInt.h, which defines structures private to the virtual machine. The vdbeaux.c file contains utilities used by the virtual machine and interface modules used by the rest of the library to construct VM programs. The vdbeapi.c file contains external interfaces to the virtual machine, such as the sqlite3_bind_... family of functions. Individual values (strings, integers, floating-point numbers, and BLOBs) are stored in an internal object named "Mem", which is implemented by vdbemem.c.
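
The idea behind Mem is a single object that can hold any SQL value. It can be sketched as a tagged structure. The `Value` type, the `VAL_*` tags, and the conversion rule in `value_as_int()` are assumptions for illustration. The real Mem object implemented in vdbemem.c has different fields and more elaborate conversion and memory-management rules.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type tags; the real Mem object uses its own flags. */
#define VAL_NULL 0
#define VAL_INT  1
#define VAL_REAL 2
#define VAL_TEXT 3
#define VAL_BLOB 4

/* A dynamically typed value: a tag plus storage for each kind. */
typedef struct Value {
  int type;       /* One of the VAL_* codes */
  long long i;    /* Used when type==VAL_INT */
  double r;       /* Used when type==VAL_REAL */
  char *z;        /* Private copy of TEXT or BLOB bytes, else NULL */
  int n;          /* Bytes in z, excluding the added terminator */
} Value;

/* Free any owned bytes and reset the value to NULL. */
static void value_release(Value *p){
  free(p->z);
  p->z = 0;
  p->n = 0;
  p->type = VAL_NULL;
}

static void value_set_int(Value *p, long long v){
  value_release(p);
  p->type = VAL_INT;
  p->i = v;
}

static void value_set_real(Value *p, double r){
  value_release(p);
  p->type = VAL_REAL;
  p->r = r;
}

/* Store a private copy of n bytes as TEXT or BLOB. Returns 0 on success. */
static int value_set_bytes(Value *p, const void *pData, int n, int type){
  assert( type==VAL_TEXT || type==VAL_BLOB );
  value_release(p);
  p->z = malloc((size_t)n + 1);
  if( p->z==0 ) return 1;
  memcpy(p->z, pData, (size_t)n);
  p->z[n] = 0;      /* Terminator so TEXT can be parsed as a C string */
  p->n = n;
  p->type = type;
  return 0;
}

/* Coerce to an integer under an assumed rule: reals truncate toward
** zero, TEXT is parsed as a decimal number, NULL and BLOB become 0. */
static long long value_as_int(const Value *p){
  switch( p->type ){
    case VAL_INT:  return p->i;
    case VAL_REAL: return (long long)p->r;
    case VAL_TEXT: return strtoll(p->z, 0, 10);
    default:       return 0;
  }
}
```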

SQLite implements SQL functions using callbacks to C-language routines. Even the built-in SQL functions are implemented this way. Most of the built-in SQL functions (for example coalesce(), count(), and substr()) can be found in func.c. Date and time conversion functions are found in date.c.
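
The callback design can be sketched as a table that maps SQL function names to C routines. Built-ins are registered the same way user functions would be. `FuncEntry`, `find_function()`, and the integer-only callback signature are hypothetical simplifications. Real SQL functions receive dynamically typed arguments, and applications add their own functions through the public `sqlite3_create_function()` interface.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback signature: integer arguments in, integer out. */
typedef int (*SqlFunc)(int argc, const int *argv);

typedef struct FuncEntry {
  const char *zName;   /* Name used in SQL text */
  int nArg;            /* Required argument count, or -1 for any count */
  SqlFunc xFunc;       /* C routine that implements the function */
} FuncEntry;

/* abs(X): absolute value of its single argument. */
static int absFunc(int argc, const int *argv){
  (void)argc;
  return argv[0]<0 ? -argv[0] : argv[0];
}

/* max(X,...): largest of one or more arguments; 0 if called with none. */
static int maxFunc(int argc, const int *argv){
  int i, m;
  if( argc<1 ) return 0;
  m = argv[0];
  for(i=1; i<argc; i++){
    if( argv[i]>m ) m = argv[i];
  }
  return m;
}

/* Built-in functions live in the same kind of table as user functions. */
static const FuncEntry aFunc[] = {
  { "abs",  1, absFunc },
  { "max", -1, maxFunc },
};

/* Look up the callback for a call such as max(3,9,4). */
static const FuncEntry *find_function(const char *zName, int nArg){
  size_t i;
  for(i=0; i<sizeof(aFunc)/sizeof(aFunc[0]); i++){
    const FuncEntry *p = &aFunc[i];
    if( strcmp(p->zName, zName)==0 && (p->nArg<0 || p->nArg==nArg) ){
      return p;
    }
  }
  return 0;
}
```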

B-Tree

An SQLite database is maintained on disk using a B-tree implementation found in the btree.c source file. A separate B-tree is used for each table and index in the database. All B-trees are stored in the same disk file. Details of the file format are recorded in a large comment at the beginning of btree.c.

The interface to the B-tree subsystem is defined by the header file btree.h.
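
The core B-tree lookup descends from the root to find a key. It is sketched below on an in-memory tree. `BtNode`, `MAX_KEYS`, and `btree_search()` are hypothetical. In btree.c each node is a page of the database file and children are referenced by page number. Insertion, deletion, and rebalancing are omitted here.

```c
#include <assert.h>

#define MAX_KEYS 4

/* Hypothetical in-memory B-tree node. */
typedef struct BtNode {
  int nKey;                             /* Number of keys in this node */
  int aKey[MAX_KEYS];                   /* Keys in ascending order */
  struct BtNode *apChild[MAX_KEYS+1];   /* Subtrees; all NULL in a leaf */
} BtNode;

/* Return 1 if key is present in the tree rooted at pNode, else 0.
** At each node a binary search either finds the key or yields the
** index of the only subtree that could contain it. */
static int btree_search(const BtNode *pNode, int key){
  while( pNode ){
    int lo = 0, hi = pNode->nKey;       /* Search aKey[lo..hi-1] */
    while( lo<hi ){
      int mid = (lo + hi)/2;
      if( pNode->aKey[mid]==key ) return 1;
      if( pNode->aKey[mid]<key ) lo = mid + 1; else hi = mid;
    }
    pNode = pNode->apChild[lo];         /* lo keys are less than key */
  }
  return 0;
}
```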

Page Cache

The B-tree module requests information from the disk in fixed-size chunks called pages. The default page size is 1024 bytes, but it can vary between 512 and 65536 bytes. The page cache is responsible for reading, writing, and caching these pages. The page cache also provides the rollback and atomic commit abstraction and takes care of locking the database file. The B-tree driver requests particular pages from the page cache and notifies it when it wants to modify pages or to commit or roll back changes. The page cache handles all the messy details of making sure those requests are carried out quickly, safely, and efficiently.

The code to implement the page cache is contained in the single C source file pager.c. The interface to the page cache subsystem is defined by the header file pager.h.
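
The rollback abstraction can be illustrated with a toy pager. Before a page is modified for the first time in a transaction, its original image is saved to a journal, and rollback copies those images back. `Pager`, `pager_write()`, and the tiny page size are assumptions for this sketch. It keeps everything in memory and omits the parts that make pager.c hard, such as writing and syncing a journal file and locking.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 16   /* Tiny for illustration only */
#define N_PAGE    4

/* Hypothetical pager over an in-memory "database file". */
typedef struct Pager {
  unsigned char aDisk[N_PAGE][PAGE_SIZE];     /* The database file */
  unsigned char aJournal[N_PAGE][PAGE_SIZE];  /* Original page images */
  int aJournaled[N_PAGE];                     /* True if page is journaled */
  int inTrans;                                /* True inside a transaction */
} Pager;

static void pager_begin(Pager *p){
  memset(p->aJournaled, 0, sizeof(p->aJournaled));
  p->inTrans = 1;
}

/* Return a writable pointer to page pgno (numbered from 1).
** The first write to a page in a transaction journals its original
** content so that it can be restored later. */
static unsigned char *pager_write(Pager *p, int pgno){
  int i = pgno - 1;
  assert( p->inTrans && i>=0 && i<N_PAGE );
  if( !p->aJournaled[i] ){
    memcpy(p->aJournal[i], p->aDisk[i], PAGE_SIZE);
    p->aJournaled[i] = 1;
  }
  return p->aDisk[i];
}

/* Commit: keep the new content and discard the journal. */
static void pager_commit(Pager *p){
  memset(p->aJournaled, 0, sizeof(p->aJournaled));
  p->inTrans = 0;
}

/* Rollback: copy every journaled image back over its page. */
static void pager_rollback(Pager *p){
  int i;
  for(i=0; i<N_PAGE; i++){
    if( p->aJournaled[i] ) memcpy(p->aDisk[i], p->aJournal[i], PAGE_SIZE);
  }
  memset(p->aJournaled, 0, sizeof(p->aJournaled));
  p->inTrans = 0;
}
```

Only the first write to a page in a transaction is journaled. Later writes to the same page overwrite data that can already be restored.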

OS Interface

In order to provide portability between POSIX and Win32 operating systems, SQLite uses an abstraction layer to interface with the operating system. The interface to the OS abstraction layer is defined in os.h. Each supported operating system has its own implementation: os_unix.c for Unix, os_win.c for Windows, and so forth. Each of these operating-system-specific implementations typically has its own header file: os_unix.h, os_win.h, etc.
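
One way to express such a layer in C is a table of function pointers that portable code calls through. This is an assumption made for illustration. The `OsMethods` and `OsFile` types and the in-memory backend below are hypothetical and do not describe how os.h selects an implementation.

```c
#include <assert.h>
#include <string.h>

typedef struct OsFile OsFile;

/* Hypothetical OS abstraction: portable code calls only through this
** table, and each platform supplies its own implementation. */
typedef struct OsMethods {
  int (*xRead)(OsFile*, void *pBuf, int amt, long offset);
  int (*xWrite)(OsFile*, const void *pBuf, int amt, long offset);
} OsMethods;

struct OsFile {
  const OsMethods *pMethods;  /* Backend that handles this file */
  unsigned char aData[256];   /* Storage for the in-memory backend */
};

/* An in-memory backend standing in for a platform implementation.
** Each returns 0 on success and 1 if the range is out of bounds. */
static int memRead(OsFile *f, void *pBuf, int amt, long offset){
  if( offset<0 || amt<0 || offset+amt > (long)sizeof(f->aData) ) return 1;
  memcpy(pBuf, f->aData + offset, (size_t)amt);
  return 0;
}
static int memWrite(OsFile *f, const void *pBuf, int amt, long offset){
  if( offset<0 || amt<0 || offset+amt > (long)sizeof(f->aData) ) return 1;
  memcpy(f->aData + offset, pBuf, (size_t)amt);
  return 0;
}
static const OsMethods memMethods = { memRead, memWrite };

/* Portable wrappers: callers never name a specific backend. */
static int osRead(OsFile *f, void *pBuf, int amt, long offset){
  return f->pMethods->xRead(f, pBuf, amt, offset);
}
static int osWrite(OsFile *f, const void *pBuf, int amt, long offset){
  return f->pMethods->xWrite(f, pBuf, amt, offset);
}
```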

Utilities

Memory allocation and caseless string comparison routines are located in util.c. Symbol tables used by the parser are maintained by hash tables found in hash.c. The utf.c source file contains Unicode conversion subroutines. SQLite has its own private implementation of printf() (with some extensions) in printf.c and its own random number generator in random.c.
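
A caseless comparison of the kind kept in util.c might look like the sketch below. The name `str_icmp()` is hypothetical, and this version folds only the ASCII letters A through Z.

```c
#include <assert.h>

/* Fold an ASCII upper-case letter to lower case; leave others alone. */
static int ascii_lower(int c){
  return (c>='A' && c<='Z') ? c + ('a' - 'A') : c;
}

/* Compare two NUL-terminated strings, ignoring ASCII case.
** Returns a negative, zero, or positive value, like strcmp(). */
static int str_icmp(const char *zLeft, const char *zRight){
  const unsigned char *a = (const unsigned char*)zLeft;
  const unsigned char *b = (const unsigned char*)zRight;
  while( *a && ascii_lower(*a)==ascii_lower(*b) ){
    a++;
    b++;
  }
  return ascii_lower(*a) - ascii_lower(*b);
}
```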

Test Code

If you count regression test scripts, more than half the total code base of SQLite is devoted to testing. There are many assert() statements in the main code files. In addition, the source files test1.c through test5.c, together with md5.c, implement extensions used for testing purposes only. The os_test.c backend interface is used to simulate power failures in order to verify the crash-recovery mechanism in the pager.
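
This sketch assumes one simple model of a power failure: writes stay volatile until they are synced. The `CrashFile` type and its functions are hypothetical and are not the os_test.c interface. A crash-recovery test built on this idea would interrupt a transaction with `cf_crash()`. It would then check that the pager's rollback restores a consistent database.

```c
#include <assert.h>
#include <string.h>

#define FILE_SIZE 64

/* Hypothetical crash-simulating file: writes land in a volatile cache
** and reach "disk" only when synced, so a simulated power failure
** loses every write made since the last sync. */
typedef struct CrashFile {
  unsigned char aDisk[FILE_SIZE];   /* Survives a crash */
  unsigned char aCache[FILE_SIZE];  /* What the program currently sees */
} CrashFile;

static void cf_open(CrashFile *f){
  memset(f->aDisk, 0, FILE_SIZE);
  memset(f->aCache, 0, FILE_SIZE);
}

/* Returns 0 on success, 1 if the range is out of bounds. */
static int cf_write(CrashFile *f, const void *p, int n, int offset){
  if( offset<0 || n<0 || offset+n>FILE_SIZE ) return 1;
  memcpy(f->aCache + offset, p, (size_t)n);
  return 0;
}

/* Make all writes so far durable. */
static void cf_sync(CrashFile *f){
  memcpy(f->aDisk, f->aCache, FILE_SIZE);
}

/* Power failure: unsynced writes vanish; only the disk image remains. */
static void cf_crash(CrashFile *f){
  memcpy(f->aCache, f->aDisk, FILE_SIZE);
}
```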

}
footer $rcsid