# # Run this Tcl script to generate the fileformat.html file. # set rcsid {$Id: fileformat.tcl,v 1.4 2000/08/04 13:49:03 drh Exp $} puts { The SQLite file format

The SQLite File Format

} puts "

(This page was last modified on [lrange $rcsid 3 4] GMT)

" puts {

SQLite stores each SQL table and index in a separate GDBM file. The name of the GDBM file used to store a particular table is usually just the table name with ".tbl" appended. Consider an example:

} proc Code {body} { puts {

}
  regsub -all {&} [string trim $body] {\&} body
  regsub -all {>} $body {\>} body
  regsub -all {<} $body {\<} body
  regsub -all {\(\(\(} $body {} body
  regsub -all {\)\)\)} $body {} body
  puts $body
  puts {

} } Code { $ (((rm -rf ex1))) $ (((sqlite ex1))) Enter ".help" for instructions sqlite> (((create table tbl1(one varchar(10), two smallint);))) sqlite> (((create index idx1 on tbl1(one);))) sqlite> (((insert into tbl1 values('hello!',10);))) sqlite> (((.exit))) $ ls ex1 idx1.tbl sqlite_master.tbl tbl1.tbl $ } puts {

The example above creates a new SQL database with a single table named tbl1 and a single index named idx1. Three files were created for this database. tbl1.tbl stores all the data for the tbl1 table and idx1.tbl stores all the information needed by the index idx1. The remaining file sqlite_master.tbl holds the data for the special built-in table called sqlite_master. Every SQLite database has an sqlite_master table. This table contains the schema for the database. You can query the sqlite_master table using ordinary SQL commands, but you cannot write to the sqlite_master table.

The GDBM file used to store an SQL table is usually just the name of the table with .tbl appended. But there are exceptions. First, the name of the table is converted to all lower case letters before being used to construct the filename. This is because SQL table names are not case sensitive but Unix filenames are. Second, if the table name contains any characters other than alphanumerics and underscores, the exceptional characters are encoded as a single '+' sign. For example:

} Code { $ (((sqlite ex1))) sqlite> (((create table 'Strange Table Name!'(a int, b char(30));))) sqlite> .exit $ (((ls ex1))) idx1.tbl sqlite_master.tbl strange+table+name+.tbl tbl1.tbl $ } puts {

SQL Table File Format

Each record of a GDBM file contains a key and a data. Both key and data are arbitary bytes of any length. The information from an SQL table is mapped into a GDBM file as follows:

The GDBM key for each record of an SQL table file is a randomly chosen integer. The key size thus depends on the size of an integer on the host computer. (Typically this means "4 bytes".)

If the SQL table contains N columns, then the data entry for each record begins with N integers. Each integer is the offset in bytes from the beginning of the GDBM data to the start of the data for the corresponding column. If the column contains a NULL value, then its corresponding integer will be zero. All column data is stored as null-terminated ASCII text strings.

Consider a simple example:

} Code { $ (((rm -rf ex1))) $ (((sqlite ex1))) sqlite> (((create table t1(a int, b text, c text);))) sqlite> (((insert into t1 values(10,NULL,'hello!');))) sqlite> (((insert into t1 values(-11,'this is','a test');))) sqlite> (((.exit))) $ (((gdbmdump ex1/t1.tbl))) key : 6d1a6e03 m.n. data : 0c000000 10000000 18000000 2d313100 74686973 ............-11.this 20697300 61207465 737400 is.a test. key : 6d3f90e2 m?.. data : 0c000000 00000000 0f000000 31300068 656c6c6f ............10.hello 2100 !. $ } puts {

In the example above, we have created a new table named t1 that contains two records. The gdbmdump program is used to dump the contents of the t1 GDBM file in a human readable format. The source code to gdbmdump is included with the SQLite distribution. Just type "make gdbmdump" to build it.

We can see in the dump of t1 that each record is a separate GDBM entry with a 4-byte random key. The keys shown are for a single sample run. If you try this experiment yourself, you will probably get completely different keys.

Because the t1 table contains 3 columns, the data part of each record begins with 3 integers. In both records of the example, the first integer has the value 12 since the beginning of the data for the first column begins on the 13th byte of the record. You can see how each column's data is stored as a null-terminated string. For the second record, observe that the offset integer is zero for the second column. This indicates that the second column contains NULL data.

SQL Index File Format

Each SQL index is also represented using a single GDBM file. There is one entry in the GDBM file for each unique SQL key in the table that is being indexed. The GDBM key is an arbitrary length null-terminated string which is SQL key that is used by the index. The data is a list of integers that correspond to GDBM keys of entries in data table that have the corresponding SQL key. If the data record of the index is exactly 4 bytes in size, then the data represents a single integer key. If the data is greater than 4 bytes in size, then the first 4 bytes form an integer that tells us how many keys are in the data. The index data record is always sized to be a power of 2. Unused slots at the end of the index data record are filled with zero.

To illustrate, we will create an index on the example table shown above, and add a new entry to this table that has a duplicate SQL key.

} Code { $ (((sqlite ex1))) sqlite> (((create index i1 on t1(a);))) sqlite> (((insert into t1 values(10,'another','record');))) sqlite> (((.exit))) $ (((gdbmdump ex1/t1.tbl))) key : 223100ae "1.. data : 0c000000 10000000 18000000 2d313100 74686973 ............-11.this 20697300 61207465 737400 is.a test. key : a840e996 .@.. data : 0c000000 00000000 0f000000 31300068 656c6c6f ............10.hello 2100 !. key : c19e3119 ..1. data : 0c000000 0f000000 17000000 31300061 6e6f7468 ............10.anoth 65720072 65636f72 6400 er.record. $ } puts {

We added the new record to the t1 table because we wanted to have two records with the same value on column a since that column is used by the i1 index. You can see from the dump above that the new t1 record is assigned another random GDBM key.

Now let's look at a dump of the index file.

} Code { $ (((gdbmdump ex1/i1.tbl))) key : 313000 10. data : 02000000 45b4f724 6d3f90e2 00000000 ....E..$m?...... key : 2d313100 -11. data : 6d1a6e03 m.n. $ } puts {

The GDBM file for the index contains only two records because the t1 table contains only two distinct values for column a. You can see that the GDBM keys for each record are just the text values for a columns of table t1. The data for each record of the index is a list of integers where each integer is the GDBM key for an entry in the t1 table that has the corresponding value for the a column.

The index entry for -11 contains only a single entry and is 4 bytes in size. The index entry for 10 is 16 bytes in size but contains only 2 entries. The first integer is the number of entires. The two integer keys follow. The last 4 bytes are unused. } puts {

Back to the SQLite Home Page

}