SQLite

Check-in [523d52dfa6]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment::-) (CVS 218)
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 523d52dfa6ae3028cbcc88d406501f3ebb6cbd2d
User & Date: drh 2001-05-21 13:45:10.000
Context
2001-05-24
21:06
Continued work on btree (CVS 219) (check-in: 18500cdcc1 user: drh tags: trunk)
2001-05-21
13:45
:-) (CVS 218) (check-in: 523d52dfa6 user: drh tags: trunk)
2001-05-15
00:39
:-) (CVS 217) (check-in: ee6760fb62 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to src/btree.c.
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59

60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76



77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105


106
107
108
109



110
111
112
113
114
115










116
117
118
119
120
121
122






123
124
125
126












127
128
129
130
131
132
133
134
135







136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
** Boston, MA  02111-1307, USA.
**
** Author contact information:
**   drh@hwaci.com
**   http://www.hwaci.com/drh/
**
*************************************************************************
** $Id: btree.c,v 1.5 2001/05/15 00:39:25 drh Exp $
*/
#include "sqliteInt.h"
#include "pager.h"
#include "btree.h"
#include <assert.h>

/*
** The maximum number of database entries that can be held in a single
** page of the database. 
*/
#define MX_CELL ((SQLITE_PAGE_SIZE-sizeof(PageHdr))/sizeof(Cell))

/*
** The maximum amount of data (in bytes) that can be stored locally for a
** database entry.  If the entry contains more data than this, the
** extra goes onto overflow pages.
*/
#define MX_LOCAL_PAYLOAD ((SQLITE_PAGE_SIZE-sizeof(PageHdr)-4*sizeof(Cell))/4)


/*
** The in-memory image of a disk page has the auxiliary information appended
** to the end.  EXTRA_SIZE is the number of bytes of space needed to hold
** that extra information.
*/
#define EXTRA_SIZE (sizeof(MemPage)-SQLITE_PAGE_SIZE)

/*
** Number of bytes on a single overflow page.
*/
#define OVERFLOW_SIZE (SQLITE_PAGE_SIZE-sizeof(Pgno))


/*
** Primitive data types.  u32 must be 4 bytes and u16 must be 2 bytes.

*/
typedef unsigned int u32;
typedef unsigned short int u16;

/*
** Forward declarations of structures used only in this file.
*/
typedef struct Page1Header Page1Header;
typedef struct MemPage MemPage;
typedef struct PageHdr PageHdr;
typedef struct Cell Cell;
typedef struct FreeBlk FreeBlk;
typedef struct OverflowPage OverflowPage;

/*
** All structures on a database page are aligned to 4-byte boundries.
** This routine rounds up a number of bytes to the next multiple of 4.



*/
#define ROUNDUP(X)  ((X+3) & ~3)


/*
** The first pages of the database file contains some additional
** information used for housekeeping and sanity checking.  Otherwise,
** the first page is just like any other.  The additional information
** found on the first page is described by the following structure.
*/
struct Page1Header {
  u32 magic1;       /* A magic number for sanity checking */
  u32 magic2;       /* A second magic number for sanity checking */
  Pgno firstList;   /* First free page in a list of all free pages */
};
#define MAGIC_1  0x7264dc61
#define MAGIC_2  0x54e55d9e


/*
** Each database page has a header as follows:
**
**      page1_header          Extra numbers found on page 1 only.
**      rightmost_pgno        Page number of the right-most child page
**      first_cell            Index into MemPage.aPage of first cell
**      first_free            Index of first free block
**
** MemPage.pStart always points to the rightmost_pgno.  First_free is
** 0 if there is no free space on this page.  Otherwise it points to


** an area like this:
**
**      nByte                 Number of free bytes in this block
**      next_free             Next free block or 0 if this is the end



*/
struct PageHdr {
  Pgno pgno;      /* Child page that comes after all cells on this page */
  u16 firstCell;  /* Index in MemPage.aPage[] of the first cell */
  u16 firstFree;  /* Index in MemPage.aPage[] of the first free block */
};










struct Cell {
  Pgno pgno;      /* Child page that comes before this cell */
  u16 nKey;       /* Number of bytes in the key */
  u16 iNext;      /* Index in MemPage.aPage[] of next cell in sorted order */
  u32 nData;      /* Number of bytes of data */
  char aData[4];  /* Key and data */
};






struct FreeBlk {
  u16 iSize;      /* Number of u32-sized slots in the block of free space */
  u16 iNext;      /* Index in MemPage.aPage[] of the next free block */
};












struct OverflowPage {
  Pgno next;
  char aData[SQLITE_PAGE_SIZE-sizeof(Pgno)];
};

/*
** For every page in the database file, an instance of the following structure
** is stored in memory.  The aPage[] array contains the data obtained from
** the disk.  The rest is auxiliary data that held in memory only.







*/
struct MemPage {
  char aPage[SQLITE_PAGE_SIZE];  /* Page data stored on disk */
  unsigned char isInit;          /* True if sequel is initialized */
  unsigned char validUp;         /* True if MemPage.up is valid */
  unsigned char validLeft;       /* True if MemPage.left is valid */
  unsigned char validRight;      /* True if MemPage.right is valid */
  Pgno up;                     /* The parent page.  0 means this is the root */
  Pgno left;                   /* Left sibling page.  0==none */
  Pgno right;                  /* Right sibling page.  0==none */
  int idxStart;                /* Index in aPage[] of real data */
  PageHdr *pStart;             /* Points to aPage[idxStart] */
  int nFree;                   /* Number of free bytes in aPage[] */
  int nCell;                   /* Number of entries on this page */
  u32 *aCell[MX_CELL];         /* All entires in sorted order */
}

/*
** Everything we need to know about an open database
*/
struct Btree {
  Pager *pPager;        /* The page cache */
  BtCursor *pCursor;    /* All open cursors */
  MemPage *page1;       /* First page of the database */
  int inTrans;          /* True if a transaction is current */
};
typedef Btree Bt;

/*
** A cursor is a pointer to a particular entry in the BTree.
** The entry is identified by its MemPage and the index in
** MemPage.aCell[] of the entry.







|

















|
|













<


>

















>
>
>


<















<









|
>
>
|

<
<
>
>
>






>
>
>
>
>
>
>
>
>
>







>
>
>
>
>
>




>
>
>
>
>
>
>
>
>
>
>
>








|
>
>
>
>
>
>
>



|



|
|
|
|
|
|
|
|







|

|







17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56

57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81

82
83
84
85
86
87
88
89
90
91
92
93
94
95
96

97
98
99
100
101
102
103
104
105
106
107
108
109
110


111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
** Boston, MA  02111-1307, USA.
**
** Author contact information:
**   drh@hwaci.com
**   http://www.hwaci.com/drh/
**
*************************************************************************
** $Id: btree.c,v 1.6 2001/05/21 13:45:10 drh Exp $
*/
#include "sqliteInt.h"
#include "pager.h"
#include "btree.h"
#include <assert.h>

/*
** The maximum number of database entries that can be held in a single
** page of the database. 
*/
#define MX_CELL ((SQLITE_PAGE_SIZE-sizeof(PageHdr))/sizeof(Cell))

/*
** The maximum amount of data (in bytes) that can be stored locally for a
** database entry.  If the entry contains more data than this, the
** extra goes onto overflow pages.
*/
#define MX_LOCAL_PAYLOAD \
  ((SQLITE_PAGE_SIZE-sizeof(PageHdr)-4*(sizeof(Cell)+sizeof(Pgno)))/4)

/*
** The in-memory image of a disk page has the auxiliary information appended
** to the end.  EXTRA_SIZE is the number of bytes of space needed to hold
** that extra information.
*/
#define EXTRA_SIZE (sizeof(MemPage)-SQLITE_PAGE_SIZE)

/*
** Number of bytes on a single overflow page.
*/
#define OVERFLOW_SIZE (SQLITE_PAGE_SIZE-sizeof(Pgno))


/*
** Primitive data types.  u32 must be 4 bytes and u16 must be 2 bytes.
** Change these typedefs when porting to new architectures.
*/
typedef unsigned int u32;
typedef unsigned short int u16;

/*
** Forward declarations of structures used only in this file.
*/
typedef struct Page1Header Page1Header;
typedef struct MemPage MemPage;
typedef struct PageHdr PageHdr;
typedef struct Cell Cell;
typedef struct FreeBlk FreeBlk;
typedef struct OverflowPage OverflowPage;

/*
** All structures on a database page are aligned to 4-byte boundries.
** This routine rounds up a number of bytes to the next multiple of 4.
**
** This might need to change for computer architectures that require
** and 8-byte alignment boundry for structures.
*/
#define ROUNDUP(X)  ((X+3) & ~3)


/*
** The first pages of the database file contains some additional
** information used for housekeeping and sanity checking.  Otherwise,
** the first page is just like any other.  The additional information
** found on the first page is described by the following structure.
*/
struct Page1Header {
  u32 magic1;       /* A magic number for sanity checking */
  u32 magic2;       /* A second magic number for sanity checking */
  Pgno firstList;   /* First free page in a list of all free pages */
};
#define MAGIC_1  0x7264dc61
#define MAGIC_2  0x54e55d9e


/*
** Each database page has a header as follows:
**
**      page1_header          Extra numbers found on page 1 only.
**      rightmost_pgno        Page number of the right-most child page
**      first_cell            Index into MemPage.aPage of first cell
**      first_free            Index of first free block
**
** MemPage.pStart always points to the rightmost_pgno.  First_free is
** 0 if there is no free space on this page.  Otherwise, first_free is
** the index in MemPage.aPage[] of a FreeBlk structure that describes
** the first block of free space.  All free space is defined by a linked
** list of FreeBlk structures.
**


** Data is stored in a linked list of Cell structures.  First_cell is
** the index into MemPage.aPage[] of the first cell on the page.  The
** Cells are kept in sorted order.
*/
struct PageHdr {
  Pgno pgno;      /* Child page that comes after all cells on this page */
  u16 firstCell;  /* Index in MemPage.aPage[] of the first cell */
  u16 firstFree;  /* Index in MemPage.aPage[] of the first free block */
};

/*
** Data on a database page is stored as a linked list of Cell structures.
** Both the key and the data are stored in aData[].  The key always comes
** first.  The aData[] field grows as necessary to hold the key and data,
** up to a maximum of MX_LOCAL_PAYLOAD bytes.  If the size of the key and
** data combined exceeds MX_LOCAL_PAYLOAD bytes, then the 4 bytes beginning
** at Cell.aData[MX_LOCAL_PAYLOAD] are the page number of the first overflow
** page.
*/
struct Cell {
  Pgno pgno;      /* Child page that comes before this cell */
  u16 nKey;       /* Number of bytes in the key */
  u16 iNext;      /* Index in MemPage.aPage[] of next cell in sorted order */
  u32 nData;      /* Number of bytes of data */
  char aData[4];  /* Key and data */
};

/*
** Free space on a page is remembered using a linked list of the FreeBlk
** structures.  Space on a database page is allocated in increments of
** at least 4 bytes and is always aligned to a 4-byte boundry.
*/
struct FreeBlk {
  u16 iSize;      /* Number of u32-sized slots in the block of free space */
  u16 iNext;      /* Index in MemPage.aPage[] of the next free block */
};

/*
** When the key and data for a single entry in the BTree will not fit in
** the MX_LOACAL_PAYLOAD bytes of space available on the database page,
** then all extra data is written to a linked list of overflow pages.
** Each overflow page is an instance of the following structure.
**
** Unused pages in the database are also represented by instances of
** the OverflowPage structure.  The Page1Header.freeList field is the
** page number of the first page in a linked list of unused database
** pages.
*/
struct OverflowPage {
  Pgno next;
  char aData[SQLITE_PAGE_SIZE-sizeof(Pgno)];
};

/*
** For every page in the database file, an instance of the following structure
** is stored in memory.  The aPage[] array contains the data obtained from
** the disk.  The rest is auxiliary data that held in memory only.  The
** auxiliary data is only valid for regular database pages - the auxiliary
** data is meaningless for overflow pages and pages on the freelist.
**
** Of particular interest in the auxiliary data is the aCell[] entry.  Each
** aCell[] entry is a pointer to a Cell structure in aPage[].  The cells
** put in this array so that they can be accessed in constant time, rather
** than in linear time which would be needed if we walked the linked list.
*/
struct MemPage {
  char aPage[SQLITE_PAGE_SIZE];  /* Page data stored on disk */
  unsigned char isInit;          /* True if auxiliary data is initialized */
  unsigned char validUp;         /* True if MemPage.up is valid */
  unsigned char validLeft;       /* True if MemPage.left is valid */
  unsigned char validRight;      /* True if MemPage.right is valid */
  Pgno up;                       /* The parent page. 0 means this is the root */
  Pgno left;                     /* Left sibling page.  0==none */
  Pgno right;                    /* Right sibling page.  0==none */
  int idxStart;                  /* Index in aPage[] of real data */
  PageHdr *pStart;               /* Points to aPage[idxStart] */
  int nFree;                     /* Number of free bytes in aPage[] */
  int nCell;                     /* Number of entries on this page */
  Cell *aCell[MX_CELL];          /* All data entires in sorted order */
}

/*
** Everything we need to know about an open database
*/
struct Btree {
  Pager *pPager;        /* The page cache */
  BtCursor *pCursor;    /* A list of all open cursors */
  MemPage *page1;       /* First page of the database */
  int inTrans;          /* True if a transaction is in progress */
};
typedef Btree Bt;

/*
** A cursor is a pointer to a particular entry in the BTree.
** The entry is identified by its MemPage and the index in
** MemPage.aCell[] of the entry.
209
210
211
212
213
214
215
216

217
218
219
220
221
222
223
}

/*
** Allocate space on a page.  The space needs to be at least
** nByte bytes in size.  (Actually, all allocations are rounded
** up to the next even multiple of 4.)  Return the index into
** pPage->aPage[] of the first byte of the new allocation.
** Or return 0 if there is not enough space.

**
** This routine will call defragmentPage if necessary to consolidate
** free space.  
*/
static int allocSpace(MemPage *pPage, int nByte){
  FreeBlk *p;
  u16 *pIdx;







|
>







248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
}

/*
** Allocate space on a page.  The space needs to be at least
** nByte bytes in size.  (Actually, all allocations are rounded
** up to the next even multiple of 4.)  Return the index into
** pPage->aPage[] of the first byte of the new allocation.
** Or return 0 if there is not enough free space on the page to
** satisfy the allocation request.
**
** This routine will call defragmentPage if necessary to consolidate
** free space.  
*/
static int allocSpace(MemPage *pPage, int nByte){
  FreeBlk *p;
  u16 *pIdx;
245
246
247
248
249
250
251
252



253
254
255
256
257
258
259
  pPage->nFree -= nByte;
  return start;
}

/*
** Return a section of the MemPage.aPage[] to the freelist.
** The first byte of the new free block is pPage->aPage[start]
** and there are a told of size bytes to be freed.



*/
static void freeSpace(MemPage *pPage, int start, int size){
  int end = start + size;
  u16 *pIdx, idx;
  FreeBlk *pFBlk;
  FreeBlk *pNew;
  FreeBlk *pNext;







|
>
>
>







285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
  pPage->nFree -= nByte;
  return start;
}

/*
** Return a section of the MemPage.aPage[] to the freelist.
** The first byte of the new free block is pPage->aPage[start]
** and the size of the block is "size".
**
** Most of the effort here is involved in coalesing adjacent
** free blocks into a single big free block.
*/
static void freeSpace(MemPage *pPage, int start, int size){
  int end = start + size;
  u16 *pIdx, idx;
  FreeBlk *pFBlk;
  FreeBlk *pNew;
  FreeBlk *pNext;
303
304
305
306
307
308
309

310
311
312
313
314
315
316
317

318
319
320
321
322
323
324
325
326
327
328
329
330




331
332
333
334
335
336
337
  pPage->isInit = 1;
  pPage->validUp = 1;
  pPage->up = pgnoParent;
  pPage->nCell = 0;
  idx = pPage->pStart->firstCell;
  while( idx!=0 ){
    if( idx>SQLITE_PAGE_SIZE-sizeof(Cell) ) goto page_format_error;

    pCell = (Cell*)&pPage->aPage[idx];
    pPage->aCell[pPage->nCell++] = pCell;
    idx = pCell->iNext;
  }
  pPage->nFree = 0;
  idx = pPage->pStart->firstFree;
  while( idx!=0 ){
    if( idx>SQLITE_PAGE_SIZE-sizeof(FreeBlk) ) goto page_format_error;

    pFBlk = (FreeBlk*)&pPage->aPage[idx];
    pPage->nFree += pFBlk->iSize;
    if( pFBlk->iNext <= idx ) goto page_format_error;
    idx = pFBlk->iNext;
  }
  return SQLITE_OK;

page_format_error:
  return SQLITE_CORRUPT;
}

/*
** Open a new database




*/
int sqliteBtreeOpen(const char *zFilename, int mode, Btree **ppBtree){
  Btree *pBt;

  pBt = sqliteMalloc( sizeof(*pBt) );
  if( pBt==0 ){
    **ppBtree = 0;







>








>












|
>
>
>
>







346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
  pPage->isInit = 1;
  pPage->validUp = 1;
  pPage->up = pgnoParent;
  pPage->nCell = 0;
  idx = pPage->pStart->firstCell;
  while( idx!=0 ){
    if( idx>SQLITE_PAGE_SIZE-sizeof(Cell) ) goto page_format_error;
    if( idx<pPage->idxStart + sizeof(PageHeader) ) goto page_format_error;
    pCell = (Cell*)&pPage->aPage[idx];
    pPage->aCell[pPage->nCell++] = pCell;
    idx = pCell->iNext;
  }
  pPage->nFree = 0;
  idx = pPage->pStart->firstFree;
  while( idx!=0 ){
    if( idx>SQLITE_PAGE_SIZE-sizeof(FreeBlk) ) goto page_format_error;
    if( idx<pPage->idxStart + sizeof(PageHeader) ) goto page_format_error;
    pFBlk = (FreeBlk*)&pPage->aPage[idx];
    pPage->nFree += pFBlk->iSize;
    if( pFBlk->iNext <= idx ) goto page_format_error;
    idx = pFBlk->iNext;
  }
  return SQLITE_OK;

page_format_error:
  return SQLITE_CORRUPT;
}

/*
** Open a new database.
**
** Actually, this routine just sets up the internal data structures
** for accessing the database.  We do not actually open the database
** file until the first page is loaded.
*/
int sqliteBtreeOpen(const char *zFilename, int mode, Btree **ppBtree){
  Btree *pBt;

  pBt = sqliteMalloc( sizeof(*pBt) );
  if( pBt==0 ){
    **ppBtree = 0;
357
358
359
360
361
362
363



































364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
  while( pBt->pCursor ){
    sqliteBtreeCloseCursor(pBt->pCursor);
  }
  sqlitepager_close(pBt->pPager);
  sqliteFree(pBt);
  return SQLITE_OK;
}




































/*
** Start a new transaction
*/
int sqliteBtreeBeginTrans(Btree *pBt){
  int rc;
  if( pBt->inTrans ) return SQLITE_ERROR;
  if( pBt->page1==0 ){
    rc = lockBtree(pBt);
    if( rc!=SQLITE_OK ) return rc;
  }
  rc = sqlitepager_write(pBt->page1);
  if( rc==SQLITE_OK ){
    pBt->inTrans = 1;
  }
  return rc;
}

/*
** Get a reference to page1 of the database file.  This will
** also acquire a readlock on that file.
*/
static int lockBtree(Btree *pBt){
  int rc;
  if( pBt->page1 ) return SQLITE_OK;
  rc = sqlitepager_get(pBt->pPager, 1, &pBt->page1);
  if( rc!=SQLITE_OK ) return rc;
  rc = initPage(pBt->page1, 1, 0);
  if( rc!=SQLITE_OK ){
    sqlitepager_unref(pBt->page1);
    pBt->page1 = 0;
    return rc;
  }
  /* Sanity checking on the database file format */
  return rc;
}

/*
** Remove the last reference to the database file.  This will
** remove the read lock.
*/
static void unlockBtree(Btree *pBt){
  if( pBt->pCursor==0 && pBt->page1!=0 ){
    sqlitepager_unref(pBt->page1);







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>


















<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<







406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465



















466
467
468
469
470
471
472
  while( pBt->pCursor ){
    sqliteBtreeCloseCursor(pBt->pCursor);
  }
  sqlitepager_close(pBt->pPager);
  sqliteFree(pBt);
  return SQLITE_OK;
}

/*
** Get a reference to page1 of the database file.  This will
** also acquire a readlock on that file.
**
** SQLITE_OK is returned on success.  If the file is not a
** well-formed database file, then SQLITE_CORRUPT is returned.
** SQLITE_BUSY is returned if the database is locked.  SQLITE_NOMEM
** is returned if we run out of memory.  SQLITE_PROTOCOL is returned
** if there is a locking protocol violation.
*/
static int lockBtree(Btree *pBt){
  int rc;
  if( pBt->page1 ) return SQLITE_OK;
  rc = sqlitepager_get(pBt->pPager, 1, &pBt->page1);
  if( rc!=SQLITE_OK ) return rc;
  rc = initPage(pBt->page1, 1, 0);
  if( rc!=SQLITE_OK ) goto lock_failed;

  /* Do some checking to help insure the file we opened really is
  ** a valid database file. 
  */
  if( sqlitepager_pagecount(pBt->pPager)>0 ){
    Page1Header *pP1 = (Page1Header*)pBt->page1;
    if( pP1->magic1!=MAGIC_1 || pP1->magic2!=MAGIC_2 ){
      rc = SQLITE_CORRUPT;
      goto lock_failed;
    }
  }
  return rc;

lock_failed:
  sqlitepager_unref(pBt->page1);
  pBt->page1 = 0;
}

/*
** Start a new transaction
*/
int sqliteBtreeBeginTrans(Btree *pBt){
  int rc;
  if( pBt->inTrans ) return SQLITE_ERROR;
  if( pBt->page1==0 ){
    rc = lockBtree(pBt);
    if( rc!=SQLITE_OK ) return rc;
  }
  rc = sqlitepager_write(pBt->page1);
  if( rc==SQLITE_OK ){
    pBt->inTrans = 1;
  }
  return rc;
}




















/*
** Remove the last reference to the database file.  This will
** remove the read lock.
*/
static void unlockBtree(Btree *pBt){
  if( pBt->pCursor==0 && pBt->page1!=0 ){
    sqlitepager_unref(pBt->page1);
Changes to src/pager.c.
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
**   http://www.hwaci.com/drh/
**
*************************************************************************
** This is the implementation of the page cache subsystem.
** 
** The page cache is used to access a database file.  The pager journals
** all writes in order to support rollback.  Locking is used to limit
** access to one or more reader or on writer.
**
** @(#) $Id: pager.c,v 1.5 2001/04/28 16:52:42 drh Exp $
*/
#include "sqliteInt.h"
#include "pager.h"
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <assert.h>







|

|







21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
**   http://www.hwaci.com/drh/
**
*************************************************************************
** This is the implementation of the page cache subsystem.
** 
** The page cache is used to access a database file.  The pager journals
** all writes in order to support rollback.  Locking is used to limit
** access to one or more reader or one writer.
**
** @(#) $Id: pager.c,v 1.6 2001/05/21 13:45:10 drh Exp $
*/
#include "sqliteInt.h"
#include "pager.h"
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <assert.h>
52
53
54
55
56
57
58
59
60
61
62
63



64
65
66
67
68
69
70
71
72
**                       file at the same time.
**
**   SQLITE_WRITELOCK    The page cache is writing the database.
**                       Access is exclusive.  No other processes or
**                       threads can be reading or writing while one
**                       process is writing.
**
** The page cache comes up in PCS_UNLOCK.  The first time a
** sqlite_page_get() occurs, the state transitions to PCS_READLOCK.
** After all pages have been released using sqlite_page_unref(),
** the state transitions back to PCS_UNLOCK.  The first time
** that sqlite_page_write() is called, the state transitions to



** PCS_WRITELOCK.  The sqlite_page_rollback() and sqlite_page_commit()
** functions transition the state back to PCS_READLOCK.
*/
#define SQLITE_UNLOCK      0
#define SQLITE_READLOCK    1
#define SQLITE_WRITELOCK   2


/*







|
|

|

>
>
>
|
|







52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
**                       file at the same time.
**
**   SQLITE_WRITELOCK    The page cache is writing the database.
**                       Access is exclusive.  No other processes or
**                       threads can be reading or writing while one
**                       process is writing.
**
** The page cache comes up in SQLITE_UNLOCK.  The first time a
** sqlite_page_get() occurs, the state transitions to SQLITE_READLOCK.
** After all pages have been released using sqlite_page_unref(),
** the state transitions back to SQLITE_UNLOCK.  The first time
** that sqlite_page_write() is called, the state transitions to
** SQLITE_WRITELOCK.  (Note that sqlite_page_write() can only be
** called on an outstanding page which means that the pager must
** be in SQLITE_READLOCK before it transitions to SQLITE_WRITELOCK.)
** The sqlite_page_rollback() and sqlite_page_commit() functions 
** transition the state from SQLITE_WRITELOCK back to SQLITE_READLOCK.
*/
#define SQLITE_UNLOCK      0
#define SQLITE_READLOCK    1
#define SQLITE_WRITELOCK   2


/*
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
*/
#define PGHDR_TO_DATA(P)  ((void*)(&(P)[1]))
#define DATA_TO_PGHDR(D)  (&((PgHdr*)(D))[-1])
#define PGHDR_TO_EXTRA(P) ((void*)&((char*)(&(P)[1]))[SQLITE_PAGE_SIZE])

/*
** How big to make the hash table used for locating in-memory pages
** by page number.
*/
#define N_PG_HASH 101

/*
** A open page cache is an instance of the following structure.
*/
struct Pager {







|







95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
*/
#define PGHDR_TO_DATA(P)  ((void*)(&(P)[1]))
#define DATA_TO_PGHDR(D)  (&((PgHdr*)(D))[-1])
#define PGHDR_TO_EXTRA(P) ((void*)&((char*)(&(P)[1]))[SQLITE_PAGE_SIZE])

/*
** How big to make the hash table used for locating in-memory pages
** by page number.  Knuth says this should be a prime number.
*/
#define N_PG_HASH 101

/*
** A open page cache is an instance of the following structure.
*/
struct Pager {
332
333
334
335
336
337
338
339
340

341
342
343
344
345
346
347
** Playback the journal and thus restore the database file to
** the state it was in before we started making changes.  
**
** The journal file format is as follows:  There is an initial
** file-type string for sanity checking.  Then there is a single
** Pgno number which is the number of pages in the database before
** changes were made.  The database is truncated to this size.
** Next come zero or more page records which each page record
** consists of a Pgno, SQLITE_PAGE_SIZE bytes of data.  

**
** For playback, the pages have to be read from the journal in
** reverse order and put back into the original database file.
**
** If the file opened as the journal file is not a well-formed
** journal file (as determined by looking at the magic number
** at the beginning) then this routine returns SQLITE_PROTOCOL.







|
|
>







335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
** Playback the journal and thus restore the database file to
** the state it was in before we started making changes.  
**
** The journal file format is as follows:  There is an initial
** file-type string for sanity checking.  Then there is a single
** Pgno number which is the number of pages in the database before
** changes were made.  The database is truncated to this size.
** Next come zero or more page records where each page record
** consists of a Pgno and SQLITE_PAGE_SIZE bytes of data.  See
** the PageRecord structure for details.
**
** For playback, the pages have to be read from the journal in
** reverse order and put back into the original database file.
**
** If the file opened as the journal file is not a well-formed
** journal file (as determined by looking at the magic number
** at the beginning) then this routine returns SQLITE_PROTOCOL.
563
564
565
566
567
568
569






570
571
572
573
574
575
576
}

/*
** Acquire a page.
**
** A read lock is obtained for the first page acquired.  The lock
** is dropped when the last page is released.  






**
** The acquisition might fail for several reasons.  In all cases,
** an appropriate error code is returned and *ppPage is set to NULL.
**
** See also sqlitepager_lookup().  Both this routine and _lookup() attempt
** to find a page in the in-memory cache first.  If the page is not already
** in cache, this routine goes to disk to read it in whereas _lookup()







>
>
>
>
>
>







567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
}

/*
** Acquire a page.
**
** A read lock is obtained for the first page acquired.  The lock
** is dropped when the last page is released.  
**
** A _get works for any page number greater than 0.  If the database
** file is smaller than the requested page, then no actual disk
** read occurs and the memory image of the page is initialized to
** all zeros.  The extra data appended to a page is always initialized
** to zeros the first time a page is loaded into memory.
**
** The acquisition might fail for several reasons.  In all cases,
** an appropriate error code is returned and *ppPage is set to NULL.
**
** See also sqlitepager_lookup().  Both this routine and _lookup() attempt
** to find a page in the in-memory cache first.  If the page is not already
** in cache, this routine goes to disk to read it in whereas _lookup()
723
724
725
726
727
728
729




730
731

732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
    h = pager_hash(pgno);
    pPg->pNextHash = pPager->aHash[h];
    pPager->aHash[h] = pPg;
    if( pPg->pNextHash ){
      assert( pPg->pNextHash->pPrevHash==0 );
      pPg->pNextHash->pPrevHash = pPg;
    }




    pager_seek(pPager->fd, (pgno-1)*SQLITE_PAGE_SIZE);
    pager_read(pPager->fd, PGHDR_TO_DATA(pPg), SQLITE_PAGE_SIZE);

    if( pPager->nExtra>0 ){
      memset(PGHDR_TO_EXTRA(pPg), 0, pPager->nExtra);
    }
  }else{
    /* The requested page is in the page cache. */
    pPager->nHit++;
    sqlitepager_ref(pPg);
  }
  *ppPage = PGHDR_TO_DATA(pPg);
  return SQLITE_OK;
}

/*
** Acquire a page if it is already in the in-memory cache.  Do
** not read the page from disk.  Return a pointer to the page,
** or 0 if the page is not in cache.
**
** See also sqlitepager_get().  The difference between this routine
** and sqlitepager_get() is that _get() will go to the disk and read
** in the page if the page is not already in cache.  This routine
** returns NULL if the page is not in cache and no disk I/O ever
** occurs.
*/
void *sqlitepager_lookup(Pager *pPager, Pgno pgno){
  PgHdr *pPg;

  /* Make sure we have not hit any critical errors.
  */ 
  if( pPager==0 || pgno==0 ){







>
>
>
>
|
|
>




















|
|







733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
    h = pager_hash(pgno);
    pPg->pNextHash = pPager->aHash[h];
    pPager->aHash[h] = pPg;
    if( pPg->pNextHash ){
      assert( pPg->pNextHash->pPrevHash==0 );
      pPg->pNextHash->pPrevHash = pPg;
    }
    if( pPager->dbSize<0 ) sqlitepager_pagecount(pPager);
    if( pPager->dbSize<pgno ){
      memset(PGHDR_TO_DATA(pPg), 0, SQLITE_PAGE_SIZE);
    }else{
      pager_seek(pPager->fd, (pgno-1)*SQLITE_PAGE_SIZE);
      pager_read(pPager->fd, PGHDR_TO_DATA(pPg), SQLITE_PAGE_SIZE);
    }
    if( pPager->nExtra>0 ){
      memset(PGHDR_TO_EXTRA(pPg), 0, pPager->nExtra);
    }
  }else{
    /* The requested page is in the page cache. */
    pPager->nHit++;
    sqlitepager_ref(pPg);
  }
  *ppPage = PGHDR_TO_DATA(pPg);
  return SQLITE_OK;
}

/*
** Acquire a page if it is already in the in-memory cache.  Do
** not read the page from disk.  Return a pointer to the page,
** or 0 if the page is not in cache.
**
** See also sqlitepager_get().  The difference between this routine
** and sqlitepager_get() is that _get() will go to the disk and read
** in the page if the page is not already in cache.  This routine
** returns NULL if the page is not in cache of if a disk I/O has ever
** happened.
*/
void *sqlitepager_lookup(Pager *pPager, Pgno pgno){
  PgHdr *pPg;

  /* Make sure we have not hit any critical errors.
  */ 
  if( pPager==0 || pgno==0 ){
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
** Mark a data page as writeable.  The page is written into the journal 
** if it is not there already.  This routine must be called before making
** changes to a page.
**
** The first time this routine is called, the pager creates a new
** journal and acquires a write lock on the database.  If the write
** lock could not be acquired, this routine returns SQLITE_BUSY.  The
** calling routine must check for that routine and be careful not to
** change any page data until this routine returns SQLITE_OK.
**
** If the journal file could not be written because the disk is full,
** then this routine returns SQLITE_FULL and does an immediate rollback.
** All subsequent write attempts also return SQLITE_FULL until there
** is a call to sqlitepager_commit() or sqlitepager_rollback() to
** reset.







|







835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
** Mark a data page as writeable.  The page is written into the journal 
** if it is not there already.  This routine must be called before making
** changes to a page.
**
** The first time this routine is called, the pager creates a new
** journal and acquires a write lock on the database.  If the write
** lock could not be acquired, this routine returns SQLITE_BUSY.  The
** calling routine must check for that return value and be careful not to
** change any page data until this routine returns SQLITE_OK.
**
** If the journal file could not be written because the disk is full,
** then this routine returns SQLITE_FULL and does an immediate rollback.
** All subsequent write attempts also return SQLITE_FULL until there
** is a call to sqlitepager_commit() or sqlitepager_rollback() to
** reset.
885
886
887
888
889
890
891



892
893
894
895
896
897
898
    if( rc!=SQLITE_OK ){
      sqlitepager_rollback(pPager);
      pPager->errMask |= PAGER_ERR_FULL;
      return rc;
    }
  }
  pPg->inJournal = 1;



  return rc;
}

/*
** Commit all changes to the database and release the write lock.
**
** If the commit fails for any reason, a rollback attempt is made







>
>
>







900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
    if( rc!=SQLITE_OK ){
      sqlitepager_rollback(pPager);
      pPager->errMask |= PAGER_ERR_FULL;
      return rc;
    }
  }
  pPg->inJournal = 1;
  if( pPager->dbSize<pPg->pgno ){
    pPager->dbSize = pPg->pgno;
  }
  return rc;
}

/*
** Commit all changes to the database and release the write lock.
**
** If the commit fails for any reason, a rollback attempt is made
Changes to test/pager.test.
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#   drh@hwaci.com
#   http://www.hwaci.com/drh/
#
#***********************************************************************
# This file implements regression tests for SQLite library.  The
# focus of this script is page cache subsystem.
#
# $Id: pager.test,v 1.3 2001/04/17 20:09:11 drh Exp $


set testdir [file dirname $argv0]
source $testdir/tester.tcl

if {$dbprefix!="mem:" && [info commands pager_open]!=""} {








|







19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
#   drh@hwaci.com
#   http://www.hwaci.com/drh/
#
#***********************************************************************
# This file implements regression tests for SQLite library.  The
# focus of this script is page cache subsystem.
#
# $Id: pager.test,v 1.4 2001/05/21 13:45:10 drh Exp $


set testdir [file dirname $argv0]
source $testdir/tester.tcl

if {$dbprefix!="mem:" && [info commands pager_open]!=""} {

71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
    set ::g1 [page_get $::p1 1]
  } msg]
  if {$v} {lappend v $msg}
  set v
} {0}
do_test pager-2.4 {
  pager_stats $::p1
} {ref 1 page 1 max 10 size -1 state 1 err 0 hit 0 miss 1 ovfl 0}
do_test pager-2.5 {
  pager_pagecount $::p1
} {0}
do_test pager-2.6 {
  pager_stats $::p1
} {ref 1 page 1 max 10 size 0 state 1 err 0 hit 0 miss 1 ovfl 0}
do_test pager-2.7 {







|







71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
    set ::g1 [page_get $::p1 1]
  } msg]
  if {$v} {lappend v $msg}
  set v
} {0}
do_test pager-2.4 {
  pager_stats $::p1
} {ref 1 page 1 max 10 size 0 state 1 err 0 hit 0 miss 1 ovfl 0}
do_test pager-2.5 {
  pager_pagecount $::p1
} {0}
do_test pager-2.6 {
  pager_stats $::p1
} {ref 1 page 1 max 10 size 0 state 1 err 0 hit 0 miss 1 ovfl 0}
do_test pager-2.7 {
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
  expr {$::g1!=0}
} {1}
do_test pager-2.12 {
  page_number $::g1
} {1}
do_test pager-2.13 {
  pager_stats $::p1
} {ref 1 page 1 max 10 size -1 state 1 err 0 hit 0 miss 2 ovfl 0}
do_test pager-2.14 {
  set v [catch {
    page_write $::g1 "Page-One"
  } msg]
  lappend v $msg
} {0 {}}
do_test pager-2.15 {
  pager_stats $::p1
} {ref 1 page 1 max 10 size 0 state 2 err 0 hit 0 miss 2 ovfl 0}
do_test pager-2.16 {
  page_read $::g1
} {Page-One}
do_test pager-2.17 {
  set v [catch {
    pager_commit $::p1
  } msg]







|








|







99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
  expr {$::g1!=0}
} {1}
do_test pager-2.12 {
  page_number $::g1
} {1}
do_test pager-2.13 {
  pager_stats $::p1
} {ref 1 page 1 max 10 size 0 state 1 err 0 hit 0 miss 2 ovfl 0}
do_test pager-2.14 {
  set v [catch {
    page_write $::g1 "Page-One"
  } msg]
  lappend v $msg
} {0 {}}
do_test pager-2.15 {
  pager_stats $::p1
} {ref 1 page 1 max 10 size 1 state 2 err 0 hit 0 miss 2 ovfl 0}
do_test pager-2.16 {
  page_read $::g1
} {Page-One}
do_test pager-2.17 {
  set v [catch {
    pager_commit $::p1
  } msg]
Changes to www/index.tcl.
1
2
3
4
5
6
7
8
9
10
11
#
# Run this TCL script to generate HTML for the index.html file.
#
set rcsid {$Id: index.tcl,v 1.36 2001/04/05 16:49:44 drh Exp $}

puts {<html>
<head><title>SQLite: An SQL Database Library Built Atop GDBM</title></head>
<body bgcolor=white>
<h1 align=center>SQLite: An SQL Database Library Built Atop
<a href="http://www.gnu.org/software/gdbm/gdbm.html">GDBM</a></h1>
<p align=center>}



|







1
2
3
4
5
6
7
8
9
10
11
#
# Run this TCL script to generate HTML for the index.html file.
#
set rcsid {$Id: index.tcl,v 1.37 2001/05/21 13:45:10 drh Exp $}

puts {<html>
<head><title>SQLite: An SQL Database Library Built Atop GDBM</title></head>
<body bgcolor=white>
<h1 align=center>SQLite: An SQL Database Library Built Atop
<a href="http://www.gnu.org/software/gdbm/gdbm.html">GDBM</a></h1>
<p align=center>}
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79

<ul>
<li><p>
  The LIKE operator is suppose to ignore case. 
  But it only ignores case for 7-bit Latin characters.
  The case of 8-bit iso8859 characters or UTF-8 characters is
  signification.  Hence, <b>'a'&nbsp;LIKE&nbsp;'A'</b> returns
  TRUE but <b>'&aelig;'&nbsp;LIKE&nbsp;'&AElig;'</b>" returns FALSE.
</p></li>
</ul>

<p><b>Important Note:</b>  Serious bugs have been found in versions
1.0.22 on Unix and 1.0.26 on Windows.  Users of these or earlier
versions of SQLite should upgrade.</p>








|







65
66
67
68
69
70
71
72
73
74
75
76
77
78
79

<ul>
<li><p>
  The LIKE operator is suppose to ignore case. 
  But it only ignores case for 7-bit Latin characters.
  The case of 8-bit iso8859 characters or UTF-8 characters is
  signification.  Hence, <b>'a'&nbsp;LIKE&nbsp;'A'</b> returns
  TRUE but <b>'&aelig;'&nbsp;LIKE&nbsp;'&AElig;'</b> returns FALSE.
</p></li>
</ul>

<p><b>Important Note:</b>  Serious bugs have been found in versions
1.0.22 on Unix and 1.0.26 on Windows.  Users of these or earlier
versions of SQLite should upgrade.</p>

143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
<p>Instructions for building SQLite for WindowsNT are
found <a href="crosscompile.html">here</a>.
}

puts {<h2>Command-line Usage Example</h2>

<p>Download the source archive and compile the <b>sqlite</b>
program as described above.  The type:</p>

<blockquote><pre>
bash$ sqlite ~/newdb              <i>Directory ~/newdb created automatically</i>
sqlite> create table t1(
   ...>    a int,
   ...>    b varchar(20)
   ...>    c text







|







143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
<p>Instructions for building SQLite for WindowsNT are
found <a href="crosscompile.html">here</a>.
}

puts {<h2>Command-line Usage Example</h2>

<p>Download the source archive and compile the <b>sqlite</b>
program as described above.  Then type:</p>

<blockquote><pre>
bash$ sqlite ~/newdb              <i>Directory ~/newdb created automatically</i>
sqlite> create table t1(
   ...>    a int,
   ...>    b varchar(20)
   ...>    c text
Changes to www/lang.tcl.
1
2
3
4
5
6
7
8
9
10
11
#
# Run this Tcl script to generate the sqlite.html file.
#
set rcsid {$Id: lang.tcl,v 1.7 2001/04/04 11:48:58 drh Exp $}

puts {<html>
<head>
  <title>Query Language Understood By SQLite</title>
</head>
<body bgcolor=white>
<h1 align=center>



|







1
2
3
4
5
6
7
8
9
10
11
#
# Run this Tcl script to generate the sqlite.html file.
#
set rcsid {$Id: lang.tcl,v 1.8 2001/05/21 13:45:10 drh Exp $}

puts {<html>
<head>
  <title>Query Language Understood By SQLite</title>
</head>
<body bgcolor=white>
<h1 align=center>
380
381
382
383
384
385
386
387

388
389





390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
puts {

<p>The LIKE operator does a wildcard comparision.  The operand
to the right contains the wildcards.}
puts "A percent symbol [Operator %] in the right operand
matches any sequence of zero or more characters on the left.
An underscore [Operator _] on the right
matches any single character on the left.  The LIKE operator is

not case sensitive and will match upper case characters on one
side against lower case characters on the other.</p>"





puts {

<p>The GLOB operator is similar to LIKE but uses the Unix
file globbing syntax for its wildcards.  Also, GLOB is case
sensitive, unlike LIKE.  Both GLOB and LIKE may be preceded by
the NOT keyword to invert the sense of the test.</p>

<p>A column name can be any of the names defined in the CREATE TABLE
statement or one of the following special identifiers: "<b>ROWID</b>",
"<b>OID</b>", or "<b>_ROWID_</b>".
These special identifiers all describe the
unique random integer key (the "row key") associated every every 
row of every table.
The special identifiers only refer to the row key if the CREATE TABLE
statement does not define a real column with the same name.  Row keys
act like read-only columns.  A row key can be used anywhere a regular
column can be used, except that you cannot change the value
of a row key in an UPDATE or INSERT statement.
"SELECT * ..." does not return the row key.</p>







|
>

|
>
>
>
>
>
|










|







380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
puts {

<p>The LIKE operator does a wildcard comparision.  The operand
to the right contains the wildcards.}
puts "A percent symbol [Operator %] in the right operand
matches any sequence of zero or more characters on the left.
An underscore [Operator _] on the right
matches any single character on the left."
puts {The LIKE operator is
not case sensitive and will match upper case characters on one
side against lower case characters on the other.
(A bug: SQLite only understands upper/lower case for 7-bit Latin
characters.  Hence the LIKE operator is case sensitive for
8-bit iso8859 characters or UTF-8 characters.  For example,
the expression <b>'a'&nbsp;LIKE&nbsp;'A'</b> is TRUE but
<b>'&aelig;'&nbsp;LIKE&nbsp;'&AElig;'</b> is FALSE.)
</p>

<p>The GLOB operator is similar to LIKE but uses the Unix
file globbing syntax for its wildcards.  Also, GLOB is case
sensitive, unlike LIKE.  Both GLOB and LIKE may be preceded by
the NOT keyword to invert the sense of the test.</p>

<p>A column name can be any of the names defined in the CREATE TABLE
statement or one of the following special identifiers: "<b>ROWID</b>",
"<b>OID</b>", or "<b>_ROWID_</b>".
These special identifiers all describe the
unique random integer key (the "row key") associated with every 
row of every table.
The special identifiers only refer to the row key if the CREATE TABLE
statement does not define a real column with the same name.  Row keys
act like read-only columns.  A row key can be used anywhere a regular
column can be used, except that you cannot change the value
of a row key in an UPDATE or INSERT statement.
"SELECT * ..." does not return the row key.</p>