/ Check-in [cbcaece7]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:A file format change for btree.c makes it between 10 and 20% faster. (CVS 1493)
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: cbcaece7f45a0bc994e6c54a996afa4e6529da6a
User & Date: drh 2004-05-29 21:46:49
Context
2004-05-30
01:38
Do not include the P3 parameter on OP_Integer opcodes if the integer will fit in 32 bits. The P3 conversion is slow. (CVS 1494) check-in: fcd84eba user: drh tags: trunk
2004-05-29
21:46
A file format change for btree.c makes it between 10 and 20% faster. (CVS 1493) check-in: cbcaece7 user: drh tags: trunk
11:24
Transform OP_HexBlob and OP_String8 to OP_Blob and OP_String the first time they are executed. (CVS 1492) check-in: 3225de89 user: danielk1977 tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Changes to src/btree.c.

     5      5   ** a legal notice, here is a blessing:
     6      6   **
     7      7   **    May you do good and not evil.
     8      8   **    May you find forgiveness for yourself and forgive others.
     9      9   **    May you share freely, never taking more than you give.
    10     10   **
    11     11   *************************************************************************
    12         -** $Id: btree.c,v 1.148 2004/05/29 10:23:19 danielk1977 Exp $
           12  +** $Id: btree.c,v 1.149 2004/05/29 21:46:49 drh Exp $
    13     13   **
    14     14   ** This file implements a external (disk-based) database using BTrees.
    15     15   ** For a detailed discussion of BTrees, refer to
    16     16   **
    17     17   **     Donald E. Knuth, THE ART OF COMPUTER PROGRAMMING, Volume 3:
    18     18   **     "Sorting And Searching", pages 473-480. Addison-Wesley
    19     19   **     Publishing Company, Reading, Massachusetts.
................................................................................
    91     91   ** the cell size drop below the min embedded payload fraction.
    92     92   **
    93     93   ** The min leaf payload fraction is like the min embedded payload fraction
    94     94   ** except that it applies to leaf nodes in a LEAFDATA tree.  The maximum
    95     95   ** payload fraction for a LEAFDATA tree is always 100% (or 255) and it
    96     96   ** not specified in the header.
    97     97   **
    98         -** Each btree page begins with a header described below.  Note that the
    99         -** header for page one begins at byte 100.  For all other btree pages, the
   100         -** header begins on byte zero.
           98  +** Each btree pages is divided into three sections:  The header, the
           99  +** cell pointer array, and the cell area area.  Page 1 also has a 100-byte
          100  +** file header that occurs before the page header.   The 100-byte file
          101  +** header occurs on page 1 only.
          102  +**
          103  +** The page headers looks like this:
   101    104   **
   102    105   **   OFFSET   SIZE     DESCRIPTION
   103    106   **      0       1      Flags. 1: intkey, 2: zerodata, 4: leafdata, 8: leaf
   104    107   **      1       2      byte offset to the first freeblock
   105         -**      3       2      byte offset to the first cell
   106         -**      5       1      number of fragmented free bytes
   107         -**      6       4      Right child (the Ptr(N+1) value).  Omitted if leaf
          108  +**      3       2      number of cells on this page
          109  +**      5       2      first byte past the cell array area
          110  +**      7       1      number of fragmented free bytes
          111  +**      8       4      Right child (the Ptr(N+1) value).  Omitted if leaf
   108    112   **
   109    113   ** The flags define the format of this btree page.  The leaf flag means that
   110    114   ** this page has no children.  The zerodata flag means that this page carries
   111    115   ** only keys and no data.  The intkey flag means that the key is a single
   112    116   ** variable length integer at the beginning of the payload.
   113    117   **
   114         -** A variable-length integer is 1 to 9 bytes where the lower 7 bits of each 
          118  +** The cell pointer array begins on the first byte after the page header.
          119  +** The cell pointer array contains zero or more 2-byte numbers which are
          120  +** offsets from the beginning of the page to the cell content in the cell
          121  +** content area.  The cell pointers occur in sorted order.  The system strives
          122  +** to keep free space after the last cell pointer so that new cells can
          123  +** be easily added without have to defragment the page.
          124  +**
          125  +** Cell content is stored at the very end of the page and grows toward the
          126  +** beginning of the page.
          127  +**
          128  +** Unused space within the cell content area is collected into a linked list of
          129  +** freeblocks.  Each freeblock is at least 4 bytes in size.  The byte offset
          130  +** to the first freeblock is given in the header.  Freeblocks occur in
          131  +** increasing order.  Because a freeblock must be at least 4 bytes in size,
          132  +** any group of 3 or fewer unused bytes in the cell content area cannot
          133  +** exist on the freeblock chain.  A group of 3 or fewer free bytes is called
          134  +** a fragment.  The total number of bytes in all fragments is recorded.
          135  +** in the page header at offset 7.
          136  +**
          137  +**    SIZE    DESCRIPTION
          138  +**      2     Byte offset of the next freeblock
          139  +**      2     Bytes in this freeblock
          140  +**
          141  +** Cells are of variable length.  Cells are stored in the cell content area at
          142  +** the end of the page.  Pointers to the cells are in the cell pointer array
          143  +** that immediately follows the page header.  Cells is not necessarily
          144  +** contiguous or in order, but cell pointers are contiguous and in order.
          145  +**
          146  +** Cell content makes use of variable length integers.  A variable
          147  +** length integer is 1 to 9 bytes where the lower 7 bits of each 
   115    148   ** byte are used.  The integer consists of all bytes that have bit 8 set and
   116    149   ** the first byte with bit 8 clear.  The most significant byte of the integer
   117    150   ** appears first.  A variable-length integer may not be more than 9 bytes long.
   118    151   ** As a special case, all 8 bytes of the 9th byte are used as data.  This
   119    152   ** allows a 64-bit integer to be encoded in 9 bytes.
   120    153   **
   121    154   **    0x00                      becomes  0x00000000
................................................................................
   125    158   **    0x80 0x7f                 becomes  0x0000007f
   126    159   **    0x8a 0x91 0xd1 0xac 0x78  becomes  0x12345678
   127    160   **    0x81 0x81 0x81 0x81 0x01  becomes  0x10204081
   128    161   **
   129    162   ** Variable length integers are used for rowids and to hold the number of
   130    163   ** bytes of key and data in a btree cell.
   131    164   **
   132         -** Unused space within a btree page is collected into a linked list of
   133         -** freeblocks.  Each freeblock is at least 4 bytes in size.  The byte offset
   134         -** to the first freeblock is given in the header.  Freeblocks occur in
   135         -** increasing order.  Because a freeblock is 4 bytes in size, the minimum
   136         -** size allocation on a btree page is 4 bytes.  Because a freeblock must be
   137         -** at least 4 bytes in size, any group of 3 or fewer unused bytes cannot
   138         -** exist on the freeblock chain.  A group of 3 or fewer free bytes is called
   139         -** a fragment.  The total number of bytes in all fragments is recorded.
   140         -** in the page header at offset 5.
          165  +** The content of a cell looks like this:
   141    166   **
   142    167   **    SIZE    DESCRIPTION
   143         -**      2     Byte offset of the next freeblock
   144         -**      2     Bytes in this freeblock
   145         -**
   146         -** Cells are of variable length.  The first cell begins on the byte defined
   147         -** in the page header.  Cells do not necessarily occur in order - they can
   148         -** skip around on the page.
   149         -**
   150         -**    SIZE    DESCRIPTION
   151         -**      2     Byte offset of the next cell.  0 if this is the last cell
   152    168   **      4     Page number of the left child. Omitted if leaf flag is set.
   153    169   **     var    Number of bytes of data. Omitted if the zerodata flag is set.
   154    170   **     var    Number of bytes of key. Or the key itself if intkey flag is set.
   155    171   **      *     Payload
   156    172   **      4     First page of the overflow chain.  Omitted if no overflow
   157    173   **
   158    174   ** Overflow pages form a linked list.  Each page except the last is completely
................................................................................
   176    192   #include "sqliteInt.h"
   177    193   #include "pager.h"
   178    194   #include "btree.h"
   179    195   #include <assert.h>
   180    196   
   181    197   
   182    198   /* Maximum page size.  The upper bound on this value is 65536 (a limit
   183         -** imposed by the 2-byte offset at the beginning of each cell.)  The
          199  +** imposed by the 2-byte size of cell array pointers.)  The
   184    200   ** maximum page size determines the amount of stack space allocated
   185    201   ** by many of the routines in this module.  On embedded architectures
   186    202   ** or any machine where memory and especially stack memory is limited,
   187    203   ** one may wish to chose a smaller value for the maximum page size.
   188    204   */
   189    205   #ifndef MX_PAGE_SIZE
   190    206   # define MX_PAGE_SIZE 1024
   191    207   #endif
   192    208   
   193    209   /* The following value is the maximum cell size assuming a maximum page
   194    210   ** size give above.
   195    211   */
   196         -#define MX_CELL_SIZE  (MX_PAGE_SIZE-6)
          212  +#define MX_CELL_SIZE  (MX_PAGE_SIZE-8)
   197    213   
   198    214   /* The maximum number of cells on a single page of the database.  This
   199    215   ** assumes a minimum cell size of 3 bytes.  Such small cells will be
   200    216   ** exceedingly rare, but they are possible.
   201    217   */
   202         -#define MX_CELL ((MX_PAGE_SIZE-6)/3)
          218  +#define MX_CELL ((MX_PAGE_SIZE-8)/3)
   203    219   
   204    220   /* Forward declarations */
   205    221   typedef struct MemPage MemPage;
   206    222   
   207    223   /*
   208    224   ** This is a magic string that appears at the beginning of every
   209    225   ** SQLite database in order to identify the file as a real database.
................................................................................
   226    242   **
   227    243   ** The pParent field points back to the parent page.  This allows us to
   228    244   ** walk up the BTree from any leaf to the root.  Care must be taken to
   229    245   ** unref() the parent page pointer when this page is no longer referenced.
   230    246   ** The pageDestructor() routine handles that chore.
   231    247   */
   232    248   struct MemPage {
   233         -  u8 isInit;                     /* True if previously initialized */
   234         -  u8 idxShift;                   /* True if Cell indices have changed */
   235         -  u8 isOverfull;                 /* Some aCell[] do not fit on page */
   236         -  u8 intKey;                     /* True if intkey flag is set */
   237         -  u8 leaf;                       /* True if leaf flag is set */
   238         -  u8 zeroData;                   /* True if table stores keys only */
   239         -  u8 leafData;                   /* True if tables stores data on leaves only */
   240         -  u8 hasData;                    /* True if this page stores data */
   241         -  u8 hdrOffset;                  /* 100 for page 1.  0 otherwise */
   242         -  u8 needRelink;                 /* True if cell not linked properly in aData */
   243         -  int idxParent;                 /* Index in pParent->aCell[] of this node */
   244         -  int nFree;                     /* Number of free bytes on the page */
   245         -  int nCell;                     /* Number of entries on this page */
   246         -  int nCellAlloc;                /* Number of slots allocated in aCell[] */
   247         -  unsigned char **aCell;         /* Pointer to start of each cell */
   248         -  struct Btree *pBt;             /* Pointer back to BTree structure */
   249         -
   250         -  /* When page content is move from one page to the other (by the movePage()
   251         -  ** subroutine) only the information about is moved.  The information below
   252         -  ** is fixed. */
   253         -  unsigned char *aData;          /* Pointer back to the start of the page */
   254         -  Pgno pgno;                     /* Page number for this page */
   255         -  MemPage *pParent;              /* The parent of this page.  NULL for root */
          249  +  u8 isInit;           /* True if previously initialized */
          250  +  u8 idxShift;         /* True if Cell indices have changed */
          251  +  u8 nOverflow;        /* Number of overflow cell bodies in aCell[] */
          252  +  u8 intKey;           /* True if intkey flag is set */
          253  +  u8 leaf;             /* True if leaf flag is set */
          254  +  u8 zeroData;         /* True if table stores keys only */
          255  +  u8 leafData;         /* True if tables stores data on leaves only */
          256  +  u8 hasData;          /* True if this page stores data */
          257  +  u8 hdrOffset;        /* 100 for page 1.  0 otherwise */
          258  +  u16 cellOffset;      /* Index in aData of first cell pointer */
          259  +  u16 idxParent;       /* Index in parent of this node */
          260  +  u16 nFree;           /* Number of free bytes on the page */
          261  +  u16 nCell;           /* Number of cells on this page, local and ovfl */
          262  +  struct _OvflCell {   /* Cells that will not fit on aData[] */
          263  +    u8 *pCell;           /* Pointers to the body of the overflow cell */
          264  +    u16 idx;             /* Insert this cell before idx-th non-overflow cell */
          265  +  } aOvfl[3];
          266  +  struct Btree *pBt;   /* Pointer back to BTree structure */
          267  +  u8 *aData;           /* Pointer back to the start of the page */
          268  +  Pgno pgno;           /* Page number for this page */
          269  +  MemPage *pParent;    /* The parent of this page.  NULL for root */
   256    270   };
   257    271   
   258    272   /*
   259    273   ** The in-memory image of a disk page has the auxiliary information appended
   260    274   ** to the end.  EXTRA_SIZE is the number of bytes of space needed to hold
   261    275   ** that extra information.
   262    276   */
................................................................................
   287    301   /*
   288    302   ** An instance of the following structure is used to hold information
   289    303   ** about a cell.  The parseCell() function fills in this structure
   290    304   ** based on information extract from the raw disk page.
   291    305   */
   292    306   typedef struct CellInfo CellInfo;
   293    307   struct CellInfo {
          308  +  u8 *pCell;     /* Pointer to the start of cell content */
   294    309     i64 nKey;      /* The key for INTKEY tables, or number of bytes in key */
   295    310     u32 nData;     /* Number of bytes of data */
   296    311     u16 nHeader;   /* Size of the cell header in bytes */
   297    312     u16 nLocal;    /* Amount of payload held locally */
   298    313     u16 iOverflow; /* Offset to overflow page number.  Zero if no overflow */
   299         -  u16 nSize;     /* Total size of the cell (on the main b-tree page) */
          314  +  u16 nSize;     /* Total size of the cell content (on the main b-tree page) */
   300    315   };
   301    316   
   302    317   /*
   303    318   ** A cursor is a pointer to a particular entry in the BTree.
   304    319   ** The entry is identified by its MemPage and the index in
   305    320   ** MemPage.aCell[] of the entry.
   306    321   */
................................................................................
   346    361   ** file.
   347    362   */
   348    363   #define getVarint    sqlite3GetVarint
   349    364   #define getVarint32  sqlite3GetVarint32
   350    365   #define putVarint    sqlite3PutVarint
   351    366   
   352    367   /*
   353         -** Parse a cell header and fill in the CellInfo structure.
          368  +** Return a pointer to the start of cell content for the given
          369  +** cell of a page.  This routine works only for pages that
          370  +** do not contain overflow cells.
   354    371   */
   355         -static void parseCell(
          372  +static u8 *findCell(MemPage *pPage, int iCell){
          373  +  u8 *data = pPage->aData;
          374  +  assert( iCell>=0 );
          375  +  assert( iCell<get2byte(&data[pPage->hdrOffset+3]) );
          376  +  return data + get2byte(&data[pPage->cellOffset+2*iCell]);
          377  +}
          378  +
          379  +/*
          380  +** This a more complex version of findCell() that works for
          381  +** pages that do contain overflow cells.  See insert
          382  +*/
          383  +static u8 *findOverflowCell(MemPage *pPage, int iCell){
          384  +  int i;
          385  +  for(i=pPage->nOverflow-1; i>=0; i--){
          386  +    if( pPage->aOvfl[i].idx<=iCell ){
          387  +      if( pPage->aOvfl[i].idx==iCell ){
          388  +        return pPage->aOvfl[i].pCell;
          389  +      }
          390  +      iCell--;
          391  +    }
          392  +  }
          393  +  return findCell(pPage, iCell);
          394  +}
          395  +
          396  +/*
          397  +** Parse a cell content block and fill in the CellInfo structure.  There
          398  +** are two versions of this function.  parseCell() takes a cell index
          399  +** as the second argument and parseCellPtr() takes a pointer to the
          400  +** body of the cell as its second argument.
          401  +*/
          402  +static void parseCellPtr(
   356    403     MemPage *pPage,         /* Page containing the cell */
   357         -  unsigned char *pCell,   /* Pointer to the first byte of the cell */
          404  +  u8 *pCell,              /* Pointer to the cell text. */
   358    405     CellInfo *pInfo         /* Fill in this structure */
   359    406   ){
   360    407     int n;
   361    408     int nPayload;
   362    409     Btree *pBt;
   363    410     int minLocal, maxLocal;
          411  +
          412  +  pInfo->pCell = pCell;
   364    413     assert( pPage->leaf==0 || pPage->leaf==1 );
   365         -  n = 6 - 4*pPage->leaf;
          414  +  n = 4 - 4*pPage->leaf;
   366    415     if( pPage->hasData ){
   367    416       n += getVarint32(&pCell[n], &pInfo->nData);
   368    417     }else{
   369    418       pInfo->nData = 0;
   370    419     }
   371    420     n += getVarint(&pCell[n], &pInfo->nKey);
   372    421     pInfo->nHeader = n;
................................................................................
   382    431       minLocal = pBt->minLocal;
   383    432       maxLocal = pBt->maxLocal;
   384    433     }
   385    434     if( nPayload<=maxLocal ){
   386    435       pInfo->nLocal = nPayload;
   387    436       pInfo->iOverflow = 0;
   388    437       pInfo->nSize = nPayload + n;
          438  +    if( pInfo->nSize<4 ){
          439  +      pInfo->nSize = 4;  /* Minimum cell size is 4 */
          440  +    }
   389    441     }else{
   390    442       int surplus = minLocal + (nPayload - minLocal)%(pBt->usableSize - 4);
   391    443       if( surplus <= maxLocal ){
   392    444         pInfo->nLocal = surplus;
   393    445       }else{
   394    446         pInfo->nLocal = minLocal;
   395    447       }
   396    448       pInfo->iOverflow = pInfo->nLocal + n;
   397    449       pInfo->nSize = pInfo->iOverflow + 4;
   398    450     }
   399    451   }
          452  +static void parseCell(
          453  +  MemPage *pPage,         /* Page containing the cell */
          454  +  int iCell,              /* The cell index.  First cell is 0 */
          455  +  CellInfo *pInfo         /* Fill in this structure */
          456  +){
          457  +  parseCellPtr(pPage, findCell(pPage, iCell), pInfo);
          458  +}
   400    459   
   401    460   /*
   402         -** Compute the total number of bytes that a Cell needs on the main
   403         -** database page.  The number returned includes the Cell header,
   404         -** local payload storage, and the pointer to overflow pages (if
   405         -** applicable).  Additional space allocated on overflow pages
   406         -** is NOT included in the value returned from this routine.
          461  +** Compute the total number of bytes that a Cell needs in the cell
          462  +** data area of the btree-page.  The return number includes the cell
          463  +** data header and the local payload, but not any overflow page or
          464  +** the space used by the cell pointer.
   407    465   */
   408         -static int cellSize(MemPage *pPage, unsigned char *pCell){
          466  +static int cellSize(MemPage *pPage, int iCell){
          467  +  CellInfo info;
          468  +  parseCell(pPage, iCell, &info);
          469  +  return info.nSize;
          470  +}
          471  +static int cellSizePtr(MemPage *pPage, u8 *pCell){
   409    472     CellInfo info;
   410         -
   411         -  parseCell(pPage, pCell, &info);
          473  +  parseCellPtr(pPage, pCell, &info);
   412    474     return info.nSize;
   413    475   }
   414    476   
   415    477   /*
   416    478   ** Do sanity checking on a page.  Throw an exception if anything is
   417    479   ** not right.
   418    480   **
................................................................................
   419    481   ** This routine is used for internal error checking only.  It is omitted
   420    482   ** from most builds.
   421    483   */
   422    484   #if defined(BTREE_DEBUG) && !defined(NDEBUG) && 0
   423    485   static void _pageIntegrity(MemPage *pPage){
   424    486     int usableSize;
   425    487     u8 *data;
   426         -  int i, idx, c, pc, hdr, nFree;
          488  +  int i, j, idx, c, pc, hdr, nFree;
          489  +  int cellOffset;
          490  +  int nCell, cellLimit;
   427    491     u8 used[MX_PAGE_SIZE];
   428    492   
   429    493     usableSize = pPage->pBt->usableSize;
   430    494     assert( pPage->aData==&((unsigned char*)pPage)[-pPage->pBt->pageSize] );
   431    495     hdr = pPage->hdrOffset;
   432    496     assert( hdr==(pPage->pgno==1 ? 100 : 0) );
   433    497     assert( pPage->pgno==sqlite3pager_pagenumber(pPage->aData) );
................................................................................
   435    499     if( pPage->isInit ){
   436    500       assert( pPage->leaf == ((c & PTF_LEAF)!=0) );
   437    501       assert( pPage->zeroData == ((c & PTF_ZERODATA)!=0) );
   438    502       assert( pPage->leafData == ((c & PTF_LEAFDATA)!=0) );
   439    503       assert( pPage->intKey == ((c & (PTF_INTKEY|PTF_LEAFDATA))!=0) );
   440    504       assert( pPage->hasData ==
   441    505                !(pPage->zeroData || (!pPage->leaf && pPage->leafData)) );
          506  +    assert( pPage->cellOffset==pPage->hdrOffset+12-4*pPage->leaf );
          507  +    assert( pPage->nCell = get2byte(&pPage->aData[hdr+3]) );
   442    508     }
   443    509     data = pPage->aData;
   444    510     memset(used, 0, usableSize);
   445    511     for(i=0; i<hdr+10-pPage->leaf*4; i++) used[i] = 1;
   446    512     nFree = 0;
   447    513     pc = get2byte(&data[hdr+1]);
   448    514     while( pc ){
................................................................................
   453    519       nFree += size;
   454    520       for(i=pc; i<pc+size; i++){
   455    521         assert( used[i]==0 );
   456    522         used[i] = 1;
   457    523       }
   458    524       pc = get2byte(&data[pc]);
   459    525     }
   460         -  assert( pPage->isInit==0 || pPage->nFree==nFree+data[hdr+5] );
   461    526     idx = 0;
   462         -  pc = get2byte(&data[hdr+3]);
   463         -  while( pc ){
          527  +  nCell = get2byte(&data[hdr+3]);
          528  +  cellLimit = get2byte(&data[hdr+5]);
          529  +  assert( pPage->isInit==0 
          530  +         || pPage->nFree==nFree+data[hdr+7]+cellLimit-(cellOffset+2*nCell) );
          531  +  cellOffset = pPage->cellOffset;
          532  +  for(i=0; i<nCell; i++){
   464    533       int size;
   465         -    assert( pPage->isInit==0 || idx<pPage->nCell );
          534  +    pc = get2byte(&data[cellOffset+2*i]);
   466    535       assert( pc>0 && pc<usableSize-4 );
   467         -    assert( pPage->isInit==0 || pPage->aCell[idx]==&data[pc] );
   468    536       size = cellSize(pPage, &data[pc]);
   469    537       assert( pc+size<=usableSize );
   470         -    for(i=pc; i<pc+size; i++){
   471         -      assert( used[i]==0 );
   472         -      used[i] = 1;
          538  +    for(j=pc; j<pc+size; j++){
          539  +      assert( used[j]==0 );
          540  +      used[j] = 1;
   473    541       }
   474         -    pc = get2byte(&data[pc]);
   475         -    idx++;
   476    542     }
   477         -  assert( idx==pPage->nCell );
          543  +  for(i=cellOffset+2*nCell; i<cellimit; i++){
          544  +    assert( used[i]==0 );
          545  +    used[i] = 1;
          546  +  }
   478    547     nFree = 0;
   479    548     for(i=0; i<usableSize; i++){
   480    549       assert( used[i]<=1 );
   481    550       if( used[i]==0 ) nFree++;
   482    551     }
   483         -  assert( nFree==data[hdr+5] );
          552  +  assert( nFree==data[hdr+7] );
   484    553   }
   485    554   #define pageIntegrity(X) _pageIntegrity(X)
   486    555   #else
   487    556   # define pageIntegrity(X)
   488    557   #endif
   489    558   
   490    559   /*
   491    560   ** Defragment the page given.  All Cells are moved to the
   492    561   ** beginning of the page and all free space is collected 
   493    562   ** into one big FreeBlk at the end of the page.
   494    563   */
   495    564   static void defragmentPage(MemPage *pPage){
   496         -  int pc, i, n, addr;
   497         -  int start, hdr, size;
   498         -  int leftover;
   499         -  unsigned char *oldPage;
   500         -  unsigned char newPage[MX_PAGE_SIZE];
          565  +  int i;                     /* Loop counter */
          566  +  int pc;                    /* Address of a i-th cell */
          567  +  int addr;                  /* Offset of first byte after cell pointer array */
          568  +  int hdr;                   /* Offset to the page header */
          569  +  int size;                  /* Size of a cell */
          570  +  int usableSize;            /* Number of usable bytes on a page */
          571  +  int cellOffset;            /* Offset to the cell pointer array */
          572  +  int brk;                   /* Offset to the cell content area */
          573  +  int nCell;                 /* Number of cells on the page */
          574  +  unsigned char *data;               /* The page data */
          575  +  unsigned char temp[MX_PAGE_SIZE];  /* Temp holding area for cell content */
   501    576   
   502    577     assert( sqlite3pager_iswriteable(pPage->aData) );
   503    578     assert( pPage->pBt!=0 );
   504    579     assert( pPage->pBt->usableSize <= MX_PAGE_SIZE );
   505         -  assert( !pPage->needRelink );
   506         -  assert( !pPage->isOverfull );
   507         -  oldPage = pPage->aData;
          580  +  assert( pPage->nOverflow==0 );
          581  +  data = pPage->aData;
   508    582     hdr = pPage->hdrOffset;
   509         -  addr = 3+hdr;
   510         -  n = 6+hdr;
   511         -  if( !pPage->leaf ){
   512         -    n += 4;
   513         -  }
   514         -  memcpy(&newPage[hdr], &oldPage[hdr], n-hdr);
   515         -  start = n;
   516         -  pc = get2byte(&oldPage[addr]);
   517         -  i = 0;
   518         -  while( pc>0 ){
   519         -    assert( n<pPage->pBt->usableSize );
   520         -    size = cellSize(pPage, &oldPage[pc]);
   521         -    memcpy(&newPage[n], &oldPage[pc], size);
   522         -    put2byte(&newPage[addr],n);
   523         -    assert( pPage->aCell[i]==&oldPage[pc] );
   524         -    pPage->aCell[i++] = &oldPage[n];
   525         -    addr = n;
   526         -    n += size;
   527         -    pc = get2byte(&oldPage[pc]);
   528         -  }
   529         -  assert( i==pPage->nCell );
   530         -  leftover = pPage->pBt->usableSize - n;
   531         -  assert( leftover>=0 );
   532         -  assert( pPage->nFree==leftover );
   533         -  if( leftover<4 ){
   534         -    oldPage[hdr+5] = leftover;
   535         -    leftover = 0;
   536         -    n = pPage->pBt->usableSize;
   537         -  }
   538         -  memcpy(&oldPage[hdr], &newPage[hdr], n-hdr);
   539         -  if( leftover==0 ){
   540         -    put2byte(&oldPage[hdr+1], 0);
   541         -  }else if( leftover>=4 ){
   542         -    put2byte(&oldPage[hdr+1], n);
   543         -    put2byte(&oldPage[n], 0);
   544         -    put2byte(&oldPage[n+2], leftover);
   545         -    memset(&oldPage[n+4], 0, leftover-4);
   546         -  }
   547         -  oldPage[hdr+5] = 0;
   548         -}
   549         -
   550         -/*
   551         -** Allocate nByte bytes of space on a page.  If nByte is less than
   552         -** 4 it is rounded up to 4.
          583  +  cellOffset = pPage->cellOffset;
          584  +  nCell = pPage->nCell;
          585  +  assert( nCell==get2byte(&data[hdr+3]) );
          586  +  usableSize = pPage->pBt->usableSize;
          587  +  brk = get2byte(&data[hdr+5]);
          588  +  memcpy(&temp[brk], &data[brk], usableSize - brk);
          589  +  brk = usableSize;
          590  +  for(i=0; i<nCell; i++){
          591  +    u8 *pAddr;     /* The i-th cell pointer */
          592  +    pAddr = &data[cellOffset + i*2];
          593  +    pc = get2byte(pAddr);
          594  +    assert( pc<pPage->pBt->usableSize );
          595  +    size = cellSizePtr(pPage, &temp[pc]);
          596  +    brk -= size;
          597  +    memcpy(&data[brk], &temp[pc], size);
          598  +    put2byte(pAddr, brk);
          599  +  }
          600  +  assert( brk>=cellOffset+2*nCell );
          601  +  put2byte(&data[hdr+5], brk);
          602  +  data[hdr+1] = 0;
          603  +  data[hdr+2] = 0;
          604  +  data[hdr+7] = 0;
          605  +  addr = cellOffset+2*nCell;
          606  +  memset(&data[addr], 0, brk-addr);
          607  +}
          608  +
          609  +/*
          610  +** Allocate nByte bytes of space on a page.
   553    611   **
   554    612   ** Return the index into pPage->aData[] of the first byte of
   555    613   ** the new allocation. Or return 0 if there is not enough free
   556    614   ** space on the page to satisfy the allocation request.
   557    615   **
   558    616   ** If the page contains nBytes of free space but does not contain
   559    617   ** nBytes of contiguous free space, then this routine automatically
   560    618   ** calls defragementPage() to consolidate all free space before 
   561    619   ** allocating the new chunk.
   562         -**
   563         -** Algorithm:  Carve a piece off of the first freeblock that is
   564         -** nByte in size or larger.
   565    620   */
   566    621   static int allocateSpace(MemPage *pPage, int nByte){
   567    622     int addr, pc, hdr;
   568    623     int size;
   569    624     int nFrag;
          625  +  int top;
          626  +  int nCell;
          627  +  int cellOffset;
   570    628     unsigned char *data;
   571         -#ifndef NDEBUG
   572         -  int cnt = 0;
   573         -#endif
   574         -
          629  +  
   575    630     data = pPage->aData;
   576    631     assert( sqlite3pager_iswriteable(data) );
   577    632     assert( pPage->pBt );
   578    633     if( nByte<4 ) nByte = 4;
   579         -  if( pPage->nFree<nByte || pPage->isOverfull ) return 0;
          634  +  if( pPage->nFree<nByte || pPage->nOverflow>0 ) return 0;
          635  +  pPage->nFree -= nByte;
   580    636     hdr = pPage->hdrOffset;
   581         -  nFrag = data[hdr+5];
   582         -  if( nFrag>=60 || nFrag>pPage->nFree-nByte ){
          637  +
          638  +  nFrag = data[hdr+7];
          639  +  if( nFrag<60 ){
          640  +    /* Search the freelist looking for a slot big enough to satisfy the
          641  +    ** space request. */
          642  +    addr = hdr+1;
          643  +    while( (pc = get2byte(&data[addr]))>0 ){
          644  +      size = get2byte(&data[pc+2]);
          645  +      if( size>=nByte ){
          646  +        if( size<nByte+4 ){
          647  +          memcpy(&data[addr], &data[pc], 2);
          648  +          data[hdr+7] = nFrag + size - nByte;
          649  +          return pc;
          650  +        }else{
          651  +          put2byte(&data[pc+2], size-nByte);
          652  +          return pc + size - nByte;
          653  +        }
          654  +      }
          655  +      addr = pc;
          656  +    }
          657  +  }
          658  +
          659  +  /* Allocate memory from the gap in between the cell pointer array
          660  +  ** and the cell content area.
          661  +  */
          662  +  top = get2byte(&data[hdr+5]);
          663  +  nCell = get2byte(&data[hdr+3]);
          664  +  cellOffset = pPage->cellOffset;
          665  +  if( nFrag>=60 || cellOffset + 2*nCell > top - nByte ){
   583    666       defragmentPage(pPage);
          667  +    top = get2byte(&data[hdr+5]);
   584    668     }
   585         -  addr = hdr+1;
   586         -  pc = get2byte(&data[addr]);
   587         -  assert( addr<pc );
   588         -  assert( pc<=pPage->pBt->usableSize-4 );
   589         -  while( (size = get2byte(&data[pc+2]))<nByte ){
   590         -    addr = pc;
   591         -    pc = get2byte(&data[addr]);
   592         -    assert( pc<=pPage->pBt->usableSize-4 );
   593         -    assert( pc>=addr+size+4 || pc==0 );
   594         -    if( pc==0 ){
   595         -      assert( (cnt++)==0 );
   596         -      defragmentPage(pPage);
   597         -      assert( data[hdr+5]==0 );
   598         -      addr = pPage->hdrOffset+1;
   599         -      pc = get2byte(&data[addr]);
   600         -    }
   601         -  }
   602         -  assert( pc>0 && size>=nByte );
   603         -  assert( pc+size<=pPage->pBt->usableSize );
   604         -  if( size>nByte+4 ){
   605         -    int newStart = pc+nByte;
   606         -    put2byte(&data[addr], newStart);
   607         -    put2byte(&data[newStart], get2byte(&data[pc]));
   608         -    put2byte(&data[newStart+2], size-nByte);
   609         -  }else{
   610         -    put2byte(&data[addr], get2byte(&data[pc]));
   611         -    data[hdr+5] += size-nByte;
   612         -  }
   613         -  pPage->nFree -= nByte;
   614         -  assert( pPage->nFree>=0 );
   615         -  return pc;
          669  +  top -= nByte;
          670  +  assert( cellOffset + 2*nCell <= top );
          671  +  put2byte(&data[hdr+5], top);
          672  +  return top;
   616    673   }
   617    674   
   618    675   /*
   619    676   ** Return a section of the pPage->aData to the freelist.
   620    677   ** The first byte of the new free block is pPage->aDisk[start]
   621    678   ** and the size of the block is "size" bytes.
   622    679   **
   623    680   ** Most of the effort here is involved in coalesing adjacent
   624    681   ** free blocks into a single big free block.
   625    682   */
   626    683   static void freeSpace(MemPage *pPage, int start, int size){
   627    684     int end = start + size;  /* End of the segment being freed */
   628         -  int addr, pbegin;
   629         -#ifndef NDEBUG
   630         -  int tsize = 0;          /* Total size of all freeblocks */
   631         -#endif
          685  +  int addr, pbegin, hdr;
   632    686     unsigned char *data = pPage->aData;
   633    687   
   634    688     assert( pPage->pBt!=0 );
   635    689     assert( sqlite3pager_iswriteable(data) );
   636    690     assert( start>=pPage->hdrOffset+6+(pPage->leaf?0:4) );
   637    691     assert( end<=pPage->pBt->usableSize );
   638    692     if( size<4 ) size = 4;
   639    693   
   640    694     /* Add the space back into the linked list of freeblocks */
   641         -  addr = pPage->hdrOffset + 1;
          695  +  hdr = pPage->hdrOffset;
          696  +  addr = hdr + 1;
   642    697     while( (pbegin = get2byte(&data[addr]))<start && pbegin>0 ){
   643    698       assert( pbegin<=pPage->pBt->usableSize-4 );
   644    699       assert( pbegin>addr );
   645    700       addr = pbegin;
   646    701     }
   647    702     assert( pbegin<=pPage->pBt->usableSize-4 );
   648    703     assert( pbegin>addr || pbegin==0 );
................................................................................
   652    707     pPage->nFree += size;
   653    708   
   654    709     /* Coalesce adjacent free blocks */
   655    710     addr = pPage->hdrOffset + 1;
   656    711     while( (pbegin = get2byte(&data[addr]))>0 ){
   657    712       int pnext, psize;
   658    713       assert( pbegin>addr );
   659         -    assert( pbegin<pPage->pBt->usableSize-4 );
          714  +    assert( pbegin<=pPage->pBt->usableSize-4 );
   660    715       pnext = get2byte(&data[pbegin]);
   661    716       psize = get2byte(&data[pbegin+2]);
   662    717       if( pbegin + psize + 3 >= pnext && pnext>0 ){
   663    718         int frag = pnext - (pbegin+psize);
   664         -      assert( frag<=data[pPage->hdrOffset+5] );
   665         -      data[pPage->hdrOffset+5] -= frag;
          719  +      assert( frag<=data[pPage->hdrOffset+7] );
          720  +      data[pPage->hdrOffset+7] -= frag;
   666    721         put2byte(&data[pbegin], get2byte(&data[pnext]));
   667    722         put2byte(&data[pbegin+2], pnext+get2byte(&data[pnext+2])-pbegin);
   668    723       }else{
   669         -      assert( (tsize += psize)>0 );
   670    724         addr = pbegin;
   671    725       }
   672    726     }
   673         -  assert( tsize+data[pPage->hdrOffset+5]==pPage->nFree );
   674         -}
   675    727   
   676         -/*
   677         -** Resize the aCell[] array of the given page so that it is able to
   678         -** hold at least nNewSz entries.
   679         -**
   680         -** Return SQLITE_OK or SQLITE_NOMEM.
   681         -*/
   682         -static int resizeCellArray(MemPage *pPage, int nNewSz){
   683         -  if( pPage->nCellAlloc<nNewSz ){
   684         -    int n = nNewSz*sizeof(pPage->aCell[0]);
   685         -    if( pPage->aCell==0 ){
   686         -      pPage->aCell = sqliteMallocRaw( n );
   687         -    }else{
   688         -      pPage->aCell = sqliteRealloc(pPage->aCell, n);
   689         -    }
   690         -    if( sqlite3_malloc_failed ) return SQLITE_NOMEM;
   691         -    pPage->nCellAlloc = nNewSz;
          728  +  /* If the cell content area begins with a freeblock, remove it. */
          729  +  if( data[hdr+1]==data[hdr+5] && data[hdr+2]==data[hdr+6] ){
          730  +    int top;
          731  +    pbegin = get2byte(&data[hdr+1]);
          732  +    memcpy(&data[hdr+1], &data[pbegin], 2);
          733  +    top = get2byte(&data[hdr+5]);
          734  +    put2byte(&data[hdr+5], top + get2byte(&data[pbegin+2]));
   692    735     }
   693         -  return SQLITE_OK;
   694    736   }
   695    737   
   696    738   /*
   697    739   ** Initialize the auxiliary information for a disk block.
   698    740   **
   699    741   ** The pParent parameter must be a pointer to the MemPage which
   700    742   ** is the parent of the page being initialized.  The root of a
................................................................................
   708    750   */
   709    751   static int initPage(
   710    752     MemPage *pPage,        /* The page to be initialized */
   711    753     MemPage *pParent       /* The parent.  Might be NULL */
   712    754   ){
   713    755     int c, pc, i, hdr;
   714    756     unsigned char *data;
   715         -  int usableSize;
   716         -  int nCell, nFree;
   717         -  u8 *aCell[MX_PAGE_SIZE/2];
   718         -
          757  +  int usableSize, cellOffset;
          758  +  int nFree;
          759  +  int top;
   719    760   
   720    761     assert( pPage->pBt!=0 );
   721    762     assert( pParent==0 || pParent->pBt==pPage->pBt );
   722    763     assert( pPage->pgno==sqlite3pager_pagenumber(pPage->aData) );
   723    764     assert( pPage->aData == &((unsigned char*)pPage)[-pPage->pBt->pageSize] );
   724    765     assert( pPage->pParent==0 || pPage->pParent==pParent );
   725    766     assert( pPage->pParent==pParent || !pPage->isInit );
   726    767     if( pPage->isInit ) return SQLITE_OK;
   727    768     if( pPage->pParent==0 && pParent!=0 ){
   728    769       pPage->pParent = pParent;
   729    770       sqlite3pager_ref(pParent->aData);
   730    771     }
   731         -  pPage->nCell = pPage->nCellAlloc = 0;
   732         -  assert( pPage->hdrOffset==(pPage->pgno==1 ? 100 : 0) );
   733    772     hdr = pPage->hdrOffset;
   734    773     data = pPage->aData;
   735    774     c = data[hdr];
          775  +  assert( pPage->hdrOffset==(pPage->pgno==1 ? 100 : 0) );
   736    776     pPage->intKey = (c & (PTF_INTKEY|PTF_LEAFDATA))!=0;
   737    777     pPage->zeroData = (c & PTF_ZERODATA)!=0;
   738    778     pPage->leafData = (c & PTF_LEAFDATA)!=0;
   739    779     pPage->leaf = (c & PTF_LEAF)!=0;
   740    780     pPage->hasData = !(pPage->zeroData || (!pPage->leaf && pPage->leafData));
   741         -  pPage->isOverfull = 0;
   742         -  pPage->needRelink = 0;
          781  +  pPage->nOverflow = 0;
   743    782     pPage->idxShift = 0;
   744    783     usableSize = pPage->pBt->usableSize;
   745         -
   746         -  /* Initialize the cell count and cell pointers */
   747         -  i = 0;
   748         -  pc = get2byte(&data[hdr+3]);
   749         -  nCell = 0;
   750         -  while( pc>0 ){
   751         -    if( pc>=usableSize ) return SQLITE_CORRUPT;
   752         -    if( nCell>sizeof(aCell)/sizeof(aCell[0]) ) return SQLITE_CORRUPT;
   753         -    aCell[nCell++] = &data[pc];
   754         -    pc = get2byte(&data[pc]);
   755         -  }
   756         -  if( resizeCellArray(pPage, nCell) ){
   757         -    return SQLITE_NOMEM;
   758         -  }
   759         -  pPage->nCell = nCell;
   760         -  memcpy(pPage->aCell, aCell, nCell*sizeof(aCell[0]));
          784  +  pPage->cellOffset = cellOffset = hdr + 12 - 4*pPage->leaf;
          785  +  top = get2byte(&data[hdr+5]);
          786  +  pPage->nCell = get2byte(&data[hdr+3]);
   761    787   
   762    788     /* Compute the total free space on the page */
   763    789     pc = get2byte(&data[hdr+1]);
   764         -  nFree = data[hdr+5];
          790  +  nFree = data[hdr+7] + top - (cellOffset + 2*pPage->nCell);
   765    791     i = 0;
   766    792     while( pc>0 ){
   767    793       int next, size;
   768    794       if( pc>=usableSize ) return SQLITE_CORRUPT;
   769    795       if( i++>MX_PAGE_SIZE ) return SQLITE_CORRUPT;
   770    796       next = get2byte(&data[pc]);
   771    797       size = get2byte(&data[pc+2]);
................................................................................
   792    818     int first;
   793    819   
   794    820     assert( sqlite3pager_pagenumber(data)==pPage->pgno );
   795    821     assert( &data[pBt->pageSize] == (unsigned char*)pPage );
   796    822     assert( sqlite3pager_iswriteable(data) );
   797    823     memset(&data[hdr], 0, pBt->usableSize - hdr);
   798    824     data[hdr] = flags;
   799         -  first = hdr + 6 + 4*((flags&PTF_LEAF)==0);
   800         -  put2byte(&data[hdr+1], first);
   801         -  put2byte(&data[first+2], pBt->usableSize - first);
   802         -  sqliteFree(pPage->aCell);
   803         -  pPage->aCell = 0;
   804         -  pPage->nCell = 0;
   805         -  pPage->nCellAlloc = 0;
          825  +  first = hdr + 8 + 4*((flags&PTF_LEAF)==0);
          826  +  memset(&data[hdr+1], 0, 4);
          827  +  data[hdr+7] = 0;
          828  +  put2byte(&data[hdr+5], pBt->usableSize);
   806    829     pPage->nFree = pBt->usableSize - first;
   807    830     pPage->intKey = (flags & (PTF_INTKEY|PTF_LEAFDATA))!=0;
   808    831     pPage->zeroData = (flags & PTF_ZERODATA)!=0;
   809    832     pPage->leafData = (flags & PTF_LEAFDATA)!=0;
   810    833     pPage->leaf = (flags & PTF_LEAF)!=0;
   811    834     pPage->hasData = !(pPage->zeroData || (!pPage->leaf && pPage->leafData));
   812    835     pPage->hdrOffset = hdr;
   813         -  pPage->isOverfull = 0;
   814         -  pPage->needRelink = 0;
          836  +  pPage->cellOffset = first;
          837  +  pPage->nOverflow = 0;
   815    838     pPage->idxShift = 0;
          839  +  pPage->nCell = 0;
   816    840     pPage->isInit = 1;
   817    841     pageIntegrity(pPage);
   818    842   }
   819    843   
   820    844   /*
   821    845   ** Get a page from the pager.  Initialize the MemPage.pBt and
   822    846   ** MemPage.aData elements if needed.
................................................................................
   871    895   /*
   872    896   ** This routine is called when the reference count for a page
   873    897   ** reaches zero.  We need to unref the pParent pointer when that
   874    898   ** happens.
   875    899   */
   876    900   static void pageDestructor(void *pData, int pageSize){
   877    901     MemPage *pPage = (MemPage*)&((char*)pData)[pageSize];
   878         -  assert( pPage->isInit==0 || pPage->needRelink==0 );
   879    902     if( pPage->pParent ){
   880    903       MemPage *pParent = pPage->pParent;
   881    904       pPage->pParent = 0;
   882    905       releasePage(pParent);
   883    906     }
   884         -  sqliteFree(pPage->aCell);
   885         -  pPage->aCell = 0;
   886    907     pPage->isInit = 0;
   887    908   }
   888    909   
   889    910   /*
   890    911   ** Open a new database.
   891    912   **
   892    913   ** Actually, this routine just sets up the internal data structures
................................................................................
  1032   1053       pBt->minLeafFrac = page1[23];
  1033   1054     }
  1034   1055   
  1035   1056     /* maxLocal is the maximum amount of payload to store locally for
  1036   1057     ** a cell.  Make sure it is small enough so that at least minFanout
  1037   1058     ** cells can will fit on one page.  We assume a 10-byte page header.
  1038   1059     ** Besides the payload, the cell must store:
  1039         -  **     2-byte pointer to next cell
         1060  +  **     2-byte pointer to the cell
  1040   1061     **     4-byte child pointer
  1041   1062     **     9-byte nKey value
  1042   1063     **     4-byte nData value
  1043   1064     **     4-byte overflow page pointer
  1044         -  ** So a cell consists of a header which is as much as 19 bytes long,
  1045         -  ** 0 to N bytes of payload, and an optional 4 byte overflow page pointer.
         1065  +  ** So a cell consists of a 2-byte poiner, a header which is as much as
         1066  +  ** 17 bytes long, 0 to N bytes of payload, and an optional 4 byte overflow
         1067  +  ** page pointer.
  1046   1068     */
  1047         -  pBt->maxLocal = (pBt->usableSize-10)*pBt->maxEmbedFrac/255 - 23;
  1048         -  pBt->minLocal = (pBt->usableSize-10)*pBt->minEmbedFrac/255 - 23;
  1049         -  pBt->maxLeaf = pBt->usableSize - 33;
  1050         -  pBt->minLeaf = (pBt->usableSize-10)*pBt->minLeafFrac/255 - 23;
         1069  +  pBt->maxLocal = (pBt->usableSize-12)*pBt->maxEmbedFrac/255 - 23;
         1070  +  pBt->minLocal = (pBt->usableSize-12)*pBt->minEmbedFrac/255 - 23;
         1071  +  pBt->maxLeaf = pBt->usableSize - 35;
         1072  +  pBt->minLeaf = (pBt->usableSize-12)*pBt->minLeafFrac/255 - 23;
  1051   1073     if( pBt->minLocal>pBt->maxLocal || pBt->maxLocal<0 ){
  1052   1074       goto page1_init_failed;
  1053   1075     }
  1054   1076     assert( pBt->maxLeaf + 23 <= MX_CELL_SIZE );
  1055   1077     pBt->pPage1 = pPage1;
  1056   1078     return SQLITE_OK;
  1057   1079   
................................................................................
  1481   1503   **
  1482   1504   ** BtCursor.info is a cache of the information in the current cell.
  1483   1505   ** Using this cache reduces the number of calls to parseCell().
  1484   1506   */
  1485   1507   static void getCellInfo(BtCursor *pCur){
  1486   1508     MemPage *pPage = pCur->pPage;
  1487   1509     if( !pCur->infoValid ){
  1488         -    parseCell(pPage, pPage->aCell[pCur->idx], &pCur->info);
         1510  +    parseCell(pPage, pCur->idx, &pCur->info);
  1489   1511       pCur->infoValid = 1;
  1490   1512     }else{
  1491   1513   #ifndef NDEBUG
  1492   1514       CellInfo info;
  1493         -    parseCell(pPage, pPage->aCell[pCur->idx], &info);
         1515  +    parseCell(pPage, pCur->idx, &info);
  1494   1516       assert( memcmp(&info, &pCur->info, sizeof(info))==0 );
  1495   1517   #endif
  1496   1518     }
  1497   1519   }
  1498   1520   
  1499   1521   /*
  1500   1522   ** Set *pSize to the size of the buffer needed to hold the value of
................................................................................
  1558   1580   
  1559   1581     assert( pCur!=0 && pCur->pPage!=0 );
  1560   1582     assert( pCur->isValid );
  1561   1583     pBt = pCur->pBt;
  1562   1584     pPage = pCur->pPage;
  1563   1585     pageIntegrity(pPage);
  1564   1586     assert( pCur->idx>=0 && pCur->idx<pPage->nCell );
  1565         -  aPayload = pPage->aCell[pCur->idx];
  1566   1587     getCellInfo(pCur);
         1588  +  aPayload = pCur->info.pCell;
  1567   1589     aPayload += pCur->info.nHeader;
  1568   1590     if( pPage->intKey ){
  1569   1591       nKey = 0;
  1570   1592     }else{
  1571   1593       nKey = pCur->info.nKey;
  1572   1594     }
  1573   1595     assert( offset>=0 );
................................................................................
  1697   1719   
  1698   1720     assert( pCur!=0 && pCur->pPage!=0 );
  1699   1721     assert( pCur->isValid );
  1700   1722     pBt = pCur->pBt;
  1701   1723     pPage = pCur->pPage;
  1702   1724     pageIntegrity(pPage);
  1703   1725     assert( pCur->idx>=0 && pCur->idx<pPage->nCell );
  1704         -  aPayload = pPage->aCell[pCur->idx];
  1705   1726     getCellInfo(pCur);
         1727  +  aPayload = pCur->info.pCell;
  1706   1728     aPayload += pCur->info.nHeader;
  1707   1729     if( pPage->intKey ){
  1708   1730       nKey = 0;
  1709   1731     }else{
  1710   1732       nKey = pCur->info.nKey;
  1711   1733     }
  1712   1734     if( skipKey ){
................................................................................
  1821   1843     idxParent = pPage->idxParent;
  1822   1844     sqlite3pager_ref(pParent->aData);
  1823   1845     oldPgno = pPage->pgno;
  1824   1846     releasePage(pPage);
  1825   1847     pCur->pPage = pParent;
  1826   1848     pCur->infoValid = 0;
  1827   1849     assert( pParent->idxShift==0 );
  1828         -  if( pParent->idxShift==0 ){
  1829         -    pCur->idx = idxParent;
  1830         -#ifndef NDEBUG  
  1831         -    /* Verify that pCur->idx is the correct index to point back to the child
  1832         -    ** page we just came from 
  1833         -    */
  1834         -    if( pCur->idx<pParent->nCell ){
  1835         -      assert( get4byte(&pParent->aCell[idxParent][2])==oldPgno );
  1836         -    }else{
  1837         -      assert( get4byte(&pParent->aData[pParent->hdrOffset+6])==oldPgno );
  1838         -    }
  1839         -#endif
  1840         -  }else{
  1841         -    /* The MemPage.idxShift flag indicates that cell indices might have 
  1842         -    ** changed since idxParent was set and hence idxParent might be out
  1843         -    ** of date.  So recompute the parent cell index by scanning all cells
  1844         -    ** and locating the one that points to the child we just came from.
  1845         -    */
  1846         -    int i;
  1847         -    pCur->idx = pParent->nCell;
  1848         -    for(i=0; i<pParent->nCell; i++){
  1849         -      if( get4byte(&pParent->aCell[i][2])==oldPgno ){
  1850         -        pCur->idx = i;
  1851         -        break;
  1852         -      }
  1853         -    }
  1854         -  }
         1850  +  pCur->idx = idxParent;
  1855   1851   }
  1856   1852   
  1857   1853   /*
  1858   1854   ** Move the cursor to the root page
  1859   1855   */
  1860   1856   static int moveToRoot(BtCursor *pCur){
  1861   1857     MemPage *pRoot;
................................................................................
  1871   1867     pageIntegrity(pRoot);
  1872   1868     pCur->pPage = pRoot;
  1873   1869     pCur->idx = 0;
  1874   1870     pCur->infoValid = 0;
  1875   1871     if( pRoot->nCell==0 && !pRoot->leaf ){
  1876   1872       Pgno subpage;
  1877   1873       assert( pRoot->pgno==1 );
  1878         -    subpage = get4byte(&pRoot->aData[pRoot->hdrOffset+6]);
         1874  +    subpage = get4byte(&pRoot->aData[pRoot->hdrOffset+8]);
  1879   1875       assert( subpage>0 );
  1880   1876       pCur->isValid = 1;
  1881   1877       rc = moveToChild(pCur, subpage);
  1882   1878     }
  1883   1879     pCur->isValid = pCur->pPage->nCell>0;
  1884   1880     return rc;
  1885   1881   }
................................................................................
  1892   1888     Pgno pgno;
  1893   1889     int rc;
  1894   1890     MemPage *pPage;
  1895   1891   
  1896   1892     assert( pCur->isValid );
  1897   1893     while( !(pPage = pCur->pPage)->leaf ){
  1898   1894       assert( pCur->idx>=0 && pCur->idx<pPage->nCell );
  1899         -    pgno = get4byte(&pPage->aCell[pCur->idx][2]);
         1895  +    pgno = get4byte(findCell(pPage, pCur->idx));
  1900   1896       rc = moveToChild(pCur, pgno);
  1901   1897       if( rc ) return rc;
  1902   1898     }
  1903   1899     return SQLITE_OK;
  1904   1900   }
  1905   1901   
  1906   1902   /*
................................................................................
  1913   1909   static int moveToRightmost(BtCursor *pCur){
  1914   1910     Pgno pgno;
  1915   1911     int rc;
  1916   1912     MemPage *pPage;
  1917   1913   
  1918   1914     assert( pCur->isValid );
  1919   1915     while( !(pPage = pCur->pPage)->leaf ){
  1920         -    pgno = get4byte(&pPage->aData[pPage->hdrOffset+6]);
         1916  +    pgno = get4byte(&pPage->aData[pPage->hdrOffset+8]);
  1921   1917       pCur->idx = pPage->nCell;
  1922   1918       rc = moveToChild(pCur, pgno);
  1923   1919       if( rc ) return rc;
  1924   1920     }
  1925   1921     pCur->idx = pPage->nCell - 1;
  1926   1922     pCur->infoValid = 0;
  1927   1923     return SQLITE_OK;
................................................................................
  2062   2058         }
  2063   2059       }
  2064   2060       assert( lwr==upr+1 );
  2065   2061       assert( pPage->isInit );
  2066   2062       if( pPage->leaf ){
  2067   2063         chldPg = 0;
  2068   2064       }else if( lwr>=pPage->nCell ){
  2069         -      chldPg = get4byte(&pPage->aData[pPage->hdrOffset+6]);
         2065  +      chldPg = get4byte(&pPage->aData[pPage->hdrOffset+8]);
  2070   2066       }else{
  2071         -      chldPg = get4byte(&pPage->aCell[lwr][2]);
         2067  +      chldPg = get4byte(findCell(pPage, lwr));
  2072   2068       }
  2073   2069       if( chldPg==0 ){
  2074   2070         assert( pCur->idx>=0 && pCur->idx<pCur->pPage->nCell );
  2075   2071         if( pRes ) *pRes = c;
  2076   2072         return SQLITE_OK;
  2077   2073       }
  2078   2074       pCur->idx = lwr;
................................................................................
  2113   2109     }
  2114   2110     assert( pPage->isInit );
  2115   2111     assert( pCur->idx<pPage->nCell );
  2116   2112     pCur->idx++;
  2117   2113     pCur->infoValid = 0;
  2118   2114     if( pCur->idx>=pPage->nCell ){
  2119   2115       if( !pPage->leaf ){
  2120         -      rc = moveToChild(pCur, get4byte(&pPage->aData[pPage->hdrOffset+6]));
         2116  +      rc = moveToChild(pCur, get4byte(&pPage->aData[pPage->hdrOffset+8]));
  2121   2117         if( rc ) return rc;
  2122   2118         rc = moveToLeftmost(pCur);
  2123   2119         *pRes = 0;
  2124   2120         return rc;
  2125   2121       }
  2126   2122       do{
  2127   2123         if( isRootPage(pPage) ){
................................................................................
  2162   2158       *pRes = 1;
  2163   2159       return SQLITE_OK;
  2164   2160     }
  2165   2161     pPage = pCur->pPage;
  2166   2162     assert( pPage->isInit );
  2167   2163     assert( pCur->idx>=0 );
  2168   2164     if( !pPage->leaf ){
  2169         -    pgno = get4byte(&pPage->aCell[pCur->idx][2]);
         2165  +    pgno = get4byte( findCell(pPage, pCur->idx) );
  2170   2166       rc = moveToChild(pCur, pgno);
  2171   2167       if( rc ) return rc;
  2172   2168       rc = moveToRightmost(pCur);
  2173   2169     }else{
  2174   2170       while( pCur->idx==0 ){
  2175   2171         if( isRootPage(pPage) ){
  2176   2172           pCur->isValid = 0;
................................................................................
  2358   2354   */
  2359   2355   static int clearCell(MemPage *pPage, unsigned char *pCell){
  2360   2356     Btree *pBt = pPage->pBt;
  2361   2357     CellInfo info;
  2362   2358     Pgno ovflPgno;
  2363   2359     int rc;
  2364   2360   
  2365         -  parseCell(pPage, pCell, &info);
         2361  +  parseCellPtr(pPage, pCell, &info);
  2366   2362     if( info.iOverflow==0 ){
  2367   2363       return SQLITE_OK;  /* No overflow pages. Return without doing anything */
  2368   2364     }
  2369   2365     ovflPgno = get4byte(&pCell[info.iOverflow]);
  2370   2366     while( ovflPgno!=0 ){
  2371   2367       MemPage *pOvfl;
  2372   2368       rc = getPage(pBt, ovflPgno, &pOvfl);
................................................................................
  2408   2404     unsigned char *pPayload;
  2409   2405     Btree *pBt = pPage->pBt;
  2410   2406     Pgno pgnoOvfl = 0;
  2411   2407     int nHeader;
  2412   2408     CellInfo info;
  2413   2409   
  2414   2410     /* Fill in the header. */
  2415         -  nHeader = 2;
         2411  +  nHeader = 0;
  2416   2412     if( !pPage->leaf ){
  2417   2413       nHeader += 4;
  2418   2414     }
  2419   2415     if( pPage->hasData ){
  2420   2416       nHeader += putVarint(&pCell[nHeader], nData);
  2421   2417     }else{
  2422   2418       nData = 0;
  2423   2419     }
  2424   2420     nHeader += putVarint(&pCell[nHeader], *(u64*)&nKey);
  2425         -  parseCell(pPage, pCell, &info);
         2421  +  parseCellPtr(pPage, pCell, &info);
  2426   2422     assert( info.nHeader==nHeader );
  2427   2423     assert( info.nKey==nKey );
  2428   2424     assert( info.nData==nData );
  2429   2425     
  2430   2426     /* Fill in the payload */
  2431   2427     nPayload = nData;
  2432   2428     if( pPage->intKey ){
................................................................................
  2516   2512   static void reparentChildPages(MemPage *pPage){
  2517   2513     int i;
  2518   2514     Btree *pBt;
  2519   2515   
  2520   2516     if( pPage->leaf ) return;
  2521   2517     pBt = pPage->pBt;
  2522   2518     for(i=0; i<pPage->nCell; i++){
  2523         -    reparentPage(pBt, get4byte(&pPage->aCell[i][2]), pPage, i);
         2519  +    reparentPage(pBt, get4byte(findCell(pPage,i)), pPage, i);
  2524   2520     }
  2525         -  reparentPage(pBt, get4byte(&pPage->aData[pPage->hdrOffset+6]), pPage, i);
         2521  +  reparentPage(pBt, get4byte(&pPage->aData[pPage->hdrOffset+8]), pPage, i);
  2526   2522     pPage->idxShift = 0;
  2527   2523   }
  2528   2524   
  2529   2525   /*
  2530   2526   ** Remove the i-th cell from pPage.  This routine effects pPage only.
  2531   2527   ** The cell content is not freed or deallocated.  It is assumed that
  2532   2528   ** the cell content has been copied someplace else.  This routine just
  2533   2529   ** removes the reference to the cell from pPage.
  2534   2530   **
  2535   2531   ** "sz" must be the number of bytes in the cell.
  2536         -**
  2537         -** Try to maintain the integrity of the linked list of cells.  But if
  2538         -** the cell being inserted does not fit on the page, this will not be
  2539         -** possible.  If the linked list is not maintained, then just update
  2540         -** pPage->aCell[] and set the pPage->needRelink flag so that we will
  2541         -** know to rebuild the linked list later.
  2542   2532   */
  2543   2533   static void dropCell(MemPage *pPage, int idx, int sz){
  2544         -  int j, pc;
  2545         -  u8 *data;
         2534  +  int i;          /* Loop counter */
         2535  +  int pc;         /* Offset to cell content of cell being deleted */
         2536  +  u8 *data;       /* pPage->aData */
         2537  +  u8 *ptr;        /* Used to move bytes around within data[] */
         2538  +
  2546   2539     assert( idx>=0 && idx<pPage->nCell );
  2547         -  assert( sz==cellSize(pPage, pPage->aCell[idx]) );
         2540  +  assert( sz==cellSize(pPage, idx) );
  2548   2541     assert( sqlite3pager_iswriteable(pPage->aData) );
  2549         -  assert( pPage->aCell[idx]>=pPage->aData );
  2550         -  assert( pPage->aCell[idx]<=&pPage->aData[pPage->pBt->usableSize-sz] );
  2551   2542     data = pPage->aData;
  2552         -  pc = Addr(pPage->aCell[idx]) - Addr(data);
  2553         -  assert( pc>pPage->hdrOffset && pc+sz<=pPage->pBt->usableSize );
         2543  +  ptr = &data[pPage->cellOffset + 2*idx];
         2544  +  pc = get2byte(ptr);
         2545  +  assert( pc>10 && pc+sz<=pPage->pBt->usableSize );
  2554   2546     freeSpace(pPage, pc, sz);
  2555         -  for(j=idx; j<pPage->nCell-1; j++){
  2556         -    pPage->aCell[j] = pPage->aCell[j+1];
         2547  +  for(i=idx+1; i<pPage->nCell; i++, ptr+=2){
         2548  +    ptr[0] = ptr[2];
         2549  +    ptr[1] = ptr[3];
  2557   2550     }
  2558   2551     pPage->nCell--;
  2559         -  if( !pPage->isOverfull && !pPage->needRelink ){
  2560         -    u8 *pPrev;
  2561         -    if( idx==0 ){
  2562         -      pPrev = &data[pPage->hdrOffset+3];
  2563         -    }else{
  2564         -      pPrev = pPage->aCell[idx-1];
  2565         -    }
  2566         -    if( idx<pPage->nCell ){
  2567         -      pc = Addr(pPage->aCell[idx]) - Addr(data);
  2568         -    }else{
  2569         -      pc = 0;
  2570         -    }
  2571         -    put2byte(pPrev, pc);
  2572         -    pageIntegrity(pPage);
  2573         -  }else{
  2574         -    pPage->needRelink = 1;
  2575         -  }
         2552  +  put2byte(&data[pPage->hdrOffset+3], pPage->nCell);
         2553  +  pPage->nFree += 2;
  2576   2554     pPage->idxShift = 1;
  2577   2555   }
  2578   2556   
  2579   2557   /*
  2580   2558   ** Insert a new cell on pPage at cell index "i".  pCell points to the
  2581   2559   ** content of the cell.
  2582   2560   **
  2583   2561   ** If the cell content will fit on the page, then put it there.  If it
  2584         -** will not fit and pTemp is not NULL, then make a copy of the content
  2585         -** into pTemp, set pPage->aCell[i] point to pTemp, and set pPage->isOverfull.
  2586         -** If the content will not fit and pTemp is NULL, then make pPage->aCell[i]
  2587         -** point to pCell and set pPage->isOverfull.
  2588         -**
  2589         -** Try to maintain the integrity of the linked list of cells.  But if
  2590         -** the cell being inserted does not fit on the page, this will not be
  2591         -** possible.  If the linked list is not maintained, then just update
  2592         -** pPage->aCell[] and set the pPage->needRelink flag so that we will
  2593         -** know to rebuild the linked list later.
         2562  +** will not fit, then make a copy of the cell content into pTemp if
         2563  +** pTemp is not null.  Regardless of pTemp, allocate a new entry
         2564  +** in pPage->aOvfl[] and make it point to the cell content (either
         2565  +** in pTemp or the original pCell) and also record its index. 
         2566  +** Allocating a new entry in pPage->aCell[] implies that 
         2567  +** pPage->nOverflow is incremented.
  2594   2568   */
  2595   2569   static void insertCell(
  2596   2570     MemPage *pPage,   /* Page into which we are copying */
  2597         -  int i,            /* Which cell on pPage to insert after */
  2598         -  u8 *pCell,        /* Text of the new cell to insert */
  2599         -  int sz,           /* Bytes of data in pCell */
         2571  +  int i,            /* New cell becomes the i-th cell of the page */
         2572  +  u8 *pCell,        /* Content of the new cell */
         2573  +  int sz,           /* Bytes of content in pCell */
  2600   2574     u8 *pTemp         /* Temp storage space for pCell, if needed */
  2601   2575   ){
  2602         -  int idx, j;
  2603         -  assert( i>=0 && i<=pPage->nCell );
  2604         -  assert( sz==cellSize(pPage, pCell) );
         2576  +  int idx;          /* Where to write new cell content in data[] */
         2577  +  int j;            /* Loop counter */
         2578  +  int top;          /* First byte of content for any cell in data[] */
         2579  +  int end;          /* First byte past the last cell pointer in data[] */
         2580  +  int ins;          /* Index in data[] where new cell pointer is inserted */
         2581  +  int hdr;          /* Offset into data[] of the page header */
         2582  +  int cellOffset;   /* Address of first cell pointer in data[] */
         2583  +  u8 *data;         /* The content of the whole page */
         2584  +  u8 *ptr;          /* Used for moving information around in data[] */
         2585  +
         2586  +  assert( i>=0 && i<=pPage->nCell+pPage->nOverflow );
         2587  +  assert( sz==cellSizePtr(pPage, pCell) );
  2605   2588     assert( sqlite3pager_iswriteable(pPage->aData) );
  2606         -  idx = pPage->needRelink ? 0 : allocateSpace(pPage, sz);
  2607         -  resizeCellArray(pPage, pPage->nCell+1);
  2608         -  for(j=pPage->nCell; j>i; j--){
  2609         -    pPage->aCell[j] = pPage->aCell[j-1];
  2610         -  }
  2611         -  pPage->nCell++;
  2612         -  if( idx<=0 ){
  2613         -    pPage->isOverfull = 1;
         2589  +  if( pPage->nOverflow || sz+2>pPage->nFree ){
  2614   2590       if( pTemp ){
  2615   2591         memcpy(pTemp, pCell, sz);
  2616         -    }else{
  2617         -      pTemp = pCell;
         2592  +      pCell = pTemp;
  2618   2593       }
  2619         -    pPage->aCell[i] = pTemp;
         2594  +    j = pPage->nOverflow++;
         2595  +    assert( j<sizeof(pPage->aOvfl)/sizeof(pPage->aOvfl[0]) );
         2596  +    pPage->aOvfl[j].pCell = pCell;
         2597  +    pPage->aOvfl[j].idx = i;
         2598  +    pPage->nFree = 0;
  2620   2599     }else{
  2621         -    u8 *data = pPage->aData;
         2600  +    data = pPage->aData;
         2601  +    hdr = pPage->hdrOffset;
         2602  +    top = get2byte(&data[hdr+5]);
         2603  +    cellOffset = pPage->cellOffset;
         2604  +    end = cellOffset + 2*pPage->nCell + 2;
         2605  +    ins = cellOffset + 2*i;
         2606  +    if( end > top - sz ){
         2607  +      defragmentPage(pPage);
         2608  +      top = get2byte(&data[hdr+5]);
         2609  +      assert( end + sz <= top );
         2610  +    }
         2611  +    idx = allocateSpace(pPage, sz);
         2612  +    assert( idx>0 );
         2613  +    assert( end <= get2byte(&data[hdr+5]) );
         2614  +    pPage->nCell++;
         2615  +    pPage->nFree -= 2;
  2622   2616       memcpy(&data[idx], pCell, sz);
  2623         -    pPage->aCell[i] = &data[idx];
  2624         -  }
  2625         -  if( !pPage->isOverfull && !pPage->needRelink ){
  2626         -    u8 *pPrev;
  2627         -    int pc;
  2628         -    if( i==0 ){
  2629         -      pPrev = &pPage->aData[pPage->hdrOffset+3];
  2630         -    }else{
  2631         -      pPrev = pPage->aCell[i-1];
         2617  +    for(j=end-2, ptr=&data[j]; j>ins; j-=2, ptr-=2){
         2618  +      ptr[0] = ptr[-2];
         2619  +      ptr[1] = ptr[-1];
  2632   2620       }
  2633         -    pc = get2byte(pPrev);
  2634         -    put2byte(pPrev, idx);
  2635         -    put2byte(pPage->aCell[i], pc);
         2621  +    put2byte(&data[ins], idx);
         2622  +    put2byte(&data[hdr+3], pPage->nCell);
         2623  +    pPage->idxShift = 1;
  2636   2624       pageIntegrity(pPage);
  2637         -  }else{
  2638         -    pPage->needRelink = 1;
  2639   2625     }
  2640         -  pPage->idxShift = 1;
  2641   2626   }
  2642   2627   
  2643   2628   /*
  2644   2629   ** Add a list of cells to a page.  The page should be initially empty.
  2645   2630   ** The cells are guaranteed to fit on the page.
  2646   2631   */
  2647   2632   static void assemblePage(
  2648   2633     MemPage *pPage,   /* The page to be assemblied */
  2649   2634     int nCell,        /* The number of cells to add to this page */
  2650         -  u8 **apCell,      /* Pointers to cell text */
         2635  +  u8 **apCell,      /* Pointers to cell bodies */
  2651   2636     int *aSize        /* Sizes of the cells */
  2652   2637   ){
  2653   2638     int i;            /* Loop counter */
  2654   2639     int totalSize;    /* Total size of all cells */
  2655   2640     int hdr;          /* Index of page header */
  2656         -  int pc, prevpc;   /* Addresses of cells being inserted */
         2641  +  int cellptr;      /* Address of next cell pointer */
         2642  +  int cellbody;     /* Address of next cell body */
  2657   2643     u8 *data;         /* Data for the page */
  2658   2644   
  2659         -  assert( pPage->needRelink==0 );
  2660         -  assert( pPage->isOverfull==0 );
         2645  +  assert( pPage->nOverflow==0 );
  2661   2646     totalSize = 0;
  2662   2647     for(i=0; i<nCell; i++){
  2663   2648       totalSize += aSize[i];
  2664   2649     }
  2665         -  assert( totalSize<=pPage->nFree );
         2650  +  assert( totalSize+2*nCell<=pPage->nFree );
  2666   2651     assert( pPage->nCell==0 );
  2667         -  resizeCellArray(pPage, nCell);
  2668         -  pc = allocateSpace(pPage, totalSize);
         2652  +  cellptr = pPage->cellOffset;
  2669   2653     data = pPage->aData;
  2670   2654     hdr = pPage->hdrOffset;
  2671         -  prevpc = hdr+3;
         2655  +  put2byte(&data[hdr+3], nCell);
         2656  +  cellbody = allocateSpace(pPage, totalSize);
         2657  +  assert( cellbody>0 );
         2658  +  assert( pPage->nFree >= 2*nCell );
         2659  +  pPage->nFree -= 2*nCell;
  2672   2660     for(i=0; i<nCell; i++){
  2673         -    memcpy(data+pc, apCell[i], aSize[i]);
  2674         -    put2byte(data+prevpc, pc);
  2675         -    pPage->aCell[i] = data+pc;
  2676         -    prevpc = pc;
  2677         -    pc += aSize[i];
  2678         -    assert( pc<=pPage->pBt->usableSize );
         2661  +    put2byte(&data[cellptr], cellbody);
         2662  +    memcpy(&data[cellbody], apCell[i], aSize[i]);
         2663  +    cellptr += 2;
         2664  +    cellbody += aSize[i];
  2679   2665     }
         2666  +  assert( cellbody==pPage->pBt->usableSize );
  2680   2667     pPage->nCell = nCell;
  2681         -  put2byte(data+prevpc, 0);
  2682   2668   }
  2683   2669   
  2684         -#if 0  /* Never Used */
  2685         -/*
  2686         -** Rebuild the linked list of cells on a page so that the cells
  2687         -** occur in the order specified by the pPage->aCell[] array.  
  2688         -** Invoke this routine once to repair damage after one or more
  2689         -** invocations of either insertCell() or dropCell().
  2690         -*/
  2691         -static void relinkCellList(MemPage *pPage){
  2692         -  int i, idxFrom;
  2693         -  assert( sqlite3pager_iswriteable(pPage->aData) );
  2694         -  if( !pPage->needRelink ) return;
  2695         -  idxFrom = pPage->hdrOffset+3;
  2696         -  for(i=0; i<pPage->nCell; i++){
  2697         -    int idx = Addr(pPage->aCell[i]) - Addr(pPage->aData);
  2698         -    assert( idx>pPage->hdrOffset && idx<pPage->pBt->usableSize );
  2699         -    put2byte(&pPage->aData[idxFrom], idx);
  2700         -    idxFrom = idx;
  2701         -  }
  2702         -  put2byte(&pPage->aData[idxFrom], 0);
  2703         -  pPage->needRelink = 0;
  2704         -}
  2705         -#endif
  2706         -
  2707   2670   /*
  2708   2671   ** GCC does not define the offsetof() macro so we'll have to do it
  2709   2672   ** ourselves.
  2710   2673   */
  2711   2674   #ifndef offsetof
  2712   2675   #define offsetof(STRUCTURE,FIELD) ((int)((char*)&((STRUCTURE*)0)->FIELD))
  2713   2676   #endif
  2714   2677   
  2715         -/*
  2716         -** Move the content of the page at pFrom over to pTo.  The pFrom->aCell[]
  2717         -** pointers that point into pFrom->aData[] must be adjusted to point
  2718         -** into pTo->aData[] instead.  But some pFrom->aCell[] entries might
  2719         -** not point to pFrom->aData[].  Those are unchanged.
  2720         -**
  2721         -** Over this operation completes, the meta data for pFrom is zeroed.
  2722         -*/
  2723         -static void movePage(MemPage *pTo, MemPage *pFrom){
  2724         -  uptr from, to;
  2725         -  int i;
  2726         -  int usableSize;
  2727         -  int ofst;
  2728         -
  2729         -  assert( pTo->hdrOffset==0 );
  2730         -  assert( pFrom->isInit );
  2731         -  ofst = pFrom->hdrOffset;
  2732         -  usableSize = pFrom->pBt->usableSize;
  2733         -  sqliteFree(pTo->aCell);
  2734         -  memcpy(pTo->aData, &pFrom->aData[ofst], usableSize - ofst);
  2735         -  memcpy(pTo, pFrom, offsetof(MemPage, aData));
  2736         -  pFrom->isInit = 0;
  2737         -  pFrom->aCell = 0;
  2738         -  assert( pTo->aData[5]<155 );
  2739         -  pTo->aData[5] += ofst;
  2740         -  pTo->isOverfull = pFrom->isOverfull;
  2741         -  to = Addr(pTo->aData);
  2742         -  from = Addr(&pFrom->aData[ofst]);
  2743         -  for(i=0; i<pTo->nCell; i++){
  2744         -    uptr x = Addr(pTo->aCell[i]);
  2745         -    if( x>from && x<from+usableSize-ofst ){
  2746         -      *((uptr*)&pTo->aCell[i]) = x + to - from;
  2747         -    }
  2748         -  }
  2749         -}
  2750         -
  2751   2678   /*
  2752   2679   ** The following parameters determine how many adjacent pages get involved
  2753   2680   ** in a balancing operation.  NN is the number of neighbors on either side
  2754   2681   ** of the page that participate in the balancing operation.  NB is the
  2755   2682   ** total number of pages that participate, including the target page and
  2756   2683   ** NN neighbors on either side.
  2757   2684   **
................................................................................
  2759   2686   ** (to 2 or 3) gives a modest improvement in SELECT and DELETE performance
  2760   2687   ** in exchange for a larger degradation in INSERT and UPDATE performance.
  2761   2688   ** The value of NN appears to give the best results overall.
  2762   2689   */
  2763   2690   #define NN 1             /* Number of neighbors on either side of pPage */
  2764   2691   #define NB (NN*2+1)      /* Total pages involved in the balance */
  2765   2692   
         2693  +/* Forward reference */
         2694  +static int balance(MemPage*);
         2695  +
  2766   2696   /*
  2767   2697   ** This routine redistributes Cells on pPage and up to NN*2 siblings
  2768   2698   ** of pPage so that all pages have about the same amount of free space.
  2769   2699   ** Usually one sibling on either side of pPage is used in the balancing,
  2770   2700   ** though both siblings might come from one side if pPage is the first
  2771   2701   ** or last child of its parent.  If pPage has fewer than 2*NN siblings
  2772   2702   ** (something which can only happen if pPage is the root page or a 
................................................................................
  2788   2718   ** might become overfull or underfull.  If that happens, then this routine
  2789   2719   ** is called recursively on the parent.
  2790   2720   **
  2791   2721   ** If this routine fails for any reason, it might leave the database
  2792   2722   ** in a corrupted state.  So if this routine fails, the database should
  2793   2723   ** be rolled back.
  2794   2724   */
  2795         -static int balance(MemPage *pPage){
         2725  +static int balance_nonroot(MemPage *pPage){
  2796   2726     MemPage *pParent;            /* The parent of pPage */
  2797   2727     Btree *pBt;                  /* The whole database */
  2798   2728     int nCell;                   /* Number of cells in aCell[] */
  2799   2729     int nOld;                    /* Number of pages in apOld[] */
  2800   2730     int nNew;                    /* Number of pages in apNew[] */
  2801   2731     int nDiv;                    /* Number of cells in apDiv[] */
  2802   2732     int i, j, k;                 /* Loop counters */
................................................................................
  2805   2735     int rc;                      /* The return code */
  2806   2736     int leafCorrection;          /* 4 if pPage is a leaf.  0 if not */
  2807   2737     int leafData;                /* True if pPage is a leaf of a LEAFDATA tree */
  2808   2738     int usableSpace;             /* Bytes in pPage beyond the header */
  2809   2739     int pageFlags;               /* Value of pPage->aData[0] */
  2810   2740     int subtotal;                /* Subtotal of bytes in cells on one page */
  2811   2741     int iSpace = 0;              /* First unused byte of aSpace[] */
  2812         -  MemPage *extraUnref = 0;     /* Unref this page if not zero */
  2813   2742     MemPage *apOld[NB];          /* pPage and up to two siblings */
  2814   2743     Pgno pgnoOld[NB];            /* Page numbers for each page in apOld[] */
  2815   2744     MemPage *apCopy[NB];         /* Private copies of apOld[] pages */
  2816   2745     MemPage *apNew[NB+1];        /* pPage and up to NB siblings after balancing */
  2817   2746     Pgno pgnoNew[NB+1];          /* Page numbers for each page in apNew[] */
  2818   2747     int idxDiv[NB];              /* Indices of divider cells in pParent */
  2819   2748     u8 *apDiv[NB];               /* Divider cells in pParent */
................................................................................
  2821   2750     int szNew[NB+1];             /* Combined size of cells place on i-th page */
  2822   2751     u8 *apCell[(MX_CELL+2)*NB];  /* All cells from pages being balanced */
  2823   2752     int szCell[(MX_CELL+2)*NB];  /* Local size of all cells */
  2824   2753     u8 aCopy[NB][MX_PAGE_SIZE+sizeof(MemPage)];  /* Space for apCopy[] */
  2825   2754     u8 aSpace[MX_PAGE_SIZE*4];   /* Space to copies of divider cells */
  2826   2755   
  2827   2756     /* 
  2828         -  ** Return without doing any work if pPage is neither overfull nor
  2829         -  ** underfull.
         2757  +  ** Find the parent page.
  2830   2758     */
  2831   2759     assert( pPage->isInit );
  2832   2760     assert( sqlite3pager_iswriteable(pPage->aData) );
  2833   2761     pBt = pPage->pBt;
  2834         -  if( !pPage->isOverfull && pPage->nFree<pBt->usableSize*2/3
  2835         -        && pPage->nCell>=2){
  2836         -    assert( pPage->needRelink==0 );
  2837         -    return SQLITE_OK;
  2838         -  }
  2839         -
  2840         -  /*
  2841         -  ** Find the parent of the page to be balanced.  If there is no parent,
  2842         -  ** it means this page is the root page and special rules apply.
  2843         -  */
  2844   2762     pParent = pPage->pParent;
  2845         -  if( pParent==0 ){
  2846         -    Pgno pgnoChild;
  2847         -    MemPage *pChild;
  2848         -    assert( pPage->isInit );
  2849         -    if( pPage->nCell==0 ){
  2850         -      if( pPage->leaf ){
  2851         -        /* The table is completely empty */
  2852         -        assert( pPage->needRelink==0 );
  2853         -        TRACE(("BALANCE: empty table %d\n", pPage->pgno));
  2854         -      }else{
  2855         -        /* The root page is empty but has one child.  Transfer the
  2856         -        ** information from that one child into the root page if it 
  2857         -        ** will fit.  This reduces the depth of the tree by one.
  2858         -        **
  2859         -        ** If the root page is page 1, it has less space available than
  2860         -        ** its child (due to the 100 byte header that occurs at the beginning
  2861         -        ** of the database fle), so it might not be able to hold all of the 
  2862         -        ** information currently contained in the child.  If this is the 
  2863         -        ** case, then do not do the transfer.  Leave page 1 empty except
  2864         -        ** for the right-pointer to the child page.  The child page becomes
  2865         -        ** the virtual root of the tree.
  2866         -        */
  2867         -        pgnoChild = get4byte(&pPage->aData[pPage->hdrOffset+6]);
  2868         -        assert( pgnoChild>0 && pgnoChild<=sqlite3pager_pagecount(pBt->pPager) );
  2869         -        rc = getPage(pBt, pgnoChild, &pChild);
  2870         -        if( rc ) return rc;
  2871         -        if( pPage->pgno==1 ){
  2872         -          rc = initPage(pChild, pPage);
  2873         -          if( rc ) return rc;
  2874         -          if( pChild->nFree>=100 ){
  2875         -            /* The child information will fit on the root page, so do the
  2876         -            ** copy */
  2877         -            zeroPage(pPage, pChild->aData[0]);
  2878         -            for(i=0; i<pChild->nCell; i++){
  2879         -              szCell[i]  = cellSize(pChild, pChild->aCell[i]);
  2880         -            }
  2881         -            assemblePage(pPage, pChild->nCell, pChild->aCell, szCell);
  2882         -            freePage(pChild);
  2883         -            TRACE(("BALANCE: child %d transfer to page 1\n", pChild->pgno));
  2884         -          }else{
  2885         -            /* The child has more information that will fit on the root.
  2886         -            ** The tree is already balanced.  Do nothing. */
  2887         -            TRACE(("BALANCE: child %d will not fit on page 1\n", pChild->pgno));
  2888         -          }
  2889         -        }else{
  2890         -          memcpy(pPage->aData, pChild->aData, pBt->usableSize);
  2891         -          pPage->isInit = 0;
  2892         -          pPage->pParent = 0;
  2893         -          rc = initPage(pPage, 0);
  2894         -          assert( rc==SQLITE_OK );
  2895         -          freePage(pChild);
  2896         -          TRACE(("BALANCE: transfer child %d into root %d\n",
  2897         -                  pChild->pgno, pPage->pgno));
  2898         -        }
  2899         -        reparentChildPages(pPage);
  2900         -        releasePage(pChild);
  2901         -      }
  2902         -      return SQLITE_OK;
  2903         -    }
  2904         -    if( !pPage->isOverfull ){
  2905         -      /* It is OK for the root page to be less than half full.
  2906         -      */
  2907         -      assert( pPage->needRelink==0 );
  2908         -      TRACE(("BALANCE: root page %d is low - no changes\n", pPage->pgno));
  2909         -      return SQLITE_OK;
  2910         -    }
  2911         -    /*
  2912         -    ** If we get to here, it means the root page is overfull.
  2913         -    ** When this happens, Create a new child page and copy the
  2914         -    ** contents of the root into the child.  Then make the root
  2915         -    ** page an empty page with rightChild pointing to the new
  2916         -    ** child.  Then fall thru to the code below which will cause
  2917         -    ** the overfull child page to be split.
  2918         -    */
  2919         -    rc = allocatePage(pBt, &pChild, &pgnoChild, pPage->pgno);
  2920         -    if( rc ) return rc;
  2921         -    assert( sqlite3pager_iswriteable(pChild->aData) );
  2922         -    movePage(pChild, pPage);
  2923         -    assert( pChild->aData[0]==pPage->aData[pPage->hdrOffset] );
  2924         -    pChild->pParent = pPage;
  2925         -    sqlite3pager_ref(pPage->aData);
  2926         -    pChild->idxParent = 0;
  2927         -    pChild->isOverfull = 1;
  2928         -    zeroPage(pPage, pChild->aData[0] & ~PTF_LEAF);
  2929         -    put4byte(&pPage->aData[pPage->hdrOffset+6], pChild->pgno);
  2930         -    pParent = pPage;
  2931         -    pPage = pChild;
  2932         -    extraUnref = pChild;
  2933         -    TRACE(("BALANCE: copy root %d into %d and balance %d\n",
  2934         -            pParent->pgno, pPage->pgno, pPage->pgno));
  2935         -  }else{
  2936         -    TRACE(("BALANCE: begin page %d child of %d\n",
  2937         -            pPage->pgno, pParent->pgno));
  2938         -  }
  2939         -  rc = sqlite3pager_write(pParent->aData);
  2940         -  if( rc ) return rc;
  2941         -  assert( pParent->isInit );
         2763  +  sqlite3pager_write(pParent->aData);
         2764  +  assert( pParent );
         2765  +  TRACE(("BALANCE: begin page %d child of %d\n", pPage->pgno, pParent->pgno));
  2942   2766     
  2943   2767     /*
  2944   2768     ** Find the cell in the parent page whose left child points back
  2945   2769     ** to pPage.  The "idx" variable is the index of that cell.  If pPage
  2946   2770     ** is the rightmost child of pParent then set idx to pParent->nCell 
  2947   2771     */
  2948   2772     if( pParent->idxShift ){
  2949   2773       Pgno pgno;
  2950   2774       pgno = pPage->pgno;
  2951   2775       assert( pgno==sqlite3pager_pagenumber(pPage->aData) );
  2952   2776       for(idx=0; idx<pParent->nCell; idx++){
  2953         -      if( get4byte(&pParent->aCell[idx][2])==pgno ){
         2777  +      if( get4byte(findCell(pParent, idx))==pgno ){
  2954   2778           break;
  2955   2779         }
  2956   2780       }
  2957   2781       assert( idx<pParent->nCell
  2958         -             || get4byte(&pParent->aData[pParent->hdrOffset+6])==pgno );
         2782  +             || get4byte(&pParent->aData[pParent->hdrOffset+8])==pgno );
  2959   2783     }else{
  2960   2784       idx = pPage->idxParent;
  2961   2785     }
  2962   2786   
  2963   2787     /*
  2964   2788     ** Initialize variables so that it will be safe to jump
  2965   2789     ** directly to balance_cleanup at any moment.
................................................................................
  2981   2805     if( nxDiv<0 ){
  2982   2806       nxDiv = 0;
  2983   2807     }
  2984   2808     nDiv = 0;
  2985   2809     for(i=0, k=nxDiv; i<NB; i++, k++){
  2986   2810       if( k<pParent->nCell ){
  2987   2811         idxDiv[i] = k;
  2988         -      apDiv[i] = pParent->aCell[k];
         2812  +      apDiv[i] = findCell(pParent, k);
  2989   2813         nDiv++;
  2990   2814         assert( !pParent->leaf );
  2991         -      pgnoOld[i] = get4byte(&apDiv[i][2]);
         2815  +      pgnoOld[i] = get4byte(apDiv[i]);
  2992   2816       }else if( k==pParent->nCell ){
  2993         -      pgnoOld[i] = get4byte(&pParent->aData[pParent->hdrOffset+6]);
         2817  +      pgnoOld[i] = get4byte(&pParent->aData[pParent->hdrOffset+8]);
  2994   2818       }else{
  2995   2819         break;
  2996   2820       }
  2997   2821       rc = getAndInitPage(pBt, pgnoOld[i], &apOld[i], pParent);
  2998   2822       if( rc ) goto balance_cleanup;
  2999   2823       apOld[i]->idxParent = k;
  3000   2824       apCopy[i] = 0;
................................................................................
  3006   2830     ** Make copies of the content of pPage and its siblings into aOld[].
  3007   2831     ** The rest of this function will use data from the copies rather
  3008   2832     ** that the original pages since the original pages will be in the
  3009   2833     ** process of being overwritten.
  3010   2834     */
  3011   2835     for(i=0; i<nOld; i++){
  3012   2836       MemPage *p = apCopy[i] = (MemPage*)&aCopy[i+1][-sizeof(MemPage)];
  3013         -    p->aData = &((u8*)p)[-pBt->usableSize];
  3014         -    p->aCell = 0;
  3015         -    p->hdrOffset = 0;
  3016         -    movePage(p, apOld[i]);
         2837  +    p->aData = &((u8*)p)[-pBt->pageSize];
         2838  +    memcpy(p->aData, apOld[i]->aData, pBt->pageSize + sizeof(MemPage));
         2839  +    p->aData = &((u8*)p)[-pBt->pageSize];
  3017   2840     }
  3018   2841   
  3019   2842     /*
  3020   2843     ** Load pointers to all cells on sibling pages and the divider cells
  3021   2844     ** into the local apCell[] array.  Make copies of the divider cells
  3022   2845     ** into space obtained form aSpace[] and remove the the divider Cells
  3023   2846     ** from pParent.
................................................................................
  3033   2856     **       leafData:  1 if pPage holds key+data and pParent holds only keys.
  3034   2857     */
  3035   2858     nCell = 0;
  3036   2859     leafCorrection = pPage->leaf*4;
  3037   2860     leafData = pPage->leafData && pPage->leaf;
  3038   2861     for(i=0; i<nOld; i++){
  3039   2862       MemPage *pOld = apCopy[i];
  3040         -    for(j=0; j<pOld->nCell; j++){
  3041         -      apCell[nCell] = pOld->aCell[j];
  3042         -      szCell[nCell] = cellSize(pOld, apCell[nCell]);
         2863  +    int limit = pOld->nCell+pOld->nOverflow;
         2864  +    for(j=0; j<limit; j++){
         2865  +      apCell[nCell] = findOverflowCell(pOld, j);
         2866  +      szCell[nCell] = cellSizePtr(pOld, apCell[nCell]);
  3043   2867         nCell++;
  3044   2868       }
  3045   2869       if( i<nOld-1 ){
  3046         -      int sz = cellSize(pParent, apDiv[i]);
         2870  +      int sz = cellSizePtr(pParent, apDiv[i]);
  3047   2871         if( leafData ){
  3048   2872           /* With the LEAFDATA flag, pParent cells hold only INTKEYs that
  3049   2873           ** are duplicates of keys on the child pages.  We need to remove
  3050   2874           ** the divider cells from pParent, but the dividers cells are not
  3051   2875           ** added to apCell[] because they are duplicates of child cells.
  3052   2876           */
  3053   2877           dropCell(pParent, nxDiv, sz);
................................................................................
  3057   2881           pTemp = &aSpace[iSpace];
  3058   2882           iSpace += sz;
  3059   2883           assert( iSpace<=sizeof(aSpace) );
  3060   2884           memcpy(pTemp, apDiv[i], sz);
  3061   2885           apCell[nCell] = pTemp+leafCorrection;
  3062   2886           dropCell(pParent, nxDiv, sz);
  3063   2887           szCell[nCell] -= leafCorrection;
  3064         -        assert( get4byte(pTemp+2)==pgnoOld[i] );
         2888  +        assert( get4byte(pTemp)==pgnoOld[i] );
  3065   2889           if( !pOld->leaf ){
  3066   2890             assert( leafCorrection==0 );
  3067   2891             /* The right pointer of the child page pOld becomes the left
  3068   2892             ** pointer of the divider cell */
  3069         -          memcpy(&apCell[nCell][2], &pOld->aData[pOld->hdrOffset+6], 4);
         2893  +          memcpy(apCell[nCell], &pOld->aData[pOld->hdrOffset+8], 4);
  3070   2894           }else{
  3071   2895             assert( leafCorrection==4 );
  3072   2896           }
  3073   2897           nCell++;
  3074   2898         }
  3075   2899       }
  3076   2900     }
................................................................................
  3087   2911     **           k: The total number of sibling pages
  3088   2912     **    szNew[i]: Spaced used on the i-th sibling page.
  3089   2913     **   cntNew[i]: Index in apCell[] and szCell[] for the first cell to
  3090   2914     **              the right of the i-th sibling page.
  3091   2915     ** usableSpace: Number of bytes of space available on each sibling.
  3092   2916     ** 
  3093   2917     */
  3094         -  usableSpace = pBt->usableSize - 10 + leafCorrection;
         2918  +  usableSpace = pBt->usableSize - 12 + leafCorrection;
  3095   2919     for(subtotal=k=i=0; i<nCell; i++){
  3096         -    subtotal += szCell[i];
         2920  +    subtotal += szCell[i] + 2;
  3097   2921       if( subtotal > usableSpace ){
  3098   2922         szNew[k] = subtotal - szCell[i];
  3099   2923         cntNew[k] = i;
  3100   2924         if( leafData ){ i--; }
  3101   2925         subtotal = 0;
  3102   2926         k++;
  3103   2927       }
................................................................................
  3120   2944       int szRight = szNew[i];  /* Size of sibling on the right */
  3121   2945       int szLeft = szNew[i-1]; /* Size of sibling on the left */
  3122   2946       int r;              /* Index of right-most cell in left sibling */
  3123   2947       int d;              /* Index of first cell to the left of right sibling */
  3124   2948   
  3125   2949       r = cntNew[i-1] - 1;
  3126   2950       d = r + 1 - leafData;
  3127         -    while( szRight==0 || szRight+szCell[d]<=szLeft-szCell[r] ){
  3128         -      szRight += szCell[d];
  3129         -      szLeft -= szCell[r];
         2951  +    while( szRight==0 || szRight+szCell[d]+2<=szLeft-(szCell[r]+2) ){
         2952  +      szRight += szCell[d] + 2;
         2953  +      szLeft -= szCell[r] + 2;
  3130   2954         cntNew[i-1]--;
  3131   2955         r = cntNew[i-1] - 1;
  3132   2956         d = r + 1 - leafData;
  3133   2957       }
  3134   2958       szNew[i] = szRight;
  3135   2959       szNew[i-1] = szLeft;
  3136   2960     }
................................................................................
  3215   3039     ** Evenly distribute the data in apCell[] across the new pages.
  3216   3040     ** Insert divider cells into pParent as necessary.
  3217   3041     */
  3218   3042     j = 0;
  3219   3043     for(i=0; i<nNew; i++){
  3220   3044       MemPage *pNew = apNew[i];
  3221   3045       assert( pNew->pgno==pgnoNew[i] );
  3222         -    resizeCellArray(pNew, cntNew[i] - j);
  3223   3046       assemblePage(pNew, cntNew[i]-j, &apCell[j], &szCell[j]);
  3224   3047       j = cntNew[i];
  3225   3048       assert( pNew->nCell>0 );
  3226         -    assert( !pNew->isOverfull );
  3227         -    assert( pNew->needRelink==0 );
         3049  +    assert( pNew->nOverflow==0 );
  3228   3050       if( i<nNew-1 && j<nCell ){
  3229   3051         u8 *pCell;
  3230   3052         u8 *pTemp;
  3231   3053         int sz;
  3232   3054         pCell = apCell[j];
  3233   3055         sz = szCell[j] + leafCorrection;
  3234   3056         if( !pNew->leaf ){
  3235         -        memcpy(&pNew->aData[6], pCell+2, 4);
         3057  +        memcpy(&pNew->aData[8], pCell, 4);
  3236   3058           pTemp = 0;
  3237   3059         }else if( leafData ){
  3238   3060           CellInfo info;
  3239   3061           j--;
  3240         -        parseCell(pNew, apCell[j], &info);
         3062  +        parseCellPtr(pNew, apCell[j], &info);
  3241   3063           pCell = &aSpace[iSpace];
  3242   3064           fillInCell(pParent, pCell, 0, info.nKey, 0, 0, &sz);
  3243   3065           iSpace += sz;
  3244   3066           assert( iSpace<=sizeof(aSpace) );
  3245   3067           pTemp = 0;
  3246   3068         }else{
  3247   3069           pCell -= 4;
  3248   3070           pTemp = &aSpace[iSpace];
  3249   3071           iSpace += sz;
  3250   3072           assert( iSpace<=sizeof(aSpace) );
  3251   3073         }
  3252   3074         insertCell(pParent, nxDiv, pCell, sz, pTemp);
  3253         -      put4byte(&pParent->aCell[nxDiv][2], pNew->pgno);
         3075  +      put4byte(findOverflowCell(pParent,nxDiv), pNew->pgno);
  3254   3076         j++;
  3255   3077         nxDiv++;
  3256   3078       }
  3257   3079     }
  3258   3080     assert( j==nCell );
  3259   3081     if( (pageFlags & PTF_LEAF)==0 ){
  3260         -    memcpy(&apNew[nNew-1]->aData[6], &apCopy[nOld-1]->aData[6], 4);
         3082  +    memcpy(&apNew[nNew-1]->aData[8], &apCopy[nOld-1]->aData[8], 4);
  3261   3083     }
  3262         -  if( nxDiv==pParent->nCell ){
         3084  +  if( nxDiv==pParent->nCell+pParent->nOverflow ){
  3263   3085       /* Right-most sibling is the right-most child of pParent */
  3264         -    put4byte(&pParent->aData[pParent->hdrOffset+6], pgnoNew[nNew-1]);
         3086  +    put4byte(&pParent->aData[pParent->hdrOffset+8], pgnoNew[nNew-1]);
  3265   3087     }else{
  3266   3088       /* Right-most sibling is the left child of the first entry in pParent
  3267   3089       ** past the right-most divider entry */
  3268         -    put4byte(&pParent->aCell[nxDiv][2], pgnoNew[nNew-1]);
         3090  +    put4byte(findOverflowCell(pParent, nxDiv), pgnoNew[nNew-1]);
  3269   3091     }
  3270   3092   
  3271   3093     /*
  3272   3094     ** Reparent children of all cells.
  3273   3095     */
  3274   3096     for(i=0; i<nNew; i++){
  3275   3097       reparentChildPages(apNew[i]);
................................................................................
  3288   3110     
  3289   3111     /*
  3290   3112     ** Cleanup before returning.
  3291   3113     */
  3292   3114   balance_cleanup:
  3293   3115     for(i=0; i<nOld; i++){
  3294   3116       releasePage(apOld[i]);
  3295         -    if( apCopy[i] ){
  3296         -      sqliteFree(apCopy[i]->aCell);
  3297         -    }
  3298   3117     }
  3299   3118     for(i=0; i<nNew; i++){
  3300   3119       releasePage(apNew[i]);
  3301   3120     }
  3302   3121     releasePage(pParent);
  3303         -  releasePage(extraUnref);
  3304   3122     TRACE(("BALANCE: finished with %d: old=%d new=%d cells=%d\n",
  3305   3123             pPage->pgno, nOld, nNew, nCell));
  3306   3124     return rc;
  3307   3125   }
         3126  +
         3127  +/*
         3128  +** This routine is called for the root page of a btree when the root
         3129  +** page contains no cells.  This is an opportunity to make the tree
         3130  +** shallower by one level.
         3131  +*/
         3132  +static int balance_shallower(MemPage *pPage){
         3133  +  MemPage *pChild;             /* The only child page of pPage */
         3134  +  Pgno pgnoChild;              /* Page number for pChild */
         3135  +  int rc;                      /* Return code from subprocedures */
         3136  +  u8 *apCell[(MX_CELL+2)*NB];  /* All cells from pages being balanced */
         3137  +  int szCell[(MX_CELL+2)*NB];  /* Local size of all cells */
         3138  +
         3139  +  assert( pPage->pParent==0 );
         3140  +  assert( pPage->nCell==0 );
         3141  +  if( pPage->leaf ){
         3142  +    /* The table is completely empty */
         3143  +    TRACE(("BALANCE: empty table %d\n", pPage->pgno));
         3144  +  }else{
         3145  +    /* The root page is empty but has one child.  Transfer the
         3146  +    ** information from that one child into the root page if it 
         3147  +    ** will fit.  This reduces the depth of the tree by one.
         3148  +    **
         3149  +    ** If the root page is page 1, it has less space available than
         3150  +    ** its child (due to the 100 byte header that occurs at the beginning
         3151  +    ** of the database fle), so it might not be able to hold all of the 
         3152  +    ** information currently contained in the child.  If this is the 
         3153  +    ** case, then do not do the transfer.  Leave page 1 empty except
         3154  +    ** for the right-pointer to the child page.  The child page becomes
         3155  +    ** the virtual root of the tree.
         3156  +    */
         3157  +    pgnoChild = get4byte(&pPage->aData[pPage->hdrOffset+8]);
         3158  +    assert( pgnoChild>0 );
         3159  +    assert( pgnoChild<=sqlite3pager_pagecount(pPage->pBt->pPager) );
         3160  +    rc = getPage(pPage->pBt, pgnoChild, &pChild);
         3161  +    if( rc ) return rc;
         3162  +    if( pPage->pgno==1 ){
         3163  +      rc = initPage(pChild, pPage);
         3164  +      if( rc ) return rc;
         3165  +      assert( pChild->nOverflow==0 );
         3166  +      if( pChild->nFree>=100 ){
         3167  +        /* The child information will fit on the root page, so do the
         3168  +        ** copy */
         3169  +        int i;
         3170  +        zeroPage(pPage, pChild->aData[0]);
         3171  +        for(i=0; i<pChild->nCell; i++){
         3172  +          apCell[i] = findCell(pChild,i);
         3173  +          szCell[i] = cellSizePtr(pChild, apCell[i]);
         3174  +        }
         3175  +        assemblePage(pPage, pChild->nCell, apCell, szCell);
         3176  +        freePage(pChild);
         3177  +        TRACE(("BALANCE: child %d transfer to page 1\n", pChild->pgno));
         3178  +      }else{
         3179  +        /* The child has more information that will fit on the root.
         3180  +        ** The tree is already balanced.  Do nothing. */
         3181  +        TRACE(("BALANCE: child %d will not fit on page 1\n", pChild->pgno));
         3182  +      }
         3183  +    }else{
         3184  +      memcpy(pPage->aData, pChild->aData, pPage->pBt->usableSize);
         3185  +      pPage->isInit = 0;
         3186  +      pPage->pParent = 0;
         3187  +      rc = initPage(pPage, 0);
         3188  +      assert( rc==SQLITE_OK );
         3189  +      freePage(pChild);
         3190  +      TRACE(("BALANCE: transfer child %d into root %d\n",
         3191  +              pChild->pgno, pPage->pgno));
         3192  +    }
         3193  +    reparentChildPages(pPage);
         3194  +    releasePage(pChild);
         3195  +  }
         3196  +  return SQLITE_OK;
         3197  +}
         3198  +
         3199  +
         3200  +/*
         3201  +** The root page is overfull
         3202  +**
         3203  +** When this happens, Create a new child page and copy the
         3204  +** contents of the root into the child.  Then make the root
         3205  +** page an empty page with rightChild pointing to the new
         3206  +** child.   Finally, call balance_internal() on the new child
         3207  +** to cause it to split.
         3208  +*/
         3209  +static int balance_deeper(MemPage *pPage){
         3210  +  int rc;             /* Return value from subprocedures */
         3211  +  MemPage *pChild;    /* Pointer to a new child page */
         3212  +  Pgno pgnoChild;     /* Page number of the new child page */
         3213  +  Btree *pBt;         /* The BTree */
         3214  +  int usableSize;     /* Total usable size of a page */
         3215  +  u8 *data;           /* Content of the parent page */
         3216  +  u8 *cdata;          /* Content of the child page */
         3217  +  int hdr;            /* Offset to page header in parent */
         3218  +  int brk;            /* Offset to content of first cell in parent */
         3219  +
         3220  +  assert( pPage->pParent==0 );
         3221  +  assert( pPage->nOverflow>0 );
         3222  +  pBt = pPage->pBt;
         3223  +  rc = allocatePage(pBt, &pChild, &pgnoChild, pPage->pgno);
         3224  +  if( rc ) return rc;
         3225  +  assert( sqlite3pager_iswriteable(pChild->aData) );
         3226  +  usableSize = pBt->usableSize;
         3227  +  data = pPage->aData;
         3228  +  hdr = pPage->hdrOffset;
         3229  +  brk = get2byte(&data[hdr+5]);
         3230  +  cdata = pChild->aData;
         3231  +  memcpy(cdata, &data[hdr], pPage->cellOffset+2*pPage->nCell-hdr);
         3232  +  memcpy(&cdata[brk], &data[brk], usableSize-brk);
         3233  +  rc = initPage(pChild, pPage);
         3234  +  if( rc ) return rc;
         3235  +  memcpy(pChild->aOvfl, pPage->aOvfl, pPage->nOverflow*sizeof(pPage->aOvfl[0]));
         3236  +  pChild->nOverflow = pPage->nOverflow;
         3237  +  if( pChild->nOverflow ){
         3238  +    pChild->nFree = 0;
         3239  +  }
         3240  +  assert( pChild->nCell==pPage->nCell );
         3241  +  zeroPage(pPage, pChild->aData[0] & ~PTF_LEAF);
         3242  +  put4byte(&pPage->aData[pPage->hdrOffset+8], pgnoChild);
         3243  +  TRACE(("BALANCE: copy root %d into %d\n", pPage->pgno, pChild->pgno));
         3244  +  rc = balance_nonroot(pChild);
         3245  +  releasePage(pChild);
         3246  +  return rc;
         3247  +}
         3248  +
         3249  +/*
         3250  +** Decide if the page pPage needs to be balanced.  If balancing is
         3251  +** required, call the appropriate balancing routine.
         3252  +*/
         3253  +static int balance(MemPage *pPage){
         3254  +  int rc = SQLITE_OK;
         3255  +  if( pPage->pParent==0 ){
         3256  +    if( pPage->nOverflow>0 ){
         3257  +      rc = balance_deeper(pPage);
         3258  +    }
         3259  +    if( pPage->nCell==0 ){
         3260  +      rc = balance_shallower(pPage);
         3261  +    }
         3262  +  }else{
         3263  +    if( pPage->nOverflow>0 || pPage->nFree>pPage->pBt->usableSize*2/3 ){
         3264  +      rc = balance_nonroot(pPage);
         3265  +    }
         3266  +  }
         3267  +  return rc;
         3268  +}
  3308   3269   
  3309   3270   /*
  3310   3271   ** This routine checks all cursors that point to the same table
  3311   3272   ** as pCur points to.  If any of those cursors were opened with
  3312   3273   ** wrFlag==0 then this routine returns SQLITE_LOCKED.  If all
  3313   3274   ** cursors point to the same table were opened with wrFlag==1
  3314   3275   ** then this routine returns SQLITE_OK.
................................................................................
  3381   3342             pCur->pgnoRoot, nKey, nData, pPage->pgno,
  3382   3343             loc==0 ? "overwrite" : "new entry"));
  3383   3344     assert( pPage->isInit );
  3384   3345     rc = sqlite3pager_write(pPage->aData);
  3385   3346     if( rc ) return rc;
  3386   3347     rc = fillInCell(pPage, newCell, pKey, nKey, pData, nData, &szNew);
  3387   3348     if( rc ) return rc;
  3388         -  assert( szNew==cellSize(pPage, newCell) );
         3349  +  assert( szNew==cellSizePtr(pPage, newCell) );
  3389   3350     assert( szNew<=sizeof(newCell) );
  3390   3351     if( loc==0 && pCur->isValid ){
  3391   3352       int szOld;
  3392   3353       assert( pCur->idx>=0 && pCur->idx<pPage->nCell );
  3393         -    oldCell = pPage->aCell[pCur->idx];
         3354  +    oldCell = findCell(pPage, pCur->idx);
  3394   3355       if( !pPage->leaf ){
  3395         -      memcpy(&newCell[2], &oldCell[2], 4);
         3356  +      memcpy(newCell, oldCell, 4);
  3396   3357       }
  3397         -    szOld = cellSize(pPage, oldCell);
         3358  +    szOld = cellSizePtr(pPage, oldCell);
  3398   3359       rc = clearCell(pPage, oldCell);
  3399   3360       if( rc ) return rc;
  3400   3361       dropCell(pPage, pCur->idx, szOld);
  3401   3362     }else if( loc<0 && pPage->nCell>0 ){
  3402   3363       assert( pPage->leaf );
  3403   3364       pCur->idx++;
  3404   3365       pCur->infoValid = 0;
................................................................................
  3440   3401       return SQLITE_PERM;   /* Did not open this cursor for writing */
  3441   3402     }
  3442   3403     if( checkReadLocks(pCur) ){
  3443   3404       return SQLITE_LOCKED; /* The table pCur points to has a read lock */
  3444   3405     }
  3445   3406     rc = sqlite3pager_write(pPage->aData);
  3446   3407     if( rc ) return rc;
  3447         -  pCell = pPage->aCell[pCur->idx];
         3408  +  pCell = findCell(pPage, pCur->idx);
  3448   3409     if( !pPage->leaf ){
  3449         -    pgnoChild = get4byte(&pCell[2]);
         3410  +    pgnoChild = get4byte(pCell);
  3450   3411     }
  3451   3412     clearCell(pPage, pCell);
  3452   3413     if( !pPage->leaf ){
  3453   3414       /*
  3454   3415       ** The entry we are about to delete is not a leaf so if we do not
  3455   3416       ** do something we will leave a hole on an internal page.
  3456   3417       ** We have to fill the hole by moving in a cell from a leaf.  The
................................................................................
  3469   3430         if( rc!=SQLITE_NOMEM ) rc = SQLITE_CORRUPT;
  3470   3431         return rc;
  3471   3432       }
  3472   3433       rc = sqlite3pager_write(leafCur.pPage->aData);
  3473   3434       if( rc ) return rc;
  3474   3435       TRACE(("DELETE: table=%d delete internal from %d replace from leaf %d\n",
  3475   3436          pCur->pgnoRoot, pPage->pgno, leafCur.pPage->pgno));
  3476         -    dropCell(pPage, pCur->idx, cellSize(pPage, pCell));
  3477         -    pNext = leafCur.pPage->aCell[leafCur.idx];
  3478         -    szNext = cellSize(leafCur.pPage, pNext);
         3437  +    dropCell(pPage, pCur->idx, cellSizePtr(pPage, pCell));
         3438  +    pNext = findCell(leafCur.pPage, leafCur.idx);
         3439  +    szNext = cellSizePtr(leafCur.pPage, pNext);
  3479   3440       assert( sizeof(tempCell)>=szNext+4 );
  3480   3441       insertCell(pPage, pCur->idx, pNext-4, szNext+4, tempCell);
  3481         -    put4byte(pPage->aCell[pCur->idx]+2, pgnoChild);
         3442  +    put4byte(findOverflowCell(pPage, pCur->idx), pgnoChild);
  3482   3443       rc = balance(pPage);
  3483   3444       if( rc ) return rc;
  3484   3445       dropCell(leafCur.pPage, leafCur.idx, szNext);
  3485   3446       rc = balance(leafCur.pPage);
  3486   3447       releaseTempCursor(&leafCur);
  3487   3448     }else{
  3488   3449       TRACE(("DELETE: table=%d delete from leaf %d\n",
  3489   3450          pCur->pgnoRoot, pPage->pgno));
  3490         -    dropCell(pPage, pCur->idx, cellSize(pPage, pCell));
         3451  +    dropCell(pPage, pCur->idx, cellSizePtr(pPage, pCell));
  3491   3452       rc = balance(pPage);
  3492   3453     }
  3493   3454     moveToRoot(pCur);
  3494   3455     return rc;
  3495   3456   }
  3496   3457   
  3497   3458   /*
................................................................................
  3541   3502     int i;
  3542   3503   
  3543   3504     rc = getAndInitPage(pBt, pgno, &pPage, pParent);
  3544   3505     if( rc ) return rc;
  3545   3506     rc = sqlite3pager_write(pPage->aData);
  3546   3507     if( rc ) return rc;
  3547   3508     for(i=0; i<pPage->nCell; i++){
  3548         -    pCell = pPage->aCell[i];
         3509  +    pCell = findCell(pPage, i);
  3549   3510       if( !pPage->leaf ){
  3550         -      rc = clearDatabasePage(pBt, get4byte(&pCell[2]), pPage->pParent, 1);
         3511  +      rc = clearDatabasePage(pBt, get4byte(pCell), pPage->pParent, 1);
  3551   3512         if( rc ) return rc;
  3552   3513       }
  3553   3514       rc = clearCell(pPage, pCell);
  3554   3515       if( rc ) return rc;
  3555   3516     }
  3556   3517     if( !pPage->leaf ){
  3557         -    rc = clearDatabasePage(pBt, get4byte(&pPage->aData[6]), pPage->pParent, 1);
         3518  +    rc = clearDatabasePage(pBt, get4byte(&pPage->aData[8]), pPage->pParent, 1);
  3558   3519       if( rc ) return rc;
  3559   3520     }
  3560   3521     if( freePageFlag ){
  3561   3522       rc = freePage(pPage);
  3562   3523     }else{
  3563   3524       zeroPage(pPage, pPage->aData[0] | PTF_LEAF);
  3564   3525     }
................................................................................
  3692   3653   int sqlite3BtreePageDump(Btree *pBt, int pgno, int recursive){
  3693   3654     int rc;
  3694   3655     MemPage *pPage;
  3695   3656     int i, j, c;
  3696   3657     int nFree;
  3697   3658     u16 idx;
  3698   3659     int hdr;
         3660  +  int nCell;
  3699   3661     unsigned char *data;
  3700   3662     char range[20];
  3701   3663     unsigned char payload[20];
  3702   3664   
  3703   3665     rc = getPage(pBt, (Pgno)pgno, &pPage);
  3704   3666     if( rc ){
  3705   3667       return rc;
................................................................................
  3708   3670     data = pPage->aData;
  3709   3671     c = data[hdr];
  3710   3672     pPage->intKey = (c & (PTF_INTKEY|PTF_LEAFDATA))!=0;
  3711   3673     pPage->zeroData = (c & PTF_ZERODATA)!=0;
  3712   3674     pPage->leafData = (c & PTF_LEAFDATA)!=0;
  3713   3675     pPage->leaf = (c & PTF_LEAF)!=0;
  3714   3676     pPage->hasData = !(pPage->zeroData || (!pPage->leaf && pPage->leafData));
         3677  +  nCell = get2byte(&data[hdr+3]);
  3715   3678     printf("PAGE %d:  flags=0x%02x  frag=%d   parent=%d\n", pgno,
  3716         -    data[hdr], data[hdr+5], 
         3679  +    data[hdr], data[hdr+7], 
  3717   3680       (pPage->isInit && pPage->pParent) ? pPage->pParent->pgno : 0);
  3718         -  i = 0;
  3719   3681     assert( hdr == (pgno==1 ? 100 : 0) );
  3720         -  idx = get2byte(&data[hdr+3]);
  3721         -  while( idx>0 && idx<=pBt->usableSize ){
         3682  +  idx = hdr + 12 - pPage->leaf*4;
         3683  +  for(i=0; i<nCell; i++){
  3722   3684       CellInfo info;
  3723   3685       Pgno child;
  3724         -    unsigned char *pCell = &data[idx];
         3686  +    unsigned char *pCell;
  3725   3687       int sz;
         3688  +    int addr;
  3726   3689   
  3727         -    pCell = &data[idx];
  3728         -    parseCell(pPage, pCell, &info);
         3690  +    addr = get2byte(&data[idx + 2*i]);
         3691  +    pCell = &data[addr];
         3692  +    parseCellPtr(pPage, pCell, &info);
  3729   3693       sz = info.nSize;
  3730         -    sprintf(range,"%d..%d", idx, idx+sz-1);
         3694  +    sprintf(range,"%d..%d", addr, addr+sz-1);
  3731   3695       if( pPage->leaf ){
  3732   3696         child = 0;
  3733   3697       }else{
  3734         -      child = get4byte(&pCell[2]);
         3698  +      child = get4byte(pCell);
  3735   3699       }
  3736   3700       sz = info.nData;
  3737   3701       if( !pPage->intKey ) sz += info.nKey;
  3738   3702       if( sz>sizeof(payload)-1 ) sz = sizeof(payload)-1;
  3739   3703       memcpy(payload, &pCell[info.nHeader], sz);
  3740   3704       for(j=0; j<sz; j++){
  3741   3705         if( payload[j]<0x20 || payload[j]>0x7f ) payload[j] = '.';
  3742   3706       }
  3743   3707       payload[sz] = 0;
  3744   3708       printf(
  3745   3709         "cell %2d: i=%-10s chld=%-4d nk=%-4lld nd=%-4d payload=%s\n",
  3746   3710         i, range, child, info.nKey, info.nData, payload
  3747   3711       );
  3748         -    if( pPage->isInit && pPage->aCell[i]!=pCell ){
  3749         -      printf("**** aCell[%d] does not match on prior entry ****\n", i);
  3750         -    }
  3751         -    i++;
  3752         -    idx = get2byte(pCell);
  3753         -  }
  3754         -  if( idx!=0 ){
  3755         -    printf("ERROR: next cell index out of range: %d\n", idx);
  3756   3712     }
  3757   3713     if( !pPage->leaf ){
  3758         -    printf("right_child: %d\n", get4byte(&data[hdr+6]));
         3714  +    printf("right_child: %d\n", get4byte(&data[hdr+8]));
  3759   3715     }
  3760   3716     nFree = 0;
  3761   3717     i = 0;
  3762   3718     idx = get2byte(&data[hdr+1]);
  3763   3719     while( idx>0 && idx<pPage->pBt->usableSize ){
  3764   3720       int sz = get2byte(&data[idx+2]);
  3765   3721       sprintf(range,"%d..%d", idx, idx+sz-1);
................................................................................
  3769   3725       idx = get2byte(&data[idx]);
  3770   3726       i++;
  3771   3727     }
  3772   3728     if( idx!=0 ){
  3773   3729       printf("ERROR: next freeblock index out of range: %d\n", idx);
  3774   3730     }
  3775   3731     if( recursive && !pPage->leaf ){
  3776         -    idx = get2byte(&data[hdr+3]);
  3777         -    while( idx>0 && idx<pBt->usableSize ){
  3778         -      unsigned char *pCell = &data[idx];
  3779         -      sqlite3BtreePageDump(pBt, get4byte(&pCell[2]), 1);
         3732  +    for(i=0; i<nCell; i++){
         3733  +      unsigned char *pCell = findCell(pPage, i);
         3734  +      sqlite3BtreePageDump(pBt, get4byte(pCell), 1);
  3780   3735         idx = get2byte(pCell);
  3781   3736       }
  3782         -    sqlite3BtreePageDump(pBt, get4byte(&data[hdr+6]), 1);
         3737  +    sqlite3BtreePageDump(pBt, get4byte(&data[hdr+8]), 1);
  3783   3738     }
  3784   3739     sqlite3pager_unref(data);
  3785   3740     fflush(stdout);
  3786   3741     return SQLITE_OK;
  3787   3742   }
  3788   3743   #endif
  3789   3744   
................................................................................
  3810   3765     pageIntegrity(pPage);
  3811   3766     assert( pPage->isInit );
  3812   3767     aResult[0] = sqlite3pager_pagenumber(pPage->aData);
  3813   3768     assert( aResult[0]==pPage->pgno );
  3814   3769     aResult[1] = pCur->idx;
  3815   3770     aResult[2] = pPage->nCell;
  3816   3771     if( pCur->idx>=0 && pCur->idx<pPage->nCell ){
  3817         -    aResult[3] = cellSize(pPage, pPage->aCell[pCur->idx]);
  3818         -    aResult[6] = pPage->leaf ? 0 : get4byte(&pPage->aCell[pCur->idx][2]);
         3772  +    u8 *pCell = findCell(pPage, pCur->idx);
         3773  +    aResult[3] = cellSizePtr(pPage, pCell);
         3774  +    aResult[6] = pPage->leaf ? 0 : get4byte(pCell);
  3819   3775     }else{
  3820   3776       aResult[3] = 0;
  3821   3777       aResult[6] = 0;
  3822   3778     }
  3823   3779     aResult[4] = pPage->nFree;
  3824   3780     cnt = 0;
  3825   3781     idx = get2byte(&pPage->aData[pPage->hdrOffset+1]);
  3826   3782     while( idx>0 && idx<pPage->pBt->usableSize ){
  3827   3783       cnt++;
  3828   3784       idx = get2byte(&pPage->aData[idx]);
  3829   3785     }
  3830   3786     aResult[5] = cnt;
  3831         -  aResult[7] = pPage->leaf ? 0 : get4byte(&pPage->aData[pPage->hdrOffset+6]);
         3787  +  aResult[7] = pPage->leaf ? 0 : get4byte(&pPage->aData[pPage->hdrOffset+8]);
  3832   3788     return SQLITE_OK;
  3833   3789   }
  3834   3790   #endif
  3835   3791   
  3836   3792   /*
  3837   3793   ** Return the pager associated with a BTree.  This routine is used for
  3838   3794   ** testing and debugging only.
................................................................................
  3960   3916     char *zLowerBound,    /* All keys should be greater than this, if not NULL */
  3961   3917     int nLower,           /* Number of characters in zLowerBound */
  3962   3918     char *zUpperBound,    /* All keys should be less than this, if not NULL */
  3963   3919     int nUpper            /* Number of characters in zUpperBound */
  3964   3920   ){
  3965   3921     MemPage *pPage;
  3966   3922     int i, rc, depth, d2, pgno, cnt;
  3967         -  int hdr;
         3923  +  int hdr, cellStart;
         3924  +  int nCell;
  3968   3925     u8 *data;
  3969   3926     BtCursor cur;
  3970   3927     Btree *pBt;
  3971   3928     int maxLocal, usableSize;
  3972   3929     char zMsg[100];
  3973   3930     char zContext[100];
  3974   3931     char hit[MX_PAGE_SIZE];
................................................................................
  4000   3957       u8 *pCell;
  4001   3958       int sz;
  4002   3959       CellInfo info;
  4003   3960   
  4004   3961       /* Check payload overflow pages
  4005   3962       */
  4006   3963       sprintf(zContext, "On tree page %d cell %d: ", iPage, i);
  4007         -    pCell = pPage->aCell[i];
  4008         -    parseCell(pPage, pCell, &info);
         3964  +    pCell = findCell(pPage,i);
         3965  +    parseCellPtr(pPage, pCell, &info);
  4009   3966       sz = info.nData;
  4010   3967       if( !pPage->intKey ) sz += info.nKey;
  4011   3968       if( sz>info.nLocal ){
  4012   3969         int nPage = (sz - info.nLocal + usableSize - 5)/(usableSize - 4);
  4013   3970         checkList(pCheck, 0, get4byte(&pCell[info.iOverflow]),nPage,zContext);
  4014   3971       }
  4015   3972   
  4016   3973       /* Check sanity of left child page.
  4017   3974       */
  4018   3975       if( !pPage->leaf ){
  4019         -      pgno = get4byte(&pCell[2]);
         3976  +      pgno = get4byte(pCell);
  4020   3977         d2 = checkTreePage(pCheck,pgno,pPage,zContext,0,0,0,0);
  4021   3978         if( i>0 && d2!=depth ){
  4022   3979           checkAppendMsg(pCheck, zContext, "Child page depth differs");
  4023   3980         }
  4024   3981         depth = d2;
  4025   3982       }
  4026   3983     }
  4027   3984     if( !pPage->leaf ){
  4028         -    pgno = get4byte(&pPage->aData[pPage->hdrOffset+6]);
         3985  +    pgno = get4byte(&pPage->aData[pPage->hdrOffset+8]);
  4029   3986       sprintf(zContext, "On page %d at right child: ", iPage);
  4030   3987       checkTreePage(pCheck, pgno, pPage, zContext,0,0,0,0);
  4031   3988     }
  4032   3989    
  4033   3990     /* Check for complete coverage of the page
  4034   3991     */
  4035         -  memset(hit, 0, usableSize);
  4036         -  memset(hit, 1, pPage->hdrOffset+10-4*(pPage->leaf));
  4037   3992     data = pPage->aData;
  4038   3993     hdr = pPage->hdrOffset;
  4039         -  for(cnt=0, i=get2byte(&data[hdr+3]); i>0 && i<usableSize && cnt<10000; cnt++){
  4040         -    int size = cellSize(pPage, &data[i]);
         3994  +  memset(hit, 0, usableSize);
         3995  +  memset(hit, 1, get2byte(&data[hdr+5]));
         3996  +  nCell = get2byte(&data[hdr+3]);
         3997  +  cellStart = hdr + 12 - 4*pPage->leaf;
         3998  +  for(i=0; i<nCell; i++){
         3999  +    int pc = get2byte(&data[cellStart+i*2]);
         4000  +    int size = cellSizePtr(pPage, &data[pc]);
  4041   4001       int j;
  4042         -    for(j=i+size-1; j>=i; j--) hit[j]++;
  4043         -    i = get2byte(&data[i]);
         4002  +    for(j=pc+size-1; j>=pc; j--) hit[j]++;
  4044   4003     }
  4045   4004     for(cnt=0, i=get2byte(&data[hdr+1]); i>0 && i<usableSize && cnt<10000; cnt++){
  4046   4005       int size = get2byte(&data[i+2]);
  4047   4006       int j;
  4048   4007       for(j=i+size-1; j>=i; j--) hit[j]++;
  4049   4008       i = get2byte(&data[i]);
  4050   4009     }
................................................................................
  4053   4012         cnt++;
  4054   4013       }else if( hit[i]>1 ){
  4055   4014         sprintf(zMsg, "Multiple uses for byte %d of page %d", i, iPage);
  4056   4015         checkAppendMsg(pCheck, zMsg, 0);
  4057   4016         break;
  4058   4017       }
  4059   4018     }
  4060         -  if( cnt!=data[hdr+5] ){
         4019  +  if( cnt!=data[hdr+7] ){
  4061   4020       sprintf(zMsg, "Fragmented space is %d byte reported as %d on page %d",
  4062         -        cnt, data[hdr+5], iPage);
         4021  +        cnt, data[hdr+7], iPage);
  4063   4022       checkAppendMsg(pCheck, zMsg, 0);
  4064   4023     }
  4065   4024   
  4066   4025     releasePage(pPage);
  4067   4026     return depth+1;
  4068   4027   }
  4069   4028   
................................................................................
  4146   4105   }
  4147   4106   
  4148   4107   /*
  4149   4108   ** Copy the complete content of pBtFrom into pBtTo.  A transaction
  4150   4109   ** must be active for both files.
  4151   4110   **
  4152   4111   ** The size of file pBtFrom may be reduced by this operation.
  4153         -** If anything goes wrong, the transaction on pBtTo is rolled back.
         4112  +** If anything goes wrong, the transaction on pBtFrom is rolled back.
  4154   4113   */
  4155   4114   int sqlite3BtreeCopyFile(Btree *pBtTo, Btree *pBtFrom){
  4156   4115     int rc = SQLITE_OK;
  4157   4116     Pgno i, nPage, nToPage;
  4158   4117   
  4159   4118     if( !pBtTo->inTrans || !pBtFrom->inTrans ) return SQLITE_ERROR;
  4160   4119     if( pBtTo->pCursor ) return SQLITE_BUSY;