SQLite

Check-in [11750c6aee]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Simplify things by rolling the functionality of balance_shallower() into balance_nonroot(). (CVS 6808)
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 11750c6aee6aa05b2627ad9dfb2fbcdfe8944168
User & Date: danielk1977 2009-06-24 05:40:34.000
Context
2009-06-24
10:26
Remove the sqlite3Assert() function. The ALWAYS() and NEVER() macros call assert() directly when compiled with SQLITE_DEBUG. (CVS 6809) (check-in: d8fc373fef user: drh tags: trunk)
05:40
Simplify things by rolling the functionality of balance_shallower() into balance_nonroot(). (CVS 6808) (check-in: 11750c6aee user: danielk1977 tags: trunk)
2009-06-23
20:28
Enhance autoincrement so that it works with triggers that also do autoincrement inserts, even multiple inserts into the same table. Ticket #3928 (CVS 6807) (check-in: 1330993de8 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to src/btree.c.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
/*
** 2004 April 6
**
** The author disclaims copyright to this source code.  In place of
** a legal notice, here is a blessing:
**
**    May you do good and not evil.
**    May you find forgiveness for yourself and forgive others.
**    May you share freely, never taking more than you give.
**
*************************************************************************
** $Id: btree.c,v 1.641 2009/06/23 16:40:18 danielk1977 Exp $
**
** This file implements a external (disk-based) database using BTrees.
** See the header comment on "btreeInt.h" for additional information.
** Including a description of file format and an overview of operation.
*/
#include "btreeInt.h"












|







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
/*
** 2004 April 6
**
** The author disclaims copyright to this source code.  In place of
** a legal notice, here is a blessing:
**
**    May you do good and not evil.
**    May you find forgiveness for yourself and forgive others.
**    May you share freely, never taking more than you give.
**
*************************************************************************
** $Id: btree.c,v 1.642 2009/06/24 05:40:34 danielk1977 Exp $
**
** This file implements a external (disk-based) database using BTrees.
** See the header comment on "btreeInt.h" for additional information.
** Including a description of file format and an overview of operation.
*/
#include "btreeInt.h"

5384
5385
5386
5387
5388
5389
5390
5391
5392
5393
5394
5395
5396
5397
5398
5399
5400
5401
5402
5403
5404
5405
5406
5407
5408
5409
5410
5411
5412
5413
5414
5415
5416
5417
5418
5419
5420
5421
5422
5423
5424
5425
5426
5427
5428
5429
5430
5431
5432
5433
5434
5435
5436
5437
5438
  ** for any b-tree or overflow pages that pTo now contains the pointers to. */
  if( ISAUTOVACUUM ){
    rc = setChildPtrmaps(pTo);
  }
  return rc;
}

/*
** This routine is called on the root page of a btree when the root
** page contains no cells. This is an opportunity to make the tree
** shallower by one level.
*/
static int balance_shallower(MemPage *pRoot, MemPage *pChild){
  /* The root page is empty but has one child.  Transfer the
  ** information from that one child into the root page if it 
  ** will fit.  This reduces the depth of the tree by one.
  **
  ** If the root page is page 1, it has less space available than
  ** its child (due to the 100 byte header that occurs at the beginning
  ** of the database fle), so it might not be able to hold all of the 
  ** information currently contained in the child.  If this is the 
  ** case, then do not do the transfer.  Leave page 1 empty except
  ** for the right-pointer to the child page.  The child page becomes
  ** the virtual root of the tree.
  */
  int rc = SQLITE_OK;                        /* Return code */
  int const hdr = pRoot->hdrOffset;          /* Offset of root page header */

  assert( sqlite3_mutex_held(pRoot->pBt->mutex) );
  assert( pRoot->nCell==0 );
  assert( pChild->pgno==get4byte(&pRoot->aData[pRoot->hdrOffset+8]) );
  assert( hdr==0 || pRoot->pgno==1 );

  if( pChild->nFree>=hdr ){
    if( hdr ){
      rc = defragmentPage(pChild);
    }
    if( rc==SQLITE_OK ){
      rc = copyNodeContent(pChild, pRoot);
    }
    if( rc==SQLITE_OK ){
      rc = freePage(pChild);
    }
  }

  return rc;
}

/*
** This routine redistributes cells on the iParentIdx'th child of pParent
** (hereafter "the page") and up to 2 siblings so that all pages have about the
** same amount of free space. Usually a single sibling on either side of the
** page are used in the balancing, though both siblings might come from one
** side if the page is the first or last child of its parent. If the page 
** has fewer than 2 siblings (something which can only happen if the page







<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<







5384
5385
5386
5387
5388
5389
5390









































5391
5392
5393
5394
5395
5396
5397
  ** for any b-tree or overflow pages that pTo now contains the pointers to. */
  if( ISAUTOVACUUM ){
    rc = setChildPtrmaps(pTo);
  }
  return rc;
}










































/*
** This routine redistributes cells on the iParentIdx'th child of pParent
** (hereafter "the page") and up to 2 siblings so that all pages have about the
** same amount of free space. Usually a single sibling on either side of the
** page are used in the balancing, though both siblings might come from one
** side if the page is the first or last child of its parent. If the page 
** has fewer than 2 siblings (something which can only happen if the page
5918
5919
5920
5921
5922
5923
5924























5925
5926
5927
5928
5929
5930
5931
5932
5933
5934
5935
5936
5937
5938
5939
5940
5941
5942
5943
5944
5945
5946
5947
5948
5949
5950
5951
5952
5953
5954
5955
5956
5957
5958
5959
5960
5961
5962
5963
  assert( nOld>0 );
  assert( nNew>0 );
  if( (pageFlags & PTF_LEAF)==0 ){
    u8 *zChild = &apCopy[nOld-1]->aData[8];
    memcpy(&apNew[nNew-1]->aData[8], zChild, 4);
  }
























  /* Fix the pointer-map entries for all the cells that were shifted around. 
  ** There are several different types of pointer-map entries that need to
  ** be dealt with by this routine. Some of these have been set already, but
  ** many have not. The following is a summary:
  **
  **   1) The entries associated with new sibling pages that were not
  **      siblings when this function was called. These have already
  **      been set. We don't need to worry about old siblings that were
  **      moved to the free-list - the freePage() code has taken care
  **      of those.
  **
  **   2) The pointer-map entries associated with the first overflow
  **      page in any overflow chains used by new divider cells. These 
  **      have also already been taken care of by the insertCell() code.
  **
  **   3) If the sibling pages are not leaves, then the child pages of
  **      cells stored on the sibling pages may need to be updated.
  **
  **   4) If the sibling pages are not internal intkey nodes, then any
  **      overflow pages used by these cells may need to be updated
  **      (internal intkey nodes never contain pointers to overflow pages).
  **
  **   5) If the sibling pages are not leaves, then the pointer-map
  **      entries for the right-child pages of each sibling may need
  **      to be updated.
  **
  ** Cases 1 and 2 are dealt with above by other code. The following
  ** block deals with cases 3 and 4. Since setting a pointer map entry
  ** is a relatively expensive operation, this code only sets pointer
  ** map entries for child or overflow pages that have actually moved
  ** between pages.  */
  if( ISAUTOVACUUM ){
    MemPage *pNew = apNew[0];
    MemPage *pOld = apCopy[0];
    int nOverflow = pOld->nOverflow;
    int iNextOld = pOld->nCell + nOverflow;
    int iOverflow = (nOverflow ? pOld->aOvfl[0].idx : -1);
    j = 0;                             /* Current 'old' sibling page */
    k = 0;                             /* Current 'new' sibling page */







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<







5877
5878
5879
5880
5881
5882
5883
5884
5885
5886
5887
5888
5889
5890
5891
5892
5893
5894
5895
5896
5897
5898
5899
5900
5901
5902
5903
5904
5905
5906
5907
5908
5909
5910
5911
5912
5913
5914
5915
5916
5917
5918
5919
5920
5921
5922
5923
5924
5925
5926
5927
5928
5929
5930
5931
5932
5933
5934
5935
5936
5937

5938
5939
5940
5941
5942
5943
5944
  assert( nOld>0 );
  assert( nNew>0 );
  if( (pageFlags & PTF_LEAF)==0 ){
    u8 *zChild = &apCopy[nOld-1]->aData[8];
    memcpy(&apNew[nNew-1]->aData[8], zChild, 4);
  }

  if( isRoot && pParent->nCell==0 && pParent->hdrOffset<=apNew[0]->nFree ){
    /* The root page of the b-tree now contains no cells. The only sibling
    ** page is the right-child of the parent. Copy the contents of the
    ** child page into the parent, decreasing the overall height of the
    ** b-tree structure by one. This is described as the "balance-shallower"
    ** sub-algorithm in some documentation.
    **
    ** If this is an auto-vacuum database, the call to copyNodeContent() 
    ** sets all pointer-map entries corresponding to database image pages 
    ** for which the pointer is stored within the content being copied.
    **
    ** The second assert below verifies that the child page is defragmented
    ** (it must be, as it was just reconstructed using assemblePage()). This
    ** is important if the parent page happens to be page 1 of the database
    ** image.  */
    assert( nNew==1 );
    assert( apNew[0]->nFree == 
        (get2byte(&apNew[0]->aData[5])-apNew[0]->cellOffset-apNew[0]->nCell*2) 
    );
    if( SQLITE_OK==(rc = copyNodeContent(apNew[0], pParent)) ){
      rc = freePage(apNew[0]);
    }
  }else if( ISAUTOVACUUM ){
    /* Fix the pointer-map entries for all the cells that were shifted around. 
    ** There are several different types of pointer-map entries that need to
    ** be dealt with by this routine. Some of these have been set already, but
    ** many have not. The following is a summary:
    **
    **   1) The entries associated with new sibling pages that were not
    **      siblings when this function was called. These have already
    **      been set. We don't need to worry about old siblings that were
    **      moved to the free-list - the freePage() code has taken care
    **      of those.
    **
    **   2) The pointer-map entries associated with the first overflow
    **      page in any overflow chains used by new divider cells. These 
    **      have also already been taken care of by the insertCell() code.
    **
    **   3) If the sibling pages are not leaves, then the child pages of
    **      cells stored on the sibling pages may need to be updated.
    **
    **   4) If the sibling pages are not internal intkey nodes, then any
    **      overflow pages used by these cells may need to be updated
    **      (internal intkey nodes never contain pointers to overflow pages).
    **
    **   5) If the sibling pages are not leaves, then the pointer-map
    **      entries for the right-child pages of each sibling may need
    **      to be updated.
    **
    ** Cases 1 and 2 are dealt with above by other code. The next
    ** block deals with cases 3 and 4 and the one after that, case 5. Since
    ** setting a pointer map entry is a relatively expensive operation, this
    ** code only sets pointer map entries for child or overflow pages that have
    ** actually moved between pages.  */

    MemPage *pNew = apNew[0];
    MemPage *pOld = apCopy[0];
    int nOverflow = pOld->nOverflow;
    int iNextOld = pOld->nCell + nOverflow;
    int iOverflow = (nOverflow ? pOld->aOvfl[0].idx : -1);
    j = 0;                             /* Current 'old' sibling page */
    k = 0;                             /* Current 'new' sibling page */
6028
6029
6030
6031
6032
6033
6034
6035
6036
6037
6038
6039
6040
6041
6042
6043
6044
6045
6046
6047
6048
6049
6050
6051
#endif
  }

  assert( pParent->isInit );
  TRACE(("BALANCE: finished: old=%d new=%d cells=%d\n",
          nOld, nNew, nCell));

  if( rc==SQLITE_OK && pParent->nCell==0 && isRoot ){
    /* The root page of the b-tree now contains no cells. If the root-page 
    ** is not also a leaf page, it will have a single child page. Call 
    ** balance_shallower to attempt to copy the contents of the single
    ** child-page into the root page (this may not be possible if the
    ** root page is page 1).  */ 
    assert( nNew==1 );
    rc = balance_shallower(pParent, apNew[0]);
  }
 
  /*
  ** Cleanup before returning.
  */
balance_cleanup:
  sqlite3ScratchFree(apCell);
  for(i=0; i<nOld; i++){
    releasePage(apOld[i]);







<
<
<
<
<
<
<
<
<
<







6009
6010
6011
6012
6013
6014
6015










6016
6017
6018
6019
6020
6021
6022
#endif
  }

  assert( pParent->isInit );
  TRACE(("BALANCE: finished: old=%d new=%d cells=%d\n",
          nOld, nNew, nCell));











  /*
  ** Cleanup before returning.
  */
balance_cleanup:
  sqlite3ScratchFree(apCell);
  for(i=0; i<nOld; i++){
    releasePage(apOld[i]);
6123
6124
6125
6126
6127
6128
6129
6130
6131
6132
6133
6134
6135
6136
6137
6138
6139
6140
6141
** some way. This function figures out if this modification means the
** tree needs to be balanced, and if so calls the appropriate balancing 
** routine. Balancing routines are:
**
**   balance_quick()
**   balance_deeper()
**   balance_nonroot()
**
** If built with SQLITE_DEBUG, pCur->pagesShuffled is set to true if 
** balance_shallower(), balance_deeper() or balance_nonroot() is called.
** If none of these functions are invoked, pCur->pagesShuffled is left
** unmodified.
*/
static int balance(BtCursor *pCur){
  int rc = SQLITE_OK;
  const int nMin = pCur->pBt->usableSize * 2 / 3;
  u8 aBalanceQuickSpace[13];
  u8 *pFree = 0;








<
<
<
<
<







6094
6095
6096
6097
6098
6099
6100





6101
6102
6103
6104
6105
6106
6107
** some way. This function figures out if this modification means the
** tree needs to be balanced, and if so calls the appropriate balancing 
** routine. Balancing routines are:
**
**   balance_quick()
**   balance_deeper()
**   balance_nonroot()





*/
static int balance(BtCursor *pCur){
  int rc = SQLITE_OK;
  const int nMin = pCur->pBt->usableSize * 2 / 3;
  u8 aBalanceQuickSpace[13];
  u8 *pFree = 0;

6157
6158
6159
6160
6161
6162
6163
6164
6165
6166
6167
6168
6169
6170
6171
        rc = balance_deeper(pPage, &pCur->apPage[1]);
        if( rc==SQLITE_OK ){
          pCur->iPage = 1;
          pCur->aiIdx[0] = 0;
          pCur->aiIdx[1] = 0;
          assert( pCur->apPage[1]->nOverflow );
        }
        VVA_ONLY( pCur->pagesShuffled = 1 );
      }else{
        break;
      }
    }else if( pPage->nOverflow==0 && pPage->nFree<=nMin ){
      break;
    }else{
      MemPage * const pParent = pCur->apPage[iPage-1];







<







6123
6124
6125
6126
6127
6128
6129

6130
6131
6132
6133
6134
6135
6136
        rc = balance_deeper(pPage, &pCur->apPage[1]);
        if( rc==SQLITE_OK ){
          pCur->iPage = 1;
          pCur->aiIdx[0] = 0;
          pCur->aiIdx[1] = 0;
          assert( pCur->apPage[1]->nOverflow );
        }

      }else{
        break;
      }
    }else if( pPage->nOverflow==0 && pPage->nFree<=nMin ){
      break;
    }else{
      MemPage * const pParent = pCur->apPage[iPage-1];
6225
6226
6227
6228
6229
6230
6231
6232
6233
6234
6235
6236
6237
6238
6239
            sqlite3PageFree(pFree);
          }

          /* The pSpace buffer will be freed after the next call to
          ** balance_nonroot(), or just before this function returns, whichever
          ** comes first. */
          pFree = pSpace;
          VVA_ONLY( pCur->pagesShuffled = 1 );
        }
      }

      pPage->nOverflow = 0;

      /* The next iteration of the do-loop balances the parent page. */
      releasePage(pPage);







<







6190
6191
6192
6193
6194
6195
6196

6197
6198
6199
6200
6201
6202
6203
            sqlite3PageFree(pFree);
          }

          /* The pSpace buffer will be freed after the next call to
          ** balance_nonroot(), or just before this function returns, whichever
          ** comes first. */
          pFree = pSpace;

        }
      }

      pPage->nOverflow = 0;

      /* The next iteration of the do-loop balances the parent page. */
      releasePage(pPage);
Changes to src/btreeInt.h.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
/*
** 2004 April 6
**
** The author disclaims copyright to this source code.  In place of
** a legal notice, here is a blessing:
**
**    May you do good and not evil.
**    May you find forgiveness for yourself and forgive others.
**    May you share freely, never taking more than you give.
**
*************************************************************************
** $Id: btreeInt.h,v 1.48 2009/06/22 12:05:10 drh Exp $
**
** This file implements a external (disk-based) database using BTrees.
** For a detailed discussion of BTrees, refer to
**
**     Donald E. Knuth, THE ART OF COMPUTER PROGRAMMING, Volume 3:
**     "Sorting And Searching", pages 473-480. Addison-Wesley
**     Publishing Company, Reading, Massachusetts.











|







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
/*
** 2004 April 6
**
** The author disclaims copyright to this source code.  In place of
** a legal notice, here is a blessing:
**
**    May you do good and not evil.
**    May you find forgiveness for yourself and forgive others.
**    May you share freely, never taking more than you give.
**
*************************************************************************
** $Id: btreeInt.h,v 1.49 2009/06/24 05:40:34 danielk1977 Exp $
**
** This file implements a external (disk-based) database using BTrees.
** For a detailed discussion of BTrees, refer to
**
**     Donald E. Knuth, THE ART OF COMPUTER PROGRAMMING, Volume 3:
**     "Sorting And Searching", pages 473-480. Addison-Wesley
**     Publishing Company, Reading, Massachusetts.
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
  void *pKey;      /* Saved key that was cursor's last known position */
  i64 nKey;        /* Size of pKey, or last integer key */
  int skip;        /* (skip<0) -> Prev() is a no-op. (skip>0) -> Next() is */
#ifndef SQLITE_OMIT_INCRBLOB
  u8 isIncrblobHandle;      /* True if this cursor is an incr. io handle */
  Pgno *aOverflow;          /* Cache of overflow page locations */
#endif
#ifndef NDEBUG
  u8 pagesShuffled;         /* True if Btree pages are rearranged by balance()*/
#endif
  i16 iPage;                            /* Index of current page in apPage */
  MemPage *apPage[BTCURSOR_MAX_DEPTH];  /* Pages from root to current page */
  u16 aiIdx[BTCURSOR_MAX_DEPTH];        /* Current index in apPage[i] */
};

/*
** Potential values for BtCursor.eState.







<
<
<







472
473
474
475
476
477
478



479
480
481
482
483
484
485
  void *pKey;      /* Saved key that was cursor's last known position */
  i64 nKey;        /* Size of pKey, or last integer key */
  int skip;        /* (skip<0) -> Prev() is a no-op. (skip>0) -> Next() is */
#ifndef SQLITE_OMIT_INCRBLOB
  u8 isIncrblobHandle;      /* True if this cursor is an incr. io handle */
  Pgno *aOverflow;          /* Cache of overflow page locations */
#endif



  i16 iPage;                            /* Index of current page in apPage */
  MemPage *apPage[BTCURSOR_MAX_DEPTH];  /* Pages from root to current page */
  u16 aiIdx[BTCURSOR_MAX_DEPTH];        /* Current index in apPage[i] */
};

/*
** Potential values for BtCursor.eState.