/ Check-in [9f1caa53]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:The btree.test test is no working with integrity_check enabled. (CVS 1330)
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 9f1caa530e69aaf202debac36b6a46d707f362d7
User & Date: drh 2004-05-09 11:51:39
Context
2004-05-09
20:40
More btree.c bug fixing. It's getting closer but still not there yet. Move obsolete test scripts into the attic. (CVS 1331) check-in: 9379c7c9 user: drh tags: trunk
11:51
The btree.test test is no working with integrity_check enabled. (CVS 1330) check-in: 9f1caa53 user: drh tags: trunk
01:35
Begin trying to get integrity checking working on the new btree.c. (CVS 1329) check-in: 499569da user: drh tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to src/btree.c.

5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
...
214
215
216
217
218
219
220

221
222
223
224
225
226
227
...
373
374
375
376
377
378
379








































































380
381
382
383
384
385
386
...
388
389
390
391
392
393
394


395
396
397
398
399
400
401
...
404
405
406
407
408
409
410

411

412
413
414
415
416
417
418
419
420
...
595
596
597
598
599
600
601
602
603
604
605
606
607
608

609
610
611
612
613
614
615
616
617

618
619
620
621
622
623
624
...
653
654
655
656
657
658
659

660
661
662
663
664
665
666
667
668
669
670
671
672


673
674
675
676
677
678
679
...
681
682
683
684
685
686
687





688
689
690
691
692
693
694
...
741
742
743
744
745
746
747

748
749
750
751
752
753
754
...
992
993
994
995
996
997
998
999

1000
1001
1002
1003
1004
1005
1006


















1007
1008
1009
1010
1011
1012
1013
....
1263
1264
1265
1266
1267
1268
1269

1270
1271
1272
1273
1274
1275
1276
....
1305
1306
1307
1308
1309
1310
1311

1312
1313
1314
1315
1316
1317
1318
....
1427
1428
1429
1430
1431
1432
1433

1434
1435
1436
1437
1438
1439
1440
....
1463
1464
1465
1466
1467
1468
1469

1470
1471
1472
1473
1474
1475
1476
....
1512
1513
1514
1515
1516
1517
1518

1519
1520
1521
1522
1523
1524
1525
....
1535
1536
1537
1538
1539
1540
1541
1542
1543

1544
1545
1546
1547
1548
1549
1550
....
1558
1559
1560
1561
1562
1563
1564

1565
1566

1567
1568
1569
1570
1571
1572
1573
....
1609
1610
1611
1612
1613
1614
1615

1616
1617
1618
1619
1620
1621
1622
....
1757
1758
1759
1760
1761
1762
1763

1764
1765
1766
1767
1768
1769
1770
....
2020
2021
2022
2023
2024
2025
2026


2027
2028
2029
2030
2031
2032
2033
....
2191
2192
2193
2194
2195
2196
2197

2198
2199
2200
2201
2202
2203
2204
2205

2206
2207
2208
2209
2210
2211
2212
....
2234
2235
2236
2237
2238
2239
2240
2241
2242
2243

2244
2245
2246
2247

2248
2249
2250
2251
2252

2253
2254
2255
2256
2257
2258
2259

















2260
2261
2262
2263
2264
2265
2266
2267
2268
2269
2270
2271
2272
2273

2274
2275
2276
2277
2278
2279
2280
2281
2282
2283
2284
2285
2286
2287
2288
2289
2290

2291
2292















2293
2294
2295
2296
2297
2298
2299
2300
2301
2302
2303
2304
2305

2306
2307
2308
2309
2310
2311
2312
2313

2314
2315
2316
2317
2318
2319
2320
....
2325
2326
2327
2328
2329
2330
2331
2332
2333
2334
2335
2336
2337
2338
2339
....
2352
2353
2354
2355
2356
2357
2358









2359
2360
2361
2362
2363
2364
2365
....
2417
2418
2419
2420
2421
2422
2423

2424
2425
2426
2427
2428
2429
2430
....
2447
2448
2449
2450
2451
2452
2453

2454
2455
2456
2457
2458
2459
2460
2461

2462
2463
2464
2465
2466
2467
2468
....
2485
2486
2487
2488
2489
2490
2491

2492
2493
2494

2495
2496
2497
2498
2499
2500
2501
2502

2503
2504
2505
2506
2507
2508
2509
2510
2511
2512

2513
2514
2515
2516
2517
2518
2519
2520
2521
2522
2523
2524
2525
2526
2527
2528
2529
2530

2531
2532
2533
2534
2535
2536

2537
2538
2539
2540
2541
2542
2543
....
2606
2607
2608
2609
2610
2611
2612


2613
2614
2615
2616
2617
2618
2619
2620
....
2631
2632
2633
2634
2635
2636
2637
2638
2639
2640
2641

2642
2643
2644
2645
2646
2647
2648
2649
....
2687
2688
2689
2690
2691
2692
2693

2694
2695
2696
2697
2698
2699
2700
2701

2702
2703
2704
2705
2706
2707
2708
2709
2710
2711
2712
2713
2714
2715
2716
2717
2718
2719
2720
....
2800
2801
2802
2803
2804
2805
2806



2807

2808
2809
2810
2811
2812
2813
2814
2815
2816
2817
2818
2819
2820
2821
2822
2823


2824
2825
2826
2827
2828
2829
2830
....
2968
2969
2970
2971
2972
2973
2974

2975
2976
2977
2978
2979
2980
2981
2982
2983
2984
2985
2986
2987

2988
2989

2990
2991
2992
2993
2994
2995
2996
....
3015
3016
3017
3018
3019
3020
3021
3022
3023
3024
3025
3026
3027
3028
3029
....
3186
3187
3188
3189
3190
3191
3192
3193
3194

3195
3196
3197
3198
3199
3200
3201
....
3283
3284
3285
3286
3287
3288
3289
3290
3291
3292


3293
3294
3295
3296
3297
3298
3299
....
3435
3436
3437
3438
3439
3440
3441
3442
3443
3444
3445
3446
3447
3448
3449
3450
3451
....
3455
3456
3457
3458
3459
3460
3461
3462


3463
3464
3465
3466

3467
3468
3469

3470
3471
3472
3473


3474
3475
3476
3477
3478
3479
3480
....
3482
3483
3484
3485
3486
3487
3488
3489
3490
3491
3492
3493
3494
3495
3496
3497
3498
3499
3500
3501
3502
3503
3504

3505
3506
3507
3508
3509
3510
3511
3512
3513
3514






3515
3516
3517
3518
3519
3520
3521
3522
3523
3524
3525
3526
3527
3528


3529
3530
3531
3532
3533
3534
3535
3536
3537
3538



3539
3540
3541

3542
3543
3544
3545
3546
3547
3548


3549
3550
3551

3552
3553
3554


3555
3556
3557

3558
3559

3560
3561
3562
3563

3564
3565
3566
3567
3568
3569




3570
3571
3572
3573
3574
3575
3576
3577
3578
3579
3580
3581
....
3597
3598
3599
3600
3601
3602
3603
3604
3605
3606
3607
3608
3609
3610
3611
3612
** a legal notice, here is a blessing:
**
**    May you do good and not evil.
**    May you find forgiveness for yourself and forgive others.
**    May you share freely, never taking more than you give.
**
*************************************************************************
** $Id: btree.c,v 1.118 2004/05/09 01:35:06 drh Exp $
**
** This file implements a external (disk-based) database using BTrees.
** For a detailed discussion of BTrees, refer to
**
**     Donald E. Knuth, THE ART OF COMPUTER PROGRAMMING, Volume 3:
**     "Sorting And Searching", pages 473-480. Addison-Wesley
**     Publishing Company, Reading, Massachusetts.
................................................................................
  u8 isInit;                     /* True if previously initialized */
  u8 idxShift;                   /* True if Cell indices have changed */
  u8 isOverfull;                 /* Some aCell[] do not fit on page */
  u8 intKey;                     /* True if intkey flag is set */
  u8 leaf;                       /* True if leaf flag is set */
  u8 zeroData;                   /* True if zero data flag is set */
  u8 hdrOffset;                  /* 100 for page 1.  0 otherwise */

  int idxParent;                 /* Index in pParent->aCell[] of this node */
  int nFree;                     /* Number of free bytes on the page */
  int nCell;                     /* Number of entries on this page */
  int nCellAlloc;                /* Number of slots allocated in aCell[] */
  unsigned char **aCell;         /* Pointer to start of each cell */
  struct Btree *pBt;             /* Pointer back to BTree structure */

................................................................................
  maxPayload = pPage->pBt->maxLocal;
  if( nPayload>maxPayload ){
    nPayload = maxPayload + 4;
  }
  return n + nPayload;
}









































































/*
** Defragment the page given.  All Cells are moved to the
** beginning of the page and all free space is collected 
** into one big FreeBlk at the end of the page.
*/
static void defragmentPage(MemPage *pPage){
  int pc, i, n, addr;
................................................................................
  int leftover;
  unsigned char *oldPage;
  unsigned char newPage[MX_PAGE_SIZE];

  assert( sqlite3pager_iswriteable(pPage->aData) );
  assert( pPage->pBt!=0 );
  assert( pPage->pBt->pageSize <= MX_PAGE_SIZE );


  oldPage = pPage->aData;
  hdr = pPage->hdrOffset;
  addr = 3+hdr;
  n = 6+hdr;
  if( !pPage->leaf ){
    n += 4;
  }
................................................................................
  pc = get2byte(&oldPage[addr]);
  i = 0;
  while( pc>0 ){
    assert( n<pPage->pBt->pageSize );
    size = cellSize(pPage, &oldPage[pc]);
    memcpy(&newPage[n], &oldPage[pc], size);
    put2byte(&newPage[addr],n);

    pPage->aCell[i++] = &oldPage[n];

    n += size;
    addr = pc;
    pc = get2byte(&oldPage[pc]);
  }
  assert( i==pPage->nCell );
  leftover = pPage->pBt->pageSize - n;
  assert( leftover>=0 );
  assert( pPage->nFree==leftover );
  if( leftover<4 ){
................................................................................
  int pageSize;
  int sumCell = 0;       /* Total size of all cells */

  assert( pPage->pBt!=0 );
  assert( pParent==0 || pParent->pBt==pPage->pBt );
  assert( pPage->pgno==sqlite3pager_pagenumber(pPage->aData) );
  assert( pPage->aData == &((unsigned char*)pPage)[-pPage->pBt->pageSize] );
  assert( pPage->isInit==0 || pPage->pParent==pParent );
  if( pPage->isInit ) return SQLITE_OK;
  assert( pPage->pParent==0 );
  pPage->pParent = pParent;
  if( pParent ){
    sqlite3pager_ref(pParent->aData);
  }

  pPage->nCell = pPage->nCellAlloc = 0;
  assert( pPage->hdrOffset==(pPage->pgno==1 ? 100 : 0) );
  hdr = pPage->hdrOffset;
  data = pPage->aData;
  c = data[hdr];
  pPage->intKey = (c & PTF_INTKEY)!=0;
  pPage->zeroData = (c & PTF_ZERODATA)!=0;
  pPage->leaf = (c & PTF_LEAF)!=0;
  pPage->isOverfull = 0;

  pPage->idxShift = 0;
  pageSize = pPage->pBt->pageSize;

  /* Initialize the cell count and cell pointers */
  pc = get2byte(&data[hdr+3]);
  while( pc>0 ){
    if( pc>=pageSize ) return SQLITE_CORRUPT;
................................................................................
  /* Sanity check:  Cells and freespace and header must sum to the size
  ** a page. */
  if( sumCell+pPage->nFree+hdr+10-pPage->leaf*4 != pageSize ){
    return SQLITE_CORRUPT;
  }

  pPage->isInit = 1;

  return SQLITE_OK;
}

/*
** Set up a raw page so that it looks like a database page holding
** no entries.
*/
static void zeroPage(MemPage *pPage, int flags){
  unsigned char *data = pPage->aData;
  Btree *pBt = pPage->pBt;
  int hdr = pPage->hdrOffset;
  int first;



  assert( sqlite3pager_iswriteable(data) );
  memset(&data[hdr], 0, pBt->pageSize - hdr);
  data[hdr] = flags;
  first = hdr + 6 + 4*((flags&PTF_LEAF)==0);
  put2byte(&data[hdr+1], first);
  put2byte(&data[first+2], pBt->pageSize - first);
  sqliteFree(pPage->aCell);
................................................................................
  pPage->nCell = 0;
  pPage->nCellAlloc = 0;
  pPage->nFree = pBt->pageSize - first;
  pPage->intKey = (flags & PTF_INTKEY)!=0;
  pPage->leaf = (flags & PTF_LEAF)!=0;
  pPage->zeroData = (flags & PTF_ZERODATA)!=0;
  pPage->hdrOffset = hdr;





}

/*
** Get a page from the pager.  Initialize the MemPage.pBt and
** MemPage.aData elements if needed.
*/
static int getPage(Btree *pBt, Pgno pgno, MemPage **ppPage){
................................................................................
/*
** This routine is called when the reference count for a page
** reaches zero.  We need to unref the pParent pointer when that
** happens.
*/
static void pageDestructor(void *pData){
  MemPage *pPage = (MemPage*)&((char*)pData)[SQLITE_PAGE_SIZE];

  if( pPage->pParent ){
    MemPage *pParent = pPage->pParent;
    pPage->pParent = 0;
    releasePage(pParent);
  }
  sqliteFree(pPage->aCell);
  pPage->aCell = 0;
................................................................................
/*
** Invalidate all cursors
*/
static void invalidateCursors(Btree *pBt){
  BtCursor *pCur;
  for(pCur=pBt->pCursor; pCur; pCur=pCur->pNext){
    MemPage *pPage = pCur->pPage;
    if( pPage && !pPage->isInit ){

      releasePage(pPage);
      pCur->pPage = 0;
      pCur->isValid = 0;
      pCur->status = SQLITE_ABORT;
    }
  }
}



















/*
** Rollback the transaction in progress.  All cursors will be
** invalided by this operation.  Any attempt to use a cursor
** that was open at the beginning of this operation will result
** in an error.
**
................................................................................
  MemPage *pPage;
  unsigned char *cell;

  if( !pCur->isValid ){
    *pSize = 0;
  }else{
    pPage = pCur->pPage;

    assert( pPage!=0 );
    assert( pCur->idx>=0 && pCur->idx<pPage->nCell );
    cell = pPage->aCell[pCur->idx];
    cell += 2;   /* Skip the offset to the next cell */
    if( !pPage->leaf ){
      cell += 4;  /* Skip the child pointer */
    }
................................................................................
  u64 nData, nKey;
  int maxLocal, ovflSize;

  assert( pCur!=0 && pCur->pPage!=0 );
  assert( pCur->isValid );
  pBt = pCur->pBt;
  pPage = pCur->pPage;

  assert( pCur->idx>=0 && pCur->idx<pPage->nCell );
  aPayload = pPage->aCell[pCur->idx];
  aPayload += 2;  /* Skip the next cell index */
  if( !pPage->leaf ){
    aPayload += 4;  /* Skip the child pointer */
  }
  if( pPage->zeroData ){
................................................................................
  if( !pCur->isValid ){
    return 0;
  }
  assert( pCur->pPage!=0 );
  assert( pCur->idx>=0 && pCur->idx<pCur->pPage->nCell );
  pBt = pCur->pBt;
  pPage = pCur->pPage;

  assert( pCur->idx>=0 && pCur->idx<pPage->nCell );
  assert( pPage->intKey==0 );
  aPayload = pPage->aCell[pCur->idx];
  aPayload += 2;  /* Skip the next cell index */
  if( !pPage->leaf ){
    aPayload += 4;  /* Skip the child pointer */
  }
................................................................................

  if( !pCur->isValid ){
    return pCur->status ? pCur->status : SQLITE_INTERNAL;
  }
  pPage = pCur->pPage;
  assert( pPage!=0 );
  assert( pPage->isInit );

  if( pPage->zeroData ){
    *pSize = 0;
  }else{
    assert( pCur->idx>=0 && pCur->idx<pPage->nCell );
    cell = pPage->aCell[pCur->idx];
    cell += 2;   /* Skip the offset to the next cell */
    if( !pPage->leaf ){
................................................................................
  MemPage *pNewPage;
  MemPage *pOldPage;
  Btree *pBt = pCur->pBt;

  assert( pCur->isValid );
  rc = getAndInitPage(pBt, newPgno, &pNewPage, pCur->pPage);
  if( rc ) return rc;

  pNewPage->idxParent = pCur->idx;
  pOldPage = pCur->pPage;
  pOldPage->idxShift = 0;
  releasePage(pOldPage);
  pCur->pPage = pNewPage;
  pCur->idx = 0;
  if( pNewPage->nCell<1 ){
................................................................................
** for the table rooted on page 1, sometime the real root page
** is empty except for the right-pointer.  In such cases the
** virtual root page is the page that the right-pointer of page
** 1 is pointing to.
*/
static int isRootPage(MemPage *pPage){
  MemPage *pParent = pPage->pParent;
  assert( pParent==0 || pParent->isInit );
  if( pParent==0 || (pParent->pgno==1 && pParent->nCell==0) ) return 1;

  return 0;
}

/*
** Move the cursor up to the parent page.
**
** pCur->idx is set to the cell index that contains the pointer
................................................................................
  MemPage *pPage;
  int idxParent;

  assert( pCur->isValid );
  pPage = pCur->pPage;
  assert( pPage!=0 );
  assert( !isRootPage(pPage) );

  pParent = pPage->pParent;
  assert( pParent!=0 );

  idxParent = pPage->idxParent;
  sqlite3pager_ref(pParent->aData);
  oldPgno = pPage->pgno;
  releasePage(pPage);
  pCur->pPage = pParent;
  assert( pParent->idxShift==0 );
  if( pParent->idxShift==0 ){
................................................................................

  rc = getAndInitPage(pBt, pCur->pgnoRoot, &pRoot, 0);
  if( rc ){
    pCur->isValid = 0;
    return rc;
  }
  releasePage(pCur->pPage);

  pCur->pPage = pRoot;
  pCur->idx = 0;
  if( pRoot->nCell==0 && !pRoot->leaf ){
    Pgno subpage;
    assert( pRoot->pgno==1 );
    subpage = get4byte(&pRoot->aData[pRoot->hdrOffset+6]);
    assert( subpage>0 );
................................................................................
  for(;;){
    int lwr, upr;
    Pgno chldPg;
    MemPage *pPage = pCur->pPage;
    int c = -1;  /* pRes return if table is empty must be -1 */
    lwr = 0;
    upr = pPage->nCell-1;

    while( lwr<=upr ){
      void *pCellKey;
      u64 nCellKey;
      pCur->idx = (lwr+upr)/2;
      sqlite3BtreeKeySize(pCur, &nCellKey);
      if( pPage->intKey ){
        if( nCellKey<nKey ){
................................................................................
  rc = sqlite3pager_write(pPage1->aData);
  if( rc ) return rc;
  n = get4byte(&pPage1->aData[36]);
  put4byte(&pPage1->aData[36], n+1);

  if( n==0 ){
    /* This is the first free page */


    memset(pPage->aData, 0, 8);
    put4byte(&pPage1->aData[32], pPage->pgno);
  }else{
    /* Other free pages already exist.  Retrive the first trunk page
    ** of the freelist and find out how many leaves it has. */
    MemPage *pTrunk;
    rc = getPage(pBt, get4byte(&pPage1->aData[32]), &pTrunk);
................................................................................
static void reparentPage(Btree *pBt, Pgno pgno, MemPage *pNewParent, int idx){
  MemPage *pThis;
  unsigned char *aData;

  if( pgno==0 ) return;
  assert( pBt->pPager!=0 );
  aData = sqlite3pager_lookup(pBt->pPager, pgno);

  pThis = (MemPage*)&aData[pBt->pageSize];
  if( pThis && pThis->isInit ){
    if( pThis->pParent!=pNewParent ){
      if( pThis->pParent ) sqlite3pager_unref(pThis->pParent->aData);
      pThis->pParent = pNewParent;
      if( pNewParent ) sqlite3pager_ref(pNewParent->aData);
    }
    pThis->idxParent = idx;

    sqlite3pager_unref(aData);
  }
}

/*
** Change the pParent pointer of all children of pPage to point back
** to pPage.
................................................................................
** Remove the i-th cell from pPage.  This routine effects pPage only.
** The cell content is not freed or deallocated.  It is assumed that
** the cell content has been copied someplace else.  This routine just
** removes the reference to the cell from pPage.
**
** "sz" must be the number of bytes in the cell.
**
** Do not bother maintaining the integrity of the linked list of Cells.
** Only the pPage->aCell[] array is important.  The relinkCellList() 
** routine will be called soon after this routine in order to rebuild 

** the linked list.
*/
static void dropCell(MemPage *pPage, int idx, int sz){
  int j, pc;

  assert( idx>=0 && idx<pPage->nCell );
  assert( sz==cellSize(pPage, pPage->aCell[idx]) );
  assert( sqlite3pager_iswriteable(pPage->aData) );
  assert( pPage->aCell[idx]>=pPage->aData );
  assert( pPage->aCell[idx]<&pPage->aData[pPage->pBt->pageSize-sz] );

  pc = Addr(pPage->aCell[idx]) - Addr(pPage->aData);
  assert( pc>pPage->hdrOffset && pc+sz<=pPage->pBt->pageSize );
  freeSpace(pPage, pc, sz);
  for(j=idx; j<pPage->nCell-1; j++){
    pPage->aCell[j] = pPage->aCell[j+1];
  }
  pPage->nCell--;

















  pPage->idxShift = 1;
}

/*
** Insert a new cell on pPage at cell index "i".  pCell points to the
** content of the cell.
**
** If the cell content will fit on the page, then put it there.  If it
** will not fit, then just make pPage->aCell[i] point to the content
** and set pPage->isOverfull.  
**
** Do not bother maintaining the integrity of the linked list of Cells.
** Only the pPage->aCell[] array is important.  The relinkCellList() 
** routine will be called soon after this routine in order to rebuild 

** the linked list.
*/
static void insertCell(MemPage *pPage, int i, unsigned char *pCell, int sz){
  int idx, j;
  assert( i>=0 && i<=pPage->nCell );
  assert( sz==cellSize(pPage, pCell) );
  assert( sqlite3pager_iswriteable(pPage->aData) );
  idx = allocateSpace(pPage, sz);
  resizeCellArray(pPage, pPage->nCell+1);
  for(j=pPage->nCell; j>i; j--){
    pPage->aCell[j] = pPage->aCell[j-1];
  }
  pPage->nCell++;
  if( idx<=0 ){
    pPage->isOverfull = 1;
    pPage->aCell[i] = pCell;
  }else{

    memcpy(&pPage->aData[idx], pCell, sz);
    pPage->aCell[i] = &pPage->aData[idx];















  }
  pPage->idxShift = 1;
}

/*
** Rebuild the linked list of cells on a page so that the cells
** occur in the order specified by the pPage->aCell[] array.  
** Invoke this routine once to repair damage after one or more
** invocations of either insertCell() or dropCell().
*/
static void relinkCellList(MemPage *pPage){
  int i, idxFrom;
  assert( sqlite3pager_iswriteable(pPage->aData) );

  idxFrom = pPage->hdrOffset+3;
  for(i=0; i<pPage->nCell; i++){
    int idx = Addr(pPage->aCell[i]) - Addr(pPage->aData);
    assert( idx>pPage->hdrOffset && idx<pPage->pBt->pageSize );
    put2byte(&pPage->aData[idxFrom], idx);
    idxFrom = idx;
  }
  put2byte(&pPage->aData[idxFrom], 0);

}

/*
** GCC does not define the offsetof() macro so we'll have to do it
** ourselves.
*/
#ifndef offsetof
................................................................................
** Move the content of the page at pFrom over to pTo.  The pFrom->aCell[]
** pointers that point into pFrom->aData[] must be adjusted to point
** into pTo->aData[] instead.  But some pFrom->aCell[] entries might
** not point to pFrom->aData[].  Those are unchanged.
**
** Over this operation completes, the meta data for pFrom is zeroed.
*/
static void copyPage(MemPage *pTo, MemPage *pFrom){
  uptr from, to;
  int i;
  int pageSize;
  int ofst;

  assert( pTo->hdrOffset==0 );
  ofst = pFrom->hdrOffset;
................................................................................
    uptr x = Addr(pTo->aCell[i]);
    if( x>from && x<from+pageSize-ofst ){
      *((uptr*)&pTo->aCell[i]) = x + to - from;
    }
  }
}










/*
** The following parameters determine how many adjacent pages get involved
** in a balancing operation.  NN is the number of neighbors on either side
** of the page that participate in the balancing operation.  NB is the
** total number of pages that participate, including the target page and
** NN neighbors on either side.
**
................................................................................
  int idx;                     /* Index of pPage in pParent->aCell[] */
  int nxDiv;                   /* Next divider slot in pParent->aCell[] */
  int rc;                      /* The return code */
  int leafCorrection;          /* 4 if pPage is a leaf.  0 if not */
  int usableSpace;             /* Bytes in pPage beyond the header */
  int pageFlags;               /* Value of pPage->aData[0] */
  int subtotal;                /* Subtotal of bytes in cells on one page */

  MemPage *apOld[NB];          /* pPage and up to two siblings */
  Pgno pgnoOld[NB];            /* Page numbers for each page in apOld[] */
  MemPage *apCopy[NB];         /* Private copies of apOld[] pages */
  MemPage *apNew[NB+1];        /* pPage and up to NB siblings after balancing */
  Pgno pgnoNew[NB+1];          /* Page numbers for each page in apNew[] */
  int idxDiv[NB];              /* Indices of divider cells in pParent */
  u8 *apDiv[NB];               /* Divider cells in pParent */
................................................................................
  }

  /*
  ** Find the parent of the page to be balanced.  If there is no parent,
  ** it means this page is the root page and special rules apply.
  */
  pParent = pPage->pParent;

  if( pParent==0 ){
    Pgno pgnoChild;
    MemPage *pChild;
    assert( pPage->isInit );
    if( pPage->nCell==0 ){
      if( pPage->leaf ){
        /* The table is completely empty */
        relinkCellList(pPage);

      }else{
        /* The root page is empty but has one child.  Transfer the
        ** information from that one child into the root page if it 
        ** will fit.  This reduces the depth of the BTree by one.
        **
        ** If the root page is page 1, it has less space available than
        ** its child (due to the 100 byte header that occurs at the beginning
................................................................................
            zeroPage(pPage, pChild->aData[0]);
            resizeCellArray(pPage, pChild->nCell);
            for(i=0; i<pChild->nCell; i++){
              insertCell(pPage, i, pChild->aCell[i], 
                        cellSize(pChild, pChild->aCell[i]));
            }
            freePage(pChild);

          }else{
            /* The child has more information that will fit on the root.
            ** The tree is already balanced.  Do nothing. */

          }
        }else{
          memcpy(pPage, pChild, pBt->pageSize);
          pPage->isInit = 0;
          pPage->pParent = 0;
          rc = initPage(pPage, 0);
          assert( rc==SQLITE_OK );
          freePage(pChild);

        }
        reparentChildPages(pPage);
        releasePage(pChild);
      }
      return SQLITE_OK;
    }
    if( !pPage->isOverfull ){
      /* It is OK for the root page to be less than half full.
      */
      relinkCellList(pPage);

      return SQLITE_OK;
    }
    /*
    ** If we get to here, it means the root page is overfull.
    ** When this happens, Create a new child page and copy the
    ** contents of the root into the child.  Then make the root
    ** page an empty page with rightChild pointing to the new
    ** child.  Then fall thru to the code below which will cause
    ** the overfull child page to be split.
    */
    rc = allocatePage(pBt, &pChild, &pgnoChild, pPage->pgno);
    if( rc ) return rc;
    assert( sqlite3pager_iswriteable(pChild->aData) );
    copyPage(pChild, pPage);
    assert( pChild->aData[0]==pPage->aData[pPage->hdrOffset] );
    pChild->pParent = pPage;
    pChild->idxParent = 0;
    sqlite3pager_ref(pPage->aData);

    pChild->isOverfull = 1;
    zeroPage(pPage, pChild->aData[0] & ~PTF_LEAF);
    put4byte(&pPage->aData[pPage->hdrOffset+6], pChild->pgno);
    pParent = pPage;
    pPage = pChild;
    initPage(pParent, 0);

  }
  rc = sqlite3pager_write(pParent->aData);
  if( rc ) return rc;
  assert( pParent->isInit );
  
  /*
  ** Find the cell in the parent page whose left child points back
................................................................................
  ** The rest of this function will use data from the copies rather
  ** that the original pages since the original pages will be in the
  ** process of being overwritten.
  */
  for(i=0; i<nOld; i++){
    MemPage *p = apCopy[i] = (MemPage*)&aCopy[i+1][-sizeof(MemPage)];
    p->aData = &((u8*)p)[-pBt->pageSize];


    copyPage(p, apOld[i]);
  }

  /*
  ** Load pointers to all cells on sibling pages and the divider cells
  ** into the local apCell[] array.  Make copies of the divider cells
  ** into aTemp[] and remove the the divider Cells from pParent.
  **
................................................................................
    MemPage *pOld = apCopy[i];
    for(j=0; j<pOld->nCell; j++){
      apCell[nCell] = pOld->aCell[j];
      szCell[nCell] = cellSize(pOld, apCell[nCell]);
      nCell++;
    }
    if( i<nOld-1 ){
      szCell[nCell] = cellSize(pParent, apDiv[i]) - leafCorrection;
      memcpy(aTemp[i], apDiv[i], szCell[nCell] + leafCorrection);
      apCell[nCell] = &aTemp[i][leafCorrection];
      dropCell(pParent, nxDiv, szCell[nCell]);

      assert( get4byte(&apCell[nCell][2])==pgnoOld[i] );
      if( !pOld->leaf ){
        assert( leafCorrection==0 );
        /* The right pointer of the child page pOld becomes the left
        ** pointer of the divider cell */
        memcpy(&apCell[nCell][2], &pOld->aData[pOld->hdrOffset+6], 4);
      }else{
        assert( leafCorrection==4 );
................................................................................

  /*
  ** Allocate k new pages.  Reuse old pages where possible.
  */
  assert( pPage->pgno>1 );
  pageFlags = pPage->aData[0];
  for(i=0; i<k; i++){

    if( i<nOld ){
      apNew[i] = apOld[i];
      pgnoNew[i] = pgnoOld[i];
      apOld[i] = 0;
      sqlite3pager_write(apNew[i]->aData);
    }else{
      rc = allocatePage(pBt, &apNew[i], &pgnoNew[i], pgnoNew[i-1]);
      if( rc ) goto balance_cleanup;

    }
    nNew++;
    zeroPage(apNew[i], pageFlags);
    apNew[i]->isInit = 1;
  }

  /* Free any old pages that were not reused as new pages.
  */
  while( i<nOld ){
    rc = freePage(apOld[i]);
    if( rc ) goto balance_cleanup;
    sqlite3pager_unref(apOld[i]->aData);
    apOld[i] = 0;
    i++;
  }

  /*
  ** Put the new pages in accending order.  This helps to
  ** keep entries in the disk file in order so that a scan
................................................................................
    reparentChildPages(apNew[i]);
  }
  reparentChildPages(pParent);

  /*
  ** balance the parent page.
  */



  rc = balance(pParent);


  /*
  ** Cleanup before returning.
  */
balance_cleanup:
  for(i=0; i<nOld; i++){
    releasePage(apOld[i]);
    if( apCopy[i] ){
      releasePage(apCopy[i]->pParent);
      sqliteFree(apCopy[i]->aCell);
    }
  }
  for(i=0; i<nNew; i++){
    releasePage(apNew[i]);
  }
  releasePage(pParent);


  return rc;
}

/*
** This routine checks all cursors that point to the same table
** as pCur points to.  If any of those cursors were opened with
** wrFlag==0 then this routine returns SQLITE_LOCKED.  If all
................................................................................
    ** next Cell after the one to be deleted is guaranteed to exist and
    ** to be a leaf so we can use it.
    */
    BtCursor leafCur;
    unsigned char *pNext;
    int szNext;
    int notUsed;

    getTempCursor(pCur, &leafCur);
    rc = sqlite3BtreeNext(&leafCur, &notUsed);
    if( rc!=SQLITE_OK ){
      if( rc!=SQLITE_NOMEM ) rc = SQLITE_CORRUPT;
      return rc;
    }
    rc = sqlite3pager_write(leafCur.pPage->aData);
    if( rc ) return rc;
    dropCell(pPage, pCur->idx, cellSize(pPage, pCell));
    pNext = leafCur.pPage->aCell[leafCur.idx];
    szNext = cellSize(leafCur.pPage, pNext);
    insertCell(pPage, pCur->idx, &pNext[-4], szNext+4);
    put4byte(&pNext[-2], pgnoChild);

    rc = balance(pPage);
    if( rc ) return rc;

    dropCell(leafCur.pPage, leafCur.idx, szNext);
    rc = balance(leafCur.pPage);
    releaseTempCursor(&leafCur);
  }else{
    dropCell(pPage, pCur->idx, cellSize(pPage, pCell));
    rc = balance(pPage);
  }
................................................................................
  if( !pBt->inTrans ){
    /* Must start a transaction first */
    return pBt->readOnly ? SQLITE_READONLY : SQLITE_ERROR;
  }
  if( pBt->readOnly ){
    return SQLITE_READONLY;
  }
  rc = allocatePage(pBt, &pRoot, &pgnoRoot, 0);
  if( rc ) return rc;
  assert( sqlite3pager_iswriteable(pRoot->aData) );
  zeroPage(pRoot, flags | PTF_LEAF);
  sqlite3pager_unref(pRoot->aData);
  *piTable = (int)pgnoRoot;
  return SQLITE_OK;
}
................................................................................
  }
  hdr = pPage->hdrOffset;
  data = pPage->aData;
  c = data[hdr];
  pPage->intKey = (c & PTF_INTKEY)!=0;
  pPage->zeroData = (c & PTF_ZERODATA)!=0;
  pPage->leaf = (c & PTF_LEAF)!=0;
  printf("PAGE %d:  flags=0x%02x  frag=%d\n", pgno,
    data[hdr], data[hdr+5]);

  i = 0;
  assert( hdr == (pgno==1 ? 100 : 0) );
  idx = get2byte(&data[hdr+3]);
  while( idx>0 && idx<=pBt->pageSize ){
    u64 nData, nKey;
    int nHeader;
    Pgno child;
................................................................................
**   aResult[4] =  Number of free bytes on this page
**   aResult[5] =  Number of free blocks on the page
**   aResult[6] =  Page number of the left child of this entry
**   aResult[7] =  Page number of the right child for the whole page
**
** This routine is used for testing and debugging only.
*/
int sqlite3BtreeCursorDump(BtCursor *pCur, int *aResult){
  int cnt, idx;
  MemPage *pPage = pCur->pPage;


  assert( pPage->isInit );
  aResult[0] = sqlite3pager_pagenumber(pPage->aData);
  assert( aResult[0]==pPage->pgno );
  aResult[1] = pCur->idx;
  aResult[2] = pPage->nCell;
  if( pCur->idx>=0 && pCur->idx<pPage->nCell ){
    aResult[3] = cellSize(pPage, pPage->aCell[pCur->idx]);
................................................................................
** the tree depth.  Root pages return 0.  Parents of root pages
** return 1, and so forth.
** 
** These checks are done:
**
**      1.  Make sure that cells and freeblocks do not overlap
**          but combine to completely cover the page.
**      2.  Make sure cell keys are in order.
**      3.  Make sure no key is less than or equal to zLowerBound.
**      4.  Make sure no key is greater than or equal to zUpperBound.
**      5.  Check the integrity of overflow pages.
**      6.  Recursively call checkTreePage on all children.
**      7.  Verify that the depth of all children is the same.
**      8.  Make sure this page is at least 33% full or else it is
**          the root of the tree.
*/
static int checkTreePage(
................................................................................
  char *zParentContext, /* Parent context */
  char *zLowerBound,    /* All keys should be greater than this, if not NULL */
  int nLower,           /* Number of characters in zLowerBound */
  char *zUpperBound,    /* All keys should be less than this, if not NULL */
  int nUpper            /* Number of characters in zUpperBound */
){
  MemPage *pPage;
  int i, rc, depth, d2, pgno;


  char *zKey1, *zKey2;
  int nKey1, nKey2;
  BtCursor cur;
  Btree *pBt;

  char zMsg[100];
  char zContext[100];
  char hit[SQLITE_USABLE_SIZE];


  /* Check that the page exists
  */
  cur.pBt = pBt = pCheck->pBt;


  if( iPage==0 ) return 0;
  if( checkRef(pCheck, iPage, zParentContext) ) return 0;
  sprintf(zContext, "On tree page %d: ", iPage);
  if( (rc = getPage(pBt, (Pgno)iPage, &pPage))!=0 ){
    sprintf(zMsg, "unable to get the page. error code=%d", rc);
    checkAppendMsg(pCheck, zContext, zMsg);
    return 0;
................................................................................
  if( (rc = initPage(pPage, pParent))!=0 ){
    sprintf(zMsg, "initPage() returns error code %d", rc);
    checkAppendMsg(pCheck, zContext, zMsg);
    releasePage(pPage);
    return 0;
  }

#if 0

  /* Check out all the cells.
  */
  depth = 0;
  if( zLowerBound ){
    zKey1 = sqliteMalloc( nLower+1 );
    memcpy(zKey1, zLowerBound, nLower);
    zKey1[nLower] = 0;
  }else{
    zKey1 = 0;
  }
  nKey1 = nLower;
  cur.pPage = pPage;
  for(i=0; i<pPage->nCell; i++){
    Cell *pCell = pPage->aCell[i];

    int sz;

    /* Check payload overflow pages
    */
    nKey2 = NKEY(pBt, pCell->h);
    sz = nKey2 + NDATA(pBt, pCell->h);
    sprintf(zContext, "On page %d cell %d: ", iPage, i);
    if( sz>MX_LOCAL_PAYLOAD ){
      int nPage = (sz - MX_LOCAL_PAYLOAD + OVERFLOW_SIZE - 1)/OVERFLOW_SIZE;
      checkList(pCheck, 0, SWAB32(pBt, pCell->ovfl), nPage, zContext);






    }

    /* Check that keys are in the right order
    */
    cur.idx = i;
    zKey2 = sqliteMallocRaw( nKey2+1 );
    getPayload(&cur, 0, nKey2, zKey2);
    if( zKey1 && keyCompare(zKey1, nKey1, zKey2, nKey2)>=0 ){
      checkAppendMsg(pCheck, zContext, "Key is out of order");
    }

    /* Check sanity of left child page.
    */
    pgno = SWAB32(pBt, pCell->h.leftChild);


    d2 = checkTreePage(pCheck, pgno, pPage, zContext, zKey1,nKey1,zKey2,nKey2);
    if( i>0 && d2!=depth ){
      checkAppendMsg(pCheck, zContext, "Child page depth differs");
    }
    depth = d2;
    sqliteFree(zKey1);
    zKey1 = zKey2;
    nKey1 = nKey2;
  }
  pgno = SWAB32(pBt, pPage->u.hdr.rightChild);



  sprintf(zContext, "On page %d at right child: ", iPage);
  checkTreePage(pCheck, pgno, pPage, zContext, zKey1,nKey1,zUpperBound,nUpper);
  sqliteFree(zKey1);

 
  /* Check for complete coverage of the page
  */
  memset(hit, 0, sizeof(hit));
  memset(hit, 1, sizeof(PageHdr));
  for(i=SWAB16(pBt, pPage->u.hdr.firstCell); i>0 && i<SQLITE_USABLE_SIZE; ){
    Cell *pCell = (Cell*)&pPage->u.aDisk[i];


    int j;
    for(j=i+cellSize(pBt, pCell)-1; j>=i; j--) hit[j]++;
    i = SWAB16(pBt, pCell->h.iNext);

  }
  for(i=SWAB16(pBt,pPage->u.hdr.firstFree); i>0 && i<SQLITE_USABLE_SIZE; ){
    FreeBlk *pFBlk = (FreeBlk*)&pPage->u.aDisk[i];


    int j;
    for(j=i+SWAB16(pBt,pFBlk->iSize)-1; j>=i; j--) hit[j]++;
    i = SWAB16(pBt,pFBlk->iNext);

  }
  for(i=0; i<SQLITE_USABLE_SIZE; i++){

    if( hit[i]==0 ){
      sprintf(zMsg, "Unused space at byte %d of page %d", i, iPage);
      checkAppendMsg(pCheck, zMsg, 0);
      break;

    }else if( hit[i]>1 ){
      sprintf(zMsg, "Multiple uses for byte %d of page %d", i, iPage);
      checkAppendMsg(pCheck, zMsg, 0);
      break;
    }
  }





#endif

  releasePage(pPage);
  return depth;
}

/*
** This routine does a complete check of the given BTree file.  aRoot[] is
** an array of pages numbers were each page number is the root page of
** a table.  nRoot is the number of entries in aRoot.
**
................................................................................
  sCheck.pPager = pBt->pPager;
  sCheck.nPage = sqlite3pager_pagecount(sCheck.pPager);
  if( sCheck.nPage==0 ){
    unlockBtreeIfUnused(pBt);
    return 0;
  }
  sCheck.anRef = sqliteMallocRaw( (sCheck.nPage+1)*sizeof(sCheck.anRef[0]) );
  sCheck.anRef[1] = 1;
  for(i=2; i<=sCheck.nPage; i++){ sCheck.anRef[i] = 0; }
  sCheck.zErrMsg = 0;

  /* Check the integrity of the freelist
  */
  checkList(&sCheck, 1, get4byte(&pBt->pPage1->aData[32]),
            get4byte(&pBt->pPage1->aData[36]), "Main freelist: ");








|







 







>







 







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







 







>
>







 







>

>

<







 







|
<
|
|
<


>









>







 







>













>
>







 







>
>
>
>
>







 







>







 







|
>







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







 







>







 







>







 







>







 







>







 







>







 







|
|
>







 







>


>







 







>







 







>







 







>
>







 







>
|
|
|
|
|
|
|
|
>







 







|
|
|
>
|



>





>
|






>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>











|
|
|
>
|






|









>
|
|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>













>








>







 







|







 







>
>
>
>
>
>
>
>
>







 







>







 







>








>







 







>



>








>










>













|


<

>





|
>







 







>
>
|







 







|
|


>
|







 







>

|


|

|

>


|
<







|







 







>
>
>

>








<







>
>







 







>











|

>


>







 







|







 







|
|
>







 







|


>
>







 







|
|
|







 







|
>
>




>


<
>




>
>







 







<
<



<
<
<
<
<
<
<
<


|
>
|



<
<
<
<
<
<
>
>
>
>
>
>


<
<
<
<
<
<
<
<
<


<
>
>
|
|
|
|
|
<
<
<
|
<
>
>
>
|
|
<
>



|
|
|
|
>
>

|
<
>

<
<
>
>

|
<
>

<
>

<
<
<
>






>
>
>
>
|
<


|







 







<
|







5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
...
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
...
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
...
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
...
479
480
481
482
483
484
485
486
487
488
489

490
491
492
493
494
495
496
...
671
672
673
674
675
676
677
678

679
680

681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
...
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
...
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
...
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
....
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
....
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
....
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
....
1533
1534
1535
1536
1537
1538
1539
1540
1541
1542
1543
1544
1545
1546
1547
....
1570
1571
1572
1573
1574
1575
1576
1577
1578
1579
1580
1581
1582
1583
1584
....
1620
1621
1622
1623
1624
1625
1626
1627
1628
1629
1630
1631
1632
1633
1634
....
1644
1645
1646
1647
1648
1649
1650
1651
1652
1653
1654
1655
1656
1657
1658
1659
1660
....
1668
1669
1670
1671
1672
1673
1674
1675
1676
1677
1678
1679
1680
1681
1682
1683
1684
1685
....
1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
1735
....
1870
1871
1872
1873
1874
1875
1876
1877
1878
1879
1880
1881
1882
1883
1884
....
2134
2135
2136
2137
2138
2139
2140
2141
2142
2143
2144
2145
2146
2147
2148
2149
....
2307
2308
2309
2310
2311
2312
2313
2314
2315
2316
2317
2318
2319
2320
2321
2322
2323
2324
2325
2326
2327
2328
2329
2330
....
2352
2353
2354
2355
2356
2357
2358
2359
2360
2361
2362
2363
2364
2365
2366
2367
2368
2369
2370
2371
2372
2373
2374
2375
2376
2377
2378
2379
2380
2381
2382
2383
2384
2385
2386
2387
2388
2389
2390
2391
2392
2393
2394
2395
2396
2397
2398
2399
2400
2401
2402
2403
2404
2405
2406
2407
2408
2409
2410
2411
2412
2413
2414
2415
2416
2417
2418
2419
2420
2421
2422
2423
2424
2425
2426
2427
2428
2429
2430
2431
2432
2433
2434
2435
2436
2437
2438
2439
2440
2441
2442
2443
2444
2445
2446
2447
2448
2449
2450
2451
2452
2453
2454
2455
2456
2457
2458
2459
2460
2461
2462
2463
2464
2465
2466
2467
2468
2469
2470
2471
2472
2473
2474
2475
2476
2477
....
2482
2483
2484
2485
2486
2487
2488
2489
2490
2491
2492
2493
2494
2495
2496
....
2509
2510
2511
2512
2513
2514
2515
2516
2517
2518
2519
2520
2521
2522
2523
2524
2525
2526
2527
2528
2529
2530
2531
....
2583
2584
2585
2586
2587
2588
2589
2590
2591
2592
2593
2594
2595
2596
2597
....
2614
2615
2616
2617
2618
2619
2620
2621
2622
2623
2624
2625
2626
2627
2628
2629
2630
2631
2632
2633
2634
2635
2636
2637
....
2654
2655
2656
2657
2658
2659
2660
2661
2662
2663
2664
2665
2666
2667
2668
2669
2670
2671
2672
2673
2674
2675
2676
2677
2678
2679
2680
2681
2682
2683
2684
2685
2686
2687
2688
2689
2690
2691
2692
2693
2694
2695
2696
2697
2698
2699
2700
2701

2702
2703
2704
2705
2706
2707
2708
2709
2710
2711
2712
2713
2714
2715
2716
2717
....
2780
2781
2782
2783
2784
2785
2786
2787
2788
2789
2790
2791
2792
2793
2794
2795
2796
....
2807
2808
2809
2810
2811
2812
2813
2814
2815
2816
2817
2818
2819
2820
2821
2822
2823
2824
2825
2826
....
2864
2865
2866
2867
2868
2869
2870
2871
2872
2873
2874
2875
2876
2877
2878
2879
2880
2881
2882
2883

2884
2885
2886
2887
2888
2889
2890
2891
2892
2893
2894
2895
2896
2897
2898
....
2978
2979
2980
2981
2982
2983
2984
2985
2986
2987
2988
2989
2990
2991
2992
2993
2994
2995
2996
2997

2998
2999
3000
3001
3002
3003
3004
3005
3006
3007
3008
3009
3010
3011
3012
3013
....
3151
3152
3153
3154
3155
3156
3157
3158
3159
3160
3161
3162
3163
3164
3165
3166
3167
3168
3169
3170
3171
3172
3173
3174
3175
3176
3177
3178
3179
3180
3181
3182
....
3201
3202
3203
3204
3205
3206
3207
3208
3209
3210
3211
3212
3213
3214
3215
....
3372
3373
3374
3375
3376
3377
3378
3379
3380
3381
3382
3383
3384
3385
3386
3387
3388
....
3470
3471
3472
3473
3474
3475
3476
3477
3478
3479
3480
3481
3482
3483
3484
3485
3486
3487
3488
....
3624
3625
3626
3627
3628
3629
3630
3631
3632
3633
3634
3635
3636
3637
3638
3639
3640
....
3644
3645
3646
3647
3648
3649
3650
3651
3652
3653
3654
3655
3656
3657
3658
3659
3660

3661
3662
3663
3664
3665
3666
3667
3668
3669
3670
3671
3672
3673
3674
....
3676
3677
3678
3679
3680
3681
3682


3683
3684
3685








3686
3687
3688
3689
3690
3691
3692
3693






3694
3695
3696
3697
3698
3699
3700
3701









3702
3703

3704
3705
3706
3707
3708
3709
3710



3711

3712
3713
3714
3715
3716

3717
3718
3719
3720
3721
3722
3723
3724
3725
3726
3727
3728

3729
3730


3731
3732
3733
3734

3735
3736

3737
3738



3739
3740
3741
3742
3743
3744
3745
3746
3747
3748
3749
3750

3751
3752
3753
3754
3755
3756
3757
3758
3759
3760
....
3776
3777
3778
3779
3780
3781
3782

3783
3784
3785
3786
3787
3788
3789
3790
** a legal notice, here is a blessing:
**
**    May you do good and not evil.
**    May you find forgiveness for yourself and forgive others.
**    May you share freely, never taking more than you give.
**
*************************************************************************
** $Id: btree.c,v 1.119 2004/05/09 11:51:39 drh Exp $
**
** This file implements a external (disk-based) database using BTrees.
** For a detailed discussion of BTrees, refer to
**
**     Donald E. Knuth, THE ART OF COMPUTER PROGRAMMING, Volume 3:
**     "Sorting And Searching", pages 473-480. Addison-Wesley
**     Publishing Company, Reading, Massachusetts.
................................................................................
  u8 isInit;                     /* True if previously initialized */
  u8 idxShift;                   /* True if Cell indices have changed */
  u8 isOverfull;                 /* Some aCell[] do not fit on page */
  u8 intKey;                     /* True if intkey flag is set */
  u8 leaf;                       /* True if leaf flag is set */
  u8 zeroData;                   /* True if zero data flag is set */
  u8 hdrOffset;                  /* 100 for page 1.  0 otherwise */
  u8 needRelink;                 /* True if need to run relinkCellList() */
  int idxParent;                 /* Index in pParent->aCell[] of this node */
  int nFree;                     /* Number of free bytes on the page */
  int nCell;                     /* Number of entries on this page */
  int nCellAlloc;                /* Number of slots allocated in aCell[] */
  unsigned char **aCell;         /* Pointer to start of each cell */
  struct Btree *pBt;             /* Pointer back to BTree structure */

................................................................................
  maxPayload = pPage->pBt->maxLocal;
  if( nPayload>maxPayload ){
    nPayload = maxPayload + 4;
  }
  return n + nPayload;
}

/*
** Do sanity checking on a page.  Throw an exception if anything is
** not right.
**
** This routine is used for internal error checking only.  It is omitted
** from most builds.
*/
#if defined(BTREE_DEBUG) && !defined(NDEBUG) && 0
static void _pageIntegrity(MemPage *pPage){
  int pageSize;
  u8 *data;
  int i, idx, c, pc, hdr, nFree;
  u8 used[MX_PAGE_SIZE];

  pageSize = pPage->pBt->pageSize;
  assert( pPage->aData==&((unsigned char*)pPage)[-pageSize] );
  hdr = pPage->hdrOffset;
  assert( hdr==(pPage->pgno==1 ? 100 : 0) );
  assert( pPage->pgno==sqlite3pager_pagenumber(pPage->aData) );
  c = pPage->aData[hdr];
  if( pPage->isInit ){
    assert( pPage->leaf == ((c & PTF_LEAF)!=0) );
    assert( pPage->zeroData == ((c & PTF_ZERODATA)!=0) );
    assert( pPage->intKey == ((c & PTF_INTKEY)!=0) );
  }
  data = pPage->aData;
  memset(used, 0, pageSize);
  for(i=0; i<hdr+10-pPage->leaf*4; i++) used[i] = 1;
  nFree = 0;
  pc = get2byte(&data[hdr+1]);
  while( pc ){
    int size;
    assert( pc>0 && pc<pageSize-4 );
    size = get2byte(&data[pc+2]);
    assert( pc+size<=pageSize );
    nFree += size;
    for(i=pc; i<pc+size; i++){
      assert( used[i]==0 );
      used[i] = 1;
    }
    pc = get2byte(&data[pc]);
  }
  assert( pPage->isInit==0 || pPage->nFree==nFree+data[hdr+5] );
  idx = 0;
  pc = get2byte(&data[hdr+3]);
  while( pc ){
    int size;
    assert( pPage->isInit==0 || idx<pPage->nCell );
    assert( pc>0 && pc<pageSize-4 );
    assert( pPage->isInit==0 || pPage->aCell[idx]==&data[pc] );
    size = cellSize(pPage, &data[pc]);
    assert( pc+size<=pageSize );
    for(i=pc; i<pc+size; i++){
      assert( used[i]==0 );
      used[i] = 1;
    }
    pc = get2byte(&data[pc]);
    idx++;
  }
  assert( idx==pPage->nCell );
  nFree = 0;
  for(i=0; i<pageSize; i++){
    assert( used[i]<=1 );
    if( used[i]==0 ) nFree++;
  }
  assert( nFree==data[hdr+5] );
}
#define pageIntegrity(X) _pageIntegrity(X)
#else
# define pageIntegrity(X)
#endif

/*
** Defragment the page given.  All Cells are moved to the
** beginning of the page and all free space is collected 
** into one big FreeBlk at the end of the page.
*/
static void defragmentPage(MemPage *pPage){
  int pc, i, n, addr;
................................................................................
  int leftover;
  unsigned char *oldPage;
  unsigned char newPage[MX_PAGE_SIZE];

  assert( sqlite3pager_iswriteable(pPage->aData) );
  assert( pPage->pBt!=0 );
  assert( pPage->pBt->pageSize <= MX_PAGE_SIZE );
  assert( !pPage->needRelink );
  assert( !pPage->isOverfull );
  oldPage = pPage->aData;
  hdr = pPage->hdrOffset;
  addr = 3+hdr;
  n = 6+hdr;
  if( !pPage->leaf ){
    n += 4;
  }
................................................................................
  pc = get2byte(&oldPage[addr]);
  i = 0;
  while( pc>0 ){
    assert( n<pPage->pBt->pageSize );
    size = cellSize(pPage, &oldPage[pc]);
    memcpy(&newPage[n], &oldPage[pc], size);
    put2byte(&newPage[addr],n);
    assert( pPage->aCell[i]==&oldPage[pc] );
    pPage->aCell[i++] = &oldPage[n];
    addr = n;
    n += size;

    pc = get2byte(&oldPage[pc]);
  }
  assert( i==pPage->nCell );
  leftover = pPage->pBt->pageSize - n;
  assert( leftover>=0 );
  assert( pPage->nFree==leftover );
  if( leftover<4 ){
................................................................................
  int pageSize;
  int sumCell = 0;       /* Total size of all cells */

  assert( pPage->pBt!=0 );
  assert( pParent==0 || pParent->pBt==pPage->pBt );
  assert( pPage->pgno==sqlite3pager_pagenumber(pPage->aData) );
  assert( pPage->aData == &((unsigned char*)pPage)[-pPage->pBt->pageSize] );
  assert( pPage->pParent==0 || pPage->pParent==pParent );

  if( pPage->pParent==0 && pParent!=0 ){
    pPage->pParent = pParent;

    sqlite3pager_ref(pParent->aData);
  }
  if( pPage->isInit ) return SQLITE_OK;
  pPage->nCell = pPage->nCellAlloc = 0;
  assert( pPage->hdrOffset==(pPage->pgno==1 ? 100 : 0) );
  hdr = pPage->hdrOffset;
  data = pPage->aData;
  c = data[hdr];
  pPage->intKey = (c & PTF_INTKEY)!=0;
  pPage->zeroData = (c & PTF_ZERODATA)!=0;
  pPage->leaf = (c & PTF_LEAF)!=0;
  pPage->isOverfull = 0;
  pPage->needRelink = 0;
  pPage->idxShift = 0;
  pageSize = pPage->pBt->pageSize;

  /* Initialize the cell count and cell pointers */
  pc = get2byte(&data[hdr+3]);
  while( pc>0 ){
    if( pc>=pageSize ) return SQLITE_CORRUPT;
................................................................................
  /* Sanity check:  Cells and freespace and header must sum to the size
  ** a page. */
  if( sumCell+pPage->nFree+hdr+10-pPage->leaf*4 != pageSize ){
    return SQLITE_CORRUPT;
  }

  pPage->isInit = 1;
  pageIntegrity(pPage);
  return SQLITE_OK;
}

/*
** Set up a raw page so that it looks like a database page holding
** no entries.
*/
static void zeroPage(MemPage *pPage, int flags){
  unsigned char *data = pPage->aData;
  Btree *pBt = pPage->pBt;
  int hdr = pPage->hdrOffset;
  int first;

  assert( sqlite3pager_pagenumber(data)==pPage->pgno );
  assert( &data[pBt->pageSize] == (unsigned char*)pPage );
  assert( sqlite3pager_iswriteable(data) );
  memset(&data[hdr], 0, pBt->pageSize - hdr);
  data[hdr] = flags;
  first = hdr + 6 + 4*((flags&PTF_LEAF)==0);
  put2byte(&data[hdr+1], first);
  put2byte(&data[first+2], pBt->pageSize - first);
  sqliteFree(pPage->aCell);
................................................................................
  pPage->nCell = 0;
  pPage->nCellAlloc = 0;
  pPage->nFree = pBt->pageSize - first;
  pPage->intKey = (flags & PTF_INTKEY)!=0;
  pPage->leaf = (flags & PTF_LEAF)!=0;
  pPage->zeroData = (flags & PTF_ZERODATA)!=0;
  pPage->hdrOffset = hdr;
  pPage->isOverfull = 0;
  pPage->needRelink = 0;
  pPage->idxShift = 0;
  pPage->isInit = 1;
  pageIntegrity(pPage);
}

/*
** Get a page from the pager.  Initialize the MemPage.pBt and
** MemPage.aData elements if needed.
*/
static int getPage(Btree *pBt, Pgno pgno, MemPage **ppPage){
................................................................................
/*
** This routine is called when the reference count for a page
** reaches zero.  We need to unref the pParent pointer when that
** happens.
*/
static void pageDestructor(void *pData){
  MemPage *pPage = (MemPage*)&((char*)pData)[SQLITE_PAGE_SIZE];
  assert( pPage->isInit==0 || pPage->needRelink==0 );
  if( pPage->pParent ){
    MemPage *pParent = pPage->pParent;
    pPage->pParent = 0;
    releasePage(pParent);
  }
  sqliteFree(pPage->aCell);
  pPage->aCell = 0;
................................................................................
/*
** Invalidate all cursors
*/
static void invalidateCursors(Btree *pBt){
  BtCursor *pCur;
  for(pCur=pBt->pCursor; pCur; pCur=pCur->pNext){
    MemPage *pPage = pCur->pPage;
    if( pPage /* && !pPage->isInit */ ){
      pageIntegrity(pPage);
      releasePage(pPage);
      pCur->pPage = 0;
      pCur->isValid = 0;
      pCur->status = SQLITE_ABORT;
    }
  }
}

#ifdef SQLITE_TEST
/*
** Print debugging information about all cursors to standard output.
*/
void sqlite3BtreeCursorList(Btree *pBt){
  BtCursor *pCur;
  for(pCur=pBt->pCursor; pCur; pCur=pCur->pNext){
    MemPage *pPage = pCur->pPage;
    char *zMode = pCur->wrFlag ? "rw" : "ro";
    printf("CURSOR %08x rooted at %4d(%s) currently at %d.%d%s\n",
       (int)pCur, pCur->pgnoRoot, zMode,
       pPage ? pPage->pgno : 0, pCur->idx,
       pCur->isValid ? "" : " eof"
    );
  }
}
#endif

/*
** Rollback the transaction in progress.  All cursors will be
** invalided by this operation.  Any attempt to use a cursor
** that was open at the beginning of this operation will result
** in an error.
**
................................................................................
  MemPage *pPage;
  unsigned char *cell;

  if( !pCur->isValid ){
    *pSize = 0;
  }else{
    pPage = pCur->pPage;
    pageIntegrity(pPage);
    assert( pPage!=0 );
    assert( pCur->idx>=0 && pCur->idx<pPage->nCell );
    cell = pPage->aCell[pCur->idx];
    cell += 2;   /* Skip the offset to the next cell */
    if( !pPage->leaf ){
      cell += 4;  /* Skip the child pointer */
    }
................................................................................
  u64 nData, nKey;
  int maxLocal, ovflSize;

  assert( pCur!=0 && pCur->pPage!=0 );
  assert( pCur->isValid );
  pBt = pCur->pBt;
  pPage = pCur->pPage;
  pageIntegrity(pPage);
  assert( pCur->idx>=0 && pCur->idx<pPage->nCell );
  aPayload = pPage->aCell[pCur->idx];
  aPayload += 2;  /* Skip the next cell index */
  if( !pPage->leaf ){
    aPayload += 4;  /* Skip the child pointer */
  }
  if( pPage->zeroData ){
................................................................................
  if( !pCur->isValid ){
    return 0;
  }
  assert( pCur->pPage!=0 );
  assert( pCur->idx>=0 && pCur->idx<pCur->pPage->nCell );
  pBt = pCur->pBt;
  pPage = pCur->pPage;
  pageIntegrity(pPage);
  assert( pCur->idx>=0 && pCur->idx<pPage->nCell );
  assert( pPage->intKey==0 );
  aPayload = pPage->aCell[pCur->idx];
  aPayload += 2;  /* Skip the next cell index */
  if( !pPage->leaf ){
    aPayload += 4;  /* Skip the child pointer */
  }
................................................................................

  if( !pCur->isValid ){
    return pCur->status ? pCur->status : SQLITE_INTERNAL;
  }
  pPage = pCur->pPage;
  assert( pPage!=0 );
  assert( pPage->isInit );
  pageIntegrity(pPage);
  if( pPage->zeroData ){
    *pSize = 0;
  }else{
    assert( pCur->idx>=0 && pCur->idx<pPage->nCell );
    cell = pPage->aCell[pCur->idx];
    cell += 2;   /* Skip the offset to the next cell */
    if( !pPage->leaf ){
................................................................................
  MemPage *pNewPage;
  MemPage *pOldPage;
  Btree *pBt = pCur->pBt;

  assert( pCur->isValid );
  rc = getAndInitPage(pBt, newPgno, &pNewPage, pCur->pPage);
  if( rc ) return rc;
  pageIntegrity(pNewPage);
  pNewPage->idxParent = pCur->idx;
  pOldPage = pCur->pPage;
  pOldPage->idxShift = 0;
  releasePage(pOldPage);
  pCur->pPage = pNewPage;
  pCur->idx = 0;
  if( pNewPage->nCell<1 ){
................................................................................
** for the table rooted on page 1, sometime the real root page
** is empty except for the right-pointer.  In such cases the
** virtual root page is the page that the right-pointer of page
** 1 is pointing to.
*/
static int isRootPage(MemPage *pPage){
  MemPage *pParent = pPage->pParent;
  if( pParent==0 ) return 1;
  if( pParent->pgno>1 ) return 0;
  if( get2byte(&pParent->aData[pParent->hdrOffset+3])==0 ) return 1;
  return 0;
}

/*
** Move the cursor up to the parent page.
**
** pCur->idx is set to the cell index that contains the pointer
................................................................................
  MemPage *pPage;
  int idxParent;

  assert( pCur->isValid );
  pPage = pCur->pPage;
  assert( pPage!=0 );
  assert( !isRootPage(pPage) );
  pageIntegrity(pPage);
  pParent = pPage->pParent;
  assert( pParent!=0 );
  pageIntegrity(pParent);
  idxParent = pPage->idxParent;
  sqlite3pager_ref(pParent->aData);
  oldPgno = pPage->pgno;
  releasePage(pPage);
  pCur->pPage = pParent;
  assert( pParent->idxShift==0 );
  if( pParent->idxShift==0 ){
................................................................................

  rc = getAndInitPage(pBt, pCur->pgnoRoot, &pRoot, 0);
  if( rc ){
    pCur->isValid = 0;
    return rc;
  }
  releasePage(pCur->pPage);
  pageIntegrity(pRoot);
  pCur->pPage = pRoot;
  pCur->idx = 0;
  if( pRoot->nCell==0 && !pRoot->leaf ){
    Pgno subpage;
    assert( pRoot->pgno==1 );
    subpage = get4byte(&pRoot->aData[pRoot->hdrOffset+6]);
    assert( subpage>0 );
................................................................................
  for(;;){
    int lwr, upr;
    Pgno chldPg;
    MemPage *pPage = pCur->pPage;
    int c = -1;  /* pRes return if table is empty must be -1 */
    lwr = 0;
    upr = pPage->nCell-1;
    pageIntegrity(pPage);
    while( lwr<=upr ){
      void *pCellKey;
      u64 nCellKey;
      pCur->idx = (lwr+upr)/2;
      sqlite3BtreeKeySize(pCur, &nCellKey);
      if( pPage->intKey ){
        if( nCellKey<nKey ){
................................................................................
  rc = sqlite3pager_write(pPage1->aData);
  if( rc ) return rc;
  n = get4byte(&pPage1->aData[36]);
  put4byte(&pPage1->aData[36], n+1);

  if( n==0 ){
    /* This is the first free page */
    rc = sqlite3pager_write(pPage->aData);
    if( rc ) return rc;
    memset(pPage->aData, 0, 8);
    put4byte(&pPage1->aData[32], pPage->pgno);
  }else{
    /* Other free pages already exist.  Retrive the first trunk page
    ** of the freelist and find out how many leaves it has. */
    MemPage *pTrunk;
    rc = getPage(pBt, get4byte(&pPage1->aData[32]), &pTrunk);
................................................................................
static void reparentPage(Btree *pBt, Pgno pgno, MemPage *pNewParent, int idx){
  MemPage *pThis;
  unsigned char *aData;

  if( pgno==0 ) return;
  assert( pBt->pPager!=0 );
  aData = sqlite3pager_lookup(pBt->pPager, pgno);
  if( aData ){
    pThis = (MemPage*)&aData[pBt->pageSize];
    if( pThis->isInit ){
      if( pThis->pParent!=pNewParent ){
        if( pThis->pParent ) sqlite3pager_unref(pThis->pParent->aData);
        pThis->pParent = pNewParent;
        if( pNewParent ) sqlite3pager_ref(pNewParent->aData);
      }
      pThis->idxParent = idx;
    }
    sqlite3pager_unref(aData);
  }
}

/*
** Change the pParent pointer of all children of pPage to point back
** to pPage.
................................................................................
** Remove the i-th cell from pPage.  This routine effects pPage only.
** The cell content is not freed or deallocated.  It is assumed that
** the cell content has been copied someplace else.  This routine just
** removes the reference to the cell from pPage.
**
** "sz" must be the number of bytes in the cell.
**
** Try to maintain the integrity of the linked list of cells.  But if
** the cell being inserted does not fit on the page, this will not be
** possible.  If the linked list is not maintained, then just update
** pPage->aCell[] and set the pPage->needRelink flag so that we will
** know to rebuild the linked list later.
*/
static void dropCell(MemPage *pPage, int idx, int sz){
  int j, pc;
  u8 *data;
  assert( idx>=0 && idx<pPage->nCell );
  assert( sz==cellSize(pPage, pPage->aCell[idx]) );
  assert( sqlite3pager_iswriteable(pPage->aData) );
  assert( pPage->aCell[idx]>=pPage->aData );
  assert( pPage->aCell[idx]<&pPage->aData[pPage->pBt->pageSize-sz] );
  data = pPage->aData;
  pc = Addr(pPage->aCell[idx]) - Addr(data);
  assert( pc>pPage->hdrOffset && pc+sz<=pPage->pBt->pageSize );
  freeSpace(pPage, pc, sz);
  for(j=idx; j<pPage->nCell-1; j++){
    pPage->aCell[j] = pPage->aCell[j+1];
  }
  pPage->nCell--;
  if( !pPage->isOverfull && !pPage->needRelink ){
    u8 *pPrev;
    if( idx==0 ){
      pPrev = &data[pPage->hdrOffset+3];
    }else{
      pPrev = pPage->aCell[idx-1];
    }
    if( idx<pPage->nCell ){
      pc = Addr(pPage->aCell[idx]) - Addr(data);
    }else{
      pc = 0;
    }
    put2byte(pPrev, pc);
    pageIntegrity(pPage);
  }else{
    pPage->needRelink = 1;
  }
  pPage->idxShift = 1;
}

/*
** Insert a new cell on pPage at cell index "i".  pCell points to the
** content of the cell.
**
** If the cell content will fit on the page, then put it there.  If it
** will not fit, then just make pPage->aCell[i] point to the content
** and set pPage->isOverfull.  
**
** Try to maintain the integrity of the linked list of cells.  But if
** the cell being inserted does not fit on the page, this will not be
** possible.  If the linked list is not maintained, then just update
** pPage->aCell[] and set the pPage->needRelink flag so that we will
** know to rebuild the linked list later.
*/
static void insertCell(MemPage *pPage, int i, unsigned char *pCell, int sz){
  int idx, j;
  assert( i>=0 && i<=pPage->nCell );
  assert( sz==cellSize(pPage, pCell) );
  assert( sqlite3pager_iswriteable(pPage->aData) );
  idx = pPage->needRelink ? 0 : allocateSpace(pPage, sz);
  resizeCellArray(pPage, pPage->nCell+1);
  for(j=pPage->nCell; j>i; j--){
    pPage->aCell[j] = pPage->aCell[j-1];
  }
  pPage->nCell++;
  if( idx<=0 ){
    pPage->isOverfull = 1;
    pPage->aCell[i] = pCell;
  }else{
    u8 *data = pPage->aData;
    memcpy(&data[idx], pCell, sz);
    pPage->aCell[i] = &data[idx];
  }
  if( !pPage->isOverfull && !pPage->needRelink ){
    u8 *pPrev;
    int pc;
    if( i==0 ){
      pPrev = &pPage->aData[pPage->hdrOffset+3];
    }else{
      pPrev = pPage->aCell[i-1];
    }
    pc = get2byte(pPrev);
    put2byte(pPrev, idx);
    put2byte(pPage->aCell[i], pc);
    pageIntegrity(pPage);
  }else{
    pPage->needRelink = 1;
  }
  pPage->idxShift = 1;
}

/*
** Rebuild the linked list of cells on a page so that the cells
** occur in the order specified by the pPage->aCell[] array.  
** Invoke this routine once to repair damage after one or more
** invocations of either insertCell() or dropCell().
*/
static void relinkCellList(MemPage *pPage){
  int i, idxFrom;
  assert( sqlite3pager_iswriteable(pPage->aData) );
  if( !pPage->needRelink ) return;
  idxFrom = pPage->hdrOffset+3;
  for(i=0; i<pPage->nCell; i++){
    int idx = Addr(pPage->aCell[i]) - Addr(pPage->aData);
    assert( idx>pPage->hdrOffset && idx<pPage->pBt->pageSize );
    put2byte(&pPage->aData[idxFrom], idx);
    idxFrom = idx;
  }
  put2byte(&pPage->aData[idxFrom], 0);
  pPage->needRelink = 0;
}

/*
** GCC does not define the offsetof() macro so we'll have to do it
** ourselves.
*/
#ifndef offsetof
................................................................................
** Move the content of the page at pFrom over to pTo.  The pFrom->aCell[]
** pointers that point into pFrom->aData[] must be adjusted to point
** into pTo->aData[] instead.  But some pFrom->aCell[] entries might
** not point to pFrom->aData[].  Those are unchanged.
**
** Over this operation completes, the meta data for pFrom is zeroed.
*/
static void movePage(MemPage *pTo, MemPage *pFrom){
  uptr from, to;
  int i;
  int pageSize;
  int ofst;

  assert( pTo->hdrOffset==0 );
  ofst = pFrom->hdrOffset;
................................................................................
    uptr x = Addr(pTo->aCell[i]);
    if( x>from && x<from+pageSize-ofst ){
      *((uptr*)&pTo->aCell[i]) = x + to - from;
    }
  }
}

/*
** For debugging...
*/
#if 1
# define TRACE(X)   if( pager3_refinfo_enable ) printf X
#else
# define TRACE(X)
#endif

/*
** The following parameters determine how many adjacent pages get involved
** in a balancing operation.  NN is the number of neighbors on either side
** of the page that participate in the balancing operation.  NB is the
** total number of pages that participate, including the target page and
** NN neighbors on either side.
**
................................................................................
  int idx;                     /* Index of pPage in pParent->aCell[] */
  int nxDiv;                   /* Next divider slot in pParent->aCell[] */
  int rc;                      /* The return code */
  int leafCorrection;          /* 4 if pPage is a leaf.  0 if not */
  int usableSpace;             /* Bytes in pPage beyond the header */
  int pageFlags;               /* Value of pPage->aData[0] */
  int subtotal;                /* Subtotal of bytes in cells on one page */
  MemPage *extraUnref = 0;     /* Unref this page if not zero */
  MemPage *apOld[NB];          /* pPage and up to two siblings */
  Pgno pgnoOld[NB];            /* Page numbers for each page in apOld[] */
  MemPage *apCopy[NB];         /* Private copies of apOld[] pages */
  MemPage *apNew[NB+1];        /* pPage and up to NB siblings after balancing */
  Pgno pgnoNew[NB+1];          /* Page numbers for each page in apNew[] */
  int idxDiv[NB];              /* Indices of divider cells in pParent */
  u8 *apDiv[NB];               /* Divider cells in pParent */
................................................................................
  }

  /*
  ** Find the parent of the page to be balanced.  If there is no parent,
  ** it means this page is the root page and special rules apply.
  */
  pParent = pPage->pParent;
  TRACE(("BALANCE: begin page %d\n", pPage->pgno));
  if( pParent==0 ){
    Pgno pgnoChild;
    MemPage *pChild;
    assert( pPage->isInit );
    if( pPage->nCell==0 ){
      if( pPage->leaf ){
        /* The table is completely empty */
        relinkCellList(pPage);
        TRACE(("BALANCE: empty table\n"));
      }else{
        /* The root page is empty but has one child.  Transfer the
        ** information from that one child into the root page if it 
        ** will fit.  This reduces the depth of the BTree by one.
        **
        ** If the root page is page 1, it has less space available than
        ** its child (due to the 100 byte header that occurs at the beginning
................................................................................
            zeroPage(pPage, pChild->aData[0]);
            resizeCellArray(pPage, pChild->nCell);
            for(i=0; i<pChild->nCell; i++){
              insertCell(pPage, i, pChild->aCell[i], 
                        cellSize(pChild, pChild->aCell[i]));
            }
            freePage(pChild);
            TRACE(("BALANCE: child %d transfer to page 1\n", pChild->pgno));
          }else{
            /* The child has more information that will fit on the root.
            ** The tree is already balanced.  Do nothing. */
            TRACE(("BALANCE: child %d will not fit on page 1\n", pChild->pgno));
          }
        }else{
          memcpy(pPage, pChild, pBt->pageSize);
          pPage->isInit = 0;
          pPage->pParent = 0;
          rc = initPage(pPage, 0);
          assert( rc==SQLITE_OK );
          freePage(pChild);
          TRACE(("BALANCE: transfer child %d into root\n", pChild->pgno));
        }
        reparentChildPages(pPage);
        releasePage(pChild);
      }
      return SQLITE_OK;
    }
    if( !pPage->isOverfull ){
      /* It is OK for the root page to be less than half full.
      */
      relinkCellList(pPage);
      TRACE(("BALANCE:  Root page is underfull but that is ok\n"));
      return SQLITE_OK;
    }
    /*
    ** If we get to here, it means the root page is overfull.
    ** When this happens, Create a new child page and copy the
    ** contents of the root into the child.  Then make the root
    ** page an empty page with rightChild pointing to the new
    ** child.  Then fall thru to the code below which will cause
    ** the overfull child page to be split.
    */
    rc = allocatePage(pBt, &pChild, &pgnoChild, pPage->pgno);
    if( rc ) return rc;
    assert( sqlite3pager_iswriteable(pChild->aData) );
    movePage(pChild, pPage);
    assert( pChild->aData[0]==pPage->aData[pPage->hdrOffset] );
    pChild->pParent = pPage;

    sqlite3pager_ref(pPage->aData);
    pChild->idxParent = 0;
    pChild->isOverfull = 1;
    zeroPage(pPage, pChild->aData[0] & ~PTF_LEAF);
    put4byte(&pPage->aData[pPage->hdrOffset+6], pChild->pgno);
    pParent = pPage;
    pPage = pChild;
    extraUnref = pChild;
    TRACE(("BALANCE: Copy root into %d and blance\n", pPage->pgno));
  }
  rc = sqlite3pager_write(pParent->aData);
  if( rc ) return rc;
  assert( pParent->isInit );
  
  /*
  ** Find the cell in the parent page whose left child points back
................................................................................
  ** The rest of this function will use data from the copies rather
  ** that the original pages since the original pages will be in the
  ** process of being overwritten.
  */
  for(i=0; i<nOld; i++){
    MemPage *p = apCopy[i] = (MemPage*)&aCopy[i+1][-sizeof(MemPage)];
    p->aData = &((u8*)p)[-pBt->pageSize];
    p->aCell = 0;
    p->hdrOffset = 0;
    movePage(p, apOld[i]);
  }

  /*
  ** Load pointers to all cells on sibling pages and the divider cells
  ** into the local apCell[] array.  Make copies of the divider cells
  ** into aTemp[] and remove the the divider Cells from pParent.
  **
................................................................................
    MemPage *pOld = apCopy[i];
    for(j=0; j<pOld->nCell; j++){
      apCell[nCell] = pOld->aCell[j];
      szCell[nCell] = cellSize(pOld, apCell[nCell]);
      nCell++;
    }
    if( i<nOld-1 ){
      szCell[nCell] = cellSize(pParent, apDiv[i]);
      memcpy(aTemp[i], apDiv[i], szCell[nCell]);
      apCell[nCell] = &aTemp[i][leafCorrection];
      dropCell(pParent, nxDiv, szCell[nCell]);
      szCell[nCell] -= leafCorrection;
      assert( get4byte(&aTemp[i][2])==pgnoOld[i] );
      if( !pOld->leaf ){
        assert( leafCorrection==0 );
        /* The right pointer of the child page pOld becomes the left
        ** pointer of the divider cell */
        memcpy(&apCell[nCell][2], &pOld->aData[pOld->hdrOffset+6], 4);
      }else{
        assert( leafCorrection==4 );
................................................................................

  /*
  ** Allocate k new pages.  Reuse old pages where possible.
  */
  assert( pPage->pgno>1 );
  pageFlags = pPage->aData[0];
  for(i=0; i<k; i++){
    MemPage *pNew;
    if( i<nOld ){
      pNew = apNew[i] = apOld[i];
      pgnoNew[i] = pgnoOld[i];
      apOld[i] = 0;
      sqlite3pager_write(pNew->aData);
    }else{
      rc = allocatePage(pBt, &pNew, &pgnoNew[i], pgnoNew[i-1]);
      if( rc ) goto balance_cleanup;
      apNew[i] = pNew;
    }
    nNew++;
    zeroPage(pNew, pageFlags);

  }

  /* Free any old pages that were not reused as new pages.
  */
  while( i<nOld ){
    rc = freePage(apOld[i]);
    if( rc ) goto balance_cleanup;
    releasePage(apOld[i]);
    apOld[i] = 0;
    i++;
  }

  /*
  ** Put the new pages in accending order.  This helps to
  ** keep entries in the disk file in order so that a scan
................................................................................
    reparentChildPages(apNew[i]);
  }
  reparentChildPages(pParent);

  /*
  ** balance the parent page.
  */
  assert( pPage->isInit );
  assert( pParent->isInit );
  pageIntegrity(pPage);
  rc = balance(pParent);
  

  /*
  ** Cleanup before returning.
  */
balance_cleanup:
  for(i=0; i<nOld; i++){
    releasePage(apOld[i]);
    if( apCopy[i] ){

      sqliteFree(apCopy[i]->aCell);
    }
  }
  for(i=0; i<nNew; i++){
    releasePage(apNew[i]);
  }
  releasePage(pParent);
  releasePage(extraUnref);
  TRACE(("BALANCE: Finished with %d\n", pPage->pgno));
  return rc;
}

/*
** This routine checks all cursors that point to the same table
** as pCur points to.  If any of those cursors were opened with
** wrFlag==0 then this routine returns SQLITE_LOCKED.  If all
................................................................................
    ** next Cell after the one to be deleted is guaranteed to exist and
    ** to be a leaf so we can use it.
    */
    BtCursor leafCur;
    unsigned char *pNext;
    int szNext;
    int notUsed;
    unsigned char tempbuf[4];
    getTempCursor(pCur, &leafCur);
    rc = sqlite3BtreeNext(&leafCur, &notUsed);
    if( rc!=SQLITE_OK ){
      if( rc!=SQLITE_NOMEM ) rc = SQLITE_CORRUPT;
      return rc;
    }
    rc = sqlite3pager_write(leafCur.pPage->aData);
    if( rc ) return rc;
    dropCell(pPage, pCur->idx, cellSize(pPage, pCell));
    pNext = leafCur.pPage->aCell[leafCur.idx];
    szNext = cellSize(leafCur.pPage, pNext);
    memcpy(tempbuf, &pNext[-2], 4);
    put4byte(&pNext[-2], pgnoChild);
    insertCell(pPage, pCur->idx, &pNext[-4], szNext+4);
    rc = balance(pPage);
    if( rc ) return rc;
    memcpy(&pNext[-2], tempbuf, 4);
    dropCell(leafCur.pPage, leafCur.idx, szNext);
    rc = balance(leafCur.pPage);
    releaseTempCursor(&leafCur);
  }else{
    dropCell(pPage, pCur->idx, cellSize(pPage, pCell));
    rc = balance(pPage);
  }
................................................................................
  if( !pBt->inTrans ){
    /* Must start a transaction first */
    return pBt->readOnly ? SQLITE_READONLY : SQLITE_ERROR;
  }
  if( pBt->readOnly ){
    return SQLITE_READONLY;
  }
  rc = allocatePage(pBt, &pRoot, &pgnoRoot, 1);
  if( rc ) return rc;
  assert( sqlite3pager_iswriteable(pRoot->aData) );
  zeroPage(pRoot, flags | PTF_LEAF);
  sqlite3pager_unref(pRoot->aData);
  *piTable = (int)pgnoRoot;
  return SQLITE_OK;
}
................................................................................
  }
  hdr = pPage->hdrOffset;
  data = pPage->aData;
  c = data[hdr];
  pPage->intKey = (c & PTF_INTKEY)!=0;
  pPage->zeroData = (c & PTF_ZERODATA)!=0;
  pPage->leaf = (c & PTF_LEAF)!=0;
  printf("PAGE %d:  flags=0x%02x  frag=%d   parent=%d\n", pgno,
    data[hdr], data[hdr+5], 
    (pPage->isInit && pPage->pParent) ? pPage->pParent->pgno : 0);
  i = 0;
  assert( hdr == (pgno==1 ? 100 : 0) );
  idx = get2byte(&data[hdr+3]);
  while( idx>0 && idx<=pBt->pageSize ){
    u64 nData, nKey;
    int nHeader;
    Pgno child;
................................................................................
**   aResult[4] =  Number of free bytes on this page
**   aResult[5] =  Number of free blocks on the page
**   aResult[6] =  Page number of the left child of this entry
**   aResult[7] =  Page number of the right child for the whole page
**
** This routine is used for testing and debugging only.
*/
int sqlite3BtreeCursorInfo(BtCursor *pCur, int *aResult){
  int cnt, idx;
  MemPage *pPage = pCur->pPage;

  pageIntegrity(pPage);
  assert( pPage->isInit );
  aResult[0] = sqlite3pager_pagenumber(pPage->aData);
  assert( aResult[0]==pPage->pgno );
  aResult[1] = pCur->idx;
  aResult[2] = pPage->nCell;
  if( pCur->idx>=0 && pCur->idx<pPage->nCell ){
    aResult[3] = cellSize(pPage, pPage->aCell[pCur->idx]);
................................................................................
** the tree depth.  Root pages return 0.  Parents of root pages
** return 1, and so forth.
** 
** These checks are done:
**
**      1.  Make sure that cells and freeblocks do not overlap
**          but combine to completely cover the page.
**  NO  2.  Make sure cell keys are in order.
**  NO  3.  Make sure no key is less than or equal to zLowerBound.
**  NO  4.  Make sure no key is greater than or equal to zUpperBound.
**      5.  Check the integrity of overflow pages.
**      6.  Recursively call checkTreePage on all children.
**      7.  Verify that the depth of all children is the same.
**      8.  Make sure this page is at least 33% full or else it is
**          the root of the tree.
*/
static int checkTreePage(
................................................................................
  char *zParentContext, /* Parent context */
  char *zLowerBound,    /* All keys should be greater than this, if not NULL */
  int nLower,           /* Number of characters in zLowerBound */
  char *zUpperBound,    /* All keys should be less than this, if not NULL */
  int nUpper            /* Number of characters in zUpperBound */
){
  MemPage *pPage;
  int i, rc, depth, d2, pgno, cnt;
  int hdr;
  u8 *data;
  char *zKey1, *zKey2;
  int nKey1, nKey2;
  BtCursor cur;
  Btree *pBt;
  int maxLocal, pageSize;
  char zMsg[100];
  char zContext[100];

  char hit[MX_PAGE_SIZE];

  /* Check that the page exists
  */
  cur.pBt = pBt = pCheck->pBt;
  maxLocal = pBt->maxLocal;
  pageSize = pBt->pageSize;
  if( iPage==0 ) return 0;
  if( checkRef(pCheck, iPage, zParentContext) ) return 0;
  sprintf(zContext, "On tree page %d: ", iPage);
  if( (rc = getPage(pBt, (Pgno)iPage, &pPage))!=0 ){
    sprintf(zMsg, "unable to get the page. error code=%d", rc);
    checkAppendMsg(pCheck, zContext, zMsg);
    return 0;
................................................................................
  if( (rc = initPage(pPage, pParent))!=0 ){
    sprintf(zMsg, "initPage() returns error code %d", rc);
    checkAppendMsg(pCheck, zContext, zMsg);
    releasePage(pPage);
    return 0;
  }



  /* Check out all the cells.
  */
  depth = 0;








  cur.pPage = pPage;
  for(i=0; i<pPage->nCell; i++){
    u8 *pCell = pPage->aCell[i];
    u64 nKey, nData;
    int sz, nHeader;

    /* Check payload overflow pages
    */






    parseCellHeader(pPage, pCell, &nData, &nKey, &nHeader);
    sz = nData;
    if( !pPage->intKey ) sz += nKey;
    if( sz>maxLocal ){
      int nPage = (sz - maxLocal + pageSize - 5)/(pageSize - 4);
      checkList(pCheck, 0, get4byte(&pCell[nHeader+maxLocal]),nPage,zContext);
    }










    /* Check sanity of left child page.
    */

    if( !pPage->leaf ){
      pgno = get4byte(&pCell[2]);
      d2 = checkTreePage(pCheck,pgno,pPage,zContext,0,0,0,0);
      if( i>0 && d2!=depth ){
        checkAppendMsg(pCheck, zContext, "Child page depth differs");
      }
      depth = d2;



    }

  }
  if( !pPage->leaf ){
    pgno = get4byte(&pPage->aData[pPage->hdrOffset+6]);
    sprintf(zContext, "On page %d at right child: ", iPage);
    checkTreePage(pCheck, pgno, pPage, zContext,0,0,0,0);

  }
 
  /* Check for complete coverage of the page
  */
  memset(hit, 0, pageSize);
  memset(hit, 1, pPage->hdrOffset+10-4*(pPage->leaf));
  data = pPage->aData;
  hdr = pPage->hdrOffset;
  for(cnt=0, i=get2byte(&data[hdr+3]); i>0 && i<pageSize && cnt<10000; cnt++){
    int size = cellSize(pPage, &data[i]);
    int j;
    for(j=i+size-1; j>=i; j--) hit[j]++;

    i = get2byte(&data[i]);
  }


  for(cnt=0, i=get2byte(&data[hdr+1]); i>0 && i<pageSize && cnt<10000; cnt++){
    int size = get2byte(&data[i+2]);
    int j;
    for(j=i+size-1; j>=i; j--) hit[j]++;

    i = get2byte(&data[i]);
  }

  for(i=cnt=0; i<pageSize; i++){
    if( hit[i]==0 ){



      cnt++;
    }else if( hit[i]>1 ){
      sprintf(zMsg, "Multiple uses for byte %d of page %d", i, iPage);
      checkAppendMsg(pCheck, zMsg, 0);
      break;
    }
  }
  if( cnt!=data[hdr+5] ){
    sprintf(zMsg, "Fragmented space is %d byte reported as %d on page %d",
        cnt, data[hdr+5], iPage);
    checkAppendMsg(pCheck, zMsg, 0);
  }


  releasePage(pPage);
  return depth+1;
}

/*
** This routine does a complete check of the given BTree file.  aRoot[] is
** an array of pages numbers were each page number is the root page of
** a table.  nRoot is the number of entries in aRoot.
**
................................................................................
  sCheck.pPager = pBt->pPager;
  sCheck.nPage = sqlite3pager_pagecount(sCheck.pPager);
  if( sCheck.nPage==0 ){
    unlockBtreeIfUnused(pBt);
    return 0;
  }
  sCheck.anRef = sqliteMallocRaw( (sCheck.nPage+1)*sizeof(sCheck.anRef[0]) );

  for(i=0; i<=sCheck.nPage; i++){ sCheck.anRef[i] = 0; }
  sCheck.zErrMsg = 0;

  /* Check the integrity of the freelist
  */
  checkList(&sCheck, 1, get4byte(&pBt->pPage1->aData[32]),
            get4byte(&pBt->pPage1->aData[36]), "Main freelist: ");

Changes to test/btree.test.

7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
...
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671



672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689



690
691
692
693
694
695
696
...
814
815
816
817
818
819
820

821
822

823
824
825
826
827





828
829
830
831
832
833
834
...
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
#    May you find forgiveness for yourself and forgive others.
#    May you share freely, never taking more than you give.
#
#***********************************************************************
# This file implements regression tests for SQLite library.  The
# focus of this script is btree database backend
#
# $Id: btree.test,v 1.22 2004/05/09 01:35:06 drh Exp $


set testdir [file dirname $argv0]
source $testdir/tester.tcl

# Basic functionality.  Open and close a database.
#
................................................................................
do_test btree-8.7 {
  btree_data $::c1
} $::data
do_test btree-8.8 {
  btree_commit $::b1
  btree_data $::c1
} $::data
do_test btree-8.9 {
  btree_close_cursor $::c1
  btree_close $::b1
  set ::b1 [btree_open test1.bt 2000 0]
  set ::c1 [btree_cursor $::b1 1 1]
  btree_move_to $::c1 2030
  btree_data $::c1
} $::data



do_test btree-8.10 {
  btree_begin_transaction $::b1
  btree_delete $::c1
} {}
do_test btree-8.11 {
  lindex [btree_get_meta $::b1] 0
} {4}

# Now check out keys on overflow pages.
#
do_test btree-8.12 {
  set ::keyprefix "This is a long prefix to a key "
  while {[string length $::keyprefix]<256} {append ::keyprefix $::keyprefix}
  btree_close_cursor $::c1
  btree_clear_table $::b1 2
  lindex [btree_get_meta $::b1] 0
} {4}
do_test btree-8.12.1 {



  set ::c1 [btree_cursor $::b1 2 1]
  btree_insert $::c1 ${::keyprefix}1 1
  btree_first $::c1
  btree_data $::c1
} {1}
do_test btree-8.13 {
  btree_key $::c1
................................................................................
  }] "*** [format %03d $i] ***"
}
#btree_pager_ref_dump $::b1
do_test btree-9.6 {
  btree_close_cursor $::c1
  lindex [btree_pager_stats $::b1] 1
} {1}

puts "222: [btree_integrity_check $::b1 1 2]"


do_test btree-9.7 {
  btree_rollback $::b1
  lindex [btree_pager_stats $::b1] 1
} {0}
do_test btree-9.8 {





  btree_integrity_check $::b1 1 2
} {}

# Create a tree of depth two.  That is, there is a single divider entry
# on the root pages and two leaf pages.  Then delete the divider entry
# see what happens.
#
................................................................................
  btree_next $::c1
  btree_data $::c1
} $::data
do_test btree-12.12 {
  btree_next $::c1
  btree_key $::c1
} {402}
btree_commit $::b1
btree_tree_dump $::b1 1
do_test btree-13.1 {
  btree_integrity_check $::b1 1 2
} {}

# To Do:
#
#   1.  Do some deletes from the 3-layer tree







|







 







|







>
>
>










|






|
>
>
>







 







>
|
<
>
|



|
>
>
>
>
>







 







|
|







7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
...
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
...
820
821
822
823
824
825
826
827
828

829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
...
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
#    May you find forgiveness for yourself and forgive others.
#    May you share freely, never taking more than you give.
#
#***********************************************************************
# This file implements regression tests for SQLite library.  The
# focus of this script is btree database backend
#
# $Id: btree.test,v 1.23 2004/05/09 11:51:40 drh Exp $


set testdir [file dirname $argv0]
source $testdir/tester.tcl

# Basic functionality.  Open and close a database.
#
................................................................................
do_test btree-8.7 {
  btree_data $::c1
} $::data
do_test btree-8.8 {
  btree_commit $::b1
  btree_data $::c1
} $::data
do_test btree-8.9.1 {
  btree_close_cursor $::c1
  btree_close $::b1
  set ::b1 [btree_open test1.bt 2000 0]
  set ::c1 [btree_cursor $::b1 1 1]
  btree_move_to $::c1 2030
  btree_data $::c1
} $::data
do_test btree-8.9.2 {
  btree_integrity_check $::b1 1 2
} {}
do_test btree-8.10 {
  btree_begin_transaction $::b1
  btree_delete $::c1
} {}
do_test btree-8.11 {
  lindex [btree_get_meta $::b1] 0
} {4}

# Now check out keys on overflow pages.
#
do_test btree-8.12.1 {
  set ::keyprefix "This is a long prefix to a key "
  while {[string length $::keyprefix]<256} {append ::keyprefix $::keyprefix}
  btree_close_cursor $::c1
  btree_clear_table $::b1 2
  lindex [btree_get_meta $::b1] 0
} {4}
do_test btree-8.12.2 {
  btree_integrity_check $::b1 1 2
} {}
do_test btree-8.12.3 {
  set ::c1 [btree_cursor $::b1 2 1]
  btree_insert $::c1 ${::keyprefix}1 1
  btree_first $::c1
  btree_data $::c1
} {1}
do_test btree-8.13 {
  btree_key $::c1
................................................................................
  }] "*** [format %03d $i] ***"
}
#btree_pager_ref_dump $::b1
do_test btree-9.6 {
  btree_close_cursor $::c1
  lindex [btree_pager_stats $::b1] 1
} {1}
do_test btree-9.7 {
  btree_integrity_check $::b1 1 2

} {}
do_test btree-9.8 {
  btree_rollback $::b1
  lindex [btree_pager_stats $::b1] 1
} {0}
do_test btree-9.9 {
  btree_integrity_check $::b1 1 2
} {}
do_test btree-9.10 {
  btree_close $::b1
  set ::b1 [btree_open test1.bt 2000 0]
  btree_integrity_check $::b1 1 2
} {}

# Create a tree of depth two.  That is, there is a single divider entry
# on the root pages and two leaf pages.  Then delete the divider entry
# see what happens.
#
................................................................................
  btree_next $::c1
  btree_data $::c1
} $::data
do_test btree-12.12 {
  btree_next $::c1
  btree_key $::c1
} {402}
# btree_commit $::b1
# btree_tree_dump $::b1 1
do_test btree-13.1 {
  btree_integrity_check $::b1 1 2
} {}

# To Do:
#
#   1.  Do some deletes from the 3-layer tree