Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Update comments and remove dead code from btree.c (CVS 1432) |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | trunk |
Files: | files | file ages | folders |
SHA1: |
8069caca82bc4d40d8ac95bafdd91a18 |
User & Date: | drh 2004-05-22 02:55:23.000 |
Context
2004-05-22
| ||
03:05 | Steps towards UTF-16 databases. Some tests are failing because of this commit. (CVS 1433) (check-in: c4a8246864 user: danielk1977 tags: trunk) | |
02:55 | Update comments and remove dead code from btree.c (CVS 1432) (check-in: 8069caca82 user: drh tags: trunk) | |
2004-05-21
| ||
21:12 | Floating point values are serialized in big-endian byte order. (CVS 1431) (check-in: acb65297b6 user: drh tags: trunk) | |
Changes
Changes to src/btree.c.
1 2 3 4 5 6 7 8 9 10 11 | /* ** 2004 April 6 ** ** The author disclaims copyright to this source code. In place of ** a legal notice, here is a blessing: ** ** May you do good and not evil. ** May you find forgiveness for yourself and forgive others. ** May you share freely, never taking more than you give. ** ************************************************************************* | | | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | /* ** 2004 April 6 ** ** The author disclaims copyright to this source code. In place of ** a legal notice, here is a blessing: ** ** May you do good and not evil. ** May you find forgiveness for yourself and forgive others. ** May you share freely, never taking more than you give. ** ************************************************************************* ** $Id: btree.c,v 1.146 2004/05/22 02:55:23 drh Exp $ ** ** This file implements a external (disk-based) database using BTrees. ** For a detailed discussion of BTrees, refer to ** ** Donald E. Knuth, THE ART OF COMPUTER PROGRAMMING, Volume 3: ** "Sorting And Searching", pages 473-480. Addison-Wesley ** Publishing Company, Reading, Massachusetts. |
︙ | ︙ | |||
69 70 71 72 73 74 75 | ** 28 4 Reserved for future use ** 32 4 First freelist page ** 36 4 Number of freelist pages in the file ** 40 60 15 4-byte meta values passed to higher layers ** ** All of the integer values are big-endian (most significant byte first). ** | | | | 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 | ** 28 4 Reserved for future use ** 32 4 First freelist page ** 36 4 Number of freelist pages in the file ** 40 60 15 4-byte meta values passed to higher layers ** ** All of the integer values are big-endian (most significant byte first). ** ** The file change counter is incremented when the database is changed more ** than once within the same second. This counter, together with the ** modification time of the file, allows other processes to know ** when the file has changed and thus when they need to flush their ** cache. ** ** The max embedded payload fraction is the amount of the total usable ** space in a page that can be consumed by a single cell for standard ** B-tree (non-LEAFDATA) tables. A value of 255 means 100%. The default ** is to limit the maximum cell size so that at least 4 cells will fit ** on one page. Thus the default max embedded payload fraction is 64. ** ** If the payload for a cell is larger than the max payload, then extra ** payload is spilled to overflow pages. Once an overflow page is allocated, ** as many bytes as possible are moved into the overflow pages without letting ** the cell size drop below the min embedded payload fraction. ** ** The min leaf payload fraction is like the min embedded payload fraction |
︙ | ︙ | |||
110 111 112 113 114 115 116 | ** this page has no children. The zerodata flag means that this page carries ** only keys and no data. The intkey flag means that the key is a single ** variable length integer at the beginning of the payload. ** ** A variable-length integer is 1 to 9 bytes where the lower 7 bits of each ** byte are used. The integer consists of all bytes that have bit 8 set and ** the first byte with bit 8 clear. The most significant byte of the integer | | > > | > | | 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 | ** this page has no children. The zerodata flag means that this page carries ** only keys and no data. The intkey flag means that the key is a single ** variable length integer at the beginning of the payload. ** ** A variable-length integer is 1 to 9 bytes where the lower 7 bits of each ** byte are used. The integer consists of all bytes that have bit 8 set and ** the first byte with bit 8 clear. The most significant byte of the integer ** appears first. A variable-length integer may not be more than 9 bytes long. ** As a special case, all 8 bytes of the 9th byte are used as data. This ** allows a 64-bit integer to be encoded in 9 bytes. ** ** 0x00 becomes 0x00000000 ** 0x7f becomes 0x0000007f ** 0x81 0x00 becomes 0x00000080 ** 0x82 0x00 becomes 0x00000100 ** 0x80 0x7f becomes 0x0000007f ** 0x8a 0x91 0xd1 0xac 0x78 becomes 0x12345678 ** 0x81 0x81 0x81 0x81 0x01 becomes 0x10204081 ** ** Variable length integers are used for rowids and to hold the number of ** bytes of key and data in a btree cell. ** ** Unused space within a btree page is collected into a linked list of ** freeblocks. Each freeblock is at least 4 bytes in size. The byte offset ** to the first freeblock is given in the header. Freeblocks occur in ** increasing order. Because a freeblock is 4 bytes in size, the minimum ** size allocation on a btree page is 4 bytes. Because a freeblock must be ** at least 4 bytes in size, any group of 3 or fewer unused bytes cannot ** exist on the freeblock chain. A group of 3 or fewer free bytes is called ** a fragment. The total number of bytes in all fragments is recorded. ** in the page header at offset 5. ** ** SIZE DESCRIPTION ** 2 Byte offset of the next freeblock ** 2 Bytes in this freeblock ** ** Cells are of variable length. The first cell begins on the byte defined ** in the page header. Cells do not necessarily occur in order - they can |
︙ | ︙ | |||
186 187 188 189 190 191 192 | #ifndef MX_PAGE_SIZE # define MX_PAGE_SIZE 1024 #endif /* The following value is the maximum cell size assuming a maximum page ** size give above. */ | | | | 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 | #ifndef MX_PAGE_SIZE # define MX_PAGE_SIZE 1024 #endif /* The following value is the maximum cell size assuming a maximum page ** size give above. */ #define MX_CELL_SIZE (MX_PAGE_SIZE-6) /* The maximum number of cells on a single page of the database. This ** assumes a minimum cell size of 3 bytes. Such small cells will be ** exceedingly rare, but they are possible. */ #define MX_CELL ((MX_PAGE_SIZE-6)/3) /* Forward declarations */ typedef struct MemPage MemPage; /* ** This is a magic string that appears at the beginning of every ** SQLite database in order to identify the file as a real database. |
︙ | ︙ | |||
223 224 225 226 227 228 229 | ** ** The pParent field points back to the parent page. This allows us to ** walk up the BTree from any leaf to the root. Care must be taken to ** unref() the parent page pointer when this page is no longer referenced. ** The pageDestructor() routine handles that chore. */ struct MemPage { | < | > > > | > > > < < < | > | | | < | | > > | | < < | < | | 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 | ** ** The pParent field points back to the parent page. This allows us to ** walk up the BTree from any leaf to the root. Care must be taken to ** unref() the parent page pointer when this page is no longer referenced. ** The pageDestructor() routine handles that chore. */ struct MemPage { u8 isInit; /* True if previously initialized */ u8 idxShift; /* True if Cell indices have changed */ u8 isOverfull; /* Some aCell[] do not fit on page */ u8 intKey; /* True if intkey flag is set */ u8 leaf; /* True if leaf flag is set */ u8 zeroData; /* True if table stores keys only */ u8 leafData; /* True if tables stores data on leaves only */ u8 hasData; /* True if this page stores data */ u8 hdrOffset; /* 100 for page 1. 0 otherwise */ u8 needRelink; /* True if cell not linked properly in aData */ int idxParent; /* Index in pParent->aCell[] of this node */ int nFree; /* Number of free bytes on the page */ int nCell; /* Number of entries on this page */ int nCellAlloc; /* Number of slots allocated in aCell[] */ unsigned char **aCell; /* Pointer to start of each cell */ struct Btree *pBt; /* Pointer back to BTree structure */ /* When page content is move from one page to the other (by the movePage() ** subroutine) only the information about is moved. The information below ** is fixed. */ unsigned char *aData; /* Pointer back to the start of the page */ Pgno pgno; /* Page number for this page */ MemPage *pParent; /* The parent of this page. NULL for root */ }; /* ** The in-memory image of a disk page has the auxiliary information appended ** to the end. EXTRA_SIZE is the number of bytes of space needed to hold ** that extra information. */ #define EXTRA_SIZE sizeof(MemPage) /* ** Everything we need to know about an open database */ struct Btree { Pager *pPager; /* The page cache */ BtCursor *pCursor; /* A list of all open cursors */ MemPage *pPage1; /* First page of the database */ u8 inTrans; /* True if a transaction is in progress */ u8 inStmt; /* True if we are in a statement subtransaction */ u8 readOnly; /* True if the underlying file is readonly */ u8 maxEmbedFrac; /* Maximum payload as % of total page size */ u8 minEmbedFrac; /* Minimum payload as % of total page size */ u8 minLeafFrac; /* Minimum leaf payload as % of total page size */ int pageSize; /* Total number of bytes on a page */ int usableSize; /* Number of usable bytes on each page */ int maxLocal; /* Maximum local payload in non-LEAFDATA tables */ int minLocal; /* Minimum local payload in non-LEAFDATA tables */ int maxLeaf; /* Maximum local payload in a LEAFDATA table */ int minLeaf; /* Minimum local payload in a LEAFDATA table */ }; typedef Btree Bt; /* ** An instance of the following structure is used to hold information ** about a cell. The parseCell() function fills in this structure ** based on information extract from the raw disk page. */ typedef struct CellInfo CellInfo; struct CellInfo { i64 nKey; /* The key for INTKEY tables, or number of bytes in key */ u32 nData; /* Number of bytes of data */ u16 nHeader; /* Size of the cell header in bytes */ u16 nLocal; /* Amount of payload held locally */ u16 iOverflow; /* Offset to overflow page number. Zero if no overflow */ u16 nSize; /* Total size of the cell (on the main b-tree page) */ }; /* ** A cursor is a pointer to a particular entry in the BTree. ** The entry is identified by its MemPage and the index in ** MemPage.aCell[] of the entry. */ struct BtCursor { Btree *pBt; /* The Btree to which this cursor belongs */ BtCursor *pNext, *pPrev; /* Forms a linked list of all cursors */ BtCursor *pShared; /* Loop of cursors with the same root page */ int (*xCompare)(void*,int,const void*,int,const void*); /* Key comp func */ void *pArg; /* First arg to xCompare() */ Pgno pgnoRoot; /* The root page of this tree */ MemPage *pPage; /* Page that contains the entry */ int idx; /* Index of the entry in pPage->aCell[] */ CellInfo info; /* A parse of the cell we are pointing at */ u8 infoValid; /* True if information in BtCursor.info is valid */ u8 wrFlag; /* True if writable */ u8 isValid; /* TRUE if points to a valid entry */ u8 status; /* Set to SQLITE_ABORT if cursors is invalidated */ }; /* ** Read or write a two- and four-byte big-endian integer values. */ static u32 get2byte(unsigned char *p){ return (p[0]<<8) | p[1]; } static u32 get4byte(unsigned char *p){ return (p[0]<<24) | (p[1]<<16) | (p[2]<<8) | p[3]; } static void put2byte(unsigned char *p, u32 v){ p[0] = v>>8; p[1] = v; } static void put4byte(unsigned char *p, u32 v){ p[0] = v>>24; p[1] = v>>16; p[2] = v>>8; p[3] = v; } /* ** Routines to read and write variable-length integers. These used to ** be defined locally, but now we use the varint routines in the util.c ** file. */ #define getVarint sqlite3GetVarint #define getVarint32 sqlite3GetVarint32 #define putVarint sqlite3PutVarint /* ** Parse a cell header and fill in the CellInfo structure. */ static void parseCell( MemPage *pPage, /* Page containing the cell */ unsigned char *pCell, /* Pointer to the first byte of the cell */ CellInfo *pInfo /* Fill in this structure */ ){ int n; int nPayload; Btree *pBt; int minLocal, maxLocal; assert( pPage->leaf==0 || pPage->leaf==1 ); n = 6 - 4*pPage->leaf; if( pPage->hasData ){ n += getVarint32(&pCell[n], &pInfo->nData); }else{ pInfo->nData = 0; } n += getVarint(&pCell[n], &pInfo->nKey); pInfo->nHeader = n; nPayload = pInfo->nData; if( !pPage->intKey ){ nPayload += pInfo->nKey; } pBt = pPage->pBt; if( pPage->leafData ){ minLocal = pBt->minLeaf; maxLocal = pBt->maxLeaf; }else{ minLocal = pBt->minLocal; maxLocal = pBt->maxLocal; } if( nPayload<=maxLocal ){ pInfo->nLocal = nPayload; pInfo->iOverflow = 0; |
︙ | ︙ | |||
553 554 555 556 557 558 559 | ** ** If the page contains nBytes of free space but does not contain ** nBytes of contiguous free space, then this routine automatically ** calls defragementPage() to consolidate all free space before ** allocating the new chunk. ** ** Algorithm: Carve a piece off of the first freeblock that is | | | 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 | ** ** If the page contains nBytes of free space but does not contain ** nBytes of contiguous free space, then this routine automatically ** calls defragementPage() to consolidate all free space before ** allocating the new chunk. ** ** Algorithm: Carve a piece off of the first freeblock that is ** nByte in size or larger. */ static int allocateSpace(MemPage *pPage, int nByte){ int addr, pc, hdr; int size; int nFrag; unsigned char *data; #ifndef NDEBUG |
︙ | ︙ | |||
1220 1221 1222 1223 1224 1225 1226 | } invalidateCursors(pBt); unlockBtreeIfUnused(pBt); return rc; } /* | | | | | | | > > > > > | | | | | | 1224 1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 1237 1238 1239 1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 1269 1270 1271 1272 1273 1274 1275 1276 1277 1278 1279 1280 1281 1282 | } invalidateCursors(pBt); unlockBtreeIfUnused(pBt); return rc; } /* ** Start a statement subtransaction. The subtransaction can ** can be rolled back independently of the main transaction. ** You must start a transaction before starting a subtransaction. ** The subtransaction is ended automatically if the main transaction ** commits or rolls back. ** ** Only one subtransaction may be active at a time. It is an error to try ** to start a new subtransaction if another subtransaction is already active. ** ** Statement subtransactions are used around individual SQL statements ** that are contained within a BEGIN...COMMIT block. If a constraint ** error occurs within the statement, the effect of that one statement ** can be rolled back without having to rollback the entire transaction. */ int sqlite3BtreeBeginStmt(Btree *pBt){ int rc; if( !pBt->inTrans || pBt->inStmt ){ return pBt->readOnly ? SQLITE_READONLY : SQLITE_ERROR; } rc = pBt->readOnly ? SQLITE_OK : sqlite3pager_stmt_begin(pBt->pPager); pBt->inStmt = 1; return rc; } /* ** Commit the statment subtransaction currently in progress. If no ** subtransaction is active, this is a no-op. */ int sqlite3BtreeCommitStmt(Btree *pBt){ int rc; if( pBt->inStmt && !pBt->readOnly ){ rc = sqlite3pager_stmt_commit(pBt->pPager); }else{ rc = SQLITE_OK; } pBt->inStmt = 0; return rc; } /* ** Rollback the active statement subtransaction. If no subtransaction ** is active this routine is a no-op. ** ** All cursors will be invalidated by this operation. Any attempt ** to use a cursor that was open at the beginning of this operation ** will result in an error. */ int sqlite3BtreeRollbackStmt(Btree *pBt){ int rc; if( pBt->inStmt==0 || pBt->readOnly ) return SQLITE_OK; rc = sqlite3pager_stmt_rollback(pBt->pPager); |
︙ | ︙ | |||
1464 1465 1466 1467 1468 1469 1470 1471 1472 1473 1474 1475 1476 1477 | if( pCur->pPage ){ sqlite3pager_unref(pCur->pPage->aData); } } /* ** Make sure the BtCursor.info field of the given cursor is valid. */ static void getCellInfo(BtCursor *pCur){ MemPage *pPage = pCur->pPage; if( !pCur->infoValid ){ parseCell(pPage, pPage->aCell[pCur->idx], &pCur->info); pCur->infoValid = 1; }else{ | > > > > | 1473 1474 1475 1476 1477 1478 1479 1480 1481 1482 1483 1484 1485 1486 1487 1488 1489 1490 | if( pCur->pPage ){ sqlite3pager_unref(pCur->pPage->aData); } } /* ** Make sure the BtCursor.info field of the given cursor is valid. ** If it is not already valid, call parseCell() to fill it in. ** ** BtCursor.info is a cache of the information in the current cell. ** Using this cache reduces the number of calls to parseCell(). */ static void getCellInfo(BtCursor *pCur){ MemPage *pPage = pCur->pPage; if( !pCur->infoValid ){ parseCell(pPage, pPage->aCell[pCur->idx], &pCur->info); pCur->infoValid = 1; }else{ |
︙ | ︙ | |||
1521 1522 1523 1524 1525 1526 1527 | /* ** Read payload information from the entry that the pCur cursor is ** pointing to. Begin reading the payload at "offset" and read ** a total of "amt" bytes. Put the result in zBuf. ** ** This routine does not make a distinction between key and data. | | > | 1534 1535 1536 1537 1538 1539 1540 1541 1542 1543 1544 1545 1546 1547 1548 1549 | /* ** Read payload information from the entry that the pCur cursor is ** pointing to. Begin reading the payload at "offset" and read ** a total of "amt" bytes. Put the result in zBuf. ** ** This routine does not make a distinction between key and data. ** It just reads bytes from the payload area. Data might appear ** on the main page or be scattered out on multiple overflow pages. */ static int getPayload( BtCursor *pCur, /* Cursor pointing to entry to read from */ int offset, /* Begin reading this far into payload */ int amt, /* Read this many bytes */ unsigned char *pBuf, /* Write the bytes into this buffer */ int skipKey /* offset begins at data if this is true */ |
︙ | ︙ | |||
1736 1737 1738 1739 1740 1741 1742 | const void *sqlite3BtreeDataFetch(BtCursor *pCur, int amt){ return (const void*)fetchPayload(pCur, amt, 1); } /* ** Move the cursor down to a new child page. The newPgno argument is the | | | 1750 1751 1752 1753 1754 1755 1756 1757 1758 1759 1760 1761 1762 1763 1764 | const void *sqlite3BtreeDataFetch(BtCursor *pCur, int amt){ return (const void*)fetchPayload(pCur, amt, 1); } /* ** Move the cursor down to a new child page. The newPgno argument is the ** page number of the child page to move to. */ static int moveToChild(BtCursor *pCur, u32 newPgno){ int rc; MemPage *pNewPage; MemPage *pOldPage; Btree *pBt = pCur->pBt; |
︙ | ︙ | |||
1967 1968 1969 1970 1971 1972 1973 | ** ** If an exact match is not found, then the cursor is always ** left pointing at a leaf page which would hold the entry if it ** were present. The cursor might point to an entry that comes ** before or after the key. ** ** The result of comparing the key with the entry to which the | < | | 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 | ** ** If an exact match is not found, then the cursor is always ** left pointing at a leaf page which would hold the entry if it ** were present. The cursor might point to an entry that comes ** before or after the key. ** ** The result of comparing the key with the entry to which the ** cursor is written to *pRes if pRes!=NULL. The meaning of ** this value is as follows: ** ** *pRes<0 The cursor is left pointing at an entry that ** is smaller than pKey or if the table is empty ** and the cursor is therefore left point to nothing. ** ** *pRes==0 The cursor is left pointing at an entry that |
︙ | ︙ | |||
2034 2035 2036 2037 2038 2039 2040 | } if( c==0 ){ if( pPage->leafData && !pPage->leaf ){ lwr = pCur->idx; upr = lwr - 1; break; }else{ | < < | 2047 2048 2049 2050 2051 2052 2053 2054 2055 2056 2057 2058 2059 2060 2061 2062 2063 2064 2065 2066 2067 2068 2069 2070 2071 2072 2073 2074 2075 2076 2077 2078 2079 2080 | } if( c==0 ){ if( pPage->leafData && !pPage->leaf ){ lwr = pCur->idx; upr = lwr - 1; break; }else{ if( pRes ) *pRes = 0; return SQLITE_OK; } } if( c<0 ){ lwr = pCur->idx+1; }else{ upr = pCur->idx-1; } } assert( lwr==upr+1 ); assert( pPage->isInit ); if( pPage->leaf ){ chldPg = 0; }else if( lwr>=pPage->nCell ){ chldPg = get4byte(&pPage->aData[pPage->hdrOffset+6]); }else{ chldPg = get4byte(&pPage->aCell[lwr][2]); } if( chldPg==0 ){ assert( pCur->idx>=0 && pCur->idx<pCur->pPage->nCell ); if( pRes ) *pRes = c; return SQLITE_OK; } pCur->idx = lwr; pCur->infoValid = 0; rc = moveToChild(pCur, chldPg); |
︙ | ︙ | |||
2666 2667 2668 2669 2670 2671 2672 2673 2674 2675 2676 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 | pc += aSize[i]; assert( pc<=pPage->pBt->usableSize ); } pPage->nCell = nCell; put2byte(data+prevpc, 0); } /* ** Rebuild the linked list of cells on a page so that the cells ** occur in the order specified by the pPage->aCell[] array. ** Invoke this routine once to repair damage after one or more ** invocations of either insertCell() or dropCell(). */ static void relinkCellList(MemPage *pPage){ int i, idxFrom; assert( sqlite3pager_iswriteable(pPage->aData) ); if( !pPage->needRelink ) return; idxFrom = pPage->hdrOffset+3; for(i=0; i<pPage->nCell; i++){ int idx = Addr(pPage->aCell[i]) - Addr(pPage->aData); assert( idx>pPage->hdrOffset && idx<pPage->pBt->usableSize ); put2byte(&pPage->aData[idxFrom], idx); idxFrom = idx; } put2byte(&pPage->aData[idxFrom], 0); pPage->needRelink = 0; } /* ** GCC does not define the offsetof() macro so we'll have to do it ** ourselves. */ #ifndef offsetof #define offsetof(STRUCTURE,FIELD) ((int)((char*)&((STRUCTURE*)0)->FIELD)) | > > | 2677 2678 2679 2680 2681 2682 2683 2684 2685 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 2701 2702 2703 2704 2705 2706 2707 2708 2709 2710 2711 2712 | pc += aSize[i]; assert( pc<=pPage->pBt->usableSize ); } pPage->nCell = nCell; put2byte(data+prevpc, 0); } #if 0 /* Never Used */ /* ** Rebuild the linked list of cells on a page so that the cells ** occur in the order specified by the pPage->aCell[] array. ** Invoke this routine once to repair damage after one or more ** invocations of either insertCell() or dropCell(). */ static void relinkCellList(MemPage *pPage){ int i, idxFrom; assert( sqlite3pager_iswriteable(pPage->aData) ); if( !pPage->needRelink ) return; idxFrom = pPage->hdrOffset+3; for(i=0; i<pPage->nCell; i++){ int idx = Addr(pPage->aCell[i]) - Addr(pPage->aData); assert( idx>pPage->hdrOffset && idx<pPage->pBt->usableSize ); put2byte(&pPage->aData[idxFrom], idx); idxFrom = idx; } put2byte(&pPage->aData[idxFrom], 0); pPage->needRelink = 0; } #endif /* ** GCC does not define the offsetof() macro so we'll have to do it ** ourselves. */ #ifndef offsetof #define offsetof(STRUCTURE,FIELD) ((int)((char*)&((STRUCTURE*)0)->FIELD)) |
︙ | ︙ | |||
2747 2748 2749 2750 2751 2752 2753 | ** in exchange for a larger degradation in INSERT and UPDATE performance. ** The value of NN appears to give the best results overall. */ #define NN 1 /* Number of neighbors on either side of pPage */ #define NB (NN*2+1) /* Total pages involved in the balance */ /* | | | | | | < < < < < < | 2760 2761 2762 2763 2764 2765 2766 2767 2768 2769 2770 2771 2772 2773 2774 2775 2776 2777 2778 2779 2780 2781 2782 2783 2784 2785 2786 2787 | ** in exchange for a larger degradation in INSERT and UPDATE performance. ** The value of NN appears to give the best results overall. */ #define NN 1 /* Number of neighbors on either side of pPage */ #define NB (NN*2+1) /* Total pages involved in the balance */ /* ** This routine redistributes Cells on pPage and up to NN*2 siblings ** of pPage so that all pages have about the same amount of free space. ** Usually one sibling on either side of pPage is used in the balancing, ** though both siblings might come from one side if pPage is the first ** or last child of its parent. If pPage has fewer than 2*NN siblings ** (something which can only happen if pPage is the root page or a ** child of root) then all available siblings participate in the balancing. ** ** The number of siblings of pPage might be increased or decreased by ** one in an effort to keep pages nearly full but not over full. The root page ** is special and is allowed to be nearly empty. If pPage is ** the root page, then the depth of the tree might be increased ** or decreased by one, as necessary, to keep the root page from being ** overfull or completely empty. ** ** Note that when this routine is called, some of the Cells on pPage ** might not actually be stored in pPage->aData[]. This can happen ** if the page is overfull. Part of the job of this routine is to ** make sure all Cells for pPage once again fit in pPage->aData[]. ** ** In the course of balancing the siblings of pPage, the parent of pPage |
︙ | ︙ | |||
2822 2823 2824 2825 2826 2827 2828 | ** underfull. */ assert( pPage->isInit ); assert( sqlite3pager_iswriteable(pPage->aData) ); pBt = pPage->pBt; if( !pPage->isOverfull && pPage->nFree<pBt->usableSize*2/3 && pPage->nCell>=2){ | | | | 2829 2830 2831 2832 2833 2834 2835 2836 2837 2838 2839 2840 2841 2842 2843 2844 2845 2846 2847 2848 2849 2850 2851 2852 2853 2854 2855 2856 2857 2858 2859 | ** underfull. */ assert( pPage->isInit ); assert( sqlite3pager_iswriteable(pPage->aData) ); pBt = pPage->pBt; if( !pPage->isOverfull && pPage->nFree<pBt->usableSize*2/3 && pPage->nCell>=2){ assert( pPage->needRelink==0 ); return SQLITE_OK; } /* ** Find the parent of the page to be balanced. If there is no parent, ** it means this page is the root page and special rules apply. */ pParent = pPage->pParent; if( pParent==0 ){ Pgno pgnoChild; MemPage *pChild; assert( pPage->isInit ); if( pPage->nCell==0 ){ if( pPage->leaf ){ /* The table is completely empty */ assert( pPage->needRelink==0 ); TRACE(("BALANCE: empty table %d\n", pPage->pgno)); }else{ /* The root page is empty but has one child. Transfer the ** information from that one child into the root page if it ** will fit. This reduces the depth of the tree by one. ** ** If the root page is page 1, it has less space available than |
︙ | ︙ | |||
2893 2894 2895 2896 2897 2898 2899 | releasePage(pChild); } return SQLITE_OK; } if( !pPage->isOverfull ){ /* It is OK for the root page to be less than half full. */ | | | 2900 2901 2902 2903 2904 2905 2906 2907 2908 2909 2910 2911 2912 2913 2914 | releasePage(pChild); } return SQLITE_OK; } if( !pPage->isOverfull ){ /* It is OK for the root page to be less than half full. */ assert( pPage->needRelink==0 ); TRACE(("BALANCE: root page %d is low - no changes\n", pPage->pgno)); return SQLITE_OK; } /* ** If we get to here, it means the root page is overfull. ** When this happens, Create a new child page and copy the ** contents of the root into the child. Then make the root |
︙ | ︙ | |||
3213 3214 3215 3216 3217 3218 3219 | MemPage *pNew = apNew[i]; assert( pNew->pgno==pgnoNew[i] ); resizeCellArray(pNew, cntNew[i] - j); assemblePage(pNew, cntNew[i]-j, &apCell[j], &szCell[j]); j = cntNew[i]; assert( pNew->nCell>0 ); assert( !pNew->isOverfull ); | | | 3220 3221 3222 3223 3224 3225 3226 3227 3228 3229 3230 3231 3232 3233 3234 | MemPage *pNew = apNew[i]; assert( pNew->pgno==pgnoNew[i] ); resizeCellArray(pNew, cntNew[i] - j); assemblePage(pNew, cntNew[i]-j, &apCell[j], &szCell[j]); j = cntNew[i]; assert( pNew->nCell>0 ); assert( !pNew->isOverfull ); assert( pNew->needRelink==0 ); if( i<nNew-1 && j<nCell ){ u8 *pCell; u8 *pTemp; int sz; pCell = apCell[j]; sz = szCell[j] + leafCorrection; if( !pNew->leaf ){ |
︙ | ︙ | |||
3487 3488 3489 3490 3491 3492 3493 | return rc; } /* ** Create a new BTree table. Write into *piTable the page ** number for the root page of the new table. ** | > | < | | > | | 3494 3495 3496 3497 3498 3499 3500 3501 3502 3503 3504 3505 3506 3507 3508 3509 3510 3511 3512 3513 | return rc; } /* ** Create a new BTree table. Write into *piTable the page ** number for the root page of the new table. ** ** The type of type is determined by the flags parameter. Only the ** following values of flags are currently in use. Other values for ** flags might not work: ** ** BTREE_INTKEY|BTREE_LEAFDATA Used for SQL tables with rowid keys ** BTREE_ZERODATA Used for SQL indices */ int sqlite3BtreeCreateTable(Btree *pBt, int *piTable, int flags){ MemPage *pRoot; Pgno pgnoRoot; int rc; if( !pBt->inTrans ){ /* Must start a transaction first */ |
︙ | ︙ | |||
3555 3556 3557 3558 3559 3560 3561 | zeroPage(pPage, pPage->aData[0] | PTF_LEAF); } releasePage(pPage); return rc; } /* | | > > > > > > | 3563 3564 3565 3566 3567 3568 3569 3570 3571 3572 3573 3574 3575 3576 3577 3578 3579 3580 3581 3582 3583 | zeroPage(pPage, pPage->aData[0] | PTF_LEAF); } releasePage(pPage); return rc; } /* ** Delete all information from a single table in the database. iTable is ** the page number of the root of the table. After this routine returns, ** the root page is empty, but still exists. ** ** This routine will fail with SQLITE_LOCKED if there are any open ** read cursors on the table. Open write cursors are moved to the ** root of the table. */ int sqlite3BtreeClearTable(Btree *pBt, int iTable){ int rc; BtCursor *pCur; if( !pBt->inTrans ){ return pBt->readOnly ? SQLITE_READONLY : SQLITE_ERROR; } |
︙ | ︙ | |||
3579 3580 3581 3582 3583 3584 3585 | } return rc; } /* ** Erase all information in a table and add the root of the table to ** the freelist. Except, the root of the principle table (the one on | | > > > | 3593 3594 3595 3596 3597 3598 3599 3600 3601 3602 3603 3604 3605 3606 3607 3608 3609 3610 | } return rc; } /* ** Erase all information in a table and add the root of the table to ** the freelist. Except, the root of the principle table (the one on ** page 1) is never added to the freelist. ** ** This routine will fail with SQLITE_LOCKED if there are any open ** cursors on the table. */ int sqlite3BtreeDropTable(Btree *pBt, int iTable){ int rc; MemPage *pPage; BtCursor *pCur; if( !pBt->inTrans ){ return pBt->readOnly ? SQLITE_READONLY : SQLITE_ERROR; |
︙ | ︙ |