Overview
Comment: | Stricter enforcement of cell sizes when doing balancing operations on the btree, in order to catch file corruption sooner. |
---|---|
SHA3-256: | 12713f320b2c1def273dd8b7833dddaa |
User & Date: | drh 2019-01-23 19:25:59.893 |
References
2019-02-01
14:50 | Improve the strict enforcement of cell sizes in balancing from check-in [12713f320b2c1def] so that it also works with table-btrees in addition to index-btrees. (check-in: ef27e7a087 user: drh tags: trunk)
Context
2019-01-23
19:50 | Fix a problem with renaming a table within a schema that contains a composite query that uses a column alias as an ORDER BY term. (check-in: 2ca6b8f84e user: dan tags: trunk)
19:25 | Stricter enforcement of cell sizes when doing balancing operations on the btree, in order to catch file corruption sooner. (check-in: 12713f320b user: drh tags: trunk)
19:17 | Fix another fts5 crash that can occur if the database is corrupted. (check-in: 44ce8baa47 user: dan tags: trunk)
Changes
Changes to src/btree.c.
︙

    @@ -6689,24 +6689,89 @@
           */
           ptrmapPutOvflPtr(pPage, pPage, pCell, pRC);
         }
     #endif
       }
     }
     
     /*
    +** The following parameters determine how many adjacent pages get involved
    +** in a balancing operation. NN is the number of neighbors on either side
    +** of the page that participate in the balancing operation. NB is the
    +** total number of pages that participate, including the target page and
    +** NN neighbors on either side.
    +**
    +** The minimum value of NN is 1 (of course). Increasing NN above 1
    +** (to 2 or 3) gives a modest improvement in SELECT and DELETE performance
    +** in exchange for a larger degradation in INSERT and UPDATE performance.
    +** The value of NN appears to give the best results overall.
    +**
    +** (Later:) The description above makes it seem as if these values are
    +** tunable - as if you could change them and recompile and it would all work.
    +** But that is unlikely. NB has been 3 since the inception of SQLite and
    +** we have never tested any other value.
    +*/
    +#define NN 1             /* Number of neighbors on either side of pPage */
    +#define NB 3             /* (NN*2+1): Total pages involved in the balance */
    +
    +/*
     ** A CellArray object contains a cache of pointers and sizes for a
     ** consecutive sequence of cells that might be held on multiple pages.
    +**
    +** The cells in this array are the divider cell or cells from the pParent
    +** page plus up to three child pages. There are a total of nCell cells.
    +**
    +** pRef is a pointer to one of the pages that contributes cells. This is
    +** used to access information such as MemPage.intKey and MemPage.pBt->pageSize
    +** which should be common to all pages that contribute cells to this array.
    +**
    +** apCell[] and szCell[] hold, respectively, pointers to the start of each
    +** cell and the size of each cell. Some of the apCell[] pointers might refer
    +** to overflow cells. In other words, some apCel[] pointers might not point
    +** to content area of the pages.
    +**
    +** A szCell[] of zero means the size of that cell has not yet been computed.
    +**
    +** The cells come from as many as four different pages:
    +**
    +**             -----------
    +**             | Parent  |
    +**             -----------
    +**            /     |     \
    +**           /      |      \
    +**  ---------   ---------   ---------
    +**  |Child-1|   |Child-2|   |Child-3|
    +**  ---------   ---------   ---------
    +**
    +** The order of cells is in the array is:
    +**
    +**       1.  All cells from Child-1 in order
    +**       2.  The first divider cell from Parent
    +**       3.  All cells from Child-2 in order
    +**       4.  The second divider cell from Parent
    +**       5.  All cells from Child-3 in order
    +**
    +** The apEnd[] array holds pointer to the end of page for Child-1, the
    +** Parent, Child-2, the Parent (again), and Child-3. The ixNx[] array
    +** holds the number of cells contained in each of these 5 stages, and
    +** all stages to the left. Hence:
    +**    ixNx[0] = Number of cells in Child-1.
    +**    ixNx[1] = Number of cells in Child-1 plus 1 for first divider.
    +**    ixNx[2] = Number of cells in Child-1 and Child-2 + 1 for 1st divider.
    +**    ixNx[3] = Number of cells in Child-1 and Child-2 + both divider cells
    +**    ixNx[4] = Total number of cells.
     */
     typedef struct CellArray CellArray;
     struct CellArray {
       int nCell;              /* Number of cells in apCell[] */
       MemPage *pRef;          /* Reference page */
       u8 **apCell;            /* All cells begin balanced */
       u16 *szCell;            /* Local size of all cells in apCell[] */
    +  u8 *apEnd[NB*2];        /* MemPage.aDataEnd values */
    +  int ixNx[NB*2];         /* Index of at which we move to the next apEnd[] */
     };
     
     /*
     ** Make sure the cell sizes at idx, idx+1, ..., idx+N-1 have been
     ** computed.
     */
     static void populateCellCache(CellArray *p, int idx, int N){
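As a sketch of how the ixNx[] bookkeeping described in the CellArray comment locates the source page of a given cell, the standalone C below hard-codes the five cumulative counts for three hypothetical children holding 4, 3 and 5 cells, and repeats the `for(k=0; ixNx[k]<=i; k++)` search that the new rebuildPage() and pageInsertArray() use. The helper `stage_of()` and the example counts are invented for illustration; btree.c sizes its arrays as NB*2 entries, while this sketch models only the five stages the comment lists.

```c
#include <assert.h>

#define NB 3                 /* pages in a balance, as in btree.c */
#define NSTAGE (NB*2 - 1)    /* Child-1, divider, Child-2, divider, Child-3 */

/* Hypothetical helper: return the stage k that holds cell i.
** ixNx[k] counts the cells in stages 0..k, so the first k with
** ixNx[k] > i is the stage containing cell i. */
static int stage_of(const int *ixNx, int nStage, int i){
  int k;
  assert( i>=0 && i<ixNx[nStage-1] );
  for(k=0; k<nStage && ixNx[k]<=i; k++){}
  return k;
}

/* Example layout: Child-1 has 4 cells, Child-2 has 3, Child-3 has 5. */
static const int exampleIxNx[NSTAGE] = {
  4,            /* ixNx[0]: cells in Child-1 */
  4+1,          /* ixNx[1]: plus the first divider */
  4+1+3,        /* ixNx[2]: plus Child-2 */
  4+1+3+1,      /* ixNx[3]: plus the second divider */
  4+1+3+1+5     /* ixNx[4]: total number of cells (14) */
};
```

With this layout, cell 4 is the first divider (stage 1), cells 5 through 7 come from Child-2 (stage 2), and cells 9 through 13 come from Child-3 (stage 4).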
︙

New lines 6814-6898:

    ** function works around problems caused by this by making a copy of any
    ** such cells before overwriting the page data.
    **
    ** The MemPage.nFree field is invalidated by this function. It is the
    ** responsibility of the caller to set it correctly.
    */
    static int rebuildPage(
      CellArray *pCArray,             /* Content to be added to page pPg */
      int iFirst,                     /* First cell in pCArray to use */
      int nCell,                      /* Final number of cells on page */
      MemPage *pPg                    /* The page to be reconstructed */
    ){
      const int hdr = pPg->hdrOffset;          /* Offset of header on pPg */
      u8 * const aData = pPg->aData;           /* Pointer to data for pPg */
      const int usableSize = pPg->pBt->usableSize;
      u8 * const pEnd = &aData[usableSize];
      int i = iFirst;                 /* Which cell to copy from pCArray*/
      int j;                          /* Start of cell content area */
      int iEnd = i+nCell;             /* Loop terminator */
      u8 *pCellptr = pPg->aCellIdx;
      u8 *pTmp = sqlite3PagerTempSpace(pPg->pBt->pPager);
      u8 *pData;
      int k;                          /* Current slot in pCArray->apEnd[] */
      u8 *pSrcEnd;                    /* Current pCArray->apEnd[k] value */

      assert( i<iEnd );
      j = get2byte(&aData[hdr+5]);
      memcpy(&pTmp[j], &aData[j], usableSize - j);

      for(k=0; pCArray->ixNx[k]<=i && ALWAYS(k<NB*2); k++){}
      pSrcEnd = pCArray->apEnd[k];

      pData = pEnd;
      while( 1/*exit by break*/ ){
        u8 *pCell = pCArray->apCell[i];
        u16 sz = pCArray->szCell[i];
        assert( sz>0 );
        if( SQLITE_WITHIN(pCell,aData,pEnd) ){
          if( ((uptr)(pCell+sz))>(uptr)pEnd ) return SQLITE_CORRUPT_BKPT;
          pCell = &pTmp[pCell - aData];
        }else if( (uptr)(pCell+sz)>(uptr)pSrcEnd
               && (uptr)(pCell)<(uptr)pSrcEnd
        ){
          return SQLITE_CORRUPT_BKPT;
        }

        pData -= sz;
        put2byte(pCellptr, (pData - aData));
        pCellptr += 2;
        if( pData < pCellptr ) return SQLITE_CORRUPT_BKPT;
        memcpy(pData, pCell, sz);
        assert( sz==pPg->xCellSize(pPg, pCell) || CORRUPT_DB );
        testcase( sz!=pPg->xCellSize(pPg,pCell) );
        i++;
        if( i>=iEnd ) break;
        if( pCArray->ixNx[k]<=i ){
          k++;
          pSrcEnd = pCArray->apEnd[k];
        }
      }

      /* The pPg->nFree field is now set incorrectly. The caller will fix it. */
      pPg->nCell = nCell;
      pPg->nOverflow = 0;

      put2byte(&aData[hdr+1], 0);
      put2byte(&aData[hdr+3], pPg->nCell);
      put2byte(&aData[hdr+5], pData - aData);
      aData[hdr+7] = 0x00;
      return SQLITE_OK;
    }

    /*
    ** The pCArray objects contains pointers to b-tree cells and the cell sizes.
    ** This function attempts to add the cells stored in the array to page pPg.
    ** If it cannot (because the page needs to be defragmented before the cells
    ** will fit), non-zero is returned. Otherwise, if the cells are added
    ** successfully, zero is returned.
    **
    ** Argument pCellptr points to the first entry in the cell-pointer array
    ** (part of page pPg) to populate. After cell apCell[0] is written to the
    ** page body, a 16-bit offset is written to pCellptr. And so on, for each
    ** cell in the array. It is the responsibility of the caller to ensure
    ** that it is safe to overwrite this part of the cell-pointer array.
    **
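rebuildPage() now rejects a cell that begins inside the current source page but whose recorded size carries it past that page's end (apEnd[k]). A simplified model of that check might look like the sketch below. It assumes byte offsets in one simulated address space in place of real pointers, and the names `ModelCells`, `overruns_page_end()` and `first_bad_cell()` are hypothetical. It also ignores the separate check rebuildPage() applies to cells that already sit on the destination page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a CellArray: cell locations are
** byte offsets in one simulated address space rather than real pointers. */
typedef struct ModelCells {
  int nCell;               /* Number of cells */
  const size_t *aStart;    /* aStart[i]: offset of the first byte of cell i */
  const unsigned *aSize;   /* aSize[i]: recorded size of cell i, in bytes */
  int nStage;              /* Number of source stages */
  const size_t *aEnd;      /* aEnd[k]: offset one past the end of stage k's page */
  const int *ixNx;         /* ixNx[k]: cumulative cell count through stage k */
} ModelCells;

/* Nonzero if a cell of `size` bytes at `start` begins before `pageEnd` but
** extends past it. A cell that starts at or beyond pageEnd lives in some
** other buffer and is not judged by this test. */
static int overruns_page_end(size_t start, unsigned size, size_t pageEnd){
  return start < pageEnd && start + size > pageEnd;
}

/* Walk cells iFirst..nCell-1, tracking the current stage, and return the
** index of the first cell that overruns its stage's page end, or -1. */
static int first_bad_cell(const ModelCells *p, int iFirst){
  int k = 0;
  int i;
  assert( iFirst>=0 && iFirst<p->nCell );
  for(i=iFirst; i<p->nCell; i++){
    /* Move to the first stage whose cumulative count exceeds i. */
    while( k<p->nStage-1 && p->ixNx[k]<=i ) k++;
    if( overruns_page_end(p->aStart[i], p->aSize[i], p->aEnd[k]) ) return i;
  }
  return -1;
}
```

rebuildPage() advances k by a single step after each cell. The sketch uses a `while` loop instead so that it would also step over an empty stage; that is a choice made for the sketch, not something the diff shows.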
︙

New lines 6906-6977:

    ** all cells - not just those inserted by the current call). If the content
    ** area must be extended to before this point in order to accomodate all
    ** cells in apCell[], then the cells do not fit and non-zero is returned.
    */
    static int pageInsertArray(
      MemPage *pPg,                   /* Page to add cells to */
      u8 *pBegin,                     /* End of cell-pointer array */
      u8 **ppData,                    /* IN/OUT: Page content-area pointer */
      u8 *pCellptr,                   /* Pointer to cell-pointer area */
      int iFirst,                     /* Index of first cell to add */
      int nCell,                      /* Number of cells to add to pPg */
      CellArray *pCArray              /* Array of cells */
    ){
      int i = iFirst;                 /* Loop counter - cell index to insert */
      u8 *aData = pPg->aData;         /* Complete page */
      u8 *pData = *ppData;            /* Content area. A subset of aData[] */
      int iEnd = iFirst + nCell;      /* End of loop. One past last cell to ins */
      int k;                          /* Current slot in pCArray->apEnd[] */
      u8 *pEnd;                       /* Maximum extent of cell data */
      assert( CORRUPT_DB || pPg->hdrOffset==0 );    /* Never called on page 1 */
      if( iEnd<=iFirst ) return 0;
      for(k=0; pCArray->ixNx[k]<=i && ALWAYS(k<NB*2); k++){}
      pEnd = pCArray->apEnd[k];
      while( 1 /*Exit by break*/ ){
        int sz, rc;
        u8 *pSlot;
        sz = cachedCellSize(pCArray, i);
        if( (aData[1]==0 && aData[2]==0) || (pSlot = pageFindSlot(pPg,sz,&rc))==0 ){
          if( (pData - pBegin)<sz ) return 1;
          pData -= sz;
          pSlot = pData;
        }
        /* pSlot and pCArray->apCell[i] will never overlap on a well-formed
        ** database. But they might for a corrupt database. Hence use memmove()
        ** since memcpy() sends SIGABORT with overlapping buffers on OpenBSD */
        assert( (pSlot+sz)<=pCArray->apCell[i]
             || pSlot>=(pCArray->apCell[i]+sz)
             || CORRUPT_DB );
        if( (uptr)(pCArray->apCell[i]+sz)>(uptr)pEnd
         && (uptr)(pCArray->apCell[i])<(uptr)pEnd
        ){
          assert( CORRUPT_DB );
          (void)SQLITE_CORRUPT_BKPT;
          return 1;
        }
        memmove(pSlot, pCArray->apCell[i], sz);
        put2byte(pCellptr, (pSlot - aData));
        pCellptr += 2;
        i++;
        if( i>=iEnd ) break;
        if( pCArray->ixNx[k]<=i ){
          k++;
          pEnd = pCArray->apEnd[k];
        }
      }
      *ppData = pData;
      return 0;
    }

    /*
    ** The pCArray object contains pointers to b-tree cells and their sizes.
    **
    ** This function adds the space associated with each cell in the array
    ** that is currently stored within the body of pPg to the pPg free-list.
    ** The cell-pointers and other fields of the page are not updated.
    **
    ** This function returns the total number of cells added to the free-list.
    */
    static int pageFreeArray(
      MemPage *pPg,                   /* Page to edit */
      int iFirst,                     /* First cell to delete */
      int nCell,                      /* Cells to delete */
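The memmove() comment in pageInsertArray() rests on a plain range-overlap test: the destination slot and the source cell are disjoint exactly when one ends at or before the other begins. The sketch below, using hypothetical helpers `ranges_disjoint()` and `copy_cell()` with offsets into a single buffer instead of pointers, shows that test and why memmove() is the safe copy when a corrupt page could make the ranges overlap: calling memcpy() on overlapping regions is undefined behavior in C, while memmove() is defined for that case.

```c
#include <assert.h>
#include <string.h>

/* Nonzero if byte ranges [a, a+sz) and [b, b+sz) do not overlap. This has
** the same shape as the assert in pageInsertArray():
** (pSlot+sz)<=pCell || pSlot>=(pCell+sz) */
static int ranges_disjoint(size_t a, size_t b, size_t sz){
  return a + sz <= b || a >= b + sz;
}

/* Copy sz bytes from buf[src] to buf[dst] within one buffer. memmove() is
** defined even when the two ranges overlap; memcpy() is not. */
static void copy_cell(unsigned char *buf, size_t dst, size_t src, size_t sz){
  memmove(buf + dst, buf + src, sz);
}
```

For example, copying 4 bytes from offset 0 to offset 2 of the buffer {1,2,3,4,5,6,7,8} overlaps by two bytes; memmove() still leaves {1,2,1,2,3,4,7,8}.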
︙

New lines 7013-7029:

        assert( pFree>aData && (pFree - aData)<65536 );
        freeSpace(pPg, (u16)(pFree - aData), szFree);
      }
      return nRet;
    }

    /*
    ** pCArray contains pointers to and sizes of all cells in the pages being
    ** balanced. The current page, pPg, has pPg->nCell cells starting with
    ** pCArray->apCell[iOld]. After balancing, this page should hold nNew cells
    ** starting at apCell[iNew].
    **
    ** This routine makes the necessary adjustments to pPg so that it contains
    ** the correct cells after being balanced.
    **
    ** The pPg->nFree field is invalid when this function returns. It is the
    ** responsibility of the caller to set it correctly.
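The editPage() comment describes the page's job as moving from cells [iOld, iOld+pPg->nCell) to cells [iNew, iNew+nNew) of the CellArray. The interval arithmetic implied by that description can be sketched as follows. `RangeEdit` and `plan_edit()` are hypothetical, and the real editPage() body is not part of this diff. The sketch simply flags non-overlapping ranges for a full rebuild, in the spirit of the rebuildPage() fallback at editpage_fail; whether the real function takes that path in the same cases is not shown here.

```c
#include <assert.h>

/* Hypothetical summary of how a page's cell range changes during balancing.
** Before: the page holds cells [iOld, iOld+nOld) of the cell array.
** After:  it must hold cells [iNew, iNew+nNew). */
typedef struct RangeEdit {
  int dropFront;   /* cells removed from the start of the page */
  int dropBack;    /* cells removed from the end of the page */
  int addFront;    /* cells inserted before the surviving cells */
  int addBack;     /* cells appended after the surviving cells */
  int rebuild;     /* nonzero if the ranges do not overlap at all */
} RangeEdit;

static RangeEdit plan_edit(int iOld, int nOld, int iNew, int nNew){
  RangeEdit e = {0, 0, 0, 0, 0};
  int iOldEnd = iOld + nOld;
  int iNewEnd = iNew + nNew;
  assert( nOld>=0 && nNew>=0 );
  if( iNew>=iOldEnd || iNewEnd<=iOld ){
    e.rebuild = 1;           /* nothing survives: rebuild from scratch */
    return e;
  }
  if( iOld<iNew ) e.dropFront = iNew - iOld;
  else            e.addFront  = iOld - iNew;
  if( iNewEnd<iOldEnd ) e.dropBack = iOldEnd - iNewEnd;
  else                  e.addBack  = iNewEnd - iOldEnd;
  return e;
}
```

Whenever the ranges overlap, nOld - dropFront - dropBack + addFront + addBack equals nNew, because the surviving cells are exactly the intersection of the two ranges.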
︙

New lines 7115-7130:

      }
    #endif

      return SQLITE_OK;
     editpage_fail:
      /* Unable to edit this page. Rebuild it from scratch instead. */
      populateCellCache(pCArray, iNew, nNew);
      return rebuildPage(pCArray, iNew, nNew, pPg);
    }

    #ifndef SQLITE_OMIT_QUICKBALANCE
    /*
    ** This version of balance() handles the common special case where
    ** a new entry is being inserted on the extreme right-end of the
    ** tree, in other words, when the new entry will become the largest
︙

New lines 7167-7196:

      if( rc==SQLITE_OK ){

        u8 *pOut = &pSpace[4];
        u8 *pCell = pPage->apOvfl[0];
        u16 szCell = pPage->xCellSize(pPage, pCell);
        u8 *pStop;
        CellArray b;

        assert( sqlite3PagerIswriteable(pNew->pDbPage) );
        assert( pPage->aData[0]==(PTF_INTKEY|PTF_LEAFDATA|PTF_LEAF) );
        zeroPage(pNew, PTF_INTKEY|PTF_LEAFDATA|PTF_LEAF);
        b.nCell = 1;
        b.pRef = pPage;
        b.apCell = &pCell;
        b.szCell = &szCell;
        b.apEnd[0] = pPage->aDataEnd;
        b.ixNx[0] = 2;
        rc = rebuildPage(&b, 0, 1, pNew);
        if( NEVER(rc) ){
          releasePage(pNew);
          return rc;
        }
        pNew->nFree = pBt->usableSize - pNew->cellOffset - 2 - szCell;

        /* If this is an auto-vacuum database, update the pointer map
        ** with entries for the new page, and any pointer from the
        ** cell on the page to an overflow page. If either of these
        ** operations fails, the return code is set, but the contents
        ** of the parent page are still manipulated by thh code below.
︙

    @@ -7564,14 +7662,18 @@
       ** the right of the i-th sibling page.
       ** usableSpace: Number of bytes of space available on each sibling.
       **
       */
       usableSpace = pBt->usableSize - 12 + leafCorrection;
       for(i=0; i<nOld; i++){
         MemPage *p = apOld[i];
    +    b.apEnd[i*2] = p->aDataEnd;
    +    b.apEnd[i*2+1] = pParent->aDataEnd;
    +    b.ixNx[i*2] = cntOld[i];
    +    b.ixNx[i*2+1] = cntOld[i]+1;
         szNew[i] = usableSpace - p->nFree;
         for(j=0; j<p->nOverflow; j++){
           szNew[i] += 2 + p->xCellSize(p, p->apOvfl[j]);
         }
         cntNew[i] = cntOld[i];
       }
       k = nOld;
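The four lines added to balance_nonroot() fill apEnd[] and ixNx[] from cntOld[]. Assuming that cntOld[i] counts the cells in the array through the end of child i, and that one divider cell sits between consecutive children as the CellArray comment describes, the resulting ixNx[] values can be reproduced with the sketch below. The helpers `cumulative_counts()` and `fill_ixNx()` are hypothetical, and the per-child counts are made up.

```c
#include <assert.h>

#define NB 3   /* pages in a balance, as in btree.c */

/* Hypothetical helper: build cntOld[] from per-child cell counts, placing
** one parent divider cell between consecutive children (assumption). */
static void cumulative_counts(const int *perChild, int nOld, int *cntOld){
  int i, n = 0;
  for(i=0; i<nOld; i++){
    n += perChild[i];
    cntOld[i] = n;            /* cells through the end of child i */
    if( i<nOld-1 ) n++;       /* divider cell from the parent */
  }
}

/* Hypothetical helper mirroring the new loop: stage 2*i ends with child i,
** stage 2*i+1 ends with the divider that follows it. */
static void fill_ixNx(const int *cntOld, int nOld, int *ixNx){
  int i;
  assert( nOld>=1 && nOld<=NB );
  for(i=0; i<nOld; i++){
    ixNx[i*2]   = cntOld[i];
    ixNx[i*2+1] = cntOld[i]+1;
  }
}
```

For children of 4, 3 and 5 cells this yields ixNx[0..4] = {4, 5, 8, 9, 14}, matching the definitions in the CellArray comment. The slot written for the last child's nonexistent trailing divider (ixNx[5] = 15) is never reached by a valid cell index, because every index is below ixNx[4].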
︙