Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Add a big introductory comment to vdbesort.c explaining its operation at a high level. Also adjust some symbolic names and fix other comment issues in that file. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | threads |
Files: | files | file ages | folders |
SHA1: |
eef60f1bf54fcdc7b32f96ebb87a9a0b |
User & Date: | drh 2014-04-02 18:58:49.259 |
Context
2014-04-03
| ||
02:54 | Refactor local object and method names in vdbesort.c so that their names more closely reflect their actual use. (check-in: d284e30eb1 user: drh tags: threads) | |
2014-04-02
| ||
18:58 | Add a big introductory comment to vdbesort.c explaining its operation at a high level. Also adjust some symbolic names and fix other comment issues in that file. (check-in: eef60f1bf5 user: drh tags: threads) | |
15:15 | Fix some problems with OOM handling in vdbesort.c. (check-in: 47e702bd83 user: dan tags: threads) | |
Changes
Changes to src/vdbesort.c.
1 | /* | | | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > | | > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > < | > > | | | | < | < | | | | | | | | | | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 | /* ** 2011-07-09 ** ** The author disclaims copyright to this source code. In place of ** a legal notice, here is a blessing: ** ** May you do good and not evil. ** May you find forgiveness for yourself and forgive others. ** May you share freely, never taking more than you give. ** ************************************************************************* ** This file contains code for the VdbeSorter object, used in concert with ** a VdbeCursor to sort large numbers of keys for CREATE TABLE statements ** or by SELECT statements with ORDER BY clauses that cannot be satisfied ** using indexes and without LIMIT clauses. ** ** The VdbeSorter object implements a multi-threaded external merge sort ** algorithm that is efficient even if the number of element being sorted ** exceeds the available memory. ** ** Here is the (internal, non-API) interface between this module and the ** rest of the SQLite system: ** ** sqlite3VdbeSorterInit() Create a new VdbeSorter object. ** ** sqlite3VdbeSorterWrite() Add a single new row to the VdbeSorter ** object. The row is a binary blob in the ** OP_MakeRecord format that contains both ** the ORDER BY key columns and result columns ** in the case of a SELECT w/ ORDER BY, or ** the complete record for an index entry ** in the case of a CREATE INDEX. ** ** sqlite3VdbeSorterRewind() Sort all content previously added. ** Position the read cursor on the ** first sorted element. ** ** sqlite3VdbeSorterNext() Advance the read cursor to the next sorted ** element. ** ** sqlite3VdbeSorterRowkey() Return the complete binary blob for the ** row currently under the read cursor. ** ** sqlite3VdbeSorterCompare() Compare the binary blob for the row ** currently under the read cursor against ** another binary blob X and report if ** X is strictly less than the read cursor. ** Used to enforce uniqueness in a ** CREATE UNIQUE INDEX statement. ** ** sqlite3VdbeSorterClose() Close the VdbeSorter object and reclaim ** all resources. ** ** sqlite3VdbeSorterReset() Refurbish the VdbeSorter for reuse. This ** is like Close() followed by Init() only ** much faster. ** ** The interfaces above must be called in a particular order. Write() can ** only occur in between Init()/Reset() and Rewind(). Next(), Rowkey(), and ** Compare() can only occur in between Rewind() and Close()/Reset(). ** ** Algorithm: ** ** Records to be sorted are initially held in memory, in the order in ** which they arrive from Write(). When the amount of memory needed exceeds ** a threshold, all in-memory records are sorted and then appended to ** a temporary file as a "Packed-Memory-Array" or "PMA" and the memory is ** reset. There is a single temporary file used for all PMAs. The PMAs ** are packed one after another in the file. The VdbeSorter object keeps ** track of the number of PMAs written. ** ** When the Rewind() is seen, any records still held in memory are sorted. ** If no PMAs have been written (if all records are still held in memory) ** then subsequent Rowkey(), Next(), and Compare() operations work directly ** from memory. But if PMAs have been written things get a little more ** complicated. ** ** When Rewind() is seen after PMAs have been written, any records still ** in memory are sorted and written as a final PMA. Then all the PMAs ** are merged together into a single massive PMA that Next(), Rowkey(), ** and Compare() walk to extract the records in sorted order. ** ** If SQLITE_MAX_WORKER_THREADS is non-zero, various steps of the above ** algorithm might be performed in parallel by separate threads. Threads ** are only used when one or more PMA spill to disk. If the sort is small ** enough to fit entirely in memory, everything happens on the main thread. */ #include "sqliteInt.h" #include "vdbeInt.h" /* ** Private objects used by the sorter */ typedef struct VdbeSorterIter VdbeSorterIter; typedef struct SortSubtask SortSubtask; typedef struct SorterRecord SorterRecord; typedef struct SorterMerger SorterMerger; typedef struct FileWriter FileWriter; /* ** Candidate values for SortSubtask.eWork */ #define SORT_SUBTASK_SORT 1 /* Sort records on pList */ #define SORT_SUBTASK_TO_PMA 2 /* Xfer pList to Packed-Memory-Array pTemp1 */ #define SORT_SUBTASK_CONS 3 /* Consolidate multiple PMAs */ /* ** Sorting is divided up into smaller subtasks. Each subtask is controlled ** by an instance of this object. Subtask might run in either the main thread ** or in a background thread. ** ** Exactly VdbeSorter.nThread instances of this object are allocated ** as part of each VdbeSorter object. Instances are never allocated any other ** way. VdbeSorter.nThread is set to the number of worker threads allowed ** (see SQLITE_CONFIG_WORKER_THREADS) plus one (the main thread). ** ** When a background thread is launched to perform work, SortSubtask.bDone ** is set to 0 and the SortSubtask.pThread variable set to point to the ** thread handle. SortSubtask.bDone is set to 1 (to indicate to the main ** thread that joining SortSubtask.pThread will not block) before the thread ** exits. SortSubtask.pThread and bDone are always cleared after the ** background thread has been joined. ** ** One object (specifically, VdbeSorter.aThread[VdbeSorter.nThread-1]) ** is reserved for the foreground thread. ** ** The nature of the work performed is determined by SortSubtask.eWork, ** as follows: ** ** SORT_SUBTASK_SORT: ** Sort the linked list of records at SortSubtask.pList. ** ** SORT_SUBTASK_TO_PMA: ** Sort the linked list of records at SortSubtask.pList, and write ** the results to a new PMA in temp file SortSubtask.pTemp1. Open ** the temp file if it is not already open. ** ** SORT_SUBTASK_CONS: ** Merge existing PMAs until SortSubtask.nConsolidate or fewer ** remain in temp file SortSubtask.pTemp1. */ struct SortSubtask { SQLiteThread *pThread; /* Thread handle, or NULL */ int bDone; /* Set to true by pThread when finished */ sqlite3_vfs *pVfs; /* VFS used to open temporary files */ KeyInfo *pKeyInfo; /* How to compare records */ UnpackedRecord *pUnpacked; /* Space to unpack a record */ int pgsz; /* Main database page size */ u8 eWork; /* One of the SORT_SUBTASK_* constants */ int nConsolidate; /* For SORT_SUBTASK_CONS, max final PMAs */ SorterRecord *pList; /* List of records for pThread to sort */ int nInMemory; /* Expected size of PMA based on pList */ u8 *aListMemory; /* Records memory (or NULL) */ int nPMA; /* Number of PMAs currently in pTemp1 */ i64 iTemp1Off; /* Offset to write to in pTemp1 */ sqlite3_file *pTemp1; /* File to write PMAs to, or NULL */ |
︙ | ︙ | |||
1091 1092 1093 1094 1095 1096 1097 | /* ** The main routine for sorter-thread operations. */ static void *vdbeSortSubtaskMain(void *pCtx){ int rc = SQLITE_OK; SortSubtask *pThread = (SortSubtask*)pCtx; | | | | | | 1161 1162 1163 1164 1165 1166 1167 1168 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190 1191 1192 1193 1194 1195 | /* ** The main routine for sorter-thread operations. */ static void *vdbeSortSubtaskMain(void *pCtx){ int rc = SQLITE_OK; SortSubtask *pThread = (SortSubtask*)pCtx; assert( pThread->eWork==SORT_SUBTASK_SORT || pThread->eWork==SORT_SUBTASK_TO_PMA || pThread->eWork==SORT_SUBTASK_CONS ); assert( pThread->bDone==0 ); if( pThread->pUnpacked==0 ){ char *pFree; pThread->pUnpacked = sqlite3VdbeAllocUnpackedRecord( pThread->pKeyInfo, 0, 0, &pFree ); assert( pThread->pUnpacked==(UnpackedRecord*)pFree ); if( pFree==0 ){ rc = SQLITE_NOMEM; goto thread_out; } pThread->pUnpacked->nField = pThread->pKeyInfo->nField; pThread->pUnpacked->errCode = 0; } if( pThread->eWork==SORT_SUBTASK_CONS ){ assert( pThread->pList==0 ); while( pThread->nPMA>pThread->nConsolidate && rc==SQLITE_OK ){ int nIter = MIN(pThread->nPMA, SORTER_MAX_MERGE_COUNT); sqlite3_file *pTemp2 = 0; /* Second temp file to use */ SorterMerger *pMerger; /* Object for reading/merging PMA data */ i64 iReadOff = 0; /* Offset in pTemp1 to read from */ i64 iWriteOff = 0; /* Offset in pTemp2 to write to */ |
︙ | ︙ | |||
1184 1185 1186 1187 1188 1189 1190 | pThread->iTemp1Off = iWriteOff; } }else{ /* Sort the pThread->pList list */ rc = vdbeSorterSort(pThread); /* If required, write the list out to a PMA. */ | | | 1254 1255 1256 1257 1258 1259 1260 1261 1262 1263 1264 1265 1266 1267 1268 | pThread->iTemp1Off = iWriteOff; } }else{ /* Sort the pThread->pList list */ rc = vdbeSorterSort(pThread); /* If required, write the list out to a PMA. */ if( rc==SQLITE_OK && pThread->eWork==SORT_SUBTASK_TO_PMA ){ #ifdef SQLITE_DEBUG i64 nExpect = pThread->nInMemory + sqlite3VarintLen(pThread->nInMemory) + pThread->iTemp1Off; #endif rc = vdbeSorterListToPMA(pThread); assert( rc!=SQLITE_OK || (nExpect==pThread->iTemp1Off) ); |
︙ | ︙ | |||
1254 1255 1256 1257 1258 1259 1260 | if( pThread==0 ){ pThread = &pSorter->aThread[nWorker]; } pSorter->iPrev = (pThread - pSorter->aThread); if( rc==SQLITE_OK ){ assert( pThread->pThread==0 && pThread->bDone==0 ); | | | 1324 1325 1326 1327 1328 1329 1330 1331 1332 1333 1334 1335 1336 1337 1338 | if( pThread==0 ){ pThread = &pSorter->aThread[nWorker]; } pSorter->iPrev = (pThread - pSorter->aThread); if( rc==SQLITE_OK ){ assert( pThread->pThread==0 && pThread->bDone==0 ); pThread->eWork = SORT_SUBTASK_TO_PMA; pThread->pList = pSorter->pRecord; pThread->nInMemory = pSorter->nInMemory; pSorter->nInMemory = 0; pSorter->pRecord = 0; if( pSorter->aMemory ){ u8 *aMem = pThread->aListMemory; |
︙ | ︙ | |||
1419 1420 1421 1422 1423 1424 1425 | ** sort the VdbeSorter.pRecord list. The vdbe layer will read data directly ** from the in-memory list. */ if( pSorter->bUsePMA==0 ){ if( pSorter->pRecord ){ SortSubtask *pThread = &pSorter->aThread[0]; *pbEof = 0; pThread->pList = pSorter->pRecord; | | | 1489 1490 1491 1492 1493 1494 1495 1496 1497 1498 1499 1500 1501 1502 1503 | ** sort the VdbeSorter.pRecord list. The vdbe layer will read data directly ** from the in-memory list. */ if( pSorter->bUsePMA==0 ){ if( pSorter->pRecord ){ SortSubtask *pThread = &pSorter->aThread[0]; *pbEof = 0; pThread->pList = pSorter->pRecord; pThread->eWork = SORT_SUBTASK_SORT; assert( pThread->aListMemory==0 ); pThread->aListMemory = pSorter->aMemory; rc = vdbeSorterRunThread(pThread); pThread->aListMemory = 0; pSorter->pRecord = pThread->pList; pThread->pList = 0; }else{ |
︙ | ︙ | |||
1448 1449 1450 1451 1452 1453 1454 | ** some of them together so that this is no longer the case. */ if( vdbeSorterCountPMA(pSorter)>SORTER_MAX_MERGE_COUNT ){ int i; for(i=0; rc==SQLITE_OK && i<pSorter->nThread; i++){ SortSubtask *pThread = &pSorter->aThread[i]; if( pThread->pTemp1 ){ pThread->nConsolidate = SORTER_MAX_MERGE_COUNT / pSorter->nThread; | | | 1518 1519 1520 1521 1522 1523 1524 1525 1526 1527 1528 1529 1530 1531 1532 | ** some of them together so that this is no longer the case. */ if( vdbeSorterCountPMA(pSorter)>SORTER_MAX_MERGE_COUNT ){ int i; for(i=0; rc==SQLITE_OK && i<pSorter->nThread; i++){ SortSubtask *pThread = &pSorter->aThread[i]; if( pThread->pTemp1 ){ pThread->nConsolidate = SORTER_MAX_MERGE_COUNT / pSorter->nThread; pThread->eWork = SORT_SUBTASK_CONS; #if SQLITE_MAX_WORKER_THREADS>0 if( i<(pSorter->nThread-1) ){ void *pCtx = (void*)pThread; rc = sqlite3ThreadCreate(&pThread->pThread,vdbeSortSubtaskMain,pCtx); }else #endif |
︙ | ︙ | |||
1572 1573 1574 1575 1576 1577 1578 | /* ** Compare the key in memory cell pVal with the key that the sorter cursor ** passed as the first argument currently points to. For the purposes of ** the comparison, ignore the rowid field at the end of each record. ** ** If the sorter cursor key contains any NULL values, consider it to be | | > > > | 1642 1643 1644 1645 1646 1647 1648 1649 1650 1651 1652 1653 1654 1655 1656 1657 1658 1659 1660 1661 1662 1663 1664 | /* ** Compare the key in memory cell pVal with the key that the sorter cursor ** passed as the first argument currently points to. For the purposes of ** the comparison, ignore the rowid field at the end of each record. ** ** If the sorter cursor key contains any NULL values, consider it to be ** less than pVal. Even if pVal also contains NULL values. ** ** If an error occurs, return an SQLite error code (i.e. SQLITE_NOMEM). ** Otherwise, set *pRes to a negative, zero or positive value if the ** key in pVal is smaller than, equal to or larger than the current sorter ** key. ** ** This routine forms the core of the OP_SorterCompare opcode, which in ** turn is used to verify uniqueness when constructing a UNIQUE INDEX. */ int sqlite3VdbeSorterCompare( const VdbeCursor *pCsr, /* Sorter cursor */ Mem *pVal, /* Value to compare to current sorter key */ int nIgnore, /* Ignore this many fields at the end */ int *pRes /* OUT: Result of comparison */ ){ |
︙ | ︙ |