Many hyperlinks are disabled.
Use anonymous login
to enable hyperlinks.
Overview
Comment: | Fix a bug in using stat4 data to estimate the number of rows selected by a range constraint. |
---|---|
Downloads: | Tarball | ZIP archive |
Timelines: | family | ancestors | descendants | both | sqlite_stat4 |
Files: | files | file ages | folders |
SHA1: |
f783938ea999731ea073cd2c78e27809 |
User & Date: | dan 2013-08-08 11:48:57.819 |
Context
2013-08-08
| ||
12:21 | Fix a segfault in "ALTER TABLE t1 ADD COLUMN b DEFAULT (-+1)". Also an assert() failure that could occur if SQLITE_ENABLE_STAT4 were not defined. (check-in: 9fec3e3828 user: dan tags: sqlite_stat4) | |
11:48 | Fix a bug in using stat4 data to estimate the number of rows selected by a range constraint. (check-in: f783938ea9 user: dan tags: sqlite_stat4) | |
2013-08-07
| ||
19:46 | Replace variable Index.avgEq (average number of rows in keys for which there is no sample in sqlite_stat4) with vector Index.aAvgEq. (check-in: 7b70b419c4 user: dan tags: sqlite_stat4) | |
Changes
Changes to src/where.c.
︙ | ︙ | |||
390 391 392 393 394 395 396 | WhereClause *pWC; /* WHERE clause terms */ ExprList *pOrderBy; /* ORDER BY clause */ WhereLoop *pNew; /* Template WhereLoop */ WhereOrSet *pOrSet; /* Record best loops here, if not NULL */ #ifdef SQLITE_ENABLE_STAT4 UnpackedRecord *pRec; /* Probe for stat4 (if required) */ int nRecValid; /* Number of valid fields currently in pRec */ | < | 390 391 392 393 394 395 396 397 398 399 400 401 402 403 | WhereClause *pWC; /* WHERE clause terms */ ExprList *pOrderBy; /* ORDER BY clause */ WhereLoop *pNew; /* Template WhereLoop */ WhereOrSet *pOrSet; /* Record best loops here, if not NULL */ #ifdef SQLITE_ENABLE_STAT4 UnpackedRecord *pRec; /* Probe for stat4 (if required) */ int nRecValid; /* Number of valid fields currently in pRec */ #endif }; /* ** The WHERE clause processing routine has two halves. The ** first part does the start of the WHERE loop and the second ** half does the tail of the WHERE loop. An instance of |
︙ | ︙ | |||
2474 2475 2476 2477 2478 2479 2480 | ** |_____| |_____| ** | | ** pLower pUpper ** ** If either of the upper or lower bound is not present, then NULL is passed in ** place of the corresponding WhereTerm. ** | | | | | | | | | | | | > | | | | | | > | 2473 2474 2475 2476 2477 2478 2479 2480 2481 2482 2483 2484 2485 2486 2487 2488 2489 2490 2491 2492 2493 2494 2495 2496 2497 2498 2499 2500 2501 2502 2503 2504 2505 2506 2507 2508 2509 2510 2511 2512 2513 2514 2515 2516 2517 2518 2519 2520 | ** |_____| |_____| ** | | ** pLower pUpper ** ** If either of the upper or lower bound is not present, then NULL is passed in ** place of the corresponding WhereTerm. ** ** The value in (pBuilder->pNew->u.btree.nEq) is the index of the index ** column subject to the range constraint. Or, equivalently, the number of ** equality constraints optimized by the proposed index scan. For example, ** assuming index p is on t1(a, b), and the SQL query is: ** ** ... FROM t1 WHERE a = ? AND b > ? AND b < ? ... ** ** then nEq is set to 1 (as the range restricted column, b, is the second ** left-most column of the index). Or, if the query is: ** ** ... FROM t1 WHERE a > ? AND a < ? ... ** ** then nEq is set to 0. ** ** When this function is called, *pnOut is set to the whereCost() of the ** number of rows that the index scan is expected to visit without ** considering the range constraints. If nEq is 0, this is the number of ** rows in the index. Assuming no error occurs, *pnOut is adjusted (reduced) ** to account for the range contraints pLower and pUpper. ** ** In the absence of sqlite_stat4 ANALYZE data, or if such data cannot be ** used, each range inequality reduces the search space by a factor of 4. ** Hence a pair of constraints (x>? AND x<?) reduces the expected number of ** rows visited by a factor of 16. */ static int whereRangeScanEst( Parse *pParse, /* Parsing & code generating context */ WhereLoopBuilder *pBuilder, WhereTerm *pLower, /* Lower bound on the range. ex: "x>123" Might be NULL */ WhereTerm *pUpper, /* Upper bound on the range. ex: "x<455" Might be NULL */ WhereCost *pnOut /* IN/OUT: Number of rows visited */ ){ int rc = SQLITE_OK; int nOut = (int)*pnOut; #ifdef SQLITE_ENABLE_STAT4 Index *p = pBuilder->pNew->u.btree.pIndex; int nEq = pBuilder->pNew->u.btree.nEq; if( nEq==pBuilder->nRecValid && p->nSample |
︙ | ︙ | |||
2546 2547 2548 2549 2550 2551 2552 | ){ iUpper = a[0]; if( (pUpper->eOperator & WO_LE)!=0 ) iUpper += a[1]; } } pBuilder->pRec = pRec; if( rc==SQLITE_OK ){ | | | > > | < < | > | | < | | > > | 2547 2548 2549 2550 2551 2552 2553 2554 2555 2556 2557 2558 2559 2560 2561 2562 2563 2564 2565 2566 2567 2568 2569 2570 2571 2572 2573 2574 2575 2576 2577 2578 2579 2580 2581 2582 2583 2584 2585 2586 2587 2588 2589 2590 | ){ iUpper = a[0]; if( (pUpper->eOperator & WO_LE)!=0 ) iUpper += a[1]; } } pBuilder->pRec = pRec; if( rc==SQLITE_OK ){ WhereCost nNew; if( iUpper>iLower ){ nNew = whereCost(iUpper - iLower); }else{ nNew = whereCost(2); /* Small number */ } if( nNew<nOut ){ nOut = nNew; } *pnOut = (WhereCost)nOut; WHERETRACE(0x100, ("range scan regions: %u..%u est=%d\n", (u32)iLower, (u32)iUpper, nOut)); return SQLITE_OK; } } #else UNUSED_PARAMETER(pParse); UNUSED_PARAMETER(pBuilder); #endif assert( pLower || pUpper ); /* TUNING: Each inequality constraint reduces the search space 4-fold. ** A BETWEEN operator, therefore, reduces the search space 16-fold */ if( pLower && (pLower->wtFlags & TERM_VNULL)==0 ){ nOut -= 20; assert( 20==whereCost(4) ); } if( pUpper ){ nOut -= 20; assert( 20==whereCost(4) ); } if( nOut<10 ) nOut = 10; *pnOut = (WhereCost)nOut; return rc; } #ifdef SQLITE_ENABLE_STAT4 /* ** Estimate the number of rows that will be returned based on ** an equality constraint x=VALUE and where that VALUE occurs in |
︙ | ︙ | |||
2637 2638 2639 2640 2641 2642 2643 | if( bOk==0 ) return SQLITE_NOTFOUND; pBuilder->nRecValid = nEq; rc = whereKeyStats(pParse, p, pRec, 0, a); if( rc==SQLITE_OK ){ WHERETRACE(0x100,("equality scan regions: %d\n", (int)a[1])); *pnRow = a[1]; | < < < | 2640 2641 2642 2643 2644 2645 2646 2647 2648 2649 2650 2651 2652 2653 | if( bOk==0 ) return SQLITE_NOTFOUND; pBuilder->nRecValid = nEq; rc = whereKeyStats(pParse, p, pRec, 0, a); if( rc==SQLITE_OK ){ WHERETRACE(0x100,("equality scan regions: %d\n", (int)a[1])); *pnRow = a[1]; } return rc; } #endif /* defined(SQLITE_ENABLE_STAT4) */ #ifdef SQLITE_ENABLE_STAT4 |
︙ | ︙ | |||
2686 2687 2688 2689 2690 2691 2692 | rc = whereEqualScanEst(pParse, pBuilder, pList->a[i].pExpr, &nEst); nRowEst += nEst; pBuilder->nRecValid = nRecValid; } if( rc==SQLITE_OK ){ if( nRowEst > p->aiRowEst[0] ) nRowEst = p->aiRowEst[0]; | < < < | < | 2686 2687 2688 2689 2690 2691 2692 2693 2694 2695 2696 2697 2698 2699 2700 | rc = whereEqualScanEst(pParse, pBuilder, pList->a[i].pExpr, &nEst); nRowEst += nEst; pBuilder->nRecValid = nRecValid; } if( rc==SQLITE_OK ){ if( nRowEst > p->aiRowEst[0] ) nRowEst = p->aiRowEst[0]; *pnRow = nRowEst; WHERETRACE(0x100,("IN row estimate: est=%g\n", nRowEst)); } assert( pBuilder->nRecValid==nRecValid ); return rc; } #endif /* defined(SQLITE_ENABLE_STAT4) */ |
︙ | ︙ | |||
4242 4243 4244 4245 4246 4247 4248 | saved_nOut = pNew->nOut; pNew->rSetup = 0; rLogSize = estLog(whereCost(pProbe->aiRowEst[0])); for(; rc==SQLITE_OK && pTerm!=0; pTerm = whereScanNext(&scan)){ int nIn = 0; #ifdef SQLITE_ENABLE_STAT4 int nRecValid = pBuilder->nRecValid; | < | 4238 4239 4240 4241 4242 4243 4244 4245 4246 4247 4248 4249 4250 4251 | saved_nOut = pNew->nOut; pNew->rSetup = 0; rLogSize = estLog(whereCost(pProbe->aiRowEst[0])); for(; rc==SQLITE_OK && pTerm!=0; pTerm = whereScanNext(&scan)){ int nIn = 0; #ifdef SQLITE_ENABLE_STAT4 int nRecValid = pBuilder->nRecValid; if( (pTerm->wtFlags & TERM_VNULL)!=0 && pSrc->pTab->aCol[iCol].notNull ){ continue; /* skip IS NOT NULL constraints on a NOT NULL column */ } #endif if( pTerm->prereqRight & pNew->maskSelf ) continue; pNew->wsFlags = saved_wsFlags; |
︙ | ︙ | |||
4305 4306 4307 4308 4309 4310 4311 | pNew->wsFlags |= WHERE_COLUMN_RANGE|WHERE_TOP_LIMIT; pTop = pTerm; pBtm = (pNew->wsFlags & WHERE_BTM_LIMIT)!=0 ? pNew->aLTerm[pNew->nLTerm-2] : 0; } if( pNew->wsFlags & WHERE_COLUMN_RANGE ){ /* Adjust nOut and rRun for STAT3 range values */ | | | < < < > | > > | | 4300 4301 4302 4303 4304 4305 4306 4307 4308 4309 4310 4311 4312 4313 4314 4315 4316 4317 4318 4319 4320 4321 4322 4323 4324 4325 4326 4327 4328 4329 4330 4331 4332 4333 4334 4335 4336 4337 4338 4339 4340 4341 4342 4343 4344 4345 4346 4347 4348 4349 4350 4351 4352 | pNew->wsFlags |= WHERE_COLUMN_RANGE|WHERE_TOP_LIMIT; pTop = pTerm; pBtm = (pNew->wsFlags & WHERE_BTM_LIMIT)!=0 ? pNew->aLTerm[pNew->nLTerm-2] : 0; } if( pNew->wsFlags & WHERE_COLUMN_RANGE ){ /* Adjust nOut and rRun for STAT3 range values */ assert( pNew->nOut==saved_nOut ); whereRangeScanEst(pParse, pBuilder, pBtm, pTop, &pNew->nOut); } #ifdef SQLITE_ENABLE_STAT4 if( nInMul==0 && pProbe->nSample && OptimizationEnabled(db, SQLITE_Stat3) ){ Expr *pExpr = pTerm->pExpr; tRowcnt nOut = 0; if( (pTerm->eOperator & (WO_EQ|WO_ISNULL))!=0 ){ testcase( pTerm->eOperator & WO_EQ ); testcase( pTerm->eOperator & WO_ISNULL ); rc = whereEqualScanEst(pParse, pBuilder, pExpr->pRight, &nOut); }else if( (pTerm->eOperator & WO_IN) && !ExprHasProperty(pExpr, EP_xIsSelect) ){ rc = whereInScanEst(pParse, pBuilder, pExpr->x.pList, &nOut); } assert( nOut==0 || rc==SQLITE_OK ); if( nOut ){ nOut = whereCost(nOut); pNew->nOut = MIN(nOut, saved_nOut); } } #endif if( (pNew->wsFlags & (WHERE_IDX_ONLY|WHERE_IPK))==0 ){ /* Each row involves a step of the index, then a binary search of ** the main table */ pNew->rRun = whereCostAdd(pNew->rRun, rLogSize>27 ? rLogSize-17 : 10); } /* Step cost for each output row */ pNew->rRun = whereCostAdd(pNew->rRun, pNew->nOut); /* TBD: Adjust nOut for additional constraints */ rc = whereLoopInsert(pBuilder, pNew); if( (pNew->wsFlags & WHERE_TOP_LIMIT)==0 && pNew->u.btree.nEq<(pProbe->nColumn + (pProbe->zName!=0)) ){ whereLoopAddBtreeIndex(pBuilder, pSrc, pProbe, nInMul+nIn); } #ifdef SQLITE_ENABLE_STAT4 pBuilder->nRecValid = nRecValid; pNew->nOut = saved_nOut; #endif } pNew->prereq = saved_prereq; pNew->u.btree.nEq = saved_nEq; pNew->wsFlags = saved_wsFlags; pNew->nOut = saved_nOut; pNew->nLTerm = saved_nLTerm; |
︙ | ︙ |
Changes to test/analyze8.test.
︙ | ︙ | |||
80 81 82 83 84 85 86 | do_test 2.1 { eqp {SELECT * FROM t1 WHERE a=100 AND b BETWEEN 50 AND 54} } {0 0 0 {SEARCH TABLE t1 USING INDEX t1b (b>? AND b<?)}} # There are many more values of c between 0 and 100000 than there are # between 800000 and 900000. So t1c is more selective for the latter # range. | | > > > > > > > > > > > > | | > | 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 | do_test 2.1 { eqp {SELECT * FROM t1 WHERE a=100 AND b BETWEEN 50 AND 54} } {0 0 0 {SEARCH TABLE t1 USING INDEX t1b (b>? AND b<?)}} # There are many more values of c between 0 and 100000 than there are # between 800000 and 900000. So t1c is more selective for the latter # range. # # Test 3.2 is a little unstable. It depends on the planner estimating # that (b BETWEEN 40 AND 44) will match more rows than (c BETWEEN # 800000 AND 900000). Which is a pretty close call (50 vs. 32), so # the planner could get it wrong with an unlucky set of samples. This # case happens to work, but others ("b BETWEEN 50 AND 54" for example) # will fail. # do_execsql_test 3.0 { SELECT count(*) FROM t1 WHERE b BETWEEN 40 AND 44; SELECT count(*) FROM t1 WHERE c BETWEEN 0 AND 100000; SELECT count(*) FROM t1 WHERE c BETWEEN 800000 AND 900000; } {50 376 32} do_test 3.1 { eqp {SELECT * FROM t1 WHERE b BETWEEN 40 AND 44 AND c BETWEEN 0 AND 100000} } {0 0 0 {SEARCH TABLE t1 USING INDEX t1b (b>? AND b<?)}} do_test 3.2 { eqp {SELECT * FROM t1 WHERE b BETWEEN 40 AND 44 AND c BETWEEN 800000 AND 900000} } {0 0 0 {SEARCH TABLE t1 USING INDEX t1c (c>? AND c<?)}} do_test 3.3 { eqp {SELECT * FROM t1 WHERE a=100 AND c BETWEEN 0 AND 100000} } {0 0 0 {SEARCH TABLE t1 USING INDEX t1a (a=?)}} do_test 3.4 { eqp {SELECT * FROM t1 WHERE a=100 AND c BETWEEN 800000 AND 900000} } {0 0 0 {SEARCH TABLE t1 USING INDEX t1c (c>? AND c<?)}} |
︙ | ︙ |