/ Check-in [eefeda32]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Fix a couple of out-of-date comments in where.c.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | experimental-costs
Files: files | file ages | folders
SHA1: eefeda32d54efbbdf7d20b719299eda48b891fae
User & Date: dan 2014-04-30 14:47:01
Context
2014-04-30
14:53
Update a couple of test cases to account for the fact that this branch prefers an index scan and partial sort over a full-table scan and full external sort. check-in: 9b975bf3 user: dan tags: experimental-costs
14:47
Fix a couple of out-of-date comments in where.c. check-in: eefeda32 user: dan tags: experimental-costs
14:22
Improved rendering of LogEst values corresponding to real values near 0.0 in the tool/logest.c utility program. check-in: 32910c8c user: drh tags: experimental-costs
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

Changes to src/where.c.

  4018   4018       if( j<0 ){
  4019   4019         pLoop->nOut += (pTerm->truthProb<=0 ? pTerm->truthProb : -1);
  4020   4020       }
  4021   4021     }
  4022   4022   }
  4023   4023   
  4024   4024   /*
  4025         -** We have so far matched pBuilder->pNew->u.btree.nEq terms of the index pIndex.
  4026         -** Try to match one more.
         4025  +** We have so far matched pBuilder->pNew->u.btree.nEq terms of the 
         4026  +** index pIndex. Try to match one more.
         4027  +**
         4028  +** When this function is called, pBuilder->pNew->nOut contains the 
         4029  +** number of rows expected to be visited by filtering using the nEq 
         4030  +** terms only. If it is modified, this value is restored before this 
         4031  +** function returns.
  4027   4032   **
  4028   4033   ** If pProbe->tnum==0, that means pIndex is a fake index used for the
  4029   4034   ** INTEGER PRIMARY KEY.
  4030   4035   */
  4031   4036   static int whereLoopAddBtreeIndex(
  4032   4037     WhereLoopBuilder *pBuilder,     /* The WhereLoop factory */
  4033   4038     struct SrcList_item *pSrc,      /* FROM clause term being analyzed */
................................................................................
  4081   4086     saved_prereq = pNew->prereq;
  4082   4087     saved_nOut = pNew->nOut;
  4083   4088     pNew->rSetup = 0;
  4084   4089     rLogSize = estLog(pProbe->aiRowLogEst[0]);
  4085   4090   
  4086   4091     /* Consider using a skip-scan if there are no WHERE clause constraints
  4087   4092     ** available for the left-most terms of the index, and if the average
  4088         -  ** number of repeats in the left-most terms is at least 18.  The magic
  4089         -  ** number 18 was found by experimentation to be the payoff point where
  4090         -  ** skip-scan become faster than a full-scan.
  4091         -  */
         4093  +  ** number of repeats in the left-most terms is at least 18. 
         4094  +  **
         4095  +  ** The magic number 18 is selected on the basis that scanning 17 rows
         4096  +  ** is almost always quicker than an index seek (even though if the index
         4097  +  ** contains fewer than 2^17 rows we assume otherwise in other parts of
         4098  +  ** the code). And, even if it is not, it should not be too much slower. 
         4099  +  ** On the other hand, the extra seeks could end up being significantly
         4100  +  ** more expensive.  */
  4092   4101     assert( 42==sqlite3LogEst(18) );
  4093   4102     if( pTerm==0
  4094   4103      && saved_nEq==saved_nSkip
  4095   4104      && saved_nEq+1<pProbe->nKeyCol
  4096   4105      && pProbe->aiRowLogEst[saved_nEq+1]>=42  /* TUNING: Minimum for skip-scan */
  4097   4106      && (rc = whereLoopResize(db, pNew, pNew->nLTerm+1))==SQLITE_OK
  4098   4107     ){
................................................................................
  5222   5231           nOut = pFrom->nRow + pWLoop->nOut;
  5223   5232           maskNew = pFrom->maskLoop | pWLoop->maskSelf;
  5224   5233           if( isOrdered<0 ){
  5225   5234             isOrdered = wherePathSatisfiesOrderBy(pWInfo,
  5226   5235                          pWInfo->pOrderBy, pFrom, pWInfo->wctrlFlags,
  5227   5236                          iLoop, pWLoop, &revMask);
  5228   5237             if( isOrdered>=0 && isOrdered<nOrderBy ){
  5229         -            /* TUNING: Estimated cost of sorting is N*log(N).
  5230         -            ** If the order-by clause has X terms but only the last Y terms
  5231         -            ** are out of order, then block-sorting will reduce the sorting
  5232         -            ** cost to N*log(N)*log(Y/X).  The log(Y/X) term is computed
  5233         -            ** by rScale.
  5234         -            ** TODO: Should the sorting cost get a small multiplier to help
  5235         -            ** discourage the use of sorting and encourage the use of index
  5236         -            ** scans instead?
  5237         -            */
         5238  +            /* TUNING: Estimated cost of a full external sort, where N is 
         5239  +            ** the number of rows to sort is:
         5240  +            **
         5241  +            **   cost = (3.0 * N * log(N)).
         5242  +            ** 
         5243  +            ** Or, if the order-by clause has X terms but only the last Y 
         5244  +            ** terms are out of order, then block-sorting will reduce the 
         5245  +            ** sorting cost to:
         5246  +            **
         5247  +            **   cost = (3.0 * N * log(N)) * (Y/X)
         5248  +            **
         5249  +            ** The (Y/X) term is implemented using stack variable rScale
         5250  +            ** below.  */
  5238   5251               LogEst rScale, rSortCost;
  5239   5252               assert( nOrderBy>0 && 66==sqlite3LogEst(100) );
  5240   5253               rScale = sqlite3LogEst((nOrderBy-isOrdered)*100/nOrderBy) - 66;
  5241   5254               rSortCost = nRowEst + estLog(nRowEst) + rScale + 16;
  5242   5255   
  5243   5256               /* TUNING: The cost of implementing DISTINCT using a B-TREE is
  5244         -            ** also N*log(N) but it has a larger constant of proportionality.
  5245         -            ** Multiply by 3.0. */
         5257  +            ** similar but with a larger constant of proportionality. 
         5258  +            ** Multiply by an additional factor of 3.0.  */
  5246   5259               if( pWInfo->wctrlFlags & WHERE_WANT_DISTINCT ){
  5247   5260                 rSortCost += 16;
  5248   5261               }
  5249   5262               WHERETRACE(0x002,
  5250   5263                  ("---- sort cost=%-3d (%d/%d) increases cost %3d to %-3d\n",
  5251   5264                   rSortCost, (nOrderBy-isOrdered), nOrderBy, rCost,
  5252   5265                   sqlite3LogEstAdd(rCost,rSortCost)));