/ Check-in [878da276]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Refactor the cost function in the query planner. Give extra cost (thus reduce likelihood of selection) to full table scans.
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1:878da276ebf643b716ddd650d4d0ca3595fe5bf2
User & Date: drh 2011-02-10 00:08:47
Context
2011-02-10
17:46
Prevent a segfault when automatic indices try to use a column with an unknown collating function. Ticket [77aa3b1e6592582e38605d36]. This check-in also removes some stray \r characters unrelated to the problem. check-in: f01030a0 user: drh tags: trunk
00:08
Refactor the cost function in the query planner. Give extra cost (thus reduce likelihood of selection) to full table scans. check-in: 878da276 user: drh tags: trunk
2011-02-09
19:55
Make sure code *compiles* with each OMIT and ENABLE option. Mostly changes to test modules. check-in: 7cc515ed user: shaneh tags: trunk
Changes
Hide Diffs Side-by-Side Diffs Ignore Whitespace Patch

configure became executable.


install-sh became executable.


Changes to src/where.c.

    15     15   ** rows.  Indices are selected and used to speed the search when doing
    16     16   ** so is applicable.  Because this module is responsible for selecting
    17     17   ** indices, you might also think of this module as the "query optimizer".
    18     18   */
    19     19   #include "sqliteInt.h"
    20     20   
    21     21   
    22         -/*
    23         -** The following parameter define the relative cost of various
    24         -** search operations.  These parameters are used to estimate query
    25         -** plan costs and to select the query plan with the lowest estimated
    26         -** cost.
    27         -**
    28         -** Let the cost of moving from one row in a table or index to the next
    29         -** or previous row be SEEK_COST.
    30         -**
    31         -** Let the base-10 logarithm of the number of rows in a table or
    32         -** index be L.  The estLog() function below will estimate this
    33         -** numbmer given the number of rows in the table.
    34         -**
    35         -** The cost of doing a lookup of an index will be IDX_LKUP_COST*L.
    36         -**
    37         -** The cost of doing a lookup on a table is TBL_LKUP_COST*L.
    38         -**
    39         -** The cost of sorting a result set of N rows is assumed to be
    40         -** N*log10(N)*SORT_COST.
    41         -*/
    42         -#if defined(SEEK_COST)
    43         -  /* Assume that IDX_LKUP_COST, TBL_LKUP_COST, and SORT_COST are also defined */
    44         -#elif !defined(SQLITE_OMIT_FLOATING_POINT)
    45         -#  define SEEK_COST     1.0
    46         -#  define IDX_LKUP_COST 1.0
    47         -#  define TBL_LKUP_COST 0.1
    48         -#  define SORT_COST     3.0
    49         -#else
    50         -#  define SEEK_COST     10
    51         -#  define IDX_LKUP_COST 10
    52         -#  define TBL_LKUP_COST 1
    53         -#  define SORT_COST     30
    54         -#endif
    55         -
    56     22   /*
    57     23   ** Trace output macros
    58     24   */
    59     25   #if defined(SQLITE_TEST) || defined(SQLITE_DEBUG)
    60     26   int sqlite3WhereTrace = 0;
    61     27   #endif
    62     28   #if defined(SQLITE_TEST) && defined(SQLITE_DEBUG)
................................................................................
  2767   2733   
  2768   2734     /* Loop over all indices looking for the best one to use
  2769   2735     */
  2770   2736     for(; pProbe; pIdx=pProbe=pProbe->pNext){
  2771   2737       const unsigned int * const aiRowEst = pProbe->aiRowEst;
  2772   2738       double cost;                /* Cost of using pProbe */
  2773   2739       double nRow;                /* Estimated number of rows in result set */
  2774         -    double nTabSrch;            /* Est number of table searches */
  2775         -    double nIdxSrch;            /* Est number of index searches */
         2740  +    double log10N;              /* base-10 logarithm of nRow (inexact) */
  2776   2741       int rev;                    /* True to scan in reverse order */
  2777   2742       int wsFlags = 0;
  2778   2743       Bitmask used = 0;
  2779   2744   
  2780   2745       /* The following variables are populated based on the properties of
  2781   2746       ** index being evaluated. They are then used to determine the expected
  2782   2747       ** cost and number of rows returned.
................................................................................
  2963   2928           whereEqualScanEst(pParse, pProbe, pFirstTerm->pExpr->pRight, &nRow);
  2964   2929         }else if( pFirstTerm->eOperator==WO_IN && bInEst==0 ){
  2965   2930           whereInScanEst(pParse, pProbe, pFirstTerm->pExpr->x.pList, &nRow);
  2966   2931         }
  2967   2932       }
  2968   2933   #endif /* SQLITE_ENABLE_STAT2 */
  2969   2934   
  2970         -    /* Adjust the number of rows and the cost downward to reflect rows
         2935  +    /* Adjust the number of output rows and downward to reflect rows
  2971   2936       ** that are excluded by range constraints.
  2972   2937       */
  2973   2938       nRow = (nRow * (double)estBound) / (double)100;
  2974   2939       if( nRow<1 ) nRow = 1;
  2975   2940   
  2976         -    /* Assume constant cost to advance from one row to the next and
  2977         -    ** logarithmic cost to do a binary search.  Hence, the initial cost
  2978         -    ** is the number of output rows plus log2(table-size) times the
  2979         -    ** number of binary searches.
  2980         -    **
  2981         -    ** Because fan-out on tables is so much higher than the fan-out on
  2982         -    ** indices (because table btrees contain only integer keys in non-leaf
  2983         -    ** nodes) we weight the cost of a table binary search as 1/10th the
  2984         -    ** cost of an index binary search.
  2985         -    */
  2986         -    if( pIdx ){
  2987         -      if( bLookup ){
  2988         -        /* For an index lookup followed by a table lookup:
  2989         -        **    nInMul index searches to find the start of each index range
  2990         -        **  + nRow steps through the index
  2991         -        **  + nRow table searches to lookup the table entry using the rowid
  2992         -        */
  2993         -        nIdxSrch = nInMul;
  2994         -        nTabSrch = nRow;
  2995         -      }else{
  2996         -        /* For a covering index:
  2997         -        **     nInMul index searches to find the initial entry 
  2998         -        **   + nRow steps through the index
  2999         -        */
  3000         -        nIdxSrch = nInMul;
  3001         -        nTabSrch = 0;
  3002         -      }
         2941  +    /* Experiments run on real SQLite databases show that the time needed
         2942  +    ** to do a binary search to locate a row in a table or index is roughly
         2943  +    ** log10(N) times the time to move from one row to the next row within
         2944  +    ** a table or index.  The actual times can vary, with the size of
         2945  +    ** records being an important factor.  Both moves and searches are
         2946  +    ** slower with larger records, presumably because fewer records fit
         2947  +    ** on one page and hence more pages have to be fetched.
         2948  +    **
         2949  +    ** The ANALYZE command and the sqlite_stat1 and sqlite_stat2 tables do
         2950  +    ** not give us data on the relative sizes of table and index records.
         2951  +    ** So this computation assumes table records are about twice as big
         2952  +    ** as index records
         2953  +    */
         2954  +    if( (wsFlags & WHERE_NOT_FULLSCAN)==0 ){
         2955  +      /* The cost of a full table scan is a number of move operations equal
         2956  +      ** to the number of rows in the table.
         2957  +      **
         2958  +      ** We add an additional 4x penalty to full table scans.  This causes
         2959  +      ** the cost function to err on the side of choosing an index over
         2960  +      ** choosing a full scan.  This 4x full-scan penalty is an arguable
         2961  +      ** decision and one which we expect to revisit in the future.  But
         2962  +      ** it seems to be working well enough at the moment.
         2963  +      */
         2964  +      cost = aiRowEst[0]*4;
  3003   2965       }else{
  3004         -      /* For a rowid primary key lookup:
  3005         -      **    nInMult table searches to find the initial entry for each range
  3006         -      **  + nRow steps through the table
  3007         -      */
  3008         -      nIdxSrch = 0;
  3009         -      nTabSrch = nInMul;
  3010         -    }
  3011         -    cost = nRow + (nIdxSrch*IDX_LKUP_COST + nTabSrch*TBL_LKUP_COST)
  3012         -                      *estLog(aiRowEst[0])/SEEK_COST;
  3013         -
  3014         -    /* Add in the estimated cost of sorting the result.  This cost is expanded
  3015         -    ** by a fudge factor of 3.0 to account for the fact that a sorting step 
  3016         -    ** involves a write and is thus more expensive than a lookup step.
         2966  +      log10N = estLog(aiRowEst[0]);
         2967  +      cost = nRow;
         2968  +      if( pIdx ){
         2969  +        if( bLookup ){
         2970  +          /* For an index lookup followed by a table lookup:
         2971  +          **    nInMul index searches to find the start of each index range
         2972  +          **  + nRow steps through the index
         2973  +          **  + nRow table searches to lookup the table entry using the rowid
         2974  +          */
         2975  +          cost += (nInMul + nRow)*log10N;
         2976  +        }else{
         2977  +          /* For a covering index:
         2978  +          **     nInMul index searches to find the initial entry 
         2979  +          **   + nRow steps through the index
         2980  +          */
         2981  +          cost += nInMul*log10N;
         2982  +        }
         2983  +      }else{
         2984  +        /* For a rowid primary key lookup:
         2985  +        **    nInMult table searches to find the initial entry for each range
         2986  +        **  + nRow steps through the table
         2987  +        */
         2988  +        cost += nInMul*log10N;
         2989  +      }
         2990  +    }
         2991  +
         2992  +    /* Add in the estimated cost of sorting the result.  Actual experimental
         2993  +    ** measurements of sorting performance in SQLite show that sorting time
         2994  +    ** adds C*N*log10(N) to the cost, where N is the number of rows to be 
         2995  +    ** sorted and C is a factor between 1.95 and 4.3.  We will split the
         2996  +    ** difference and select C of 3.0.
  3017   2997       */
  3018   2998       if( bSort ){
  3019         -      cost += nRow*estLog(nRow)*SORT_COST/SEEK_COST;
         2999  +      cost += nRow*estLog(nRow)*3;
  3020   3000       }
  3021   3001   
  3022   3002       /**** Cost of using this index has now been computed ****/
  3023   3003   
  3024   3004       /* If there are additional constraints on this table that cannot
  3025   3005       ** be used with the current index, but which might lower the number
  3026   3006       ** of output rows, adjust the nRow value accordingly.  This only 
................................................................................
  3078   3058         }
  3079   3059         if( nRow<2 ) nRow = 2;
  3080   3060       }
  3081   3061   
  3082   3062   
  3083   3063       WHERETRACE((
  3084   3064         "%s(%s): nEq=%d nInMul=%d estBound=%d bSort=%d bLookup=%d wsFlags=0x%x\n"
  3085         -      "         notReady=0x%llx nTSrch=%.1f nISrch=%.1f nRow=%.1f\n"
  3086         -      "         estLog=%.1f cost=%.1f used=0x%llx\n",
         3065  +      "         notReady=0x%llx log10N=%.1f nRow=%.1f cost=%.1f used=0x%llx\n",
  3087   3066         pSrc->pTab->zName, (pIdx ? pIdx->zName : "ipk"), 
  3088   3067         nEq, nInMul, estBound, bSort, bLookup, wsFlags,
  3089         -      notReady, nTabSrch, nIdxSrch, nRow, estLog(aiRowEst[0]), cost, used
         3068  +      notReady, log10N, nRow, cost, used
  3090   3069       ));
  3091   3070   
  3092   3071       /* If this index is the best we have seen so far, then record this
  3093   3072       ** index and its cost in the pCost structure.
  3094   3073       */
  3095   3074       if( (!pIdx || wsFlags)
  3096   3075        && (cost<pCost->rCost || (cost<=pCost->rCost && nRow<pCost->plan.nRow))

Changes to test/analyze5.test.

    98     98      14  {z>3 AND z<100}       t1z   50
    99     99      15  {z>=4 AND z<100}      t1z   50
   100    100      16  {z>=-100 AND z<=-1}   t1z   50
   101    101      17  {z>=-100 AND z<=0}    t1z  400
   102    102      18  {z>=-100 AND z<0}     t1z   50
   103    103      19  {z>=-100 AND z<=1}    t1z  700
   104    104      20  {z>=-100 AND z<2}     t1z  700
   105         -   21  {z>=-100 AND z<=2}    {}   111
   106         -   22  {z>=-100 AND z<3}     {}   111
          105  +   21  {z>=-100 AND z<=2}    t1z  900
          106  +   22  {z>=-100 AND z<3}     t1z  900
   107    107     
   108    108      31  {z>=0.0 AND z<=0.0}   t1z  400
   109    109      32  {z>=1.0 AND z<=1.0}   t1z  300
   110    110      33  {z>=2.0 AND z<=2.0}   t1z  200
   111    111      34  {z>=3.0 AND z<=3.0}   t1z  100
   112    112      35  {z>=4.0 AND z<=4.0}   t1z   50
   113    113      36  {z>=-1.0 AND z<=-1.0} t1z   50
................................................................................
   121    121      44  {z>3.2 AND z<100}     t1z   50
   122    122      45  {z>=4.0 AND z<100}    t1z   50
   123    123      46  {z>=-100 AND z<=-1.0} t1z   50
   124    124      47  {z>=-100 AND z<=0.0}  t1z  400
   125    125      48  {z>=-100 AND z<0.0}   t1z   50
   126    126      49  {z>=-100 AND z<=1.0}  t1z  700
   127    127      50  {z>=-100 AND z<2.0}   t1z  700
   128         -   51  {z>=-100 AND z<=2.0}  {}   111
   129         -   52  {z>=-100 AND z<3.0}   {}   111
          128  +   51  {z>=-100 AND z<=2.0}  t1z  900
          129  +   52  {z>=-100 AND z<3.0}   t1z  900
   130    130     
   131    131     101  {z=-1}                t1z   50
   132    132     102  {z=0}                 t1z  400
   133    133     103  {z=1}                 t1z  300
   134    134     104  {z=2}                 t1z  200
   135    135     105  {z=3}                 t1z  100
   136    136     106  {z=4}                 t1z   50
................................................................................
   147    147     202  {z IN (0)}            t1z  400
   148    148     203  {z IN (1)}            t1z  300
   149    149     204  {z IN (2)}            t1z  200
   150    150     205  {z IN (3)}            t1z  100
   151    151     206  {z IN (4)}            t1z   50
   152    152     207  {z IN (0.5)}          t1z   50
   153    153     208  {z IN (0,1)}          t1z  700
   154         -  209  {z IN (0,1,2)}        {}   100
          154  +  209  {z IN (0,1,2)}        t1z  900
   155    155     210  {z IN (0,1,2,3)}      {}   100
   156    156     211  {z IN (0,1,2,3,4,5)}  {}   100
   157    157     212  {z IN (1,2)}          t1z  500
   158    158     213  {z IN (2,3)}          t1z  300
   159    159     214  {z=3 OR z=2}          t1z  300
   160    160     215  {z IN (-1,3)}         t1z  150
   161    161     216  {z=-1 OR z=3}         t1z  150