Documentation Source Text

Check-in [3eaa7f572e]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Update the skip-scan documentation to mention the magic 18 number for when skip-scan becomes profitable.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 3eaa7f572e6af23d50f26ff3ee8472247236b9b1
User & Date: drh 2013-11-27 04:26:16.936
Context
2013-11-27
19:20
Add a tentative news item for the 3.8.2 release. Updates to CAST documentation. Fix typos. (check-in: 825b2c6448 user: drh tags: trunk)
04:26
Update the skip-scan documentation to mention the magic 18 number for when skip-scan becomes profitable. (check-in: 3eaa7f572e user: drh tags: trunk)
01:26
Add documentation for the SQLITE_WIN32_HEAP compile-time option. Fix typos in the change log. (check-in: 907e8cfb73 user: drh tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/optoverview.in.
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509

510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538

539
540

541
542
543
544
545
546
547
548
549
550

551
552
553
554
555
556
557
558
  SELECT name FROM people WHERE height>=180;
}

PARAGRAPH {
  Because the left-most column of the index does not appear in the
  WHERE clause of the query, one is tempted to conclude that the
  index is not usable here.  But SQLite is able to use the index.
  Conceptually, what SQLite does in this case is transform the
  query into something like the following:
}

CODE {
  SELECT name FROM people
   WHERE role IN (SELECT DISTINCT role FROM people)
     AND height>=180;
}

PARAGRAPH {
  Or this:
}

CODE {
  SELECT name FROM people WHERE role='teacher' AND height>=180
  UNION ALL
  SELECT name FROM people WHERE role='student' AND height>=180;
}

PARAGRAPH {

  The key word above is "conceptually".  SQLite does not really transform
  the query.  But the transformations help to convey a sense of how
  SQLite goes about running the query.  The plan goes like this:
  SQLite locates the first possible value for "role", which it
  can do by rewinding the "people_idx1" index to the beginning and reading
  the first record.  SQLite stores this first "role" value in an
  internal variable that we will here call "$role".  Then SQLite
  runs a query like: "SELECT name FROM people WHERE role=$role AND height>=180".
  This query has an equality constraint on the left-most column of the
  index and so the index can be used to resolve that query.  Once
  that query is finished, SQLite then uses the "people_idx1" index to
  locate the next value of the "role" column, using code that is logically
  similar to "SELECT role FROM people WHERE role>$role".  This new "role"
  value overwrites the $role variable, and the process repeats until all
  possible values for "role" have been examined.
}

PARAGRAPH {
  We call this kind of index usage a "skip-scan" because the database
  engine is basically doing a full scan of the index but it optimizes the
  scan (making it less than "full") by occasionally jumping ahead to the
  next candidate value.
}

PARAGRAPH {
  SQLite might use a skip-scan on an index if it knows that the first
  one or more columns contain only a small number of distinct values.
  If there are too many
  distinct values in the left-most columns of the index, then it would

  be faster to do an ordinary full table scan, and so SQLite will choose
  that option instead.

}

PARAGRAPH {
  The only way that SQLite can know that the left-most columns of an index
  have few distinct values is if the [ANALYZE] command has been run
  on the database.
  Without the results of ANALYZE, SQLite has to guess at the "shape" of
  the data in the table, and the default guess is that there are many
  distinct values for the left-most column of an index, far too many
  to make a skip-scan practical.  Hence, a skip-scan is never used on a

  database that has not been analyzed.
}

HEADING 1 {Joins} joins

PARAGRAPH {
  ^The ON and USING clauses of an inner join are converted into additional
  terms of the WHERE clause prior to WHERE clause analysis described







|
|



















>
|
<
|









|
|
|





|





|
|
|
>
|
<
>




|


|
|
|
>
|







482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511

512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540

541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
  SELECT name FROM people WHERE height>=180;
}

PARAGRAPH {
  Because the left-most column of the index does not appear in the
  WHERE clause of the query, one is tempted to conclude that the
  index is not usable here.  But SQLite is able to use the index.
  Conceptually, SQLite uses the index as if the query were more
  like the following:
}

CODE {
  SELECT name FROM people
   WHERE role IN (SELECT DISTINCT role FROM people)
     AND height>=180;
}

PARAGRAPH {
  Or this:
}

CODE {
  SELECT name FROM people WHERE role='teacher' AND height>=180
  UNION ALL
  SELECT name FROM people WHERE role='student' AND height>=180;
}

PARAGRAPH {
  The alternative query formulations shown above are conceptual only.
  SQLite does not really transform the query. 

  The actual query plan is like this:
  SQLite locates the first possible value for "role", which it
  can do by rewinding the "people_idx1" index to the beginning and reading
  the first record.  SQLite stores this first "role" value in an
  internal variable that we will here call "$role".  Then SQLite
  runs a query like: "SELECT name FROM people WHERE role=$role AND height>=180".
  This query has an equality constraint on the left-most column of the
  index and so the index can be used to resolve that query.  Once
  that query is finished, SQLite then uses the "people_idx1" index to
  locate the next value of the "role" column, using code that is logically
  similar to "SELECT role FROM people WHERE role>$role LIMIT 1".
  This new "role" value overwrites the $role variable, and the process
  repeats until all possible values for "role" have been examined.
}

PARAGRAPH {
  We call this kind of index usage a "skip-scan" because the database
  engine is basically doing a full scan of the index but it optimizes the
  scan (making it less than "full") by occasionally skipping ahead to the
  next candidate value.
}

PARAGRAPH {
  SQLite might use a skip-scan on an index if it knows that the first
  one or more columns contain many duplication values.
  If there are too few duplicates
  in the left-most columns of the index, then it would
  be faster to simply step ahead to the next value, and thus do
  a full table scan, than to do a binary search on an index to locate

  the next left-column value.
}

PARAGRAPH {
  The only way that SQLite can know that the left-most columns of an index
  have many duplicate is if the [ANALYZE] command has been run
  on the database.
  Without the results of ANALYZE, SQLite has to guess at the "shape" of
  the data in the table, and the default guess is that there are an average
  of 10 duplicates for every value in the left-most column of the index.
  But skip-scan only becomes profitable (it only gets to be faster than
  a full table scan) when the number of duplicates is about 18 or more.
  Hence, a skip-scan is never used on a database that has not been analyzed.
}

HEADING 1 {Joins} joins

PARAGRAPH {
  ^The ON and USING clauses of an inner join are converted into additional
  terms of the WHERE clause prior to WHERE clause analysis described