Documentation Source Text

Check-in [da4bf79ef3]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Add more notes to btreemodule.html.
Downloads: Tarball | ZIP archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: da4bf79ef33bbaf44b08210e54c1a7fcfb9f5732
User & Date: dan 2009-06-12 12:06:28.000
Context
2009-06-13
10:59
Remove an unrequired "todo" in btreemodule.html. (check-in: 1563461430 user: dan tags: trunk)
2009-06-12
12:06
Add more notes to btreemodule.html. (check-in: da4bf79ef3 user: dan tags: trunk)
2009-06-11
03:44
Add a diagram for delete operations. (check-in: dbcfaf4922 user: dan tags: trunk)
Changes
Unified Diff Ignore Whitespace Patch
Changes to pages/btreemodule.in.
94
95
96
97
98
99
100



101
102
103
104
105
106
107
        be accessed. This term is used throughout this document to avoid confusing
	such connections with SQL level SQLite client connections, which are
        sometime simply termed "database connections".
      }]
      [Glossary "Lazy-write cache" {
        <span class=todo>Define this.
      }]



      [Glossary "Page cache" {
        <span class=todo>Define this.
      }]
      [Glossary "Persistent database" {
        <span class=todo>Define this.
      }]
      [Glossary "Read-through cache" {







>
>
>







94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
        be accessed. This term is used throughout this document to avoid confusing
	such connections with SQL level SQLite client connections, which are
        sometime simply termed "database connections".
      }]
      [Glossary "Lazy-write cache" {
        <span class=todo>Define this.
      }]
      [Glossary "Overflow Cell" {
        <span class=todo>Define this.
      }]
      [Glossary "Page cache" {
        <span class=todo>Define this.
      }]
      [Glossary "Persistent database" {
        <span class=todo>Define this.
      }]
      [Glossary "Read-through cache" {
340
341
342
343
344
345
346












347
348
349
350
351
352
353
      <li> <b>Prefix match mode</b>.
      <li> <b>Prefix search mode</b>.
    </ul>

    <p class=todo>
      Finish the bullet points above and add HLR for each search mode.













    [h3 "Writing to the Database Image"]

    <p>
      The B-Tree module allows the user to write values to a subset of the
      fields from the database image header. The set of writable fields is
      the same as the set of fields enumerated in section
      <cite>hlr_reading_data</cite> that the B-Tree module is required to







>
>
>
>
>
>
>
>
>
>
>
>







343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
      <li> <b>Prefix match mode</b>.
      <li> <b>Prefix search mode</b>.
    </ul>

    <p class=todo>
      Finish the bullet points above and add HLR for each search mode.

    <p>
      More than one cursor can be open on a single b-tree structure at one time.
      It is also possible for a write-cursor to modify the contents of a b-tree
      structure while other cursors are open on it. The b-tree module does not
      include any type of row-locking mechanism. It is possible for a write-cursor
      to be used to delete an entry from a b-tree structure even if there are
      one or more other cursors currently pointing to the entry being deleted.

    <p class=todo>
      Requirements to do with how the above is handled. Traceibility to 
      sqlite3BtreeCursorHasMoved is required.

    [h3 "Writing to the Database Image"]

    <p>
      The B-Tree module allows the user to write values to a subset of the
      fields from the database image header. The set of writable fields is
      the same as the set of fields enumerated in section
      <cite>hlr_reading_data</cite> that the B-Tree module is required to
524
525
526
527
528
529
530






531
532
533
534
535
536
537
538
539
540
541
542
543
544










545
546
547
548
549
550
551

    [h3 "Backup/Vacuum API Requirements"]
  <ul>
    <li> Callbacks for backup module.
    <li> Page read/write APIs for backup module.
  </ul>







  [h2 "Other Requirements"]

    [h3 "Caching and Memory Management Requirements" hlr_memory]
  <ul>
    <li> Memory allocation related features (pcache, scratch memory, other...).
    <li> Default pcache implementation (sqlite3_release_memory()).
    <li> Schema memory object allocation (destructor registration).
  </ul>

    [h3 "Fault Tolerance Requirements"]
  <ul>
    <li> Don't corrupt the database. Various modes and the expectations of them.
  </ul>













  [h1 "Module API"]

      <p class=todo>
        Description of the interface in btree.h. Also other interfaces accessed by
        external modules. Including release_memory() and those pager interfaces that







>
>
>
>
>
>
|













>
>
>
>
>
>
>
>
>
>







539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582

    [h3 "Backup/Vacuum API Requirements"]
  <ul>
    <li> Callbacks for backup module.
    <li> Page read/write APIs for backup module.
  </ul>

    [h3 "Integrity Check Requirements"]
  <ul>
    <li> Callbacks for backup module.
    <li> Page read/write APIs for backup module.
  </ul>

  [h2 "Other Requirements and Constraints"]

    [h3 "Caching and Memory Management Requirements" hlr_memory]
  <ul>
    <li> Memory allocation related features (pcache, scratch memory, other...).
    <li> Default pcache implementation (sqlite3_release_memory()).
    <li> Schema memory object allocation (destructor registration).
  </ul>

    [h3 "Fault Tolerance Requirements"]
  <ul>
    <li> Don't corrupt the database. Various modes and the expectations of them.
  </ul>

    [h3 "Well-Formedness Requirements"]
  <ul>
    <li> Identify the subset of file-format well-formedness requirements that
         this module is responsible for implementing.
    <li> Define how the module should respond to corrupt database files: don't
         crash, return SQLITE_CORRUPT as early as is practical. Should it also
         put the b-tree into a permanent error state?
  </ul>
 
 


  [h1 "Module API"]

      <p class=todo>
        Description of the interface in btree.h. Also other interfaces accessed by
        external modules. Including release_memory() and those pager interfaces that
784
785
786
787
788
789
790

791

792

793
794

795
796

797





798
799
800
801
802
803
804
805
      [fancyformat_import_requirement H51016]




    [h2 "Modifying the Database Image"]


      [btree_api_defn sqlite3BtreeCreateTable]



      [btree_api_defn BTREE_INTKEY BTREE_ZERODATA BTREE_LEAFDATA]


      [btree_api_defn sqlite3BtreeDropTable sqlite3BtreeClearTable sqlite3BtreeUpdateMeta]


      [btree_api_defn sqlite3BtreeCursorHasMoved sqlite3BtreePutData]





      [btree_api_defn sqlite3BtreeIncrVacuum]

      [h3 sqlite3BtreeDelete sqlite3BtreeDelete]

        [btree_api_defn sqlite3BtreeDelete]

        [fancyformat_import_requirement L50013]








>

>

>
|

>
|

>
|
>
>
>
>
>
|







815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
      [fancyformat_import_requirement H51016]




    [h2 "Modifying the Database Image"]

      [h3 sqlite3BtreeCreateTable sqlite3BtreeCreateTable]
      [btree_api_defn sqlite3BtreeCreateTable]
      [btree_api_defn BTREE_INTKEY BTREE_ZERODATA BTREE_LEAFDATA]

      [h3 sqlite3BtreeDropTable sqlite3BtreeDropTable]
      [btree_api_defn sqlite3BtreeDropTable ]

      [h3 sqlite3BtreeClearTable sqlite3BtreeClearTable]
      [btree_api_defn sqlite3BtreeClearTable]

      [h3 sqlite3BtreeCursorHasMoved sqlite3BtreeCursorHasMoved]
      [btree_api_defn sqlite3BtreeCursorHasMoved]

      [h3 sqlite3BtreePutData  sqlite3BtreePutData]
      [btree_api_defn sqlite3BtreePutData]

      [h3 sqlite3BtreeUpdateMeta sqlite3BtreeUpdateMeta]
        [btree_api_defn sqlite3BtreeUpdateMeta]

      [h3 sqlite3BtreeDelete sqlite3BtreeDelete]

        [btree_api_defn sqlite3BtreeDelete]

        [fancyformat_import_requirement L50013]

851
852
853
854
855
856
857



858
859
860
861
862
863
864
        and the b-tree cursor has not been positioned as assumed by L50006, the
        results are undefined.

      <p class=todo>
        Malloc and IO error handling. Maybe these should be grouped together
        for a whole bunch of APIs. And hook into the above via a defintion of
        "successful call".




    [h2 "What do these do?"]

    <p class=todo>
      The following is used only from within VdbeExec() to check whether or not
      a cursor was opened on a table or index b-tree. Corruption tests can move into
      the b-tree layer.







>
>
>







892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
        and the b-tree cursor has not been positioned as assumed by L50006, the
        results are undefined.

      <p class=todo>
        Malloc and IO error handling. Maybe these should be grouped together
        for a whole bunch of APIs. And hook into the above via a defintion of
        "successful call".

      [h3 sqlite3BtreeIncrVacuum sqlite3BtreeIncrVacuum]
      [btree_api_defn sqlite3BtreeIncrVacuum]

    [h2 "What do these do?"]

    <p class=todo>
      The following is used only from within VdbeExec() to check whether or not
      a cursor was opened on a table or index b-tree. Corruption tests can move into
      the b-tree layer.
919
920
921
922
923
924
925
926






























927
928
929
930
931
932
933

















































934
935
936
937
938
939
940
       balancing works, deleting an internal cell from an index b-tree etc.

    [h3 "Creating a B-Tree Structure"]
    [h3 "Clearing a B-Tree Structure"]
    [h3 "Deleting a B-Tree Structure"]

    [h3 "Inserting, Replacing and Deleting B-Tree Entries"]































    [h4 "B-Tree Insert/Replace Entry"]

      <p>
        This section describes the way in which new entries may be inserted 
        into a b-tree structure, and how existing entries may be replaced. Both
        of these operations are accessed using the sqlite3BtreeInsert API.


















































    [h4 "B-Tree Delete Entry"]

      <p>
        This section describes the way in which entries may be removed from
	a b-tree structure, as required when the sqlite3BtreeDelete (section
        <cite>sqlite3BtreeDelete</cite>) API is invoked. Removing an entry
        from a b-tree table involves the following steps:








>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>







963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
       balancing works, deleting an internal cell from an index b-tree etc.

    [h3 "Creating a B-Tree Structure"]
    [h3 "Clearing a B-Tree Structure"]
    [h3 "Deleting a B-Tree Structure"]

    [h3 "Inserting, Replacing and Deleting B-Tree Entries"]

      <p>
        The following two sections describe the way entries are added and removed
        from B-Tree structures within a database image. 

      <p>
        As one might expect, the algorithms described in the following sections
        involve adding and removing b-tree cells to and from b-tree node pages.
        The format of b-tree node pages is described in detail in 
        <cite>ref_file_format</cite>. This document does not describe the exact
        way in which content is manipulated within a page, as these details are
        considered not considered high-level enough to be documented outside of
        the SQLite source code itself. For the purposes of the descriptions in
        the following sections, a b-tree node page is considered to be a container
        for an ordered list of b-tree cells. Cells may be inserted into or removed
        from any position in the ordered list as required.

      <p>
	A b-tree node page has a finite capacity. If one of the algorithms
	described here is required to insert a cell into a b-tree node page,
	and there is not enough free space within the page to accomadate the
	cell, it is still nominally inserted into the requested position within
        the node, but becomes an overflow cell. Overflow cells never remain so
        for very long. If an insert, replace or delete entry operation creates
        one or more overflow cells, the b-tree structure is rearranged so that
        all cells are stored within the body of a b-tree node page before the
        operation is considered complete. This process of rearranging the b-tree
        structure is termed b-tree balancing, and is described in section 
        <cite>btree_balancing_algorithm</cite>.
        

    [h4 "B-Tree Insert/Replace Entry"]

      <p>
        This section describes the way in which new entries may be inserted 
        into a b-tree structure, and how existing entries may be replaced. Both
        of these operations are accessed using the sqlite3BtreeInsert API.

      <p>
        An insert/replace operation involves the following steps:

      <ol>
        <li> Based on the supplied key and value, and the type of b-tree being
             inserted into, allocate and populate any required overflow pages.
             <span class=todo>Should reference file-format requirements that
             provide the formula for doing this.</span>

        <li> Attempt to move the b-tree write cursor to an entry with a key
             that matches the new key being inserted. If a matching entry is 
             found, then the operation is a replace. Otherwise, if the key is
             not found, an insert.

        <ol type="a">
          <li> Requirements L50008, L50009, L50010 and L50011 apply to the cursor
               seek operation here. This ensures that if the search does not find
	       an exact match, the cursor is left pointing to the leaf page that 
               the new entry should be added into.

          <li> As specified by L50006, the cursor may already be positioned. In 
               this case the seek operation is not required.
        </ol>

        <li> If a matching key was found in the b-tree, then it must be removed and
             the new entry added in its place.

        <ol type="a">
          <li> If there are one or more overflow pages associated with the entry
               being replaced, they are moved to the free-list.
          <li> The cell corresponding to the entry being removed is removed from
               the b-tree node page.
          <li> The new cell is inserted in the position previously occupied by the
               cell removed in the previous step. If the page is not a leaf page,
	       then the first four-bytes (the child-page pointer) of the old
               cell are copied to the first four bytes of the new cell. If the new
               cell is larger than the cell that it replaced, then it may become
               an overflow cell.
        </ol>

        <li> If no matching key was found in the b-tree, then the new cell is inserted
             into the leaf page that the cursor was left pointing to by step 1. The
             new cell may become an overflow cell.

        <li> If the new cell is now an overflow cell, then the balancing algorithm 
             (see section <cite>btree_balancing_algorithm</cite>) is run on the 
             overflowing b-tree node page.
      </ol>

    [h4 "B-Tree Delete Entry"]

      <p>
        This section describes the way in which entries may be removed from
	a b-tree structure, as required when the sqlite3BtreeDelete (section
        <cite>sqlite3BtreeDelete</cite>) API is invoked. Removing an entry
        from a b-tree table involves the following steps:
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
      <p>
        If the b-tree entry being removed is located on a leaf page (as is always the
        case with table b-tree structures), then deleting an entry from a b-tree
        is quite simple.

      [Figure btreemodule_delete1.svg figure_delete1 "Delete from an Internal Node"]

    [h3 "B-Tree Balancing Algorithm"]

     <ul>
       <li><p>The <b>balance deeper</b> sub-algorithm is used when the root page of
           a b-tree is overfull. It creates a new page and copies the
           entire contents of the overfull root page to it. The root page
           is then zeroed and the new page installed as its only child.
           The balancing algorithm is then run on the new child page (in case







|







1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
      <p>
        If the b-tree entry being removed is located on a leaf page (as is always the
        case with table b-tree structures), then deleting an entry from a b-tree
        is quite simple.

      [Figure btreemodule_delete1.svg figure_delete1 "Delete from an Internal Node"]

    [h3 "B-Tree Balancing Algorithm" btree_balancing_algorithm]

     <ul>
       <li><p>The <b>balance deeper</b> sub-algorithm is used when the root page of
           a b-tree is overfull. It creates a new page and copies the
           entire contents of the overfull root page to it. The root page
           is then zeroed and the new page installed as its only child.
           The balancing algorithm is then run on the new child page (in case
1194
1195
1196
1197
1198
1199
1200








1201
1202
1203
1204
1205
1206
1207
      </ol>

    [h3 "Page Allocation and Deallocation"]

     <p class=todo>
       Amongst other things, this section needs to explain our old pals the
       DontWrite() and DontRollback() optimizations.












  [h2 "Transactions and Savepoints"]

     <p class=todo>







>
>
>
>
>
>
>
>







1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
      </ol>

    [h3 "Page Allocation and Deallocation"]

     <p class=todo>
       Amongst other things, this section needs to explain our old pals the
       DontWrite() and DontRollback() optimizations.

    [h4 "Moving an overflow-chain to the free-list" free_overflow_chain]

     <p class=todo>
       Describe how this can sometimes be done without reading the content of
       overflow pages.

    [h3 "Incremental Vacuum Step"]




  [h2 "Transactions and Savepoints"]

     <p class=todo>
Changes to pages/fancyformat.tcl.
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
    if {$i == $iLevel} {
      append zNumber "[incr ::SectionNumbers($i)]."
    }
    if {$i > $iLevel} {
      set ::SectionNumbers($i) 0
    }
  }
  # set zNumber [string range $zNumber 0 end-1]

  if {$zName == ""} {
    set zName [string range "section_[string map {. _} $zNumber]" 0 end-1]
  } else {
    set ::References($zName) [list $zNumber $zTitle]
  }

  append ::TOC [subst {
    <div style="margin-left:[expr $iLevel*6]ex">
    <a href="#$zName">${zNumber} $zTitle</a>







|


|







18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
    if {$i == $iLevel} {
      append zNumber "[incr ::SectionNumbers($i)]."
    }
    if {$i > $iLevel} {
      set ::SectionNumbers($i) 0
    }
  }
  set zNumber [string range $zNumber 0 end-1]

  if {$zName == ""} {
    set zName "section_[string map {. _} $zNumber]"
  } else {
    set ::References($zName) [list $zNumber $zTitle]
  }

  append ::TOC [subst {
    <div style="margin-left:[expr $iLevel*6]ex">
    <a href="#$zName">${zNumber} $zTitle</a>