Documentation Source Text

Check-in [9e047c0045]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Add extra details on balance-siblings algorithm to btreemodule.html.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 9e047c00450ad696861037467e01c895bf25face
User & Date: dan 2009-06-08 12:30:27
Context
2009-06-09
00:55
Correct the description on how short_column_names and full_column_names interact. check-in: 787c316f3f user: drh tags: trunk
2009-06-08
12:30
Add extra details on balance-siblings algorithm to btreemodule.html. check-in: 9e047c0045 user: dan tags: trunk
07:19
Change the form of the tags used to embed svg images. check-in: cd56d0a98c user: dan tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/btreemodule.in.

914
915
916
917
918
919
920





921
922
923
924

































925
926
927













928
929
930

















931

932
















933














934
935

936
937
938
939

940
941
942
943
944
945
946

      [Figure btreemodule_balance_quick.svg figure_balance_quick "Example Balance Quick Transform"]

    [h4 "Balance Siblings" balance_siblings]

      [fancyformat_import_requirement L50001]






      <ol>
        <li> Find the set of sibling pages to balance.
	<li> Find the set of cells involved in the balance. This is the set of
	     cells on the sibling pages, plus the "divider" cells if the

































             sibling pages are not leaves of a table (intkey) b-tree.
        <li> Determine the new number of sibling pages and the new distribution 
             of cells. This is in itself a two-step process:













        <ol type="a">
          <li> Pack all cells...
          <li> Loop through the set of leaf pages...

















        </ol>

        <li> Determine the set of database pages to use as the new sibling pages.
















        <li> Populate each new sibling page.














        <li> Populate the parent page.
        <li> Populate pointer map entries associated with any cells that have moved.

	<li> Populate pointer map entries associated with the sibling pages,
             and any overflow pages pointed to by the new divider cells.
      </ol>



  [h3 "Page Allocation and Deallocation"]

     <p class=todo>
       Amongst other things, this section needs to explain our old pals the
       DontWrite() and DontRollback() optimizations.








>
>
>
>
>

|
<
<
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|
<
<
>
>
>
>
>
>
>
>
>
>
>
>
>

<
<
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

>
|
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
|
>
>
>
>
>
>
>
>
>
>
>
>
>
>

<
>
|
<
<

>







914
915
916
917
918
919
920
921
922
923
924
925
926
927


928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961


962
963
964
965
966
967
968
969
970
971
972
973
974
975


976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027

1028
1029


1030
1031
1032
1033
1034
1035
1036
1037
1038

      [Figure btreemodule_balance_quick.svg figure_balance_quick "Example Balance Quick Transform"]

    [h4 "Balance Siblings" balance_siblings]

      [fancyformat_import_requirement L50001]

    <p>
      The balance-siblings algorithm is described as a series of steps below.
      <span class=todo> there are a few terms used below that need
      definitions/clarifications.</span>

      <ol>
        <li> Determine the set of sibling pages to redistribute the cells of, using 


             the following rules:
        <ol type="a">
          <li> If the parent page has three or fewer child pages, then all child 
               pages are deemed to be sibling pages for the purposes of the balance-siblings
               algorithm.
	  <li> If the page being balanced is the left-most child of the parent
               page, then the three left-most child pages are used as the siblings.
	  <li> If the page being balanced is the right-most child of the parent
               page, then the three right-most child pages are used as the siblings.
	  <li> Otherwise, if none of the above three conditions are true, then the
               sibling pages are page being balanced and the child pages immediately
               to the left and right of it.
        </ol>

        <li> Determine an ordered list of cells to redistribute. There are several
             variations of this step, depending on the type of page being balanced.
        <ol type="a">
	  <li> If the page being balanced is a leaf page of a table b-tree,
	       then the list of cells to redistribute is simply the concatenation
               of the ordered lists of cells stored on each sibling page, in order
               from left-most sibling to right-most.
	  <li> If the page being balanced is a leaf page of an index b-tree, then 
               the list of cells to redistribute is comprised of the cells on each
               of the sibling pages and the divider cells in the parent page that 
               contain the pointers to each sibling page except the right-most. The
               list is arranged so that it contains: 
           <ul>
             <li> The cells from the left-most sibling page, in order, followed by
             <li> the divider cell from the parent page that contains the pointer 
		  to the left-most sibling (if there is more than one sibling
                  page), followed by
	     <li> the divider cell that contains the pointer to the second left-most
                  sibling and the cells from the remaining sibling page (if there are three
                  sibling pages).


           </ul>
	  <li> If the page being balanced is an internal b-tree node, then the list of
               cells to redistribute is determined as described in the previous case.
               However, when balancing an internal node each cell is associated with
               the page number of a child page of one of the sibling pages. The page 
               number associated with cells stored on a sibling page is the same as
               the page number stored as the first four bytes of the cell. The page
               number associated with a divider cell within the parent page is the page
               number of the right-child page of the sibling page to which the divider
               cell contains a pointer.
        </ol>

        <li> Determine the new cell distribution, using the following steps:
        <ol type="a">


          <li> Assign as may cells as will fit from the start of the ordered list of 
               cells to the left-most sibling page. Then, if any cells remain, assign
               one to be a divider cell, and as many as will fit to the next sibling 
	       page. Repeat until all cells have been assigned a location.
               <span class=todo> no divider cells for table b-tree leaf balance</span>

          <li> The previous step generates a distribution that is biased towards the
	       left-hand side. The right-most sibling may even be completely
	       empty (if the last cell in the ordered list was assigned to be a
               divider cell). To rectify this, cells are moved out of the second 
               right-most sibling page and into the right-most, one at a time, until
               there is at least one cell in the right-most sibling page and to move
               another cell would mean that the right-most sibling page is more full
               than the next to right-most sibling page. This is repeated for the next
               right-most pair of sibling pages, shifting cells out of the third 
               right-most sibling page and into the second right-most, and so on.
               <span class=todo> note about divider cells </span>
        </ol>

        <li> Determine the set of database pages to use as the new sibling pages. 

        <ol type="a">
	   <li> If there were an equal or greater number of siblings identified
                in step 1 than are required by the distribution calculated in step 3, 
                reuse as many as possible, starting with the left-most. If step 3
                calculated a distribution that requires more sibling pages than were
                identified in step 1, allocate the required extra pages using the
                <span class=todo>Refer to ???</span> algorithm.

	   <li> Arrange the new sibling pages from left to right in ascending
                page number order. The new sibling page with the smallest page number
                becomes the left-most sibling page, and so forth.
        </ol>

        <li> Populate the new sibling pages.
        <ol type="a">
	   <li> Populate each new sibling page with the required set of cells. If the
                page being balanced is not a leaf page, then the child-page pointer
                field of each cell is populated with the page-number associated with
                the cell as part of step 2 above. 

	   <li> If the page being balanced is not a leaf page, then the right-child 
                pointer stored in the page header of each new sibling page must also
                be populated. For each new sibling page except the right-most, this 
                field is set to the page number associated with the cell that 
                immediately follows the cells stored on the page (the cell that was
                assigned to be divider cell in step 3). For the right-most sibling page,
                the right-child pointer is set to the value that was stored in the
                right-child pointer of the right-most original sibling page identified
                in step 1.
        </ol>
        <li> Populate the parent page.


        <li> Populate pointer map entries.



      </ol>

  [h3 "Page Allocation and Deallocation"]

     <p class=todo>
       Amongst other things, this section needs to explain our old pals the
       DontWrite() and DontRollback() optimizations.