Documentation Source Text

Check-in [4650ddb4f7]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Remove the disused btreemodule.html document.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 4650ddb4f7d13ff5e9eff8a534c5af6f9bcc7812
User & Date: drh 2016-06-03 14:21:03
Context
2016-06-03
14:25
Added a change log for 3.14.0. Updates for SQLITE_OK_LOAD_PERMANENTLY. check-in: 2c9d634cc9 user: drh tags: trunk
14:21
Remove the disused btreemodule.html document. check-in: 4650ddb4f7 user: drh tags: trunk
2016-06-01
11:18
Fix typo in the description of the blessing. check-in: 8ff18d7976 user: drh tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Deleted pages/btreemodule.in.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
1121
1122
1123
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
1135
1136
1137
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
1155
1156
1157
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
1171
1172
1173
1174
1175
1176
1177
1178
1179
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
1195
1196
1197
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
1213
1214
1215
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
1229
1230
1231
1232
1233
1234
1235
1236
1237
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
1256
1257
1258
1259
1260
1261
1262
1263
1264
1265
1266
1267
1268
1269
1270
1271
1272
1273
1274
1275
1276
1277
1278
1279
1280
1281
1282
1283
1284
1285
1286
1287
1288
1289
1290
1291
1292
1293
1294
1295
1296
1297
1298
1299
1300
1301
1302
1303
1304
1305
1306
1307
1308
1309
1310
1311
1312
1313
1314
1315
1316
1317
1318
1319
1320
1321
1322
1323
1324
1325
1326
1327
1328
1329
1330
1331
1332
1333
1334
1335
1336
1337
1338
1339
1340
1341
1342
1343
1344
1345
1346
1347
1348
1349
1350
1351
1352
1353
1354
1355
1356
1357
1358
1359
1360
1361
1362
1363
1364
1365
1366
1367
1368
1369
1370
1371
1372
1373
1374
1375
1376
1377
1378
1379
1380
1381
1382
1383
1384
1385
1386
1387
1388
1389
1390
1391
1392
1393
1394
1395
1396
1397
1398
1399
1400
1401
1402
1403
1404
1405
1406
1407
1408
1409
1410
1411
1412
1413
1414
1415
1416
1417
1418
1419
1420
1421
1422
1423
1424
1425
1426
1427
1428
1429
1430
1431
1432
1433
1434
1435
1436
1437
1438
1439
1440
1441
1442
1443
1444
1445
1446
1447
1448
1449
1450
1451
1452
1453
1454
<!-- title>SQLite B-Tree Module</title -->
<tcl>

hd_keywords {btree design}
source [file join $::DOC pages fancyformat.tcl]

foreach header {btree.h sqliteInt.h} {
  set fd [open [file join $SRC src $header]]
  set headers($header) [read $fd]
  close $fd
}

proc header_api_defn {header args} {
  foreach api $args {

    set re_head {\n[^\n]*}
    if { [string match struct* $api] } {
      set re_tail {[^\{]*\};\n}
    } elseif { [string match BTREE* $api] } {
      set re_tail {[^A-Za-z][^\n]*\n}
    } else {
      set re_tail {[^;A-Za-z][^;]*;}
    }

    set api_defn {}
    regexp $re_head$api$re_tail $::headers($header) api_defn
    lappend ret [string trim $api_defn]
  }
  return "<pre class=api>[join $ret "\n"]</pre>"
}

proc btree_api_defn {args} {
  eval header_api_defn btree.h $args
}
proc sqliteint_api_defn {args} {
  eval header_api_defn sqliteInt.h $args
}

fancyformat_document "SQLite B-Tree Module" {hlr50000.txt llr50000.txt} {

  [h1 "Document Overview"]

  [h2 "Scope and Purpose"]

  <p>
    This document provides a description of the functionality of and public 
    interface offered by the SQLite b-tree module. It also, to a certain extent,
    describes the algorithm and implementation techniques used internally by
    the module. 

  <ul>
    <li><p> To make it easier to maintain, test and improve this critical
            sub-system of the SQLite software library.

    <li><p> To facilitate development of compatible backend modules that can 
            be used with other SQLite sub-systems, either for experimental or 
            production purposes.
  </ul>

  <p>
    It is important to note that, even given the second bullet point above,
    the interfaces and sub-systems described in this document are not stable.
    They may be changed in any way with each new SQLite release. Any 
    external software development that uses these interfaces must be prepared
    to adapt to interface refactoring without notice.

  [h2 "Document and Requirements Organization"]

     <p class=todo>
       Change the following so that those requirements that describe the API
       are "low-level" requirements.

     [Table]
        [Tr] <th> Requirement ids <th> Contents
        [Tr] <td> H50**** <td> Requirement statements specifying the functionality required of the B-Tree module.
        [Tr] <td> H51**** <td> Requirement statements specifying the API provided by the B-Tree module.
        [Tr] <td> L****** <td> Requirement statements specifying some details of the internal workings of the B-Tree module.
      </table>

  [h2 "Glossary"]

    <table id=glossary>
      [Glossary "Balance-Siblings Algorithm" {
        The balance-siblings algorithm is one of four algorithms that may be
	used to redistribute data within a b-tree structure after an insert or
        delete operation that causes a b-tree node to become overfull or underfull.
        See section <cite>balance_siblings</cite> for details.
      }]
      [Glossary "B-Tree Cursor" {
        <span class=todo>Define this.
      }]
      [Glossary "B-Tree Database Connection" {
        A B-Tree database connection is a single client connection to an in-memory 
        page cache, through which a single temporary or persistent database may
        be accessed. This term is used throughout this document to avoid confusing
	such connections with SQL level SQLite client connections, which are
        sometime simply termed "database connections".
      }]
      [Glossary "Lazy-write cache" {
        <span class=todo>Define this.
      }]
      [Glossary "Overflow Cell" {
        <span class=todo>Define this.
      }]
      [Glossary "Page cache" {
        <span class=todo>Define this.
      }]
      [Glossary "Persistent database" {
        <span class=todo>Define this.
      }]
      [Glossary "Read-through cache" {
        <span class=todo>Define this.
      }]
      [Glossary "Shared-cache mode" {
        <span class=todo>Define this.
      }]
      [Glossary "SQLite Error Code" {
        <span class=todo>Define this.
      }]
      [Glossary "Temporary database" {
        <span class=todo>Define this.
      }]
    </table>
   

  [h1 "Module Requirements"]

  <p>
    The SQLite B-Tree module, the software module described by this document,
    is designed to query and modify a database stored using the database image
    format described in <cite>ref_file_format</cite>. Database images may
    exist only in volatile main-memory (in-memory databases), or may be stored 
    persistently within the file-system (also described in
    <cite>ref_file_format</cite>). Or, a database image may be stored primarily
    in main-memory with the file-system used as secondary storage if the
    database image grows too large. Database images stored only in main-memory,
    and those stored primarily in main-memory with the file-system used only to
    provide secondary storage space are known collectively as temporary
    databases. Database images stored persistently in the file-system are termed
    persistent databases.

  <p>
    This module implements an in-memory page cache to manage database image
    content. The size of the pages managed by the cache are the same as the
    page-size of the database image. When operating on a persistent database, 
    the cache operates as a read-through, lazy-write cache. When committing a 
    database transaction, the user explicitly directs the cache to flush all 
    dirty pages through to persistent storage. A single in-memory page cache 
    used to access the content of a persistent database may support multiple
    logical client connections. <span class=todo>Some brief explanation of what
    this means. And maybe a pointer to the "Multi-User Database Requirements"
    section.</span>

  <p>
    When operating on a temporary database, there may only be one client for
    each page cache. Depending on the SQLite configuration, either the database
    or journal file, or both, may be omitted from the system.

      [Figure btreemodule_overview.svg figure_overview "Role of Page Cache"]

  <p>
    Figure <cite>figure_overview</cite> depicts...

  <p><i>
    Roughly what is encapsulated by the module.
  </i>

    [h2 "Functional Requirements"]

    <p>
      This section contains requirements describing the functionality required 
      from the B-Tree module.

    <p class=todo>
      Figure out where integrity-check goes.

    [h3 "Opening and Closing Connections"]
  
    <p>
      The B-Tree module provides an interface to open new b-tree database connections.

      [fancyformat_import_requirement H50010]
      [fancyformat_import_requirement H50020]
      [fancyformat_import_requirement H50030]
      [fancyformat_import_requirement H50040]
      [fancyformat_import_requirement H50050]

    <p>
      The B-Tree module also provides an interface to close existing b-tree database 
      connections.

      [fancyformat_import_requirement H50060]
      [fancyformat_import_requirement H50070]

    [h3 "New Database Image Configuration"]

    <p>
      The following requirements describe database configuration options that
      are only applicable to new database images. For the purposes of the
      following requirements, a "new database image" is defined as one that is
      zero pages in size.

      [fancyformat_import_requirement H50080]
      [fancyformat_import_requirement H50090]

    [h3 "Transaction and Savepoint Functions" hlr_transactions]

     <p class=todo>
       This needs a lot of work...

    <p>
      All read and write operations performed on a database image via the
      B-Tree module interfaces occur within the context of a read or write
      transaction. <span class=todo>Something about the ACID nature of
      transactions and how this applies to read and write transactions</span>)

      [fancyformat_import_requirement H50100]
      [fancyformat_import_requirement H50101]

    <p>
      Read/write:

      [fancyformat_import_requirement H50102]
      [fancyformat_import_requirement H50103]
      [fancyformat_import_requirement H50104]

    <p class=todo>
      Multi-file transaction support.

    <p>
      Transaction state query:
      [fancyformat_import_requirement H50108]

    <p>
      Savepoints:

    <p class=todo>
      Define "savepoint transactions" and fix the following requirements.

      [fancyformat_import_requirement H50105]
      [fancyformat_import_requirement H50106]
      [fancyformat_import_requirement H50107]

    [h3 "Reading From the Database Image" hlr_reading_data]

    <p>
      The B-Tree module allows the user to read a subset of the fields from the
      database image header. Each such field is stored in the header as a 4-byte 
      unsigned big-endian integer. A complete description of each field and its 
      interpretation may be found in <cite>ref_file_format</cite>. 

      [fancyformat_import_requirement H50109]

    <p>
      In other words, the database image header fields that may be read via
      this module are:

    <ul>
      <li> The number of free pages in the database image,
      <li> The database image schema version (schema cookie).
      <li> The database image schema layer file-format.
      <li> The default page-cache size.
      <li> The "auto-vacuum last root-page" field.
      <li> The database image text-encoding field.
      <li> The database image user-cookie value.
      <li> The database image incremental-vacuum flag.
    </ul>

    <p>
      With the exception of the database image header fields described above,
      all data is read from the database image using B-Tree cursors. A B-Tree
      cursor is a control structure for traversing the contents of a single
      table or index b-tree structure within a database image. As well as
      "forward" and "back" operations, a B-Tree cursor supports fast seeking to
      a table entry identified by key value, or to the first or last entry in
      the table.

    <p>
      When a B-Tree cursor is created, the specific table or index b-tree that
      it is used to traverse is identified by the database image page number 
      of its root page. Since the root-page of the schema table is always page
      1, and the contents of the schema table includes the root page numbers of all
      other index and table b-tree structures in the database image, it is
      possible for the application to determine the set of valid root-page
      numbers by first traversing the schema table.

      [fancyformat_import_requirement H50110]
      [fancyformat_import_requirement H50111]
      [fancyformat_import_requirement H50112]
      [fancyformat_import_requirement H50113]
      [fancyformat_import_requirement H50114]
      [fancyformat_import_requirement H50115]
      [fancyformat_import_requirement H50116]
      [fancyformat_import_requirement H50117]
      [fancyformat_import_requirement H50118]

    <p>
      As well as traversing a b-tree structure using the operations enumerated
      by the above requirements, it is also possible to use a cursor to search
      a b-tree structure for a specified key value. If the key value can be
      found, the cursor is left pointing at the entry with the specified key
      value. Otherwise, the cursor is left pointing at either the entry with the 
      largest key that is smaller than the specified key, or to the entry with 
      the smallest key that is larger than the specified key. For table b-tree
      structures, where the key values are 64-bit integers, the definition of
      smaller, larger and equal to is straightforward. For index b-tree
      structures, where the key values are database records, the manner in
      which key values must be compared is more complicated. Refer to
      <cite>ref_file_format</cite> for a full explanation.

    <p class=todo>
      There is a specific section in <cite>ref_file_format</cite> devoted to
      record sort order in index b-tree structures. There needs to be some way to
      point to it. Or, better, to the requirement or range of requirements.

    <p class=todo>
      Maybe a system that automatically links text like H30100 to the
      corresponding requirement. Within a document if it can find it, or a
      summary page (hlreq.html for example).


      [fancyformat_import_requirement H50119]
      [fancyformat_import_requirement H50120]
      [fancyformat_import_requirement H50121]

    <p class=todo>
      Does it depend on the structure of the tree whether the cursor is left
      pointing to a smaller or larger entry after a failed search? Or is it
      possible to determine which it will be based only on the set of keys
      stored in the tree?

    <p>
      As well as the standard search operation described by the above
      requirements, cursors open on index b-tree structures are required to
      support several variants, as follows:

    <ul>
      <li> <b>Ignore rowid search mode</b>. The final value in a database
	   record used as an index-btree key is always an integer "rowid"
	   field. A search in this mode proceeds as if each key in the b-tree
           was missing this field.

      <li> <b>Increment key mode</b>.
      <li> <b>Prefix match mode</b>.
      <li> <b>Prefix search mode</b>.
    </ul>

    <p class=todo>
      Finish the bullet points above and add HLR for each search mode.

    <p>
      More than one cursor can be open on a single b-tree structure at one time.
      It is also possible for a write-cursor to modify the contents of a b-tree
      structure while other cursors are open on it. The b-tree module does not
      include any type of row-locking mechanism. It is possible for a write-cursor
      to be used to delete an entry from a b-tree structure even if there are
      one or more other cursors currently pointing to the entry being deleted.

    <p class=todo>
      Requirements to do with how the above is handled. Traceability to 
      sqlite3BtreeCursorHasMoved is required.

    [h3 "Writing to the Database Image"]

    <p>
      The B-Tree module allows the user to write values to a subset of the
      fields from the database image header. The set of writable fields is
      the same as the set of fields enumerated in section
      <cite>hlr_reading_data</cite> that the B-Tree module is required to
      provide read access to by requirement H50109.

      [fancyformat_import_requirement H50122]

    <p>
      The B-Tree module also supports operations to create new b-tree 
      structures within the database image. Existing b-tree structures may be
      deleted from the database image entirely, or their entire contents may be
      deleted, leaving an empty b-tree structure.

      [fancyformat_import_requirement H50123]
      [fancyformat_import_requirement H50124]
      [fancyformat_import_requirement H50125]

    <p>
      As one would expect, the B-Tree module also provides an interface to
      insert and delete entries from b-tree structures. These operations are
      performed using a B-Tree write cursor, a special type of B-Tree cursor
      (see section <cite>hlr_reading_data</cite>).

      [fancyformat_import_requirement H50126]
      [fancyformat_import_requirement H50127]
      [fancyformat_import_requirement H50128]

    <p class=todo>
      Incremental vacuum step.

    [h3 "Page-Cache Configuration Requirements"]

    <p>
      A page-cache has a number of operational parameters that may be configured
      at run-time via an open b-tree database connection. Note that even though the
      interfaces provided by this module allow these parameters to be set via a
      b-tree database connection, they are properties of the page-cache, not
      the b-tree database connection. In situations where more than one b-tree
      database connection is connected to a single page-cache, writes made via
      one b-tree database connection may overwrite the values set by another.
      The following table summarizes the available configuration parameters.

    [Table]
      [Tr] <th>Parameter <th>Description  <th>Requirements
      [Tr] <td>Locking-mode 
           <td><span class=todo>This!</span> 
           <td>H50138, H50139, H50140
      [Tr] <td>Journal-mode 
           <td><span class=todo>This!</span> 
           <td>H50141, H50142, H50143, H50144, H50145, H50146
      [Tr] <td>Journal-file size limit
	   <td>The journal-file size limit parameter may be set to any integer
	       value within the range of a 64-bit signed integer. Any negative
	       values is interpreted as "no limit". Otherwise, if the
	       journal-file size limit is set to zero or a positive number, it
	       represents an upper limit on the size of the journal file in
	       bytes. If the application executes a database write operation that
	       would normally cause the journal file to grow larger than this
	       configured limit, the operation fails and an error is returned
	       to the user. The default value of this parameter is -1 (no
               limit).
           <td>H50147, H50148, H50149
      [Tr] <td style="white-space:nowrap">Database-file size limit
	   <td>The database-image size limit parameter may be set to any integer
	       value greater than zero within the range of a 32-bit signed
               integer. The configured value represents an upper limit on the size of
	       the database image in pages. If the application executes a
               database write operation that would normally cause the database image to
	       grow larger than this configured limit, the operation fails and
               an error is returned to the user.
           <td>H50150, H50151, H50152
      [Tr] <td>Cache size
	   <td>The cache-size parameter may be set to any integer value. How it
               affects operation depends on the specific P-Cache implementation used
	       by the page-cache. <span class=todo>Refer to details for the
               behaviour of the built-in default P-Cache.</span>
           <td>H50153
      [Tr] <td>Safety level
	   <td>The safety-level parameter may be set to "none", "normal" or "full".
               <span class=todo> Where will the effect of this defined/required?</span>
           <td>H50154, H50155
    </table>

      [fancyformat_import_requirement H50138]
      [fancyformat_import_requirement H50139]
      [fancyformat_import_requirement H50140]

    <p class=todo>
      And if a read/write transaction is downgraded to a read-only transaction?
      This scenario should also be dealt with in section <cite>hlr_transactions</cite>.

      [fancyformat_import_requirement H50141]
      [fancyformat_import_requirement H50142]
      [fancyformat_import_requirement H50143]
      [fancyformat_import_requirement H50144]
      [fancyformat_import_requirement H50145]
      [fancyformat_import_requirement H50146]

    <p class=todo>
      The difference in functionality provided by "off", "memory" and the 3
      modes that use a real journal file should also feature in
      <cite>hlr_transactions</cite>.

      [fancyformat_import_requirement H50147]
      [fancyformat_import_requirement H50148]
      [fancyformat_import_requirement H50149]

      [fancyformat_import_requirement H50150]
      [fancyformat_import_requirement H50151]
      [fancyformat_import_requirement H50152]

      [fancyformat_import_requirement H50153]

    <p>
      See section <cite>hlr_memory</cite> for a description of and requirements
      specifying how the value of the cache-size parameter affects the
      operation of a page-cache. <span class=todo>Check this reference is
      relevant after it is written. Refer to a specific requirement if possible
      too.</span>

      [fancyformat_import_requirement H50154]
      [fancyformat_import_requirement H50155]

    <p class=todo>
      Description of what the safety-level actually does. Or pointer to where a
      description and requirements can be found (transactions section?).

    <p class=todo>
      Interface to set the codec function (encryption).

    <p class=todo>
      The busy-handler. Where exactly does this come in? Transactions and
      savepoints section?

    <p>
      The six page-cache operational parameters listed above may also be
      queried. The following requirements specify the required query
      interfaces.

      [fancyformat_import_requirement H50132]
      [fancyformat_import_requirement H50133]
      [fancyformat_import_requirement H50134]
      [fancyformat_import_requirement H50135]
      [fancyformat_import_requirement H50136]
      [fancyformat_import_requirement H50137]

    <p>
      It is also possible to interrogate a b-tree database handle to determine
      if it was opened on a temporary or persistent database. An b-tree
      database handle opened on a persistent database may be queried for the
      name of (full-path to) either the database or journal file associated
      with the open database.

      [fancyformat_import_requirement H50131]
      [fancyformat_import_requirement H50129]
      [fancyformat_import_requirement H50130]

    [h3 "Multi-User Database Requirements"]

      [fancyformat_import_requirement H50156]

  <ul>
    <li> Lock on schema memory object.
    <li> Locks on b-tree tables.
    <li> "Unlock notify" feature.
    <li> Mutexes/thread-safety features.
  </ul>

    <p class=todo>
      The b-tree module preventing deadlock (by always grabbing mutexes in
      order of BtShared pointer) should be required here.

    [h3 "Backup/Vacuum API Requirements"]
  <ul>
    <li> Callbacks for backup module.
    <li> Page read/write APIs for backup module.
  </ul>

    [h3 "Integrity Check Requirements"]
  <ul>
    <li> Callbacks for backup module.
    <li> Page read/write APIs for backup module.
  </ul>

  [h2 "Other Requirements and Constraints"]

    [h3 "Caching and Memory Management Requirements" hlr_memory]
  <ul>
    <li> Memory allocation related features (pcache, scratch memory, other...).
    <li> Default pcache implementation (sqlite3_release_memory()).
    <li> Schema memory object allocation (destructor registration).
  </ul>

    [h3 "Exception Handling Requirements"]

  <p>
    System failure. Do not corrupt the database image.
  <p>
    Three kinds of exception:
  <ul>
    <li> IO Error.
    <li> Malloc request failure.
    <li> Database image corruption.
  </ul>

    [h3 "Well-Formedness Requirements"]
  <ul>
    <li> Identify the subset of file-format well-formedness requirements that
         this module is responsible for implementing.
    <li> Define how the module should respond to corrupt database files: don't
         crash, return SQLITE_CORRUPT as early as is practical. Should it also
         put the b-tree into a permanent error state?
  </ul>
 
 


  [h1 "Module API"]

      <p class=todo>
        Description of the interface in btree.h. Also other interfaces accessed by
        external modules. Including release_memory() and those pager interfaces that
        are accessed directly by other modules. All of these requirements will be
        descended/derived from requirements in the previous sections. Some of the
        text could/should be pulled in from btree.h.

      <p class=todo>
	  The name of sqlite3BtreeBeginStmt() should probably change to
	  sqlite3BtreeOpenSavepoint(). Matches the pager layer and is a more
	  accurate description of the function.

      <p class=todo>
        There are only a few places in which the pager object is used directly,
        always to call some trivial get/set configuration function. These should
	  be replaced somehow with sqlite3BtreeXXX() APIs. Also, the current
	  approach is probably Ok, but worth looking it over for thread-safety
        issues.

      <p class=todo>
	  It would be easier to write up if the dependency between the B-Tree
        layer and the sqlite3 structure did not exist. At present, it is used for:
        
          <br> * The unlock-notify feature (arguments to sqlite3ConnectionBlocked() are database handles),
          <br> * Accessing the SQLITE_ReadUncommitted flag,
          <br> * Invoking the busy-handler callback,
          <br> * During sqlite3BtreeOpen(), to find the VFS to use,
          <br> * Accessing the SQLITE_SharedCache flag (for setting it),
          <br> * To check the same B-Tree is not attached more than once in shared-cache mode,
          <br> * To link the B-Tree into the pointer-order list of shared-cache b-trees used by the same handle (used for mutexes).
          <br> * To determine if an in-memory sub-journal should be used.
          <br> * To know how many savepoints are open in BtreeBeginTrans().
          <br> * Many, many times to assert() that the db mutex is held when the b-tree layer is accessed..


    [h2 "Opening and Closing Connections"]

    <p>
      This section describes the API offered by the B-Tree module to other
      SQLite sub-systems to open and close B-Tree database connections.

      [btree_api_defn Btree]

    <p>
      A B-Tree database connection is accessed by other SQLite sub-systems
      using an opaque handle, modelled in C code using the type "Btree*".

    [h3 sqlite3BtreeOpen]

      [btree_api_defn sqlite3BtreeOpen] 

      [fancyformat_import_requirement H51001]
      [fancyformat_import_requirement H51002]

      [fancyformat_import_requirement H51003]
      [fancyformat_import_requirement H51004]

    <p>
      The combination of the above two requirements implies that if the
      zFilename argument passed to sqlite3BtreeOpen is other than a NULL
      pointer or a pointer to a nul-terminated string, the type of or
      filename of the database that sqlite3BtreeOpen attempts to open a
      connection to are undefined.

    <p>
      Valid values for the <i>flags</i> argument to the sqlite3BtreeOpen
      function consist of the bitwise OR of zero or more of the following
      symbols.

      [btree_api_defn BTREE_OMIT_JOURNAL BTREE_NO_READLOCK]

      [fancyformat_import_requirement H51005]

    <p>
      When opening a connection to a persistent database, the value of the
      BTREE_OMIT_JOURNAL bit in the flags parameter is ignored by
      sqlite3BtreeOpen.

      [fancyformat_import_requirement H51006]

    <p>
      When opening a connection to a temporary database, the value of the
      BTREE_NO_READLOCK bit in the flags parameter is ignored, as temporary
      databases are never locked for either reading or writing 
      (<span class=todo>reference to some requirement for this statement.</span>).
      Whether or not a new page-cache is created when a connection to a
      persistent database is opened is governed by requirements H50040 and
      H50050.

      [fancyformat_import_requirement H51007]
      [fancyformat_import_requirement H51008]

    <p class=todo>
      Requirements explaining how the db parameter to sqlite3BtreeOpen is used. Must be there for something.

    [h3 sqlite3BtreeClose]

      [btree_api_defn sqlite3BtreeClose]

      [fancyformat_import_requirement H51009]

    <p>
      If a call to sqlite3BtreeClose is made with a value that is not a valid
      b-tree database connection handle passed as the only argument, the
      results are undefined.

      [fancyformat_import_requirement H51010]

    <p>
      See also requirement H50070.


    [h2 "Database Image Configuration"]

      <p class=todo>
	This category doesn't work all that well. These APIs are used for other
        things too (i.e. switching to incremental-vacuum mode).

      [btree_api_defn BTREE_AUTOVACUUM_NONE BTREE_AUTOVACUUM_FULL BTREE_AUTOVACUUM_INCR]
      [btree_api_defn sqlite3BtreeSetAutoVacuum sqlite3BtreeSetPageSize]

    <p>
      Queries:

      [btree_api_defn sqlite3BtreeGetPageSize]
      [btree_api_defn sqlite3BtreeGetReserve]
      [btree_api_defn sqlite3BtreeGetAutoVacuum]

    [h2 "Connection Configuration"]

      [btree_api_defn sqlite3BtreeSetCacheSize sqlite3BtreeSetSafetyLevel sqlite3BtreeMaxPageCount ]

    <p>
      And functions to query the current configuration:

      [btree_api_defn sqlite3BtreeSyncDisabled]

    [h2 "Query Interfaces"]

      [btree_api_defn sqlite3BtreeGetFilename]

      [fancyformat_import_requirement H51011]
      [fancyformat_import_requirement H51012]

      [btree_api_defn sqlite3BtreeGetJournalname]

      [fancyformat_import_requirement H51013]
      [fancyformat_import_requirement H51014]

    <p>
      Requirement H51013 holds true even if the B-Tree database connection is
      configured to use an in-memory journal file or no journal file at all
      (<span class=todo>ref requirements</span>). In these cases the buffer returned
      contains the full-path of the journal file that would be used if the
      connection were configured to use a journal file.

    [h2 "Mutex Functions"]

      [btree_api_defn {typedef struct BtreeMutexArray} {struct BtreeMutexArray \{}]

      [btree_api_defn sqlite3BtreeEnter sqlite3BtreeEnterAll sqlite3BtreeLeave sqlite3BtreeEnterCursor sqlite3BtreeLeaveCursor sqlite3BtreeLeaveAll sqlite3BtreeMutexArrayEnter sqlite3BtreeMutexArrayLeave sqlite3BtreeMutexArrayInsert]

    [h2 "Transaction and Savepoint API"]

      [btree_api_defn sqlite3BtreeBeginTrans sqlite3BtreeCommitPhaseOne sqlite3BtreeCommitPhaseTwo sqlite3BtreeCommit sqlite3BtreeRollback]

      [btree_api_defn sqlite3BtreeBeginStmt sqlite3BtreeSavepoint]

      [btree_api_defn sqlite3BtreeIsInTrans sqlite3BtreeIsInReadTrans sqlite3BtreeIsInBackup]


    [h2 "Reading and Traversing the Database Image"]

      <p class=todo>
	sqlite3BtreeMoveto is never called from outside of the b-tree layer. It
        could/should be removed from the API.

      <p class=todo>
	The "bias" argument to sqlite3BtreeMovetoUnpacked is only ever true
        when it is called from within sqlite3BtreeInsert. This argument could/should 
        also be removed from the API, if only to make it simpler to describe.

      [btree_api_defn BtCursor]

      [btree_api_defn sqlite3BtreeCursor sqlite3BtreeCursorSize]
      [btree_api_defn sqlite3BtreeCloseCursor sqlite3BtreeClearCursor]

      [btree_api_defn sqlite3BtreeFirst sqlite3BtreeLast sqlite3BtreeNext \
                      sqlite3BtreePrevious sqlite3BtreeEof]

      [btree_api_defn sqlite3BtreeKeySize sqlite3BtreeKey sqlite3BtreeKeyFetch \
                      sqlite3BtreeDataFetch sqlite3BtreeDataSize sqlite3BtreeData]

      [btree_api_defn sqlite3BtreeCount]

    [h3 sqlite3BtreeMovetoUnpacked]

      <p>
        The sqlite3BtreeMovetoUnpacked function is used to 

      [btree_api_defn sqlite3BtreeMovetoUnpacked]

      <p>
        The following requirements specify exactly how a b-tree cursor is to be moved
        by a successful call to sqlite3BtreeMovetoUnpacked.

      [fancyformat_import_requirement L50008]
      [fancyformat_import_requirement L50009]
      [fancyformat_import_requirement L50010]
      [fancyformat_import_requirement L50011]


      <p class=todo>
	Not clear how to deal with these. Define an external module to
	encapsulate these and define sorting order etc.? That's tricky as things are
	because the UnpackedRecord.flags field defines the "search mode" used
        by sqlite3BtreeMovetoUnpacked.

      [sqliteint_api_defn {typedef struct KeyInfo} {struct KeyInfo \{}]
      [sqliteint_api_defn {typedef struct UnpackedRecord} {struct UnpackedRecord \{}]

    [h3 sqlite3BtreeGetMeta]

      <p>
	The sqlite3BtreeGetMeta interface may be used to retrieve the current
        value of certain fields from the database image header.

      [btree_api_defn sqlite3BtreeGetMeta]

      [fancyformat_import_requirement H51015]
      [fancyformat_import_requirement H51016]

      <p>
        The two requirements above imply that if sqlite3BtreeGetMeta is called with
        anything other than a b-tree database connection handle with an open read-only
        or read-write transaction as the first argument, or with anything other than
        an integer between 0 and 7 (inclusive) as the second, the results are undefined.

      [btree_api_defn BTREE_FREE_PAGE_COUNT BTREE_SCHEMA_VERSION BTREE_FILE_FORMAT \
                      BTREE_DEFAULT_CACHE_SIZE BTREE_LARGEST_ROOT_PAGE BTREE_TEXT_ENCODING \
                      BTREE_USER_VERSION BTREE_INCR_VACUUM]

      [fancyformat_import_requirement H51017]




    [h2 "Modifying the Database Image"]

      [h3 sqlite3BtreeCreateTable sqlite3BtreeCreateTable]
      [btree_api_defn sqlite3BtreeCreateTable]
      [btree_api_defn BTREE_INTKEY BTREE_ZERODATA BTREE_LEAFDATA]

      [h3 sqlite3BtreeDropTable sqlite3BtreeDropTable]
      [btree_api_defn sqlite3BtreeDropTable ]

      [h3 sqlite3BtreeClearTable sqlite3BtreeClearTable]
      [btree_api_defn sqlite3BtreeClearTable]

      [h3 sqlite3BtreeCursorHasMoved sqlite3BtreeCursorHasMoved]
      [btree_api_defn sqlite3BtreeCursorHasMoved]

      [h3 sqlite3BtreePutData  sqlite3BtreePutData]
      [btree_api_defn sqlite3BtreePutData]

      [h3 sqlite3BtreeUpdateMeta sqlite3BtreeUpdateMeta]
        [btree_api_defn sqlite3BtreeUpdateMeta]

      [h3 sqlite3BtreeDelete sqlite3BtreeDelete]

        [btree_api_defn sqlite3BtreeDelete]

        [fancyformat_import_requirement L50013]

      <p class=todo>
        Effect of a delete operation on other cursors that are pointing to the
        deleted b-tree entry.

      <p class=todo>
        Malloc and IO error handling. Same as for sqlite3BtreeInsert.

      [h3 sqlite3BtreeInsert]

        [btree_api_defn sqlite3BtreeInsert]

        [fancyformat_import_requirement L50001]

      <p>
        The requirement above implies that the results of passing anything else as 
        the first argument to sqlite3BtreeInsert, for example a read-only b-tree cursor,
        are undefined.

        [fancyformat_import_requirement L50012]

      <p>
	In other words, the sqlite3BtreeInsert API could easily be renamed
        sqlite3BtreeInsertOrReplace. <span class=todo>We will probably need a module
        requirement for the "replace" operation.</span>

        [fancyformat_import_requirement L50002]
        [fancyformat_import_requirement L50003]
        [fancyformat_import_requirement L50004]

      <p>
        The following requirements describe the seventh and eighth paramaters passed
        to the sqlite3BtreeInsert function. Both of these are used to provide extra
        information used by sqlite3BtreeInsert to optimize the insert operation. They
        may be safely ignored by alternative b-tree implementations.

      <p class=todo>
        There should be some rationalization for these, eventually. Some traceability
        from somewhere to show how the b-tree module offering these slightly esoteric
        interfaces is helpful to SQLite overall.

        [fancyformat_import_requirement L50005]
        [fancyformat_import_requirement L50006]

      <p>
        If a non-zero value is passed as the eighth parameter to sqlite3BtreeInsert 
        and the b-tree cursor has not been positioned as assumed by L50006, the
        results are undefined.

      <p class=todo>
        Malloc and IO error handling. Maybe these should be grouped together
        for a whole bunch of APIs. And hook into the above via a definition of
        "successful call".

      [h3 sqlite3BtreeIncrVacuum sqlite3BtreeIncrVacuum]
      [btree_api_defn sqlite3BtreeIncrVacuum]

    [h2 "Advisory B-Tree Locks"] 

      <p>
	This section describes the b-tree module interfaces used for acquiring
	and querying the advisory locks that can be placed on database image
	pages. The locking mechanisms described in this section are only used
	to arbitrate between multiple clients of the same in-memory page-cache.
	The locking mechanism used to control access to a file-system
	representation of the database when multiple in-memory page caches
	(possibly located in different OS processes) are open on it is
        described in <span class=todo>this</span>.

      <p>
        As well as obtaining advisory locks explicitly using the 
        sqlite3BtreeLockTable API (see below), a read-lock on page 1 of the
        database image is automatically obtained whenever a b-tree database 
	connection opens a read-only or read-write transaction (see 
        <span class=todo>requirement number</span>). Note that this means
        that a write-lock on page 1 is effectively an exclusive lock on
	the entire page-cache, as it prevents any other connection from opening
        a transaction of any kind.

      [h3 sqlite3BtreeLockTable]

      [btree_api_defn sqlite3BtreeLockTable]

      <p>
        The sqlite3BtreeLockTable API allows database clients to place 
        advisory read or write locks on a specified page of the database 
        image. The specified page need not exist within the database image.
        By convention, SQLite acquires read and write locks on the root
        pages of table b-trees only, but this is not required to be enforced
        by the b-tree module. Locks may only be obtained when a database
        client has an open transaction. All locks are automatically released
        when the open transaction is concluded.

        [fancyformat_import_requirement L50016]
        [fancyformat_import_requirement L50017]

      <p>
        The two requirements above imply that the results of calling 
        sqlite3BtreeLockTable on a b-tree database connection handle that does
        not currently have an open transaction, or attempting to obtain
        a write-lock using a b-tree database connection handle that only has
        a read-only transaction open are undefined.


        [fancyformat_import_requirement L50019]
        [fancyformat_import_requirement L50020]

      <p>
        Requirement L50020 is overly conservative. Because a write-lock may 
        only be requested if the b-tree database connection has an open read-write 
	transaction (L50017), and at most a single b-tree database connection
        may have such an open transaction at one time, it is not possible for
        a request for a write-lock to fail because another connection is holding
        a write-lock on the same b-tree database image page. It may, however,
        fail because another connection is holding a read-lock.

      <p>
        All locks are held until the current transaction is concluded. Or, if
        a read-write transaction is downgraded to a read-only transaction, all
        currently held write-locks are changed to read-locks.

        [fancyformat_import_requirement L50018]
        [fancyformat_import_requirement L50021]

      <p class=todo> The requirement above should refer to some other
	requirement that defines when a read-write transaction is downgraded to
        a read-only transaction.

      <p class=todo> Malloc failure?

      <p class=todo> Read uncommitted flag. Maybe this should be handled
        outside of the b-tree module. Is there anything to stop connections
        with this flag set simply not obtaining read locks? There are assert()
        statements in the b-tree module that need to take this flag into account,
        but not actual functionality.

      [h3 sqlite3BtreeSchemaLocked]
      [btree_api_defn sqlite3BtreeSchemaLocked]

        [fancyformat_import_requirement L50014]
        [fancyformat_import_requirement L50015]

    [h2 "What do these do?"]


      <p class=todo>
        Where do the following go?

      [btree_api_defn sqlite3BtreeIntegrityCheck sqlite3BtreePager sqlite3BtreeCopyFile]

      [btree_api_defn sqlite3BtreeSchema sqlite3BtreeSchemaLocked sqlite3BtreeLockTable sqlite3BtreeTripAllCursors]

      <p class=todo>
        I know what the following do, but is this mechanism ever used? Or has it been superceded by other tricks in OP_NewRowid?

      [btree_api_defn sqlite3BtreeSetCachedRowid sqlite3BtreeGetCachedRowid]

      <p class=todo>
        Should move to btreeInt.h

      [btree_api_defn BtShared]

    [h2 "APIs not branded sqlite3BtreeXXX()"]

<ul>
<li> sqlite3PagerLockingMode
<li> sqlite3PagerJournalMode
<li> sqlite3PagerIsMemdb (vacuum and backup).
<li> sqlite3PagerJournalSizeLimit
<li> sqlite3PagerFile (used by sqlite3_file_control() and pragma lock_proxy_file).
<li> sqlite3PagerPagecount (pragma page_count and backup).
<li> Page APIs used by backup routines:
  <ul>
    <li> sqlite3PagerGet
    <li> sqlite3PagerWrite
    <li> sqlite3PagerGetData
    <li> sqlite3PagerGetExtra
    <li> sqlite3PagerUnref
    <li> sqlite3PagerTruncateImage
    <li> sqlite3PagerSync
    <li> sqlite3PagerFile
    <li> sqlite3PagerCommitPhaseOne, sqlite3PagerCommitPhaseTwo
    <li> sqlite3PagerBackupPtr
  </ul>
</ul>


  [h1 "Module Implementation"]

  [h2 "Database Image Traversal"] 

  [h2 "Database Image Manipulation"]

     <p class=todo>
       This section should describe exactly how bits and bytes are shifted
       around when the database image is traversed and modified. i.e. how the b-tree 
       balancing works, deleting an internal cell from an index b-tree etc.

    [h3 "Creating a B-Tree Structure"]
    [h3 "Clearing a B-Tree Structure"]
    [h3 "Deleting a B-Tree Structure"]

    [h3 "Inserting, Replacing and Deleting B-Tree Entries"]

      <p>
        The following two sections describe the way entries are added and removed
        from B-Tree structures within a database image. 

      <p>
        As one might expect, the algorithms described in the following sections
        involve adding and removing b-tree cells to and from b-tree node pages.
        The format of b-tree node pages is described in detail in 
        <cite>ref_file_format</cite>. This document does not describe the exact
        way in which content is manipulated within a page, as these details are
        considered not considered high-level enough to be documented outside of
        the SQLite source code itself. For the purposes of the descriptions in
        the following sections, a b-tree node page is considered to be a container
        for an ordered list of b-tree cells. Cells may be inserted into or removed
        from any position in the ordered list as required.

      <p>
	A b-tree node page has a finite capacity. If one of the algorithms
	described here is required to insert a cell into a b-tree node page,
	and there is not enough free space within the page to accommodate the
	cell, it is still nominally inserted into the requested position within
        the node, but becomes an overflow cell. Overflow cells never remain so
        for very long. If an insert, replace or delete entry operation creates
        one or more overflow cells, the b-tree structure is rearranged so that
        all cells are stored within the body of a b-tree node page before the
        operation is considered complete. This process of rearranging the b-tree
        structure is termed b-tree balancing, and is described in section 
        <cite>btree_balancing_algorithm</cite>.
        

    [h4 "B-Tree Insert/Replace Entry"]

      <p>
        This section describes the way in which new entries may be inserted 
        into a b-tree structure, and how existing entries may be replaced. Both
        of these operations are accessed using the sqlite3BtreeInsert API.

      <p>
        An insert/replace operation involves the following steps:

      <ol>
        <li> Based on the supplied key and value, and the type of b-tree being
             inserted into, allocate and populate any required overflow pages.
             <span class=todo>Should reference file-format requirements that
             provide the formula for doing this.</span>

        <li> Attempt to move the b-tree write cursor to an entry with a key
             that matches the new key being inserted. If a matching entry is 
             found, then the operation is a replace. Otherwise, if the key is
             not found, an insert.

        <ol type="a">
          <li> Requirements L50008, L50009, L50010 and L50011 apply to the cursor
               seek operation here. This ensures that if the search does not find
	       an exact match, the cursor is left pointing to the leaf page that 
               the new entry should be added into.

          <li> As specified by L50006, the cursor may already be positioned. In 
               this case the seek operation is not required.
        </ol>

        <li> If a matching key was found in the b-tree, then it must be removed and
             the new entry added in its place.

        <ol type="a">
          <li> If there are one or more overflow pages associated with the entry
               being replaced, they are moved to the free-list.
          <li> The cell corresponding to the entry being removed is removed from
               the b-tree node page.
          <li> The new cell is inserted in the position previously occupied by the
               cell removed in the previous step. If the page is not a leaf page,
	       then the first four-bytes (the child-page pointer) of the old
               cell are copied to the first four bytes of the new cell. If the new
               cell is larger than the cell that it replaced, then it may become
               an overflow cell.
        </ol>

        <li> If no matching key was found in the b-tree, then the new cell is inserted
             into the leaf page that the cursor was left pointing to by step 1. The
             new cell may become an overflow cell.

        <li> If the new cell is now an overflow cell, then the balancing algorithm 
             (see section <cite>btree_balancing_algorithm</cite>) is run on the 
             overflowing b-tree node page.
      </ol>

    [h4 "B-Tree Delete Entry"]

      <p>
        This section describes the way in which entries may be removed from
	a b-tree structure, as required when the sqlite3BtreeDelete (section
        <cite>sqlite3BtreeDelete</cite>) API is invoked. Removing an entry
        from a b-tree table involves the following steps:

      <ol>
        <li> All overflow pages in the overflow page chain (if any) associated
             with the entry must be moved to the database free-list. If the
             database image is an autovacuum database, the pointer-map entries
             that correspond to each overflow page in the chain must be updated.
             
        <li> The b-tree cell corresponding to the entry must be removed from
             the b-tree structure.
      </ol>

      <p class=todo>
	Note about the optimization that makes it possible to move overflow pages
        to the free-list without reading their contents (i.e. without loading them
        into the cache).

      <p>
        If the b-tree entry being removed is located on a leaf page (as is always the
        case with table b-tree structures), then deleting an entry from a b-tree
        is quite simple.

      [Figure btreemodule_delete1.svg figure_delete1 "Delete from an Internal Node"]

    [h3 "B-Tree Balancing Algorithm" btree_balancing_algorithm]

     <ul>
       <li><p>The <b>balance deeper</b> sub-algorithm is used when the root page of
           a b-tree is overfull. It creates a new page and copies the
           entire contents of the overfull root page to it. The root page
           is then zeroed and the new page installed as its only child.
           The balancing algorithm is then run on the new child page (in case
           it is overfull).

       <li><p>The <b>balance shallower</b> sub-algorithm is used when the root page 
           of a b-tree has only a single child page. If possible, the data from
           the child page is copied into the root-page and the child page discarded.

       <li><p>The <b>balance quick</b> sub-algorithm is used in a very specific,
           but common scenario. It is used only for table b-trees, when a new entry that
           has a key value greater than all existing keys in the b-tree is inserted and
           causes the right-most leaf page of the b-tree structure to become overfull.

       <li><p>The <b>balance siblings</b> sub-algorithm is run when a b-tree page that
           is not the root-page of its b-tree structure is either overfull or underfull.




     </ul>

    [h4 "Balance Deeper"]
      <ol>
        <li> Allocate a new page (the child-page).
        <li> Copy page data from root-page to child-page (including overflow cells).
        <li> Fix pointer map entries associated with new child-page content.
        <li> Zero the root-page.
        <li> Set the right-child pointer of the root-page to point to the new child-page.
        <li> Set the pointer map entry for the new child page.
        <li> Execute the balance procedure on the new child page.
      </ol>

      [Figure btreemodule_balance_deeper.svg figure_balance_deeper "Example Balance Deeper Transform"]

    [h4 "Balance Shallower"]

      <ol>
        <li> Copy node data from child-page to root-page.
        <li> Fix pointer map entries associated with new root-page content.
        <li> Move child-page to database image free-list.
      </ol>

      [Figure btreemodule_balance_shallower.svg figure_balance_shallower "Example Balance Shallower Transform"]

    [h4 "Balance Quick"]

      <ol>
        <li> Allocate a new page (the new sibling-page).

        <li> Populate the new sibling page with the new b-tree entry.

	<li> Add a new divider cell to the parent. The divider cell contains a
             pointer to the page that is currently the right-child of the parent.
             The key in the new divider cell is a copy of the largest key in the
             page that is currently the right-child of the parent.

        <li> Set the right-child of the parent page to point to the new sibling page.

	<li> If the database is an auto-vacuum database, set the pointer map
	     entry associated with the new sibling page. If the cell on the new
             sibling page contains a pointer to an overflow page, set the pointer map
	     entry associated with the overflow page.

        <li> Execute the balance procedure on the parent page.
      </ol>

      [Figure btreemodule_balance_quick.svg figure_balance_quick "Example Balance Quick Transform"]

    [h4 "Balance Siblings" balance_siblings]

      [fancyformat_import_requirement L51001]

    <p class=todo>
      The following description describes how balance() is to be implemented. This
      represents (I think) the lowest level of detail that should be in this document.
      One skilled in the art could use this description to reimplement SQLite's
      balance-siblings algorithm. We also need requirements at a higher level
      of detail in this section. Something to test!

    <p>
      The balance-siblings algorithm, as implemented by SQLite, is described as
      a series of steps below.  <span class=todo> there are a few terms used
      below that need definitions/clarifications.</span>

      <ol>
        <li> Determine the set of sibling pages to redistribute the cells of, using 
             the following rules:
        <ol type="a">
          <li> If the parent page has three or fewer child pages, then all child 
               pages are deemed to be sibling pages for the purposes of the balance-siblings
               algorithm.
	  <li> If the page being balanced is the left-most child of the parent
               page, then the three left-most child pages are used as the siblings.
	  <li> If the page being balanced is the right-most child of the parent
               page, then the three right-most child pages are used as the siblings.
	  <li> Otherwise, if none of the above three conditions are true, then the
               sibling pages are page being balanced and the child pages immediately
               to the left and right of it.
        </ol>

        <li> Determine an ordered list of cells to redistribute. There are several
             variations of this step, depending on the type of page being balanced.
        <ol type="a">
	  <li> If the page being balanced is a leaf page of a table b-tree,
	       then the list of cells to redistribute is simply the concatenation
               of the ordered lists of cells stored on each sibling page, in order
               from left-most sibling to right-most.
	  <li> If the page being balanced is a leaf page of an index b-tree, then 
               the list of cells to redistribute is comprised of the cells on each
               of the sibling pages and the divider cells in the parent page that 
               contain the pointers to each sibling page except the right-most. The
               list is arranged so that it contains: 
           <ul>
             <li> The cells from the left-most sibling page, in order, followed by
             <li> the divider cell from the parent page that contains the pointer 
		  to the left-most sibling (if there is more than one sibling
                  page), followed by
	     <li> the divider cell that contains the pointer to the second left-most
                  sibling and the cells from the remaining sibling page (if there are three
                  sibling pages).
           </ul>
	  <li> If the page being balanced is an internal b-tree node, then the list of
               cells to redistribute is determined as described in the previous case.
               However, when balancing an internal node each cell is associated with
               the page number of a child page of one of the sibling pages. The page 
               number associated with cells stored on a sibling page is the same as
               the page number stored as the first four bytes of the cell. The page
               number associated with a divider cell within the parent page is the page
               number of the right-child page of the sibling page to which the divider
               cell contains a pointer.
        </ol>

        <li> Determine the new cell distribution, using the following steps:
        <ol type="a">
          <li> Assign as may cells as will fit from the start of the ordered list of 
               cells to the left-most sibling page. Then, if any cells remain, assign
               one to be a divider cell, and as many as will fit to the next sibling 
	       page. Repeat until all cells have been assigned a location.
               <span class=todo> no divider cells for table b-tree leaf balance</span>

          <li> The previous step generates a distribution that is biased towards the
	       left-hand side. The right-most sibling may even be completely
	       empty (if the last cell in the ordered list was assigned to be a
               divider cell). To rectify this, cells are moved out of the second 
               right-most sibling page and into the right-most, one at a time, until
               there is at least one cell in the right-most sibling page and to move
               another cell would mean that the right-most sibling page is more full
               than the next to right-most sibling page. This is repeated for the next
               right-most pair of sibling pages, shifting cells out of the third 
               right-most sibling page and into the second right-most, and so on.
               <span class=todo> note about divider cells </span>
        </ol>

        <li> Determine the set of database pages to use as the new sibling pages. 

        <ol type="a">
	   <li> If there were an equal or greater number of siblings identified
                in step 1 than are required by the distribution calculated in step 3, 
                reuse as many as possible, starting with the left-most. If step 3
                calculated a distribution that requires more sibling pages than were
                identified in step 1, allocate the required extra pages using the
                <span class=todo>Refer to ???</span> algorithm.

	   <li> Arrange the new sibling pages from left to right in ascending
                page number order. The new sibling page with the smallest page number
                becomes the left-most sibling page, and so forth.
        </ol>

        <li> Populate the new sibling pages.
        <ol type="a">
	   <li> Populate each new sibling page with the required set of cells. If the
                page being balanced is not a leaf page, then the child-page pointer
                field of each cell is populated with the page-number associated with
                the cell as part of step 2 above. 

	   <li> If the page being balanced is not a leaf page, then the right-child 
                pointer stored in the page header of each new sibling page must also
                be populated. For each new sibling page except the right-most, this 
                field is set to the page number associated with the cell that 
                immediately follows the cells stored on the page (the cell that was
                assigned to be divider cell in step 3). For the right-most sibling page,
                the right-child pointer is set to the value that was stored in the
                right-child pointer of the right-most original sibling page identified
                in step 1.
        </ol>
        <li> Populate the parent page.
        <ol type="a">
	  <li> If the page being balanced is (was) not a leaf page of a table
	       b-tree, the cells that contained pointers to the old sibling
               pages are replaced by the cells designated as divider cells as part
               of step 3. The right-child pointer field of the first divider cell
               is overwritten with the page number of the first new sibling page, and 
               so on.

	  <li> If the page being balanced is (was) a leaf page of a table
	       b-tree, the cells that contained pointers to the old sibling
               pages are replaced by a divider cell associated with all but the
               right-most sibling page. The child-page number stored in each divider
               cell is set to the page number of the associated sibling. The integer key 
               value stored in each divider cell is a copy of the largest integer key
               value stored on the associated sibling page.

	  <li> Before balancing, the parent page contained a pointer to the right-most
               sibling page, either as part of a cell or as the right-child pointer
               stored in the page header. Either way, this value must be overwritten
               with the page number of the new right-most sibling page.
               
        </ol>

        <li> Populate pointer map entries.
        <ol type="a">
          <li> For each sibling page that was not also an original sibling page, the
               associated pointer-map entry must be updated. Similarly, the pointer-map
               entry associated with each original sibling page that is no longer a
               sibling page must be updated.
          <li> For each cell containing an overflow pointer that has been moved from one 
               page to another, the pointer-map entry associated with the overflow page 
               must be updated.
          <li> If the page being balanced is (was) not a leaf, then for each cell that
               has moved from one page to another the pointer-map entry associated with
               the cell's child page must be updated.
          <li> If the page being balanced is (was) not a leaf, then the pointer-map entry
               associated with each sibling's right-child page may need to be updated.
        </ol>
      </ol>

    [h3 "Page Allocation and Deallocation"]

     <p class=todo>
       Amongst other things, this section needs to explain our old pals the
       DontWrite() and DontRollback() optimizations.

    [h4 "Moving an overflow-chain to the free-list" free_overflow_chain]

     <p class=todo>
       Describe how this can sometimes be done without reading the content of
       overflow pages.

    [h3 "Incremental Vacuum Step"]




  [h2 "Transactions and Savepoints"]

     <p class=todo>
       Requirements surrounding how transactions are made atomic and isolated.
       Also how savepoints are implemented. What happens to active cursors after
       a rollback or savepoint-rollback.

  [h1 References]

  <table id="refs" style="width:auto; margin: 1em 5ex">
    [Ref 1 ref_file_format {
      SQLite Online Documentation,<u>SQLite Database File Format</u>,
      <a href="fileformat.html">http://www.sqlite.org/fileformat.html</a>.
    }]
    [Ref 2 ref_pcache_interface {
      SQLite Online Documentation,<u>Application Defined Page Cache</u>,
      <a href="c3ref/pcache_methods.html">http://www.sqlite.org/c3ref/pcache_methods.html</a>.
    }]
    [Ref 3 ref_os_interface {
      SQLite Online Documentation,<u>OS Interface Object</u>,
      <a href="c3ref/vfs.html">http://www.sqlite.org/c3ref/vfs.html</a>.
    }]

  </table>
}

</tcl>
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<
<