Documentation Source Text

Check-in [861079d8dc]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Move fileio.html and fileformat.html from the other repository to this one.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 861079d8dcd21b67c81d72f5f6e1903620892c31
User & Date: dan 2008-07-22 17:24:03
Context
2008-07-22
17:40
Change the requirement numbers used in fileformat.in and fileio.in to start with "H2". check-in: c1bf78dd4f user: dan tags: trunk
17:24
Move fileio.html and fileformat.html from the other repository to this one. check-in: 861079d8dc user: dan tags: trunk
17:22
(no comment) check-in: 906aef1f58 user: dan tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Added images/fileformat/db_connection.gif.

cannot compute difference between binary files

Added images/fileformat/examplepop.gif.

cannot compute difference between binary files

Added images/fileformat/fileio_diagrams.odg.

cannot compute difference between binary files

Added images/fileformat/freelistpage.gif.

cannot compute difference between binary files

Added images/fileformat/indexlongrecord.gif.

cannot compute difference between binary files

Added images/fileformat/indexpage.gif.

cannot compute difference between binary files

Added images/fileformat/indexpage.odg.

cannot compute difference between binary files

Added images/fileformat/indexshortrecord.gif.

cannot compute difference between binary files

Added images/fileformat/indextree.gif.

cannot compute difference between binary files

Added images/fileformat/journal_header.gif.

cannot compute difference between binary files

Added images/fileformat/journal_record.gif.

cannot compute difference between binary files

Added images/fileformat/overflowpage.gif.

cannot compute difference between binary files

Added images/fileformat/pointermapentry.gif.

cannot compute difference between binary files

Added images/fileformat/recordformat.gif.

cannot compute difference between binary files

Added images/fileformat/rtdocs.css.





























































































































>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
  /* Style for requirements paragraph. */
  .req       { margin: 1.0em 10ex; color: darkblue }
  .subreq    { margin: 1.0em 5ex 1.0em 15ex; color: darkblue }
  .subsubreq { margin: 1.0em 5ex 1.0em 20ex; color: darkblue }

  .req:before { color: black; content: "[" attr(id) "] "}
  .subreq:before { color: black; content: "[" attr(id) "] "}
  .subsubreq:before { color: black; content: "[SUBSUBREQ] " }

  /* The dark-green color used for headings. */
  h1,h2,h3,h4,#toc_header,#document_title { color: #80a796 }

  .req code {display: block; margin: 0.5em 5ex}

  code {white-space: pre}

  /* Style for content headings */
  h2           { margin-left: 10px }
  h3           { margin-left: 20px }
  h4           { margin-left: 30px }
  h1,h2,h3,h4  { font-weight: normal }

  /* Style for document and toc headings */
  #toc_header      { font-size: 1.5em; margin: 1.0em; }
  #document_title  { font-size: 2em; text-align: center }

  #toc a { color: darkblue ; text-decoration: none }

  /* Document font */
  body                     { font-family: sans-serif }

  /* Margins for block boxes that occur in the document flow. */
  p,ul,ol                     { margin: 1em 5ex }
  td p, td ul, td ol          { margin: 1em auto }

  /* Table style */
  table.striped, table#glossary { margin: 1em auto; width: 80% ; border-spacing: 0}
  .striped th, #glossary th { 
    white-space:nowrap; 
    text-align:left;
    border-bottom: solid 1px #444444;
    padding: 0.2em 1ex;
  }
  .striped td, #glossary td { vertical-align: top }
  .striped td, #glossary td { padding: 0.2em 1ex; }

  .spacedlist li { margin-top 0.5em ; margin-bottom: 0.5em }

  /* Style for "todo" notes. These are represented by markup like: 
  **
  **     <span class=todo>Fix this bit!</span>
  **     <p class=todo>Longer todo note.</p>
  */
  .todo        { color: #AA3333 ; font-style : italic }
  .todo:before { content: 'TODO:' }
  p.todo       { border: solid #AA3333 1px; padding: 1ex }

  cite         { font-style: normal; font-weight: normal }
  cite a       { color: inherit; text-decoration: none }
  :link:hover,:visited:hover { background: wheat }

  img {display:block}

Added images/fileformat/rtdocs.js.









































































































































































































































>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116


function populate_toc () {
  var children = document.getElementsByTagName("h1").item(0).parentNode.childNodes
  var toc = ""

  var counters = new Array()
  counters[1] = 0
  counters[2] = 0
  counters[3] = 0
  counters[4] = 0

  /* Generate the table of contents */
  for(var ii=0; ii<children.length; ii++){
    var node = children.item(ii)
    var iHeader = -1
    if( node.tagName == "H1" ){ iHeader = 1 }
    if( node.tagName == "H2" ){ iHeader = 2 }
    if( node.tagName == "H3" ){ iHeader = 3 }
    if( node.tagName == "H4" ){ iHeader = 4 }

    if( iHeader>0 ){
      var anchor = "tocentry_" + ii

      for(var jj=iHeader+1; jj<=4; jj++){ counters[jj] = 0 }
      counters[iHeader]++

      var number = ""
      for(var jj=1; jj<=iHeader; jj++){ number += counters[jj] + "." }

      toc += '<div style="margin-left:' + (iHeader*6) + 'ex">'
      toc += '<a href="#' + anchor + '">' + number + " " + node.innerHTML
      toc += "</a></div>"
      
      var a = '<a style="color:inherit" name="' + anchor + '">' + number + '</a>'
      node.innerHTML = a + " " + node.innerHTML
    }
  }
  document.getElementById("toc").innerHTML = toc
}

function number_figs () {
  /* Number the figures in this document */
  var figcounter = 1
  var spans = document.getElementsByTagName("span")
  for(var ii=0; ii<spans.length; ii++){
    var s = spans.item(ii)
    if( s.className=="fig" ){
      s.innerHTML = figcounter
      figcounter++
    }
  }
}

function populate_refs () {
  /* Fix up <cite> references */
  var cites = document.getElementsByTagName("cite")
  for(var ii=0; ii<cites.length; ii++){
    var t = cites.item(ii).innerHTML
    var h = document.getElementById(t)

    if( !h ){
      alert("Bad reference: " + t)
      continue
    }

    var label
    if( h.tagName=="H1" || h.tagName=="H2"
     || h.tagName=="H3" || h.tagName=="H4"
    ){
      label = h.firstChild.firstChild.data
      label = label.substring(0, label.length-1)
    } else {
      label = h.firstChild.data
    }

    cites.item(ii).innerHTML = '<a href="#' + t + '">' + label + '</a>'
  }
}

function decorate_tables () {
  /* Decorate tables */
  var tables = document.getElementsByTagName("table")
  for(var ii=0; ii<tables.length; ii++){
    var t = tables.item(ii)
    if( t.className!="striped" ) continue
    var rows = t.rows
    for(var jj=1; jj<rows.length; jj += 2){
      rows.item(jj).style.backgroundColor = '#DDDDDD'
    }
  }
}

function check_for_duplicates () {
  var aReq = new Array();
  var ps = document.getElementsByTagName("p")

  for(var ii=0; ii<ps.length; ii++){
    var p = ps.item(ii)
    if( p.className!="req" || !p.id ) continue;

    if( aReq[p.id] ){
      alert("Duplicate requirement number: " + p.id)
    }
    aReq[p.id] = 1;
  }
}

onload = function () {
  number_figs()
  populate_toc()
  populate_refs()
  decorate_tables()
  check_for_duplicates()
}

Added images/fileformat/tablelongrecord.gif.

cannot compute difference between binary files

Added images/fileformat/tablenodecell.gif.

cannot compute difference between binary files

Added images/fileformat/tableshortrecord.gif.

cannot compute difference between binary files

Added images/fileformat/tabletree.gif.

cannot compute difference between binary files

Added images/fileformat/vfs_role.gif.

cannot compute difference between binary files

Added images/fileformat/write_transaction.gif.

cannot compute difference between binary files

Added pages/fileio.in.































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































































>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
1001
1002
1003
1004
1005
1006
1007
1008
1009
1010
1011
1012
1013
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
1030
1031
1032
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
1048
1049
1050
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
1076
1077
1078
1079
1080
1081
1082
1083
1084
1085
1086
1087
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

<html>
<head>
  <link type="text/css" rel="stylesheet" href="images/fileformat/rtdocs.css">
  <script type="text/javascript" src=images/fileformat/rtdocs.js></script>
</head>

<div id=document_title>SQLite File IO Specification</div>
<div id=toc_header>Table Of Contents</div>
<div id=toc>
  <b>Javascript is required for some features of this document, including 
     table of contents, figure numbering and internal references (section
     numbers and hyper-links.
  </b>
</div>
<!-- End of standard rt docs header -->

<h1>Document Overview</h1>
  <h2>Scope and Purpose</h2>
    <p>
      The <i>SQLite File Database File Format</i> document
      <cite>ff_sqlitert_requirements</cite> contains a description of the 
      serialized format used to store an SQLite database 
      <span class=todo>ref</span>. This document describes, in detail, 
      the methods used by SQLite to safely read and modify a database 
      file stored in a file-system.

    <p>
      SQLite does not interact directly with the host operating system that
      provides the file-system service. Instead, the user is required to
      supply an adaptor component that implements the <i>SQLite Virtual
      File System</i> interface (described in
      <cite>capi_sqlitert_requirements</cite>) used by SQLite. The adaptor
      component is responsible for translating the calls made by SQLite to the
      calls to the operating system specific interface provided to access the
      file-system service.

    <center><img src="images/fileformat/vfs_role.gif">
    <p><i>Figure <span class=fig id=figure_vfs_role></span> - Virtual File System (VFS) Adaptor</i>
      </center>

    <p>
      The descriptions in this document are of the way that SQLite invokes the
      <i>VFS API</i> provided by the adaptor component depicted in figure
      <cite>figure_vfs_role</cite>.

  <h2>Document Organization</h2>
  <h2>Glossary</h2>
    <p class=todo>
      After this document is ready, make the vocabulary consistent and
      then add a glossary here.

    <ul>

      <li>xFileControl

      <li>xDeviceCharacteristics
      <li>xSectorSize
      <li>xAccess
      <li>xOpen
      <li>xFullPathname

      <li>xClose

    </ul>

<h1>VFS Adaptor Implementations</h1>

  <h2>SQLite File-System Usage </h2>
    <p>
      SQLite uses the file-system service made available via the 
      <i>VFS adaptor</i> to create, read and modify files for a variety
      of purposes, including:

    <ul>
      <li> database files,
      <li> journal files,
      <li> master journal files, and
      <li> statement journal files.
    </ul>

    <p>
      A database file is a file-system file used to store an SQLite database 
      image (see <cite>ff_sqlitert_requirements</cite>).

  <h2 id=fs_characteristics>File-System Characteristics</h2>
    <p>
      In the event of an operating system or power failure, the various 
      combinations of file-system and storage hardware available provide
      varying levels of guarantee as to the integrity of the data written
      to the file system just before or during the failure. The exact
      combination of IO operations that SQLite is required to perform
      in order to safely modify a database file depend on the exact 
      characteristics of the target platform.

    <p>
      SQLite queries an implementation for file-system characteristics
      using the xDeviceCharacteristics() and xSectorSize() methods of the
      database file file-handle. These two methods are only ever called
      on file-handles open on database files. They are not called for 
      <i>journal files</i>, <i>master-journal files</i> or 
      <i>temporary database files</i>.

    <p>
      The return value of the xSectorSize() method, the <i>sector-size</i>, is
      expected by SQLite to be a power of 2 value greater than or equal to 512.
    <p class=todo> 
      What does it do if this is not the case? If the sector size is less
      than 512 then 512 is used instead. How about a non power-of-two value?
      UPDATE: How this situation is handled should be described in the API
      requirements. Here we can just refer to the other document.

    <p>
      SQLite assumes that files are stored and written to within the
      file-system as a collection of blocks (hereafter sectors) of data, each
      <i>sector-size</i> bytes in size. This model is used to derive
      the following assumptions related to the expected state of the
      file-system following a power failure or operating system crash.

    <ul>
      <li>
          After part or all of a file sector has been modified
          using the xWrite() method of an open file-handle, the sector
          is said to be in a transient state, where the operating system
          makes no guarantees about the actual content of the sector on the
          persistent media. The sector remains in the transient state until
          the next successful call to xSync() on the same file-handle 
          returns. If a power failure or operating system crash occurs, then
          part or all of all sectors in the transient state when the crash
          occured may contain invalid data following system recovery.
      <li>
          Following a power failure or operating system crash, the content
          of all sectors that were not in a transient state when the crash
          occured may be trusted.
    </ul>

    <p class=todo>
      What do we assume about the other three file-system write 
      operations - xTruncate(), xDelete() and "create file"?

    <p>
      For example, if the <i>sector-size</i> of a given file-system is
      2048 bytes, and SQLite opens a file and writes a 1024 byte block
      of data to offset 3072 of the file, then according to the model 
      the second sector of the file is in the transient state. If a 
      power failure or operating system crash occurs before or during
      the next call to xSync() on the file handle, then following system
      recovery SQLite assumes that all file data between byte offsets 2048 
      and 4095, inclusive, is invalid. It also assumes that since the first
      sector of the file, containing the data from byte offset 0 to 2047 
      inclusive, is valid, since it was not in a transient state when the 
      crash occured.



    <p>
      The xDeviceCharacteristics() method returns a set of flags, 
      indicating which of the following properties (if any) the 
      file-system provides:

    <ul>
      <li>The <b><i>sequential IO</i></b> property. If a file-system has this 
          property, then in the event of a crash at most a single sector
          may contain invalid data. The file-system guarantees
      <li>The <b><i>safe-append</i></b> property.
      <li>The <b><i>atomic write</i></b> property.
    </ul>

    <p class=todo>
      Write an explanation as to how the file-system properties influence
      the model used to predict file damage after a catastrophy.
 

<h1>Database Connections</h1>

  <p>
    Within this document, the term <i>database connection</i> has a slightly
    different meaning from that which one might assume. The handles returned
    by the <code>sqlite3_open()</code> and <code>sqlite3_open16()</code>
    APIs (<span class=todo>reference</span>) are referred to as <i>database
    handles</i>.  A <i>database connection</i> is a connection to a single
    database file using a single file-handle, which is held open for the
    lifetime of the connection. Using the "ATTACH" syntax, multiple <i>database
    connections</i> may be accessed via a single <i>database handle</i>. Or,
    using SQLite's <i>shared-cache mode</i> feature, multiple <i>database
    handles</i> may access a single <i>database connection</i>.

    <center><img src="images/fileformat/db_connection.gif">
    <p><i>Figure <span class=fig id=figure_db_connection></span> - Relationship between Database Connections and Database Handles.</i>
      </center>

  <p>
    As well as a file-handle open on the database file, each
    <i>database connection</i> has a <i>page cache</i> associated with it.
    The <i>page cache</i> is used to cache data read from the database file
    to reduce the amount of data that must be read from the file-handle. It
    is also used to accumulate data written to the database file so that 
    write operations can be batched for greater efficiency. Figure 
    <cite>figure_db_connection</cite> illustrates a system containing two
    database connections, each to a separate database file. The leftmost of
    the two depicted <i>database connections</i> is shared between two 
    <i>database handles</i>. The connection illustrated towards the right of
    the diagram is used by a single <i>database handle</i>.

  <p>
    A database connection is always in one of the following states:

  <ol>
    <li><i>Unlocked state</i> (no transaction).
    <li><i>Shared lock state</i> (read-only transaction).
    <li><i>Reserved lock state</i> (read/write transaction).
    <li><i>Pending lock state</i> (read/write transaction).
    <li><i>Exclusive lock state</i> (read/write transaction).
  </ol>

  <p> Obviously, each state corresponds to the type of lock held on the
      database file. In some cases, various actions apart from simply obtaining
      the file-system lock must take place when a <i>database connection</i>
      transitions from one state to another.
 
  <p class=todo>
    Maybe a state diagram will be possible...

  <h2 id=open_new_connection>Opening a New Connection</h2>

    <p>
      Opening a new database connection is a two-step process:

    <ol>
      <li> A file-handle is opened on the database file.
      <li> If step 1 was successful, an attempt is made to read the 
	   <i>database file header</i> from the database file using the 
           new file-handle.
    </ol>

    <p>
      In step 2 of the procedure above, the database file is not locked
      before it is read from. This is the only exception to the locking 
      rules described in section <cite>reading_data</cite>.
    <p>
      One reason for attempting to read the <i>database file header</i>
      is to determine the <i>page-size</i> used by the database file. 
      Because it is not possible to be certain as to the <i>page-size</i> 
      without holding at least a <i>shared lock</i> on the database file
      (because some other <i>database connection</i> might have changed it
      since the <i>database file header</i> was read), the value read from the
      <i>database file header</i> is known as the <i>expected page size</i>. 

    <p class=req id=FS0060>
      When a new <i>database connection</i> is required, SQLite shall attempt
      to open a file-handle on the database file. If the attempt fails, then
      no new <i>database connection</i> is created and an error returned.

    <p class=req id=FS0070>
      When a new <i>database connection</i> is required, after opening the
      new file-handle, SQLite shall attempt to read the first 100 bytes
      of the database file. If the attempt fails for any other reason than
      that the opened file is less than 100 bytes in size, then 
      the file-handle is closed, no new <i>database connection</i> is created
      and an error returned instead.

    <p class=req id=FS0080>
      If the <i>database file header</i> is successfully read from a newly
      opened database file, the connections <i>expected page-size</i> shall 
      be set to the value stored in the <i>page-size field</i> of the 
      database header.

    <p class=req id=FS0090>
      If the <i>database file header</i> cannot be read from a newly opened 
      database file (because the file is less than 100 bytes in size), the 
      connections <i>expected page-size</i> shall be set to the compile time
      value of the SQLITE_DEFAULT_PAGESIZE option.


  <h2>Closing a Connection</h2>
    <p class=todo>
      Close file handles. Discard pager cache.
    </ol>

  <h2>The Page Cache</h2>
 
<h1 id=reading_data>Reading Data</h1>
  <p>
    In order to return data from the database to the user, for example as
    the results of a SELECT query, SQLite must at some point read data
    from the database file. Usually, data is read from the database file in 
    aligned blocks of <i>page-size</i> bytes. The exception is when the
    database file header fields are being inspected, before the
    <i>page-size</i> used by the database can be known.

  <p>
    With two exceptions, a <i>database connection</i> must have an open 
    transaction (either a <i>read-only transaction</i> or a 
    <i>read/write transaction</i>) on the database file before data may be 
    read from the database connection. In this case, data "read from the 
    database connection" includes data that is read from the database file 
    and data that is already present in the <i>page cache</i>. Without an 
    open transaction on the database file, the contents of the 
    <i>page cache</i> may not be trusted.

  <p>
    The two exceptions are:
  <ul>
    <li> When an attempt is made to read the 100 byte <i>database file
	 header</i> immediately after opening the <i>database connection</i>
	 (see section <cite>open_new_connection</cite>). When this occurs
         no lock is held on the database file.
    <li> Data read while in the process of opening a read-only transaction
         (see section <cite>open_read_only_trans</cite>). These occur after
         a <i>shared lock</i> is held on the database file.
  </ul>

  <p>
    Once a transaction has been opened, reading data from a database 
    connection is a simple operation. Using the xRead() method of the 
    file-handle open on the database file, the required database file 
    pages are read one at a time. SQLite never reads partial pages and
    always uses a single call to xRead() for each required page. After
    reading the data for a database page, SQLite adds it to the connections
    <i>page cache</i> so that it does not have to be read if required
    again. Refer to section <cite>page_cache_algorithms</cite> for a
    description of how this affects the IO performed by SQLite.

  <p class=req id=FS0010>
    Except for the read operation required by FS0070 and those reads made
    as part of opening a read-only transaction, SQLite shall only read
    data from a <i>database connection</i> while the 
    <i>database connection</i> has an open read-only or read/write transaction.

  <p>
    In the above requirement, reading data from a database connection
    includes retrieving data from the connections <i>page cache</i>.

  <p class=req id=FS0020>
    Aside from those read operations described by FS0070 and FSXXXX, SQLite
    shall read data from the database in aligned blocks of <i>page-size</i>
    bytes, where <i>page-size</i> is the database page size used by the
    database file.

  <h2 id=open_read_only_trans>Opening a Read-Only Transaction</h2>
    <p>
      Before data may be read from a <i>database connection</i>, a 
      <i>read-only transaction</i> must be successfully opened (this is true
      even if the connection will eventually write to the database, as a
      <i>read/write transaction</i> may only be opened by upgrading from a
      <i>read-only transaction</i>). This section describes the procedure
      for opening a <i>read-only transaction</i>.

    <p>
      The key element of a <i>read-only transaction</i> is that the 
      file-handle open on the database file obtains and holds a
      <i>shared-lock</i> on the database file. Because a connection requires
      an <i>exclusive-lock</i> before it may actually modify the contents
      of the database file, and by definition while one connection is holding
      a <i>shared-lock</i> no other connection may hold an 
      <i>exclusive-lock</i>, holding a <i>shared-lock</i> guarantees that
      no other process may modify the database file while the <i>read-only
      transaction</i> remains open.

    <p>Obtaining the <i>shared lock</i> itself on the database file is quite
       simple, SQLite just calls the xLock() method of the database file 
       handle. Some of the other processes that take place as part of 
       opening the <i>read-only transaction</i> are quite complex. The list
       of steps SQLite is required to take to open a <i>read-only
       transaction</i>, in the order in which the must occur, is as follows:

    <ol>
      <li>A <i>shared-lock</i> is obtained on the database file.
      <li>The connection checks if a <i>hot journal file</i> exists in the
          file-system. If one does, then it is rolled back before continuing.
      <li>The connection checks if the data in the <i>page cache</i> may 
          still be trusted. If not, all page cache data is discarded.
      <li>If the file-size is not zero bytes and the page cache does not
	  contain valid data for the first page of the database, then the
          data for the first page must be read from the database.
    </ol>

    <p>
      Of course, an error may occur while attempting any of the 4 steps
      enumerated above. If this happens, then the <i>shared-lock</i> is 
      released (if it was obtained) and an error returned to the user. 
      Step 2 of the procedure above is described in more detail in section
      <cite>hot_journal_detection</cite>. Section <cite>cache_validation</cite>
      describes the process identified by step 3 above. Further detail
      on step 4 may be found in section <cite>read_page_one</cite>.

    <p class=req id=FS0100>
      When required to open a <i>read-only transaction</i> using a 
      <i>database connection</i>, SQLite shall first attempt to obtain 
      a <i>shared-lock</i> on the file-handle open on the database file.

    <p class=req id=FS0110>
      If, while opening a <i>read-only transaction</i>, SQLite fails to obtain
      the <i>shared-lock</i> on the database file, then the process is
      abandoned, no transaction is opened and an error returned to the user.

    <p>
      The most common reason an attempt to obtain a <i>shared-lock</i> may
      fail is that some other connection is holding an <i>exclusive</i> or
      <i>pending lock</i>. However it may also fail because some other
      error (e.g. IO, comms related) occurs within the call to the xLock()
      method.

    <p class=req id=FS0030>
      While opening a <i>read-only transaction</i>, after successfully
      obtaining a <i>shared lock</i> on the database file, SQLite shall 
      attempt to detect and roll back a <i>hot journal file</i> associated 
      with the same database file.

    <p class=req id=FS0120>
      If, while opening a <i>read-only transaction</i>, SQLite encounters
      an error while attempting to detect or roll back a <i>hot journal
      file</i>, then the <i>shared-lock</i> on the database file is released,
      no transaction is opened and an error returned to the user.

    <p>
      Section <cite>hot_journal_detection</cite> contains a description of
      and requirements governing the detection of a hot-journal file refered
      to in the above requirements.

    <p class=req id=FS0040>
      Assuming no errors have occured, then after attempting to detect and
      roll back a <i>hot journal file</i>, if the connections 
      <i>page cache</i> is not empty, then SQLite shall validate the contents 
      of the <i>page cache</i> by testing the <i>file change counter</i>. 
      This procedure is known as <i>cache validiation</i>.

    <p class=req id=FS0050>
      If the contents of the <i>page cache</i> are found to be invalid by
      the check prescribed by F20040, SQLite shall discard the cache contents
      before continuing.
      

  <h3 id=hot_journal_detection>Hot Journal Detection</h3>
    <p>
      This section describes the procedure that SQLite uses to detect a
      <i>hot journal file</i>. If a <i>hot journal file</i> is detected,
      this indicates that at some point the process of writing a 
      transaction to the database was interrupted and a recovery operation
      (<i>hot journal rollback</i>) needs to take place.

    <p>
      The procedure used to detect a <i>hot-journal file</i> is quite
      complex. The following steps take place:

      <ol class=spacedlist>
        <li>Using the VFS xAccess() method, SQLite queries the file-system 
            to see if the journal file associated with the database exists. 
            If it does not, then there is no hot-journal file.

	<li>By invoking the xCheckReservedLock() method of the file-handle
	    opened on the database file, SQLite checks if some other connection
	    holds a <i>reserved lock</i> or greater. If some other connection
	    does hold a <i>reserved lock</i>, this indicates that the other
	    connection is midway through a <i>read/write transaction</i> (see
	    section <cite>writing_data</cite>). In this case the 
            <i>journal file</i> is not a <i>hot-journal</i> and must not be 
            rolled back.

        <li>Using the xFileSize() method of the file-handle opened
            on the database file, SQLite checks if the database file is 
            0 bytes in size. If it is, the journal file is not considered
            to be a <i>hot journal</i> file. Instead of rolling back the
            journal file, in this case it is deleted from the file-system
            by calling the VFS xDelete() method. <span class=todo>Technically,
            there is a race condition here. This step should be moved to
            after the exclusive lock is held.</span>

        <li>An attempt is made to upgrade to an <i>exclusive lock</i> on the
            database file. If the attempt fails, then all locks, including 
            the recently obtained <i>shared lock</i> are dropped. The attempt
            to open a <i>read-only transaction</i> has failed. This occurs
	    when some other connection is also attempting to open a 
	    <i>read-only transaction</i> and the attempt to gain the
	    <i>exclusive lock</i> fails because the other connection is also
	    holding a <i>shared lock</i>. It is left to the other connection 
            to roll back the <i>hot journal</i>.
            <div style="margin-top:0.5em"></div>
            It is important that the file-handle lock is upgraded 
	    directly from <i>shared</i> to <i>exclusive</i> in this case,
            instead of first upgrading to <i>reserved</i> or </i>pending</i>
            locks as is required when obtaining an <i>exclusive lock</i> to
            write to the database file (section <cite>writing_data</cite>).
	    If SQLite were to first upgrade to a <i>reserved</i> or
	    <i>pending</i> lock in this scenario, then a second process also
            trying to open a <i>read-transaction</i> on the database file might
	    detect the <i>reserved</i> lock in step 2 of this process, 
            conclude that there was no <i>hot journal</i>, and commence
            reading data from the <i>database file</i>.

        <li>The xAccess() method is invoked again to detect if the journal 
            file is still in the file system. If it is, then it is a 
            hot-journal file and SQLite tries to roll it back (see section
            <cite>rollback</cite>).
      </ol>

    <p>
      The following requirements describe step 1 of the above procedure in
      more detail.

    <p class=req id=FS0140>
      When required to attempt to detect a <i>hot-journal file</i>, SQLite
      shall first use the xAccess() method of the VFS layer to check if a
      journal file exists in the file-system.

    <p class=req id=FS0150>
      When required to attempt to detect a <i>hot-journal file</i>, if the
      call to xAccess() required by FS0140 indicates that a journal file does
      not exist, then the attempt to detect a <i>hot-journal file</i> is
      finished. A <i>hot-journal file</i> was not detected.

    <p>
      The following requirements describe step 2 of the above procedure in
      more detail.

    <p class=req id=FS0160>
      When required to attempt to detect a <i>hot-journal file</i>, if the
      call to xAccess() required by FS0140 indicates that a journal file
      is present, then the xCheckReservedLock() method of the database file
      file-handle is invoked to determine whether or not some other 
      process is holding a <i>reserved</i> or greater lock on the database 
      file.

    <p class=req id=FS0170>
      If the call to xCheckReservedLock() required by FS0160 indicates that
      some other <i>database connection</i> is holding a <i>reserved</i>
      or greater lock on the database file, 

    <p class=todo>
      Finish this section.

  <h3 id=cache_validation>Cache Validation</h3>
    <p>
      When a <i>database connection</i> opens a <i>read transaction</i>, the
      associated <i>page cache</i> may already contain data. However, if
      another process has modified the database file since the cached pages
      were loaded it is possible that the cached data is invalid.

    <p>
      SQLite determines whether or not the contents of a <i>page cache</i>
      are valid or not using the <i>file change counter</i>, a field
      in the <i>database file header</i>. The <i>file change counter</i>
      is a 4-byte big-endian integer field stored starting at byte offset
      24 of the <i>database file header</i>. Before the conclusion of a 
      <i>read/write transaction</i> that modifies the contents of the
      database file in any way (see section <cite>writing_data</cite>),
      the value stored in the <i>file change counter</i> is incremented.
      When a <i>database connection</i> unlocks the database file, it stores
      the current value of the <i>file change counter</i>. Later, while
      opening a new <i>read-only transaction</i>, SQLite checks the value
      of the <i>file change counter</i> stored in the database file. If the
      value has not changed since the database file was unlocked, then the
      contents of the <i>page cache</i> can be trusted. If the value has
      changed, then the <i>page cache</i> cannot be trusted and all data
      is discarded.
   
    <p class=req id=FS0180>
      When a file-handle open on a database file is unlocked, if the
      <i>page cache</i> belonging to the associated <i>database connection</i>
      is not empty, SQLite shall store the value of the <i>file change
      counter</i> internally.

    <p class=req id=FS0190>
      When required to perform <i>cache validation</i> as part of opening
      a <i>read transaction</i>, SQLite shall read a 16 byte block 
      starting at byte offset 24 of the <i>database file</i> using the xRead()
      method of the <i>database connections</i> file handle.

    <p class=todo>
      Why a 16 byte block? Why not 4? (something to do with encrypted
      databases).

    <p class=req id=FS0200>
      While performing <i>cache validation</i>, after loading the 16 byte
      block as required by FS0190, SQLite shall compare the 32-bit big-endian
      integer stored in the first 4 bytes of the block to the most
      recently stored value of the <i>file change counter</i> (see FS0180).
      If the values are not the same, then SQLite shall conclude that
      the contents of the cache are invalid.

    <p>
      Requirement FS0050 (section <cite>open_read_only_trans</cite>) 
      specifies the action SQLite is required to take upon determining that 
      the cache contents are invalid.


  <h3 id=read_page_one>Page 1 and the Expected Page Size</h3>
    <p>
      As the last step in opening a <i>read transaction</i> on a database
      file that is more than 0 bytes in size, SQLite is required to load 
      data for page 1 of the database into the <i>page cache</i>, if it is 
      not already there. This is slightly more complicated than it seems, 
      as the database <i>page-size</i> is no known at this point.

    <p>
      Even though the database <i>page-size</i> cannot be known for sure,
      SQLite is usually able to guess correctly by assuming it to be equal to
      the connections <i>expected page size</i>. The <i>expected page size</i>
      is the value of the <i>page-size</i> field read from the 
      <i>database file header</i> while opening the database connection 
      (see section <cite>open_new_connection</cite>), or the <i>page-size</i>
      stored of the database file when the most <i>read transaction</i> was
      concluded.

    <p class=req id=FS0210>
      During the conclusing of a <i>read transaction</i>, before unlocking
      the database file, SQLite shall set the connections 
      <i>expected page size</i> to the current database <i>page-size</i>.

    <p class=req id=FS0220>
      As part of opening a new <i>read transaction</i>, immediately after 
      performing <i>cache validation</i>, if there is no data for database
      page 1 in the <i>page cache</i>, SQLite shall read <i>N</i> bytes from
      the start of the database file using the xRead() method of the 
      connections file handle, where <i>N</i> is the connections current 
      <i>expected page size</i> value.

    <p class=req id=FS0230>
      If page 1 data is read as required by FS0230, then the value of the
      <i>page-size</i> field that appears in the database file header that
      consumes the first 100 bytes of the read block is not the same as the
      connections current <i>expected page size</i>, then the 
      <i>expected page size</i> is set to this value, the database file is
      unlocked and the entire procedure to open a <i>read transaction</i>
      is repeated.

    <p class=req id=FS0240>
      If page 1 data is read as required by FS0230, then the value of the
      <i>page-size</i> field that appears in the database file header that
      consumes the first 100 bytes of the read block is the same as the
      connections current <i>expected page size</i>, then the block of data
      read is added to the connections <i>page cache</i> as page 1.

  <h2>Ending a Read-only Transaction</h2>
    <p>
      To end a <i>read-only transaction</i>, SQLite simply relinquishes the
      <i>shared lock</i> on the file-handle open on the database file. No
      other action is required.

    <p class=req id=FS0130>
      When required to end a <i>read-only transaction</i>, SQLite shall
      relinquish the <i>shared lock</i> held on the database file by
      calling the xUnlock() method of the file-handle.

    <p>
      See also requirements FS0180 and FS0210 above.

<h1 id=writing_data>Writing Data</h1>
  <p>
    Safely writing data to a database file is also a complex procedure. The
    database file must be updated in such a way that if a power failure,
    operating system crash or application fault occurs while SQLite is midway
    through writing to the database file the database contents are still
    accessible and correct after system recovery.

  <p>
    Logically, an SQLite database file is modified using 
    <i>write transactions</i>. Each <i>write transaction</i> may contain any
    number of modifications to the database files content or size. From the
    point of view of an external observer (a second 
    <i>database connection</i>) an entire <i>write transaction</i> is applied
    to the database file atomically. If a failure of some sort occurs while
    SQLite is midway through applying a <i>write transaction</i> to a database
    file, then it must appear from the point of view of the next <i>database
    connection</i> that reads data from the <i>database file</i> that the 
    aborted transaction was not applied.

  <p>
    SQLite accomplishes these goals using two techniques:

  <ul class=spacedlist>
    <li>While modifying the content or size of a <i>database file</i> to
        apply a <i>write transaction</i>, SQLite maintains an <i>exclusive
        lock</i> on the <i>database file</i>. Because reading from the 
        <i>database file</i> requires a <i>shared lock</i> (see section 
        <cite>reading_data</cite>), and because holding an <i>exclusive lock</i>
        guarantees that no other <i>database connection</i> is holding
        or can obtain a <i>shared lock</i>, this ensures that no other 
        connection may read data from the <i>database file</i> at a point
        when a <i>write transaction</i> has been partially applied. This
        alone ensures that the absence of an application or system failure
        <i>write transactions</i> appear to be atomically applied from the
        point of view of a second <i>database connection</i>.

    <li>In almost all cases, before the contents of a page of the <i>database
	file</i> may be modified or deleted by file truncation, the original
	contents of that page is stored in the <i>journal file</i>.
        Similarly, before the size of the <i>database file</i> may be modifed
        (either by extending or truncating the file), then the original size
        of the database file is stored in the <i>journal file</i>. If an
        application or system failure occurs while updating the <i>database
        file</i>, then the database file content may be restored based on the 
        contents of the <i>journal file</i> before data is next read from it.
        Restoring the contents of a <i>database file</i> using the contents
        of the <i>journal file</i> after an application or system failure
        occurs is known as <i>hot-journal rollback</i> and is described in
        section <cite>hot_journal_rollback</cite>. The methods used by
        SQLite for detecting that <i>hot-journal rollback</i> is required is 
        detailed in section <cite>hot_journal_detection</cite>.
  </ul>

  <p>
    The <i>page cache</i> belonging to the <i>database connection</i> is 
    used to buffer writes before they are written to the <i>database file</i>.
    Often, all changes for an entire <i>write transaction</i> are accumulated
    within the <i>page cache</i>. In this case no write operations are 
    performed on the database file until the user commits the transaction.

  <p>
    Even if an application or system failure does not occur while a
    <i>write transaction</i> is in progress, a rollback operation to restore
    the database file to the state that it was in before the transaction
    started may be required. This may occur if the user explicitly requests
    transaction rollback (i.e. by issuing a "ROLLBACK" command), or 
    automatically, as a result of encountering an SQL constraint (see
    <cite>sql_sqlitert_requirements</cite>). For this reason, the original
    page content is stored in the <i>journal file</i> before the page is 
    even modified within the <i>page cache</i>.

  <p class=req id=FS0250>
    Before modifying or adding any in-memory <i>page cache</i> pages in 
    preparation for writing to the <i>database file</i>, the 
    <i>database connection</i> shall open a <i>write transaction</i> on 
    the database file.

  <p class=req id=FS0260>
    Before modifying the <i>page cache</i> image of a database page that
    existed and was not a <i>free-list leaf</i> page when the current <i>write
    transaction</i> began, SQLite shall ensure that the original page content
    has been written to the journal file (<i>journalled</i>).

  <p class=todo>
    If the sector size is larger than the page-size, coresident pages must
    also be journalled.

  <p>
    The process of <i>journalling a database page</i> is described in detail in
    section <cite>journalling_a_page</cite>.

  <p>
    Eventually, the content of pages modified by a transaction must be 
    copied from the <i>page cache</i> and into the actual database file.
    This may occur for either of the following two reasons:

    <ol>
      <li> Because the <i>write transaction</i> is being committed (section 
           <cite>committing_a_transaction</cite>), or
      <li> To free up memory if the number of modified pages grows too 
           large (see section <cite>page_cache_algorithms</cite>).
    </ol>

  <p>
    In both cases, the region of the journal file containing the original
    data for the pages being modified within the database file must be 
    flushed through to the persistent media before the database file may
    be written to. This is to ensure that the original data is recoverable
    in the event of a system failure. This process is known as <i>syncing the
    journal file</i> and is described in section
    <cite>syncing_journal_file</cite>.

  <p>
    A <i>write transaction</i> may be terminated in one of two ways. It
    may be committed, meaning that the changes involved in the transaction
    are written to the database file, or rolled back, meaning no changes
    are applied. Committing a transaction is described in section 
    <cite>committing_a_transaction</cite>. Transaction rollback is described 
    in section <cite>rollback</cite>.

  <p>
    Figure <cite>figure_write_transaction</cite> depicts an overview of an
    entire <i>write transaction</i>. This is intended to be illustrative only,
    many operations are omitted.

    <center><img src="images/fileformat/write_transaction.gif">
    <p><i>Figure <span class=fig id=figure_write_transaction></span> - Progression of a Write Transaction</i>
      </center>
 

  <h2>Journal File Format</h2>
    <h3 id=journal_header_format>Journal Header Format</h3>

    <p>
      A <i>journal header</i> is <i>sector-size</i> bytes in size, where <i>
      sector-size</i> is the value returned by the xSectorSize method of
      the file handle opened on the database file.
     

    <center><img src="images/fileformat/journal_header.gif">
    <p><i>Figure <span class=fig id=figure_journal_record></span> - Journal Header Format</i>
      </center>

  <h3 id=journal_record_format>Journal Record Format</h3>

    <center><img src="images/fileformat/journal_record.gif">
    <p><i>Figure <span class=fig id=figure_journal_record></span> - Journal Record Format</i>
      </center>

  <h3>Master Journal Pointer</h3>

  <h2>Write Transactions</h2>
  <h3>Beginning a Write Transaction</h3>
    <p>
      Before any database pages may be modified within the <i>page cache</i>,
      the <i>database connection</i> must open a <i>write transaction</i>. 
      Opening a <i>write transaction</i> requires that the <i>database
      connection</i> obtains a <i>reserved lock</i> (or greater) on the 
      <i>database file</i>. Because a obtaining a <i>reserved lock</i> on
      a <i>database file</i> guarantees that no other <i>database
      connection</i> may hold or obtain a <i>reserved lock</i> or greater,
      it follows that no other <i>database connection</i> may have an
      open <i>write transaction</i>.

    <p>
      A <i>reserved lock</i> on the <i>database file</i> may be thought of
      as an exclusive lock on the <i>journal file</i>. No 
      <i>database connection</i> may read from or write to a <i>journal
      file</i> without a <i>reserved</i> or greater lock on the corresponding
      <i>database file</i>.

    <p>
      Before opening a <i>write transaction</i>, a <i>database connection</i>
      must have an open <i>read transaction</i>, opened via the procedure
      described in section <cite>open_read_only_trans</cite>. This ensures
      that there is no <i>hot-journal file</i> that needs to be rolled back
      and that the content of the <i>page cache</i>, if any, can be trusted.

    <p>
      Once a <i>read transaction</i> has been opened, upgrading to a 
      <i>write transaction</i> is a two step process, as follows:

    <ol>
      <li>A <i>reserved lock</i> is obtained on the <i>database file</i>.
      <li>The <i>journal file</i> is opened and created if necessary (using 
          the VFS xOpen method), and a <i>journal file header</i> written 
          to the start of it using a single call to the file handles xWrite 
          method.
    </ol>

    <p>
      Requirements describing step 1 of the above procedure in detail:
    <p class=req id=FS0350>
      When required to open a <i>write transaction</i> on the database, 
      SQLite shall first open a <i>read transaction</i>, if the <i>database
      connection</i> in question has not already opened one.

    <p class=req id=FS0360>
      When required to open a <i>write transaction</i> on the database, after
      ensuring a <i>read transaction</i> has already been opened, SQLite 
      shall obtain a <i>reserved lock</i> on the database file by calling
      the xLock method of the file-handle open on the database file.

    <p>
      Requirements describing step 2 of the above procedure in detail:

    <p class=req id=FS0370>
      When required to open a <i>write transaction</i> on the database, after
      obtaining a <i>reserved lock</i> on the database file, SQLite shall
      open a read/write file-handle on the corresponding <i>journal file</i>.

    <p class=req id=FS0380>
      When required to open a <i>write transaction</i> on the database, after
      opening a file-handle on the <i>journal file</i>, SQLite shall write
      a <i>journal header</i> into the first <i>sector-size</i> bytes of the
      journal file, using single call to the xWrite method of the recently
      opened file-handle.

    <p>
      Requirements describing the <i>journal header</i> written to 
      the <i>journal file</i>:

    <p class=req id=FS0390>
      The first 8 bytes of the <i>journal header</i> required to be written
      by FS0380 shall be:
    

    <p class=todo>
      Reqirements describing the details of opening a <i>write transaction</i>.

    <p class=todo>
      Reqirement for error handling?

  <h3 id=journalling_a_page>Journalling a Database Page</h3>

    <p>
      Before modifying a database page within the <i>page cache</i>, the
      page must be <i>journalled</i>. <i>Journalling a page</i> is the
      process of copying that pages original data into the journal file
      so that it can be recovered if the <i>write transaction</i> is rolled
      back.

    <p>
      A page is journalled by adding a <i>journal record</i> to the <i>
      journal file</i>. The format of a <i>journal record</i> is described
      in section <cite>journal_record_format</cite>.

    <p class=req id=FS0270>
      When required to <i>journal a database page</i>, SQLite shall first
      append the <i>page number</i> of the page being journalled to the
      <i>journal file</i>, formatted as a 4-byte big-endian unsigned integer,
      using a single call to the xWrite method of the file-handle opened
      on the journal file.

    <p class=req id=FS0280>
      When required to <i>journal a database page</i>, if the attempt to 
      append the <i>page number</i> to the journal file is successful, 
      then the current page data (<i>page-size</i> bytes) shall be appended 
      to the journal file, using a single call to the xWrite method of the 
      file-handle opened on the journal file.

    <p class=req id=FS0290>
      When required to <i>journal a database page</i>, if the attempt to 
      append the current page data to the journal file is successful, 
      then SQLite shall append a 4-byte big-endian integer checksum value 
      to the to the journal file, using a single call to the xWrite method 
      of the file-handle opened on the journal file.

    <p>
      The checksum value written to the <i>journal file</i> immediately after
      the page data (requirement FS0290), is a function of both the page
      data and the <i>checksum initializer</i> field stored in the 
      <i>journal header</i> (see section <cite>journal_header_format</cite>).
      Specifically, it is the sum of the <i>checksum initializer</i> and
      the value of every 200th byte of page data interpreted as an 8-bit
      unsigned integer, starting with the (<i>page-size</i> % 200)'th 
      byte of page data. For example, if the <i>page-size</i> is 1024 bytes,
      then a checksum is calculated by adding the values of the bytes at
      offsets 23, 223, 423, 623, 823 and 1023 (the last byte of the page)
      together with the value of the <i>checksum initializer</i>.

    <p class=req id=FS0300>
      The checksum value written to the <i>journal file</i> by the write
      required by FS0290 shall be equal to the sum of the <i>checksum
      initializer</i> field stored in the <i>journal header</i> (FS0XXX) and
      every 200th byte of the page data, beginning with the 
      (<i>page-size</i> % 200)th byte.

    <p>
      The '%' character is used in the two paragraphs to represent the modulo
      operator, just as it is in programming languages such as C, Java and
      Javascript.

  <h3 id=syncing_journal_file>Syncing the Journal File</h3>

    <p>
      Even after the original data of a database page has been written into
      the journal file using calls to the journal file file-handle xWrite 
      method (section <cite>journalling_a_page</cite>), it is still not
      safe to write to the page within the database file. This is because
      in the event of a system failure the data written to the journal file
      may still be corrupted (see section <cite>fs_characteristics</cite>).
      Before the page can be updated within the database itself, the 
      following procedure takes place:

    <ol>
      <li> The xSync method of the file-handle opened on the journal file 
           is called. This operation ensures that all <i>journal records</i>
           in the journal file have been written to persistent storage, and
           that they will not become corrupted as a result of a subsequent
           system failure.
      <li> The <i>journal record count</i> field (see section 
           <cite>journal_header_format</cite>) of the most recently written
           journal header in the journal file is updated to contain the
           number of <i>journal records</i> added to the journal file since
           the header was written.
      <li> The xSync method is called again, to ensure that the update to
           the <i>journal record count</i> has been committed to persistent
           storage.
    </ol> 

    <p>
      If all three of the steps enumerated above are executed succesfully,
      then it is safe to modify the content of the <i>journalled</i> 
      database pages within the database file itself. The combination of
      the three steps above is refered to as <i>syncing the journal file</i>.
 

  <h3 id=upgrading_to_exclusive_lock>Upgrading to an Exclusive Lock</h3>
    <p>
      Before the content of a page modified within the <i>page cache</i> may
      be written to the database file, an <i>exclusive lock</i> must be held
      on the database file. The purpose of this lock is to prevent another
      connection from reading from the database file while the first 
      connection is midway through writing to it. Whether the reason for
      writing to the database file is because a transaction is being committed,
      or to free up space within the <i>page cache</i>, upgrading to an 
      <i>exclusive lock</i> always occurs immediately after 
      <i>syncing the journal file</i>.

  <h3 id=committing_a_transaction>Committing a Transaction</h3>
    <p>
      Committing a <i>write transaction</i> is the final step in updating the
      database file.

    <ol>
      <li> Update the change counter.
      <li> Sync the journal file.
      <li> Obtain exclusive lock
      <li> Write the database file.
      <li> Sync the database file.
      <li> Delete the journal file.
    </ol>

  <h4 id=writing_out_cache>Writing out the Page Cache</h4>

    <p>
      When a modification is made to the database, the change is first applied
      in-memory, to pages stored in the <i>page cache</i>. The process of 
      copying the modified pages from the <i>page cache</i> to the actual
      database file in the file system is known as 
      <i>writing out the page cache</i>. There are two circumstances in which
      this may occur:

    <ol>
      <li> Because the <i>write transaction</i> is being committed (section 
           <cite>committing_a_transaction</cite>), or
      <li> To free up memory if the number of modified pages grows too 
           large (see section <cite>page_cache_algorithms</cite>).
    </ol>

    <p>
      Before any data can be written into the database file, it must be
      locked with an <i>exclusive</i> lock if it is not already. This is to
      prevent any other <i>database connection</i> from reading the database
      after a subset of the modifications that have been or will be made by
      a <i>write transaction</i> have been written into the database file.

    <p class=todo>
      Journal header operations?

    <p class=req id=FS0310>
      Unless a <i>pending</i> or <i>exclusive</i> lock has already been
      obtained, when SQLite is required to <i>write out a page cache</i>, it
      shall first upgrade the lock on the database file to a 
      <i>pending lock</i> using a call to the xLock method of the file-handle
      open on the database file.

    <p class=req id=FS0320>
      Unless one has already been obtained, when SQLite is required to 
      <i>write out a page cache</i>, after successfully obtaining a 
      <i>pending lock</i> it shall upgrade the lock on the database file 
      to an <i>exclusive lock</i> using a call to the xLock method of the
      file-handle open on the database file.

    <p class=todo>
      If obtaining the lock fails?

    <p class=req id=FS0330>
      When SQLite is required to <i>write out a page cache</i>, if the 
      required <i>exclusive lock</i> is already held or successfully 
      obtained, SQLite shall copy the contents of all pages that have been
      modified within the <i>page cache</i> to the database file, using a
      single write of <i>page-size</i> bytes for each.

    <p class=req id=FS0340>
      When the modified contents of a <i>page cache</i> is copied into the
      database file, as required by FS0330, the write operations shall 
      occur in <i>page number</i> order, from lowest to highest.

    <p>
      The above requirement to write data to the database file in the order
      in which it occurs in the file is added to improve performance. On
      many systems, sorting the regions of the file to be written before 
      writing to them allows the storage hardware to operate more efficiently.


  <h2>Statement Transactions</h2>

  <h2>Multi-File Transactions</h2>

<h1 id=rollback>Rollback</h1>
  <h2 id=hot_journal_rollback>Hot Journal Rollback</h2>
  <h2>Transaction Rollback</h2>
  <h2>Statement Rollback</h2>

<h1 id=page_cache_algorithms>Page Cache Algorithms</h1>

<h1>References</h1>
  <table id="refs" style="width:auto; margin: 1em 5ex">
    <tr><td style="width:5ex" id="capi_sqlitert_requirements">[1]<td>
      C API Requirements Document.
    <tr><td style="width:5ex" id="sql_sqlitert_requirements">[2]<td>
      SQL Requirements Document.
    <tr><td style="width:5ex" id="ff_sqlitert_requirements">[3]<td>
      File Format Requirements Document.
  </table>