Documentation Source Text

Check-in [8abf78baf9]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Update the documentation to include information about aggregated mode for the DBSTAT table.
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA3-256: 8abf78baf969e3fc213acd59f070929e3f79bf4e724e0ad3af2b99c597dfdf04
User & Date: drh 2019-11-19 19:15:14
Context
2019-11-23
00:22
Documentation updates for new features. check-in: 604ac8689d user: drh tags: trunk
2019-11-19
19:15
Update the documentation to include information about aggregated mode for the DBSTAT table. check-in: 8abf78baf9 user: drh tags: trunk
2019-11-18
17:43
Mention the new SQLITE_OPEN_NOFOLLOW option in the change log. check-in: d3ba10d894 user: drh tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to pages/changes.in.

21
22
23
24
25
26
27


28
29
30
31
32
33
34
  incr nChng
}

chng {2019-12-31 (3.31.0)} {
<li>Add support for [generated columns].
<li>Add the [sqlite3_hard_heap_limit64()] interface and the corresponding
    [PRAGMA hard_heap_limit] command.


<li>Add the [SQLITE_OPEN_NOFOLLOW] option to [sqlite3_open_v2()] that
    prevents SQLite from opening symbolic links.
<li>Faster response to [sqlite3_interrupt()].
<li>Added the [https://sqlite.org/src/file/ext/misc/uuid.c|uuid.c] extension module
    implementing functions for processing RFC-4122 UUIDs.
<li>The [legacy_file_format pragma] is deactivated.  It is now a no-op.  In its place,
    the [SQLITE_DBCONFIG_LEGACY_FILE_FORMAT] option to [sqlite3_db_config()] is







>
>







21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
  incr nChng
}

chng {2019-12-31 (3.31.0)} {
<li>Add support for [generated columns].
<li>Add the [sqlite3_hard_heap_limit64()] interface and the corresponding
    [PRAGMA hard_heap_limit] command.
<li>Add the [DBSTAT aggregated mode|aggregated mode] feature to the
    [DBSTAT virtual table].
<li>Add the [SQLITE_OPEN_NOFOLLOW] option to [sqlite3_open_v2()] that
    prevents SQLite from opening symbolic links.
<li>Faster response to [sqlite3_interrupt()].
<li>Added the [https://sqlite.org/src/file/ext/misc/uuid.c|uuid.c] extension module
    implementing functions for processing RFC-4122 UUIDs.
<li>The [legacy_file_format pragma] is deactivated.  It is now a no-op.  In its place,
    the [SQLITE_DBCONFIG_LEGACY_FILE_FORMAT] option to [sqlite3_db_config()] is

Changes to pages/dbstat.in.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
..
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75

76
77
78
79
80
81
82
83
84










85
86
87
88
89
90

91
92
93
94
95
96
97
...
113
114
115
116
117
118
119
120































































121
122
123
124
125


126
127
128

129
130
131
132
133
134
135
136


137
138
139


140
141
142
143
144
145
146
<title>The DBSTAT Virtual Table</title>
<tcl>hd_keywords dbstat {dbstat virtual table}</tcl>
<fancy_format>


<h1>Overview</h1>

<p>
The DBSTAT virtual table is a read-only [eponymous virtual table] that returns
information about which pages of the database files are used by which
tables and indexes in the schema.
The DBSTAT virtual table is used to implement [sqlite3_analyzer.exe]
utility program, and to help compute the 
[https://www.sqlite.org/src/repo-tabsize|table size pie-chart] in
the [https://www.fossil-scm.org/|Fossil-implemented] version control system
for SQLite.
</p>

<p>
^The <b>dbstat</b> virtual table is available on all 
[database connections] when SQLite is built using the
[SQLITE_ENABLE_DBSTAT_VTAB] compile-time option.
^The dbstat virtual table provides low-level information 
about btree and overflow pages in a database file.

<p>
^The dbstat virtual table is an [eponymous virtual table], meaning
that is not necessary to run [CREATE VIRTUAL TABLE] to create an
instance of the dbstat virtual table before using it.  The "dbstat"
module name can be used as if it were a table name to query the
dbstat virtual table directly.  For example:

<codeblock>
SELECT * FROM dbstat;
................................................................................
so the use of "main" in the example above is redundant.  For any
particular query, the schema can be changed by specifying the
alternative schema as a function argument to the virtual table
name in the FROM clause of the query.  (See further discussion of
[table-valued functions in the FROM clause] for more details.)

<p>
The schema for the dbstat virtual table is this:
<codeblock>
CREATE TABLE dbstat(
  name       TEXT,        -- Name of table or index
  path       TEXT,        -- Path to page from root
  pageno     INTEGER,     -- Page number
  pagetype   TEXT,        -- 'internal', 'leaf' or 'overflow'
  ncell      INTEGER,     -- Cells on page (0 for overflow)
  payload    INTEGER,     -- Bytes of payload on this page
  unused     INTEGER,     -- Bytes of unused space on this page
  mx_payload INTEGER,     -- Largest payload size of all cells on this page
  pgoffset   INTEGER,     -- Offset of page in file
  pgsize     INTEGER,     -- Size of the page
  schema     TEXT HIDDEN  -- Database schema being analyzed

);
</codeblock>

<p>
There is a single row of the dbstat table for each page in the
database file.  Freelist pages, the lock page, and
pointer-map pages of the database file do not appear in the
dbstat virtual table.











<h1>The "path" column of the dbstat virtual table</h1>

<p>
The "path" column describes the path taken from the 
root node of the btree structure to each page.  The
"path" of the root node itself is '/'.


The "path" for the left-most child page of the root of
a btree page is '/000/'. (Btrees store content ordered from left to right
so the pages to the left have smaller keys than the pages to the right.)
The next to left-most child of the root page is '/001', and so on,
each sibling page identified by a 3-digit hex  value.
The children of the 451st left-most sibling have paths such
................................................................................
If the paths are sorted using the BINARY collation sequence, then
the overflow pages associated with a cell will appear earlier in the
sort-order than its child page:

<codeblock>
'/1c2/000/'               // Left-most child of 451st child of root
</codeblock>
































































<h1>Example uses of the dbstat virtual table</h1>

<p>
To find the total number of pages used to store table "xyz" in schema "aux1",
use:



<codeblock>
SELECT count(*) FROM dbstat('aux1') WHERE name='xyz';

</codeblock>

<p>
To see how efficiently the content of a table is stored on disk,
compute the amount of space used to hold actual content divided
by the total amount of disk space used.  The closer this number
is to 100%, the more efficient the packing.  (In this example, the
'xyz' table is assumed to be in the 'main' schema.)



<codeblock>
SELECT sum(pgsize-unused)/sum(pgsize) FROM dbstat WHERE name='xyz';


</codeblock>

<p>
To find the average fan-out for a table, run:

<codeblock>
SELECT avg(ncell) FROM dbstat WHERE name='xyz' AND pagetype='internal';

|







|
|




|



|


<
<


|







 







|












|
>




|
|
|
<

>
>
>
>
>
>
>
>
>
>






>







 








>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>




<
>
>



>







|
>
>


|
>
>







1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22


23
24
25
26
27
28
29
30
31
32
..
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81

82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
...
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196

197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
<title>The DBSTAT Virtual Table</title>
<tcl>hd_keywords dbstat {dbstat virtual table} {DBSTAT virtual table}</tcl>
<fancy_format>


<h1>Overview</h1>

<p>
The DBSTAT virtual table is a read-only [eponymous virtual table] that returns
information about the amount of disk space used to store page or btrees
of an SQLite database.
The DBSTAT virtual table is used to implement [sqlite3_analyzer.exe]
utility program, and to help compute the 
[https://www.sqlite.org/src/repo-tabsize|table size pie-chart] in
the [https://www.fossil-scm.org/|Fossil-implemented] version control system
for SQLite, for example.
</p>

<p>
^The DBSTAT virtual table is available on all 
[database connections] when SQLite is built using the
[SQLITE_ENABLE_DBSTAT_VTAB] compile-time option.



<p>
^The DBSTAT virtual table is an [eponymous virtual table], meaning
that is not necessary to run [CREATE VIRTUAL TABLE] to create an
instance of the dbstat virtual table before using it.  The "dbstat"
module name can be used as if it were a table name to query the
dbstat virtual table directly.  For example:

<codeblock>
SELECT * FROM dbstat;
................................................................................
so the use of "main" in the example above is redundant.  For any
particular query, the schema can be changed by specifying the
alternative schema as a function argument to the virtual table
name in the FROM clause of the query.  (See further discussion of
[table-valued functions in the FROM clause] for more details.)

<p>
The schema for the DBSTAT virtual table looks like this:
<codeblock>
CREATE TABLE dbstat(
  name       TEXT,        -- Name of table or index
  path       TEXT,        -- Path to page from root
  pageno     INTEGER,     -- Page number
  pagetype   TEXT,        -- 'internal', 'leaf' or 'overflow'
  ncell      INTEGER,     -- Cells on page (0 for overflow)
  payload    INTEGER,     -- Bytes of payload on this page
  unused     INTEGER,     -- Bytes of unused space on this page
  mx_payload INTEGER,     -- Largest payload size of all cells on this page
  pgoffset   INTEGER,     -- Offset of page in file
  pgsize     INTEGER,     -- Size of the page
  schema     TEXT HIDDEN, -- Database schema being analyzed
  aggregate  BOOL HIDDEN  -- True to enable aggregate mode
);
</codeblock>

<p>
The DBSTAT table analyzes btrees within the database file.
Freelist pages, pointer-map pages, and the lock page are all
ignored by DBSTAT.


<p>
By default, there is a single row in the DBSTAT table for each
btree page the database file.  The row provides
information about the space utilization of that one page of the
database.  However, if the hidden column "aggregate" is TRUE, then
results are aggregated and there is a single row in the DBSTAT table
for each btree in the database, providing information about space
utilization across the entire btree.

<a name="dbstatpath"></a>
<h1>The "path" column of the dbstat virtual table</h1>

<p>
The "path" column describes the path taken from the 
root node of the btree structure to each page.  The
"path" of the root node itself is '/'.
The "path" is NULL when "aggregate" is TRUE.

The "path" for the left-most child page of the root of
a btree page is '/000/'. (Btrees store content ordered from left to right
so the pages to the left have smaller keys than the pages to the right.)
The next to left-most child of the root page is '/001', and so on,
each sibling page identified by a 3-digit hex  value.
The children of the 451st left-most sibling have paths such
................................................................................
If the paths are sorted using the BINARY collation sequence, then
the overflow pages associated with a cell will appear earlier in the
sort-order than its child page:

<codeblock>
'/1c2/000/'               // Left-most child of 451st child of root
</codeblock>

<tcl>hd_fragment {dbstatagg} {DBSTAT aggregated mode}</tcl>
<h1>Aggregated Data</h1>

<p>
Beginning with SQLite version 3.31.0 ([dateof:3.31.0]), the DBSTAT table
has a new [hidden column] named "aggregate", which if constrainted to be
TRUE will cause DBSTAT to generate one row per btree in the database,
rather than one row per page.  When running in aggregated mode, the 
"path", "pagetype", and "pgoffset" columns are always NULL and the
"pageno" column holds the number of pages in the entire btree, rather
than the number of the page that corresponds to the row.

<p>
The following table shows the meanings of the (non-hidden) columns of
DBSTAT in both normal and aggregated mode:

<center><blockquote>
<table border="1" cellpadding="5" cellspacing="0">
<tr>
<th>Column
<th>Normal meaning
<th>Aggregate-mode meaning
</tr>
<tr>
<th>name
<td colspan="2">
The name of the table or index that is implemented by
the btree of the current row
<tr>
<th>path
<td>See <a href="#dbstatpath">description above</a>
<td>Always NULL
<tr>
<th>pageno
<td>The page number of the database page for the current row
<td>The total number of pages in the btree for the current row
<tr>
<th>pagetype
<td>'leaf' or 'interior'
<td>Always NULL
<tr>
<th>ncell
<td colspan="2">Number of cells on the current page or btree
<tr>
<th>payload
<td colspan="2">Bytes of useful payload on the current page or btree
<tr>
<th>unused
<td colspan="2">Unused bytes of on the current page or btree
<tr>
<th>mx_payload
<td colspan="2">The largest payload found anywhere in the current page
or btree.
<tr>
<th>pgoffset
<td>Byte offset to the start of the page
<td>Always NULL
<tr>
<th>pgsize
<td colspan="2">Total storage space used by the current page or btree.
</table>
</blockquote></center>

<h1>Example uses of the dbstat virtual table</h1>

<p>
To find the total number of pages used to store table "xyz" in schema "aux1",

use either of the following two queries (the first is the traditional way,
and the second shows the use of the aggregated feature):

<codeblock>
SELECT count(*) FROM dbstat('aux1') WHERE name='xyz';
SELECT pageno FROM dbstat('aux1',1) WHERE name='xyz';
</codeblock>

<p>
To see how efficiently the content of a table is stored on disk,
compute the amount of space used to hold actual content divided
by the total amount of disk space used.  The closer this number
is to 100%, the more efficient the packing.  (In this example, the
'xyz' table is assumed to be in the 'main' schema.  Again, there
are two different versions that show the use of DBSTAT both without
and with the new aggregated feature, respectively.)

<codeblock>
SELECT sum(pgsize-unused)*100.0/sum(pgsize) FROM dbstat WHERE name='xyz';
SELECT (pgsize-unused)*100.0/pgsize FROM dbstat
 WHERE name='xyz' AND aggregate=TRUE;
</codeblock>

<p>
To find the average fan-out for a table, run:

<codeblock>
SELECT avg(ncell) FROM dbstat WHERE name='xyz' AND pagetype='internal';