Documentation Source Text

Check-in [664ed2dce4]

Overview
Comment: First cut at a separate document for the printf() string formatters.
SHA3-256: 664ed2dce459beca21a5ce245f41a1bfbccdfcf1d15e5e96f4ba29e83c251ba6
User & Date: drh 2018-02-20 02:53:36
Context
2018-02-20
13:46
Further refinement of the new printf.html document. check-in: 6c1f37df7f user: drh tags: trunk
02:53
First cut at a separate document for the printf() string formatters. check-in: 664ed2dce4 user: drh tags: trunk
2018-02-19
10:46
Add a list of shadow tables to the fts5 documentation. check-in: 1225bcc93d user: dan tags: trunk
Changes

Changes to pages/crew.in.

     1      1   <title>SQLite Developers</title>
            2  +<tcl>hd_keywords {crew} {Hipp} {Kennedy}</tcl>
            3  +
            4  +<fancy_format>
     2      5   
     3      6   <h2>The SQLite Development Team</h2>
     4      7   
     5      8   <img src="images/drh1.jpg" align="left" hspace="25" vspace="0">
     6      9   <p>
     7     10   <b>D. Richard Hipp
     8     11   </b> began the SQLite project on 2000-05-29

Added pages/printf.in.

            1  +<title>SQLite's Built-in printf()</title>
            2  +<tcl>hd_keywords {string-formatter}</tcl>
            3  +<table_of_contents>
            4  +
            5  +<h1>Overview</h1>
            6  +
            7  +<p>SQLite contains its own implementation of the string formatting routine "printf()",
            8  +accessible via the following interfaces:
            9  +
           10  +<ul>
           11  +<li> [printf()] &rarr; an SQL function returning the formatted string
           12  +<li> [sqlite3_mprintf()] &rarr; Store the formatted string in memory obtained
           13  +     from [sqlite3_malloc64()].
           14  +<li> [sqlite3_snprintf()] &rarr; Store the formatted string in a static buffer
           15  +<li> [sqlite3_vmprintf()] &rarr; Varargs version of sqlite3_mprintf()
           16  +<li> [sqlite3_vsnprintf()] &rarr; Varargs version of sqlite3_snprintf()
           17  +</ul>
           18  +
           19  +<p>The same core string formatter is also used internally by SQLite.
           20  +
           21  +<h2>Advantages</h2>
           22  +
           23  +<p>Why does SQLite have its own private built-in printf() implementation?
           24  +Why not use the printf() implementation from the standard C library?
           25  +
           26  +<p>Several reasons:
           27  +
           28  +<p>
           29  +<ol>
           30  +<li><p>
           31  +By using its own built-in implementation, SQLite guarantees that the
           32  +output will be the same on all platforms and in all LOCALEs.
           33  +This is important for consistency and for testing.  It would be problematic
           34  +if one machine gave an answer of "5.25e+08" and another gave an answer
           35  +of "5.250e+008".  Both answers are correct, but we prefer that SQLite always
           36  +give the same answer.
           37  +
           38  +<li><p>
           39  +We know of no way to use the standard library printf() C interface to
           40  +implement the [printf() SQL function] feature of SQLite.  The built-in
           41  +printf() C implementation can be easily adapted, however.
           42  +
           43  +<li><p>
           44  +The printf() built into SQLite supports new non-standard substitution
           45  +types (%q, %Q, %w, and %z) that are useful both internally to SQLite
           46  +and to applications using SQLite.
           47  +Standard library printf()s cannot normally be extended in this way.
           48  +
           49  +<li><p>
           50  +The built-in SQLite implementation supports the ability to render an
           51  +arbitrary-length string into a memory buffer obtained from [sqlite3_malloc64()].
           52  +This is safer and less error prone than trying to precompute an upper size
           53  +limit on the result string, allocate an appropriately sized buffer, and
           54  +then call snprintf().
           55  +
           56  +<li><p>
           57  +The SQLite-specific printf() supports a new flag (!) called the
           58  +"alternate-form-2" flag.  The alternate-form-2 flag changes the processing
           59  +of floating-point conversions in subtle ways so that the output is always
           60  +an SQL-compatible text representation of a floating-point number - something
           61  +that is not possible to achieve with standard-library printf().  For
           62  +string substitutions, the alternate-form-2 flag causes the width and
           63  +precision to be measured in characters instead of bytes, which simplifies
           64  +processing of strings containing multi-byte UTF8 characters.
           65  +
           66  +<li><p>
           67  +The built-in SQLite printf() has compile-time options such as
           68  +SQLITE_PRINTF_PRECISION_LIMIT that provide defense against
           69  +denial-of-service attacks for applications that expose the
           70  +printf() functionality to untrusted users.
           71  +
           72  +<li><p>
           73  +Using a built-in printf() implementation means that SQLite has one
           74  +fewer dependency on the host environment, making it more portable.
           75  +</ol>
           76  +
           77  +<h2>Disadvantages</h2>
           78  +
           79  +<p>
           80  +In fairness, having a built-in implementation of printf() also comes with
           81  +some disadvantages.  To wit:
           82  +
           83  +<ol>
           84  +<li><p>
           85  +The built-in printf() implementation uses extra code space (about 7800 bytes).
           86  +
           87  +<li><p>
           88  +The floating-point to text conversion subfunction for the built-in printf()
           89  +is limited in precision to 16 significant digits (or 26 significant digits
           90  +if the "!" alternate-form-2 flag is used).
           91  +Every IEEE-754 double can be represented exactly as a decimal floating-point
           92  +value, but some doubles require more than 16 or 26 significant digits.
           93  +
           94  +<li><p>
           95  +The order of the buffer pointer and buffer size parameters in the built-in
           96  +snprintf() implementation is reversed from the order used in standard-library
           97  +implementations.
           98  +</ol>
           99  +
          100  +<p>
          101  +In spite of the disadvantages, the developers believe that having a built-in
          102  +printf() implementation inside of SQLite is a net positive.
          103  +
          104  +<h1>Formatting Details</h1>
          105  +
          106  +<p>The format string for printf() is a template for the generated
          107  +string.  Substitutions are made whenever a "%" character appears in
          108  +the format string.  The "%" is followed by one or more additional
          109  +characters that describe the substitution.  Each substitution has
          110  +the following format:
          111  +
          112  +<blockquote>
          113  +<b>%</b><i>&#91;flags&#93;&#91;width&#93;&#91;</i><b>.</b><i>precision&#93;&#91;length&#93;type</i>
          114  +</blockquote>
          115  +
          116  +<p>All substitutions begin with a single "%" and end with a single type character.
          117  +The other elements of the substitution are optional.
          118  +
          119  +<p>To include a single "%" character in the output, put two consecutive
          120  +"%" characters in the template.
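The template rules above can be tried out through the SQL printf() function. The following is a minimal sketch using Python's standard sqlite3 module; the helper name fmt and the in-memory connection are assumptions made for this illustration only and are not part of SQLite.

```python
import sqlite3

# In-memory database used only as a way to reach the SQL printf() function.
conn = sqlite3.connect(":memory:")

def fmt(template, *args):
    """Hypothetical helper: evaluate SELECT printf(template, args...)."""
    params = ", ".join("?" for _ in (template, *args))
    return conn.execute(f"SELECT printf({params})", (template, *args)).fetchone()[0]

# Text outside a substitution is copied through unchanged.
print(fmt("%d rows in %s", 3, "t1"))   # 3 rows in t1
# Two consecutive "%" characters yield one literal "%".
print(fmt("%d%% done", 50))            # 50% done
```

Each substitution consumes one argument from left to right, except "%%", which consumes none.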
          121  +
          122  +<h2>Substitution Types</h2>
          123  +
          124  +<p>The following chart shows the types supported by SQLite:
          125  +
          126  +<center>
          127  +<table border=1 cellpadding="10" width="80%">
          128  +<tr>
          129  +<th>Substitution Type<th>Meaning
          130  +<tr>
          131  +<td>%
          132  +<td>Two "%" characters in a row are translated into a single "%" in the output,
          133  +    without substituting any values.
          134  +<tr>
          135  +<td>d, i
          136  +<td>The argument is a signed integer which is displayed in decimal.
          137  +<tr>
          138  +<td>u
          139  +<td>The argument is an unsigned integer which is displayed in decimal.
          140  +<tr>
          141  +<td>f
          142  +<td>The argument is a double which is displayed in decimal.
          143  +<tr>
          144  +<td>e, E
          145  +<td>The argument is a double which is displayed in exponential notation.
          146  +    The exponent character is 'e' or 'E' depending on the type.
          147  +<tr>
          148  +<td>g, G
          149  +<td>The argument is a double which is displayed in either normal decimal
          150  +    notation or if the exponent is not close to zero, in exponential
          151  +    notation.
          152  +<tr>
          153  +<td>x, X
          154  +<td>The argument is an integer which is displayed in hexadecimal.
          155  +    Lower-case hexadecimal is used for %x and upper-case is used
          156  +    for %X.
          157  +<tr>
          158  +<td>o
          159  +<td>The argument is an integer which is displayed in octal.
          160  +<tr>
          161  +<td>s, z
          162  +<td>The argument is a zero-terminated string that is displayed.  For
          163  +    the %z type in C-language interfaces, [sqlite3_free()] is invoked
          164  +    on the string after it has been copied into the output. The %s and %z
          165  +    substitutions are identical for the SQL printf() function.<br><br>
          166  +    The %s substitution is universal, but
          167  +    the %z substitution is an SQLite enhancement, not found in other
          168  +    printf() implementations.
          169  +<tr>
          170  +<td>c
          171  +<td>For the C-language interfaces, the argument is an integer which
          172  +    is interpreted as a character.  For the [printf() SQL function] the
          173  +    argument is a string from which the first character is extracted and
          174  +    displayed.
          175  +<tr>
          176  +<td>p
          177  +<td>The argument is a pointer which is displayed as a hexadecimal address.
          178  +    Since the SQL language has no concept of a pointer, the %p substitution
          179  +    for the [printf() SQL function] works like %x.
          180  +<tr>
          181  +<td>n
          182  +<td>The argument is a pointer to an integer.  Nothing is displayed for
          183  +    this substitution type.  Instead, the integer to which the argument
          184  +    points is overwritten with the number of characters in the generated
          185  +    string that result from all format symbols to the left of the %n.
          186  +<tr>
          187  +<td>q, Q
          188  +<td>The argument is a zero-terminated string.  The string is printed with
          189  +    all single quote (') characters doubled so that the string can safely
          190  +    appear inside an SQL string literal.  The %Q substitution type also
          191  +    puts single-quotes on both ends of the substituted string.
          192  +    <br><br>If the argument
          193  +    to %Q is a null pointer then the output is an unquoted "NULL".  In other
          194  +    words, a null pointer generates an SQL NULL, and a non-null pointer generates
          195  +    a valid SQL string literal.  If the argument to %q is a null pointer
          196  +    then no output is generated.  Thus a null-pointer to %q is the same as
          197  +    an empty string.
          198  +    <br><br>For these  substitutions, the precision is the number of bytes or
          199  +    characters taken from the argument, not the number of bytes or characters that
          200  +    are written into the output.
          201  +    <br><br>
          202  +    The %q and %Q substitutions are SQLite enhancements, not found in
          203  +    most other printf() implementations.
          204  +<tr>
          205  +<td>w
          206  +<td>This substitution works like %q except that it doubles all double-quote
          207  +    characters (") instead of single-quotes, making the result suitable for
          208  +    using with a double-quoted identifier name in an SQL statement.
          209  +    <br><br>
          210  +    The %w substitution is an SQLite enhancement, not found in
          211  +    most other printf() implementations.
          212  +</table>
          213  +</center>
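A few of the substitution types in the chart above, exercised through the SQL printf() function. This is a sketch under the same assumptions as any ad hoc test: the helper fmt is hypothetical, and the behavior shown is that of the SQL function rather than the C-language interfaces (so %z, %p, and %n are left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def fmt(template, *args):
    """Hypothetical helper: evaluate SELECT printf(template, args...)."""
    params = ", ".join("?" for _ in (template, *args))
    return conn.execute(f"SELECT printf({params})", (template, *args)).fetchone()[0]

print(fmt("%x %X %o", 255, 255, 8))   # ff FF 10
print(fmt("%c", "hello"))             # h   (first character of the string)
print(fmt("%g", 525000000.0))         # 5.25e+08

# SQL-oriented extensions: %q doubles single quotes, %Q also adds the
# enclosing quotes and renders NULL unquoted, %w doubles double quotes.
print(fmt("%q", "it's"))              # it''s
print(fmt("%Q", "it's"))              # 'it''s'
print(fmt("%Q", None))                # NULL
print(fmt("%w", 'a"b'))               # a""b
```

The %Q form is convenient when building an SQL string literal from a value that might be NULL, since both cases produce valid SQL text.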
          214  +
          215  +<h2>The Optional Length Field</h2>
          216  +
          217  +<p>The length of the argument value can be specified by one or more letters
          218  +that occur just prior to the substitution type letter.  In SQLite, the
          219  +length only matters for integer types.  The length is ignored for the
          220  +[printf() SQL function] which always uses 64-bit values.  The following
          221  +table shows the length specifiers allowed by SQLite:
          222  +
          223  +<center>
          224  +<table border=1 cellpadding="10" width="80%">
          225  +<tr>
          226  +<th>Length Specifier
          227  +<th>Meaning
          228  +<tr>
          229  +<td><i>(default)</i>
          230  +<td>An "int" or "unsigned int".  32-bits on all modern systems.
          231  +<tr>
          232  +<td>l
          233  +<td>A "long int" or "long unsigned int".  Also 32-bits on all modern systems.
          234  +<tr>
          235  +<td>ll
          236  +<td>A "long long int" or "long long unsigned" or an "sqlite3_int64" or
          237  +    "sqlite3_uint64" value.  These are 64-bit integers on all modern systems.
          238  +</table>
          239  +</center>
          240  +
          241  +<p>Only the "ll" length modifier ever makes a difference for SQLite.  And
          242  +it only makes a difference when using the C-language interfaces.
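To illustrate the statement that the SQL function ignores the length field, the sketch below formats a value too large for 32 bits with and without length letters. The helper fmt is a hypothetical name introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def fmt(template, *args):
    """Hypothetical helper: evaluate SELECT printf(template, args...)."""
    params = ", ".join("?" for _ in (template, *args))
    return conn.execute(f"SELECT printf({params})", (template, *args)).fetchone()[0]

big = 2**40  # 1099511627776, does not fit in a 32-bit integer

# The SQL function always works with 64-bit integers, so all three
# templates print the same text.
for template in ("%d", "%ld", "%lld"):
    print(template, fmt(template, big))
```

In the C-language interfaces the situation is different: there the length letters must match the type of the argument actually passed.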
          243  +
          244  +<h2>The Optional Width Field</h2>
          245  +
          246  +<p>The width field specifies the minimum width of the substituted value in
          247  +the output.  If the string or number that is written into the output is shorter
          248  +than the width, then the value is padded.  Padding is on the left (the
          249  +value is right-justified) by default.  If the "-" flag is used, then the
          250  +padding is on the right and the value is left-justified.
          251  +
          252  +<p>The width is measured in bytes by default.  However, if the "!" flag is
          253  +present then the width is in characters.  This only makes a difference for
          254  +multi-byte utf-8 characters, and those only occur on string substitutions.
          255  +
          256  +<p>If the width is a single "*" character instead of a number, then the
          257  +actual width value is read as an integer from the argument list.  If the
          258  +value read is negative, then the absolute value is used for the width and
          259  +the value is left-justified as if the "-" flag were present.
          260  +
          261  +<p>If the value being substituted is larger than the width, then the full value
          262  +is added to the output.  In other words, the width is the minimum width of
          263  +the value as it is rendered in the output.
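The width rules above can be sketched with the SQL printf() function as follows. The helper fmt is hypothetical. The last two calls show the "!" flag on a string containing a multi-byte character; no expected output is given for them because older SQLite builds may not apply the "!" adjustment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def fmt(template, *args):
    """Hypothetical helper: evaluate SELECT printf(template, args...)."""
    params = ", ".join("?" for _ in (template, *args))
    return conn.execute(f"SELECT printf({params})", (template, *args)).fetchone()[0]

print(repr(fmt("%5d", 42)))        # '   42'  padded on the left
print(repr(fmt("%-5d|", 42)))      # '42   |' "-" pads on the right
print(repr(fmt("%2d", 12345)))     # '12345'  width is only a minimum

# "*" reads the width from the argument list, before the value itself.
print(repr(fmt("%*d", 5, 42)))     # '   42'
print(repr(fmt("%*d|", -5, 42)))   # '42   |' negative width left-justifies

# Width in bytes (default) versus characters ("!" flag).
print(repr(fmt("%5s|", "\u00e9")))
print(repr(fmt("%!5s|", "\u00e9")))
```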
          264  +
          265  +<h2>The Optional Precision Field</h2>
          266  +
          267  +<p>The precision field, if it is present, must follow the width separated
          268  +by a single "." character.  If there is no width, then the "." that introduces
          269  +the precision immediately follows either the flags (if there are any) or
          270  +the initial "%".
          271  +
          272  +<p>For string substitutions (%s, %z, %q, %Q, or %w) the precision is the number
          273  +of bytes or characters used from the argument.  The number is bytes by default but
          274  +is characters if the "!" flag is present.  If there is no precision, then the
          275  +entire string is substituted.  Examples:  "%.3s" substitutes the first 3 bytes
          276  +of the argument string.  "%!.3s" substitutes the first three characters of the
          277  +argument string.
          278  +
          279  +<p>For integer substitutions (%d, %i, %x, %X, %o, and %p) the precision specifies
          280  +the minimum number of digits to display.  Leading zeros are added if necessary, to
          281  +expand the output to the minimum number of digits.
          282  +
          283  +<p>For floating-point substitutions (%e, %E, %f, %g, %G) the precision specifies 
          284  +the number of digits to display to the right of the decimal point.
          285  +
          286  +<p>For the character substitution (%c) a precision N greater than 1 causes the
          287  +character to be repeated N times.  This is a non-standard extension found only
          288  +in SQLite.
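The effect of the precision field on each class of substitution can be sketched like this, again through the SQL printf() function with a hypothetical helper named fmt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def fmt(template, *args):
    """Hypothetical helper: evaluate SELECT printf(template, args...)."""
    params = ", ".join("?" for _ in (template, *args))
    return conn.execute(f"SELECT printf({params})", (template, *args)).fetchone()[0]

print(fmt("%.3s", "abcdef"))    # abc    at most 3 bytes of the string
print(fmt("%.5d", 42))          # 00042  at least 5 digits
print(fmt("%.2f", 3.14159))     # 3.14   2 digits after the decimal point
print(fmt("%.3c", "x"))         # xxx    SQLite-specific repetition
```

Width and precision can be combined; for example "%8.2f" requests two fractional digits in a field at least eight bytes wide.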
          289  +
          290  +<h2>The Optional Flags Field</h2>
          291  +
          292  +<p>Flags consist of zero or more characters that immediately follow the
          293  +"%" that introduces the substitution.  The various flags and their meanings
          294  +are as follows:
          295  +
          296  +<center>
          297  +<table border=1 cellpadding="10" width="80%">
          298  +<tr>
          299  +<th>Flag
          300  +<th>Meaning
          301  +<tr>
          302  +<td><b>-</b>
          303  +<td>Left-justify the value in the output.  The default is to right-justify.
          304  +If the width is zero or is otherwise less than the length of the value being
          305  +substituted, then there is no padding and the "-" flag is a no-op.
          306  +<tr>
          307  +<td><b>+</b>
          308  +<td>For signed numeric substitutions, include a "+" sign before positive numbers.
          309  +A "-" sign always appears before negative numbers regardless of flag settings.
          310  +<tr>
          311  +<td><i>(space)</i>
          312  +<td>For signed numeric substitutions, prepend a single space before positive
          313  +numbers.
          314  +<tr>
          315  +<td><b>0</b>
          316  +<td>Prepend as many "0" characters to numeric substitutions as necessary to
          317  +expand the value out to the specified width.  If the width field is omitted,
          318  +then this flag is a no-op.
          319  +<tr>
          320  +<td><b>#</b>
          321  +<td>This is the "alternate-form-1" flag.
          322  +For %g and %G substitutions, this causes trailing zeros to be retained.
          323  +This flag forces a decimal point to appear for all floating-point substitutions.
          324  +For %o, %x, and %X substitutions, the alternate-form-1 flag causes the value
          325  +to be prepended with "0", "0x", or "0X", respectively.
          326  +<tr>
          327  +<td><b>,</b>
          328  +<td>This flag causes comma-separators to be added to the output of %d and %i
          329  +substitutions, between every 3 digits from the left.  This can help humans
          330  +to more easily discern the magnitude of large integer values.  For example,
          331  +the value 2147483647 would be rendered as "2147483647" using "%d" but would
          332  +appear as "2,147,483,647" with "%,d".  This flag is a non-standard extension.
          333  +<tr>
          334  +<td><b>!</b>
          335  +<td>This is the "alternate-form-2" flag.
          336  +For string substitutions, this flag causes the width and precision to be understood
          337  +in terms of characters rather than bytes.
          338  +For floating point substitutions, the alternate-form-2 flag increases the 
          339  +maximum number of significant digits displayed from 16 to 26,
          340  +forces the display of the decimal point and causes at least one digit
          341  +to appear after the decimal point.<br><br>
          342  +The alternate-form-2 flag is a non-standard extension that appears in no
          343  +other printf() implementations, as far as we know.
          344  +</table>
          345  +</center>
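Several of the flags in the table above, shown on integer substitutions through the SQL printf() function. The helper fmt is a hypothetical name used only in this sketch; the floating-point effects of "#" and "!" are not covered here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def fmt(template, *args):
    """Hypothetical helper: evaluate SELECT printf(template, args...)."""
    params = ", ".join("?" for _ in (template, *args))
    return conn.execute(f"SELECT printf({params})", (template, *args)).fetchone()[0]

print(repr(fmt("%+d", 5)))             # '+5'     explicit plus sign
print(repr(fmt("% d", 5)))             # ' 5'     space in place of the sign
print(repr(fmt("%05d", 42)))           # '00042'  zero padding to the width
print(repr(fmt("%#x", 255)))           # '0xff'   alternate-form-1 prefix
print(repr(fmt("%#o", 8)))             # '010'
print(repr(fmt("%,d", 2147483647)))    # '2,147,483,647'
```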
          346  +
          347  +<h1>Implementation And History</h1>
          348  +
          349  +<p>
          350  +The core string formatting routine is the sqlite3VXPrintf() function found in the
          351  +[https://sqlite.org/src/file/src/printf.c|printf.c] source file.  All the
          352  +various interfaces invoke (sometimes indirectly) this one core function.
          353  +The sqlite3VXPrintf() function began as code written by the first author
          354  +of SQLite ([Hipp]) when he was a graduate student at Duke University in the
          355  +late 1980s.  Hipp kept this printf() implementation in his personal toolbox until
          356  +he started working on SQLite in 2000.  The code was incorporated into the
          357  +SQLite source tree on [https://sqlite.org/src/timeline?c=f9372072a6|2000-10-08]
          358  +for SQLite version 1.0.9.
          359  +
          360  +<p>
          361  +The [https://www.fossil-scm.org/|Fossil Version Control System] uses its own
          362  +printf() implementation that is derived from an early version of the SQLite
          363  +printf() implementation, but those two implementations have since diverged.
          364  +
          365  +<p>
          366  +The [sqlite3_snprintf()] routine has its buffer pointer and buffer size
          367  +arguments reversed from what is found in the standard library snprintf() routine
          368  +because there was no snprintf() routine in the standard C library
          369  +when Hipp was first implementing his version, and he chose a different order
          370  +than the designers of the standard C library.