/ Check-in [826aab43]
Login

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview
Comment:Updated sqlite_encode_binary() comments with tighter bounds on output length. (CVS 1023)
Downloads: Tarball | ZIP archive | SQL archive
Timelines: family | ancestors | descendants | both | trunk
Files: files | file ages | folders
SHA1: 826aab43d5967ece2a272c49ce62021fa4a2ceb3
User & Date: jplyon 2003-06-15 10:35:05
Context
2003-06-15
23:42
Enhance the "PRAGMA integrity_check" command to verify that all indices are correctly constructed. New calls to integrity_check are made in the test suite. These changes are intended to prevent any future problems such as seen in ticket #334. (CVS 1024) check-in: c9734c27 user: drh tags: trunk
10:35
Updated sqlite_encode_binary() comments with tighter bounds on output length. (CVS 1023) check-in: 826aab43 user: jplyon tags: trunk
10:29
Documented integer values used by PRAGMAs. Fixed missing end tags in generated anchors. (CVS 1022) check-in: 6c24dfba user: jplyon tags: trunk
Changes
Hide Diffs Unified Diffs Ignore Whitespace Patch

Changes to src/encode.c.

11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
..
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
..
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
...
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
...
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
*************************************************************************
** This file contains helper routines used to translate binary data into
** a null-terminated string (suitable for use in SQLite) and back again.
** These are convenience routines for use by people who want to store binary
** data in an SQLite database.  The code in this file is not used by any other
** part of the SQLite library.
**
** $Id: encode.c,v 1.6 2003/05/10 03:36:54 drh Exp $
*/
#include <string.h>

/*
** How This Encoder Works
**
** The output is allowed to contain any character except 0x27 (') and
................................................................................
** it could double the size of the encoded string.  For example, to
** encode a string of 100 0x27 character would require 100 instances of
** the 0x01 0x03 escape sequence resulting in a 200-character output.
** We would prefer to keep the size of the encoded string smaller than
** this.
**
** To minimize the encoding size, we first add a fixed offset value to each 
** byte in the sequence.  The addition is module 256.  (That is to say, if
** the sum of the original character value and the offset exceeds 256, then
** the higher order bits are truncated.)  The offset is chosen to minimize
** the number of characters in the string that need to be escaped.  For
** example, in the case above where the string was composed of 100 0x27
** characters, the offset might be 0x01.  Each of the 0x27 characters would
** then be converted into an 0x28 character which would not need to be
** escaped at all and so the 100 character input string would be converted
................................................................................
**     (6)   Convert each 0x01 0x01 sequence into a single character 0x00.
**           Convert 0x01 0x02 into 0x01.  Convert 0x01 0x03 into 0x27.
**
**     (7)   Subtract the offset value that was the first character of
**           the encoded buffer from all characters in the output buffer.
**
** The only tricky part is step (1) - how to compute an offset value to
** minimize the size of the output buffer.  This is accomplished to testing
** all offset values and picking the one that results in the fewest number
** of escapes.  To do that, we first scan the entire input and count the
** number of occurances of each character value in the input.  Suppose
** the number of 0x00 characters is N(0), the number of occurances of 0x01
** is N(1), and so forth up to the number of occurances of 0xff is N(256).
** An offset of 0 is not allowed so we don't have to test it.  The number
** of escapes required for an offset of 1 is N(1)+N(2)+N(40).  The number
................................................................................
** Encode a binary buffer "in" of size n bytes so that it contains
** no instances of characters '\'' or '\000'.  The output is 
** null-terminated and can be used as a string value in an INSERT
** or UPDATE statement.  Use sqlite_decode_binary() to convert the
** string back into its original binary.
**
** The result is written into a preallocated output buffer "out".
** "out" must be able to hold at least (256*n + 1262)/253 bytes.
** In other words, the output will be expanded by as much as 3
** bytes for every 253 bytes of input plus 2 bytes of fixed overhead.
** (This is approximately 2 + 1.019*n or about a 2% size increase.)
**
** The return value is the number of characters in the encoded
** string, excluding the "\000" terminator.
*/
int sqlite_encode_binary(const unsigned char *in, int n, unsigned char *out){
  int i, j, e, m;
  int cnt[256];
................................................................................
  }
  out[j] = 0;
  return j;
}

/*
** Decode the string "in" into binary data and write it into "out".
** This routine reverses the encoded created by sqlite_encode_binary().
** The output will always be a few bytes less than the input.  The number
** of bytes of output is returned.  If the input is not a well-formed
** encoding, -1 is returned.
**
** The "in" and "out" parameters may point to the same buffer in order
** to decode a string in place.
*/







|







 







|







 







|







 







|

|
|







 







|







11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
..
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
..
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
...
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
...
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
*************************************************************************
** This file contains helper routines used to translate binary data into
** a null-terminated string (suitable for use in SQLite) and back again.
** These are convenience routines for use by people who want to store binary
** data in an SQLite database.  The code in this file is not used by any other
** part of the SQLite library.
**
** $Id: encode.c,v 1.7 2003/06/15 10:35:05 jplyon Exp $
*/
#include <string.h>

/*
** How This Encoder Works
**
** The output is allowed to contain any character except 0x27 (') and
................................................................................
** it could double the size of the encoded string.  For example, to
** encode a string of 100 0x27 character would require 100 instances of
** the 0x01 0x03 escape sequence resulting in a 200-character output.
** We would prefer to keep the size of the encoded string smaller than
** this.
**
** To minimize the encoding size, we first add a fixed offset value to each 
** byte in the sequence.  The addition is modulo 256.  (That is to say, if
** the sum of the original character value and the offset exceeds 256, then
** the higher order bits are truncated.)  The offset is chosen to minimize
** the number of characters in the string that need to be escaped.  For
** example, in the case above where the string was composed of 100 0x27
** characters, the offset might be 0x01.  Each of the 0x27 characters would
** then be converted into an 0x28 character which would not need to be
** escaped at all and so the 100 character input string would be converted
................................................................................
**     (6)   Convert each 0x01 0x01 sequence into a single character 0x00.
**           Convert 0x01 0x02 into 0x01.  Convert 0x01 0x03 into 0x27.
**
**     (7)   Subtract the offset value that was the first character of
**           the encoded buffer from all characters in the output buffer.
**
** The only tricky part is step (1) - how to compute an offset value to
** minimize the size of the output buffer.  This is accomplished by testing
** all offset values and picking the one that results in the fewest number
** of escapes.  To do that, we first scan the entire input and count the
** number of occurances of each character value in the input.  Suppose
** the number of 0x00 characters is N(0), the number of occurances of 0x01
** is N(1), and so forth up to the number of occurances of 0xff is N(256).
** An offset of 0 is not allowed so we don't have to test it.  The number
** of escapes required for an offset of 1 is N(1)+N(2)+N(40).  The number
................................................................................
** Encode a binary buffer "in" of size n bytes so that it contains
** no instances of characters '\'' or '\000'.  The output is 
** null-terminated and can be used as a string value in an INSERT
** or UPDATE statement.  Use sqlite_decode_binary() to convert the
** string back into its original binary.
**
** The result is written into a preallocated output buffer "out".
** "out" must be able to hold at least 2 +(257*n)/254 bytes.
** In other words, the output will be expanded by as much as 3
** bytes for every 254 bytes of input plus 2 bytes of fixed overhead.
** (This is approximately 2 + 1.0118*n or about a 1.2% size increase.)
**
** The return value is the number of characters in the encoded
** string, excluding the "\000" terminator.
*/
int sqlite_encode_binary(const unsigned char *in, int n, unsigned char *out){
  int i, j, e, m;
  int cnt[256];
................................................................................
  }
  out[j] = 0;
  return j;
}

/*
** Decode the string "in" into binary data and write it into "out".
** This routine reverses the encoding created by sqlite_encode_binary().
** The output will always be a few bytes less than the input.  The number
** of bytes of output is returned.  If the input is not a well-formed
** encoding, -1 is returned.
**
** The "in" and "out" parameters may point to the same buffer in order
** to decode a string in place.
*/