|Title:||Concatenation can create very invalid strings|
|Last Modified:||2010-04-22 12:08:03|
|Version Found In:||22.214.171.124|
With a UTF16 encoded database:
sqlite> insert into foo values('CD' || x'00' || 'E'); sqlite> select cast(x as blob) from foo; INSERT INTO "table" VALUES(X'43004400004500'); sqlite> select typeof(x) from foo; text sqlite> select length(x) from foo; 4
Note that the string is 7 bytes long which is very invalid for UTF16 encoding. It is unlikely all code paths have been tested to ensure they cope with this correctly, don't read the 8th byte, deal with potential issues such as introducing surrogate pairs this way that are not valid etc.
Suggestions include making || with blobs on either side result in a blob, more careful blob to string coercion that always results in a valid string, giving an error etc.
drh added on 2010-04-22 12:08:03:
One might argue that the suggested behavior makes more sense. But we could argue back: Garbage in, garbage out.
This ticket does, however, point out the desirability of very carefully testing the behavior when UTF16 strings have an odd number of bytes - to verify that there are no buffer overrun bugs elsewhere in the code. We'll be undertaking those tests soon.