|Title:||MAX yields unexpected result for UTF-16|
|Last Modified:||2020-03-05 16:26:30|
|Version Found In:||3.32.0|
mrigger added on 2020-03-04 22:56:29:
Consider the following statements:
PRAGMA encoding = 'UTF-16'; CREATE TABLE t0(c0 TEXT); INSERT INTO t0(c0) VALUES ('윆'), (1); SELECT MAX(CASE 1 WHEN 1 THEN t0.c0 END) FROM t0; -- 윆 SELECT MAX(t0.c0) FROM t0; -- 1
I would expect both queries to produce the same result, however, the first returns '윆', while the second returns 1. When using UTF-8, both queries return '윆'.
drh added on 2020-03-05 16:26:30:
The default sort order for text is whatever order the text is rendered in using memcmp() as a comparison function. That order is different depending on the text encoding. UTF-8 is mostly the same as UTF-16BE, except for a few corner cases involving surrogate pairs. But UTF-16LE is quite different. (This surprising fact about text sort order needs to be discussed on the "quirks" documentation page.)
The fact that the sort order of text is different depending on the text encoding is less of a problem than it might seem at first because:
The problem reported by this ticket arose because SQLite was using a UTF8 collating sequence for some, but not all, expressions in a database that had a UTF16LE text encoding. The fix was to ensure that the same collating sequence is used for *all* comparisons within the same database.
This ticket is labeled "minor" in severity because of the observation that it impacts hardly any databases in the wild.