View Ticket
Ticket Hash: 18ab5da2c05ad57d7f9d79c41d3138b141378543
Title: Issue with DISTINCT and COLLATE
Status: Closed Type: Not_A_Bug
Severity: Critical Priority: Immediate
Subsystem: Unknown Resolution: Rejected
Last Modified: 2019-07-29 06:05:34
Version Found In: 3.29.0
User Comments:
mrigger added on 2019-07-28 21:53:44:

Consider the test case below:

INSERT INTO t0(c0) VALUES (' ');
SELECT DISTINCT(t0.c0) != '' FROM t0; -- expected: 1, actual: 0

I would expect the SELECT to fetch a row with a value 1, since the DISTINCT(t0.c0) should be equal to ' ', and ' ' != '' should be true. However, the result is 0. I suspect that DISTINCT(t0.c0) assumes the COLLATE RTRIM. This is unexpected, since the documentation mentions that only the plus operator and CAST in combination with a column still assume the column's COLLATE, rather than having no implicit COLLATE:

If either operand is a column, then the collating function of that column is used with precedence to the left operand. For the purposes of the previous sentence, a column name preceded by one or more unary "+" operators and/or CAST operators is still considered a column name.

drh added on 2019-07-29 06:05:34:

The DISTINCT keyword is a modifier on SELECT. It is not a function that is applied to parts of the result set. The query actually parses out like this:

SELECT DISTINCT ((t0.c0) != '') FROM t0;

Hence, the RTRIM collating sequence does apply to the LHS of the comparison, and the result is 0.