[MDEV-33318] ORDER BY COLLATE improperly applied to non-character columns Created: 2024-01-26  Updated: 2024-01-30

Status: In Testing
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.5, 10.6, 10.11, 11.0, 11.1, 11.2, 11.3, 10.6.16, 10.11.6
Fix Version/s: 10.5, 10.6, 10.11, 11.0, 11.1, 11.2

Type: Bug Priority: Major
Reporter: Vladimir "Bob" Zakharychev Assignee: Sergei Golubchik
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-24584 Selecting INT column with COLLATE utf... Closed

 Description   

When explicitly specifying collation on a non-character sort column , the column type is not taken into account and it may be sorted incorrectly. In 10.4 an error was thrown, but in 10.6 and 10.11 no error is thrown but sort results are unpredictable.

create table ct (ts datetime);
insert into ct select now();
-- repeat a few more times
select * from ct order by ts COLLATE utf8_bin;
+---------------------+
| ts                  |
+---------------------+
| 2024-01-26 10:25:18 |
| 2024-01-26 10:25:41 |
| 2024-01-26 10:25:40 |
| 2024-01-26 10:25:38 |
| 2024-01-26 10:25:34 |
| 2024-01-26 10:25:31 |
| 2024-01-26 10:25:28 |
| 2024-01-26 10:25:25 |
| 2024-01-26 10:25:22 |
| 2024-01-26 10:25:44 |
+---------------------+
10 rows in set (0.000 sec)
-- note that timestamps are not actually sorted correctly and come in some pseudo-random order, not even order of insertion.

in 10.4 the same SQL results in

ERROR 1253 (42000): COLLATION 'utf8_bin' is not valid for CHARACTER SET 'latin1'

which is confusing because timestamp columns don't have collation, but at least it hints that there is a problem with the statement. Later versions silently produce wrong results instead.



 Comments   
Comment by Alexander Barkov [ 2024-01-29 ]

Hello serg,

The patch at
https://github.com/MariaDB/server/commit/8fc362f73f1bd14d38a75969b9e546b6d857622a
is almost fine.

I suggest instead of:

> +--echo # End of 10.6 tests
> diff --git a/sql/item_strfunc.cc b/sql/item_strfunc.cc
> index ae855525060d6..cd1020164aba9 100644
> --- a/sql/item_strfunc.cc
> +++ b/sql/item_strfunc.cc
> @@ -3626,7 +3626,8 @@ bool Item_func_set_collation::fix_length_and_dec()
>    }
>    collation.set(m_set_collation, DERIVATION_EXPLICIT,
>                  args[0]->collation.repertoire);
> -  max_length= args[0]->max_length;
> +  max_length= args[0]->max_length * collation.collation->mbmaxlen
> +                         / args[0]->collation.collation->mbmaxlen;
>    return FALSE;
>  }

to use:

  ulonglong char_length= (ulonglong) args[0]->max_char_length() * collation.collation->mbmaxlen;
  fix_char_length_ulonglong(char_length);

Generated at Thu Feb 08 10:38:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.