[MDEV-10302] Strings beginning with control characters are "less than" empty string Created: 2016-06-29  Updated: 2016-06-30  Resolved: 2016-06-29

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 5.5.47, 10.0.25
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Michael Balzer Assignee: Sergei Golubchik
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

openSuSE 42, CentOS 7


Issue Links:
Relates
relates to MDEV-9711 NO PAD collations Closed

 Description   

Any non empty string is expected to be larger than the empty string. Contrary to this, SELECT ... WHERE textfield > '' fails where the text data begins with a control character like newline or tab.

This affects multiple collations (tested with utf8 and latin1 standards), only binary comparison works as expected.

Reproduce / test:

select '\ntext' > '', strcmp('\ntext', '');
+---------------+----------------------+
| '\ntext' > '' | strcmp('\ntext', '') |
+---------------+----------------------+
|             0 |                   -1 |
+---------------+----------------------+

Any control character at the beginning of the text triggers this behaviour.

Workaround: use BINARY or *_bin collations or compare textfield <> ''.



 Comments   
Comment by Sergei Golubchik [ 2016-06-29 ]

This is expected behavior. MariaDB implements what SQL standard calls "PAD SPACE" collations. It means that when you compare two strings, the shorter is padded with spaces to the length of the longer one. That's why you get these results. Binary collation does not pad with spaces (in a binary collation the string is merely a sequence of bytes, it does not have a concept of a "space" or a "letter").

We plan to add support for NO PAD collations in 10.2, see MDEV-9711

Comment by Michael Balzer [ 2016-06-30 ]

Thanks for your explanation and pointers, Sergei!

Generated at Thu Feb 08 07:41:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.