Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10302

Strings beginning with control characters are "less than" empty string

Details

    • Bug
    • Status: Closed (View Workflow)
    • Minor
    • Resolution: Not a Bug
    • 5.5.47, 10.0.25
    • N/A
    • Character Sets
    • None
    • openSuSE 42, CentOS 7

    Description

      Any non empty string is expected to be larger than the empty string. Contrary to this, SELECT ... WHERE textfield > '' fails where the text data begins with a control character like newline or tab.

      This affects multiple collations (tested with utf8 and latin1 standards), only binary comparison works as expected.

      Reproduce / test:

      select '\ntext' > '', strcmp('\ntext', '');
      +---------------+----------------------+
      | '\ntext' > '' | strcmp('\ntext', '') |
      +---------------+----------------------+
      |             0 |                   -1 |
      +---------------+----------------------+
      

      Any control character at the beginning of the text triggers this behaviour.

      Workaround: use BINARY or *_bin collations or compare textfield <> ''.

      Attachments

        Issue Links

          Activity

            balzer@expeedo.de Michael Balzer created issue -
            balzer@expeedo.de Michael Balzer made changes -
            Field Original Value New Value
            Description Any non empty string is expected to be larger than the empty string. Contrary to this, SELECT ... WHERE textfield > '' fails where the text data begins with a control character like newline or tab.

            This affects multiple collations (tested with utf8 and latin1 standards), only binary comparison works as expected.

            Reproduce / test:

            {{
            select '\ntext' > '', strcmp('\ntext', '');
            +---------------+----------------------+
            | '\ntext' > '' | strcmp('\ntext', '') |
            +---------------+----------------------+
            | 0 | -1 |
            +---------------+----------------------+
            }}

            Any control character at the beginning of the text triggers this behaviour.

            Workaround: use BINARY or *_bin collations or compare textfield <> ''.
            Any non empty string is expected to be larger than the empty string. Contrary to this, SELECT ... WHERE textfield > '' fails where the text data begins with a control character like newline or tab.

            This affects multiple collations (tested with utf8 and latin1 standards), only binary comparison works as expected.

            Reproduce / test:

            {{select '\ntext' > '', strcmp('\ntext', '');
            +---------------+----------------------+
            | '\ntext' > '' | strcmp('\ntext', '') |
            +---------------+----------------------+
            | 0 | -1 |
            +---------------+----------------------+}}

            Any control character at the beginning of the text triggers this behaviour.

            Workaround: use BINARY or *_bin collations or compare textfield <> ''.
            balzer@expeedo.de Michael Balzer made changes -
            Description Any non empty string is expected to be larger than the empty string. Contrary to this, SELECT ... WHERE textfield > '' fails where the text data begins with a control character like newline or tab.

            This affects multiple collations (tested with utf8 and latin1 standards), only binary comparison works as expected.

            Reproduce / test:

            {{select '\ntext' > '', strcmp('\ntext', '');
            +---------------+----------------------+
            | '\ntext' > '' | strcmp('\ntext', '') |
            +---------------+----------------------+
            | 0 | -1 |
            +---------------+----------------------+}}

            Any control character at the beginning of the text triggers this behaviour.

            Workaround: use BINARY or *_bin collations or compare textfield <> ''.
            Any non empty string is expected to be larger than the empty string. Contrary to this, SELECT ... WHERE textfield > '' fails where the text data begins with a control character like newline or tab.

            This affects multiple collations (tested with utf8 and latin1 standards), only binary comparison works as expected.

            Reproduce / test:


            {noformat}
            select '\ntext' > '', strcmp('\ntext', '');
            +---------------+----------------------+
            | '\ntext' > '' | strcmp('\ntext', '') |
            +---------------+----------------------+
            | 0 | -1 |
            +---------------+----------------------+
            {noformat}


            Any control character at the beginning of the text triggers this behaviour.

            Workaround: use BINARY or *_bin collations or compare textfield <> ''.

            This is expected behavior. MariaDB implements what SQL standard calls "PAD SPACE" collations. It means that when you compare two strings, the shorter is padded with spaces to the length of the longer one. That's why you get these results. Binary collation does not pad with spaces (in a binary collation the string is merely a sequence of bytes, it does not have a concept of a "space" or a "letter").

            We plan to add support for NO PAD collations in 10.2, see MDEV-9711

            serg Sergei Golubchik added a comment - This is expected behavior. MariaDB implements what SQL standard calls "PAD SPACE" collations. It means that when you compare two strings, the shorter is padded with spaces to the length of the longer one. That's why you get these results. Binary collation does not pad with spaces (in a binary collation the string is merely a sequence of bytes, it does not have a concept of a "space" or a "letter"). We plan to add support for NO PAD collations in 10.2, see MDEV-9711
            serg Sergei Golubchik made changes -
            Fix Version/s N/A [ 14700 ]
            Assignee Sergei Golubchik [ serg ]
            Resolution Not a Bug [ 6 ]
            Status Open [ 1 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -

            Thanks for your explanation and pointers, Sergei!

            balzer@expeedo.de Michael Balzer added a comment - Thanks for your explanation and pointers, Sergei!
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 76268 ] MariaDB v4 [ 150562 ]

            People

              serg Sergei Golubchik
              balzer@expeedo.de Michael Balzer
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.