Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34611

INNER JOIN returns incorrect results

Details

    Description

      Description:
      When I use INNER JOIN to join two tables, I get the wrong result.

      How to repeat:

      CREATE TABLE t0(c0 TINYTEXT) ;
      CREATE TABLE t1 LIKE t0;
      CREATE INDEX i1 ON t0(c0(1) ASC);
      INSERT INTO t1(c0) VALUES(19), (6);
      INSERT INTO t0(c0) VALUES("㏫"), (61);
      INSERT INTO t0(c0) VALUES(29);
      INSERT INTO t0(c0) VALUES(99);
       
      SELECT * FROM t0; -- get [㏫,61,29,99]
      SELECT * FROM t1; -- get [19,6]
       
      SELECT ALL t1.c0 AS ref0 FROM t1 INNER JOIN t0 ON t1.c0 >= t0.c0; -- get[6,6], but expect [19,6,6]
      

      Suggested fix:
      The problem may lie in the optimization of INNER JOIN by storage engine InnoDB.

      Attachments

        Issue Links

          Activity

            bar Alexander Barkov added a comment - - edited

            Verified as described:

            DROP TABLE IF EXISTS t0,t1;
            CREATE TABLE t0(c0 TINYTEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci);
            CREATE TABLE t1 LIKE t0;
            CREATE INDEX i1 ON t0(c0(1) ASC);
            INSERT INTO t1(c0) VALUES(19), (6);
            INSERT INTO t0(c0) VALUES("㏫"), (61);
            INSERT INTO t0(c0) VALUES(29);
            INSERT INTO t0(c0) VALUES(99);
            SELECT ALL t1.c0, t0.c0, t1.c0>=t0.c0 AS cond FROM t1 JOIN t0;
            

            +------+------+------+
            | c0   | c0   | cond |
            +------+------+------+
            | 19   | ㏫   |    1 |
            | 6    | ㏫   |    1 |
            | 19   | 61   |    0 |
            | 6    | 61   |    0 |
            | 19   | 29   |    0 |
            | 6    | 29   |    1 |
            | 19   | 99   |    0 |
            | 6    | 99   |    0 |
            +------+------+------+
            

            Looks good so far.

            Not I run an INNER join with IGNORE INDEX (i1):

            SELECT ALL t1.c0 AS ref0 FROM t1 INNER JOIN t0 IGNORE INDEX (i1) ON t1.c0 >= t0.c0;
            +------+
            | ref0 |
            +------+
            | 19   |
            | 6    |
            | 6    |
            +------+
            

            Looks good so far again. It returned all rows which had cond=1 in the previous query.

            Now I run the same INNER joins wihtout the IGNORE:

            SELECT ALL t1.c0 AS ref0 FROM t1 INNER JOIN t0 ON t1.c0 >= t0.c0;
            +------+
            | ref0 |
            +------+
            | 6    |
            | 6    |
            +------+
            

            Looks wrong. The record 19 disappeared.

            The problem is also repeatable with utf8mb4_unicode_ci.

            bar Alexander Barkov added a comment - - edited Verified as described: DROP TABLE IF EXISTS t0,t1; CREATE TABLE t0(c0 TINYTEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci); CREATE TABLE t1 LIKE t0; CREATE INDEX i1 ON t0(c0(1) ASC ); INSERT INTO t1(c0) VALUES (19), (6); INSERT INTO t0(c0) VALUES ( "㏫" ), (61); INSERT INTO t0(c0) VALUES (29); INSERT INTO t0(c0) VALUES (99); SELECT ALL t1.c0, t0.c0, t1.c0>=t0.c0 AS cond FROM t1 JOIN t0; +------+------+------+ | c0 | c0 | cond | +------+------+------+ | 19 | ㏫ | 1 | | 6 | ㏫ | 1 | | 19 | 61 | 0 | | 6 | 61 | 0 | | 19 | 29 | 0 | | 6 | 29 | 1 | | 19 | 99 | 0 | | 6 | 99 | 0 | +------+------+------+ Looks good so far. Not I run an INNER join with IGNORE INDEX (i1): SELECT ALL t1.c0 AS ref0 FROM t1 INNER JOIN t0 IGNORE INDEX (i1) ON t1.c0 >= t0.c0; +------+ | ref0 | +------+ | 19 | | 6 | | 6 | +------+ Looks good so far again. It returned all rows which had cond=1 in the previous query. Now I run the same INNER joins wihtout the IGNORE: SELECT ALL t1.c0 AS ref0 FROM t1 INNER JOIN t0 ON t1.c0 >= t0.c0; +------+ | ref0 | +------+ | 6 | | 6 | +------+ Looks wrong. The record 19 disappeared. The problem is also repeatable with utf8mb4_unicode_ci.
            Johnston Rex Johnston added a comment - - edited

            continuing the above analysis

            explain format=json SELECT t1.c0 as t1c0, t0.c0 as t0c0 FROM t1  JOIN t0  ON t1.c0 >= t0.c0;
            

            we see that the query plan is collect rows from t0 with data from t1 via a range-checked-for-each-record algorithm.

            {
              "query_block": {
                "select_id": 1,
                "cost": 0.02187429,
                "nested_loop": [
                  {
                    "table": {
                      "table_name": "t1",
                      "access_type": "ALL",
                      "loops": 1,
                      "rows": 2,
                      "cost": 0.01034841,
                      "filtered": 100
                    }
                  },
                  {
                    "range-checked-for-each-record": {
                      "keys": ["i1"],
                      "table": {
                        "table_name": "t0",
                        "access_type": "ALL",
                        "possible_keys": ["i1"],
                        "loops": 2,
                        "rows": 4,
                        "cost": 0.01152588,
                        "filtered": 100
                      }
                    }
                  }
                ]
              }
            }
            

            During the SQL_SELECT::test_quick_select call within JOIN::exec ... evaluate_join_record we construct a SEL_TREE on the Item representing the join condition t1.c0 > t0.c0.

            The first row from t1 is this

            (Field *){t0.c0=19}
            

            As this is a TEXT field, the Field object is a Field_blob and a call to Field_blob::get_key_image_itRAW() is meant to write into a passed in buffer a representation of the contents of this blob ("19").

            Here we see some additional checks on the actual length of this field taking into account the character set

              size_t local_char_length= length / mbmaxlen();
              local_char_length= field_charset()->charpos(blob, blob + blob_length,
                                                          local_char_length);
              set_if_smaller(blob_length, local_char_length);
            

            the field length is 3, this is the key length as defined by the index

            CREATE INDEX i1 ON t0(c0(1) ASC);
            

            if we use a full length index, we do not see this problem.
            mbmaxlen() is 3, so local_char_length is initialized with the value of 1.

            The next line we search for the last character in the string, in a charset aware way, and return the number of characters before this character.
            This is again 1, searching for '9' in the string "19" and seeing only '1' prior to it.
            So local_char_length is calculated as 1, the buffer is filled like this

            (gdb) x/4cb buff
            0x7fa68420e219: 1 '\001'        0 '\000'        49 '1'  0 '\000'
            

            instead of what we would expect

            {1, null, '1', '9', null}

            with a full length index.

            This buffer is then used as the key value to construct the SEL_TREE.

            Using the collation utf8mb4_unicode_ci, we can observe that while "19'"is greater than the first value from t0 we are to compare it with

            MariaDB [test]> select cast('19' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast('㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci);
            +--------------------------------------------------------------------------------------------------------------------------------------------+
            | cast('19' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast('㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci)  |
            +--------------------------------------------------------------------------------------------------------------------------------------------+
            |                                                                                                                                          1 |
            +--------------------------------------------------------------------------------------------------------------------------------------------+
            1 row in set (0.003 sec)
            

            if we truncate "19" to just "1"

            MariaDB [test]> select cast('1' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast('㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci);
            +-------------------------------------------------------------------------------------------------------------------------------------------+
            | cast('1' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast('㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci)  |
            +-------------------------------------------------------------------------------------------------------------------------------------------+
            |                                                                                                                                         0 |
            +-------------------------------------------------------------------------------------------------------------------------------------------+
            1 row in set (0.003 sec)
            

            we see the "1" is NOT greater, so the range scan indicates that multi_range_read_next(), part of the range-checked-for-each-record algorithm should skip over this value during join execution. As we are executing a LEFT join, we reject the row from t0 as not matching t1 and discard this row.

            I'm concluding that, because of the nature of the index and the collation specified, this is not a bug, but an expected behaviour.

            Johnston Rex Johnston added a comment - - edited continuing the above analysis explain format=json SELECT t1.c0 as t1c0, t0.c0 as t0c0 FROM t1 JOIN t0 ON t1.c0 >= t0.c0; we see that the query plan is collect rows from t0 with data from t1 via a range-checked-for-each-record algorithm. { "query_block": { "select_id": 1, "cost": 0.02187429, "nested_loop": [ { "table": { "table_name": "t1", "access_type": "ALL", "loops": 1, "rows": 2, "cost": 0.01034841, "filtered": 100 } }, { "range-checked-for-each-record": { "keys": ["i1"], "table": { "table_name": "t0", "access_type": "ALL", "possible_keys": ["i1"], "loops": 2, "rows": 4, "cost": 0.01152588, "filtered": 100 } } } ] } } During the SQL_SELECT::test_quick_select call within JOIN::exec ... evaluate_join_record we construct a SEL_TREE on the Item representing the join condition t1.c0 > t0.c0 . The first row from t1 is this (Field *){t0.c0=19} As this is a TEXT field, the Field object is a Field_blob and a call to Field_blob::get_key_image_itRAW() is meant to write into a passed in buffer a representation of the contents of this blob ("19"). Here we see some additional checks on the actual length of this field taking into account the character set size_t local_char_length= length / mbmaxlen(); local_char_length= field_charset()->charpos(blob, blob + blob_length, local_char_length); set_if_smaller(blob_length, local_char_length); the field length is 3, this is the key length as defined by the index CREATE INDEX i1 ON t0(c0(1) ASC ); if we use a full length index, we do not see this problem. mbmaxlen() is 3, so local_char_length is initialized with the value of 1. The next line we search for the last character in the string, in a charset aware way, and return the number of characters before this character. This is again 1, searching for '9' in the string "19" and seeing only '1' prior to it. So local_char_length is calculated as 1, the buffer is filled like this (gdb) x/4cb buff 0x7fa68420e219: 1 '\001' 0 '\000' 49 '1' 0 '\000' instead of what we would expect {1, null, '1', '9', null} with a full length index. This buffer is then used as the key value to construct the SEL_TREE. Using the collation utf8mb4_unicode_ci, we can observe that while "19'"is greater than the first value from t0 we are to compare it with MariaDB [test]> select cast ( '19' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast ( '㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci); + --------------------------------------------------------------------------------------------------------------------------------------------+ | cast ( '19' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast ( '㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci) | + --------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | + --------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.003 sec) if we truncate "19" to just "1" MariaDB [test]> select cast ( '1' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast ( '㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci); + -------------------------------------------------------------------------------------------------------------------------------------------+ | cast ( '1' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast ( '㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci) | + -------------------------------------------------------------------------------------------------------------------------------------------+ | 0 | + -------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.003 sec) we see the "1" is NOT greater , so the range scan indicates that multi_range_read_next(), part of the range-checked-for-each-record algorithm should skip over this value during join execution. As we are executing a LEFT join, we reject the row from t0 as not matching t1 and discard this row. I'm concluding that, because of the nature of the index and the collation specified, this is not a bug, but an expected behaviour.

            This is definitely a bug. The correct result, as shown above, is

            +------+
            | 19   |
            | 6    |
            | 6    |
            +------+
            

            anything else is wrong.
            The analysis above shows precisely where the query execution diverges from the correct path and why it delivers incorrect result.

            serg Sergei Golubchik added a comment - This is definitely a bug. The correct result, as shown above, is +------+ | 19 | | 6 | | 6 | +------+ anything else is wrong. The analysis above shows precisely where the query execution diverges from the correct path and why it delivers incorrect result.
            Johnston Rex Johnston added a comment -

            OK. So the any partial index on a multibyte character set column cannot be used for the purposes of, at the very least, comparing 2 keys?

            Johnston Rex Johnston added a comment - OK. So the any partial index on a multibyte character set column cannot be used for the purposes of, at the very least, comparing 2 keys?

            I searched for the symbol in gucharmap and found the following:

            ㏫ U+33EB IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY TWELVE

            Compatibility decomposition: 1 U+0031 DIGIT ONE + 2 U+0032 DIGIT TWO + 日 U+65E5 CJK UNIFIED IDEOGRAPH-65E5

            There are also other examples where a single Unicode code point corresponds to a sequence of ASCII. In the source file storage/innobase/ibuf/ibuf0ibuf.cc (up to MariaDB Server 10.11) you can find a couple of examples, which I think were originally provided by bar:

            		/* A collation may identify values that differ in
            		storage length.
            		Some examples (1 or 2 bytes):
            		utf8_turkish_ci: I = U+0131 LATIN SMALL LETTER DOTLESS I
            		utf8_general_ci: S = U+00DF LATIN SMALL LETTER SHARP S
            		utf8_general_ci: A = U+00E4 LATIN SMALL LETTER A WITH DIAERESIS
             
            		latin1_german2_ci: SS = U+00DF LATIN SMALL LETTER SHARP S
             
            		Examples of a character (3-byte UTF-8 sequence)
            		identified with 2 or 4 characters (1-byte UTF-8 sequences):
             
            		utf8_unicode_ci: 'II' = U+2171 SMALL ROMAN NUMERAL TWO
            		utf8_unicode_ci: '(10)' = U+247D PARENTHESIZED NUMBER TEN
            		*/
            

            I would suggest to test also with {{'ⅱ'}} and 'ii', as well as {{'⑽'}} and '(10)', and of course 'ss' and 'ß' or 'ẞ' when the multi-character sequence is being clipped by a column prefix index.

            If I understood it correctly, there should be no issue with multi-byte characters that correspond to a single character, because column prefix indexes use the correct number of bytes for encoding the glyphs. A classic example involves the two-byte Turkish İ (U+0130) and ı (U+0131). In a suitable collation such as utf8_turkish_ci, I might expect İ=i and I=ı, but I≠i.

            marko Marko Mäkelä added a comment - I searched for the symbol in gucharmap and found the following: ㏫ U+33EB IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY TWELVE … Compatibility decomposition: 1 U+0031 DIGIT ONE + 2 U+0032 DIGIT TWO + 日 U+65E5 CJK UNIFIED IDEOGRAPH-65E5 There are also other examples where a single Unicode code point corresponds to a sequence of ASCII. In the source file storage/innobase/ibuf/ibuf0ibuf.cc (up to MariaDB Server 10.11) you can find a couple of examples, which I think were originally provided by bar : /* A collation may identify values that differ in storage length. Some examples (1 or 2 bytes): utf8_turkish_ci: I = U+0131 LATIN SMALL LETTER DOTLESS I utf8_general_ci: S = U+00DF LATIN SMALL LETTER SHARP S utf8_general_ci: A = U+00E4 LATIN SMALL LETTER A WITH DIAERESIS   latin1_german2_ci: SS = U+00DF LATIN SMALL LETTER SHARP S   Examples of a character (3-byte UTF-8 sequence) identified with 2 or 4 characters (1-byte UTF-8 sequences):   utf8_unicode_ci: 'II' = U+2171 SMALL ROMAN NUMERAL TWO utf8_unicode_ci: '(10)' = U+247D PARENTHESIZED NUMBER TEN */ I would suggest to test also with {{'ⅱ'}} and 'ii' , as well as {{'⑽'}} and '(10)' , and of course 'ss' and 'ß' or 'ẞ' when the multi-character sequence is being clipped by a column prefix index. If I understood it correctly, there should be no issue with multi-byte characters that correspond to a single character, because column prefix indexes use the correct number of bytes for encoding the glyphs. A classic example involves the two-byte Turkish İ (U+0130) and ı (U+0131). In a suitable collation such as utf8_turkish_ci , I might expect İ=i and I=ı, but I≠i.

            Take-aways from optimizer call:

            Need to fix something that happens inside Item_...::get_mm_tree() calls for such cases. The generated range needs to be wider.

            psergei Sergei Petrunia added a comment - Take-aways from optimizer call: Need to fix something that happens inside Item_...::get_mm_tree() calls for such cases. The generated range needs to be wider.
            bar Alexander Barkov added a comment - - edited

            The optimizer does not work well with prefix keys on collations having so called expansions (i.e. characters which have multiple weights).
            '㏫' in collations utf8mb3_unicode_ci and uca1400_ai_ci have multiple weights.
            It's a single character and it has four two-byte weights:

            SET NAMES utf8mb4;
            SELECT HEX(WEIGHT_STRING('㏫' COLLATE utf8mb4_uca1400_ai_ci));
            

            +---------------------------------------------------------+
            | HEX(WEIGHT_STRING('㏫' COLLATE utf8mb4_uca1400_ai_ci))  |
            +---------------------------------------------------------+
            | 206C206DFB40E5E5                                        |
            +---------------------------------------------------------+
            

            See also this report for simpler test cases: MDEV-8626. MDEV-8625 .

            This problem existed for years. At least since MySQL-4.1 (in latin1_german2_ci). But we never had time to fix it.
            Now with utf8mb4_uca1400_ai_ci being the default, its priority should be changed to more important. Critical?

            For a prefix KEY(column(1)), the optimizer currently effectively translates the condition

            column1 = '㏫'
            

            into

            LEFT(column1,1) = '㏫'
            

            , which is wrong.

            It should instead translate

            • either to:

              LEFT(column1,1) = '1'
              

              This looks too hard to implement in the collation library, and may be even impossible for some characters.

            • or to:

              LEFT(column1,1)
                BETWEEN 'a character which is slightly less than ㏫'
                    AND 'a character which is slightly greater than ㏫'
              

              This way certainly needs some changes on the optimizer level. I'm not sure if changes in the collation library are needed. Are they?

            bar Alexander Barkov added a comment - - edited The optimizer does not work well with prefix keys on collations having so called expansions (i.e. characters which have multiple weights). '㏫' in collations utf8mb3_unicode_ci and uca1400_ai_ci have multiple weights. It's a single character and it has four two-byte weights: SET NAMES utf8mb4; SELECT HEX(WEIGHT_STRING( '㏫' COLLATE utf8mb4_uca1400_ai_ci)); +---------------------------------------------------------+ | HEX(WEIGHT_STRING('㏫' COLLATE utf8mb4_uca1400_ai_ci)) | +---------------------------------------------------------+ | 206C206DFB40E5E5 | +---------------------------------------------------------+ See also this report for simpler test cases: MDEV-8626 . MDEV-8625 . This problem existed for years. At least since MySQL-4.1 (in latin1_german2_ci). But we never had time to fix it. Now with utf8mb4_uca1400_ai_ci being the default, its priority should be changed to more important. Critical? For a prefix KEY(column(1)), the optimizer currently effectively translates the condition column1 = '㏫' into LEFT (column1,1) = '㏫' , which is wrong. It should instead translate either to: LEFT (column1,1) = '1' This looks too hard to implement in the collation library, and may be even impossible for some characters. or to: LEFT (column1,1) BETWEEN 'a character which is slightly less than ㏫' AND 'a character which is slightly greater than ㏫' This way certainly needs some changes on the optimizer level. I'm not sure if changes in the collation library are needed. Are they?

            People

              Johnston Rex Johnston
              Chenglin Liang Chenglin Liang
              Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.