[MDEV-34611] INNER JOIN returns incorrect results - Jira

Details

Type: Bug
Status: Stalled (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 11.6.0
Fix Version/s: 10.5, 10.6, 10.11, 11.4
Component/s: Optimizer, Storage Engine - InnoDB
Labels:
None
Environment:
Ubuntu 20.04.4 LTS (x86)
Kernel Version: 5.15.0-113-generic

Description

Description:
When I use INNER JOIN to join two tables, I get the wrong result.

How to repeat:

CREATE TABLE t0(c0 TINYTEXT) ;

CREATE TABLE t1 LIKE t0;

CREATE INDEX i1 ON t0(c0(1) ASC);

INSERT INTO t1(c0) VALUES(19), (6);

INSERT INTO t0(c0) VALUES("㏫"), (61);

INSERT INTO t0(c0) VALUES(29);

INSERT INTO t0(c0) VALUES(99);

SELECT * FROM t0; -- get [㏫,61,29,99]

SELECT * FROM t1; -- get [19,6]

SELECT ALL t1.c0 AS ref0 FROM t1 INNER JOIN t0 ON t1.c0 >= t0.c0; -- get[6,6], but expect [19,6,6]

Suggested fix:
The problem may lie in the optimization of INNER JOIN by storage engine InnoDB.

Attachments

Issue Links

relates to

MDEV-8625 Bad result set with ignorable characters when using a prefix key

Confirmed

MDEV-8626 Bad result set with expansions when using a prefix key

Confirmed

MDEV-20884 Different resultset when using prefix index and utf8mb4

Confirmed

MDEV-34006 Incorrect query result

Confirmed

Activity

Ascending order - Click to sort in descending order

Alexander Barkov added a comment - 2024-08-19 05:21 - edited

Verified as described:

DROP TABLE IF EXISTS t0,t1;

CREATE TABLE t0(c0 TINYTEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci);

CREATE TABLE t1 LIKE t0;

CREATE INDEX i1 ON t0(c0(1) ASC);

INSERT INTO t1(c0) VALUES(19), (6);

INSERT INTO t0(c0) VALUES("㏫"), (61);

INSERT INTO t0(c0) VALUES(29);

INSERT INTO t0(c0) VALUES(99);

SELECT ALL t1.c0, t0.c0, t1.c0>=t0.c0 AS cond FROM t1 JOIN t0;

+------+------+------+

| c0   | c0   | cond |

+------+------+------+

| 19   | ㏫   |    1 |

| 6    | ㏫   |    1 |

| 19   | 61   |    0 |

| 6    | 61   |    0 |

| 19   | 29   |    0 |

| 6    | 29   |    1 |

| 19   | 99   |    0 |

| 6    | 99   |    0 |

+------+------+------+

Looks good so far.

Not I run an INNER join with IGNORE INDEX (i1):

SELECT ALL t1.c0 AS ref0 FROM t1 INNER JOIN t0 IGNORE INDEX (i1) ON t1.c0 >= t0.c0;

+------+

| ref0 |

+------+

| 19   |

| 6    |

| 6    |

+------+

Looks good so far again. It returned all rows which had cond=1 in the previous query.

Now I run the same INNER joins wihtout the IGNORE:

SELECT ALL t1.c0 AS ref0 FROM t1 INNER JOIN t0 ON t1.c0 >= t0.c0;

+------+

| ref0 |

+------+

| 6    |

| 6    |

+------+

Looks wrong. The record 19 disappeared.

The problem is also repeatable with utf8mb4_unicode_ci.

Alexander Barkov added a comment - 2024-08-19 05:21 - edited Verified as described: DROP TABLE IF EXISTS t0,t1; CREATE TABLE t0(c0 TINYTEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci); CREATE TABLE t1 LIKE t0; CREATE INDEX i1 ON t0(c0(1) ASC ); INSERT INTO t1(c0) VALUES (19), (6); INSERT INTO t0(c0) VALUES ( "㏫" ), (61); INSERT INTO t0(c0) VALUES (29); INSERT INTO t0(c0) VALUES (99); SELECT ALL t1.c0, t0.c0, t1.c0>=t0.c0 AS cond FROM t1 JOIN t0; +------+------+------+ | c0 | c0 | cond | +------+------+------+ | 19 | ㏫ | 1 | | 6 | ㏫ | 1 | | 19 | 61 | 0 | | 6 | 61 | 0 | | 19 | 29 | 0 | | 6 | 29 | 1 | | 19 | 99 | 0 | | 6 | 99 | 0 | +------+------+------+ Looks good so far. Not I run an INNER join with IGNORE INDEX (i1): SELECT ALL t1.c0 AS ref0 FROM t1 INNER JOIN t0 IGNORE INDEX (i1) ON t1.c0 >= t0.c0; +------+ | ref0 | +------+ | 19 | | 6 | | 6 | +------+ Looks good so far again. It returned all rows which had cond=1 in the previous query. Now I run the same INNER joins wihtout the IGNORE: SELECT ALL t1.c0 AS ref0 FROM t1 INNER JOIN t0 ON t1.c0 >= t0.c0; +------+ | ref0 | +------+ | 6 | | 6 | +------+ Looks wrong. The record 19 disappeared. The problem is also repeatable with utf8mb4_unicode_ci.

Rex Johnston added a comment - 2024-08-19 18:35 - edited

continuing the above analysis

explain format=json SELECT t1.c0 as t1c0, t0.c0 as t0c0 FROM t1  JOIN t0  ON t1.c0 >= t0.c0;

we see that the query plan is collect rows from t0 with data from t1 via a range-checked-for-each-record algorithm.

  "query_block": {

    "select_id": 1,

    "cost": 0.02187429,

    "nested_loop": [

        "table": {

          "table_name": "t1",

          "access_type": "ALL",

          "loops": 1,

          "rows": 2,

          "cost": 0.01034841,

          "filtered": 100

},

        "range-checked-for-each-record": {

          "keys": ["i1"],

          "table": {

            "table_name": "t0",

            "access_type": "ALL",

            "possible_keys": ["i1"],

            "loops": 2,

            "rows": 4,

            "cost": 0.01152588,

            "filtered": 100

During the SQL_SELECT::test_quick_select call within JOIN::exec ... evaluate_join_record we construct a SEL_TREE on the Item representing the join condition t1.c0 > t0.c0.

The first row from t1 is this

(Field *){t0.c0=19}

As this is a TEXT field, the Field object is a Field_blob and a call to Field_blob::get_key_image_itRAW() is meant to write into a passed in buffer a representation of the contents of this blob ("19").

Here we see some additional checks on the actual length of this field taking into account the character set

  size_t local_char_length= length / mbmaxlen();

  local_char_length= field_charset()->charpos(blob, blob + blob_length,

                                              local_char_length);

  set_if_smaller(blob_length, local_char_length);

the field length is 3, this is the key length as defined by the index

CREATE INDEX i1 ON t0(c0(1) ASC);

if we use a full length index, we do not see this problem.
mbmaxlen() is 3, so local_char_length is initialized with the value of 1.

The next line we search for the last character in the string, in a charset aware way, and return the number of characters before this character.
This is again 1, searching for '9' in the string "19" and seeing only '1' prior to it.
So local_char_length is calculated as 1, the buffer is filled like this

(gdb) x/4cb buff

0x7fa68420e219: 1 '\001'        0 '\000'        49 '1'  0 '\000'

instead of what we would expect

{1, null, '1', '9', null}

with a full length index.

This buffer is then used as the key value to construct the SEL_TREE.

Using the collation utf8mb4_unicode_ci, we can observe that while "19'"is greater than the first value from t0 we are to compare it with

MariaDB [test]> select cast('19' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast('㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci);

+--------------------------------------------------------------------------------------------------------------------------------------------+

| cast('19' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast('㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci)  |

+--------------------------------------------------------------------------------------------------------------------------------------------+

|                                                                                                                                          1 |

+--------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.003 sec)

if we truncate "19" to just "1"

MariaDB [test]> select cast('1' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast('㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci);

+-------------------------------------------------------------------------------------------------------------------------------------------+

| cast('1' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast('㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci)  |

+-------------------------------------------------------------------------------------------------------------------------------------------+

|                                                                                                                                         0 |

+-------------------------------------------------------------------------------------------------------------------------------------------+

1 row in set (0.003 sec)

we see the "1" is NOT greater, so the range scan indicates that multi_range_read_next(), part of the range-checked-for-each-record algorithm should skip over this value during join execution. As we are executing a LEFT join, we reject the row from t0 as not matching t1 and discard this row.

I'm concluding that, because of the nature of the index and the collation specified, this is not a bug, but an expected behaviour.

Rex Johnston added a comment - 2024-08-19 18:35 - edited continuing the above analysis explain format=json SELECT t1.c0 as t1c0, t0.c0 as t0c0 FROM t1 JOIN t0 ON t1.c0 >= t0.c0; we see that the query plan is collect rows from t0 with data from t1 via a range-checked-for-each-record algorithm. { "query_block": { "select_id": 1, "cost": 0.02187429, "nested_loop": [ { "table": { "table_name": "t1", "access_type": "ALL", "loops": 1, "rows": 2, "cost": 0.01034841, "filtered": 100 } }, { "range-checked-for-each-record": { "keys": ["i1"], "table": { "table_name": "t0", "access_type": "ALL", "possible_keys": ["i1"], "loops": 2, "rows": 4, "cost": 0.01152588, "filtered": 100 } } } ] } } During the SQL_SELECT::test_quick_select call within JOIN::exec ... evaluate_join_record we construct a SEL_TREE on the Item representing the join condition t1.c0 > t0.c0 . The first row from t1 is this (Field *){t0.c0=19} As this is a TEXT field, the Field object is a Field_blob and a call to Field_blob::get_key_image_itRAW() is meant to write into a passed in buffer a representation of the contents of this blob ("19"). Here we see some additional checks on the actual length of this field taking into account the character set size_t local_char_length= length / mbmaxlen(); local_char_length= field_charset()->charpos(blob, blob + blob_length, local_char_length); set_if_smaller(blob_length, local_char_length); the field length is 3, this is the key length as defined by the index CREATE INDEX i1 ON t0(c0(1) ASC ); if we use a full length index, we do not see this problem. mbmaxlen() is 3, so local_char_length is initialized with the value of 1. The next line we search for the last character in the string, in a charset aware way, and return the number of characters before this character. This is again 1, searching for '9' in the string "19" and seeing only '1' prior to it. So local_char_length is calculated as 1, the buffer is filled like this (gdb) x/4cb buff 0x7fa68420e219: 1 '\001' 0 '\000' 49 '1' 0 '\000' instead of what we would expect {1, null, '1', '9', null} with a full length index. This buffer is then used as the key value to construct the SEL_TREE. Using the collation utf8mb4_unicode_ci, we can observe that while "19'"is greater than the first value from t0 we are to compare it with MariaDB [test]> select cast ( '19' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast ( '㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci); + --------------------------------------------------------------------------------------------------------------------------------------------+ | cast ( '19' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast ( '㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci) | + --------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | + --------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.003 sec) if we truncate "19" to just "1" MariaDB [test]> select cast ( '1' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast ( '㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci); + -------------------------------------------------------------------------------------------------------------------------------------------+ | cast ( '1' as char character set utf8mb3 collate utf8mb3_unicode_ci) > cast ( '㏫' as char character set utf8mb3 collate utf8mb3_unicode_ci) | + -------------------------------------------------------------------------------------------------------------------------------------------+ | 0 | + -------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.003 sec) we see the "1" is NOT greater , so the range scan indicates that multi_range_read_next(), part of the range-checked-for-each-record algorithm should skip over this value during join execution. As we are executing a LEFT join, we reject the row from t0 as not matching t1 and discard this row. I'm concluding that, because of the nature of the index and the collation specified, this is not a bug, but an expected behaviour.

Sergei Golubchik added a comment - 2024-08-19 22:03

This is definitely a bug. The correct result, as shown above, is

+------+

| 19   |

| 6    |

| 6    |

+------+

anything else is wrong.
The analysis above shows precisely where the query execution diverges from the correct path and why it delivers incorrect result.

Sergei Golubchik added a comment - 2024-08-19 22:03 This is definitely a bug. The correct result, as shown above, is +------+ | 19 | | 6 | | 6 | +------+ anything else is wrong. The analysis above shows precisely where the query execution diverges from the correct path and why it delivers incorrect result.

Rex Johnston added a comment - 2024-08-19 22:41

OK. So the any partial index on a multibyte character set column cannot be used for the purposes of, at the very least, comparing 2 keys?

Rex Johnston added a comment - 2024-08-19 22:41 OK. So the any partial index on a multibyte character set column cannot be used for the purposes of, at the very least, comparing 2 keys?

Marko Mäkelä added a comment - 2024-08-20 05:39

I searched for the symbol in gucharmap and found the following:

㏫ U+33EB IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY TWELVE
…
Compatibility decomposition: 1 U+0031 DIGIT ONE + 2 U+0032 DIGIT TWO + 日 U+65E5 CJK UNIFIED IDEOGRAPH-65E5

There are also other examples where a single Unicode code point corresponds to a sequence of ASCII. In the source file storage/innobase/ibuf/ibuf0ibuf.cc (up to MariaDB Server 10.11) you can find a couple of examples, which I think were originally provided by bar:

		/* A collation may identify values that differ in

		storage length.

		Some examples (1 or 2 bytes):

		utf8_turkish_ci: I = U+0131 LATIN SMALL LETTER DOTLESS I

		utf8_general_ci: S = U+00DF LATIN SMALL LETTER SHARP S

		utf8_general_ci: A = U+00E4 LATIN SMALL LETTER A WITH DIAERESIS

		latin1_german2_ci: SS = U+00DF LATIN SMALL LETTER SHARP S

		Examples of a character (3-byte UTF-8 sequence)

		identified with 2 or 4 characters (1-byte UTF-8 sequences):

		utf8_unicode_ci: 'II' = U+2171 SMALL ROMAN NUMERAL TWO

		utf8_unicode_ci: '(10)' = U+247D PARENTHESIZED NUMBER TEN

*/

I would suggest to test also with {{'ⅱ'}} and 'ii', as well as {{'⑽'}} and '(10)', and of course 'ss' and 'ß' or 'ẞ' when the multi-character sequence is being clipped by a column prefix index.

If I understood it correctly, there should be no issue with multi-byte characters that correspond to a single character, because column prefix indexes use the correct number of bytes for encoding the glyphs. A classic example involves the two-byte Turkish İ (U+0130) and ı (U+0131). In a suitable collation such as utf8_turkish_ci, I might expect İ=i and I=ı, but I≠i.

Marko Mäkelä added a comment - 2024-08-20 05:39 I searched for the symbol in gucharmap and found the following: ㏫ U+33EB IDEOGRAPHIC TELEGRAPH SYMBOL FOR DAY TWELVE … Compatibility decomposition: 1 U+0031 DIGIT ONE + 2 U+0032 DIGIT TWO + 日 U+65E5 CJK UNIFIED IDEOGRAPH-65E5 There are also other examples where a single Unicode code point corresponds to a sequence of ASCII. In the source file storage/innobase/ibuf/ibuf0ibuf.cc (up to MariaDB Server 10.11) you can find a couple of examples, which I think were originally provided by bar : /* A collation may identify values that differ in storage length. Some examples (1 or 2 bytes): utf8_turkish_ci: I = U+0131 LATIN SMALL LETTER DOTLESS I utf8_general_ci: S = U+00DF LATIN SMALL LETTER SHARP S utf8_general_ci: A = U+00E4 LATIN SMALL LETTER A WITH DIAERESIS latin1_german2_ci: SS = U+00DF LATIN SMALL LETTER SHARP S Examples of a character (3-byte UTF-8 sequence) identified with 2 or 4 characters (1-byte UTF-8 sequences): utf8_unicode_ci: 'II' = U+2171 SMALL ROMAN NUMERAL TWO utf8_unicode_ci: '(10)' = U+247D PARENTHESIZED NUMBER TEN */ I would suggest to test also with {{'ⅱ'}} and 'ii' , as well as {{'⑽'}} and '(10)' , and of course 'ss' and 'ß' or 'ẞ' when the multi-character sequence is being clipped by a column prefix index. If I understood it correctly, there should be no issue with multi-byte characters that correspond to a single character, because column prefix indexes use the correct number of bytes for encoding the glyphs. A classic example involves the two-byte Turkish İ (U+0130) and ı (U+0131). In a suitable collation such as utf8_turkish_ci , I might expect İ=i and I=ı, but I≠i.

Sergei Petrunia added a comment - 2024-08-21 12:04

Take-aways from optimizer call:

Need to fix something that happens inside Item_...::get_mm_tree() calls for such cases. The generated range needs to be wider.

Sergei Petrunia added a comment - 2024-08-21 12:04 Take-aways from optimizer call: Need to fix something that happens inside Item_...::get_mm_tree() calls for such cases. The generated range needs to be wider.

Alexander Barkov added a comment - 2024-08-21 17:41 - edited

The optimizer does not work well with prefix keys on collations having so called expansions (i.e. characters which have multiple weights).
'㏫' in collations utf8mb3_unicode_ci and uca1400_ai_ci have multiple weights.
It's a single character and it has four two-byte weights:

SET NAMES utf8mb4;

SELECT HEX(WEIGHT_STRING('㏫' COLLATE utf8mb4_uca1400_ai_ci));

+---------------------------------------------------------+

| HEX(WEIGHT_STRING('㏫' COLLATE utf8mb4_uca1400_ai_ci))  |

+---------------------------------------------------------+

| 206C206DFB40E5E5                                        |

+---------------------------------------------------------+

See also this report for simpler test cases: MDEV-8626. MDEV-8625 .

This problem existed for years. At least since MySQL-4.1 (in latin1_german2_ci). But we never had time to fix it.
Now with utf8mb4_uca1400_ai_ci being the default, its priority should be changed to more important. Critical?

For a prefix KEY(column(1)), the optimizer currently effectively translates the condition

column1 = '㏫'

into

LEFT(column1,1) = '㏫'

, which is wrong.

It should instead translate

either to:
LEFT(column1,1) = '1'
This looks too hard to implement in the collation library, and may be even impossible for some characters.

or to:

LEFT(column1,1)

  BETWEEN 'a character which is slightly less than ㏫'

      AND 'a character which is slightly greater than ㏫'

This way certainly needs some changes on the optimizer level. I'm not sure if changes in the collation library are needed. Are they?

Alexander Barkov added a comment - 2024-08-21 17:41 - edited The optimizer does not work well with prefix keys on collations having so called expansions (i.e. characters which have multiple weights). '㏫' in collations utf8mb3_unicode_ci and uca1400_ai_ci have multiple weights. It's a single character and it has four two-byte weights: SET NAMES utf8mb4; SELECT HEX(WEIGHT_STRING( '㏫' COLLATE utf8mb4_uca1400_ai_ci)); +---------------------------------------------------------+ | HEX(WEIGHT_STRING('㏫' COLLATE utf8mb4_uca1400_ai_ci)) | +---------------------------------------------------------+ | 206C206DFB40E5E5 | +---------------------------------------------------------+ See also this report for simpler test cases: MDEV-8626 . MDEV-8625 . This problem existed for years. At least since MySQL-4.1 (in latin1_german2_ci). But we never had time to fix it. Now with utf8mb4_uca1400_ai_ci being the default, its priority should be changed to more important. Critical? For a prefix KEY(column(1)), the optimizer currently effectively translates the condition column1 = '㏫' into LEFT (column1,1) = '㏫' , which is wrong. It should instead translate either to: LEFT (column1,1) = '1' This looks too hard to implement in the collation library, and may be even impossible for some characters. or to: LEFT (column1,1) BETWEEN 'a character which is slightly less than ㏫' AND 'a character which is slightly greater than ㏫' This way certainly needs some changes on the optimizer level. I'm not sure if changes in the collation library are needed. Are they?

People

Assignee:: Rex Johnston

Reporter:: Chenglin Liang

Votes:: 0 Vote for this issue

Watchers:: 10 Start watching this issue

Dates

Created:: 2024-07-17 09:37

Updated:: 2024-11-22 17:10

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.