[MDEV-14350] Index use with collation utf8mb4_unicode_nopad_ci on LIKE pattern with wrong results Created: 2017-11-10  Updated: 2017-12-08  Resolved: 2017-12-08

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.2.10
Fix Version/s: 10.2.12, 10.3.3

Type: Bug Priority: Major
Reporter: Christian Mattick Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: collation, index, nopad
Environment:

openSUSE 13.1 (x86_64)


Issue Links:
Relates
relates to MDEV-9711 NO PAD collations Closed

 Description   

Steps to reproduce:

CREATE TABLE `aaa` (
	`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
	`a` VARCHAR(50) NOT NULL COLLATE 'utf8mb4_unicode_nopad_ci',
	`b` VARCHAR(50) NOT NULL COLLATE 'utf8mb4_unicode_ci',
	PRIMARY KEY (`id`),
	INDEX `a` (`a`),
	INDEX `b` (`b`)
)
COLLATE='utf8mb4_nopad_bin'
ENGINE=InnoDB
;
INSERT INTO `aaa` VALUES (1, '123', '123');

Compare result of

SELECT * FROM netcon_fahrzeug.aaa WHERE a LIKE '123%'

to

SELECT * FROM netcon_fahrzeug.aaa IGNORE INDEX (a) WHERE a LIKE '123%'

The results shouldn't be different, the first query has an empty result which is not expected.

SELECT * FROM netcon_fahrzeug.aaa WHERE b LIKE '123%'

works as expected though so this is happening with the collation utf8mb4_unicode_nopad_ci (and maybe others).



 Comments   
Comment by Alice Sherepa [ 2017-11-17 ]

Thanks for the report! Reproducible with Innodb, but not with Myisam, (after 10.2.2)

testcase:

CREATE TABLE `t1` (
	`a` VARCHAR(50) COLLATE 'utf8mb4_unicode_nopad_ci' ,
	`b` int,
	INDEX (a)
) ENGINE=InnoDB;
 
INSERT INTO `t1` VALUES ('123', 123);
#wrong results --empty set -expected 123 123
 SELECT a,b FROM t1 WHERE a LIKE '123%';
 SELECT * FROM t1 WHERE a LIKE '123%';
 
#correct results
 SELECT a FROM t1 WHERE a LIKE '123%';  
 SELECT a,b FROM t1 IGNORE INDEX (a) WHERE a LIKE '123%';
 

MariaDB [test]> select version();
+-----------------------+
| version()             |
+-----------------------+
| 10.2.11-MariaDB-debug |
+-----------------------+
1 row in set (0.00 sec)
 
MariaDB [test]>  SELECT a,b FROM t1 WHERE a LIKE '123%';
Empty set (0.00 sec)
 
MariaDB [test]>  SELECT * FROM t1 WHERE a LIKE '123%';
Empty set (0.00 sec)
 
MariaDB [test]>  SELECT a FROM t1 WHERE a LIKE '123%';  
+------+
| a    |
+------+
| 123  |
+------+
1 row in set (0.00 sec)
 
MariaDB [test]>  SELECT a,b FROM t1 IGNORE INDEX (a) WHERE a LIKE '123%';
+------+------+
| a    | b    |
+------+------+
| 123  |  123 |
+------+------+
1 row in set (0.00 sec)

ANALYZE FORMAT=JSON  SELECT * FROM t1 WHERE a LIKE '123%';
--------------
| ANALYZE                                                                                                          
| {
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 0.0539,
    "table": {
      "table_name": "t1",
      "access_type": "range",
      "possible_keys": ["a"],
      "key": "a",
      "key_length": "203",
      "used_key_parts": ["a"],
      "r_loops": 1,
      "rows": 1,
      "r_rows": 0,
      "r_total_time_ms": 0.0284,
      "filtered": 100,
      "r_filtered": 100,
      "index_condition": "t1.a like '123%'"
    }
  }
} |
1 row in set (0.00 sec)

ANALYZE FORMAT=JSON  SELECT a,b FROM t1 IGNORE INDEX (a) WHERE a LIKE '123%'
{
  "query_block": {
    "select_id": 1,
    "r_loops": 1,
    "r_total_time_ms": 0.1091,
    "table": {
      "table_name": "t1",
      "access_type": "ALL",
      "r_loops": 1,
      "rows": 1,
      "r_rows": 1,
      "r_total_time_ms": 0.0729,
      "filtered": 100,
      "r_filtered": 100,
      "attached_condition": "t1.a like '123%'"
    }
  }
} |

Comment by Alexander Barkov [ 2017-12-08 ]

Also repeatable with MyISAM, with more than one rows in the table:

CREATE OR REPLACE TABLE t1 (
 a VARCHAR(50) CHARACTER SET utf8mb4 COLLATE 'utf8mb4_unicode_nopad_ci' NOT NULL,
 b VARCHAR(50) CHARACTER SET utf8mb4 COLLATE 'utf8mb4_unicode_ci' NOT NULL,
 INDEX `a` (`a`),
 INDEX `b` (`b`)
) ENGINE=MyISAM;
INSERT INTO t1 VALUES ('111', '111');
INSERT INTO t1 VALUES ('222', '222');
INSERT INTO t1 VALUES ('333', '333');
INSERT INTO t1 VALUES ('444', '444');
SELECT * FROM t1 WHERE a LIKE '111%';
SELECT * FROM t1 IGNORE INDEX (a) WHERE a LIKE '111%';

Comment by Alexander Barkov [ 2017-12-08 ]

The same problem is repeatable with UCA collations for other Unicode character sets, e.g. ucs2:

CREATE OR REPLACE TABLE t1 (
 a VARCHAR(50) CHARACTER SET ucs2 COLLATE ucs2_unicode_nopad_ci NOT NULL,
 b VARCHAR(50) CHARACTER SET ucs2 COLLATE ucs2_unicode_ci NOT NULL,
 INDEX `a` (`a`),
 INDEX `b` (`b`)
) ENGINE=MyISAM;
INSERT INTO t1 VALUES ('111', '111');
INSERT INTO t1 VALUES ('222', '222');
INSERT INTO t1 VALUES ('333', '333');
INSERT INTO t1 VALUES ('444', '444');
SELECT * FROM t1 WHERE a LIKE '111%';
SELECT * FROM t1 IGNORE INDEX (a) WHERE a LIKE '111%';

Comment by Alexander Barkov [ 2017-12-08 ]

The same problem is repeatable with 8bit collations:

CREATE OR REPLACE TABLE t1 (
 a VARCHAR(50) NOT NULL,
 b VARCHAR(50) NOT NULL,
 INDEX `a` (`a`)
) ENGINE=MyISAM CHARACTER SET latin1 COLLATE latin1_swedish_nopad_ci;
INSERT INTO t1 VALUES ('111', '111');
INSERT INTO t1 VALUES ('222', '222');
INSERT INTO t1 VALUES ('333', '333');
INSERT INTO t1 VALUES ('444', '444');
SELECT * FROM t1 WHERE a LIKE '111%';
SELECT * FROM t1 IGNORE INDEX (a) WHERE a LIKE '111%';

Generated at Thu Feb 08 08:12:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.