[MDEV-30057] strnncollsp_nchars() gets called in a suspicious way from InnoDB Created: 2022-11-21  Updated: 2022-11-21

Status: Open
Project: MariaDB Server
Component/s: Character Sets, Storage Engine - InnoDB
Affects Version/s: 10.4
Fix Version/s: 10.4

Type: Bug
Priority: Major
Reporter: Alexander Barkov
Assignee: Marko Mäkelä
Resolution: Unresolved
Votes: 0
Labels: None


 Description   

I apply this patch:

--- a/storage/innobase/rem/rem0cmp.cc
+++ b/storage/innobase/rem/rem0cmp.cc
@@ -326,8 +326,11 @@ static int cmp_whole_field(ulint mtype, ulint prtype,
     DBUG_ASSERT(is_strnncoll_compatible(prtype & DATA_MYSQL_TYPE_MASK));
     if (CHARSET_INFO *cs= get_charset(dtype_get_charset_coll(prtype),
                                       MYF(MY_WME)))
+    {
+      DBUG_ASSERT(a_length == b_length || cs->mbminlen != cs->mbmaxlen);
       return cs->coll->strnncollsp_nchars(cs, a, a_length, b, b_length,
                                           std::max(a_length, b_length));
+    }
   }

The assertion is meant to verify that, for the CHAR(N) data type, InnoDB:

  • performs trailing space compression for variable length encodings (e.g. utf8)
  • does not perform trailing space compression for fixed length encodings (e.g. latin1)
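
As a minimal sketch, the condition checked by the assertion can be written as a standalone predicate. The struct charset_widths and the function char_lengths_plausible are hypothetical names used only for illustration; they stand in for the mbminlen and mbmaxlen members of CHARSET_INFO and are not part of the server code.

```cpp
#include <cstddef>

// Hypothetical stand-in for the two CHARSET_INFO members used by the assertion.
struct charset_widths
{
  unsigned mbminlen;  // minimum bytes per character
  unsigned mbmaxlen;  // maximum bytes per character
};

// Mirrors the asserted condition: for a fixed-width encoding
// (mbminlen == mbmaxlen) both CHAR(N) values are expected to have the same
// byte length; for a variable-width encoding the lengths may differ.
static bool char_lengths_plausible(std::size_t a_length, std::size_t b_length,
                                   const charset_widths &cs)
{
  return a_length == b_length || cs.mbminlen != cs.mbmaxlen;
}
```

With latin1 (1 byte per character) unequal lengths violate the condition, while with a variable-width encoding such as utf8 (1 to 3 bytes per character) they are allowed.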

Then I run this script:

drop table if exists t2,t1;
create table t1 (a int primary key,s1 varchar(2) character set latin1 collate latin1_bin not null unique) engine=innodb;
create table t2 (s1 char(2) character set latin1 collate latin1_bin not null, constraint c foreign key(s1) references t1(s1) on update cascade) engine=innodb;
insert into t1 values(1,0x4100),(2,0x41);
insert into t2 values(0x41);

The server crashes on the new assertion while executing the INSERT into t2.

Tracing in gdb shows that the lengths are indeed different:

(gdb) p a_length
$1 = 2
(gdb) p b_length
$2 = 1

If I change the collation from latin1_bin to latin1_german1_ci:

drop table if exists t2,t1;
create table t1 (a int primary key,s1 varchar(2) character set latin1 collate latin1_german1_ci not null unique) engine=innodb;
create table t2 (s1 char(2) character set latin1 collate latin1_german1_ci not null, constraint c foreign key(s1) references t1(s1) on update cascade) engine=innodb;
insert into t1 values(1,0x4100),(2,0x41);
insert into t2 values(0x41);

it does not crash.

Tracing at the same place shows that the lengths are equal:

(gdb) p a_length
$3 = 2
(gdb) p b_length
$4 = 2

This looks suspicious. The two collations should behave the same way in this context.

  • The latin1_bin behaviour seems unexpected
  • latin1_german1_ci seems to work fine
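
To illustrate what a call with unequal lengths amounts to, here is a toy model of a space-padded comparison for a single-byte binary collation. It is only a sketch under stated assumptions: toy_cmp_pad_space is a hypothetical function, not the server's strnncollsp_nchars(), and it assumes one byte per character and 0x20 as the pad character.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical toy model, NOT the server implementation: byte-wise comparison
// of single-byte strings over at most nchars characters, treating the shorter
// string as if it were padded with spaces (0x20).
static int toy_cmp_pad_space(const unsigned char *a, std::size_t a_length,
                             const unsigned char *b, std::size_t b_length,
                             std::size_t nchars)
{
  const std::size_t n= std::min(nchars, std::max(a_length, b_length));
  for (std::size_t i= 0; i < n; i++)
  {
    // Positions past the end of a string compare as the pad character.
    const unsigned char ca= i < a_length ? a[i] : 0x20;
    const unsigned char cb= i < b_length ? b[i] : 0x20;
    if (ca != cb)
      return ca < cb ? -1 : 1;
  }
  return 0;
}
```

In this model, the 2-byte value 0x4100 compared against the 1-byte value 0x41 is decided by the implicit padding of the shorter argument, which is why the caller passing a_length = 2 and b_length = 1 for a CHAR(2) column in a fixed-width encoding is worth examining.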


 Comments   
Comment by Marko Mäkelä [ 2022-11-21 ]

The code was refactored in 10.5 by MDEV-21924. I tested the following patch on 10.6:

diff --git a/storage/innobase/rem/rem0cmp.cc b/storage/innobase/rem/rem0cmp.cc
index 7fb6fdac1ba..9b10009978f 100644
--- a/storage/innobase/rem/rem0cmp.cc
+++ b/storage/innobase/rem/rem0cmp.cc
@@ -282,14 +282,20 @@ static int cmp_data(ulint mtype, ulint prtype, const byte *data1, ulint len1,
   case DATA_VARMYSQL:
     DBUG_ASSERT(is_strnncoll_compatible(prtype & DATA_MYSQL_TYPE_MASK));
     if (CHARSET_INFO *cs= all_charsets[dtype_get_charset_coll(prtype)])
+    {
+      ut_ad(len1 == len2 || cs->mbminlen != cs->mbmaxlen);
       return cs->coll->strnncollsp(cs, data1, len1, data2, len2);
+    }
   no_collation:
     ib::fatal() << "Unable to find charset-collation for " << prtype;
   case DATA_MYSQL:
     DBUG_ASSERT(is_strnncoll_compatible(prtype & DATA_MYSQL_TYPE_MASK));
     if (CHARSET_INFO *cs= all_charsets[dtype_get_charset_coll(prtype)])
+    {
+      ut_ad(len1 == len2 || cs->mbminlen != cs->mbmaxlen);
       return cs->coll->strnncollsp_nchars(cs, data1, len1, data2, len2,
                                           std::max(len1, len2));
+    }
     goto no_collation;
   case DATA_VARCHAR:
   case DATA_CHAR:

The first added assertion would fail in a number of tests:

./mtr --parallel=auto --big-test --force --max-test-fail=0 --skip-core-file innodb.innodb_autoinc_lock_mode_zero versioning.foreign innodb.innodb-ucs2 innodb.innodb_ctype_latin1 innodb.innodb-index_ucs2 innodb.innodb optimizer_unfixed_bugs.bug42991 main.subselect2 gcol.gcol_bugfixes innodb.blob_cmp_empty main.innodb_ext_key innodb_fts.fulltext_distinct innodb.mvcc_secondary main.endspace innodb_zip.bug56680 innodb.mdev-27746

10.6 6b083ce85185b2362e212c5e5a2a1ddc4acfcc68

mariadbd: /mariadb/10.6/storage/innobase/rem/rem0cmp.cc:286: int cmp_data(ulint, ulint, const byte *, ulint, const byte *, ulint): Assertion `len1 == len2 || cs->mbminlen != cs->mbmaxlen' failed.

FOREIGN KEY constraints or column prefix indexes seem to be a common theme. For example, innodb.blob_cmp_empty hits the assertion failure on the UPDATE:

SET @fill_amount = (@@innodb_page_size / 2 ) + 1;
CREATE TABLE t1 (col_text TEXT NOT NULL, KEY (col_text(9))) ENGINE=InnoDB;
 
INSERT INTO t1 (col_text) VALUES (REPEAT('x', @fill_amount));
UPDATE t1 SET col_text='';
