[MDEV-26659] optimizer fails to consider character set conversion cost on PK joins Created: 2021-09-21  Updated: 2021-12-21

Status: Confirmed
Project: MariaDB Server
Component/s: Character Sets, Optimizer
Affects Version/s: 10.6.4
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Philip orleans Assignee: Sergei Petrunia
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Linux


Attachments: Zip Archive test_202109212120.zip    

 Description   

I isolated an amazing, catastrophic lack of performance for a common business operation. Something that with the same tables and data takes 0.5 seconds in MS SQL, in Mariadb it takes 15 minutes, and is not related to the engine, for I tried with innodb, memory and rocksdb with identical results.
two tables

CREATE TABLE test1 (
  callid varchar(64) NOT NULL DEFAULT '',
  created datetime NOT NULL DEFAULT current_timestamp,
  PRIMARY KEY (callid)
)
ENGINE = INNODB,
AVG_ROW_LENGTH = 88,
CHARACTER SET latin1,
COLLATE latin1_swedish_ci;
 
CREATE TABLE test2 (
  callid varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL DEFAULT '',
  date datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (callid)
)
ENGINE = INNODB,
AVG_ROW_LENGTH = 88,
CHARACTER SET latin1,
COLLATE latin1_swedish_ci;
 
fill them with 65.000 rows (in my case)
 
select c.* from test1 c right join test2 t on c.callid=t.callid where c.callid is null;
+--------+---------+
| callid | created |
+--------+---------+
| NULL   | NULL    |
+--------+---------+
1 row in set (14 min 35.123 sec)

if you try to do
select * from test2 where callid not in (select callid from test1);

the result is the same

This operation takes 1/2 in Ms SQL.
What is wrong?



 Comments   
Comment by Philip orleans [ 2021-09-21 ]

I just tested with Mysql 8.26, same data and columns

select c.* from opensips.test1 c right join test2 t on c.callid=t.callid where c.callid is null;
+--------+---------+
| callid | created |
+--------+---------+
| NULL   | NULL    |
+--------+---------+
1 row in set (0.26 sec)

Comment by Daniel Black [ 2021-09-22 ]

I'm having trouble reproducing this:

MariaDB [(none)]> use test;
Database changed
MariaDB [test]> CREATE TABLE test1 ( callid varchar(64) NOT NULL DEFAULT '', created datetime NOT NULL DEFAULT current_timestamp, PRIMARY KEY (callid) );
Query OK, 0 rows affected (0.036 sec)
 
MariaDB [test]> CREATE TABLE test2 (
    -> callid varchar(64) NOT NULL DEFAULT '',
    -> created datetime NOT NULL DEFAULT current_timestamp,
    -> PRIMARY KEY (callid)
    -> );
Query OK, 0 rows affected (0.033 sec)
 
MariaDB [test]> insert into test1 (callid) select * from seq_1_to_65000;
Query OK, 65000 rows affected (0.261 sec)
Records: 65000  Duplicates: 0  Warnings: 0
 
MariaDB [test]> insert into test2 (callid) select * from seq_1_to_65000;
Query OK, 65000 rows affected (0.316 sec)
Records: 65000  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select c.* from test1 c right join test2 t on c.callid=t.callid where c.callid is null;
Empty set (0.107 sec)
 
MariaDB [test]> explain select c.* from test1 c right join test2 t on c.callid=t.callid where c.callid is null;
+------+-------------+-------+--------+---------------+---------+---------+---------------+-------+-------------------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref           | rows  | Extra                   |
+------+-------------+-------+--------+---------------+---------+---------+---------------+-------+-------------------------+
|    1 | SIMPLE      | t     | index  | NULL          | PRIMARY | 258     | NULL          | 65693 | Using index             |
|    1 | SIMPLE      | c     | eq_ref | PRIMARY       | PRIMARY | 258     | test.t.callid | 1     | Using where; Not exists |
+------+-------------+-------+--------+---------------+---------+---------+---------------+-------+-------------------------+
2 rows in set (0.001 sec)
 
MariaDB [test]> show create table test1;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                   |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| test1 | CREATE TABLE `test1` (
  `callid` varchar(64) NOT NULL DEFAULT '',
  `created` datetime NOT NULL DEFAULT current_timestamp(),
  PRIMARY KEY (`callid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.000 sec)
 
MariaDB [test]> select * from test1 limit 4;
+--------+---------------------+
| callid | created             |
+--------+---------------------+
| 1      | 2021-09-22 00:56:56 |
| 10     | 2021-09-22 00:56:56 |
| 100    | 2021-09-22 00:56:56 |
| 1000   | 2021-09-22 00:56:56 |
+--------+---------------------+
4 rows in set (0.001 sec)
 
MariaDB [test]> select * from test1 limit 7;
+--------+---------------------+
| callid | created             |
+--------+---------------------+
| 1      | 2021-09-22 00:56:56 |
| 10     | 2021-09-22 00:56:56 |
| 100    | 2021-09-22 00:56:56 |
| 1000   | 2021-09-22 00:56:56 |
| 10000  | 2021-09-22 00:56:56 |
| 10001  | 2021-09-22 00:56:56 |
| 10002  | 2021-09-22 00:56:56 |
+--------+---------------------+
7 rows in set (0.000 sec)
 
MariaDB [test]> update test1 set callid=NULL WHERE MOD(callid, 8)=0;
ERROR 1048 (23000): Column 'callid' cannot be null
MariaDB [test]>  select version();
+-------------------------------------+
| version()                           |
+-------------------------------------+
| 10.6.4-MariaDB-1:10.6.4+maria~focal |
+-------------------------------------+
1 row in set (0.001 sec)

How can callid be NULL if the column is declared NOT NULL? What does ANALYZE FORMAT=JSON thequery show for yoU?

Comment by Philip orleans [ 2021-09-22 ]

I am uploading a dump of the two tables with data. It creates a database called test.

explain FORMAT=JSON select c.* from test1 c right join test2 t on c.callid=t.callid where c.callid is null;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
  "query_block": {
    "select_id": 1,
    "table": {
      "table_name": "t",
      "access_type": "index",
      "key": "PRIMARY",
      "key_length": "258",
      "used_key_parts": ["callid"],
      "rows": 65500,
      "filtered": 100,
      "using_index": true
    },
    "block-nl-join": {
      "table": {
        "table_name": "c",
        "access_type": "ALL",
        "rows": 65500,
        "filtered": 100,
        "not_exists": true
      },
      "buffer_type": "flat",
      "buffer_size": "256Kb",
      "join_type": "BNL",
      "attached_condition": "trigcond(c.callid is null) and trigcond(convert(c.callid using utf8mb4) = t.callid)"
    }
  }
} |
+-----------------------------

Comment by Daniel Black [ 2021-09-22 ]

Silly me on the is null comment, its obviously a join. Me gets another cofffee.

Confirmed, thanks for the test case philip_38

Comes down to callid in both tables have a different character set:

--
-- Create table `test2`
--
CREATE TABLE test2 (
  callid varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL DEFAULT '',
  date datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (callid)
)
ENGINE = INNODB,
AVG_ROW_LENGTH = 88,
CHARACTER SET latin1,
COLLATE latin1_swedish_ci;
 
--
-- Create table `test1`
--
CREATE TABLE test1 (
  callid varchar(64) NOT NULL DEFAULT '',
  created datetime NOT NULL DEFAULT current_timestamp,
  PRIMARY KEY (callid)
)
ENGINE = INNODB,
AVG_ROW_LENGTH = 88,
CHARACTER SET latin1,
COLLATE latin1_swedish_ci;

A perf profile looks like:

+   34.69%  mysqld   mariadbd             [.] my_convert_fix                                                                                                                                                                          ▒
+   19.89%  mysqld   mariadbd             [.] my_wc_mb_utf8mb4                                                                                                                                                                        ▒
+   12.01%  mysqld   mariadbd             [.] my_mb_wc_latin1                                                                                                                                                                         ▒
+    4.64%  mysqld   mariadbd             [.] Item_field::val_str                                                                                                                                                                     ▒
+    2.82%  mysqld   mariadbd             [.] Arg_comparator::compare_string                                                                                                                                                          ▒
+    2.16%  mysqld   mariadbd             [.] JOIN_CACHE::check_match                                                                                                                                                                 ▒
+    2.03%  mysqld   mariadbd             [.] Item_cond_and::val_int                                                                                                                                                                  ▒
+    2.03%  mysqld   mariadbd             [.] String_copier_for_item::copy_with_warn                                                                                                                                                  ▒
+    1.96%  mysqld   mariadbd             [.] Item_func_conv_charset::val_str                                                                                                                                                         ◆
+    1.73%  mysqld   mariadbd             [.] Field_varstring::val_str                                                                                                                                                                ▒
+    1.73%  mysqld   mariadbd             [.] JOIN_CACHE::read_all_record_fields                                                                                                                                                      ▒
+    1.40%  mysqld   mariadbd             [.] JOIN_CACHE::read_record_field                                                                                                                                                           ▒
+    1.26%  mysqld   mariadbd             [.] JOIN_CACHE::skip_if_not_needed_match                                                                                                                                                    ▒
+    1.25%  mysqld   mariadbd             [.] JOIN_CACHE::get_record                                                                                                                                                                  ▒
+    1.11%  mysqld   mariadbd             [.] my_strnncollsp_utf8mb4_general_ci      

So the problem is that the character set of each isn't the same and its converting each row. Making the the same character set causes the problem to go away.

MariaDB [test]> alter table test2 modify callid varchar(64) CHARACTER SET latin1 NOT NULL DEFAULT '';
Query OK, 65405 rows affected (0.224 sec)              
Records: 65405  Duplicates: 0  Warnings: 0
 
MariaDB [test]>  select c.* from test1 c right join test2 t on c.callid=t.callid where c.callid is null;
+--------+---------+
| callid | created |
+--------+---------+
| NULL   | NULL    |
+--------+---------+
1 row in set (0.103 sec)
 
MariaDB [test]>  explain select c.* from test1 c right join test2 t on c.callid=t.callid where c.callid is null;
+------+-------------+-------+--------+---------------+---------+---------+---------------+-------+-------------------------+
| id   | select_type | table | type   | possible_keys | key     | key_len | ref           | rows  | Extra                   |
+------+-------------+-------+--------+---------------+---------+---------+---------------+-------+-------------------------+
|    1 | SIMPLE      | t     | index  | NULL          | PRIMARY | 66      | NULL          | 63849 | Using index             |
|    1 | SIMPLE      | c     | eq_ref | PRIMARY       | PRIMARY | 66      | test.t.callid | 1     | Using where; Not exists |
+------+-------------+-------+--------+---------------+---------+---------+---------------+-------+-------------------------+
2 rows in set (0.001 sec)
 
MariaDB [test]> alt

MDEV-26572 was improving character performance for 10.7, however its not enough in this case.

Also in 10.7 there is currently a preview of a UUID data type - https://mariadb.org/10-7-preview-feature-uuid-data-type/ that I've prototyped below:

10.7-preview-uuid

MariaDB [test]> alter table test1 modify callid uuid not null default '00000000-0000-0000-0000-000000000000';
Query OK, 65489 rows affected (0.312 sec)
Records: 65489  Duplicates: 0  Warnings: 0
 
MariaDB [test]> alter table test2 modify callid uuid not null default '00000000-0000-0000-0000-000000000000';
Query OK, 65405 rows affected (0.262 sec)
Records: 65405  Duplicates: 0  Warnings: 0
 
MariaDB [test]>  select c.*,t.* from test1 c right join test2 t on c.callid=t.callid where c.callid is null;
+--------+---------+--------------------------------------+---------------------+
| callid | created | callid                               | date                |
+--------+---------+--------------------------------------+---------------------+
| NULL   | NULL    | 86f5dc5f-70da-4ece-9c7a-5c6c2ef823d2 | 2021-09-18 16:56:27 |
+--------+---------+--------------------------------------+---------------------+
1 row in set (0.084 sec)

For the moment I recommend one of:

MariaDB [test]> alter table test1 modify callid char(36) character set ascii not null default ''; 
Query OK, 65489 rows affected (0.327 sec)              
Records: 65489  Duplicates: 0  Warnings: 0
 
MariaDB [test]> alter table test2 modify callid char(36) character set ascii not null default '';
Query OK, 65405 rows affected (0.282 sec)              
Records: 65405  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select c.* from test1 c right join test2 t on c.callid=t.callid where c.callid is null;
+--------+---------+
| callid | created |
+--------+---------+
| NULL   | NULL    |
+--------+---------+
1 row in set (0.144 sec)
 
MariaDB [test]> alter table test2 modify callid binary(36) not null default '';
Query OK, 65405 rows affected (0.292 sec)              
Records: 65405  Duplicates: 0  Warnings: 0
 
MariaDB [test]> alter table test1 modify callid binary(36) not null default '';
Query OK, 65489 rows affected (0.273 sec)              
Records: 65489  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select c.* from test1 c right join test2 t on c.callid=t.callid where c.callid is null;
+--------+---------+
| callid | created |
+--------+---------+
| NULL   | NULL    |
+--------+---------+
1 row in set (0.096 sec)

From the warnings we see MySQL-8.0 deliberately avoid the query plan because of the character set differences. Hence leaving the bug open for now.

mysql-8.0.24

mysql> explain  select c.* from test1 c right join test2 t on c.callid=t.callid where c.callid is null;
+----+-------------+-------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref  | rows  | filtered | Extra                                                  |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------------------------------------+
|  1 | SIMPLE      | t     | NULL       | index | NULL          | PRIMARY | 258     | NULL | 65405 |   100.00 | Using index                                            |
|  1 | SIMPLE      | c     | NULL       | ALL   | PRIMARY       | NULL    | NULL    | NULL | 65500 |    10.00 | Using where; Not exists; Using join buffer (hash join) |
+----+-------------+-------+------------+-------+---------------+---------+---------+------+-------+----------+--------------------------------------------------------+
2 rows in set, 3 warnings (0.00 sec)
 
mysql> show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1739
Message: Cannot use ref access on index 'PRIMARY' due to type or collation conversion on field 'callid'
*************************** 2. row ***************************
  Level: Warning
   Code: 1739
Message: Cannot use range access on index 'PRIMARY' due to type or collation conversion on field 'callid'
*************************** 3. row ***************************
  Level: Note
   Code: 1003
Message: /* select#1 */ select `test`.`c`.`callid` AS `callid`,`test`.`c`.`created` AS `created` from `test`.`test2` `t` left join `test`.`test1` `c` on((`test`.`c`.`callid` = `test`.`t`.`callid`)) where (`test`.`c`.`callid` is null)
3 rows in set (0.00 sec)

Comment by Philip orleans [ 2021-09-22 ]

Question: this problem when different tables are made of different character sets si terrible, Is there a way to enforce a single character set server-wide?
As you say, Mysql (Oracle) could care less, it performed perfectly in this case. If they can do it, why MariaDB cannot look at the code and apply the same idea?

Comment by Philip orleans [ 2021-09-22 ]

The configuration files is using

grep -i character *
50-server.cnf:# * Character sets
50-server.cnf:# utf8 4-byte character set. See also client.cnf
50-server.cnf:character-set-server  = utf8mb4
grep: 99-enable-encryption.cnf.preset: Is a directory
 
root@mariadb:/etc/mysql/mariadb.conf.d# grep -i collation *
50-server.cnf:collation-server      = utf8mb4_general_ci
 
in some boxes, Ubuntu,
show variables like '%character%';
haracter_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8mb3                    |
| character_sets_dir       | /usr/share/mysql/charsets/ 
 
and others, Debian
show variables like '%character%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb3                    |
| character_set_connection | utf8mb3                    |
| character_set_database   | utf8mb4                    |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb3                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8mb3                    |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

this makes no sense.

The question is: when MariaDB is installing, why is choosing this particular character set and not latin1? I live in the USA and only speak Romance languages.
What we need is a way to disallow the creation of any objects with a different character set.

Comment by Philip orleans [ 2021-09-22 ]

There is a further problem:
"The .deb packages provided by MariaDB Foundation's and MariaDB Corporation's repositories are not identical to the official .deb packages provided by Debian's and Ubuntu's default repositories."
According to the online documentation, Debian packages have a different character set, away from the default latin1. But, I did not install MariaDB in my Debian11 box from Debian repositories, but from MariaDB mirrors:
sources.list:deb [arch=amd64,ppc64el,arm64] http://mirrors.accretive-networks.net/mariadb/repo/10.6/debian bullseye main
yet the installation is not using latin1, but
character_set_client | utf8mb3 |

character_set_connection utf8mb3
character_set_database utf8mb4
character_set_filesystem binary
character_set_results utf8mb3
character_set_server utf8mb4
character_set_system utf8mb3

so how do I keep latin1 always, regardless of the OS preference or default?

Comment by Daniel Black [ 2021-09-22 ]

> Is there a way to enforce a single character set server-wide?

The server has a default character set, that databases inherit by default, that tables inherit by default, that columns inherit by default.

There is no single enforcement. The database does what is instructed.

> As you say, Mysql (Oracle) could care less, it performed perfectly in this case. If they can do it, why MariaDB cannot look at the code and apply the same idea?

That is why I re-titled the issue. So that it could get the attention to the cause of the problem. An off topic discussion on packaging defaults, or a feature request that enforces character set, will only distract this issue from fixing the optimizer choices. I'll take no further part in this issue.

Generated at Thu Feb 08 09:46:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.