[MDEV-7764] Optimizer bug:error query plan Created: 2015-03-12  Updated: 2021-09-16  Resolved: 2021-09-16

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.0.14, 10.0.17
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: frank.zhang Assignee: Axel Schwenke
Resolution: Won't Fix Votes: 0
Labels: optimizer
Environment:

centos6.4


Attachments: JPEG File 20150312171330.jpg    

 Description   

I suffered this bug,when I run the benchmark test using mariadb 10.0.14.

use case:

Test tool:sysbench0.5
OS:centos6.4
DB version mariadb10.0.14
Data:8 tables and each 25000000rows

Some my.cnf configuration:

innodb_file_per_table
innodb_buffer_pool_size=63G
innodb_log_file_size= 1G
innodb_flush_method=O_DIRECT
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
log-bin
binlog-format=mixed

Test command:

sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --num-threads=512 --db-driver=mysql --mysql-db=sbtest --max-requests=0 --oltp-table-size=25000000 --mysql-table-engine=innodb --mysql-port=55945 --mysql-user=us_frank --mysql-password=123456 --mysql-socket=/var/lib/mysql/mariadb.sock --oltp-tables-count=8 run

Other:no preheat data,num-threads more than 512

During the benchmark test, I ran a huge IO-cost query such as select count(*) from sbtest1;.A few time later , show processlist returned following slow queries

+------+----------------+-----------+--------+---------+------+--------------+----------------------------------------------------------------------+----------+
| Id   | User           | Host      | db     | Command | Time | State        | Info                                                                 | Progress |
+------+----------------+-----------+--------+---------+------+--------------+----------------------------------------------------------------------+----------+
| 5021 | us_frank | localhost | sbtest | Query   | 1430 | Sending data | SELECT SUM(K) FROM sbtest1 WHERE id BETWEEN 12522878 AND 12522878+99 |    0.000 |
| 5044 | us_frank | localhost | sbtest | Query   | 1428 | Sending data | SELECT SUM(K) FROM sbtest3 WHERE id BETWEEN 12407570 AND 12407570+99 |    0.000 |
| 5376 | us_frank | localhost | sbtest | Query   | 1430 | Sending data | SELECT SUM(K) FROM sbtest7 WHERE id BETWEEN 12545547 AND 12545547+99 |    0.000 |
+------+----------------+-----------+--------+---------+------+--------------+----------------------------------------------------------------------+----------+

I ran the explain immediately.

explain SELECT SUM(K) FROM sbtest7 WHERE id BETWEEN 12545547 AND 12545547+99;
+------+-------------+---------+-------+---------------+---------+---------+------+------+-------------+
| id   | select_type | table   | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+------+-------------+---------+-------+---------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | sbtest7 | range | PRIMARY       | PRIMARY | 4       | NULL |   99 | Using where |
+------+-------------+---------+-------+---------------+---------+---------+------+------+-------------+
1 row in set (0.00 sec)

That is right!

But slow log showed as following .

That is wrong!Full table scan!

It seems that mariadb will choose wrong query plan when server is in the condition with high io load.



 Comments   
Comment by VAROQUI Stephane [ 2015-03-12 ]

Can you try to reproduce with 10.0.17, this was already fixed.

Comment by Elena Stepanova [ 2015-03-12 ]

stephane@skysql.com,

Do you have a JIRA issue number handy, for the reference?

Comment by VAROQUI Stephane [ 2015-03-12 ]

similar to MDEV-7118

Comment by VAROQUI Stephane [ 2015-03-12 ]

Similar was not easy to reproduce . But gone after analyze table .
https://mariadb.atlassian.net/browse/MDEV-5598

Comment by frank.zhang [ 2015-03-13 ]

HI all,I have just tested again with mariadb 10.0.17.unfortunately,it sitll happened so.
Test environment is the same as use-case mentioned.

mysql> show processlist;
+-----+----------------+-----------+--------+---------+------+----------------------------------+-------------------------------------------------------------------------------------+----------+
| Id  | User           | Host      | db     | Command | Time | State                            | Info                                                                                | Progress |
+-----+----------------+-----------+--------+---------+------+----------------------------------+-------------------------------------------------------------------------------------+----------+
|   3 | system user    |           | NULL   | Connect | 1864 | Waiting for work from SQL thread | NULL                                                                                |    0.000 |
|   4 | system user    |           | NULL   | Connect | 1864 | Waiting for work from SQL thread | NULL                                                                                |    0.000 |
|   5 | system user    |           | NULL   | Connect | 1864 | Waiting for work from SQL thread | NULL                                                                                |    0.000 |
|   6 | system user    |           | NULL   | Connect | 1864 | Waiting for work from SQL thread | NULL                                                                                |    0.000 |
|   7 | system user    |           | NULL   | Connect | 1864 | Waiting for work from SQL thread | NULL                                                                                |    0.000 |
|   8 | system user    |           | NULL   | Connect | 1864 | Waiting for work from SQL thread | NULL                                                                                |    0.000 |
|   9 | system user    |           | NULL   | Connect | 1864 | Waiting for work from SQL thread | NULL                                                                                |    0.000 |
|  10 | system user    |           | NULL   | Connect | 1864 | Waiting for work from SQL thread | NULL                                                                                |    0.000 |
|  13 | us_frank | localhost | sbtest | Sleep   |  667 |                                  | NULL                                                                                |    0.000 |
|  29 | us_frank | localhost | sbtest | Query   |  976 | Sending data                     | SELECT c FROM sbtest6 WHERE id BETWEEN 12585927 AND 12585927+99                     |    0.000 |
| 453 | us_frank | localhost | sbtest | Query   |  641 | Sending data                     | SELECT c FROM sbtest3 WHERE id BETWEEN 12463062 AND 12463062+99                     |    0.000 |
| 490 | us_frank | localhost | sbtest | Query   |  600 | Copying to tmp table             | SELECT DISTINCT c FROM sbtest5 WHERE id BETWEEN 14984948 AND 14984948+99 ORDER BY c |    0.000 |
| 532 | us_frank | localhost | NULL   | Query   |    0 | init                             | show processlist                                                                    |    0.000 |
+-----+----------------+-----------+--------+---------+------+----------------------------------+-------------------------------------------------------------------------------------+----------+
13 rows in set (0.00 sec)

mysql> select version();
+---------------------+
| version()           |
+---------------------+
| 10.0.17-MariaDB-log |
+---------------------+
1 row in set (0.00 sec)
 
# Time: 150313 16:03:53
# User@Host: us_frank[us_frank] @ localhost []
# Thread_id: 29  Schema: sbtest  QC_hit: No
# Query_time: 1266.003463  Lock_time: 0.000308  Rows_sent: 1  Rows_examined: 259420
# Full_scan: Yes  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0  Priority_queue: No
#
# explain: id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
# explain: 1	SIMPLE	sbtest6	index	PRIMARY	in_k_c	364	NULL	1	Using where; Using index
#
SET timestamp=1426233833;
SELECT c FROM sbtest6 WHERE id BETWEEN 12585927 AND 12585927+99;

mysql> show create table sbtest.sbtest6\G
       Table: sbtest6
Create Table: CREATE TABLE `sbtest6` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  `col1` varchar(10) DEFAULT 'a',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`),
  KEY `in_k_c` (`k`,`c`)
) ENGINE=InnoDB AUTO_INCREMENT=25006481 DEFAULT CHARSET=utf8 MAX_ROWS=1000000
 
mysql> show index in sbtest.sbtest6;
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table   | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| sbtest6 |          0 | PRIMARY  |            1 | id          | A         |    24215000 |     NULL | NULL   |      | BTREE      |         |               |
| sbtest6 |          1 | k_1      |            1 | k           | A         |      864821 |     NULL | NULL   |      | BTREE      |         |               |
| sbtest6 |          1 | in_k_c   |            1 | k           | A         |     3459285 |     NULL | NULL   |      | BTREE      |         |               |
| sbtest6 |          1 | in_k_c   |            2 | c           | A         |    24215000 |     NULL | NULL   |      | BTREE      |         |               |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

So,I think that this bug has not been fixed in mariadb10.0.17,THX~

Comment by Elena Stepanova [ 2015-03-16 ]

Hi,

I've been running the test for days now, and still not getting any full scans. So, something is missing here.
Could you please provide the complete cnf file(s)?

Also, your server is configured as a slave. Is there any real replication? If yes, does it touch the tables on which you run the sysbench test?

Comment by frank.zhang [ 2015-03-17 ]

@Elena Stepanova
my.cnf
[mysqld]
basedir=/opt/mariadb
datadir=/data/mariadb-data
socket=/var/lib/mysql/mariadb.sock
tmpdir=/data/mariadb-data/tmp
log_error=/data/mariadb-data/mariadb.err
user=mysql
port=55945
character-set-server=utf8
log-bin
binlog-format=mixed
long_query_time=10
slow-query-log=1
server-id=155
log-slave-updates
slave-net-timeout=120
rpl_semi_sync_master_enabled=1
rpl_semi_sync_master_timeout=1000
rpl_semi_sync_slave_enabled=1

slave_parallel_threads=8

old_passwords=0
#secure-auth=1
sql-mode="NO_AUTO_CREATE_USER"
safe-user-create=1
symbolic-links=0
skip_name_resolve=1
lower_case_table_names=1

innodb_file_per_table
innodb_buffer_pool_size=63G
innodb_log_file_size= 1G
innodb_flush_method=O_DIRECT

sort_buffer_size=2M
join_buffer_size=4M
query_cache_type=0

max_connections=5000
max_connect_errors=1000

log-slow-verbosity=query_plan,explain

innodb_flush_log_at_trx_commit = 1
sync_binlog = 1

#innodb_flush_log_at_trx_commit = 2
#sync_binlog = 0

performance_schema=1

[mysql]
default-character-set=utf8

Yes,this is a slave instance,but there was no any real replication events while sysbench was running.

Something important may be:
Must run benchmark test by sysbench0.5.Sysbench0.4 will do not happen so.
Don't preheat data.You can restart instance before sysbench0.5 starts to run.
System I/O load has to maintain a high level.You can use some high cost SQL ,such as select count(*) from sbtest1;,when sysbench0.5 is running.
Sysbench configuration:num-threads=512.

Comment by Elena Stepanova [ 2015-03-18 ]

Current status:

Again, I've been running the test for about 2 days now.
I'm using sysbench 0.5, bzr tree revno 132.
The exact same server config, except for paths and innodb_buffer_pool_size; same sysbench command line except for paths, names, ports.
I switched to 10.0.14, just in case.
I restarted the server after sysbench prepare before sysbench run.
I also started two select count from different sbtest tables with ~1 day interval (none of them ever finished).
I tried to analyze some tables (but not all) since I don't know if they were analyzed in the initial use case.
Still haven't got a single full scan.

One thing I did differently was that I created the tables in the default test database, so they ended up being latin1. Also, I ran the test under root.
Now I interrupted the previous run and started a new prepare, with a new schema and user, just in case there is some magic in there.
If it doesn't help, I'll be short of further ideas.
I wonder if the problem can have anything to do with the buffer pool size though, maybe I'll have to try a bigger machine where I can use a similarly big value.

Comment by Elena Stepanova [ 2015-03-22 ]

I've run a several-day test on a bigger machine, where i could use the exact same config, including innodb_buffer_pool_size and all; still haven't got full scans.
axel,
You have more experience at running sysbench tests and dealing with their results than I do, could you please take a look, maybe you will know what to do to reproduce the issue.

Comment by Axel Schwenke [ 2021-09-16 ]

This affects an old version of the server (10.0). If this problem persists with an up-to-date version, please open a new ticket.

Generated at Thu Feb 08 07:22:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.