Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Not a Bug
-
5.5.37
-
None
Description
We have a problem where a long running SELECT query running on one schema affects replication that is running on a different schema.
There are three hosts:
Host A: Sysbench
Host B: MySQL 5.5 Master
Host C: MariaDB 5.5.37 Slave
We have two schemas: test (4.4GB) and test2 (88GB).
This is the test we run on schema `test`:
sysbench \
|
--mysql-host=192.168.0.241 \
|
--mysql-port=3306 \
|
--mysql-user=root \
|
--mysql-password=123123 \
|
--mysql-socket=/tmp/mysql.sock \
|
--mysql-db=test \
|
--mysql-table-engine=innodb \
|
--mysql-ignore-duplicates=on \
|
--test=/usr/share/doc/sysbench/tests/db/oltp.lua \
|
--num-threads=128 \
|
--oltp-table-size=100000000 \
|
--oltp-read-only=off \
|
--oltp-test-mode=complex \
|
--max-requests=0 \
|
--max-time=3000 \
|
run
|
On the slave, we monitor Handler_Commit status variable. We run a perl script that just grabs us the value and prints out per-second averages:
[root@test01 ~]# perl supporttools-master/perl/stat_monitor.pl Handler_commit 10
|
3559.60
|
3598.40
|
3570.60
|
3597.40
|
With 10 second polling intervals, we see an average of almost 3600 commits per second. We also see seconds_behind_master at 0, and relay logs are not building up:
-rw-rw----. 1 mariadb dba 335 May 5 19:26 mariadb-relay.000821
|
-rw-rw----. 1 mariadb dba 27451766 May 5 19:27 mariadb-relay.000822
|
On the slave, we run a long query on database `test2`.
SELECT
|
...
|
FROM
|
APP
|
JOIN B
|
ON B.cid = APP.cid
|
AND B.p...d = 'string'
|
JOIN META
|
ON META.cid = APP.cid
|
JOIN CTG
|
ON CTG.d...o = APP.c...1
|
JOIN MBR
|
ON MBR.m...o = APP.d...d
|
JOIN ITEM
|
ON ITEM.cid = APP.CID
|
AND ITEM.i...e = 'string'
|
JOIN COMM
|
ON CONCAT(COMM.c...d, COMM.c...d) = MBR.d...d
|
GROUP BY ITEM.cid ;
|
Here is the explain
+------+-------------+-------+--------+-----------------------------------------------------+------------------------------+---------+-----------------------+--------+---------------------------------+
|
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|
+------+-------------+-------+--------+-----------------------------------------------------+------------------------------+---------+-----------------------+--------+---------------------------------+
|
| 1 | SIMPLE | COMM | ALL | NULL | NULL | NULL | NULL | 4447 | Using temporary; Using filesort |
|
| 1 | SIMPLE | B | ref | fk_...phone,key_...01 | key_...01 | 62 | const | 119788 | Using index condition |
|
| 1 | SIMPLE | META | ref | key_cid | key_cid | 30 | B.cid | 1 | |
|
| 1 | SIMPLE | APP | eq_ref | PRIMARY,KEY_...5,key_...id | PRIMARY | 30 | B.cid | 1 | Using where |
|
| 1 | SIMPLE | MBR | eq_ref | PRIMARY | PRIMARY | 92 | APP.d...id | 1 | Using where |
|
| 1 | SIMPLE | CTG | eq_ref | PRIMARY | PRIMARY | 62 | APP....1 | 1 | |
|
| 1 | SIMPLE | ITEM | ref | fk_...item | fk_...item | 30 | cid | 9 | Using where |
|
+------+-------------+-------+--------+-----------------------------------------------------+------------------------------+---------+-----------------------+--------+---------------------------------+
|
7 rows in set (0.00 sec)
|
Around 20-30 seconds in, we see a drop in commits per second:
3563.40
|
2419.60
|
490.60
|
3218.20
|
446.60
|
425.20
|
We see seconds_behind_master increasing, but replication is still going. We see Exec_Master_Log_Pos continually increasing. And relay logs building up:
-rw-rw----. 1 mariadb dba 335 May 5 19:26 mariadb-relay.000821
|
-rw-rw----. 1 mariadb dba 314572983 May 5 19:42 mariadb-relay.000822
|
-rw-rw----. 1 mariadb dba 988 May 5 19:42 mariadb-relay.000823
|
-rw-rw----. 1 mariadb dba 335 May 5 19:42 mariadb-relay.000824
|
-rw-rw----. 1 mariadb dba 35844766 May 5 19:44 mariadb-relay.000825
|
Killing the query, we see a jump in commits per second:
112.60
|
111.60
|
114.00
|
<killed query>
|
7439.40
|
28726.00
|
29618.00
|
...
|
26761.40
|
26279.00
|
18664.60
|
3567.00
|
3577.00
|
3599.00
|
3582.00
|
Then goes back to our normal replication speed. Relay logs are all cleaned up. We switch to the InnoDB plugin with this configuration setting:
ignore_builtin_innodb
|
plugin_load=innodb=ha_innodb.so
|
And run the same tests and monitors. No replication lag builds up. The queries cannot really be optimized, as they are dynamically generated. This issue is also present on two different data sets, so we believe the root cause of the issue is within XtraDB.
Attachments
Issue Links
- links to