[MDEV-6210] Replication slows down significantly if select query is executed on slave using XtraDB - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Not a Bug
Affects Version/s: 5.5.37
Fix Version/s: 5.5.37
Component/s: None
Labels:
- upstream
- xtradb

Description

We have a problem where a long running SELECT query running on one schema affects replication that is running on a different schema.

There are three hosts:

Host A: Sysbench
Host B: MySQL 5.5 Master
Host C: MariaDB 5.5.37 Slave

We have two schemas: test (4.4GB) and test2 (88GB).

This is the test we run on schema `test`:

sysbench \

--mysql-host=192.168.0.241 \

--mysql-port=3306 \

--mysql-user=root \

--mysql-password=123123 \

--mysql-socket=/tmp/mysql.sock \

--mysql-db=test \

--mysql-table-engine=innodb \

--mysql-ignore-duplicates=on \

--test=/usr/share/doc/sysbench/tests/db/oltp.lua \

--num-threads=128 \

--oltp-table-size=100000000 \

--oltp-read-only=off \

--oltp-test-mode=complex \

--max-requests=0 \

--max-time=3000 \

run

On the slave, we monitor Handler_Commit status variable. We run a perl script that just grabs us the value and prints out per-second averages:

[root@test01 ~]# perl supporttools-master/perl/stat_monitor.pl Handler_commit 10

3559.60

3598.40

3570.60

3597.40

With 10 second polling intervals, we see an average of almost 3600 commits per second. We also see seconds_behind_master at 0, and relay logs are not building up:

-rw-rw----. 1 mariadb dba 335 May 5 19:26 mariadb-relay.000821

-rw-rw----. 1 mariadb dba 27451766 May 5 19:27 mariadb-relay.000822

On the slave, we run a long query on database `test2`.

SELECT

...

FROM

APP

  JOIN B

    ON B.cid = APP.cid

    AND B.p...d = 'string'

  JOIN META

    ON META.cid = APP.cid

  JOIN CTG

    ON CTG.d...o = APP.c...1

  JOIN MBR

    ON MBR.m...o = APP.d...d

  JOIN ITEM

    ON ITEM.cid = APP.CID

    AND ITEM.i...e = 'string'

  JOIN COMM

    ON CONCAT(COMM.c...d, COMM.c...d) = MBR.d...d

GROUP BY ITEM.cid ;

Here is the explain

+------+-------------+-------+--------+-----------------------------------------------------+------------------------------+---------+-----------------------+--------+---------------------------------+

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |

+------+-------------+-------+--------+-----------------------------------------------------+------------------------------+---------+-----------------------+--------+---------------------------------+

| 1 | SIMPLE | COMM | ALL | NULL | NULL | NULL | NULL | 4447 | Using temporary; Using filesort |

| 1 | SIMPLE | B | ref | fk_...phone,key_...01 | key_...01 | 62 | const | 119788 | Using index condition |

| 1 | SIMPLE | META | ref | key_cid | key_cid | 30 | B.cid | 1 | |

| 1 | SIMPLE | APP | eq_ref | PRIMARY,KEY_...5,key_...id | PRIMARY | 30 | B.cid | 1 | Using where |

| 1 | SIMPLE | MBR | eq_ref | PRIMARY | PRIMARY | 92 | APP.d...id | 1 | Using where |

| 1 | SIMPLE | CTG | eq_ref | PRIMARY | PRIMARY | 62 | APP....1 | 1 | |

| 1 | SIMPLE | ITEM | ref | fk_...item | fk_...item | 30 | cid | 9 | Using where |

+------+-------------+-------+--------+-----------------------------------------------------+------------------------------+---------+-----------------------+--------+---------------------------------+

7 rows in set (0.00 sec)

Around 20-30 seconds in, we see a drop in commits per second:

3563.40

2419.60

490.60

3218.20

446.60

425.20

We see seconds_behind_master increasing, but replication is still going. We see Exec_Master_Log_Pos continually increasing. And relay logs building up:

-rw-rw----. 1 mariadb dba 335 May 5 19:26 mariadb-relay.000821

-rw-rw----. 1 mariadb dba 314572983 May 5 19:42 mariadb-relay.000822

-rw-rw----. 1 mariadb dba 988 May 5 19:42 mariadb-relay.000823

-rw-rw----. 1 mariadb dba 335 May 5 19:42 mariadb-relay.000824

-rw-rw----. 1 mariadb dba 35844766 May 5 19:44 mariadb-relay.000825

Killing the query, we see a jump in commits per second:

112.60
111.60
114.00
<killed query>
7439.40
28726.00
29618.00
...
26761.40
26279.00
18664.60
3567.00
3577.00
3599.00
3582.00

Then goes back to our normal replication speed. Relay logs are all cleaned up. We switch to the InnoDB plugin with this configuration setting:

ignore_builtin_innodb

plugin_load=innodb=ha_innodb.so

And run the same tests and monitors. No replication lag builds up. The queries cannot really be optimized, as they are dynamically generated. This issue is also present on two different data sets, so we believe the root cause of the issue is within XtraDB.

https://bugs.launchpad.net/percona-server/+bug/1316400

Attachments

Issue Links

links to

the issue in the XtraDB bug tracker

Activity

People

Assignee:: Jan Lindström (Inactive)

Reporter:: Jan Lindström (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 2014-05-06 07:23

Updated:: 2014-07-04 08:47

Resolved:: 2014-07-04 08:47

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.