[MDEV-7578] Slave is ~10x slower to execute set of statements compared to master when using RBR Created: 2015-02-12 Updated: 2015-03-05 Resolved: 2015-03-05 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication, Storage Engine - InnoDB, Storage Engine - XtraDB |
| Affects Version/s: | 5.5.42, 10.0.16 |
| Fix Version/s: | 5.5.43, 10.0.18 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Jan Lindström (Inactive) | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | verified |
| Description |
|
First, the table is not that large (1.1 GB) and has this number of rows:
The table is ~1 GB and the server has 192 GB of RAM.
This table exists on two different machines (let's call them machine1 and machine2). These machines are on two different replication streams, so the table contents on them may differ. Periodically, the customer executes the Percona tool pt-table-sync (http://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html). On the master this produces the following kind of transaction (a single transaction containing a lot of statements):
On the slave, the same transaction is ~10x slower:
So the runtime is 246994 s, i.e. ~2 days and 21 hours, compared with the original ~7 hours on the master. This behaviour is really not good. So the question is: why does it take so much longer, and what can be done to prevent it? The server was not I/O bound; it was 100% CPU bound, with a single thread pegged at 100%. Number of different statements in that one big transaction:
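A quick arithmetic check of the figures above, assuming (as the "~2 days and 21 hours" conversion implies) that the slave runtime of 246994 is in seconds and taking ~7 hours as the master runtime:

```python
# Worked check of the runtimes quoted in the description.
# Assumption: 246994 is the slave runtime in seconds.
SLAVE_RUNTIME_S = 246994
MASTER_RUNTIME_H = 7  # approximate master runtime from the report


def split_days_hours(seconds: int) -> tuple[int, float]:
    """Return (whole days, remaining hours) for a duration given in seconds."""
    days, rem = divmod(seconds, 86400)
    return days, rem / 3600


days, hours = split_days_hours(SLAVE_RUNTIME_S)
slowdown = (SLAVE_RUNTIME_S / 3600) / MASTER_RUNTIME_H

print(f"slave: {days} d {hours:.1f} h")  # slave: 2 d 20.6 h
print(f"slowdown: ~{slowdown:.1f}x")      # slowdown: ~9.8x
```

The ratio of roughly 9.8 is consistent with the "~10x slower" in the title.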
|
| Comments |
| Comment by Jan Lindström (Inactive) [ 2015-02-17 ] |
|
Let's first try to reproduce this using the same sane defaults. From InnoDB's point of view, the transaction should be executed exactly the same way as on the master if the set of actual statements is exactly the same. If some statement is replicated differently in RBR, then we have a different case. Here is what we understand so far. We have two machines, A and B; both contain the same table, but the table contents differ. On machine A, the master executes SELECTs (to find out which rows differ and how), INSERTs, REPLACEs and DELETEs to modify the table rows to match the table on B. This creates a big transaction that takes ~7 h on the master; on commit, the master writes ~1 GB of binlog that is then transferred to the slave for execution. For some reason, this execution takes 10x as long as on the master. I debugged the slave, and basically only one SQL thread was doing anything: executing this one big transaction, statement by statement.
| Comment by Elena Stepanova [ 2015-03-04 ] |
|
Tentative test case (it should be run with an increased --testcase-timeout and possibly --suite-timeout; there is no need to wait until the end of the test, though: as soon as the problem is obvious, it can be interrupted).
If the problem is reproducible, the 'Number of rows inserted' will grow slower and slower while the slave progresses through the huge transaction. I am running the test now; it will take a while. With 1M rows the effect is not convincing, which is why we couldn't reproduce it in the initial attempts.
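To judge "grows slower and slower" by numbers rather than by eye, one could sample the status output periodically and compare per-interval insert rates. The sketch below is a minimal illustration, assuming the counter appears in a line of the form `Number of rows inserted N, updated N, deleted N, read N` (as in the ROW OPERATIONS section of `SHOW ENGINE INNODB STATUS`); the function names are hypothetical, and how the snapshots are collected is left out.

```python
import re

# Assumed format of the counter line in a status snapshot, e.g.
# "Number of rows inserted 1000, updated 0, deleted 0, read 0".
ROWS_INSERTED_RE = re.compile(r"Number of rows inserted (\d+)")


def rows_inserted(status_text: str) -> int:
    """Extract the cumulative 'rows inserted' counter from one snapshot."""
    match = ROWS_INSERTED_RE.search(status_text)
    if match is None:
        raise ValueError("no 'Number of rows inserted' line found")
    return int(match.group(1))


def insert_rates(samples: list[tuple[float, str]]) -> list[float]:
    """Rows inserted per second between consecutive (timestamp, text) samples."""
    counts = [(t, rows_inserted(text)) for t, text in samples]
    return [
        (c1 - c0) / (t1 - t0)
        for (t0, c0), (t1, c1) in zip(counts, counts[1:])
    ]


def is_slowing_down(rates: list[float]) -> bool:
    """True if every interval's rate is strictly lower than the one before."""
    return len(rates) >= 2 and all(b < a for a, b in zip(rates, rates[1:]))
```

For example, snapshots taken at 0 s, 60 s and 120 s showing 0, 60000 and 90000 inserted rows give rates of 1000 and 500 rows/s, which `is_slowing_down` flags. A steadily falling rate while the slave is still inside the same transaction is the symptom described above.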
| Comment by Jan Lindström (Inactive) [ 2015-03-04 ] |
|
Problem is on
| Comment by Jan Lindström (Inactive) [ 2015-03-05 ] |
|
Fixed on 5.5: commit f66fbe8ce0ff4ffcd6a6c185f9b3d25bd9f67f8d
Analysis: On master when executing (single/multi) row INSERTs/REPLACEs
Fix: Use new style autoinc locks also when
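The Analysis and Fix notes above are cut off. As background only, here is a simplified Python model of how InnoDB chooses an auto-increment lock depending on `innodb_autoinc_lock_mode` (0 = traditional, 1 = consecutive, 2 = interleaved). All names are hypothetical and are not InnoDB's internal identifiers. The `row_events_use_new_style` switch models an assumption suggested, but not stated, by the truncated Fix line: that row events applied by the slave SQL thread previously took the old-style table-level AUTO-INC lock, while the master's simple INSERTs/REPLACEs used the new-style lightweight mutex. The report does not spell out why the table-level lock made the slave so much slower.

```python
from enum import Enum


class StatementKind(Enum):
    # Hypothetical classification used only for this sketch.
    SIMPLE_INSERT = "single/multi-row INSERT or REPLACE (row count known up front)"
    BULK_INSERT = "INSERT ... SELECT, LOAD DATA, etc."
    ROW_EVENT = "row event applied by the slave SQL thread under RBR"


class AutoincLock(Enum):
    MUTEX = "new style: lightweight mutex released after value allocation"
    TABLE_LOCK = "old style: table-level AUTO-INC lock held to end of statement"


TRADITIONAL, CONSECUTIVE, INTERLEAVED = 0, 1, 2  # innodb_autoinc_lock_mode values


def choose_autoinc_lock(lock_mode: int, kind: StatementKind,
                        row_events_use_new_style: bool,
                        other_trx_holds_table_lock: bool = False) -> AutoincLock:
    """Model which auto-increment lock an insert-like operation would take."""
    if lock_mode == TRADITIONAL:
        return AutoincLock.TABLE_LOCK  # every insert-like statement uses the table lock
    if lock_mode == INTERLEAVED:
        return AutoincLock.MUTEX  # the table-level AUTO-INC lock is never used
    # CONSECUTIVE: simple inserts use the mutex unless another transaction
    # already holds the table-level lock (e.g. for a running bulk insert).
    new_style = {StatementKind.SIMPLE_INSERT}
    if row_events_use_new_style:
        # Assumed effect of the fix: treat slave row events like simple inserts.
        new_style.add(StatementKind.ROW_EVENT)
    if kind in new_style and not other_trx_holds_table_lock:
        return AutoincLock.MUTEX
    return AutoincLock.TABLE_LOCK
```

Under this model, `choose_autoinc_lock(CONSECUTIVE, StatementKind.ROW_EVENT, row_events_use_new_style=False)` yields `TABLE_LOCK`, while the same call with `True` yields `MUTEX`, matching the master's behaviour for simple INSERTs/REPLACEs.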