[MDEV-6276] Replicate_Do_DB option causes slave server to commit empty InnoDB transactions. Created: 2014-05-28  Updated: 2022-09-12  Resolved: 2022-09-12

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.0.11
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Bill Rios Assignee: Kristian Nielsen
Resolution: Won't Fix Votes: 1
Labels: regression, replication
Environment:

RHEL 6.4



 Description   

Hi,

I'm using the Replicate_Do_DB option to replicate a DB with a very small amount of activity (less then one transaction per second). Another DB running on the master server runs many more transactions per second. In the output of 'show full processlist' on the slave, I can see the slave process constantly switching between COMMIT and 'closing tables' for the DB that should be excluded by the Replicate_Do_DB option. I did not have this issue when using MariaDB 5.5.27.

I enabled the general query log and it shows BEGIN and COMMIT repeatedly but no queries that actually alter any tables:

140527 12:13:35 1721 Query BEGIN
1721 Query COMMIT
1721 Query BEGIN
1721 Query COMMIT
1721 Query BEGIN
1721 Query COMMIT
1721 Query BEGIN
1721 Query COMMIT
1721 Query BEGIN
1721 Query COMMIT
1721 Query BEGIN
1721 Query COMMIT
1721 Query BEGIN
1721 Query COMMIT
1721 Query BEGIN
1721 Query COMMIT
1721 Query BEGIN
140527 12:13:36 1721 Query COMMIT
1721 Query BEGIN
1721 Query COMMIT
1721 Query BEGIN
1721 Query COMMIT
1721 Query BEGIN
1721 Query COMMIT
1721 Query BEGIN
1721 Query COMMIT /* implicit, from Xid_log_event */
1721 Query BEGIN
1721 Query COMMIT /* implicit, from Xid_log_event */
1721 Query BEGIN
1721 Query COMMIT
1721 Query BEGIN
1721 Query COMMIT

This generates a lot of I/O contention that I've had to mitigate on this development server by setting innodb_flush_log_at_trx_commit to 0. I wouldn't be comfortable with that setting on a production DB server though.

Regards,
Bill



 Comments   
Comment by Kristian Nielsen [ 2014-06-10 ]

I verified this bug with the following test case:

--source include/have_innodb.inc
--source include/have_binlog_format_statement.inc
--source include/master-slave.inc
 
--connection slave
--source include/stop_slave.inc
SET @old_do_db = @@GLOBAL.replicate_do_db;
SET GLOBAL replicate_do_db = "test2,test3";
--source include/start_slave.inc
 
--connection master
CREATE DATABASE test2;
CREATE DATABASE test3;
 
use test;
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t1 VALUES (101);
 
use test2;
CREATE TABLE t2 (a INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t2 VALUES (201);
INSERT INTO t2 VALUES (202), (203);
 
use test3;
CREATE TABLE t3 (a INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t3 VALUES (301);
BEGIN;
INSERT INTO t3 VALUES (302);
INSERT INTO t3 VALUES (303);
COMMIT;
 
use test;
UPDATE t1 SET a = a + 10;
INSERT INTO t1 VALUES (102);
DELETE FROM t1 WHERE a = 111;
 
use test2;
UPDATE t2 SET a = a + 10 WHERE a MOD 2 = 0;
 
use test;
INSERT INTO t1 VALUES (103);
SELECT @@gtid_binlog_pos;
 
SHOW BINLOG EVENTS;
--save_master_pos
 
--connection slave
--sync_with_master
SHOW BINLOG EVENTS;
SELECT * FROM mysql.gtid_slave_pos ORDER BY domain_id, sub_id;
SELECT @@gtid_binlog_pos;
 
--error ER_NO_SUCH_TABLE
SELECT * FROM test.t1;
SELECT * FROM test2.t2;
SELECT * FROM test3.t3;
 
--connection master
use test;
DROP TABLE t1;
use test2;
DROP TABLE t2;
use test3;
DROP TABLE t3;
use test;
DROP DATABASE test2;
DROP DATABASE test3;
--save_master_pos
 
--connection slave
--sync_with_master
--source include/stop_slave.inc
SET GLOBAL replicate_do_db = @old_do_db;
--source include/start_slave.inc
 
--source include/rpl_end.inc

In the output of this test case, we can see this:

SELECT * FROM mysql.gtid_slave_pos ORDER BY domain_id, sub_id;
domain_id	sub_id	server_id	seq_no
0	14	1	14
0	15	1	15
SELECT @@gtid_slave_pos;
@@gtid_slave_pos
0-1-15
SELECT @@gtid_binlog_pos;
@@gtid_binlog_pos
0-1-14

So the problem is that for the empty transaction, even though there are no
changes (and nothing logged into the binlog), we still update the gtid slave
pos, which requires an expensive commit of the mysql.gtid_slave_pos table.

It might have been deliberate to update the gtid_slave_pos, the idea being
that if the slave is stopped and then re-started, it should not try to
re-execute events that were already skipped.

But in any case, I think this bug clearly shows that such behaviour is
wrong. If user filters out events, she will not expect transaction commit of
such filtered events.

So this needs to be fixed, I think.

Comment by Kristian Nielsen [ 2014-06-12 ]

Hm, there is a potential problem with not updating the GTID state for events
that are ignored.

If a lot of events are ignored (perhaps within a specific domain only), the
slave's current GTID position (within that domain) may remain at an old GTID
for a long time. This could eventually lead the slave reconnect to fail due to
the master having purged old binlog files.

The slave does not need any of the events that the master has purged, but the
master cannot know this.

One way to handle this might be to update the table only occasinally. For
example once for each master binlog file, which could be conveniently done by
updating when the GTID_LIST_EVENT is received.

But it might be somewhat confusing to users that the current gtid position
jumps ahead in big jumps, seemingly at random.

Another approach might be to update the in-memory @@gtid_slave_pos for every
event, but update the underlying table only occasionally. In case of crash,
the updated in-memory position is not preserved, but that is not a problem as
the events will just be re-fetch from the master and ignored again. But it is
somewhat ugly to introduce such an inconsistency between the table and the
variable - and it could confuse users that expect to be able to query the
table to get the current position.

Comment by Kristian Nielsen [ 2014-06-12 ]

Maybe we could make a general facility to not update the mysql.gtid_slave_pos
in a crash-safe way, for users that do not want or need it.

Then it could be configured to to eg. update all event groups crash safe,
update only non-ignored groups crash-safe, update no groups crash safe.

Then when a GTID is not to be recorded crash-safe, we can record it only in
the in-memory hash. And then periodically, perhaps everytime a GTID_LIST event
is seen or every N GTIDs, we can update the table.

And we will also update whenever a slave thread stops, so that in the no-crash
case there would be no difference for the user.

This could then be configurable by the DBA, which could also help reduce the
overhead introduced by the GTID update, for users that do not use GTID...

Comment by Sergei Golubchik [ 2022-09-12 ]

10.0 was EOLed in March 2019

Generated at Thu Feb 08 07:10:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.