[MDEV-5033] Replication filter producing unecessary table locking on the slave Created: 2013-09-18  Updated: 2014-10-25  Due: 2014-10-25  Resolved: 2014-10-25

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0.4
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: VAROQUI Stephane Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Linux x86_64



 Description   

Master have multiple databases with the same non transactional table (blackhole) inside every schema .

We randomly insert records into all databases tables.

Start a replication and filtering on a single database (replicate-do-db=schema1) . We observe sql thread locking all the tables of all the databases . The SQL thread is in locking state on the COMMIT statement . Despite this the filter works for DML and no records are inserted.

#130918 13:45:23 server id 3164  end_log_pos 925        GTID 0-3164-275862697
/*!100001 SET @@session.gtid_seq_no=275862697*//*!*/;
BEGIN
/*!*/;
# at 925
#130918 13:45:23 server id 3164  end_log_pos 1594       Query   thread_id=15    exec_time=0     error_code=0
use `ccmstats_shard04`/*!*/;
SET TIMESTAMP=1379504723/*!*/;
insert high_priority ignore into `ccmstats_shard04`.`ccmreferers`(`ip`,`date`,`firstseenon`,`keyword`,`domaine`,`referer`,`keyword_crc64`)values(1770066430,'2013-09-18 13:45:23','/download/telecharger-34056887','','www.commentcamarche.net','http://www.google.com/url?sa=t&rct=j&q=',5255623658049533571)
/*!*/;
# at 1594

In multi source we have 24 shards so all domain concurrently replay same lock and unlock tables decreasing the all replication speed .

Can we do something to remove the lock introduce by the master when inserting into non transactional table . or can we detect that the lock is part of a transaction that is fully filtered



 Comments   
Comment by Elena Stepanova [ 2014-10-11 ]

I couldn't reproduce the problem (even on 10.0.4), although maybe I just failed to decipher the description properly.

Here my MTR-like test case, as I understood the issue:

# Run with --replicate-do-db=test2
 
--source include/master-slave.inc
--source include/have_binlog_format_statement.inc
 
create database test1;
use test1;
create table test1.t (i int) engine=blackhole;
create database test2;
use test2;
create table test2.t (i int) engine=blackhole;
 
--sync_slave_with_master
 
create database if not exists test1;
create table if not exists test1.t (i int) engine=blackhole;
lock table test1.t write;
 
--connection master
 
use test1;
insert into test1.t values (1);
use test2;
insert into test2.t values (2);
 
--connection slave
sleep 1;
show processlist;
select * from test1.t;
unlock tables;
select * from test2.t;

If it's run without replicate-do-db, show processlist expectedly shows that the slave thread replicating insert into test1.t is waiting for table metadata lock. Allegedly, the test with replicate-do-db=test2 should show the same, if the slave thread still attempts to lock test1.t (it cannot do so because the table is locked by another thread).

However, I don't see it happen, processlist doesn't show any lock waits.

WITHOUT replicate-do-db

Id	User	Host	db	Command	Time	State	Info	Progress
4	root	localhost:57862	test	Sleep	1		NULL	0.000
5	root	localhost:57863	test	Sleep	2		NULL	0.000
6	system user		NULL	Connect	1	Waiting for master to send event	NULL	0.000
7	system user		test1	Connect	1	Waiting for table metadata lock	insert into test1.t values (1)	0.000
8	root	localhost:57869	test	Query	0	init	show processlist	0.000
9	root	localhost:57870	test	Sleep	1		NULL	0.000

WITH replicate-do-db=test2

Id	User	Host	db	Command	Time	State	Info	Progress
4	root	localhost:57838	test	Sleep	1		NULL	0.000
5	root	localhost:57839	test	Sleep	1		NULL	0.000
6	system user		NULL	Connect	1	Waiting for master to send event	NULL	0.000
7	system user		NULL	Connect	1	Slave has read all relay log; waiting for the slave I/O thread to update it	NULL	0.000
8	root	localhost:57845	test	Query	0	init	show processlist	0.000
9	root	localhost:57846	test	Sleep	1		NULL	0.000

Please clarify what's the difference between the original scenario and the one in the test case.

Comment by Elena Stepanova [ 2014-10-25 ]

Closing for now as 'can't reproduce', please comment to re-open if you have any additional information.

Generated at Thu Feb 08 07:01:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.