[MDEV-7668] Intermediate master groups CREATE TEMPORARY with INSERT, causing parallel replication failure Created: 2015-03-05  Updated: 2015-04-20  Resolved: 2015-03-09

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.0.16, 10.0.17
Fix Version/s: 10.0.18

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Kristian Nielsen
Resolution: Fixed Votes: 1
Labels: parallelslave, replication
Environment:

rhel5 x86_64



 Description   

While running a pt-table-sync the following replication error occurred on two 10.0.17 slaves and one 10.0.16 slave. One 10.0.16 slave was lucky enough not to have its replication stopped.

This slave is a 10.0.17 version running with slave-parallel-threads = 20 with not in gtid mode.

It appears the pt-table-sync is creating, inserting into and dropping the temp table drupal_yourlanguage_prod.__maatkit_char_chunking_map frequently.

In the replication stream it appears that the insert was processed before the create temporary table or after the drop table.

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.244.17.9
                  Master_User: replication
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.027842
          Read_Master_Log_Pos: 54265116
               Relay_Log_File: relay-bin.000075
                Relay_Log_Pos: 35278544
        Relay_Master_Log_File: mysql-bin.027842
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: hostworks.backup_config
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 1146
                   Last_Error: Error 'Table 'drupal_yourlanguage_prod.__maatkit_char_chunking_map' doesn't exist' on query. Default database: 'drupal_yourlanguage_prod'. Query: 'INSERT INTO `drupal_yourlanguage_prod`.`__maatkit_char_chunking_map` VALUES (CHAR('53'))'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 35278253
              Relay_Log_Space: 159281418
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 1146
               Last_SQL_Error: Error 'Table 'drupal_yourlanguage_prod.__maatkit_char_chunking_map' doesn't exist' on query. Default database: 'drupal_yourlanguage_prod'. Query: 'INSERT INTO `drupal_yourlanguage_prod`.`__maatkit_char_chunking_map` VALUES (CHAR('53'))'
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 9
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
                   Using_Gtid: No
                  Gtid_IO_Pos: 
1 row in set (0.00 sec)

# at 35274123
#150305 13:54:09 server id 8  end_log_pos 35274167 CRC32 0xf88d815f     GTID 0-8-822549387 cid=2030318995
/*!100001 SET @@session.server_id=8*//*!*/;
/*!100001 SET @@session.gtid_seq_no=822549387*//*!*/;
BEGIN
/*!*/;
# at 35274167
#150305 13:54:09 server id 8  end_log_pos 35274410 CRC32 0x51d382fa     Query   thread_id=16009572      exec_time=0     error_code=0
use `drupal_yourlanguage_prod`/*!*/;
SET TIMESTAMP=1425524049/*!*/;
SET @@session.sql_mode=1074266112/*!*/;
SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=33/*!*/;
CREATE TEMPORARY TABLE `drupal_yourlanguage_prod`.`__maatkit_char_chunking_map` (  `uuid` varchar(64) NOT NULL DEFAULT '' COMMENT 'UUID') ENGINE=MEMORY
/*!*/;
# at 35274410
#150305 13:54:09 server id 8  end_log_pos 35274441 CRC32 0xbd48fb89     Xid = 2030319005
COMMIT/*!*/;
# at 35274441
#150305 13:54:09 server id 9  end_log_pos 35274485 CRC32 0xaff77f81     GTID 0-9-822549392 cid=2030318995
..
..
..
# at 35278253
#150305 13:54:09 server id 8  end_log_pos 35278297 CRC32 0x10a6fd1f     GTID 0-8-822549388 cid=2030318995
/*!100001 SET @@session.gtid_domain_id=0*//*!*/;
/*!100001 SET @@session.server_id=8*//*!*/;
/*!100001 SET @@session.gtid_seq_no=822549388*//*!*/;
BEGIN
/*!*/;
# at 35278297
#150305 13:54:09 server id 8  end_log_pos 35278477 CRC32 0x8e8fb011     Query   thread_id=16009572      exec_time=0     error_code=0
use `drupal_yourlanguage_prod`/*!*/;
SET TIMESTAMP=1425524049/*!*/;
SET @@session.pseudo_thread_id=16009572/*!*/;
SET @@session.sql_mode=1074266112/*!*/;
SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=33/*!*/;
INSERT INTO `drupal_yourlanguage_prod`.`__maatkit_char_chunking_map` VALUES (CHAR('53'))
/*!*/;
# at 35278477
#150305 13:54:09 server id 8  end_log_pos 35278508 CRC32 0x27d8f923     Xid = 2030319081
COMMIT/*!*/;
# at 35278508
#150305 13:54:09 server id 8  end_log_pos 35278550 CRC32 0xd5165459     GTID 0-8-822549389
/*!100001 SET @@session.gtid_seq_no=822549389*//*!*/;
# at 35278550
#150305 13:54:09 server id 8  end_log_pos 35278755 CRC32 0x8eed9ce8     Query   thread_id=16009572      exec_time=0     error_code=0
SET TIMESTAMP=1425524049/*!*/;
DROP TEMPORARY TABLE IF EXISTS `drupal_yourlanguage_prod`.`__maatkit_char_chunking_map` /* generated by server */
/*!*/;
# at 35278755
#150305 13:54:09 server id 9  end_log_pos 35278799 CRC32 0x287fe7a9     GTID 0-9-822549394 cid=2030319094
/*!100001 SET @@session.server_id=9*//*!*/;
/*!100001 SET @@session.gtid_seq_no=822549394*//*!*/;
..
..
..
# at 35286349
#150305 13:54:09 server id 8  end_log_pos 35286391 CRC32 0xe6995ada     GTID 0-8-822549400
/*!100001 SET @@session.server_id=8*//*!*/;
/*!100001 SET @@session.gtid_seq_no=822549400*//*!*/;
# at 35286391
#150305 13:54:09 server id 8  end_log_pos 35286559 CRC32 0xdec85a9c     Query   thread_id=16009572      exec_time=0     error_code=0
use `drupal_yourlanguage_prod`/*!*/;
SET TIMESTAMP=1425524049/*!*/;
SET @@session.sql_mode=1074266112/*!*/;
SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=33/*!*/;
DROP TABLE IF EXISTS `__maatkit_char_chunking_map` /* generated by server */
/*!*/;
# at 35286559
#150305 13:54:09 server id 9  end_log_pos 35286603 CRC32 0x07b5d089     GTID 0-9-822549401 cid=2030319218
...
..
..
# at 35301099
#150305 13:54:09 server id 8  end_log_pos 35301342 CRC32 0x6743eb5f     Query   thread_id=16009572      exec_time=0     error_code=0
use `drupal_yourlanguage_prod`/*!*/;
SET TIMESTAMP=1425524049/*!*/;
SET @@session.sql_mode=1074266112/*!*/;
SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=33/*!*/;
CREATE TEMPORARY TABLE `drupal_yourlanguage_prod`.`__maatkit_char_chunking_map` (  `uuid` varchar(64) NOT NULL DEFAULT '' COMMENT 'UUID') ENGINE=MEMORY
/*!*/;
# at 35301342
#150305 13:54:09 server id 8  end_log_pos 35301373 CRC32 0xec091fed     Xid = 2030319351
COMMIT/*!*/;
# at 35301373
#150305 13:54:09 server id 8  end_log_pos 35301417 CRC32 0x25c8fd61     GTID 0-8-822549406 cid=2030319479
/*!100001 SET @@session.gtid_seq_no=822549406*//*!*/;
BEGIN
/*!*/;
# at 35301417
#150305 13:54:09 server id 8  end_log_pos 35301597 CRC32 0x7953ec65     Query   thread_id=16009572      exec_time=0     error_code=0
SET TIMESTAMP=1425524049/*!*/;
INSERT INTO `drupal_yourlanguage_prod`.`__maatkit_char_chunking_map` VALUES (CHAR('53'))
/*!*/;
# at 35301597
#150305 13:54:09 server id 8  end_log_pos 35301628 CRC32 0x6f90a55e     Xid = 2030319478
COMMIT/*!*/;
# at 35301628
..
..
# at 35308115
#150305 13:54:09 server id 8  end_log_pos 35308157 CRC32 0x40b064a2     GTID 0-8-822549407
/*!100001 SET @@session.server_id=8*//*!*/;
/*!100001 SET @@session.gtid_seq_no=822549407*//*!*/;
# at 35308157
#150305 13:54:09 server id 8  end_log_pos 35308362 CRC32 0xcdc75459     Query   thread_id=16009572      exec_time=0     error_code=0
SET TIMESTAMP=1425524049/*!*/;
SET @@session.sql_mode=1074266112/*!*/;
SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1/*!*/;
/*!\C latin1 *//*!*/;
SET @@session.character_set_client=8,@@session.collation_connection=8,@@session.collation_server=33/*!*/;
DROP TEMPORARY TABLE IF EXISTS `drupal_yourlanguage_prod`.`__maatkit_char_chunking_map` /* generated by server */
/*!*/;
# at 35308362



 Comments   
Comment by Daniel Black [ 2015-03-05 ]

Master binlog settings:

+-----------------------------------------+-----------------------------+
| Variable_name                           | Value                       |
+-----------------------------------------+-----------------------------+
| binlog_annotate_row_events              | ON                          |
| binlog_cache_size                       | 32768                       |
| binlog_checksum                         | CRC32                       |
| binlog_commit_wait_count                | 20                          |
| binlog_commit_wait_usec                 | 100000                      |
| binlog_direct_non_transactional_updates | OFF                         |
| binlog_format                           | MIXED                       |
| binlog_optimize_thread_scheduling       | ON                          |
| binlog_stmt_cache_size                  | 32768                       |
| gtid_binlog_pos                         | 0-9-822927104               |
| gtid_binlog_state                       | 0-8-822556721,0-9-822927104 |
| innodb_api_enable_binlog                | OFF                         |
| innodb_locks_unsafe_for_binlog          | OFF                         |
| max_binlog_cache_size                   | 18446744073709547520        |
| max_binlog_size                         | 104857600                   |
| max_binlog_stmt_cache_size              | 18446744073709547520        |
| sync_binlog                             | 1                           |

Comment by Daniel Black [ 2015-03-05 ]

http://bazaar.launchpad.net/~percona-toolkit-dev/percona-toolkit/2.2/view/head:/bin/pt-table-sync#L4013 version 2.2.13 used. pt-table-sync is also explicitly statement based.

Comment by Kristian Nielsen [ 2015-03-05 ]

Some additional information from the reporter:

This error occurs on slaves S1-4 in a setup:

M1 <-> M2 -> S1-4

M1 and M2 are 10.0.15. S1-4 are 10.0.16/10.0.17

The queries are originally run in M1 (server_id=8). They are replicated to M2
(server_id=9) and then to S1-4.

We see in the binlog snippet, which is from M2, that three transactions were
binlogged in a single group commit on M2: A CREATE from M1, a local
transaction on M2, and the INSERT from M1:

GTID 0-8-822549387 cid=2030318995 CREATE TEMPORARY TABLE `drupal_yourlanguage_prod`.`__maatkit_char_chunking_map`

GTID 0-9-822549392 cid=2030318995

GTID 0-8-822549388 cid=2030318995 INSERT INTO `drupal_yourlanguage_prod`.`__maatkit_char_chunking_map`

This is the reason it fails on the slaves; the CREATE and the INSERT are
allowed to run in parallel, which is incorrect.

This scenario seems quite special (in case one wants to try and reproduce): A
non-transactional CREATE TABLE + INSERT into that table (MEMORY engine) on
M1. Replicated to M2 in parallel, and M2 somehow manages to group-commit those
two together, with an extra local transaction in the middle.

Comment by Kristian Nielsen [ 2015-03-05 ]

Ok, I managed to reproduce, here is an MTR test case (needs the .cnf below):

--source include/have_innodb.inc
--let $rpl_topology=1->2->3
--source include/rpl_init.inc
 
--echo *** MDEV-7668: Intermediate master groups CREATE with INSERT, causing parallel replication failure ***
 
--connection server_1
SET @old_updates= @@GLOBAL.binlog_direct_non_transactional_updates;
SET GLOBAL binlog_direct_non_transactional_updates=OFF;
SET SESSION binlog_direct_non_transactional_updates=OFF;
ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
CREATE TABLE t1 (a int PRIMARY KEY, b INT) ENGINE=InnoDB;
--save_master_pos
 
--connection server_2
--sync_with_master
--save_master_pos
--source include/stop_slave.inc
SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads;
SET GLOBAL slave_parallel_threads=10;
SET @old_commit_count=@@GLOBAL.binlog_commit_wait_count;
SET GLOBAL binlog_commit_wait_count=2;
SET @old_commit_usec=@@GLOBAL.binlog_commit_wait_usec;
SET GLOBAL binlog_commit_wait_usec=2000000;
SELECT @@GLOBAL.binlog_direct_non_transactional_updates;
SET @old_updates= @@GLOBAL.binlog_direct_non_transactional_updates;
SET GLOBAL binlog_direct_non_transactional_updates=OFF;
SELECT @@SESSION.binlog_direct_non_transactional_updates;
SET SESSION binlog_direct_non_transactional_updates=OFF;
CHANGE MASTER TO master_use_gtid=current_pos;
 
--connection server_3
--sync_with_master
--save_master_pos
SET @old_parallel_threads=@@GLOBAL.slave_parallel_threads;
--source include/stop_slave.inc
SET GLOBAL slave_parallel_threads=10;
CHANGE MASTER TO master_use_gtid=current_pos;
 
 
--connection server_1
 
BEGIN;
CREATE TEMPORARY TABLE t2 (a INT PRIMARY KEY) ENGINE=MEMORY;
COMMIT;
INSERT INTO t2 VALUES (1);
INSERT INTO t1 SELECT a, a*10 FROM t2;
DROP TABLE t2;
--source include/save_master_gtid.inc
 
--connection server_2
--source include/start_slave.inc
--source include/sync_with_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
SHOW BINLOG EVENTS;
 
--connection server_3
--source include/start_slave.inc
--source include/sync_with_master_gtid.inc
SELECT * FROM t1 ORDER BY a;
 
 
# Clean up
 
--connection server_2
--source include/stop_slave.inc
SET GLOBAL slave_parallel_threads=@old_parallel_threads;
SET GLOBAL binlog_commit_wait_count=@old_commit_count;
SET GLOBAL binlog_commit_wait_usec=@old_commit_usec;
SET GLOBAL binlog_direct_non_transactional_updates= @old_updates;
--source include/start_slave.inc
 
--connection server_3
--source include/stop_slave.inc
SET GLOBAL slave_parallel_threads=@old_parallel_threads;
--source include/start_slave.inc
 
--connection server_1
SET GLOBAL binlog_direct_non_transactional_updates= @old_updates;
DROP TABLE t1;
 
--source include/rpl_end.inc

!include ../my.cnf
 
[mysqld.1]
log-slave-updates
loose-innodb
 
[mysqld.2]
log-slave-updates
loose-innodb
 
[mysqld.3]
log-slave-updates
loose-innodb
 
[ENV]
SERVER_MYPORT_3=		@mysqld.3.port
SERVER_MYSOCK_3=		@mysqld.3.socket

Comment by Kristian Nielsen [ 2015-03-05 ]

The problems seems to be lack of locking around temporary tables.

So there is nothing that prevents the INSERT INTO temptable to commit in parallel with CREATE TEMPORARY TABLE temptable.

This then causes the statements to replicate in parallel on the lower-level slave, causing failure.

Comment by Kristian Nielsen [ 2015-03-09 ]

Pushed to 10.0.18:

http://lists.askmonty.org/pipermail/commits/2015-March/007545.html

The bug here happens when using statement-based replication with temporary
tables. And an intermediate master with parallel replication enabled.

A third-level slave, also with parallel replication enabled, might then in
some cases (depending on timing) try to execute conflicting events in
parallel, leading to replication issues.

Generated at Thu Feb 08 07:21:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.