Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.3.27, 10.5.8, 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL)
-
None
-
CentOS 8
Description
Hi,
I found a bug affecting recent MariaDB versions.
Versions tested:
- OS repo: MariaDB 5.5.68 (on CentOS 7) - Not affected
- OS repo: MariaDB 10.3.27 (On CentOS 8) - Affected
- MariaDB repo: MariaDB 10.5.8 (On CentOS 8) - Affected
(All with stock settings, e.g. stock binlog_format)
We have a master, with multiple slaves.
The slaves have log_slave_updates=1, to replicate to more slaves.
- Master: MariaDB 5.5.68 on CentOS 7 from OS repository.
- Slave #1: MariaDB 10.5.8 On CentOS 8 from MariaDB repository. Also tested 10.3.27 from OS repository.
- Slave #2: MariaDB 5.5.68 on CentOS 7 from OS repository.
We run an query on the master to update some table. The query is successfully replicated to slaves #1 and #2. However, slave #1 doesn't write it properly to the binary log, causing the further slaves actually lose rows.
The query we run on the master:
create temporary table tpool like cid_stock; |
load data local infile '/root/cidcsv.sql' into table tpool fields terminated by ',' lines terminated by '\n' (cid) set countrycode='IL'; |
delete from cid_stock where countrycode='IL'; |
insert into cid_stock select * from tpool where countrycode='IL'; |
The issue seems related to temporary tables, because it doesn't happen with a static table. With this query i can successfully see the rows in the binary log:
create table pool2 like cid_stock; |
load data local infile '/root/cidcsv.sql' into table pool2 fields terminated by ',' lines terminated by '\n' (cid) set countrycode='IL'; |
delete from cid_stock where countrycode='IL'; |
insert into cid_stock select * from pool2 where countrycode='IL'; |
What happens is that slave #1 correctly executes this, however it doesn't write it properly to the binary log. The load data infile statement is missing. I guess its ok for it to be missing because of binlog_format=mixed, however the rows aren't replicated either.
So what happens is that delete works, but the insert inserts 0 rows because the table is empty.
On slave #2 its fine, it correctly writes the load data infile to the binary log.
Binary log emitted by slave #1:
# at 14546
|
#210124 13:09:38 server id 8 end_log_pos 14588 CRC32 0xdb2fc10d GTID 0-8-96 ddl
|
/*!100001 SET @@session.gtid_seq_no=96*//*!*/;
|
# at 14588
|
#210124 13:09:38 server id 8 end_log_pos 14709 CRC32 0x47091aaf Query thread_id=44 exec_time=0 error_code=0
|
SET TIMESTAMP=1611493778/*!*/;
|
SET @@session.pseudo_thread_id=44/*!*/;
|
create temporary table tpool like cid_stock
|
/*!*/;
|
# at 14709
|
#210124 13:09:38 server id 8 end_log_pos 14751 CRC32 0x2e83513f GTID 0-8-97 trans
|
/*!100001 SET @@session.gtid_seq_no=97*//*!*/;
|
START TRANSACTION
|
/*!*/;
|
# at 14751
|
#210124 13:09:38 server id 8 end_log_pos 14873 CRC32 0xfa6ce3aa Query thread_id=44 exec_time=0 error_code=0
|
SET TIMESTAMP=1611493778/*!*/;
|
delete from cid_stock where countrycode='IL'
|
/*!*/;
|
# at 14873
|
#210124 13:09:38 server id 8 end_log_pos 14904 CRC32 0x634c53ca Xid = 227
|
COMMIT/*!*/;
|
# at 14904
|
#210124 13:09:38 server id 8 end_log_pos 14946 CRC32 0xad87d017 GTID 0-8-98
|
/*!100001 SET @@session.gtid_seq_no=98*//*!*/;
|
START TRANSACTION
|
/*!*/;
|
# at 14946
|
#210124 13:09:38 server id 8 end_log_pos 15088 CRC32 0xd5b71ed0 Query thread_id=44 exec_time=0 error_code=0
|
SET TIMESTAMP=1611493778/*!*/;
|
insert into cid_stock select * from tpool where countrycode='IL'
|
/*!*/;
|
# at 15088
|
#210124 13:09:38 server id 8 end_log_pos 15119 CRC32 0xb6ae6dbc Xid = 229
|
COMMIT/*!*/;
|
DELIMITER ;
|
# End of log file
|
Binary log emitted by slave #2:
# at 21828
|
#210124 13:09:38 server id 8 end_log_pos 21945 Query thread_id=44 exec_time=0 error_code=0
|
SET TIMESTAMP=1611493778/*!*/;
|
SET @@session.pseudo_thread_id=44/*!*/;
|
create temporary table tpool like cid_stock
|
/*!*/;
|
# at 21945
|
#210124 13:09:38 server id 8 end_log_pos 22024 Query thread_id=44 exec_time=0 error_code=0
|
SET TIMESTAMP=1611493778/*!*/;
|
BEGIN
|
/*!*/;
|
# at 22024
|
#210124 13:09:38 server id 8 end_log_pos 22223
|
#Begin_load_query: file_id: 17 block_len: 176
|
# at 22223
|
#210124 13:09:38 server id 8 end_log_pos 22498 Execute_load_query thread_id=44 exec_time=0 error_code=0
|
SET TIMESTAMP=1611493778/*!*/;
|
LOAD DATA LOCAL INFILE '/tmp/SQL_LOAD_MB-11-1' INTO TABLE `tpool` FIELDS TERMINATED BY ',' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`cid`) SET `countrycode`='IL'
|
/*!*/;
|
# file_id: 17
|
# at 22498
|
#210124 13:09:38 server id 8 end_log_pos 22578 Query thread_id=44 exec_time=0 error_code=0
|
SET TIMESTAMP=1611493778/*!*/;
|
COMMIT
|
/*!*/;
|
# at 22578
|
#210124 13:09:38 server id 8 end_log_pos 22642 Query thread_id=44 exec_time=0 error_code=0
|
SET TIMESTAMP=1611493778/*!*/;
|
BEGIN
|
/*!*/;
|
# at 22642
|
#210124 13:09:38 server id 8 end_log_pos 22760 Query thread_id=44 exec_time=0 error_code=0
|
SET TIMESTAMP=1611493778/*!*/;
|
delete from cid_stock where countrycode='IL'
|
/*!*/;
|
# at 22760
|
#210124 13:09:38 server id 8 end_log_pos 22787 Xid = 191
|
COMMIT/*!*/;
|
# at 22787
|
#210124 13:09:38 server id 8 end_log_pos 22851 Query thread_id=44 exec_time=0 error_code=0
|
SET TIMESTAMP=1611493778/*!*/;
|
BEGIN
|
/*!*/;
|
# at 22851
|
#210124 13:09:38 server id 8 end_log_pos 22989 Query thread_id=44 exec_time=0 error_code=0
|
SET TIMESTAMP=1611493778/*!*/;
|
insert into cid_stock select * from tpool where countrycode='IL'
|
/*!*/;
|
# at 22989
|
#210124 13:09:38 server id 8 end_log_pos 23016 Xid = 193
|
COMMIT/*!*/;
|
DELIMITER ;
|
# End of log file
|
Table structure:
CREATE TABLE `cid_stock` ( |
`countrycode` varchar(8) NOT NULL DEFAULT '', |
`cid` varchar(32) NOT NULL DEFAULT '', |
PRIMARY KEY (`countrycode`,`cid`), |
KEY `countrycode` (`countrycode`) |
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
Example CSV file for the load data infile:
4912399313
|
4912399314
|
4912399315
|
4912399316
|
4912399317
|
4912399318
|
Thanks for the report.
I presume you run the servers with the default values of binlog_format, which would be statement on 5.5 and mixed on 10.3/10.5.
Unfortunately, that's the exact combination which appears to be affected by the bug.
Here is a raw MTR test to demonstrate (MTR sets log_slave_updates itself):
--source include/master-slave.inc
--source include/have_binlog_format_mixed.inc
--write_file $MYSQLTEST_VARDIR/load_data
1
2
EOF
show binlog events;
--sync_slave_with_master
show binlog events;
# cleanup
--connection master
--source include/rpl_end.inc
10.2 59e6d14c
connection slave;
show binlog events;
Log_name Pos Event_type Server_id End_log_pos Info
slave-bin.000001 4 Format_desc 2 256 Server ver: 10.2.37-MariaDB-debug-log, Binlog ver: 4
slave-bin.000001 256 Gtid_list 2 285 []
slave-bin.000001 285 Binlog_checkpoint 2 328 slave-bin.000001
slave-bin.000001 328 Gtid 1 370 GTID 0-1-1
slave-bin.000001 370 Query 1 460 use `test`; create table t1 (a int)
slave-bin.000001 460 Gtid 1 502 GTID 0-1-2
slave-bin.000001 502 Query 1 603 use `test`; create temporary table tmp like t1
slave-bin.000001 603 Gtid 1 645 BEGIN GTID 0-1-4
slave-bin.000001 645 Query 1 744 use `test`; insert into t1 select * from tmp
slave-bin.000001 744 Query 1 817 COMMIT
In the test case, have_binlog_format_mixed.inc makes the slave run with the mixed format, while for the master we set statement explicitly within the test. It imitates the default combination of binlog formats which happens naturally in 5.5 => 10.3 replication.
If both the master and the slave were running with statement format, LOAD DATA would be replicated.
If the slave were running with row format, it would skip the temporary table completely and write the row events for INSERT. Same would happen on master if it were running with row format.
If the master were running with mixed format, the use of a temporary table would make it switch to row, and further it would behave as above.
But If the master is running in statement format, it writes the statements unchanged (temporary table, load and insert); and the slave running in mixed format writes some, but not all of it. That's the bug.
It was introduced long time ago, in 10.0.5, my best guess is by this commit
commit 2e100cc5a493b6a0f6f907e0483a734c7fee2087 3784432256a30e4d453dde10c875d8446519e7c1
Date: Mon Oct 14 00:24:05 2013 +0300
Fixes for parallel slave:
but I didn't check the guess as I can't build it on my machine.