[MDEV-19158] MariaDB 10.2.22 is writing duplicate entries into binary log Created: 2019-04-03  Updated: 2019-05-20  Resolved: 2019-05-14

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.2.22
Fix Version/s: 10.2.25, 10.3.16, 10.4.5

Type: Bug Priority: Major
Reporter: kilroy Assignee: Sujatha Sivakumar (Inactive)
Resolution: Fixed Votes: 1
Labels: None
Environment:

Gentoo


Attachments: Text File mariadb-MDEV-19158-anonymized_binlog.txt     Text File mariadb-MDEV-19158-table-structure.txt     Text File mariadb-MDEV-19158-triggers.txt    

 Description   

We have a Master/Master Setup on two servers, but are only writing to one of those servers (so it is essentially Master/Slave)

We upgraded from 10.1.* to 10.2.22 last week and starting with the upgrade, we are getting duplicate key errors on the slave.

BINLOG=mixed

Examing the binary log on the master and the relay log on the slave, we see:

  • BEGIN

Then one binary blob:

  • INSERT table1 (id 1)
  • INSERT table2 (this one is triggered by the insert in table1)
  • INSERT table3 (id 4711)

And then in Statement format:

  • INSERT table3 (id 4711)
  • COMMIT

Which fails since it repeats the last statement of the binary blob.

If I replay it by hand and leave out the statement block, everything is ok.

I have found others to have the same problem. They found this bug starting with 10.2.10 (but it can also be earlier). One example is https://stackoverflow.com/questions/47650166/mariadb-10-2-10-writing-double-binlog-entries-in-mixed-format

What adds to the strangeness: Only 1 in 100 inserts triggers this behavior and it only seems to be triggered on clients that access the master from outside the network (additionally, this client has MariaDB client libs from 10.1.*). So maybe it is a timing issue?



 Comments   
Comment by Andrei Elkin [ 2019-04-03 ]

Thank you for the report, kilroy!

To start analyzing this issue we need

1. the output from

{{mysqlbinlog -v --start-position=`the-transaction-begin-offset` --stop-position=`the-next-begin-offset` }}

2. Also do that to a previous transaction (BEGIN..Xid_log_event or a DDL) from the same thread_id=NNN.

3. Give us the table 1-3 show create table
4. Show any trigger defined on the table 1-3.
5. on master: Show global variables
6. on master: Show global status

In Only 1 in 100 inserts triggers this behavior I read you can reproduce this issue, can you?

Cheers,

Andrei

Comment by kilroy [ 2019-04-03 ]

And I for myself just need to wait, the customers using the application trigger it. They just triggered it again (at 9:43 GMT). That makes 3 incidents in <3 days.

I have anonymized the bin log and the tables and I hope the important parts are still there, I didn't delete any referencing ids but I needed to remove all binlog content since this contains user data...

Comment by kilroy [ 2019-04-03 ]

And just another one happened. Its raining now

Comment by kilroy [ 2019-04-03 ]

Another 5 happened. All from external writers (host in same subnet over 1gig connection). Now this host has an updated MariaDB client library (10.2.22, too) and the errors are still coming.

I have zero issues from local writers (same machine)! I have tested the same queries over and over again from the local machine - everything works fine.

Comment by kilroy [ 2019-04-03 ]

I have nailed it:

We have a corner case in this application where an old function is called sometimes. This case was used for a reeeeeeal old MySQL version workaround. This function does the following:

set @@autocommit = 0
LOCK TABLES account WRITE
LOCK TABLES log_account WRITE

.... INSERT (multiple times)

UNLOCK ALL TABLES

COMMIT
set @@autocommit = 1

This always produces the bug... I have fixed it with real transactions, everything works now. But I think this corner case should be caught anyway, too.

Comment by Andrei Elkin [ 2019-04-03 ]

kilroy That's a great exploit from your side . Congrats and thanks! We'll try to reproduce locally which seems to be straightforward now.

Comment by Sujatha Sivakumar (Inactive) [ 2019-04-04 ]

Hello Kilroy,

I tried creating a function as per your above inputs but it fails. I tried in both 10.1 and 10.2.

MariaDB [test]> create table t ( f int ) engine=innodb;
Query OK, 0 rows affected (0.03 sec)

MariaDB [test]> DELIMITER $$
MariaDB [test]> CREATE FUNCTION func() RETURNS INT
-> BEGIN
-> LOCK TABLES t WRITE;
-> INSERT INTO t VALUES (10);
-> UNLOCK TABLES;
-> RETURN (SELECT f FROM t);
-> END $$
ERROR 1314 (0A000): LOCK is not allowed in stored procedures
MariaDB [test]> DELIMITER ;

Can you please provide the master server version. Since you are able to consistently reproduce can you share the SHOW CREATE FUNCTION output;

Thanks

Comment by kilroy [ 2019-04-04 ]

With "function" I meant a normal function (here: Python) which executes these statements in the order mentioned.

Comment by Daniel Ragle [ 2019-04-24 ]

I believe I am also bitten by this bug; I am seeing duplicate insert statements in the binary log after upgrading from 5.5 => 10.3.14 (stepwise upgrade, mysql_upgrade performed with each step). I use binlog_format=MIXED.

Here's how I reproduce it, perhaps it can even be reduced further:

MariaDB [MyDB]> CREATE TABLE t1 (
    ->    id mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
    ->    someLabel varchar(30) NOT NULL,
    ->    flag tinyint(1) NOT NULL DEFAULT 0,
    ->    PRIMARY KEY (id)
    -> ) Engine=MyISAM;
Query OK, 0 rows affected (0.005 sec)
 
MariaDB [MyDB]> CREATE TABLE t2 (
    ->    id mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
    ->    data varchar(30) NOT NULL,
    ->    status tinyint(1) NOT NULL,
    ->    PRIMARY KEY (id)
    -> ) Engine=MyISAM;
Query OK, 0 rows affected (0.006 sec)
 
MariaDB [MyDB]> CREATE TABLE t3 (
    ->    id mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
    ->    t1id mediumint(8) unsigned NOT NULL,
    ->    flag tinyint(1) NOT NULL DEFAULT 0,
    ->    status tinyint(1) NOT NULL DEFAULT 0,
    ->    PRIMARY KEY (id)
    -> ) Engine=MyISAM;
Query OK, 0 rows affected (0.005 sec)
 
MariaDB [MyDB]> INSERT INTO t1 ( id, someLabel, flag ) VALUES ( 1, 'ABC', 0 );
Query OK, 1 row affected (0.004 sec)
 
MariaDB [MyDB]> DELIMITER |
MariaDB [MyDB]> CREATE OR REPLACE TRIGGER doNothing
    -> BEFORE UPDATE ON t1
    -> FOR EACH ROW
    -> BEGIN
    ->     IF
    ->         new.someLabel != old.someLabel
    ->     THEN
    ->         UPDATE t3 SET t3.flag = 0;
    ->     END IF;
    -> END|
Query OK, 0 rows affected (0.269 sec)
 
MariaDB [MyDB]> DELIMITER ;
MariaDB [MyDB]> FLUSH LOGS;
Query OK, 0 rows affected (0.290 sec)
 
MariaDB [MyDB]> LOCK TABLES
    ->     t1 WRITE,
    ->     t2 WRITE;
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [MyDB]> INSERT INTO t2 (data, status) VALUES ('1', 4);
Query OK, 1 row affected (0.003 sec)
 
MariaDB [MyDB]> UPDATE t1 SET flag = 1 WHERE id = 1;
Query OK, 1 row affected (0.002 sec)
Rows matched: 1  Changed: 1  Warnings: 0
 
MariaDB [MyDB]> INSERT INTO t2 (data, status) VALUES ('2', 4);
Query OK, 1 row affected (0.002 sec)
 
MariaDB [MyDB]> UNLOCK TABLES;
Query OK, 0 rows affected (0.000 sec)

Note that the trigger doesn't actually do anything, it just seems to need to exist and reference t3 (likely something having to do with the implied LOCK ).

Now, key statements in the binary log (abbreviated):

BEGIN
use `MyDB`
SET INSERT_ID=1/*!*/;
INSERT INTO t2 (data, status) VALUES ('1', 4)
COMMIT
 
BEGIN
BINLOG '
1lnAXBMgAwAAMQAAACsDAAAAANoBAAAAAAFABE15REIAAnQxAAMJDwECHgAAIXRGkg==
1lnAXBMgAwAAMQAAAFwDAAAAANsBAAAAAAEABE15REIAAnQyAAMJDwECHgAAl59/OQ==
1lnAXBMgAwAAMAAAAIwDAAAAANwBAAAAAAEABE15REIAAnQzAAQJCQEBAAAYVEUG
1lnAXBggAwAANAAAAMADAAAAANoBAAAAAAEAA///+AEAAANBQkMA+AEAAANBQkMBe/aD3Q==
'/*!*/;
### UPDATE `MyDB`.`t1`
### WHERE
###   @1=1
###   @2='ABC'
###   @3=0
### SET
###   @1=1
###   @2='ABC'
###   @3=1
COMMIT
 
BEGIN
/*!*/;
# at 1075
# at 1143
#190424  8:43:18 server id 800  end_log_pos 1143 CRC32 0x87ec5245       Annotate_rows:
#Q> INSERT INTO t2 (data, status) VALUES ('2', 4)
#190424  8:43:18 server id 800  end_log_pos 1192 CRC32 0x9252858f       Table_map: `MyDB`.`t2` mapped to number 475
# at 1192
#190424  8:43:18 server id 800  end_log_pos 1240 CRC32 0xe3de78d2       Table_map: `MyDB`.`t3` mapped to number 476
# at 1240
#190424  8:43:18 server id 800  end_log_pos 1280 CRC32 0xa41dc7ec       Write_rows: table id 475 flags: STMT_END_F
 
BINLOG '
5lnAXBMgAwAAMQAAAKgEAAAAANsBAAAAAAEABE15REIAAnQyAAMJDwECHgAAj4VSkg==
5lnAXBMgAwAAMAAAANgEAAAAANwBAAAAAAEABE15REIAAnQzAAQJCQEBAADSeN7j
5lnAXBcgAwAAKAAAAAAFAAAAANsBAAAAAAEAAwf4AgAAATIE7McdpA==
'/*!*/;
### INSERT INTO `MyDB`.`t2`
### SET
###   @1=2
###   @2='2'
###   @3=4
# Number of rows: 1
# at 1280
#190424  8:43:18 server id 800  end_log_pos 1312 CRC32 0x6337470f       Intvar
SET INSERT_ID=2/*!*/;
# at 1312
#190424  8:43:18 server id 800  end_log_pos 1424 CRC32 0x12a734ce       Query   thread_id=55    exec_time=0     error_code=0
SET TIMESTAMP=1556109798/*!*/;
INSERT INTO t2 (data, status) VALUES ('2', 4)
/*!*/;
# at 1424
#190424  8:43:18 server id 800  end_log_pos 1497 CRC32 0x077c6fc4       Query   thread_id=55    exec_time=0     error_code=0
SET TIMESTAMP=1556109798/*!*/;
COMMIT

The first command is statement-based and is fine. The second is auto-converted to RBR, but it also appears fine. The third transaction is the problem; I copied that block verbatim from the BEGIN through the COMMIT from the binary log. Notice that within the same BEGIN..COMMIT block there is both an RBR insert for the row, as well as a statement insert for the row, both referencing the same PK (the latter via SET INSERT_ID=2). This of course breaks replication on the slave.

This is a critical issue for me as we have several constructs like this. If anyone could share a decent work around I would greatly appreciate it.

Comment by Daniel Ragle [ 2019-04-24 ]

Actually just found this one:

https://jira.mariadb.org/browse/MDEV-18437

which may be more on point with what I'm seeing.

Comment by Sujatha Sivakumar (Inactive) [ 2019-04-24 ]

Hello Daniel,

Thank you for the latest update.

I was able to reproduce the issue on 10.2.22.

 
BINLOG '
JIDAXBMBAAAAMQAAAKcEAAAAABUAAAAAAAEABHRlc3QAAnQyAAMJDwECHgAA/nHBhg==
JIDAXBMBAAAAMAAAANcEAAAAABYAAAAAAAEABHRlc3QAAnQzAAQJCQEBAAA1DMuK
JIDAXBcBAAAAKAAAAP8EAAAAABUAAAAAAAEAAwf4DAAAATIE7kL6rg==
'/*!*/;
### INSERT INTO `test`.`t2`
### SET
###   @1=12
###   @2='2'
###   @3=4
# at 1279
#190424 20:56:28 server id 1  end_log_pos 1311 CRC32 0x3479d267 	Intvar
SET INSERT_ID=12/*!*/;
# at 1311
#190424 20:56:28 server id 1  end_log_pos 1428 CRC32 0x440656b9 	Query	thread_id=9	exec_time=0	error_code=0
SET TIMESTAMP=1556119588/*!*/;
INSERT INTO t2 (data, status) VALUES ('2', 4)
/*!*/;
# at 1428
#190424 20:56:28 server id 1  end_log_pos 1506 CRC32 0xb3c40556 	Query	thread_id=9	exec_time=0	error_code=0
SET TIMESTAMP=1556119588/*!*/;
COMMIT

Hence I am marking it as verified.

Comment by Sujatha Sivakumar (Inactive) [ 2019-05-03 ]

On debug builds following DBUG_ASSERT is reported on Master:

/sql/log.cc:5704: int THD::binlog_write_table_map(TABLE*, bool, my_bool*): Assertion `is_current_stmt_binlog_format_row()' failed.
190503 12:38:22 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
Thread pointer: 0x7f3714000d60
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f3740453e50 thread_stack 0x49000
mysys/stacktrace.c:267(my_print_stacktrace)[0x55e19dc2968e]
sql/signal_handler.cc:168(handle_fatal_signal)[0x55e19d4d0c89]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f3746fad890]
linux/raise.c:51(__GI_raise)[0x7f37462a9e97]
stdlib/abort.c:81(__GI_abort)[0x7f37462ab801]
assert/assert.c:89(__assert_fail_base)[0x7f374629b39a]
/lib/x86_64-linux-gnu/libc.so.6(+0x30412)[0x7f374629b412]
sql/log.cc:5705(THD::binlog_write_table_map(TABLE*, bool, char*))[0x55e19d5d8487]
sql/handler.cc:5857(write_locked_table_maps(THD*))[0x55e19d4dfea0]
sql/handler.cc:5886(binlog_log_row_internal(TABLE*, unsigned char const*, unsigned char const*, bool (*)(THD*, TABLE*, bool, unsigned char const*, unsigned char const*)))[0x55e19d4dff56]
sql/handler.cc:5936(binlog_log_row(TABLE*, unsigned char const*, unsigned char const*, bool (*)(THD*, TABLE*, bool, unsigned char const*, unsigned char const*)))[0x55e19d4e018c]
sql/handler.cc:6062(handler::ha_write_row(unsigned char*))[0x55e19d4e08dd]
sql/sql_insert.cc:1940(write_record(THD*, TABLE*, st_copy_info*))[0x55e19d234201]
sql/sql_insert.cc:1065(mysql_insert(THD*, TABLE_LIST*, List<Item>&, List<List<Item> >&, List<Item>&, List<Item>&, enum_duplicates, bool))[0x55e19d231cdf]
sql/sql_parse.cc:4436(mysql_execute_command(THD*))[0x55e19d259507]
sql/sql_parse.cc:8013(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55e19d264cc4]
sql/sql_parse.cc:1835(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x55e19d252504]
sql/sql_parse.cc:1386(do_command(THD*))[0x55e19d250e5c]
sql/sql_connect.cc:1335(do_handle_one_connection(CONNECT*))[0x55e19d3a3513]
sql/sql_connect.cc:1242(handle_one_connection)[0x55e19d3a3293]
perfschema/pfs.cc:1864(pfs_spawn_thread)[0x55e19dbcf117]
nptl/pthread_create.c:463(start_thread)[0x7f3746fa26db]
x86_64/clone.S:97(clone)[0x7f374638c88f]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f3714011488): INSERT INTO t2 (data, status) VALUES ('2', 4)
Connection ID (thread ID): 9
Status: NOT_KILLED

Comment by Daniel Ragle [ 2019-05-03 ]

To anyone following this and looking for a work-around: We found that we could bypass the bug by creating a do-nothing temporary table with an AUTO INCREMENT PK prior to the LOCK TABLES, then deleting the temporary table after the UNLOCK TABLES. The temporary table itself does not need to be locked, and nothing needs to be in it. I.E.,

CREATE TEMPORARY TABLE temp1 (
    id mediumint(8) UNSIGNED NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (id)
) Engine=MyISAM;
LOCK TABLES t1 WRITE, t2 WRITE;
UPDATE t1 SET flag = 1 WHERE id = 1;
INSERT INTO t2 (data, status) VALUES ('2', 4);
UNLOCK TABLES
DROP TEMPORARY TABLE temp1;

With this construct, unfortunately all of the statements from the LOCK to the UNLOCK are written as RBR; but there is no statement duplication. Possibly the same effect could be achieved via explicitly setting BINLOG_FORMAT=ROW before the LOCK TABLES and back to MIXED afterwards, though as I recall you need SUPER to do that.

Comment by Sujatha Sivakumar (Inactive) [ 2019-05-10 ]

Hello Andrei,

Good morning.
Can you please review the fix for MDEV-19158.

The patch is available at

https://github.com/MariaDB/server/commit/e28c2408c7447e70d9509c9078f1b93a7cda6078
http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-sujatha

Comment by Andrei Elkin [ 2019-05-13 ]

Sujatha, well done, thanks for a nice work!

Andrei

Comment by Sujatha Sivakumar (Inactive) [ 2019-05-14 ]

Fix for this issue is pushed in 10.2.25.

10.3 merge will have a minor conflict in sql/handler.h:

Please find the 10.3 patch

https://github.com/MariaDB/server/commit/86a6a3b8faddc5f8ff1898c5b9a6727f9b584be0

Comment by Daniel Ragle [ 2019-05-19 ]

This is listed as being fixed in 10.3.15; however I just updated to that version and I'm still seeing the same behavior from my test script above.

Also, I suspect https://jira.mariadb.org/browse/MDEV-18437 should be linked to this one, as I believe they're the same issue.

Comment by Sujatha Sivakumar (Inactive) [ 2019-05-20 ]

Hello Daniel,

Thank you for checking the 10.3.15 release. You are right, the fix missed the 10.3.15 release.
Sorry for that. The fix will be included in next upcoming 10.3 release which is 10.3.16.
I have updated the same in JIRA.

Marked MDEV-18437 as duplicate for this issue.

Generated at Thu Feb 08 08:49:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.