[MDEV-14092] NEXTVAL fails on slave Created: 2017-10-19  Updated: 2020-08-25  Resolved: 2017-10-19

Status: Closed
Project: MariaDB Server
Component/s: Replication, Storage Engine - Sequence
Affects Version/s: 10.3
Fix Version/s: 10.3.3

Type: Bug Priority: Major
Reporter: Michael Widenius Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None


 Description   

In some cases executing NEXTVAL() on a slave can stop replication with
Unknown error, Error_code: 1105; handler error No Error!

This can happen when:

  • Executing a NEXTVAL() on the slave (which makes the slave tables out of sync with the master)
  • Executing NEXTVAL() in parallel in seperate threads on the master and both threads allocates new values


 Comments   
Comment by Michael Widenius [ 2017-10-19 ]

The problem was that the code in replication didn't distinguish between a
setval() failing because the stored sequence number was bigger than the
current (should have been ignored) and a failure from the storage engine.

Comment by Elena Stepanova [ 2017-10-20 ]

For further reference, this was most relevant of our test cases, although the crudest one:

--source include/master-slave.inc
--source include/have_binlog_format_row.inc
 
CREATE SEQUENCE s;
INSERT INTO s VALUES (1,1,999999999999999999,1,1,0,0,0);
SELECT NEXTVAL(s);
 
--delimiter $
CREATE PROCEDURE pr(n INT)
BEGIN
DECLARE i INT DEFAULT 0;
WHILE i < n
DO
SELECT NEXTVAL(s);
SELECT NEXTVAL(s);
SELECT NEXTVAL(s);
SET i= i+1;
END WHILE;
END $
--delimiter ;
 
--connect (con1,localhost,root,,)
--send CALL pr(10000)
--connect (con2,localhost,root,,)
--send CALL pr(10000)
--connect (con3,localhost,root,,)
--send CALL pr(10000)
--connect (con4,localhost,root,,)
--send CALL pr(10000)
--connect (con5,localhost,root,,)
--send CALL pr(10000)
--connect (con6,localhost,root,,)
--send CALL pr(10000)
--connect (con7,localhost,root,,)
--send CALL pr(10000)
--connect (con8,localhost,root,,)
--send CALL pr(10000)
 
 
--disable_query_log
--disable_result_log
 
--connection con1
--reap
--connection con2
--reap
--connection con3
--reap
--connection con4
--reap
--connection con5
--reap
--connection con6
--reap
--connection con7
--reap
--connection con8
--reap
 
--enable_query_log
--enable_result_log
 
--connection master
 
--sync_slave_with_master
 
# Cleanup
--connection master
DROP SEQUENCE s;
DROP PROCEDURE pr;
--source include/rpl_end.inc

Without the fix, on a non-debug build the replication fails with

2017-10-20 18:36:24 13 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.s; Unknown error, Error_code: 1105; handler error No Error!; the event's master log master-bin.000001, end_log_pos 47885, Gtid 0-1-186, Internal MariaDB error code: 1105
2017-10-20 18:36:24 13 [Warning] Slave: Unknown error Error_code: 1105
2017-10-20 18:36:24 13 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'master-bin.000001' position 47700

and on a debug build with the assertion failure

10.3 c65fdcf7c81c47

mysqld: /data/src/10.3/sql/log_event.cc:12748: virtual int Write_rows_log_event::do_exec_row(rpl_group_info*): Assertion `0' failed.
171020 18:55:44 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f5206eaaee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x000055e5ebfd0b95 in Write_rows_log_event::do_exec_row (this=0x7f51a4039350, rgi=0x7f51a4000a90) at /data/src/10.3/sql/log_event.cc:12748
#9  0x000055e5ebfcbecd in Rows_log_event::do_apply_event (this=0x7f51a4039350, rgi=0x7f51a4000a90) at /data/src/10.3/sql/log_event.cc:10931
#10 0x000055e5ebb2139d in Log_event::apply_event (this=0x7f51a4039350, rgi=0x7f51a4000a90) at /data/src/10.3/sql/log_event.h:1473
#11 0x000055e5ebb15d67 in apply_event_and_update_pos_apply (ev=0x7f51a4039350, thd=0x7f51a4001550, rgi=0x7f51a4000a90, reason=0) at /data/src/10.3/sql/slave.cc:3791
#12 0x000055e5ebb16231 in apply_event_and_update_pos (ev=0x7f51a4039350, thd=0x7f51a4001550, rgi=0x7f51a4000a90) at /data/src/10.3/sql/slave.cc:3939
#13 0x000055e5ebb1699d in exec_relay_log_event (thd=0x7f51a4001550, rli=0x55e5ef64e358, serial_rgi=0x7f51a4000a90) at /data/src/10.3/sql/slave.cc:4226
#14 0x000055e5ebb19fb2 in handle_slave_sql (arg=0x55e5ef64c690) at /data/src/10.3/sql/slave.cc:5373
#15 0x000055e5ec19cf4c in pfs_spawn_thread (arg=0x7f51a80406f0) at /data/src/10.3/storage/perfschema/pfs.cc:1862
#16 0x00007f5208def494 in start_thread (arg=0x7f52001ac700) at pthread_create.c:333
#17 0x00007f5206f6793f in clone () from /lib/x86_64-linux-gnu/libc.so.6

With the fix, the test passes both on debug and non-debug builds.

Generated at Thu Feb 08 08:10:51 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.