[MDEV-19376] Repl_semi_sync_master::commit_trx assertion failure: (thd_kill_level(thd) == THD_ABORT_ASAP) || !m_active_tranxs || !m_active_tranxs->is_tranx_end_pos(trx_wait_binlog_name, trx_wait_binlog_pos) Created: 2019-05-01  Updated: 2020-11-04  Resolved: 2019-11-12

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.3.13
Fix Version/s: 10.1.44, 10.2.30, 10.3.21, 10.4.11

Type: Bug Priority: Critical
Reporter: Geoff Montee (Inactive) Assignee: Andrei Elkin
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-17125 InnoDB: Assertion failure in file mar... Open
relates to MDEV-17948 Assertion `thd_killed(thd) || !m_acti... Closed
relates to MDEV-24118 rpl.rpl_semi_sync failed in bb, Asser... Open

 Description   

A user saw the following crash with semisynchronous replication in MariaDB 10.3.13:

2019-04-26 15:57:38 778 [Note] Start binlog_dump to slave_server(54823151), pos(mariadb-2-bin.000001, 24898028)
2019-04-26 15:57:38 778 [Note] Semi-sync replication switched ON with slave (server_id: 54823151) at (mariadb-2-bin.000001, 4)
2019-04-26 15:57:38 778 [Note] Start semi-sync binlog_dump to slave (server_id: 54823151), pos(mariadb-2-bin.000001, 4)
mysqld: /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/semisync_master.cc:929: int Repl_semi_sync_master::commit_trx(const char*, my_off_t): Assertion `(thd_kill_level(thd) == THD_ABORT_ASAP) || !m_active_tranxs || !m_active_tranxs->is_tranx_end_pos(trx_wait_binlog_name, trx_wait_binlog_pos)' failed.
190426 15:57:38 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.3.13-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=5
max_threads=153
thread_count=65
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467390 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f278801fff8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f28141b6d30 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x565106dd8dde]
/usr/sbin/mysqld(handle_fatal_signal+0x357)[0x565106878bd7]
sigaction.c:0(__restore_rt)[0x7f281b0c35d0]
:0(__GI_raise)[0x7f2819396207]
:0(__GI_abort)[0x7f28193978f8]
:0(__assert_fail_base)[0x7f281938f026]
:0(__GI___assert_fail)[0x7f281938f0d2]
/usr/sbin/mysqld(+0x6c7ab7)[0x5651067c6ab7]
/usr/sbin/mysqld(_ZN21Repl_semi_sync_master10commit_trxEPKcy+0x19)[0x5651067c6ad9]
/usr/sbin/mysqld(_ZN21Repl_semi_sync_master17wait_after_commitEP3THDb+0x85)[0x5651067c6bd5]
/usr/sbin/mysqld(_Z15ha_commit_transP3THDb+0x4de)[0x56510687c2ae]
/usr/sbin/mysqld(_Z17trans_commit_stmtP3THD+0x57)[0x56510678c2a7]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x87f)[0x5651066a312f]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x22b)[0x5651066abc2b]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1d5e)[0x5651066ae8ee]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x13e)[0x5651066af74e]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1aa)[0x56510677f3ba]
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x56510677f4dd]
pthread_create.c:0(start_thread)[0x7f281b0bbdd5]
/lib64/libc.so.6(clone+0x6d)[0x7f281945dead]

gdb shows the following backtrace:

Thread 79 (Thread 0x7fde7c060700 (LWP 1035)):
#0  0x00007fdebc5e74d7 in kill () from /lib64/libc.so.6
#1  0x0000561d18f8eb65 in handle_fatal_signal (sig=6) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/signal_handler.cc:317
#2  <signal handler called>
#3  0x00007fdebc5e7207 in raise () from /lib64/libc.so.6
#4  0x00007fdebc5e88f8 in abort () from /lib64/libc.so.6
#5  0x00007fdebc5e0026 in __assert_fail_base () from /lib64/libc.so.6
#6  0x00007fdebc5e00d2 in __assert_fail () from /lib64/libc.so.6
#7  0x0000561d18edcab7 in Repl_semi_sync_master::commit_trx (this=0x561d19ea3360 <repl_semisync_master>, trx_wait_binlog_name=0x7fde4801eb10 "rajesh-dc1-cmdb-mariadb-1-bin.000004", trx_wait_binlog_pos=32778) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/semisync_master.cc:927
#8  0x0000561d18edcad9 in Repl_semi_sync_master::commit_trx (this=<optimized out>, trx_wait_binlog_name=<optimized out>, trx_wait_binlog_pos=<optimized out>) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/semisync_master.cc:781
#9  0x0000561d18edcbd5 in Repl_semi_sync_master::wait_after_commit (this=<optimized out>, thd=thd@entry=0x7fde48018f98, all=all@entry=false) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/semisync_master.cc:697
#10 0x0000561d18f922ae in ha_commit_trans (thd=thd@entry=0x7fde48018f98, all=all@entry=false) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/handler.cc:1528
#11 0x0000561d18ea22a7 in trans_commit_stmt (thd=thd@entry=0x7fde48018f98) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/transaction.cc:520
#12 0x0000561d18db912f in mysql_execute_command (thd=thd@entry=0x7fde48018f98) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/sql_parse.cc:6363
#13 0x0000561d18dc1c2b in mysql_parse (thd=0x7fde48018f98, rawbuf=<optimized out>, length=35, parser_state=0x7fde7c05f4f0, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/sql_parse.cc:8095
#14 0x0000561d18dc48ee in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7fde48018f98, packet=packet@entry=0x7fde48021959 "insert into mydb.mydbtable values()", packet_length=packet_length@entry=35, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/sql_parse.cc:1854
#15 0x0000561d18dc574e in do_command (thd=0x7fde48018f98) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/sql_parse.cc:1396
#16 0x0000561d18e953ba in do_handle_one_connection (connect=connect@entry=0x561d1c852868) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/sql_connect.cc:1403
#17 0x0000561d18e954dd in handle_one_connection (arg=0x561d1c852868) at /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.13/sql/sql_connect.cc:1309
#18 0x00007fdebe30cdd5 in start_thread () from /lib64/libpthread.so.0
#19 0x00007fdebc6aeead in clone () from /lib64/libc.so.6

This looks somewhat similar to MDEV-17948.



 Comments   
Comment by Andrei Elkin [ 2019-11-04 ]

The reason of the assert has been finally identified, and it's in that the Dump thread updates the last replied binlog:pos at its initialization to a value submitted by the slave (effectively) through CHANGE-MASTER rather than by acknowledging a receive (as it is normally supposed to be). And if slave is configured to recover via master_use_gtid = slave_pos, the initiating binlog:pos may be really bogus as the dump thread still needs to dig out a resumption gtid which may be positioned in binlog at some distance in the past. If that the case the assert can be reached after the slave initialization through execution a very first query on master.

A patch is being composed to be published soon.

Comment by Andrei Elkin [ 2019-11-06 ]

Sujatha, the patch 977da10b754e461 for 10.1 branch is ready for review. I pushed it also to bb-10.1-andrei. Hope to hear your comments soon. Cheers, Andrei.

Comment by Sujatha Sivakumar (Inactive) [ 2019-11-07 ]

Hello Andrei,

Thank you for working on this issue.
The code changes look fine.

Generated at Thu Feb 08 08:51:12 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.