[MDEV-19376] Repl_semi_sync_master::commit_trx assertion failure: (thd_kill_level(thd) == THD_ABORT_ASAP) || !m_active_tranxs || !m_active_tranxs->is_tranx_end_pos(trx_wait_binlog_name, trx_wait_binlog_pos) Created: 2019-05-01 Updated: 2020-11-04 Resolved: 2019-11-12 |
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.3.13 |
| Fix Version/s: | 10.1.44, 10.2.30, 10.3.21, 10.4.11 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Geoff Montee (Inactive) | Assignee: | Andrei Elkin |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
| Description |
A user saw the following crash with semisynchronous replication in MariaDB 10.3.13:
gdb shows the following backtrace:
This looks somewhat similar to |
| Comments |
| Comment by Andrei Elkin [ 2019-11-04 ] |
The cause of the assertion failure has finally been identified. At its initialization, the dump thread updates the last replied binlog:pos to a value that is effectively supplied by the slave through CHANGE MASTER, rather than by an acknowledgement of received events (as is normally supposed to happen). If the slave is configured to recover via master_use_gtid = slave_pos, this initial binlog:pos may be entirely bogus, because the dump thread still has to locate the resumption GTID, which may be positioned in the binlog some distance in the past. In that case the assertion can be hit right after the slave initializes, when the very first query is executed on the master. A patch is being prepared and will be published soon. |
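To make the failure mode concrete, here is a minimal Python sketch of the bookkeeping described above. It is a simplified model, not MariaDB's code: a binlog position is assumed to be a (file name, offset) tuple, and `SemiSyncMasterModel`, `report_reply`, `binlog_write`, `commit_would_skip_wait` and `invariant_holds` are hypothetical names. It assumes the slave-supplied coordinates happen to lie ahead of where the master's next transaction ends. Under those assumptions, once the reply position is seeded from those coordinates, the commit wait ends without any acknowledgement while the transaction is still registered as active, which is the condition the `commit_trx` assertion rejects.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# Hypothetical, simplified model; NOT MariaDB's Repl_semi_sync_master.
# Tuples compare lexicographically, which orders zero-padded binlog
# file names correctly (an assumption of this sketch).
BinlogPos = Tuple[str, int]


@dataclass
class SemiSyncMasterModel:
    reply_pos: Optional[BinlogPos] = None  # last position treated as acknowledged
    active_tranxs: Set[BinlogPos] = field(default_factory=set)  # waiting commits

    def report_reply(self, pos: BinlogPos) -> None:
        """Advance the reply position and forget every waiting commit it covers."""
        if self.reply_pos is None or pos > self.reply_pos:
            self.reply_pos = pos
        self.active_tranxs = {t for t in self.active_tranxs if t > self.reply_pos}

    def binlog_write(self, end_pos: BinlogPos) -> None:
        """A committing transaction is written to the binlog and starts waiting."""
        self.active_tranxs.add(end_pos)

    def commit_would_skip_wait(self, end_pos: BinlogPos) -> bool:
        """True if the reply position already covers end_pos, so no wait happens."""
        return self.reply_pos is not None and self.reply_pos >= end_pos

    def invariant_holds(self, end_pos: BinlogPos) -> bool:
        """Spirit of the assertion: a finished wait leaves no active entry behind."""
        return end_pos not in self.active_tranxs


# Dump thread start: reply position seeded from slave-supplied (assumed
# stale) coordinates that lie ahead of the real end of the master's binlog.
trx_end = ("master-bin.000003", 1234)  # first transaction after the slave connects
bogus = SemiSyncMasterModel()
bogus.report_reply(("master-bin.000009", 4))
bogus.binlog_write(trx_end)
print(bogus.commit_would_skip_wait(trx_end), bogus.invariant_holds(trx_end))  # True False

# Normal path: the wait ends only after a real ACK, which also clears the entry.
normal = SemiSyncMasterModel()
normal.binlog_write(trx_end)
print(normal.commit_would_skip_wait(trx_end))  # False
normal.report_reply(trx_end)
print(normal.commit_would_skip_wait(trx_end), normal.invariant_holds(trx_end))  # True True
```

The second half is there for contrast: in the normal path the same acknowledgement that lets the wait end also removes the transaction from the active set, so the invariant holds.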
| Comment by Andrei Elkin [ 2019-11-06 ] |
Sujatha, the patch 977da10b754e461 for the 10.1 branch is ready for review. I have also pushed it to bb-10.1-andrei. I hope to hear your comments soon. Cheers, Andrei. |
| Comment by Sujatha Sivakumar (Inactive) [ 2019-11-07 ] |
Hello Andrei, thank you for working on this issue. |