[MDEV-29832] rpl.rpl_semi_sync_after_sync_row frequently fails - Jira

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.11, 11.0 (EOL), 11.1 (EOL), 11.2 (EOL), 11.3 (EOL)
Fix Version/s: 10.11
Component/s: Replication, Storage Engine - InnoDB, Tests, MTR
Labels: None

Description

Recently, on the 10.11 branches, rpl.rpl_semi_sync_after_sync_row has been failing in various environments with the following output:

https://buildbot.mariadb.org/#/builders/384/builds/1622

10.11 11cfaf394bdf9804f74ffd1289ec758e1107ba32
rpl.rpl_semi_sync_after_sync_row 'innodb,row' w7 [ fail ]
Test ended at 2022-10-20 02:56:50

CURRENT_TEST: rpl.rpl_semi_sync_after_sync_row
mysqltest: In included file "/home/buildbot/aarch64-fedora-36/build/mysql-test/suite/rpl/include/rpl_semi_sync.inc":
included from /home/buildbot/aarch64-fedora-36/build/mysql-test/suite/rpl/t/rpl_semi_sync.test at line 2:
included from /home/buildbot/aarch64-fedora-36/build/mysql-test/suite/rpl/t/rpl_semi_sync_after_sync_row.test at line 3:
At line 90: query 'create table t1 (a int) engine=$engine_type' failed: <Unknown> (2013): Lost connection to server during query

The result from queries just before the failure was:
< snip >
rpl_semi_sync_master_enabled OFF
[ enable semi-sync on master ]
set global rpl_semi_sync_master_enabled = 1;
show variables like 'rpl_semi_sync_master_enabled';
Variable_name Value
rpl_semi_sync_master_enabled ON
[ status of semi-sync on master should be ON even without any semi-sync slaves ]
show status like 'Rpl_semi_sync_master_clients';
Variable_name Value
Rpl_semi_sync_master_clients 0
show status like 'Rpl_semi_sync_master_status';
Variable_name Value
Rpl_semi_sync_master_status ON
show status like 'Rpl_semi_sync_master_yes_tx';
Variable_name Value
Rpl_semi_sync_master_yes_tx 0
#
# BUG#45672 Semisync repl: ActiveTranx:insert_tranx_node: transaction node allocation failed
# BUG#45673 Semisynch reports correct operation even if no slave is connected
#

More results from queries before failure can be found in /home/buildbot/aarch64-fedora-36/build/mysql-test/var/7/log/rpl_semi_sync_after_sync_row.log


Server [mysqld.1 - pid: 61248, winpid: 61248, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
2022-10-20 2:55:53 387 [Note] Deleted Master_info file '/dev/shm/var_auto_CXzB/7/mysqld.1/data/master.info'.
2022-10-20 2:55:53 387 [Note] Deleted Master_info file '/dev/shm/var_auto_CXzB/7/mysqld.1/data/relay-log.info'.
2022-10-20 2:55:53 389 [Note] Start binlog_dump to slave_server(2), pos(, 4), using_gtid(1), gtid('')
2022-10-20 2:55:53 390 [Note] Semi-sync replication initialized for transactions.
2022-10-20 2:55:53 390 [Note] Semi-sync replication enabled on the master.
2022-10-20 2:55:53 0 [Note] Starting ack receiver thread
2022-10-20 02:56:48 0xffff1f7ef000 InnoDB: Assertion failure in file /home/buildbot/aarch64-fedora-36/build/storage/innobase/include/fut0lst.h line 122
InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mariadbd startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
221020 2:56:48 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

The stack traces differ slightly between environments, but I am listing them all here for now: the failing assertion is the same in each, and all of the failures are recent, which suggests they share a cause.

11cfaf394bdf9804f74ffd1289ec758e1107ba32 aarch64-fedora-36
Thread pointer: 0xaaab1bb7e8f8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0xffff1f7ee608 thread_stack 0x49000
mysys/stacktrace.c:213(my_print_stacktrace)[0xaaaae05da380]
sql/signal_handler.cc:236(handle_fatal_signal)[0xaaaae0195dfc]
addr2line: 'linux-vdso.so.1': No such file
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb2a407a0]
??:0(__pthread_kill_implementation)[0xffffb1e62658]
:0(__GI_raise)[0xffffb1e1ab00]
:0(__GI_abort)[0xffffb1e070f8]
include/ut0ut.h:329(ib::logger& ib::logger::operator<< <int>(int const&))[0xaaaadfe93974]
include/ut0new.h:766(ut_allocator<unsigned char, true>::deallocate_trace(ut_new_pfx_t const*))[0xaaaadfe92d24]
include/dyn0buf.h:177(mtr_buf_t::~mtr_buf_t())[0xaaaae04ba8d0]
trx/trx0purge.cc:1363(trx_purge(unsigned long, bool))[0xaaaae04bc730]
srv/srv0srv.cc:1610(purge_coordinator_state::do_purge())[0xaaaae04b2ddc]
srv/srv0srv.cc:1766(purge_coordinator_callback(void*))[0xaaaae04b2948]
tpool/task_group.cc:71(tpool::task_group::execute(tpool::task*))[0xaaaae05888f4]
tpool/tpool_generic.cc:578(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0xaaaae0587290]
??:0(std::error_code::default_error_condition() const)[0xffffb21530a0]
??:0(start_thread)[0xffffb1e609a8]
??:0(thread_start)[0xffffb1ecbd1c]

The same traceback was also seen with 47736901e08: https://buildbot.mariadb.org/#/builders/523/builds/450

5bd86986a8ec2d9222e621d36e10b63f4c026976 ppc64le-debian-sid
Thread pointer: 0x13c92b5d8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fff0f7ee308 thread_stack 0x49000
mysys/stacktrace.c:212(my_print_stacktrace)[0x134260fa4]
sql/signal_handler.cc:233(handle_fatal_signal)[0x133c20b38]
addr2line: 'linux-vdso64.so.1': No such file
linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fffaea90514]
??:0(pthread_key_delete)[0x7fffada8defc]
??:0(gsignal)[0x7fffada2cd3c]
??:0(abort)[0x7fffada0d060]
ut/ut0dbg.cc:60(ut_dbg_assertion_failed(char const*, char const*, unsigned int))[0x13379b12c]
include/fut0lst.h:122(flst_read_addr(unsigned char const*))[0x133799c8c]
include/fut0lst.h:122(flst_read_addr(unsigned char const*))[0x1337e2198]
trx/trx0purge.cc:1361(trx_purge(unsigned long, bool))[0x1340d322c]
srv/srv0srv.cc:1611(purge_coordinator_state::do_purge())[0x1340c7c34]
srv/srv0srv.cc:1765(purge_coordinator_callback(void*))[0x1340c7578]
tpool/task_group.cc:70(tpool::task_group::execute(tpool::task*))[0x1341e76d8]
tpool/task.cc:32(tpool::task::execute())[0x1341e7908]
tpool/tpool_generic.cc:580(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0x1341e51f0]
bits/invoke.h:74(void std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>(std::__invoke_memfun_deref, void (tpool::thread_pool_generic::*&&)(tpool::worker_data*), tpool::thread_pool_generic*&&, tpool::worker_data*&&))[0x1341e6384]
??:0(std::error_code::default_error_condition() const)[0x7fffadf06920]
??:0(pthread_condattr_setpshared)[0x7fffada8b3c8]
??:0(clone)[0x7fffadb3acc0]

3a0c3b65de426a9b8cd8bd03406d52e10977f17a aarch64-debian-10
Thread pointer: 0xaaab15b256c8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0xffff177ee708 thread_stack 0x49000
mysys/stacktrace.c:212(my_print_stacktrace)[0xaaaadd338cf0]
sql/signal_handler.cc:236(handle_fatal_signal)[0xaaaadcedea8c]
addr2line: 'linux-vdso.so.1': No such file
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffac5707a0]
??:0(raise)[0xffffabbd2714]
??:0(abort)[0xffffabbc08e8]
ut/ut0ut.cc:63(ut_print_timestamp(_IO_FILE*))[0xaaaadcbe4840]
include/fut0lst.h:122(flst_read_addr(unsigned char const*))[0xaaaadd1fc98c]
trx/trx0purge.cc:1361(trx_purge(unsigned long, bool))[0xaaaadd1fe9e8]
srv/srv0srv.cc:1610(purge_coordinator_callback(void*))[0xaaaadd1f2900]
tpool/task_group.cc:71(tpool::task_group::execute(tpool::task*))[0xaaaadd2e6508]
tpool/tpool_generic.cc:578(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0xaaaadd2e45b8]
??:0(std::error_code::default_error_condition() const)[0xffffabeca1f4]
??:0(start_thread)[0xffffabfc77e4]
??:0(__clone)[0xffffabc6f59c]

Attachments


11cfaf3_aarch64-fedora-36_mysqld.1.err.7 (1007 kB, 2022-10-20 11:53)
5bd86986a_ppc64le-debian-sid_mysqld.1.err.1 (512 kB, 2022-10-20 11:53)
c4994b468_ppc64le-ubuntu-2004_mysqld.1.err.2 (746 kB, 2022-10-20 11:53)

Issue Links

duplicates:
MDEV-33325 Crash in flst_read_addr on corrupted data (Closed)

relates to:
MDEV-30728 Frequent 'InnoDB: Database page corruption on disk' on MariaDB 10.11 in Debian autopktest on arch ppc64el (Closed)
MDEV-34453 Trying to read 16384 bytes at 70368744161280 outside the bounds of the file: ./ibdata1 (Closed)
