MariaDB Server / MDEV-29832

rpl.rpl_semi_sync_after_sync_row frequently fails

    Description

      Recently, on 10.11 branches, rpl.rpl_semi_sync_after_sync_row has been failing in various environments with the following output:

      https://buildbot.mariadb.org/#/builders/384/builds/1622

      10.11 11cfaf394bdf9804f74ffd1289ec758e1107ba32

      rpl.rpl_semi_sync_after_sync_row 'innodb,row' w7 [ fail ]
              Test ended at 2022-10-20 02:56:50
       
      CURRENT_TEST: rpl.rpl_semi_sync_after_sync_row
      mysqltest: In included file "/home/buildbot/aarch64-fedora-36/build/mysql-test/suite/rpl/include/rpl_semi_sync.inc": 
      included from /home/buildbot/aarch64-fedora-36/build/mysql-test/suite/rpl/t/rpl_semi_sync.test at line 2:
      included from /home/buildbot/aarch64-fedora-36/build/mysql-test/suite/rpl/t/rpl_semi_sync_after_sync_row.test at line 3:
      At line 90: query 'create table t1 (a int) engine=$engine_type' failed: <Unknown> (2013): Lost connection to server during query
       
      The result from queries just before the failure was:
      < snip >
      rpl_semi_sync_master_enabled	OFF
      [ enable semi-sync on master ]
      set global rpl_semi_sync_master_enabled = 1;
      show variables like 'rpl_semi_sync_master_enabled';
      Variable_name	Value
      rpl_semi_sync_master_enabled	ON
      [ status of semi-sync on master should be ON even without any semi-sync slaves ]
      show status like 'Rpl_semi_sync_master_clients';
      Variable_name	Value
      Rpl_semi_sync_master_clients	0
      show status like 'Rpl_semi_sync_master_status';
      Variable_name	Value
      Rpl_semi_sync_master_status	ON
      show status like 'Rpl_semi_sync_master_yes_tx';
      Variable_name	Value
      Rpl_semi_sync_master_yes_tx	0
      #
      # BUG#45672 Semisync repl: ActiveTranx:insert_tranx_node: transaction node allocation failed
      # BUG#45673 Semisynch reports correct operation even if no slave is connected
      #
       
      More results from queries before failure can be found in /home/buildbot/aarch64-fedora-36/build/mysql-test/var/7/log/rpl_semi_sync_after_sync_row.log
       
       
      Server [mysqld.1 - pid: 61248, winpid: 61248, exit: 256] failed during test run
      Server log from this test:
      ----------SERVER LOG START-----------
      2022-10-20  2:55:53 387 [Note] Deleted Master_info file '/dev/shm/var_auto_CXzB/7/mysqld.1/data/master.info'.
      2022-10-20  2:55:53 387 [Note] Deleted Master_info file '/dev/shm/var_auto_CXzB/7/mysqld.1/data/relay-log.info'.
      2022-10-20  2:55:53 389 [Note] Start binlog_dump to slave_server(2), pos(, 4), using_gtid(1), gtid('')
      2022-10-20  2:55:53 390 [Note] Semi-sync replication initialized for transactions.
      2022-10-20  2:55:53 390 [Note] Semi-sync replication enabled on the master.
      2022-10-20  2:55:53 0 [Note] Starting ack receiver thread
      2022-10-20 02:56:48 0xffff1f7ef000  InnoDB: Assertion failure in file /home/buildbot/aarch64-fedora-36/build/storage/innobase/include/fut0lst.h line 122
      InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mariadbd startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      InnoDB: about forcing recovery.
      221020  2:56:48 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
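
      For readers unfamiliar with the failing check, here is a minimal Python model of the condition reported at fut0lst.h line 122. It is a sketch, not InnoDB code. It assumes that a file address is stored as 6 bytes (a 4-byte big-endian page number followed by a 2-byte offset within the page) and that FIL_NULL is 0xFFFFFFFF and FIL_PAGE_DATA is 38; the helper names read_file_address and address_is_plausible are made up for illustration.

```python
import struct
from typing import Tuple

# Assumed values of the InnoDB constants named in the assertion message.
FIL_NULL = 0xFFFFFFFF   # page number meaning "no page"
FIL_PAGE_DATA = 38      # first byte offset after the page header


def read_file_address(raw: bytes) -> Tuple[int, int]:
    """Decode an assumed 6-byte file address into (page, boffset)."""
    if len(raw) != 6:
        raise ValueError("a file address is expected to be 6 bytes")
    # ">IH": big-endian, 4-byte unsigned page number, 2-byte unsigned offset.
    page, boffset = struct.unpack(">IH", raw)
    return page, boffset


def address_is_plausible(page: int, boffset: int) -> bool:
    """Model of: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA."""
    return page == FIL_NULL or boffset >= FIL_PAGE_DATA


if __name__ == "__main__":
    for raw in (bytes.fromhex("ffffffff0000"), bytes.fromhex("000000050026"), bytes(6)):
        page, boffset = read_file_address(raw)
        print(raw.hex(), page, boffset, address_is_plausible(page, boffset))
```

      In this model, an address fails the check when it names a real page but points into the page header, as an all-zero address would. The sketch only illustrates what the condition tests; it says nothing about what the list node actually contained in these crashes.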
      

      The backtraces differ slightly between environments. For now I am collecting them all here, because the failing assertion is the same and the failures are all recent, which suggests that they are related.

      11cfaf394bdf9804f74ffd1289ec758e1107ba32 aarch64-fedora-36

      Thread pointer: 0xaaab1bb7e8f8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0xffff1f7ee608 thread_stack 0x49000
      mysys/stacktrace.c:213(my_print_stacktrace)[0xaaaae05da380]
      sql/signal_handler.cc:236(handle_fatal_signal)[0xaaaae0195dfc]
      addr2line: 'linux-vdso.so.1': No such file
      linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb2a407a0]
      ??:0(__pthread_kill_implementation)[0xffffb1e62658]
      :0(__GI_raise)[0xffffb1e1ab00]
      :0(__GI_abort)[0xffffb1e070f8]
      include/ut0ut.h:329(ib::logger& ib::logger::operator<< <int>(int const&))[0xaaaadfe93974]
      include/ut0new.h:766(ut_allocator<unsigned char, true>::deallocate_trace(ut_new_pfx_t const*))[0xaaaadfe92d24]
      include/dyn0buf.h:177(mtr_buf_t::~mtr_buf_t())[0xaaaae04ba8d0]
      trx/trx0purge.cc:1363(trx_purge(unsigned long, bool))[0xaaaae04bc730]
      srv/srv0srv.cc:1610(purge_coordinator_state::do_purge())[0xaaaae04b2ddc]
      srv/srv0srv.cc:1766(purge_coordinator_callback(void*))[0xaaaae04b2948]
      tpool/task_group.cc:71(tpool::task_group::execute(tpool::task*))[0xaaaae05888f4]
      tpool/tpool_generic.cc:578(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0xaaaae0587290]
      ??:0(std::error_code::default_error_condition() const)[0xffffb21530a0]
      ??:0(start_thread)[0xffffb1e609a8]
      ??:0(thread_start)[0xffffb1ecbd1c]
      

      5bd86986a8ec2d9222e621d36e10b63f4c026976 ppc64le-debian-sid

      Thread pointer: 0x13c92b5d8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7fff0f7ee308 thread_stack 0x49000
      mysys/stacktrace.c:212(my_print_stacktrace)[0x134260fa4]
      sql/signal_handler.cc:233(handle_fatal_signal)[0x133c20b38]
      addr2line: 'linux-vdso64.so.1': No such file
      linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fffaea90514]
      ??:0(pthread_key_delete)[0x7fffada8defc]
      ??:0(gsignal)[0x7fffada2cd3c]
      ??:0(abort)[0x7fffada0d060]
      ut/ut0dbg.cc:60(ut_dbg_assertion_failed(char const*, char const*, unsigned int))[0x13379b12c]
      include/fut0lst.h:122(flst_read_addr(unsigned char const*))[0x133799c8c]
      include/fut0lst.h:122(flst_read_addr(unsigned char const*))[0x1337e2198]
      trx/trx0purge.cc:1361(trx_purge(unsigned long, bool))[0x1340d322c]
      srv/srv0srv.cc:1611(purge_coordinator_state::do_purge())[0x1340c7c34]
      srv/srv0srv.cc:1765(purge_coordinator_callback(void*))[0x1340c7578]
      tpool/task_group.cc:70(tpool::task_group::execute(tpool::task*))[0x1341e76d8]
      tpool/task.cc:32(tpool::task::execute())[0x1341e7908]
      tpool/tpool_generic.cc:580(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0x1341e51f0]
      bits/invoke.h:74(void std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>(std::__invoke_memfun_deref, void (tpool::thread_pool_generic::*&&)(tpool::worker_data*), tpool::thread_pool_generic*&&, tpool::worker_data*&&))[0x1341e6384]
      ??:0(std::error_code::default_error_condition() const)[0x7fffadf06920]
      ??:0(pthread_condattr_setpshared)[0x7fffada8b3c8]
      ??:0(clone)[0x7fffadb3acc0]
      

      3a0c3b65de426a9b8cd8bd03406d52e10977f17a aarch64-debian-10

      Thread pointer: 0xaaab15b256c8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0xffff177ee708 thread_stack 0x49000
      mysys/stacktrace.c:212(my_print_stacktrace)[0xaaaadd338cf0]
      sql/signal_handler.cc:236(handle_fatal_signal)[0xaaaadcedea8c]
      addr2line: 'linux-vdso.so.1': No such file
      linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffac5707a0]
      ??:0(raise)[0xffffabbd2714]
      ??:0(abort)[0xffffabbc08e8]
      ut/ut0ut.cc:63(ut_print_timestamp(_IO_FILE*))[0xaaaadcbe4840]
      include/fut0lst.h:122(flst_read_addr(unsigned char const*))[0xaaaadd1fc98c]
      trx/trx0purge.cc:1361(trx_purge(unsigned long, bool))[0xaaaadd1fe9e8]
      srv/srv0srv.cc:1610(purge_coordinator_callback(void*))[0xaaaadd1f2900]
      tpool/task_group.cc:71(tpool::task_group::execute(tpool::task*))[0xaaaadd2e6508]
      tpool/tpool_generic.cc:578(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0xaaaadd2e45b8]
      ??:0(std::error_code::default_error_condition() const)[0xffffabeca1f4]
      ??:0(start_thread)[0xffffabfc77e4]
      ??:0(__clone)[0xffffabc6f59c]
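
      To compare backtraces like the ones above without reading them frame by frame, a small script can list the functions that appear in every trace. This is a rough sketch that assumes each frame line has the shape location(function)[0xaddress], as in the traces above; the names FRAME_RE, frame_functions and common_functions are hypothetical helpers, not part of any MariaDB tooling.

```python
import re

# One frame per line: location(function)[0xaddress]; other lines are ignored.
FRAME_RE = re.compile(
    r"^\s*(?P<loc>[^\s(]+)\((?P<func>.*)\)\[(?P<addr>0x[0-9a-fA-F]+)\]\s*$"
)


def frame_functions(trace):
    """Return the function name of each frame, without its argument list."""
    funcs = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            # Keep only the text before the first "(" of the signature.
            funcs.append(match.group("func").split("(", 1)[0])
    return funcs


def common_functions(traces):
    """Functions present in every trace, in the order of the first trace."""
    if not traces:
        return []
    others = [set(frame_functions(t)) for t in traces[1:]]
    seen = set()
    result = []
    for func in frame_functions(traces[0]):
        if func not in seen and all(func in s for s in others):
            seen.add(func)
            result.append(func)
    return result
```

      Passing the three backtraces as strings to common_functions would keep frames such as trx_purge, which all three traces contain, and drop frames that only some environments resolved, such as flst_read_addr.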
      

            People

              bnestere Brandon Nesterenko
              angelique.sklavounos Angelique Sklavounos (Inactive)