[MDEV-29832] rpl.rpl_semi_sync_after_sync_row crashes in BB with InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA Created: 2022-10-20  Updated: 2023-11-17

Status: Open
Project: MariaDB Server
Component/s: Replication, Storage Engine - InnoDB, Tests
Affects Version/s: 10.11, 11.0, 11.1, 11.2, 11.3
Fix Version/s: 10.11, 11.0, 11.1, 11.2, 11.3

Type: Bug Priority: Major
Reporter: Angelique Sklavounos (Inactive) Assignee: Daniel Black
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File 11cfaf3_aarch64-fedora-36_mysqld.1.err.7     File 5bd86986a_ppc64le-debian-sid_mysqld.1.err.1     File c4994b468_ppc64le-ubuntu-2004_mysqld.1.err.2    
Issue Links:
Relates
relates to MDEV-30728 Frequent 'InnoDB: Database page corru... Closed

 Description   

Recently on 10.11 branches, rpl.rpl_semi_sync_after_row has been failing on various environments with this output:

https://buildbot.mariadb.org/#/builders/384/builds/1622

10.11 11cfaf394bdf9804f74ffd1289ec758e1107ba32

rpl.rpl_semi_sync_after_sync_row 'innodb,row' w7 [ fail ]
        Test ended at 2022-10-20 02:56:50
 
CURRENT_TEST: rpl.rpl_semi_sync_after_sync_row
mysqltest: In included file "/home/buildbot/aarch64-fedora-36/build/mysql-test/suite/rpl/include/rpl_semi_sync.inc": 
included from /home/buildbot/aarch64-fedora-36/build/mysql-test/suite/rpl/t/rpl_semi_sync.test at line 2:
included from /home/buildbot/aarch64-fedora-36/build/mysql-test/suite/rpl/t/rpl_semi_sync_after_sync_row.test at line 3:
At line 90: query 'create table t1 (a int) engine=$engine_type' failed: <Unknown> (2013): Lost connection to server during query
 
The result from queries just before the failure was:
< snip >
rpl_semi_sync_master_enabled	OFF
[ enable semi-sync on master ]
set global rpl_semi_sync_master_enabled = 1;
show variables like 'rpl_semi_sync_master_enabled';
Variable_name	Value
rpl_semi_sync_master_enabled	ON
[ status of semi-sync on master should be ON even without any semi-sync slaves ]
show status like 'Rpl_semi_sync_master_clients';
Variable_name	Value
Rpl_semi_sync_master_clients	0
show status like 'Rpl_semi_sync_master_status';
Variable_name	Value
Rpl_semi_sync_master_status	ON
show status like 'Rpl_semi_sync_master_yes_tx';
Variable_name	Value
Rpl_semi_sync_master_yes_tx	0
#
# BUG#45672 Semisync repl: ActiveTranx:insert_tranx_node: transaction node allocation failed
# BUG#45673 Semisynch reports correct operation even if no slave is connected
#
 
More results from queries before failure can be found in /home/buildbot/aarch64-fedora-36/build/mysql-test/var/7/log/rpl_semi_sync_after_sync_row.log
 
 
Server [mysqld.1 - pid: 61248, winpid: 61248, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
2022-10-20  2:55:53 387 [Note] Deleted Master_info file '/dev/shm/var_auto_CXzB/7/mysqld.1/data/master.info'.
2022-10-20  2:55:53 387 [Note] Deleted Master_info file '/dev/shm/var_auto_CXzB/7/mysqld.1/data/relay-log.info'.
2022-10-20  2:55:53 389 [Note] Start binlog_dump to slave_server(2), pos(, 4), using_gtid(1), gtid('')
2022-10-20  2:55:53 390 [Note] Semi-sync replication initialized for transactions.
2022-10-20  2:55:53 390 [Note] Semi-sync replication enabled on the master.
2022-10-20  2:55:53 0 [Note] Starting ack receiver thread
2022-10-20 02:56:48 0xffff1f7ef000  InnoDB: Assertion failure in file /home/buildbot/aarch64-fedora-36/build/storage/innobase/include/fut0lst.h line 122
InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mariadbd startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
221020  2:56:48 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

The traceback appears to slightly differ among the environments, but for now I will put them all here as the failing assertion is the same and they are all recent failures, suggesting that they are related.

11cfaf394bdf9804f74ffd1289ec758e1107ba32 aarch64-fedora-36

Thread pointer: 0xaaab1bb7e8f8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0xffff1f7ee608 thread_stack 0x49000
mysys/stacktrace.c:213(my_print_stacktrace)[0xaaaae05da380]
sql/signal_handler.cc:236(handle_fatal_signal)[0xaaaae0195dfc]
addr2line: 'linux-vdso.so.1': No such file
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffb2a407a0]
??:0(__pthread_kill_implementation)[0xffffb1e62658]
:0(__GI_raise)[0xffffb1e1ab00]
:0(__GI_abort)[0xffffb1e070f8]
include/ut0ut.h:329(ib::logger& ib::logger::operator<< <int>(int const&))[0xaaaadfe93974]
include/ut0new.h:766(ut_allocator<unsigned char, true>::deallocate_trace(ut_new_pfx_t const*))[0xaaaadfe92d24]
include/dyn0buf.h:177(mtr_buf_t::~mtr_buf_t())[0xaaaae04ba8d0]
trx/trx0purge.cc:1363(trx_purge(unsigned long, bool))[0xaaaae04bc730]
srv/srv0srv.cc:1610(purge_coordinator_state::do_purge())[0xaaaae04b2ddc]
srv/srv0srv.cc:1766(purge_coordinator_callback(void*))[0xaaaae04b2948]
tpool/task_group.cc:71(tpool::task_group::execute(tpool::task*))[0xaaaae05888f4]
tpool/tpool_generic.cc:578(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0xaaaae0587290]
??:0(std::error_code::default_error_condition() const)[0xffffb21530a0]
??:0(start_thread)[0xffffb1e609a8]
??:0(thread_start)[0xffffb1ecbd1c]

5bd86986a8ec2d9222e621d36e10b63f4c026976 ppc64le-debian-sid

Thread pointer: 0x13c92b5d8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fff0f7ee308 thread_stack 0x49000
mysys/stacktrace.c:212(my_print_stacktrace)[0x134260fa4]
sql/signal_handler.cc:233(handle_fatal_signal)[0x133c20b38]
addr2line: 'linux-vdso64.so.1': No such file
linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fffaea90514]
??:0(pthread_key_delete)[0x7fffada8defc]
??:0(gsignal)[0x7fffada2cd3c]
??:0(abort)[0x7fffada0d060]
ut/ut0dbg.cc:60(ut_dbg_assertion_failed(char const*, char const*, unsigned int))[0x13379b12c]
include/fut0lst.h:122(flst_read_addr(unsigned char const*))[0x133799c8c]
include/fut0lst.h:122(flst_read_addr(unsigned char const*))[0x1337e2198]
trx/trx0purge.cc:1361(trx_purge(unsigned long, bool))[0x1340d322c]
srv/srv0srv.cc:1611(purge_coordinator_state::do_purge())[0x1340c7c34]
srv/srv0srv.cc:1765(purge_coordinator_callback(void*))[0x1340c7578]
tpool/task_group.cc:70(tpool::task_group::execute(tpool::task*))[0x1341e76d8]
tpool/task.cc:32(tpool::task::execute())[0x1341e7908]
tpool/tpool_generic.cc:580(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0x1341e51f0]
bits/invoke.h:74(void std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*>(std::__invoke_memfun_deref, void (tpool::thread_pool_generic::*&&)(tpool::worker_data*), tpool::thread_pool_generic*&&, tpool::worker_data*&&))[0x1341e6384]
??:0(std::error_code::default_error_condition() const)[0x7fffadf06920]
??:0(pthread_condattr_setpshared)[0x7fffada8b3c8]
??:0(clone)[0x7fffadb3acc0]

3a0c3b65de426a9b8cd8bd03406d52e10977f17a aarch64-debian-10

Thread pointer: 0xaaab15b256c8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0xffff177ee708 thread_stack 0x49000
mysys/stacktrace.c:212(my_print_stacktrace)[0xaaaadd338cf0]
sql/signal_handler.cc:236(handle_fatal_signal)[0xaaaadcedea8c]
addr2line: 'linux-vdso.so.1': No such file
linux-vdso.so.1(__kernel_rt_sigreturn+0x0)[0xffffac5707a0]
??:0(raise)[0xffffabbd2714]
??:0(abort)[0xffffabbc08e8]
ut/ut0ut.cc:63(ut_print_timestamp(_IO_FILE*))[0xaaaadcbe4840]
include/fut0lst.h:122(flst_read_addr(unsigned char const*))[0xaaaadd1fc98c]
trx/trx0purge.cc:1361(trx_purge(unsigned long, bool))[0xaaaadd1fe9e8]
srv/srv0srv.cc:1610(purge_coordinator_callback(void*))[0xaaaadd1f2900]
tpool/task_group.cc:71(tpool::task_group::execute(tpool::task*))[0xaaaadd2e6508]
tpool/tpool_generic.cc:578(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0xaaaadd2e45b8]
??:0(std::error_code::default_error_condition() const)[0xffffabeca1f4]
??:0(start_thread)[0xffffabfc77e4]
??:0(__clone)[0xffffabc6f59c]



 Comments   
Comment by Angelique Sklavounos (Inactive) [ 2022-10-20 ]

FWIW, cross-ref search for Failure Output of "fut0lst.h line 122" produces fails from other test suites, but rpl.rpl_semi_sync_after_row has the most crashes.

The crashes also appear to only occur on ppc64le or aarch64 platforms.

Comment by Marko Mäkelä [ 2022-10-20 ]

This looks pretty serious to me, hinting that the InnoDB page locks or some code around it may be broken on ARM and POWER. I am assigning this to danblack for some custom debugging. In the mentioned log, liburing can never be used, because the host kernel is too old to support the system calls. Hence, the barebones simulated asynchronous code path will be used.

I did not find any debug builder for ARM or POWER. I think that we definitely need one, so that this kind of ISA dependent bugs can be found more easily.

It could be useful to run an extended version of the test unit.innodb_sync (the source code is in storage/innobase/unittest/innodb_sync-t.cc) on the affected platforms. I see that the unit test does run on some ARM builders, but the execution time might be way too short to catch any trouble, and I am afraid that the assert() macros would be disabled. We might need an explicit #undef NDEBUG at the start of the file so that the test would actually do something useful on all platforms.

Comment by Angelique Sklavounos (Inactive) [ 2022-10-20 ]

The tests mostly seem to show that liburing is not used, but the error log for 5bd86986a https://buildbot.mariadb.org/#/builders/165/builds/9556 (attached) shows:
022-10-19 4:47:24 0 [Note] InnoDB: Using liburing

Comment by Daniel Black [ 2022-10-21 ]

10.8/10/9 failures Widening the cross-ref search to "fut0lst.h" shows a much wider set of set failures of the same assertion (line 128 in 10.8) all seem originating from bb-10.5-MDEV-14425 branches all under the stack "trx_purge_add_undo_to_history".

Included 10.8-5b3ad94c7b070be1b1e5ab186c5d4d01 - innodb.index_vcol_purge_startup kvm-fulltest

encryption.innodb-redo-badkey seems to be bad examples as they are preceeded by "[ERROR] InnoDB: We scanned the log up to 1925605. A checkpoint was at 49715 and the maximum LSN on a database page was 1935092. It is possible that the database is now corrupt" which I believe was another now fixed error.

There was even mariabackup.xbstream bb-10.8-MDEV-14425 052b1d21a winx64-packages windows failures.

The 10.11 versions from this bug are exclusively aarch64/ppc64le

  • stress.ddl_innodb, rpl.rpl_start_alter_options, rpl.rpl_row_idempotency, rpl.rpl_row_drop_create_temp_table under trx_purge_add_undo_to_history
  • rpl.rpl_semi_sync_after_sync_row -> trx_purge -> mtr_buf_t::~mtr_buf_t
  • gcol.gcol_supported_sql_funcs_innodb -> trx_purge_attach_undo_recs -> trx_purge_get_next_rec
  • parts.partition_alter_innodb -> trx_purge_truncate_history _> ::deallocate_trace
  • main.subselect_sj2_jcl6 (corrupt index recovery) -> trx_purge_add_undo_to_history -> flst_remove -> ut_print_timestamp
Comment by Marko Mäkelä [ 2022-10-21 ]

danblack, I agree that it is good to look for any message that suggests that recovery skipped some log records. This would also include the infamous

InnoDB: Page … log sequence number … is in the future!

I did not notice any of those messages in the log file that I checked. The ARM and POWER debug builders that were added yesterday will hopefully soon start to produce useful statistics. The additional checksum records that are written in debug builds (MDEV-18976) should indicate recovery failures better.

Starting with 10.8, the way how the redo log is formatted and written was changed in MDEV-14425. I had some concerns on the correctness of the locking in MDEV-27774, using a shared lock to cover writes to distinct bytes potentially in the same cache line (a portion of log_sys.buf). An experiment that I conducted on an implementation of AMD64 did not show any trouble regarding that. It may be worthwhile to test this on ARM or POWER.

Comment by Alice Sherepa [ 2023-11-17 ]

https://buildbot.mariadb.org/#/builders/231/builds/27279/steps/7/logs/stdio the same test on bb-11.2-release

rpl.rpl_parallel_xa_same_xid test fails with the same assertion:
https://buildbot.mariadb.org/#/builders/231/builds/27328/steps/7/logs/stdio

 
[Revision hash: e6d1dbb5b64d824c19e1df8c5d15d284b0d015ee]
rpl.rpl_parallel_xa_same_xid 'innodb,row' w14 [ fail ]
        Test ended at 2023-11-16 12:49:18
 
CURRENT_TEST: rpl.rpl_parallel_xa_same_xid
 
 
Server [mysqld.2 - pid: 42796, winpid: 42796, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
$ /home/buildbot/ppc64le-rhel-8/build/sql/mariadbd --defaults-group-suffix=.2 --defaults-file=/home/buildbot/ppc64le-rhel-8/build/mysql-test/var/14/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --binlog-format=row --log-bin=slave-bin --log-bin=slave-bin --core-file --loose-debug-sync-timeout=300
2023-11-16 12:49:14 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32310)
2023-11-16 12:49:14 0 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
2023-11-16 12:49:14 0 [Note] Starting MariaDB 11.2.2-MariaDB-log source revision 0c5176b7744b75ffd6dc1ebe14143b3c2ea88103 as process 42797
2023-11-16 12:49:14 0 [Note] Plugin 'partition' is disabled.
2023-11-16 12:49:14 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2023-11-16 12:49:14 0 [Note] InnoDB: Number of transaction pools: 1
2023-11-16 12:49:14 0 [Note] InnoDB: Using POWER8 crc32 instructions
2023-11-16 12:49:14 0 [Note] InnoDB: Using liburing
2023-11-16 12:49:14 0 [Note] InnoDB: Initializing buffer pool, total size = 8.000MiB, chunk size = 1.000MiB
2023-11-16 12:49:14 0 [Note] InnoDB: Completed initialization of buffer pool
2023-11-16 12:49:14 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
2023-11-16 12:49:14 0 [Note] InnoDB: End of log at LSN=49129
2023-11-16 12:49:14 0 [Note] InnoDB: Opened 3 undo tablespaces
2023-11-16 12:49:14 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
2023-11-16 12:49:14 0 [Note] InnoDB: Setting O_DIRECT on file ./ibtmp1 failed
2023-11-16 12:49:14 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
2023-11-16 12:49:14 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
2023-11-16 12:49:14 0 [Note] InnoDB: log sequence number 49129; transaction id 16
2023-11-16 12:49:14 0 [Note] InnoDB: Loading buffer pool(s) from /dev/shm/var_auto_XkWP/14/mysqld.2/data/ib_buffer_pool
2023-11-16 12:49:14 0 [Note] Plugin 'SEQUENCE' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_FT_CONFIG' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_SYS_TABLESTATS' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_FT_DELETED' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_CMP' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'THREAD_POOL_WAITS' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_CMP_RESET' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'THREAD_POOL_QUEUES' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'FEEDBACK' is disabled.
2023-11-16 12:49:14 0 [Note] InnoDB: Buffer pool(s) load completed at 231116 12:49:14
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_FT_INDEX_TABLE' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'THREAD_POOL_GROUPS' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_CMP_PER_INDEX_RESET' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_FT_INDEX_CACHE' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_FT_BEING_DELETED' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_CMPMEM_RESET' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_FT_DEFAULT_STOPWORD' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_SYS_TABLESPACES' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'user_variables' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'INNODB_TABLESPACES_ENCRYPTION' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'THREAD_POOL_STATS' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'unix_socket' is disabled.
2023-11-16 12:49:14 0 [Note] Plugin 'wsrep_provider' is disabled.
2023-11-16 12:49:14 0 [Warning] /home/buildbot/ppc64le-rhel-8/build/sql/mariadbd: unknown variable 'loose-feedback-debug-startup-interval=20'
2023-11-16 12:49:14 0 [Warning] /home/buildbot/ppc64le-rhel-8/build/sql/mariadbd: unknown variable 'loose-feedback-debug-first-interval=60'
2023-11-16 12:49:14 0 [Warning] /home/buildbot/ppc64le-rhel-8/build/sql/mariadbd: unknown variable 'loose-feedback-debug-interval=60'
2023-11-16 12:49:14 0 [Warning] /home/buildbot/ppc64le-rhel-8/build/sql/mariadbd: unknown option '--loose-pam-debug'
2023-11-16 12:49:14 0 [Warning] /home/buildbot/ppc64le-rhel-8/build/sql/mariadbd: unknown option '--loose-aria'
2023-11-16 12:49:14 0 [Warning] /home/buildbot/ppc64le-rhel-8/build/sql/mariadbd: unknown variable 'loose-debug-sync-timeout=300'
2023-11-16 12:49:14 0 [Note] Server socket created on IP: '127.0.0.1'.
2023-11-16 12:49:14 0 [Note] mariadbd: Event Scheduler: Loaded 0 events
2023-11-16 12:49:14 0 [Note] /home/buildbot/ppc64le-rhel-8/build/sql/mariadbd: ready for connections.
Version: '11.2.2-MariaDB-log'  socket: '/home/buildbot/ppc64le-rhel-8/build/mysql-test/var/tmp/14/mysqld.2.sock'  port: 16161  Source distribution
2023-11-16 12:49:15 5 [Note] Deleted Master_info file '/home/buildbot/ppc64le-rhel-8/build/mysql-test/var/14/mysqld.2/data/master.info'.
2023-11-16 12:49:15 5 [Note] Deleted Master_info file '/home/buildbot/ppc64le-rhel-8/build/mysql-test/var/14/mysqld.2/data/relay-log.info'.
2023-11-16 12:49:15 5 [Note] Master connection name: ''  Master_info_file: 'master.info'  Relay_info_file: 'relay-log.info'
2023-11-16 12:49:15 5 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='', master_port='3306', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='16160', master_log_file='', master_log_pos='4'.
2023-11-16 12:49:15 5 [Note] Previous Using_Gtid=Slave_Pos. New Using_Gtid=Slave_Pos
2023-11-16 12:49:15 6 [Note] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16160' in log '' at position 4
2023-11-16 12:49:15 7 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 4, relay log './slave-relay-bin.000001' position: 4; GTID position ''
2023-11-16 12:49:15 6 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16160',replication starts at GTID position ''
2023-11-16 12:49:15 7 [Note] Error reading relay log event: slave SQL thread was killed
2023-11-16 12:49:15 7 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 329; GTID position '', master: 127.0.0.1:16160
2023-11-16 12:49:15 6 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 329; GTID position , master 127.0.0.1:16160
2023-11-16 12:49:15 10 [Note] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16160' in log 'master-bin.000001' at position 329
2023-11-16 12:49:15 10 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16160',replication starts at GTID position ''
2023-11-16 12:49:15 11 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 329, relay log './slave-relay-bin.000001' position: 4; GTID position ''
2023-11-16 12:49:17 0x7fff7a8dee90  InnoDB: Assertion failure in file /home/buildbot/ppc64le-rhel-8/build/storage/innobase/include/fut0lst.h line 122
InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mariadbd startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
231116 12:49:17 [ERROR] mysqld got signal 6 ;
Sorry, we probably made a mistake, and this is a bug.
 
Your assistance in bug reporting will enable us to fix this for the next release.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 11.2.2-MariaDB-log source revision: 0c5176b7744b75ffd6dc1ebe14143b3c2ea88103
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=3
max_threads=153
thread_count=12
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63865 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
mysys/stacktrace.c:215(my_print_stacktrace)[0x11d8f6624]
sql/signal_handler.cc:235(handle_fatal_signal)[0x11d2bb328]
addr2line: 'linux-vdso64.so.1': No such file
linux-vdso64.so.1(__kernel_sigtramp_rt64+0x0)[0x7fff883a0514]
/lib64/glibc-hwcaps/power9/libc-2.28.so(gsignal+0xd8)[0x7fff876196f8]
/lib64/glibc-hwcaps/power9/libc-2.28.so(abort+0x164)[0x7fff875f3ff4]
ut/ut0dbg.cc:60(ut_dbg_assertion_failed(char const*, char const*, unsigned int))[0x11ce1800c]
include/fut0lst.h:122(flst_read_addr(unsigned char const*))[0x11d757f6c]
trx/trx0purge.cc:545(purge_sys_t::iterator::free_history() const)[0x11d759074]
srv/srv0srv.cc:1101(purge_truncation_callback(void*))[0x11d74920c]
tpool/task_group.cc:70(tpool::task_group::execute(tpool::task*))[0x11d87c82c]
tpool/task.cc:32(tpool::task::execute())[0x11d87cc08]
tpool/tpool_generic.cc:583(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0x11d879464]
bits/invoke.h:73(std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run())[0x11d87b2f8]
/lib64/libstdc++.so.6(+0xf5f74)[0x7fff87a45f74]
/lib64/glibc-hwcaps/power9/libpthread-2.28.so(+0x9678)[0x7fff87bb9678]
/lib64/glibc-hwcaps/power9/libc-2.28.so(clone+0x74)[0x7fff87708938]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /dev/shm/var_auto_XkWP/14/mysqld.2/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             unlimited            unlimited            processes 
Max open files            1024                 1024                 files     
Max locked memory         51200000             51200000             bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       993164               993164               signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: core
 
Kernel version: Linux version 5.10.0-20-powerpc64le (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.158-2 (2022-12-13)
 
----------SERVER LOG END-------------
mysqltest failed but provided no output
The result from queries just before the failure was:
< snip >
include/master-slave.inc
[connection master]
connection slave;
call mtr.add_suppression("WSREP: handlerton rollback failed");
include/stop_slave.inc
ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
SET @old_parallel_threads = @@GLOBAL.slave_parallel_threads;
SET @old_parallel_mode               = @@GLOBAL.slave_parallel_mode;
SET @@global.slave_parallel_mode     ='optimistic';
include/start_slave.inc
connection master;
CREATE TABLE t1 (a INT, b INT) ENGINE=InnoDB;
CREATE TABLE t2 (a INT AUTO_INCREMENT PRIMARY KEY, b INT) ENGINE=InnoDB;
 
 
 
 - found 'core' (0/1)
Core generated by '/home/buildbot/ppc64le-rhel-8/build/sql/mariadbd'
 
Trying 'dbx' to get a backtrace
 
Trying 'lldb' to get a backtrace from coredump /home/buildbot/ppc64le-rhel-8/build/mysql-test/var/14/log/rpl.rpl_parallel_xa_same_xid-innodb,row/mysqld.2/data/core
Compressed file /home/buildbot/ppc64le-rhel-8/build/mysql-test/var/14/log/rpl.rpl_parallel_xa_same_xid-innodb,row/mysqld.2/data/core
 - saving '/home/buildbot/ppc64le-rhel-8/build/mysql-test/var/14/log/rpl.rpl_parallel_xa_same_xid-innodb,row/' to '/home/buildbot/ppc64le-rhel-8/build/mysql-test/var/log/rpl.rpl_parallel_xa_same_xid-innodb,row/'

Comment by Marko Mäkelä [ 2023-11-17 ]

alice, builder 231 is also known as ppc64le-rhel-8. I think that this is a Linux io_uring kernel bug that affects at least POWER. This is being tracked by MDEV-30728.

Implementations of POWER and ARM ISAs typically implement weak memory ordering, unlike (say) IA-32, AMD64, SPARC, and RISC-V RVTSO, which follow Total Store Order (TSO). This bug could be a Linux kernel equivalent of MDEV-28430.

Comment by Marko Mäkelä [ 2023-11-17 ]

Builder 384 that is mentioned in Description is aarch64-fedora-36, which I think should be recent enough to support io_uring. For now, I could believe that the kernel bug behind MDEV-28430 can affect also ARMv8.

Generated at Thu Feb 08 10:11:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.