[MDEV-15785] Test case galera_as_slave_replication_bundle caused debug assertion - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Won't Fix
Affects Version/s: 10.1(EOL)
Fix Version/s: N/A
Component/s: Galera, Replication, Tests
Labels:
- Test_disabled
- test_disabled_ES

Description

2018-04-05 10:48:04 140629028526848 [Note] WSREP: ready state reached

2018-04-05 10:48:04 140629028526848 [Note] Slave SQL thread initialized, starting replication in log 'FIRST' at position 0, relay log './mysqld-relay-bin.000001' position: 4

2018-04-05 10:48:04 140629028834048 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16000',replication started in log 'FIRST' at position 4

mysqld: /home/jan/mysql/10.1-galera/sql/log_event.cc:6784: virtual int Gtid_log_event::do_apply_event(rpl_group_info*): Assertion `(bits & (1ULL << 25)) == 0' failed.

180405 10:48:04 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.1.33-MariaDB-debug

key_buffer_size=1048576

read_buffer_size=131072

max_used_connections=2

max_threads=153

thread_count=4

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63044 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fe6ab06a070

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7fe6bf4a48f0 thread_stack 0x48400

mysys/stacktrace.c:267(my_print_stacktrace)[0x559f11653ab0]

sql/signal_handler.cc:168(handle_fatal_signal)[0x559f10fed82b]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x11f50)[0x7fe6c2a45f50]

linux/raise.c:51(__GI_raise)[0x7fe6bfcb0e7b]

stdlib/abort.c:81(__GI_abort)[0x7fe6bfcb2231]

assert/assert.c:89(__assert_fail_base)[0x7fe6bfca99da]

/lib/x86_64-linux-gnu/libc.so.6(+0x2da52)[0x7fe6bfca9a52]

sql/log_event.cc:6785(Gtid_log_event::do_apply_event(rpl_group_info*))[0x559f110eb487]

sql/log_event.h:1343(Log_event::apply_event(rpl_group_info*))[0x559f10d1afcb]

sql/slave.cc:3479(apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int))[0x559f10d10729]

sql/slave.cc:3601(apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*))[0x559f10d10bb2]

sql/slave.cc:3885(exec_relay_log_event(THD*, Relay_log_info*, rpl_group_info*))[0x559f10d113b6]

sql/slave.cc:4981(handle_slave_sql)[0x559f10d145ff]

perfschema/pfs.cc:1863(pfs_spawn_thread)[0x559f11314c3b]

nptl/pthread_create.c:463(start_thread)[0x7fe6c2a3b5aa]

x86_64/clone.S:97(clone)[0x7fe6bfd72cbf]

Trying to get some variables.

Some pointers may be invalid and cause the dump to abort.

Query (0x0):

Connection ID (thread ID): 9

Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains

information that should help you find out what is causing the crash.

Writing a core file

----------SERVER LOG END-------------

 - saving '/home/jan/mysql/10.1-galera/mysql-test/var/log/galera.galera_as_slave_replication_bundle-innodb_plugin/' to '/home/jan/mysql/10.1-galera/mysql-test/var/log/galera.galera_as_slave_replication_bundle-innodb_plugin/'

Only  1  of 2 completed.

--------------------------------------------------------------------------

The servers were restarted 0 times

Spent 0.000 of 12 seconds executing testcases

Failure: Failed 1/1 tests, 0.00% were successful.

Failing test(s): galera.galera_as_slave_replication_bundle

The log files in var/log may give you some hint of what went wrong.

If you want to report this error, please read first the documentation

at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html

mysql-test-run: *** ERROR: there were failing test cases

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

bundle.diff
2018-10-02 10:20
6 kB
Jan Lindström

Activity

Ascending order - Click to sort in descending order

Jan Lindström (Inactive) added a comment - 2018-09-13 11:32

Elkin Can you have a look this assertion happens on middle of replication code that is not familiar to me.

Jan Lindström (Inactive) added a comment - 2018-09-13 11:32 Elkin Can you have a look this assertion happens on middle of replication code that is not familiar to me.

Andrei Elkin added a comment - 2018-09-13 12:35

This looks like caused by skipped resetting of the flag by a preceding transaction. You can find its end position in the master binlog
in file: rli->group_master_log_name at pos:rli->group_master_log_pos.

It should have clear the flag at execution
of Xid_log_event::do_apply_event (or Query_log_event's method for Commit query). As you have a test that reproduces the issue
it must be straightforward to locate that miss.
I would do this:

set a gdb br for this SQL thread before it takes on a preceding Gtid group and watch
how the flag is set, and then how the group of events terminates (if it does). E.g absence of Xid_log_event at its end
would trigger the assert. You can

If you will need any help to process in gdb I am always there.

Andrei Elkin added a comment - 2018-09-13 12:35 This looks like caused by skipped resetting of the flag by a preceding transaction. You can find its end position in the master binlog in file: rli->group_master_log_name at pos:rli->group_master_log_pos. It should have clear the flag at execution of Xid_log_event::do_apply_event (or Query_log_event's method for Commit query). As you have a test that reproduces the issue it must be straightforward to locate that miss. I would do this: set a gdb br for this SQL thread before it takes on a preceding Gtid group and watch how the flag is set, and then how the group of events terminates (if it does). E.g absence of Xid_log_event at its end would trigger the assert. You can If you will need any help to process in gdb I am always there.

Jan Lindström (Inactive) added a comment - 2018-10-02 10:16 - edited

After fixing original assertion a new one surfaced from wsrep InnoDB checkpoint. First checkpoint is written as follows (using instrumented code to get assertion where we want):

2018-10-02 13:03:33 140472161303296 [Note] WSREP: Xid_log_event: Skipping wsrep commit replicated 1 binlog ./mysqld-relay-bin.000002 pos 871

2018-10-02 13:03:33 140472161303296 [Note] WSREP: apply_event_and_update_pos() result: 0

2018-10-02 13:03:33 140472161303296 [Note] WSREP: Wsrep checkpoint writing for uuid 66f554e0-c62a-11e8-97f5-9262cfb8a273 xid_seqno 2 trx_sys_cur_xid_seqno 1 trx_state 2

2018-10-02 13:03:33 140472161303296 [Note] WSREP: apply_event_and_update_pos() result: 0

2018-10-02 13:03:33 140472161303296 [Note] WSREP: apply_event_and_update_pos() result: 0

2018-10-02 13:03:33 140472161303296 [Note] WSREP: apply_event_and_update_pos() result: 0

2018-10-02 13:03:33 7fc239467b00  InnoDB: Assertion failure in thread 140472161303296 in file trx0sys.cc line 391

InnoDB: Failing assertion: 0

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html

InnoDB: about forcing recovery.

181002 13:03:33 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.1.37-MariaDB-debug

key_buffer_size=1048576

read_buffer_size=131072

max_used_connections=2

max_threads=153

thread_count=4

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63026 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7fc221c70070

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7fc2394668f0 thread_stack 0x48400

2018-10-02 13:03:33 140472161917696 [Note] WSREP: wsrep_sync_wait: thd->variables.wsrep_sync_wait = 15, mask = 1

/home/jan/mysql/10.1-galera/sql/mysqld(my_print_stacktrace+0x38)[0x55fd4be68365]

mysys/stacktrace.c:267(my_print_stacktrace)[0x55fd4b803e39]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x128e0)[0x7fc23b6858e0]

/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7fc239ad5f3b]

linux/raise.c:51(__GI_raise)[0x7fc239ad72f1]

/home/jan/mysql/10.1-galera/sql/mysqld(+0xbe82e2)[0x55fd4bcdd2e2]

trx/trx0sys.cc:397(trx_sys_update_wsrep_checkpoint(xid_t const*, unsigned char*, mtr_t*, trx_t const*))[0x55fd4bce49a2]

trx/trx0trx.cc:1243(trx_write_serialisation_history(trx_t*, mtr_t*))[0x55fd4bce59c1]

trx/trx0trx.cc:1639(trx_commit_low(trx_t*, mtr_t*))[0x55fd4bce5a86]

trx/trx0trx.cc:1668(trx_commit(trx_t*))[0x55fd4bce6441]

trx/trx0trx.cc:1932(trx_commit_for_mysql(trx_t*))[0x55fd4bb40d0d]

handler/ha_innodb.cc:4704(innobase_commit_low(trx_t*))[0x55fd4bb40feb]

handler/ha_innodb.cc:4846(innobase_commit_ordered_2(trx_t*, THD*))[0x55fd4bb41191]

handler/ha_innodb.cc:4906(innobase_commit_ordered(handlerton*, THD*, bool))[0x55fd4b8ea1b4]

sql/log.cc:8794(TC_LOG::run_commit_ordered(THD*, bool))[0x55fd4b8e8369]

sql/log.cc:8024(MYSQL_BIN_LOG::trx_group_commit_leader(MYSQL_BIN_LOG::group_commit_entry*))[0x55fd4b8e7345]

sql/log.cc:7616(MYSQL_BIN_LOG::write_transaction_to_binlog_events(MYSQL_BIN_LOG::group_commit_entry*))[0x55fd4b8e6a58]

sql/log.cc:7290(MYSQL_BIN_LOG::write_transaction_to_binlog(THD*, binlog_cache_mngr*, Log_event*, bool, bool, bool))[0x55fd4b8d8a29]

sql/log.cc:1751(binlog_flush_cache(THD*, binlog_cache_mngr*, Log_event*, bool, bool, bool))[0x55fd4b8d8ee8]

sql/log.cc:1859(binlog_commit_flush_xid_caches(THD*, binlog_cache_mngr*, bool, unsigned long long))[0x55fd4b8ec2c6]

sql/log.cc:9576(MYSQL_BIN_LOG::log_and_order(THD*, unsigned long long, bool, bool, bool))[0x55fd4b807474]

sql/handler.cc:1463(ha_commit_trans(THD*, bool))[0x55fd4b71093b]

sql/transaction.cc:235(trans_commit(THD*))[0x55fd4b902ea5]

sql/log_event.cc:7675(Xid_log_event::do_apply_event(rpl_group_info*))[0x55fd4b530f3f]

sql/log_event.h:1343(Log_event::apply_event(rpl_group_info*))[0x55fd4b5267d9]

sql/slave.cc:3479(apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int))[0x55fd4b526c62]

sql/slave.cc:3601(apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*))[0x55fd4b527466]

sql/slave.cc:3885(exec_relay_log_event(THD*, Relay_log_info*, rpl_group_info*))[0x55fd4b52a6bb]

sql/slave.cc:4981(handle_slave_sql)[0x55fd4bb29ace]

nptl/pthread_create.c:463(start_thread)[0x7fc23b67af2a]

/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fc239b97edf]

This does not yet break any existing rules. However, we have a second commit for the same xid_seqno as follows (again using instrumented code) that does break current expectations i.e. the fact that xid_seqno should be increasing:

2018-10-02 13:14:57 140216945548032 [Note] WSREP: Xid_log_event: Skipping wsrep commit replicated 1 binlog ./mysqld-relay-bin.000002 pos 871

2018-10-02 13:14:57 140216945548032 [Note] WSREP: apply_event_and_update_pos() result: 0

2018-10-02 13:14:57 140216945548032 [Note] WSREP: Wsrep checkpoint writing for uuid fe20fa80-c62b-11e8-b401-2e57249066f2 xid_seqno 2 trx_sys_cur_xid_seqno 1 trx_state 2

2018-10-02 13:14:57 140216945548032 [Note] WSREP: apply_event_and_update_pos() result: 0

2018-10-02 13:14:57 140216945548032 [Note] WSREP: apply_event_and_update_pos() result: 0

2018-10-02 13:14:57 140216945548032 [Note] WSREP: apply_event_and_update_pos() result: 0

2018-10-02 13:14:57 140216945548032 [Note] WSREP: WRONG: wsrep checkpoint writing for uuid fe20fa80-c62b-11e8-b401-2e57249066f2 xid_seqno 2 trx_sys_cur_xid_seqno 2 trx_state 2

2018-10-02 13:14:57 7f86cd3b1b00  InnoDB: Assertion failure in thread 140216945548032 in file trx0sys.cc line 395

InnoDB: Failing assertion: xid_seqno > trx_sys_cur_xid_seqno

InnoDB: We intentionally generate a memory trap.

InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

InnoDB: If you get repeated assertion failures or crashes, even

InnoDB: immediately after the mysqld startup, there may be

InnoDB: corruption in the InnoDB tablespace. Please refer to

InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html

InnoDB: about forcing recovery.

181002 13:14:57 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.1.37-MariaDB-debug

key_buffer_size=1048576

read_buffer_size=131072

max_used_connections=2

max_threads=153

thread_count=4

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63026 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7f86c1474070

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7f86cd3b08f0 thread_stack 0x48400

2018-10-02 13:14:57 140217000942336 [Note] WSREP: wsrep_sync_wait: thd->variables.wsrep_sync_wait = 15, mask = 1

/home/jan/mysql/10.1-galera/sql/mysqld(my_print_stacktrace+0x38)[0x55b9c460c409]

mysys/stacktrace.c:267(my_print_stacktrace)[0x55b9c3fa7e39]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x128e0)[0x7f86da3c68e0]

/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7f86d8816f3b]

linux/raise.c:51(__GI_raise)[0x7f86d88182f1]

/home/jan/mysql/10.1-galera/sql/mysqld(+0xbe83a9)[0x55b9c44813a9]

trx/trx0sys.cc:398(trx_sys_update_wsrep_checkpoint(xid_t const*, unsigned char*, mtr_t*, trx_t const*))[0x55b9c4488a45]

trx/trx0trx.cc:1243(trx_write_serialisation_history(trx_t*, mtr_t*))[0x55b9c4489a64]

trx/trx0trx.cc:1639(trx_commit_low(trx_t*, mtr_t*))[0x55b9c4489b29]

trx/trx0trx.cc:1668(trx_commit(trx_t*))[0x55b9c448a4e4]

trx/trx0trx.cc:1932(trx_commit_for_mysql(trx_t*))[0x55b9c42e4d0d]

handler/ha_innodb.cc:4704(innobase_commit_low(trx_t*))[0x55b9c42e4feb]

handler/ha_innodb.cc:4846(innobase_commit_ordered_2(trx_t*, THD*))[0x55b9c42e5191]

handler/ha_innodb.cc:4906(innobase_commit_ordered(handlerton*, THD*, bool))[0x55b9c408e1b4]

sql/log.cc:8794(TC_LOG::run_commit_ordered(THD*, bool))[0x55b9c408c369]

sql/log.cc:8024(MYSQL_BIN_LOG::trx_group_commit_leader(MYSQL_BIN_LOG::group_commit_entry*))[0x55b9c408b345]

sql/log.cc:7616(MYSQL_BIN_LOG::write_transaction_to_binlog_events(MYSQL_BIN_LOG::group_commit_entry*))[0x55b9c408aa58]

sql/log.cc:7290(MYSQL_BIN_LOG::write_transaction_to_binlog(THD*, binlog_cache_mngr*, Log_event*, bool, bool, bool))[0x55b9c407ca29]

sql/log.cc:1751(binlog_flush_cache(THD*, binlog_cache_mngr*, Log_event*, bool, bool, bool))[0x55b9c407cee8]

sql/log.cc:1859(binlog_commit_flush_xid_caches(THD*, binlog_cache_mngr*, bool, unsigned long long))[0x55b9c40902c6]

sql/log.cc:9576(MYSQL_BIN_LOG::log_and_order(THD*, unsigned long long, bool, bool, bool))[0x55b9c3fab474]

sql/handler.cc:1463(ha_commit_trans(THD*, bool))[0x55b9c3eb493b]

sql/transaction.cc:235(trans_commit(THD*))[0x55b9c40a6ea5]

sql/log_event.cc:7675(Xid_log_event::do_apply_event(rpl_group_info*))[0x55b9c3cd4f3f]

sql/log_event.h:1343(Log_event::apply_event(rpl_group_info*))[0x55b9c3cca7d9]

sql/slave.cc:3479(apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int))[0x55b9c3ccac62]

sql/slave.cc:3601(apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*))[0x55b9c3ccb466]

sql/slave.cc:3885(exec_relay_log_event(THD*, Relay_log_info*, rpl_group_info*))[0x55b9c3cce6bb]

sql/slave.cc:4981(handle_slave_sql)[0x55b9c42cdace]

nptl/pthread_create.c:463(start_thread)[0x7f86da3bbf2a]

/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f86d88d8edf]

Trying to get some variables.

Some pointers may be invalid and cause the dump to abort.

Query (0x0):

Connection ID (thread ID): 9

Status: NOT_KILLED

Patch for instrumented code attached on this ticket. We could just ignore these same xid_seqno writes on InnoDB, but in my opinion it is not correct way. How to repeat:

./mtr galera_as_slave_replication_bundle,xtradb --enable-disabled --mysqld=--wsrep-debug=ON

Jan Lindström (Inactive) added a comment - 2018-10-02 10:16 - edited After fixing original assertion a new one surfaced from wsrep InnoDB checkpoint. First checkpoint is written as follows (using instrumented code to get assertion where we want): 2018-10-02 13:03:33 140472161303296 [Note] WSREP: Xid_log_event: Skipping wsrep commit replicated 1 binlog ./mysqld-relay-bin.000002 pos 871 2018-10-02 13:03:33 140472161303296 [Note] WSREP: apply_event_and_update_pos() result: 0 2018-10-02 13:03:33 140472161303296 [Note] WSREP: Wsrep checkpoint writing for uuid 66f554e0-c62a-11e8-97f5-9262cfb8a273 xid_seqno 2 trx_sys_cur_xid_seqno 1 trx_state 2 2018-10-02 13:03:33 140472161303296 [Note] WSREP: apply_event_and_update_pos() result: 0 2018-10-02 13:03:33 140472161303296 [Note] WSREP: apply_event_and_update_pos() result: 0 2018-10-02 13:03:33 140472161303296 [Note] WSREP: apply_event_and_update_pos() result: 0 2018-10-02 13:03:33 7fc239467b00 InnoDB: Assertion failure in thread 140472161303296 in file trx0sys.cc line 391 InnoDB: Failing assertion: 0 InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to https://jira.mariadb.org/ InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 181002 13:03:33 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.1.37-MariaDB-debug key_buffer_size=1048576 read_buffer_size=131072 max_used_connections=2 max_threads=153 thread_count=4 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63026 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x7fc221c70070 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7fc2394668f0 thread_stack 0x48400 2018-10-02 13:03:33 140472161917696 [Note] WSREP: wsrep_sync_wait: thd->variables.wsrep_sync_wait = 15, mask = 1 /home/jan/mysql/10.1-galera/sql/mysqld(my_print_stacktrace+0x38)[0x55fd4be68365] mysys/stacktrace.c:267(my_print_stacktrace)[0x55fd4b803e39] /lib/x86_64-linux-gnu/libpthread.so.0(+0x128e0)[0x7fc23b6858e0] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7fc239ad5f3b] linux/raise.c:51(__GI_raise)[0x7fc239ad72f1] /home/jan/mysql/10.1-galera/sql/mysqld(+0xbe82e2)[0x55fd4bcdd2e2] trx/trx0sys.cc:397(trx_sys_update_wsrep_checkpoint(xid_t const*, unsigned char*, mtr_t*, trx_t const*))[0x55fd4bce49a2] trx/trx0trx.cc:1243(trx_write_serialisation_history(trx_t*, mtr_t*))[0x55fd4bce59c1] trx/trx0trx.cc:1639(trx_commit_low(trx_t*, mtr_t*))[0x55fd4bce5a86] trx/trx0trx.cc:1668(trx_commit(trx_t*))[0x55fd4bce6441] trx/trx0trx.cc:1932(trx_commit_for_mysql(trx_t*))[0x55fd4bb40d0d] handler/ha_innodb.cc:4704(innobase_commit_low(trx_t*))[0x55fd4bb40feb] handler/ha_innodb.cc:4846(innobase_commit_ordered_2(trx_t*, THD*))[0x55fd4bb41191] handler/ha_innodb.cc:4906(innobase_commit_ordered(handlerton*, THD*, bool))[0x55fd4b8ea1b4] sql/log.cc:8794(TC_LOG::run_commit_ordered(THD*, bool))[0x55fd4b8e8369] sql/log.cc:8024(MYSQL_BIN_LOG::trx_group_commit_leader(MYSQL_BIN_LOG::group_commit_entry*))[0x55fd4b8e7345] sql/log.cc:7616(MYSQL_BIN_LOG::write_transaction_to_binlog_events(MYSQL_BIN_LOG::group_commit_entry*))[0x55fd4b8e6a58] sql/log.cc:7290(MYSQL_BIN_LOG::write_transaction_to_binlog(THD*, binlog_cache_mngr*, Log_event*, bool, bool, bool))[0x55fd4b8d8a29] sql/log.cc:1751(binlog_flush_cache(THD*, binlog_cache_mngr*, Log_event*, bool, bool, bool))[0x55fd4b8d8ee8] sql/log.cc:1859(binlog_commit_flush_xid_caches(THD*, binlog_cache_mngr*, bool, unsigned long long))[0x55fd4b8ec2c6] sql/log.cc:9576(MYSQL_BIN_LOG::log_and_order(THD*, unsigned long long, bool, bool, bool))[0x55fd4b807474] sql/handler.cc:1463(ha_commit_trans(THD*, bool))[0x55fd4b71093b] sql/transaction.cc:235(trans_commit(THD*))[0x55fd4b902ea5] sql/log_event.cc:7675(Xid_log_event::do_apply_event(rpl_group_info*))[0x55fd4b530f3f] sql/log_event.h:1343(Log_event::apply_event(rpl_group_info*))[0x55fd4b5267d9] sql/slave.cc:3479(apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int))[0x55fd4b526c62] sql/slave.cc:3601(apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*))[0x55fd4b527466] sql/slave.cc:3885(exec_relay_log_event(THD*, Relay_log_info*, rpl_group_info*))[0x55fd4b52a6bb] sql/slave.cc:4981(handle_slave_sql)[0x55fd4bb29ace] nptl/pthread_create.c:463(start_thread)[0x7fc23b67af2a] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fc239b97edf] This does not yet break any existing rules. However, we have a second commit for the same xid_seqno as follows (again using instrumented code) that does break current expectations i.e. the fact that xid_seqno should be increasing: 2018-10-02 13:14:57 140216945548032 [Note] WSREP: Xid_log_event: Skipping wsrep commit replicated 1 binlog ./mysqld-relay-bin.000002 pos 871 2018-10-02 13:14:57 140216945548032 [Note] WSREP: apply_event_and_update_pos() result: 0 2018-10-02 13:14:57 140216945548032 [Note] WSREP: Wsrep checkpoint writing for uuid fe20fa80-c62b-11e8-b401-2e57249066f2 xid_seqno 2 trx_sys_cur_xid_seqno 1 trx_state 2 2018-10-02 13:14:57 140216945548032 [Note] WSREP: apply_event_and_update_pos() result: 0 2018-10-02 13:14:57 140216945548032 [Note] WSREP: apply_event_and_update_pos() result: 0 2018-10-02 13:14:57 140216945548032 [Note] WSREP: apply_event_and_update_pos() result: 0 2018-10-02 13:14:57 140216945548032 [Note] WSREP: WRONG: wsrep checkpoint writing for uuid fe20fa80-c62b-11e8-b401-2e57249066f2 xid_seqno 2 trx_sys_cur_xid_seqno 2 trx_state 2 2018-10-02 13:14:57 7f86cd3b1b00 InnoDB: Assertion failure in thread 140216945548032 in file trx0sys.cc line 395 InnoDB: Failing assertion: xid_seqno > trx_sys_cur_xid_seqno InnoDB: We intentionally generate a memory trap. InnoDB: Submit a detailed bug report to https://jira.mariadb.org/ InnoDB: If you get repeated assertion failures or crashes, even InnoDB: immediately after the mysqld startup, there may be InnoDB: corruption in the InnoDB tablespace. Please refer to InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html InnoDB: about forcing recovery. 181002 13:14:57 [ERROR] mysqld got signal 6 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.1.37-MariaDB-debug key_buffer_size=1048576 read_buffer_size=131072 max_used_connections=2 max_threads=153 thread_count=4 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63026 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x7f86c1474070 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f86cd3b08f0 thread_stack 0x48400 2018-10-02 13:14:57 140217000942336 [Note] WSREP: wsrep_sync_wait: thd->variables.wsrep_sync_wait = 15, mask = 1 /home/jan/mysql/10.1-galera/sql/mysqld(my_print_stacktrace+0x38)[0x55b9c460c409] mysys/stacktrace.c:267(my_print_stacktrace)[0x55b9c3fa7e39] /lib/x86_64-linux-gnu/libpthread.so.0(+0x128e0)[0x7f86da3c68e0] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7f86d8816f3b] linux/raise.c:51(__GI_raise)[0x7f86d88182f1] /home/jan/mysql/10.1-galera/sql/mysqld(+0xbe83a9)[0x55b9c44813a9] trx/trx0sys.cc:398(trx_sys_update_wsrep_checkpoint(xid_t const*, unsigned char*, mtr_t*, trx_t const*))[0x55b9c4488a45] trx/trx0trx.cc:1243(trx_write_serialisation_history(trx_t*, mtr_t*))[0x55b9c4489a64] trx/trx0trx.cc:1639(trx_commit_low(trx_t*, mtr_t*))[0x55b9c4489b29] trx/trx0trx.cc:1668(trx_commit(trx_t*))[0x55b9c448a4e4] trx/trx0trx.cc:1932(trx_commit_for_mysql(trx_t*))[0x55b9c42e4d0d] handler/ha_innodb.cc:4704(innobase_commit_low(trx_t*))[0x55b9c42e4feb] handler/ha_innodb.cc:4846(innobase_commit_ordered_2(trx_t*, THD*))[0x55b9c42e5191] handler/ha_innodb.cc:4906(innobase_commit_ordered(handlerton*, THD*, bool))[0x55b9c408e1b4] sql/log.cc:8794(TC_LOG::run_commit_ordered(THD*, bool))[0x55b9c408c369] sql/log.cc:8024(MYSQL_BIN_LOG::trx_group_commit_leader(MYSQL_BIN_LOG::group_commit_entry*))[0x55b9c408b345] sql/log.cc:7616(MYSQL_BIN_LOG::write_transaction_to_binlog_events(MYSQL_BIN_LOG::group_commit_entry*))[0x55b9c408aa58] sql/log.cc:7290(MYSQL_BIN_LOG::write_transaction_to_binlog(THD*, binlog_cache_mngr*, Log_event*, bool, bool, bool))[0x55b9c407ca29] sql/log.cc:1751(binlog_flush_cache(THD*, binlog_cache_mngr*, Log_event*, bool, bool, bool))[0x55b9c407cee8] sql/log.cc:1859(binlog_commit_flush_xid_caches(THD*, binlog_cache_mngr*, bool, unsigned long long))[0x55b9c40902c6] sql/log.cc:9576(MYSQL_BIN_LOG::log_and_order(THD*, unsigned long long, bool, bool, bool))[0x55b9c3fab474] sql/handler.cc:1463(ha_commit_trans(THD*, bool))[0x55b9c3eb493b] sql/transaction.cc:235(trans_commit(THD*))[0x55b9c40a6ea5] sql/log_event.cc:7675(Xid_log_event::do_apply_event(rpl_group_info*))[0x55b9c3cd4f3f] sql/log_event.h:1343(Log_event::apply_event(rpl_group_info*))[0x55b9c3cca7d9] sql/slave.cc:3479(apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int))[0x55b9c3ccac62] sql/slave.cc:3601(apply_event_and_update_pos(Log_event*, THD*, rpl_group_info*))[0x55b9c3ccb466] sql/slave.cc:3885(exec_relay_log_event(THD*, Relay_log_info*, rpl_group_info*))[0x55b9c3cce6bb] sql/slave.cc:4981(handle_slave_sql)[0x55b9c42cdace] nptl/pthread_create.c:463(start_thread)[0x7f86da3bbf2a] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f86d88d8edf] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (0x0): Connection ID (thread ID): 9 Status: NOT_KILLED Patch for instrumented code attached on this ticket. We could just ignore these same xid_seqno writes on InnoDB, but in my opinion it is not correct way. How to repeat: ./mtr galera_as_slave_replication_bundle,xtradb --enable-disabled --mysqld=--wsrep-debug=ON

Jan Lindström (Inactive) added a comment - 2018-10-02 10:23 - edited

seppo I do not know how to proceed from this state. To me it looks like there is two participants on xa group commit on the same node.

Jan Lindström (Inactive) added a comment - 2018-10-02 10:23 - edited seppo I do not know how to proceed from this state. To me it looks like there is two participants on xa group commit on the same node.

MariaDB Server

Test case galera_as_slave_replication_bundle caused debug assertion

Details

Description

Attachments

Attachments

Activity

People

Dates

Git Integration