[MDEV-21026] Galera: Assertion `xid_seqno > wsrep_seqno' failed in trx_rseg_update_wsrep_checkpoint | Replicating servers 10.2 & 10.3, attempt to use RQG Created: 2019-11-11  Updated: 2021-01-14  Resolved: 2021-01-14

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.2.30, 10.3.20
Fix Version/s: 10.2.37, 10.3.28

Type: Bug Priority: Major
Reporter: Stepan Patryshev (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: galera
Environment:

OS: CentOS Linux release 7.6.1810 (Core).


Attachments: Zip Archive 191112_10.2.zip     File galera_stress.yy     File galera_stress.zz     File mysql0.err     File mysql1.err     File mysql2.err     File queries0.zip.001     File queries0.zip.002     File rqg_191107_10.3_vol.zip.001     File rqg_191107_10.3_vol.zip.002     File rqg_191107_10.3_vol.zip.003    
Issue Links:
Relates
relates to MDEV-21025 Galera: Server 10.4 crashes with sign... Closed

 Description   

MariaDB Version 10.3.20-MariaDB-debug: Repository: MariaDB/server; branch 10.3; Revision 352e7667877ec6603aff5234c003ef8c11885cd8; debug built from sources.

Galera lib 25.3.28: Repository: MariaDB/galera; branch mariadb-3.x; Revision 792d6990b65259d12327c211e6658cf1c0c818cfb; debug built from sources.

RQG: Repository: MariaDB/randgen; master branch; Revision bea5b6b07d08ba6349d8a6ff4356b01678822727.

Run:

perl ./runall-new.pl --grammar=conf/galera/galera_stress.yy --gendata=conf/galera/galera_stress.zz --duration=4000 --queries=200M --threads=1 --galera=mss --basedir=/home/stepan/mariadb/10.3 --vardir=/home/stepan/rqg/var --sqltrace=MarkErrors "--mysqld=--wsrep-provider=/usr/lib/libgalera_smm_3.so" "--mysqld=--wsrep_sst_method=rsync" "--mysqld=--core" "--mysqld=--general-log" "--mysqld=--general-log-file=queries.log" "--mysqld=--log-output=file" "--mysqld=--wsrep-debug=server" "--mysqld=--wsrep-sync-wait=15" "--mysqld=--wsrep_retry_autocommit=0" "--mysqld=--wsrep_log_conflicts=1" "--mysqld=--wsrep_on=ON"

Server 1 log:
191107 22:24:56 [ERROR] mysqld got signal 6 ;

2019-11-07 22:24:56 2 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table10000_innodb_int; Duplicate entry '0' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 8432, Internal MariaDB error code: 1062
2019-11-07 22:24:56 2 [Warning] WSREP: RBR event 64 Write_rows_v1 apply warning: 121, 132
2019-11-07 22:24:56 2 [Warning] WSREP: Failed to apply app buffer: seqno: 132, status: 1
	 at galera/src/trx_handle.cpp:apply():353
	 at galera/src/replicator_smm.cpp:apply_trx_ws():34
Retrying 2th time
2019-11-07 22:24:56 2 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table10000_innodb_int; Duplicate entry '0' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 8432, Internal MariaDB error code: 1062
2019-11-07 22:24:56 2 [Warning] WSREP: RBR event 64 Write_rows_v1 apply warning: 121, 132
mysqld: /home/stepan/mariadb/10.3/storage/innobase/trx/trx0rseg.cc:92: void trx_rseg_update_wsrep_checkpoint(trx_rsegf_t*, const XID*, mtr_t*): Assertion `xid_seqno > wsrep_seqno' failed.
191107 22:24:56 [ERROR] mysqld got signal 6 ;
 
Server version: 10.3.20-MariaDB-debug-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=9
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467507 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7fd3b0000af0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fd3d9398da8 thread_stack 0x49000
/home/stepan/mariadb/10.3/sql/mysqld(my_print_stacktrace+0x40)[0x559ff274ac7e]
mysys/stacktrace.c:269(my_print_stacktrace)[0x559ff1edc9df]
sigaction.c:0(__restore_rt)[0x7fd3d90fa5d0]
:0(__GI_raise)[0x7fd3d73e8207]
:0(__GI_abort)[0x7fd3d73e98f8]
:0(__assert_fail_base)[0x7fd3d73e1026]
:0(__GI___assert_fail)[0x7fd3d73e10d2]
trx/trx0rseg.cc:92(trx_rseg_update_wsrep_checkpoint(unsigned char*, xid_t const*, mtr_t*))[0x559ff2373287]
trx/trx0purge.cc:284(trx_purge_add_undo_to_history(trx_t const*, trx_undo_t*&, mtr_t*))[0x559ff2351eb0]
trx/trx0trx.cc:1133(trx_write_serialisation_history(trx_t*, mtr_t*))[0x559ff237f0cc]
trx/trx0trx.cc:1562(trx_commit_low(trx_t*, mtr_t*))[0x559ff23810ca]
trx/trx0trx.cc:1610(trx_commit(trx_t*))[0x559ff2381224]
trx/trx0roll.cc:66(trx_rollback_finish(trx_t*))[0x559ff236c09b]
trx/trx0roll.cc:149(trx_rollback_to_savepoint_low(trx_t*, trx_savept_t*))[0x559ff236c89d]
trx/trx0roll.cc:211(trx_rollback_for_mysql_low(trx_t*))[0x559ff236cd3a]
trx/trx0roll.cc:235(trx_rollback_for_mysql(trx_t*))[0x559ff236d1eb]
handler/ha_innodb.cc:4724(innobase_rollback(handlerton*, THD*, bool))[0x559ff210b3ce]
sql/handler.cc:1748(ha_rollback_trans(THD*, bool))[0x559ff1ee152f]
sql/transaction.cc:408(trans_rollback(THD*))[0x559ff1d452d8]
sql/wsrep_applier.cc:342(wsrep_rollback(THD*))[0x559ff1e00de5]
sql/wsrep_applier.cc:370(wsrep_commit_cb)[0x559ff1e00edf]
src/replicator_smm.cpp:58(apply_trx_ws(void*, wsrep_cb_status (*)(void*, void const*, unsigned long, unsigned int, wsrep_trx_meta const*), wsrep_cb_status (*)(void*, unsigned int, wsrep_trx_meta const*, bool*, bool), galera::TrxHandle const&, wsrep_trx_meta const&))[0x7fd3d528d0e0]
src/replicator_smm.cpp:450(galera::ReplicatorSMM::apply_trx(void*, galera::TrxHandle*))[0x7fd3d528f7c1]
src/replicator_smm.cpp:1273(galera::ReplicatorSMM::process_trx(void*, galera::TrxHandle*))[0x7fd3d529200e]
src/gcs_action_source.cpp:116(galera::GcsActionSource::dispatch(void*, gcs_action const&, bool&))[0x7fd3d5277efd]
src/gcs_action_source.cpp:180(galera::GcsActionSource::process(void*, bool&))[0x7fd3d5278369]
src/replicator_smm.cpp:362(galera::ReplicatorSMM::async_recv(void*))[0x7fd3d52926a4]
sql/wsrep_thd.cc:363(wsrep_replication_process(THD*))[0x559ff1e025e9]
sql/wsrep_mysqld.cc:2117(start_wsrep_THD)[0x559ff1df293b]
pthread_create.c:0(start_thread)[0x7fd3d90f2dd5]
/lib64/libc.so.6(clone+0x6d)[0x7fd3d74afead]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fd3c82fe510): INSERT IGNORE INTO `table10000_innodb_int` ( `col_int_key`, `col_char_12` ) VALUES ( 'b', 'w' ), ( 5, 'b' ) /* QNO 1296 CON_ID 15 */
Connection ID (thread ID): 2
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /home/stepan/rqg/var/node1/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             4096                 23005                processes 
Max open files            1024                 4096                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       23005                23005                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h

Server 2 log has the same errors.

Server 0 log.

queries0.log: queries0.zip.001 + queries0.zip.002.

RQG output: rqg_191107_10.3_vol.zip.001 + rqg_191107_10.3_vol.zip.002 + rqg_191107_10.3_vol.zip.003.



 Comments   
Comment by Stepan Patryshev (Inactive) [ 2019-11-12 ]

The same issue is reproduced on 10.2:
MariaDB Version 10.2.30-MariaDB-debug: Repository: MariaDB/server; branch 10.2; Revision 40e65e878eac4d88ec124610e5ede5132b3c4cf2; debug built from sources.

Servers logs.

Comment by Stepan Patryshev (Inactive) [ 2019-11-27 ]

I have attached appropriate xx/yy files.

Comment by Stepan Patryshev (Inactive) [ 2019-11-28 ]

The same issue is reproduced on 10.2 ES:
MariaDB Version 10.2.29-4-MariaDB-debug: Repository: mariadb-corporation/MariaDBEnterprise; branch 10.2-enterprise; Revision 6b01d80ff85c354105cbf869d218b80bcac5beb2.

Comment by Roel Van de Paar [ 2020-04-18 ]

Maybe related to MDEV-22216?

Comment by Jan Lindström (Inactive) [ 2021-01-14 ]

I could not reproduce with my external SSD using USB-C. In my understanding this is fixed in MDEV-24327.

Generated at Thu Feb 08 09:04:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.