[MDEV-26177] WSREP assertion failure in bf_abort when replicating certain DDL against in-memory tables

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 10.5.11
Fix Version/s: N/A
Component/s: Galera
Labels: None
Environment: CentOS7 / RH7

Description

The bug is triggered by truncating ENGINE=MEMORY tables. The TRUNCATE TABLE statement is (erroneously?) replicated and fails to apply because in-memory tables are node-local. The transaction is then brute-force aborted, the “local or streaming tx” assertion fails in WSREP, and the node crashes. Because the same transaction is replicated again and again, the repeated node crashes may eventually lock up the whole cluster. Perhaps mode == local should evaluate to true in this case? The crash produces an error similar to the following:

2021-07-17 10:26:43 10 [ERROR] Slave SQL: Error 'Table 'lsa.ttm$etl_dimension_discovery_etl06_lsa_001' doesn't exist' on query. Default database: 'lsa'. Query: '/* DATABASE_EXECUTE_DDL*/ TRUNCATE TABLE `ttm$etl_dimension_discovery_etl06_lsa_001` /* PROCESS(LSA [ETLs] <lsa@etl06>, metadata_background|abandoned[1h]|time_to_live[4h]|maximum_reuse[15m])*/', Internal MariaDB error code: 1146
2021-07-17 10:26:43 10 [Warning] WSREP: Ignoring error 'Table 'lsa.ttm$etl_dimension_discovery_etl06_lsa_001' doesn't exist' on query. Default database: 'lsa'. Query: '/* DATABASE_EXECUTE_DDL*/ TRUNCATE TABLE `ttm$etl_dimension_discovery_etl06_lsa_001` /* PROCESS(LSA [ETLs] <lsa@etl06>, metadata_background|abandoned[1h]|time_to_live[4h]|maximum_reuse[15m])*/', Error_code: 1146
mariadbd: /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.11/wsrep-lib/include/wsrep/client_state.hpp:668: int wsrep::client_state::bf_abort(wsrep::seqno): Assertion `mode_ == m_local || transaction_.is_streaming()' failed.

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.5.11-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=165
max_threads=3010
thread_count=182
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6732055 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
2021-07-17 10:27:18 12 [ERROR] Slave SQL: Could not execute Update_rows_v1 event on table lsa.configuration_group; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log FIRST, end_log_pos 713, Internal MariaDB error code: 1213
??:0(my_print_stacktrace)[0x55c7d751179e]
??:0(handle_fatal_signal)[0x55c7d6f16457]
sigaction.c:0(__restore_rt)[0x7f6ad94c1630]
:0(__GI_raise)[0x7f6ad890c387]
:0(__GI_abort)[0x7f6ad890da78]
:0(__assert_fail_base)[0x7f6ad89051a6]
:0(_GI__assert_fail)[0x7f6ad8905252]
??:0(wsrep_bf_abort(THD const*, THD*))[0x55c7d71d9dc6]
??:0(wsrep_thd_bf_abort)[0x55c7d71e000f]
??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55c7d72076cd]
??:0(handle_manager)[0x55c7d6d05fde]
??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x55c7d716356d]
pthread_create.c:0(start_thread)[0x7f6ad94b9ea5]
??:0(__clone)[0x7f6ad89d49fd]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /data/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             127953               127953               processes
Max open files            32768                32768                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       127953               127953               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: core
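
The failure signature in the log above is an apply error 1146 on a replicated TRUNCATE TABLE followed by the bf_abort assertion. To check whether other nodes hit the same sequence, the error log can be scanned for it. What follows is a minimal Python sketch, not a diagnostic tool: the function name find_truncate_bf_abort is hypothetical, the patterns are derived only from the two log lines quoted in this report, and a match shows that the two messages occur in order, not that one caused the other.

```python
import re

# Patterns derived from the log lines quoted in this report (assumption:
# other affected nodes log the same wording).
APPLY_ERROR = re.compile(
    r"\[ERROR\] Slave SQL: Error 'Table '(?P<table>[^']+)' doesn't exist' on query\..*"
    r"TRUNCATE TABLE"
)
ASSERTION = re.compile(r"bf_abort\(wsrep::seqno\): Assertion .* failed\.")


def find_truncate_bf_abort(lines):
    """Return tables whose failed replicated TRUNCATE precedes a bf_abort assertion."""
    pending = []  # tables from apply errors seen since the last assertion
    hits = []
    for line in lines:
        match = APPLY_ERROR.search(line)
        if match:
            pending.append(match.group("table"))
        elif pending and ASSERTION.search(line):
            # The assertion fired after one or more failed TRUNCATE applies.
            hits.extend(pending)
            pending = []
    return hits
```

It could be run against an error log with something like `find_truncate_bf_abort(open(path, errors="replace"))`, where `path` is the node's error log location (site-specific, not given in this report).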

People

Assignee: Jan Lindström (Inactive)

Reporter: Kent Hoover

Votes: 0

Watchers: 3

Dates

Created: 2021-07-19 13:58

Updated: 2022-10-03 07:54

Resolved: 2022-10-03 07:54
