[MDEV-24322] Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_P REFIX/mariadb-10.5.6/storage/innobase/trx/trx0trx.cc line 1329 Created: 2020-12-01  Updated: 2023-08-17  Resolved: 2021-11-23

Status: Closed
Project: MariaDB Server
Component/s: Galera, Platform RedHat, Storage Engine - InnoDB
Affects Version/s: 10.5.6, 10.4.21
Fix Version/s: 10.4.22, 10.5.13, 10.6.5

Type: Bug Priority: Critical
Reporter: Martin Nix Assignee: Jan Lindström (Inactive)
Resolution: Duplicate Votes: 3
Labels: crash, innodb
Environment:

Centos 7.8.2003, 32 vCore, 256GB RAM


Attachments: Text File gdb-core.11003.txt     Text File mariadb_srv1.log     Text File mariadb_srv2.log     Text File mariadb_srv3.log    
Issue Links:
Duplicate
duplicates MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lo... Open
duplicates MDEV-24978 SIGABRT in __libc_message Closed
is duplicated by MDEV-24035 Failing assertion: UT_LIST_GET_LEN(lo... Open
is duplicated by MDEV-25163 Rowid filter does not process storage... Closed
Relates
relates to MDEV-24581 MDL race condition in trans_rollback_... Closed

 Description   

See below error dump & attached core dump, happens sporadically resulting in dropping of all connections and evacuation of cache taking around 20 seconds to recover. In each case it is a random simple select that causes the issue. All tables are innodb and have been mysqlcheck'd with no issue found. Issue did not occur under 10.4 but coincided with move to 10.5. System is not under excessive load at the time the problems manifests. Problem also was manifest in 10.5.5

2020-12-01 11:11:36 0x7f474137d700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.6/storage/innobase/trx/trx0trx.cc line 1329
InnoDB: Failing assertion: UT_LIST_GET_LEN(lock.trx_locks) == 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
201201 11:11:36 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.5.6-MariaDB-log
key_buffer_size=67108864
read_buffer_size=1048576
max_used_connections=624
max_threads=1252
thread_count=462
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2660907 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f44840b6ef8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f474137cc90 thread_stack 0x49000
Can't start addr2line
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x5581d45b00ae]
/usr/sbin/mariadbd(handle_fatal_signal+0x307)[0x5581d3fb15d7]
/lib64/libpthread.so.0(+0xf630)[0x7f7f13604630]
/lib64/libc.so.6(gsignal+0x37)[0x7f7f1165e387]
/lib64/libc.so.6(abort+0x148)[0x7f7f1165fa78]
/usr/sbin/mariadbd(+0x64c5b7)[0x5581d3c8f5b7]
/usr/sbin/mariadbd(+0xdaa307)[0x5581d43ed307]
/usr/sbin/mariadbd(+0xdaa433)[0x5581d43ed433]
/usr/sbin/mariadbd(+0xdaa714)[0x5581d43ed714]
/usr/sbin/mariadbd(+0xc4c83c)[0x5581d428f83c]
/usr/sbin/mariadbd(+0xc57893)[0x5581d429a893]
/usr/sbin/mariadbd(_ZN7handler16ha_external_lockEP3THDi+0x261)[0x5581d3fbd651]
/usr/sbin/mariadbd(+0xa6ae4a)[0x5581d40ade4a]
/usr/sbin/mariadbd(_Z24mysql_unlock_read_tablesP3THDP13st_mysql_lock+0x72)[0x5581d40ae6d2]
/usr/sbin/mariadbd(_ZN4JOIN9join_freeEv+0x169)[0x5581d3debc89]
/usr/sbin/mariadbd(_ZN4JOIN10exec_innerEv+0xc6c)[0x5581d3e1052c]
/usr/sbin/mariadbd(_ZN4JOIN4execEv+0x33)[0x5581d3e107e3]
/usr/sbin/mariadbd(_Z12mysql_selectP3THDP10TABLE_LISTR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x186)[0x5581d3e0e9a6]
/usr/sbin/mariadbd(_Z13handle_selectP3THDP3LEXP13select_resultm+0x1cc)[0x5581d3e0f50c]
/usr/sbin/mariadbd(+0x63340a)[0x5581d3c7640a]
/usr/sbin/mariadbd(_Z21mysql_execute_commandP3THD+0x37f5)[0x5581d3db55d5]
/usr/sbin/mariadbd(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x36a)[0x5581d3db84da]
/usr/sbin/mariadbd(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1bbd)[0x5581d3dba84d]
/usr/sbin/mariadbd(_Z10do_commandP3THD+0x10b)[0x5581d3dbbfeb]
/usr/sbin/mariadbd(_Z24do_handle_one_connectionP7CONNECTb+0x1b2)[0x5581d3ea4b02]
/usr/sbin/mariadbd(handle_one_connection+0x34)[0x5581d3ea4db4]
/usr/sbin/mariadbd(+0xbb627d)[0x5581d41f927d]
/lib64/libpthread.so.0(+0x7ea5)[0x7f7f135fcea5]
/lib64/libc.so.6(clone+0x6d)[0x7f7f117268dd]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f44840e5d90): SELECT * FROM `lookup_permissions_grid` WHERE  `application` IN ('guest')
Connection ID (thread ID): 8019424
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /mnt/db/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        unlimited            unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             1030944              1030944              processes
Max open files            16384                16384                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       1030944              1030944              signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: core
 
2020-12-01 11:12:06 0 [Note] mariadbd: Aria engine: starting recovery
recovered pages: 0% 16% 37% 59% 81% 94% 100% (0.0 seconds); tables to flush: 2 1 0
 (2.5 seconds);
2020-12-01 11:12:08 0 [Note] mariadbd: Aria engine: recovery done
2020-12-01 11:12:08 0 [Warning] The parameter innodb_buffer_pool_instances is deprecated and has no effect.
2020-12-01 11:12:08 0 [Note] InnoDB: Using Linux native AIO
2020-12-01 11:12:08 0 [Note] InnoDB: Uses event mutexes
2020-12-01 11:12:08 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2020-12-01 11:12:08 0 [Note] InnoDB: Number of pools: 1
2020-12-01 11:12:08 0 [Note] InnoDB: Using SSE4.2 crc32 instructions
2020-12-01 11:12:08 0 [Note] InnoDB: Initializing buffer pool, total size = 235149459456, chunk size = 134217728
2020-12-01 11:12:15 0 [Note] InnoDB: Completed initialization of buffer pool
2020-12-01 11:12:15 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-12-01 11:12:15 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=41701133623035
2020-12-01 11:12:16 0 [Note] InnoDB: 13 transaction(s) which must be rolled back or cleaned up in total 1710 row operations to undo
2020-12-01 11:12:16 0 [Note] InnoDB: Trx id counter is 128600878351
2020-12-01 11:12:16 0 [Note] InnoDB: Starting final batch to recover 6393 pages from redo log.
2020-12-01 11:12:17 0 [Note] InnoDB: Last binlog file '/mnt/db/mysqllogs/bin-log.016476', position 903131623
2020-12-01 11:12:17 0 [Note] InnoDB: 128 rollback segments are active.
2020-12-01 11:12:17 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
2020-12-01 11:12:17 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2020-12-01 11:12:17 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2020-12-01 11:12:17 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2020-12-01 11:12:17 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2020-12-01 11:12:17 0 [Note] InnoDB: 10.5.6 started; log sequence number 41701133722664; transaction id 128600878353
2020-12-01 11:12:17 0 [Note] InnoDB: Loading buffer pool(s) from /mnt/db/mysql/ib_buffer_pool
2020-12-01 11:12:17 0 [Note] Plugin 'FEEDBACK' is disabled.
2020-12-01 11:12:17 0 [Note] Recovering after a crash using /mnt/db/mysqllogs/bin-log
2020-12-01 11:12:17 0 [Note] InnoDB: Rolled back recovered transaction 128600878319
2020-12-01 11:12:17 0 [Note] InnoDB: Rolled back recovered transaction 128600873770
2020-12-01 11:12:17 0 [Note] InnoDB: Rolled back recovered transaction 128600878248
2020-12-01 11:12:17 0 [Note] InnoDB: Rolled back recovered transaction 128600878283
2020-12-01 11:12:17 0 [Note] InnoDB: Rolled back recovered transaction 128600878247
2020-12-01 11:12:17 0 [Note] InnoDB: Rolled back recovered transaction 128600874989
2020-12-01 11:12:17 0 [Note] InnoDB: Rolled back recovered transaction 128600876172
2020-12-01 11:12:17 0 [Note] InnoDB: Rolled back recovered transaction 128600863931
2020-12-01 11:12:17 0 [Note] InnoDB: Rolled back recovered transaction 128600878174
2020-12-01 11:12:17 0 [Note] InnoDB: Rolled back recovered transaction 128600878294
2020-12-01 11:12:17 0 [Note] InnoDB: Rolled back recovered transaction 128600878322
2020-12-01 11:12:18 0 [Note] InnoDB: Rolled back recovered transaction 128600877747
2020-12-01 11:12:18 0 [Note] InnoDB: Rolled back recovered transaction 128600878337
2020-12-01 11:12:18 0 [Note] InnoDB: Rollback of non-prepared transactions completed
2020-12-01 11:12:20 0 [Note] Starting crash recovery...
2020-12-01 11:12:20 0 [Note] Crash recovery finished.
2020-12-01 11:12:20 0 [Note] Server socket created on IP: '::'.
2020-12-01 11:12:20 0 [Note] Reading of all Master_info entries succeeded
2020-12-01 11:12:20 0 [Note] Added new Master_info '' to hash table
2020-12-01 11:12:20 0 [Note] /usr/sbin/mariadbd: ready for connections.
Version: '10.5.6-MariaDB-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
2020-12-01 11:12:20 131 [Warning] IP address '10.0.1.58' could not be resolved: Name or service not known
2020-12-01 11:12:22 715 [Warning] IP address '10.0.1.56' could not be resolved: Name or service not known
2020-12-01 11:13:00 23979 [Note] Start binlog_dump to slave_server(2), pos(bin-log.016476, 903133807), using_gtid(1), gtid('1-1-15145546826')
2020-12-01 11:13:00 23982 [Note] Start binlog_dump to slave_server(11), pos(bin-log.016476, 903133807), using_gtid(1), gtid('1-1-15145546826,2-1-3532097557')

mysqld --print-defaults
mysqld would have been started with the following arguments:
--datadir=/mnt/db/mysql --tmpdir=/var/lib/mysqltmp --sql-mode=STRICT_TRANS_TABLES,NO_ENGINE_SUBSTITUTION --userstat=1 --log_warnings=2 --thread_handling=one-thread-per-connection --thread-cache-size=16 --table-open-cache=4096 --table-definition-cache=2048 --query-cache-type=0 --sort-buffer-size=1M --read-buffer-size=1M --read-rnd-buffer-size=1M --join-buffer-size=1M --tmp-table-size=128M --max-heap-table-size=128M --back-log=100 --max-connections=1250 --max-connect-errors=10000 --max-allowed-packet=16M --interactive-timeout=28800 --wait-timeout=28800 --default-storage-engine=InnoDB --innodb=FORCE --key-buffer-size=64M --myisam-sort-buffer-size=128M --innodb-buffer-pool-size=224256M --innodb-log-file-size=100M --innodb-file-per-table=1 --innodb-open-files=4096 --innodb_flush_log_at_trx_commit=0 --innodb_buffer_pool_instances=16 --innodb_flush_method=O_DIRECT --log-output=FILE --slow-query-log=1 --slow-query-log-file=/mnt/db/mysql_slow.log --long-query-time=1 --character-set-server=utf8 --group_concat_max_len=40960 --log-error=/mnt/db/mysqld.log --open-files-limit=65535 --server-id=1 --gtid-domain-id=1 --log-bin=/mnt/db/mysqllogs/bin-log --relay-log-space-limit=16G --expire-logs-days=2 --sync-binlog=0 --core_file --innodb_io_capacity=2000 --innodb_thread_concurrency=0



 Comments   
Comment by Marko Mäkelä [ 2020-12-08 ]

marnixgb, I think that this depends on the table definitions (SHOW CREATE TABLE) and the current statement.

In gdb-core.11003.txt I can see that the assertion fails in an auto-commit transaction. Here is the upper part of the call stack:

mariadb-10.5.6

#10 0x00005581d428f83c in innobase_commit (hton=<optimized out>, thd=0x7f44840b6ef8, commit_trx=<optimized out>) at /usr/src/debug/MariaDB-10.5.6/src_0/storage/innobase/handler/ha_innodb.cc:4036
No locals.
#11 0x00005581d429a893 in ha_innobase::external_lock (this=0x7f45b0019cb0, thd=0x7f44840b6ef8, lock_type=2) at /usr/src/debug/MariaDB-10.5.6/src_0/storage/innobase/handler/ha_innodb.cc:15821
No locals.
#12 0x00005581d3fbd651 in handler::ha_external_lock (this=0x7f45b0019cb0, thd=thd@entry=0x7f44840b6ef8, lock_type=lock_type@entry=2) at /usr/src/debug/MariaDB-10.5.6/src_0/sql/handler.cc:6660
No locals.
#13 0x00005581d40ade4a in ha_external_unlock (thd=0x7f44840b6ef8, this=<optimized out>) at /usr/src/debug/MariaDB-10.5.6/src_0/sql/handler.h:3411
No locals.
#14 unlock_external (thd=0x7f44840b6ef8, table=0x7f44840e7928, count=<optimized out>) at /usr/src/debug/MariaDB-10.5.6/src_0/sql/lock.cc:727
        error = <optimized out>
        error_code = <optimized out>
#15 0x00005581d40ae6d2 in mysql_unlock_read_tables (thd=<optimized out>, sql_lock=0x7f44840e7908) at /usr/src/debug/MariaDB-10.5.6/src_0/sql/lock.cc:494
        i = <optimized out>
        found = 0
        error = 0
        table = <optimized out>
        lock = <optimized out>
#16 0x00005581d3debc89 in JOIN::join_free (this=0x7f44840e7958) at /usr/src/debug/MariaDB-10.5.6/src_0/sql/sql_select.cc:13799
No locals.
#17 0x00005581d3e1052c in JOIN::exec_inner (this=this@entry=0x7f44840e7958) at /usr/src/debug/MariaDB-10.5.6/src_0/sql/sql_select.cc:20222
No locals.
#18 0x00005581d3e107e3 in JOIN::exec (this=0x7f44840e7958) at /usr/src/debug/MariaDB-10.5.6/src_0/sql/sql_select.cc:4231
No locals.
#19 0x00005581d3e0e9a6 in mysql_select (thd=0x7f44840b6ef8, tables=0x7f44840e64f0, fields=<optimized out>, conds=0x7f44840e6e00, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7f44840e7930, unit=0x7f44840bad50, select_lex=0x7f44840e5e90) at /usr/src/debug/MariaDB-10.5.6/src_0/sql/sql_select.cc:4655
No locals.
#20 0x00005581d3e0f50c in handle_select (thd=thd@entry=0x7f44840b6ef8, lex=lex@entry=0x7f44840bac88, result=result@entry=0x7f44840e7930, setup_tables_done_option=setup_tables_done_option@entry=0) at /usr/src/debug/MariaDB-10.5.6/src_0/sql/sql_select.cc:429
No locals.
#21 0x00005581d3c7640a in execute_sqlcom_select (thd=thd@entry=0x7f44840b6ef8, all_tables=0x7f44840e64f0) at /usr/src/debug/MariaDB-10.5.6/src_0/sql/sql_parse.cc:6210
        save_protocol = 0x0
        lex = 0x7f44840bac88
        result = 0x7f44840e7930
        res = <optimized out>

Can you please provide more information? You are free to obfuscate the original table and column names and the data, as long as the problem remains reproducible. Ideally, we would like to have a test case in SQL format, starting with CREATE and INSERT.

Comment by Martin Nix [ 2020-12-14 ]

Hello Marko

Excuse my tardy reply, I have been on annual leave for a short time

I wish this was reproducible in the manner that you suggest ! In every case that we have had the problem it is a simple query on well established data, below are the queries where this occurred since the version upgrade that we applied in August

Query (0x7f68180db060): SELECT * FROM `pm_users` WHERE `pm_user_id` IN ('127581115')
Query (0x7f71c0004a70): SELECT * FROM `lookup_permissions_grid` WHERE `application` IN ('feed')
Query (0x7f8e90039240): SELECT * FROM `pm_users` WHERE `pm_user_id` IN ('127129513')
Query (0x7f43a0074710): SELECT * FROM `lookup_permissions_grid` WHERE `application` IN ('guest')
Query (0x7f2fe40525c0): SELECT * FROM `pm_users` WHERE `pm_user_id` IN ('126874727')
Query (0x7faedc010500): SELECT * FROM `site_cashless_providers` WHERE `username` IN ('xyz')
Query (0x7f44840e5d90): SELECT * FROM `lookup_permissions_grid` WHERE `application` IN ('guest')
Query (0x7f66e0047670): SELECT * FROM `pm_users` WHERE `pm_user_id` IN ('128661405')
Query (0x7f76380c3d10): SELECT * FROM `pm_users` WHERE `pm_user_id` IN ('127768191')

The assertion failures that relate to the above queries are :

2020-09-21 09:02:08 0x7f6804e8f700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.5/storage/innobase/trx/trx0trx.cc line 1329
2020-10-06 15:17:28 0x7f71e723a700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.5/storage/innobase/trx/trx0trx.cc line 1329
2020-10-13 10:12:44 0x7f8ef4e1c700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.5/storage/innobase/trx/trx0trx.cc line 1329
2020-10-21 14:47:03 0x7f45b0c42700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.5/storage/innobase/trx/trx0trx.cc line 1329
2020-11-04 14:24:33 0x7f319eee9700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.6/storage/innobase/trx/trx0trx.cc line 1329
2020-11-29 15:56:44 0x7f69d1b24700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.6/storage/innobase/lock/lock0lock.cc line 6760
2020-11-29 15:56:45 0x7f655bb8f700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.6/storage/innobase/trx/trx0trx.cc line 1329
2020-11-29 16:00:03 0x7fb24c429700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.6/storage/innobase/trx/trx0trx.cc line 1329
2020-12-01 11:11:36 0x7f474137d700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.6/storage/innobase/trx/trx0trx.cc line 1329
2020-12-08 16:33:59 0x7f6a9ee53700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.6/storage/innobase/trx/trx0trx.cc line 1329
2020-12-14 11:08:15 0x7f781496c700 InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.5.6/storage/innobase/trx/trx0trx.cc line 1329

In the particular example you have the dump for the query is for :

Query (0x7f44840e5d90): SELECT * FROM `lookup_permissions_grid` WHERE `application` IN ('guest')

To give you some background below are the tables/row counts involved :

pm_users / 2068318
lookup_permissions_grid / 57
site_cashless_providers / 410

Checking back on each of those selects these were all records that were being read that have been established for months - they have not been updated in a long time

The only other hint I can give is that these tables are all very central - they are referenced a great deal, especially reads (as already indicated) - how can I help further ?

Thanks

Martin

Comment by Martin Nix [ 2020-12-15 ]

Sorry - here is the create table :

CREATE TABLE `lookup_permissions_grid` (
`permission_id` int(10) NOT NULL AUTO_INCREMENT,
`application` varchar(40) NOT NULL,
`name` varchar(100) NOT NULL,
`enabled_default` tinyint(1) NOT NULL,
`title` varchar(255) DEFAULT NULL,
`description` text DEFAULT NULL,
`index` int(11) DEFAULT NULL,
PRIMARY KEY (`permission_id`),
KEY `search1` (`application`)
) ENGINE=InnoDB AUTO_INCREMENT=121 DEFAULT CHARSET=utf8

This table contains static reference data that is only changed around 2-3 times a year but as previously stated is read from (a lot) with very simple queries per the example failure

Query (0x7f44840e5d90): SELECT * FROM `lookup_permissions_grid` WHERE `application` IN ('guest')

Comment by Marko Mäkelä [ 2020-12-15 ]

marnixgb, thank you. Is the table being accessed in the same transaction with something else? Could the transaction isolation level be SERIALIZABLE?

The assertion is claiming that the being-committed transaction does not hold any locks (or that all locks should have been released). Reads normally do not acquire any locks inside InnoDB, not even table locks. The table is protected by a metadata lock (MDL) in the SQL layer.

It could be that the transaction object had been corrupted in some way by an earlier operation. In order to improve performance, the transaction objects are kept in a preallocated memory pool.

I hope that you can troubleshoot this a bit further to provide some more clues, because currently I have no ideas.

Comment by Martin Nix [ 2020-12-16 ]

We run tx_isolation as REPEATABLE-READ everywhere and always have done. Switching to SERIALIZABLE is not an option for us as the impact is not desirable but we do run all updates in transactions (unless trivial)

However I think that is probably a red herring - the selects that we get this error on do not relate to updates for those records (if I understand you correctly you are suggesting that the select is being affected by a lock). In the case of this MDEV report : table lookup_permissions_grid is a completely static reference table (never updated outside of a deployment window therefore never locked)

Just a reminder - this issue did not exist for us before we moved to 10.5.5 (from 10.4.12)

Comment by Steve Hunger [ 2021-03-23 ]

Likely the same issue happened on a 3 Node Galera Cluster with MariaDB 10.4.18 on Debian Buster.
The cluster has moderate load and writes are going to one node only.

mariadb_srv1.log
mariadb_srv2.log
mariadb_srv3.log

The second node was the first one that crashed and the third node followed seconds after that.
The second node was also the primary (writing) node at the time.
The first node in the cluster did survive.

Comment by Marko Mäkelä [ 2021-04-13 ]

marnixgb, thank you. It indeed looks like memory corruption. We have not seen this in internal testing. And as always, with such rarely occurring corruption, it is hard to determine the cause. It could be faulty RAM, or supposedly any non-ECC RAM that is vulnerable to things like the RowHammer attack. Maybe it just corrupts with some probability.

It could also be a bug somewhere in MariaDB server (outside the InnoDB storage engine), such as writing via an invalid pointer, possibly due to using freed memory. If such code is present in a rarely executed error handling code path, it could be missed in our internal testing. It is unlikely, because we test a lot with AddressSanitizer.

s.hunger@artworked.de, were you able to rule out a hardware problem? If not, then your bug could be specific to Galera. There have been problems with lock conflict resolution in Galera. Maybe after the recent fix of MDEV-23328, it is more stable.

Comment by Martin Nix [ 2021-04-14 ]

Thanks for the response Marko - much appreciated

I've been a long time MySql/MariaDB admin and had not come across this issue until moving to 10.5 (from 10.4) but things have settled. I have had to stick at 10.5.6 as recent attempts to go beyond that have shown generally poorer/spikier performance for some reason (not something I have seen before ever)

Comment by Steve Hunger [ 2021-04-14 ]

I can rule out a hardware problem. The cluster has ECC RAM and runs stable on 10.4.15, but crashes regularly with 10.4.18. I've decided to wait for the release of 10.4.19 to upgrade from 10.4.15, because of this issue and MDEV-24954. In the end it was very easy to reproduce this problem just by cleanly restarting one node, which crashed at least one other node. But since this is a live cluster, such tests are currently not possible. I'll give feedback when 10.4.19 is released.

Comment by Marko Mäkelä [ 2021-04-26 ]

s.hunger@artworked.de, if you have a staging environment, maybe it would be possible to compile the MariaDB server (mysqld) from the source code using cmake -DWITH_ASAN=ON to see if it would reveal something under your workload. There are very many changes between 10.4.15 and 10.4.18. It would still appear to me that InnoDB is not the culprit but the victim and messenger of memory corruption that was inflicted by some other subsystem.

For what it is worth, upon reading the 10.4 commit messages since the mariadb-10.4.18 release tag, I encountered the fixes of MDEV-25182 and MDEV-22741. However, I would expect the observed problem to be the something else.

Comment by Marko Mäkelä [ 2021-04-26 ]

s.hunger@artworked.de, you mentioned the magic word "cluster". Are you using Galera replication? Just today, someone reported MDEV-25518, which would seem to be related to MDEV-23328.

However, the changes of MDEV-23328 cannot be the culprit for the crash that was reported by marnixgb already for 10.5.6.

Comment by Marko Mäkelä [ 2021-04-26 ]

If ROLLBACK TO SAVEPOINT is being used by the application, a possible cause of this could be MDEV-24581. But it should not be anything that would have been changed recently.

Comment by Steve Hunger [ 2021-04-26 ]

Thank you for your response Marko. Yes, this is a Galera multi master setup and i also think it's more an issue with Galera then with InnoDB. The application is build on Django which handles nested transactions by using savepoints and, in case of an error, rollbacks. Since i posted the comment on this issue, because of the same assertion error, more reports have been created (MDEV-25296, MDEV-25163). So this may be more related to MDEV-25296 than to this issue.

Comment by Martin Nix [ 2021-10-12 ]

We have been racking our brains / command history to get to an understanding on what may be triggering this - reviewing our config changes over time the only indicator we have come across that could be instrumental in triggering the Assertion failure (and subsequent slave issues) is the method of binary logging we use.

For the time that we have had this issue our binary logging has been ROW based - we have switched back to MIXED and the problem has not re-occurred since (of course this may be a coincidence !)

It seems unlikely to me that this truly is an indicator of the actual problem but as it is the only thing we can correlate I thought it was worth mentioning at least in case others may have come across this too

Comment by Roel Van de Paar [ 2021-10-12 ]

marnixgb Thank you very much. It is often little insights or seemingly unimportant details like these that make the difference between reproducibility or not. I have observed this many times, and it is one of the important reasons we automate many things in testing. So far we have been unable to reproduce the issue.

ramesh what binlog format did you use in your testing?

Comment by Ramesh Sivaraman [ 2021-10-13 ]

Roel I had used binlog format ROW to attempt to reproduce the crash

Comment by Lucas Rodrigues [ 2021-10-30 ]

I don't know if I should use the comment field for it. But, I'm hitting the same issue. I update MariaDB from a ancient "10.3.12-MariaDB" (on CentOS 7.6) to "10.6.4-MariaDB" (on Debian 10). That is causing a lot of issues, sometimes MariaDB takes hours to recover from the crash. The new version can't stay online for 1 day without crashing.

The only "pattern" that I notice is: the query is a prepared-statement (stmt) that is been created using `Prepare` from Golang (https://pkg.go.dev/database/sql#DB.Prepare). The query is been using without transactions, but it's updated inside one transaction, I don't use `savepoint`. I'll change my code to use (and re-use) the same transaction. I'm not sure if it will have any effect.

The situation is:
-
"Thread 1" (not use tx):
SELECT .... FROM table WHERE x = 1

"Thread 2" (select without tx, but update it using tx):
SELECT .... FROM table WHERE x = 1

START TRANSACTION
UPDATE table SET .... WHERE x = 1
(many other queries...)

COMMIT/ROLLBACK

"Thread N" (not use tx):
SELECT .... FROM table WHERE x = 1


Sometimes, it fails at `SELECT .... FROM table WHERE x = 1`, crashing everything and takes hours to recover sometimes.

EDIT: The "binlog_format" is MIXED. I'm not using cluster/replicas.

Comment by Lucas Rodrigues [ 2021-10-31 ]

Before any change, only a single query fails. However, now I'm using transactions on the `SELECT .... FROM table WHERE x = 1`. It's read-only, but even then I'm using `BEGIN ... COMMIT` anyway.
It's fix and not fix the issue... Since them, that query isn't the cause of the crash anymore. However, it doesn't fix the issue, because now it's hitting other query, that also doesn't use transaction.

So, that is really odd. Before just one that query fails. Changing that query to use transaction (even if it's a read-only) seems to fix that case. Now, the error is happening on other queries that also doesn't use transaction. I will change all my code to use transaction/commit even for read-only operations, then I give a update here.

Comment by Roel Van de Paar [ 2021-10-31 ]

Thank you inkeliz for adding this information. Adding comments is fine.

Comment by Lucas Rodrigues [ 2021-11-02 ]

Just a quick update, I'm running for 2 days without any crash, since I change everything to transactions. Now, all queries (even read-only) are using transaction (BEGIN ... COMMIT), it sound useless, but fixes the crash in my situation. For any future readers, give a try by changing all queries to use transaction and also switch "binlog_forma" to MIXED (since was reported before too).

Comment by Martin Nix [ 2021-11-11 ]

Further update - after nearly a whole month of reliable running with the binary logging format set to mixed we are still running reliably - reminder this was the only change we have made to our config

I am now going to tempt fate and switch back to row based

Comment by Jan Lindström (Inactive) [ 2021-11-23 ]

Based on table structure this looks like a duplicate of MDEV-24978 where incorrect handling of utf8 charset foreign keys were first introduced and then fixed.

Comment by Martin Nix [ 2021-11-23 ]

Hi Jan

For extra info - regarding your comment two hours ago - we don't use foreign keys anywhere in our DB (we do use utf8 though)

BR

Martin

Comment by Jan Lindström (Inactive) [ 2021-11-23 ]

marnixgb I see, I do not see how Galera could effect SELECTs as they are not replicated and thus they are just normal transactions. But question is what kind of UPDATE/DELETE/INSERT clauses there is. Do you have primary key on all tables? If not, my pointed out problem could effect you as bug was especially on utf8 case. There does not need to be foreign keys.

Comment by Martin Nix [ 2021-11-23 ]

We are not using Galera and in all cases we have primary keys (with additional indexes) - see example table in comment 2020-12-15 09:43

Thanks as ever

Comment by Lucas Rodrigues [ 2021-11-23 ]

Hi,

I don't think it's related. In my case, I don't use Galera (or any other replication). The query that crashes doesn't use UTF-8 at all. It was fixed by using `BEGIN ... QUERY ... COMMIT`, even for read-only operations. I'm running for 3 weeks without crash.

Comment by VAROQUI Stephane [ 2021-12-29 ]

Hi,

Get similar trace on last 10.3

RECORD LOCKS space id 329029 page no 3 n bits 160 index PRIMARY of table `wpmixrgen`.`msg_conversation_user` trx id 35135650 lock_mode X locks rec but not gap waiting
Record lock, heap no 43 PHYSICAL RECORD: n_fields 18; compact format; info bits 0
 0: len 8; hex 000000000000002a; asc        *;;
 1: len 6; hex 000000000000; asc       ;;
 2: len 7; hex 80000000000000; asc        ;;
 3: len 8; hex 000000000000000d; asc         ;;
 4: len 8; hex 0000000000003649; asc       6I;;
 5: len 8; hex 000000000000030b; asc         ;;
 6: len 8; hex 000000000000002a; asc        *;;
 7: len 1; hex 81; asc  ;;
 8: len 5; hex 99ab6ae70e; asc   j  ;;
 9: len 5; hex 8000000000; asc      ;;
 10: len 5; hex 99ab6c971a; asc   l  ;;
 11: len 5; hex 99ab7aa733; asc   z 3;;
 12: len 5; hex 99ab7aa733; asc   z 3;;
 13: len 5; hex 99ab7aa733; asc   z 3;;
 14: len 8; hex 0000000000000000; asc         ;;
 15: len 5; hex 8000000000; asc      ;;
 16: len 5; hex 8000000000; asc      ;;
 17: len 5; hex 8000000000; asc      ;;
 
2021-12-29 11:29:52 733720 [Note] InnoDB: *** WE ROLL BACK TRANSACTION (1)
 
2021-12-29 11:29:52 0x7f7b05310700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.3.32/storage/innobase/trx/trx0trx.cc line 1301
InnoDB: Failing assertion: UT_LIST_GET_LEN(trx->lock.trx_locks) == 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
211229 11:29:52 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.3.32-MariaDB-1:10.3.32+maria~focal-log
key_buffer_size=213909504
read_buffer_size=262144
max_used_connections=46
max_threads=65546
thread_count=36
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 84130556 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f7a90036368
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f7b0530fcf0 thread_stack 0x49000
mysqld(my_print_stacktrace+0x32)[0x555d2bcf4c32]

table definition

CREATE TABLE `msg_conversation_user` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `id_msg_conversation` bigint(20) unsigned NOT NULL,
  `id_user` bigint(20) unsigned NOT NULL,
  `id_workplace` bigint(20) unsigned NOT NULL,
  `id_msg_conversation_user_name` bigint(20) unsigned NOT NULL,
  `is_conversation_admin` tinyint(4) NOT NULL DEFAULT 0,
  `added_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `renamed_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `admin_role_udpated_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `check_updates_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `received_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `opened_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `id_user_removed_by` bigint(20) unsigned NOT NULL DEFAULT 0,
  `removed_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `hidden_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `muted_at` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  KEY `id_user` (`id_user`),
  KEY `added_at` (`added_at`),
  KEY `received_at` (`received_at`),
  KEY `opened_at` (`opened_at`),
  KEY `removed_at` (`removed_at`),
  KEY `id_msg_conversation` (`id_msg_conversation`),
  KEY `renamed_at` (`renamed_at`),
  KEY `admin_role_udpated_at` (`admin_role_udpated_at`),
  KEY `check_updates_at` (`check_updates_at`),
  KEY `muted_at` (`muted_at`),
  KEY `hidden_at` (`hidden_at`),
  KEY `id_user_removed_by` (`id_user_removed_by`),
  KEY `id_workplace` (`id_workplace`)
) ENGINE=InnoDB AUTO_INCREMENT=96 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci 

This is a dev server upgraded in release form 2 weeks ago and those tables and workload are new code

cat /proc/version
Linux version 5.4.0-91-generic (buildd@lcy01-amd64-017) (gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04)) #102-Ubuntu SMP Fri Nov 5 16:31:28 UTC 2021

config disable adaptive hash and change buffer it run on a pretty unoptimized ZFS on slow sata raid disques Kernel

+---------------------------------------------+-------------------------------------+
| Variable_name                               | Value                               |
+---------------------------------------------+-------------------------------------+
| innodb_adaptive_flushing                    | ON                                  |
| innodb_adaptive_flushing_lwm                | 10.000000                           |
| innodb_adaptive_hash_index                  | OFF                                 |
| innodb_adaptive_hash_index_parts            | 8                                   |
| innodb_adaptive_max_sleep_delay             | 150000                              |
| innodb_autoextend_increment                 | 64                                  |
| innodb_autoinc_lock_mode                    | 2                                   |
| innodb_background_scrub_data_check_interval | 3600                                |
| innodb_background_scrub_data_compressed     | OFF                                 |
| innodb_background_scrub_data_interval       | 604800                              |
| innodb_background_scrub_data_uncompressed   | OFF                                 |
| innodb_buf_dump_status_frequency            | 0                                   |
| innodb_buffer_pool_chunk_size               | 134217728                           |
| innodb_buffer_pool_dump_at_shutdown         | ON                                  |
| innodb_buffer_pool_dump_now                 | OFF                                 |
| innodb_buffer_pool_dump_pct                 | 25                                  |
| innodb_buffer_pool_filename                 | ib_buffer_pool                      |
| innodb_buffer_pool_instances                | 1                                   |
| innodb_buffer_pool_load_abort               | OFF                                 |
| innodb_buffer_pool_load_at_startup          | ON                                  |
| innodb_buffer_pool_load_now                 | OFF                                 |
| innodb_buffer_pool_size                     | 1342177280                          |
| innodb_change_buffer_max_size               | 25                                  |
| innodb_change_buffering                     | none                                |
| innodb_checksum_algorithm                   | none                                |
| innodb_checksums                            | ON                                  |
| innodb_cmp_per_index_enabled                | OFF                                 |
| innodb_commit_concurrency                   | 0                                   |
| innodb_compression_algorithm                | zlib                                |
| innodb_compression_default                  | OFF                                 |
| innodb_compression_failure_threshold_pct    | 5                                   |
| innodb_compression_level                    | 6                                   |
| innodb_compression_pad_pct_max              | 50                                  |
| innodb_concurrency_tickets                  | 5000                                |
| innodb_data_file_path                       | ibdata1:12M:autoextend              |
| innodb_data_home_dir                        | /var/lib/mysql/.system/innodb       |
| innodb_deadlock_detect                      | ON                                  |
| innodb_default_encryption_key_id            | 1                                   |
| innodb_default_row_format                   | dynamic                             |
| innodb_defragment                           | OFF                                 |
| innodb_defragment_fill_factor               | 0.900000                            |
| innodb_defragment_fill_factor_n_recs        | 20                                  |
| innodb_defragment_frequency                 | 40                                  |
| innodb_defragment_n_pages                   | 7                                   |
| innodb_defragment_stats_accuracy            | 0                                   |
| innodb_disable_sort_file_cache              | OFF                                 |
| innodb_disallow_writes                      | OFF                                 |
| innodb_doublewrite                          | OFF                                 |
| innodb_encrypt_log                          | OFF                                 |
| innodb_encrypt_tables                       | OFF                                 |
| innodb_encrypt_temporary_tables             | OFF                                 |
| innodb_encryption_rotate_key_age            | 1                                   |
| innodb_encryption_rotation_iops             | 100                                 |
| innodb_encryption_threads                   | 0                                   |
| innodb_fast_shutdown                        | 1                                   |
| innodb_fatal_semaphore_wait_threshold       | 600                                 |
| innodb_file_format                          |                                     |
| innodb_file_per_table                       | ON                                  |
| innodb_fill_factor                          | 100                                 |
| innodb_flush_log_at_timeout                 | 1                                   |
| innodb_flush_log_at_trx_commit              | 1                                   |
| innodb_flush_method                         | O_DSYNC                             |
| innodb_flush_neighbors                      | 0                                   |
| innodb_flush_sync                           | ON                                  |
| innodb_flushing_avg_loops                   | 30                                  |
| innodb_force_load_corrupted                 | OFF                                 |
| innodb_force_primary_key                    | ON                                  |
| innodb_force_recovery                       | 0                                   |
| innodb_ft_aux_table                         |                                     |
| innodb_ft_cache_size                        | 8000000                             |
| innodb_ft_enable_diag_print                 | OFF                                 |
| innodb_ft_enable_stopword                   | ON                                  |
| innodb_ft_max_token_size                    | 84                                  |
| innodb_ft_min_token_size                    | 3                                   |
| innodb_ft_num_word_optimize                 | 2000                                |
| innodb_ft_result_cache_limit                | 2000000000                          |
| innodb_ft_server_stopword_table             |                                     |
| innodb_ft_sort_pll_degree                   | 2                                   |
| innodb_ft_total_cache_size                  | 640000000                           |
| innodb_ft_user_stopword_table               |                                     |
| innodb_idle_flush_pct                       | 100                                 |
| innodb_immediate_scrub_data_uncompressed    | OFF                                 |
| innodb_instant_alter_column_allowed         | add_last                            |
| innodb_io_capacity                          | 100                                 |
| innodb_io_capacity_max                      | 300                                 |
| innodb_large_prefix                         |                                     |
| innodb_lock_schedule_algorithm              | fcfs                                |
| innodb_lock_wait_timeout                    | 50                                  |
| innodb_locks_unsafe_for_binlog              | OFF                                 |
| innodb_log_buffer_size                      | 16777216                            |
| innodb_log_checksums                        | ON                                  |
| innodb_log_compressed_pages                 | ON                                  |
| innodb_log_file_size                        | 1073741824                          |
| innodb_log_files_in_group                   | 1                                   |
| innodb_log_group_home_dir                   | /var/lib/mysql/.system/innodb/redo  |
| innodb_log_optimize_ddl                     | OFF                                 |
| innodb_log_write_ahead_size                 | 8192                                |
| innodb_lru_scan_depth                       | 1024                                |
| innodb_max_dirty_pages_pct                  | 40.000000                           |
| innodb_max_dirty_pages_pct_lwm              | 20.000000                           |
| innodb_max_purge_lag                        | 0                                   |
| innodb_max_purge_lag_delay                  | 0                                   |
| innodb_max_purge_lag_wait                   | 4294967295                          |
| innodb_max_undo_log_size                    | 10485760                            |
| innodb_monitor_disable                      |                                     |
| innodb_monitor_enable                       |                                     |
| innodb_monitor_reset                        |                                     |
| innodb_monitor_reset_all                    |                                     |
| innodb_old_blocks_pct                       | 37                                  |
| innodb_old_blocks_time                      | 1000                                |
| innodb_online_alter_log_max_size            | 134217728                           |
| innodb_open_files                           | 16384                               |
| innodb_optimize_fulltext_only               | OFF                                 |
| innodb_page_cleaners                        | 1                                   |
| innodb_page_size                            | 16384                               |
| innodb_prefix_index_cluster_optimization    | OFF                                 |
| innodb_print_all_deadlocks                  | ON                                  |
| innodb_purge_batch_size                     | 300                                 |
| innodb_purge_rseg_truncate_frequency        | 128                                 |
| innodb_purge_threads                        | 4                                   |
| innodb_random_read_ahead                    | OFF                                 |
| innodb_read_ahead_threshold                 | 56                                  |
| innodb_read_io_threads                      | 1                                   |
| innodb_read_only                            | OFF                                 |
| innodb_replication_delay                    | 0                                   |
| innodb_rollback_on_timeout                  | ON                                  |
| innodb_rollback_segments                    | 128                                 |
| innodb_scrub_log                            | OFF                                 |
| innodb_scrub_log_speed                      | 256                                 |
| innodb_sort_buffer_size                     | 1048576                             |
| innodb_spin_wait_delay                      | 4                                   |
| innodb_stats_auto_recalc                    | ON                                  |
| innodb_stats_include_delete_marked          | OFF                                 |
| innodb_stats_method                         | nulls_equal                         |
| innodb_stats_modified_counter               | 0                                   |
| innodb_stats_on_metadata                    | OFF                                 |
| innodb_stats_persistent                     | ON                                  |
| innodb_stats_persistent_sample_pages        | 20                                  |
| innodb_stats_sample_pages                   | 8                                   |
| innodb_stats_traditional                    | ON                                  |
| innodb_stats_transient_sample_pages         | 8                                   |
| innodb_status_output                        | OFF                                 |
| innodb_status_output_locks                  | OFF                                 |
| innodb_strict_mode                          | ON                                  |
| innodb_sync_array_size                      | 1                                   |
| innodb_sync_spin_loops                      | 30                                  |
| innodb_table_locks                          | ON                                  |
| innodb_temp_data_file_path                  | ibtmp1:12M:autoextend               |
| innodb_thread_concurrency                   | 0                                   |
| innodb_thread_sleep_delay                   | 10000                               |
| innodb_tmpdir                               |                                     |
| innodb_undo_directory                       | //var/lib/mysql/.system/innodb/undo |
| innodb_undo_log_truncate                    | OFF                                 |
| innodb_undo_logs                            | 128                                 |
| innodb_undo_tablespaces                     | 0                                   |
| innodb_use_atomic_writes                    | ON                                  |
| innodb_use_native_aio                       | OFF                                 |
| innodb_version                              | 10.3.32                             |
| innodb_write_io_threads                     | 1                                   |
+---------------------------------------------+-------------------------------------+

Syslog

Dec 29 11:28:25 mixr-fr-1 dockerd[2583]: time="2021-12-29T11:28:25.997972509+01:00" level=error msg="stream copy error: read /proc/self/fd/227: file already closed"
Dec 29 11:29:54 mixr-fr-1 kernel: [748575.988934] traps: mysqld[1261883] general protection fault ip:7f7ba4e1d941 sp:7f7b0530c7c0 error:0 in libc-2.31.so[7f7ba4e1d000+178000]

and dmesg

[Wed Dec 22 18:11:12 2021] obr_mixrnet: port 4(vethab23a5d6) entered blocking state
[Wed Dec 22 18:11:12 2021] obr_mixrnet: port 4(vethab23a5d6) entered forwarding state
[Wed Dec 29 11:30:03 2021] traps: mysqld[1261883] general protection fault ip:7f7ba4e1d941 sp:7f7b0530c7c0 error:0 in libc-2.31.so[7f7ba4e1d000+178000]
[Wed Dec 29 11:35:29 2021] obr_mixrnet: port 4(vethab23a5d6) entered disabled state
[Wed Dec 29 11:35:29 2021] device vethab23a5d6 left promiscuous mode

Note that the crash occured in between 2 network issues that does not show up constantly on the server

Comment by VAROQUI Stephane [ 2021-12-29 ]

The last slow query reported before the cash was a drop table and the workoad is doing this every minute in a process of rebuilding a table
I will activate general log and report if it crash again

# # Time: 211224 10:18:38
# User@Host: root[root] @  [10.48.48.146]
# Thread_id: 305282  Schema: wpmixrgen  QC_hit: No
# Query_time: 2.768105  Lock_time: 0.000069  Rows_sent: 0  Rows_examined: 0
# Rows_affected: 0  Bytes_sent: 11
SET timestamp=1640337518;
drop table if exists geo_filter_old;
mysqld, Version: 10.3.32-MariaDB-1:10.3.32+maria~focal-log (mariadb.org binary distribution). started with:
Tcp port: 3306  Unix socket: (null)
Time		    Id Command	Argument
# Time: 211229 11:55:34

the client code is under constant deadlock on this table

explain update `msg_conversation_user` set `check_updates_at` = '2021-12-29 14:35:50' where `id` in (92, 65, 40, 30, 27, 1);
+------+-------------+-----------------------+-------+---------------+---------+---------+------+------+-------------+
| id   | select_type | table                 | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+------+-------------+-----------------------+-------+---------------+---------+---------+------+------+-------------+
|    1 | SIMPLE      | msg_conversation_user | range | PRIMARY       | PRIMARY | 8       | NULL |    6 | Using where |
+------+-------------+-----------------------+-------+---------------+---------+---------+------+------+-------------+ 

and

analyze SELECT     mc.id                     AS id,            0                         AS is_new_conversation,            mc.id_user                AS id_user,            mc.id_workplace           AS id_workplace,            mc.id_organization        AS id_organization,            mc.is_group_conversation  AS is_group_conversation,            mcun.conversation_name    AS user_conversation_name,            mcu.is_conversation_admin AS user_is_conversation_admin,            mcu.added_at              AS user_added_at,            mcu.received_at           AS user_received_at,            mcu.opened_at             AS user_opened_at,            mcu.id_user_removed_by    AS user_removed_by_id_user,            mcu.removed_at            AS user_removed_at,            mcu.muted_at              AS user_muted_at,            mc.created_at             AS created_at,            mc.updated_at             AS updated_at,            mc.deleted_at             AS deleted_at FROM       `msg_conversation`        AS `mc` INNER JOIN `msg_conversation_user`   AS `mcu` ON         `mc`.`id` = `mcu`.`id_msg_conversation` INNER JOIN `msg_conversation_user_name` AS `mcun` ON         `mcu`.`id_msg_conversation_user_name` = `mcun`.`id` WHERE      `mc`.`id` = 14 AND        `mcu`.`id_user` = 13137 ORDER BY   `user_added_at` DESC LIMIT      1 FOR UPDATE\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: mc
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: const
         rows: 1
       r_rows: NULL
     filtered: 100.00
   r_filtered: NULL
        Extra: Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: mcu
         type: index_merge
possible_keys: id_user,id_msg_conversation
          key: id_user,id_msg_conversation
      key_len: 8,8
          ref: NULL
         rows: 1
       r_rows: 1.00
     filtered: 100.00
   r_filtered: 100.00
        Extra: Using intersect(id_user,id_msg_conversation); Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: mcun
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: wpmixrgen.mcu.id_msg_conversation_user_name
         rows: 1
       r_rows: 1.00
     filtered: 100.00
   r_filtered: 100.00
        Extra: Using join buffer (flat, BKA join); Key-ordered scan

Comment by Marko Mäkelä [ 2023-01-19 ]

jplindst, can you please identify the supposed fix of this bug? Note that this affects users who do not use Galera replication. I will reopen MDEV-24035 to keep track of the non-Galera case.

Comment by Jan Lindström (Inactive) [ 2023-01-20 ]

marko My understanding is that Galera problem was fixed on 47ba5523. That fix does not effect only foreign keys as it effects how write set column values are stored. If I remember correctly the actual customer problem we could not reproduce.

Comment by Jan Lindström (Inactive) [ 2023-01-23 ]

If this is reproducible we would need more information. For example query log, show create table for table effected, full stack trace from all threads. Is that select inside a transaction and does that transaction contain UPDATE/DELETE-clauses. We could also when we hit that problematic assertion print more information what those locks are (this requires code changes). Clear is based on reports both standalone InnoDB and Galera cluster are effected. In Galera cluster case in testing environment run using --wsrep-debug=1 (requires debug packages). Is it possible that transaction was already killed or aborted but this fact was ignored on upper layer and execution was continued?

Generated at Thu Feb 08 09:29:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.