[MDEV-30598] Mariadb crashes Created: 2023-02-07  Updated: 2023-05-02  Resolved: 2023-05-02

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.6.12
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Łukasz
Assignee: Unassigned
Resolution: Incomplete
Votes: 0
Labels: None
Environment: docker

Issue Links:
Relates
relates to MDEV-30397 InnoDB crash due to DB_FAIL reported ... Closed

 Description   

Yesterday I upgraded MariaDB from 10.5 to 10.6, and the upgrade corrupted my data.
I had a backup, so I restored it and everything seemed fine; however, certain queries cause the server to crash.

I have temporarily gone back to 10.5, where everything works fine.

I don't have much information beyond the log below, because I had to restore the environment as quickly as possible.

2023-02-07 11:45:01 164 [ERROR] [FATAL] InnoDB: Unknown error Failed, retry may succeed
230207 11:45:01 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.6.12-MariaDB-1:10.6.12+maria~ubu2004-log source revision: 4c79e15cc3716f69c044d4287ad2160da8101cdc
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=5
max_threads=153
thread_count=5
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467967 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f03c8003958
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f043fed2d58 thread_stack 0x49000
Printing to addr2line failed
mariadbd(my_print_stacktrace+0x32)[0x555a74d3d0c2]
mariadbd(handle_fatal_signal+0x485)[0x555a74804c55]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f0469ee0420]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7f04699e400b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f04699c3859]
mariadbd(+0x66f1df)[0x555a744861df]
mariadbd(+0x661815)[0x555a74478815]
mariadbd(+0xdacf5d)[0x555a74bc3f5d]
mariadbd(+0xce832d)[0x555a74aff32d]
mariadbd(_ZN7handler18ha_index_next_sameEPhPKhj+0x275)[0x555a7480db75]
mariadbd(+0x8041e5)[0x555a7461b1e5]
mariadbd(_Z10sub_selectP4JOINP13st_join_tableb+0x1e5)[0x555a7460da25]
mariadbd(+0x7e3b8c)[0x555a745fab8c]
mariadbd(_Z10sub_selectP4JOINP13st_join_tableb+0x1a6)[0x555a7460d9e6]
mariadbd(_ZN4JOIN10exec_innerEv+0xf67)[0x555a7463be77]
mariadbd(_ZN4JOIN4execEv+0x39)[0x555a7463c229]
mariadbd(_Z12mysql_selectP3THDP10TABLE_LISTR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x10a)[0x555a7463a36a]
mariadbd(_Z13handle_selectP3THDP3LEXP13select_resultm+0x157)[0x555a7463ab27]
mariadbd(+0x7b1581)[0x555a745c8581]
mariadbd(_Z21mysql_execute_commandP3THDb+0x4600)[0x555a745d6bb0]
mariadbd(_Z11mysql_parseP3THDPcjP12Parser_state+0x1e7)[0x555a745c2f67]
mariadbd(_Z16dispatch_command19enum_server_commandP3THDPcjb+0x10a5)[0x555a745cf635]
mariadbd(_Z10do_commandP3THDb+0x13e)[0x555a745d173e]
mariadbd(_Z24do_handle_one_connectionP7CONNECTb+0x3b7)[0x555a746e4727]
mariadbd(handle_one_connection+0x5d)[0x555a746e4a7d]
mariadbd(+0xc36766)[0x555a74a4d766]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609)[0x7f0469ed4609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7f0469ac0133]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f03c8014bc0): SELECT `user_id` FROM `oc_filecache` `f` INNER JOIN `oc_mounts` `m` ON `storage_id` = `storage` WHERE (`size` < 0) AND (`parent` > -1) LIMIT 1
 
Connection ID (thread ID): 164
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             unlimited            unlimited            processes 
Max open files            1048576              1048576              files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       79863                79863                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                     
Max realtime priority     0                    0                     
Max realtime timeout      unlimited            unlimited            us        
Core pattern: |/lib/systemd/systemd-coredump %P %u %g %s %t 9223372036854775808 %h
 
Kernel version: Linux version 5.15.0-58-generic (buildd@lcy02-amd64-101) (gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023
 
Fatal signal 11 while backtracing



 Comments   
Comment by Alice Sherepa [ 2023-02-07 ]

Could you please add the output of SHOW CREATE TABLE oc_filecache; SHOW CREATE TABLE oc_mounts;

Comment by Łukasz [ 2023-02-07 ]

I am now running 10.5 and 10.6 side by side, with production on 10.5 because 10.6 crashes. The output of the statements you asked for, shown below, is from 10.6.

| Table        | Create Table
 
| oc_filecache | CREATE TABLE `oc_filecache` (
  `fileid` bigint(20) NOT NULL AUTO_INCREMENT,
  `storage` bigint(20) NOT NULL DEFAULT 0,
  `path` varchar(4000) DEFAULT NULL,
  `path_hash` varchar(32) NOT NULL DEFAULT '',
  `parent` bigint(20) NOT NULL DEFAULT 0,
  `name` varchar(250) DEFAULT NULL,
  `mimetype` bigint(20) NOT NULL DEFAULT 0,
  `mimepart` bigint(20) NOT NULL DEFAULT 0,
  `size` bigint(20) NOT NULL DEFAULT 0,
  `mtime` bigint(20) NOT NULL DEFAULT 0,
  `storage_mtime` bigint(20) NOT NULL DEFAULT 0,
  `encrypted` int(11) NOT NULL DEFAULT 0,
  `unencrypted_size` bigint(20) NOT NULL DEFAULT 0,
  `etag` varchar(40) DEFAULT NULL,
  `permissions` int(11) DEFAULT 0,
  `checksum` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`fileid`),
  UNIQUE KEY `fs_storage_path_hash` (`storage`,`path_hash`),
  KEY `fs_parent_name_hash` (`parent`,`name`),
  KEY `fs_storage_mimetype` (`storage`,`mimetype`),
  KEY `fs_storage_mimepart` (`storage`,`mimepart`),
  KEY `fs_storage_size` (`storage`,`size`,`fileid`),
  KEY `fs_mtime` (`mtime`),
  KEY `fs_size` (`size`),
  KEY `fs_id_storage_size` (`fileid`,`storage`,`size`),
  KEY `fs_storage_path_prefix` (`storage`,`path`(64))
) ENGINE=InnoDB AUTO_INCREMENT=6610847 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=COMPRESSED |
 
 
| oc_mounts | CREATE TABLE `oc_mounts` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `storage_id` bigint(20) NOT NULL,
  `root_id` bigint(20) NOT NULL,
  `user_id` varchar(64) NOT NULL,
  `mount_point` varchar(4000) NOT NULL,
  `mount_id` bigint(20) DEFAULT NULL,
  `mount_provider_class` varchar(128) DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `mounts_user_root_index` (`user_id`,`root_id`),
  KEY `mounts_storage_index` (`storage_id`),
  KEY `mounts_root_index` (`root_id`),
  KEY `mounts_mount_id_index` (`mount_id`),
  KEY `mount_user_storage` (`storage_id`,`user_id`),
  KEY `mounts_class_index` (`mount_provider_class`)
) ENGINE=InnoDB AUTO_INCREMENT=30205 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin ROW_FORMAT=COMPRESSED |

Comment by Alice Sherepa [ 2023-02-07 ]

Is the crash repeatable if you run the same query again on 10.6? Could you please also attach the .cnf file(s)? And the tables are not corrupted now, right? Could you please run ANALYZE on them and also EXPLAIN EXTENDED for the query?

Comment by Marko Mäkelä [ 2023-02-08 ]

This looks like a possible duplicate of MDEV-30397.

Comment by Łukasz [ 2023-02-11 ]

cnf file:

[server]
skip_name_resolve = 1
innodb_buffer_pool_size = 128M
innodb_buffer_pool_instances = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 32M
innodb_max_dirty_pages_pct = 90
query_cache_type = 1
query_cache_limit = 2M
query_cache_min_res_unit = 2k
query_cache_size = 64M
tmp_table_size= 64M
max_heap_table_size= 64M
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 1
 
[client]
default-character-set = utf8mb4
 
[mysqld]
character_set_server = utf8mb4
collation_server = utf8mb4_general_ci
transaction_isolation = READ-COMMITTED
binlog_format = ROW
innodb_large_prefix=on
innodb_file_format=barracuda
innodb_file_per_table=1

I think this is reproducible when there is more load on the database. When I redirect traffic to the 10.6 instance, the problem starts occurring; unfortunately, running the queries manually triggers no error.

Comment by Marko Mäkelä [ 2023-02-16 ]

A fully resolved stack trace of a crash would help find out the cause of the corruption. I see that ROW_FORMAT=COMPRESSED is involved. The fix of MDEV-30397 should avoid an InnoDB crash upon encountering this corruption. With that fix, the operation that attempts to access the corrupted page should still fail.

Running the statement

ALTER TABLE oc_mounts ROW_FORMAT=DEFAULT;

should rebuild the table to not use ROW_FORMAT=COMPRESSED at all. If the server does not crash during that operation, the corruption should affect a secondary index page or a non-leaf page.

Comment by Elena Stepanova [ 2023-03-27 ]

dolohow, would you be able to provide a fully resolved stack trace as requested above?
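
For reference, the frames in the backtrace above that appear as mariadbd(+0x...) have no symbol name, only an offset into the binary. One possible way to resolve them offline is to pass those offsets to addr2line against a mariadbd binary of the same build with debug symbols installed. The following is a minimal sketch under that assumption; the helper names and the binary path /usr/sbin/mariadbd are illustrative and not taken from the report, and the sample lines are copied from the log above.

```python
import re
import shlex

# Matches frames printed without a symbol name, e.g.
# "mariadbd(+0x66f1df)[0x555a744861df]".
UNRESOLVED = re.compile(
    r"^(?P<module>[^\s(]+)\(\+(?P<offset>0x[0-9a-fA-F]+)\)\[0x[0-9a-fA-F]+\]$"
)


def unresolved_offsets(backtrace, module="mariadbd"):
    """Return the binary-relative offsets of unresolved frames in `module`."""
    offsets = []
    for line in backtrace.splitlines():
        match = UNRESOLVED.match(line.strip())
        if match and match.group("module") == module:
            offsets.append(match.group("offset"))
    return offsets


def addr2line_command(binary_path, offsets):
    """Build an addr2line command line for the given offsets.

    -f prints function names, -C demangles C++ names,
    -i expands inlined frames, -e selects the executable.
    """
    parts = ["addr2line", "-f", "-C", "-i", "-e", shlex.quote(binary_path)]
    return " ".join(parts + list(offsets))


# A few frames copied from the crash log in this report.
SAMPLE = """\
mariadbd(+0x66f1df)[0x555a744861df]
mariadbd(+0x661815)[0x555a74478815]
mariadbd(_ZN7handler18ha_index_next_sameEPhPKhj+0x275)[0x555a7480db75]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7f04699c3859]
"""

if __name__ == "__main__":
    # Assumed path; adjust to wherever the matching mariadbd binary lives.
    print(addr2line_command("/usr/sbin/mariadbd", unresolved_offsets(SAMPLE)))
```

The printed command only yields useful file and line information if the binary is exactly the build that crashed (here 10.6.12, source revision 4c79e15cc3716f69c044d4287ad2160da8101cdc) and its debug symbols are available. Otherwise the page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ mentioned in the log describes other ways to obtain a full trace.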

Generated at Thu Feb 08 10:17:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.