[MDEV-28940] SEGV after upgrade to 10.8.3-MariaDB Created: 2022-06-24  Updated: 2022-08-15  Resolved: 2022-08-15

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.8.3
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Muhammad Baqir Assignee: Vladislav Vaintroub
Resolution: Incomplete Votes: 0
Labels: bug, galera, galera_recovery
Environment:

Centos-release-7-9.2009.1.el7.centos.x86_64
MariaDB 10.8.3



 Description   

I ran an upgrade on my VPS and got a SEGV error; MariaDB won't start anymore.
Here's the information from my server:

mariadb.err :

2022-06-24 17:46:04 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2022-06-24 17:46:04 0 [Note] InnoDB: Number of transaction pools: 1
2022-06-24 17:46:04 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-06-24 17:46:04 0 [Note] InnoDB: Using Linux native AIO
2022-06-24 17:46:04 0 [Note] InnoDB: Initializing buffer pool, total size = 128.000MiB, chunk size = 2.000MiB
2022-06-24 17:46:04 0 [Note] InnoDB: Completed initialization of buffer pool
2022-06-24 17:46:04 0 [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
2022-06-24 17:46:04 0 [Note] InnoDB: 128 rollback segments are active.
2022-06-24 17:46:04 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
2022-06-24 17:46:04 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
2022-06-24 17:46:04 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
2022-06-24 17:46:04 0 [Note] InnoDB: log sequence number 1168229248; transaction id 1287509
2022-06-24 17:46:04 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-06-24 17:46:04 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
2022-06-24 17:46:04 0 [Note] Server socket created on IP: '127.0.0.1'.
2022-06-24 17:46:04 0 [ERROR] mariadbd: Can't create/write to file '/var/run/mariadb/mariadb.pid' (Errcode: 2 "No such file or directory")
2022-06-24 17:46:04 0 [ERROR] Can't start server: can't create PID file: No such file or directory
220624 17:46:04 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.8.3-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467997 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
??:0(my_print_stacktrace)[0x561b63b3822e]
??:0(handle_fatal_signal)[0x561b63597a77]
sigaction.c:0(__restore_rt)[0x7f2af6165630]
??:0(std::__detail::_List_node_base::_M_unhook())[0x7f2af5cb913a]
??:0(void std::__introsort_loop<unsigned char**, long>(unsigned char**, unsigned char**, long))[0x561b6396d4b4]
??:0(void std::__introsort_loop<unsigned char**, long>(unsigned char**, unsigned char**, long))[0x561b6396dd64]
??:0(tpool::task_group::execute(tpool::task*))[0x561b63ac29d6]
??:0(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0x561b63ac1381]
??:0(std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >))[0x7f2af5cff330]
pthread_create.c:0(start_thread)[0x7f2af615dea5]
??:0(__clone)[0x7f2af5678b0d]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             31202                31202                processes 
Max open files            32768                32768                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       31202                31202                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: core

journalctl -xe :

-- Unit mariadb.service has begun starting up.
Jun 24 17:54:47 vmi813286.contaboserver.net mariadbd[14453]: 2022-06-24 17:54:47 0 [Note] /usr/sbin/mariadbd (server 10.8.3-MariaDB) starting as proc
Jun 24 17:54:47 vmi813286.contaboserver.net kernel: mariadbd[14468]: segfault at 8 ip 00007f4a7d78913a sp 00007f4a49ffac08 error 6 in libstdc++.so.6.
Jun 24 17:54:47 vmi813286.contaboserver.net systemd[1]: mariadb.service: main process exited, code=killed, status=11/SEGV
Jun 24 17:54:47 vmi813286.contaboserver.net systemd[1]: Failed to start MariaDB 10.8.3 database server.
-- Subject: Unit mariadb.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mariadb.service has failed.
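
The kernel line in the journalctl output above ("segfault at 8 ... error 6 in libstdc++.so.6.") can be read a little more closely. The sketch below decodes the "error" field, assuming the standard x86 page-fault error code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). decode_pf_error is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Decode the "error N" field of a Linux x86 "segfault at ..." kernel message.
# Assumes the standard x86 page-fault error code layout (bits 0-2 only).
# decode_pf_error is a hypothetical helper written for this illustration.
decode_pf_error() {
    code=$1
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    echo "$access, $cause, $mode"
}

# Value taken from the journalctl line: "segfault at 8 ... error 6"
decode_pf_error 6
```

For error 6 this prints "write, page not present, user mode". A user-mode write to the unmapped address 8 would be consistent with a write through a null pointer at a small offset, which fits the _M_unhook frame in the stack trace, but that reading is an interpretation and not something the log confirms.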

systemctl status mariadb.service :

mariadb.service - MariaDB 10.8.3 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: failed (Result: exit-code) since Fri 2022-06-24 17:54:52 +08; 5min ago
Docs: man:mariadbd(8)
https://mariadb.com/kb/en/library/systemd/
Process: 14619 ExecStart=/usr/sbin/mariadbd $MYSQLD_OPTS $_WSREP_NEW_CLUSTER $_WSREP_START_POSITION (code=exited, status=1/FAILURE)
Process: 14591 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`cd /usr/bin/..; /usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-environment _WSREP_START_POSITION=$VAR || exit 1 (code=exited, status=0/SUCCESS)
Process: 14589 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
Main PID: 14619 (code=exited, status=1/FAILURE)



 Comments   
Comment by Muhammad Baqir [ 2022-06-24 ]

Solved by downgrading the CentOS 7 kernel. Here is the kernel information from the affected VPS:

yum list kernel

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile

 * base: asi-fs-s.contabo.net
 * epel: mirrors.thzhost.com
 * extras: mirror.vodien.com
 * remi-php74: mirrors.thzhost.com
 * remi-safe: mirrors.thzhost.com
 * updates: asi-fs-s.contabo.net
Installed Packages
kernel.x86_64 3.10.0-1160.el7 @anaconda
kernel.x86_64 3.10.0-1160.59.1.el7 @updates
kernel.x86_64 3.10.0-1160.62.1.el7 @updates
kernel.x86_64 3.10.0-1160.66.1.el7 @updates
Comment by Sergei Golubchik [ 2022-07-04 ]

Your main problem is this error message:

2022-06-24 17:46:04 0 [ERROR] mariadbd: Can't create/write to file '/var/run/mariadb/mariadb.pid' (Errcode: 2 "No such file or directory")

Maybe /var/run/mariadb/ did not exist, or perhaps SELinux or something else did not allow writing to it.

The crash is clearly a bug that we need to fix, but it happened after the server had failed to create a file and was aborting startup. Once we fix the crash, your server will (or would, on the new kernel) still fail to start with the same error.
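
The two possibilities above (missing directory, or no permission to write) can be told apart with a quick check. This is a minimal sketch: check_pid_dir is a hypothetical helper, not a MariaDB tool, and it only tests ordinary file permissions, so an SELinux denial would not show up here.

```shell
#!/bin/sh
# Report whether the directory that should hold the PID file exists and is
# writable by the current user. check_pid_dir is a hypothetical helper.
check_pid_dir() {
    pid_file=$1
    dir=$(dirname "$pid_file")
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 2
    fi
    echo "ok: $dir"
}

# Path taken from the error message. Run this as the user the server runs as,
# otherwise the writability result does not reflect what mariadbd sees.
check_pid_dir /var/run/mariadb/mariadb.pid || true
```

If the directory is reported missing, note that /var/run is normally a tmpfs on CentOS 7, so a directory created by hand would likely need to be recreated on every boot.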

Comment by Sergei Golubchik [ 2022-07-04 ]

wlad, please take a look; maybe you'll be able to see why such an early abort would cause the thread pool to crash.

Comment by Vladislav Vaintroub [ 2022-07-04 ]

serg, it is not early in startup. It is very late in the startup sequence, after the server is already accessible via the network. InnoDB recovery is probably already done at that stage.

The stack trace is partially damaged: tpool::task_group::execute would usually execute a callback function that InnoDB provides. It would definitely not execute any introsort_loop. If I try to start the server with an invalid path in --pid-file, it ends with

2022-07-04 21:30:43 0 [ERROR] mysqld: Can't create/write to file '/mnt/c/aaa/bla' (Errcode: 2 "No such file or directory")
2022-07-04 21:30:43 0 [ERROR] Can't start server: can't create PID file: No such file or directory

and no sign of a crash.
Thus, it would be helpful to get the stack traces of all threads, as described at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/#analyzing-a-core-file-with-gdb-on-linux . If I interpret the error log snippet correctly ("Writing a core file..."), a core file was written, probably into /var/lib/mysql.
semprul57, would it be possible to produce the "all threads stacktrace" from the core file, as described on that page?
Thank you!

Comment by Muhammad Baqir [ 2022-07-12 ]

Sorry for the late reply. This is a production server and we can't afford the downtime to reproduce this bug again.

Is there any chance to produce a full stack trace without downtime?

Comment by Vladislav Vaintroub [ 2022-07-12 ]

Yes, there is a chance. There is probably a core file in /var/lib/mysql . If there is, just follow the instructions at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/#analyzing-a-core-file-with-gdb-on-linux , which describe how to debug a core file rather than the active process.
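
As a rough sketch of the steps above: look for a core file in the working directory, then build a gdb command that dumps the stacks of all threads from it. find_cores and gdb_all_threads_cmd are hypothetical helpers. The "core*" file name is an assumption based on the "Core pattern: core" line in the error log, and the binary path /usr/sbin/mariadbd is the one shown in the systemd output. The script only prints the gdb command, it does not run it, and it does not touch the running server.

```shell
#!/bin/sh
# List candidate core files directly inside a directory.
# find_cores is a hypothetical helper; the "core*" pattern is an assumption
# based on "Core pattern: core" in the error log.
find_cores() {
    find "$1" -maxdepth 1 -type f -name 'core*' 2>/dev/null
}

# Print (do not run) a gdb command that dumps all thread stacks from a core.
# $1 = server binary, $2 = core file
gdb_all_threads_cmd() {
    printf 'gdb --batch --eval-command="thread apply all bt full" %s %s\n' "$1" "$2"
}

# Note: this loop splits on whitespace, so it assumes core file names
# without spaces.
for core in $(find_cores /var/lib/mysql); do
    gdb_all_threads_cmd /usr/sbin/mariadbd "$core"
done
```

If nothing is printed, no core file was found. The resource limits in the error log list a soft "Max core file size" of 0, which may mean that no core was written even though the log says "Writing a core file...".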
