[MDEV-23684] MariaDB 10.3 / 10.4 / 10.5 crashing and restarting intermittently - status=11/SEGV
Created: 2020-09-07  Updated: 2021-08-12  Resolved: 2020-11-02

Status: Closed
Project: MariaDB Server
Component/s: Optimizer - CTE
Affects Version/s: 10.3.24, 10.4.14, 10.5.5
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Carlos Oliveira Assignee: Daniel Black
Resolution: Incomplete Votes: 0
Labels: crash, need_feedback
Environment:

Ubuntu 18.04.4 LTS, Ubuntu 18.04.5 LTS, running on virtual machines (vmware and VirtualBox) and AWS EC2 instances


Attachments: core.10268.gz, screenshot-1.png, screenshot-2.png, syslog.zip, test_mariadb_10-3-24_MDEV-23684.zip
Issue Links:
Relates
relates to MDEV-17239 change default max_recursive_iteratio... Closed

 Description   

Hello,

After installing a new server with MariaDB 10.4 or 10.5, I see "systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV" reported many times in the Ubuntu syslog.

Applications then lose their database connections and some transactions are broken. I tried installing a new server with MariaDB and restoring an existing mysqldump file, but the same error occurred.

My system:
- AWS EC2 instances running Ubuntu 18.04;
- MariaDB 10.4.14 / 10.5 (the error occurs on both versions);
- client applications (Node.js using the mysql or mariadb package).

So far, the only version on which the error does not occur is MariaDB 10.4.13 (on a developer server that was not upgraded, and on an AWS RDS MariaDB instance).

In all other attempts, on both existing and new servers running MariaDB 10.4.14 and 10.5.x, the behavior was the same: the service crashed and restarted with status=11/SEGV.

I am attaching my syslog and coredump files.

PS: I also reported the error on the community KB:

https://mariadb.com/kb/en/mariadb-104-105-crashing-and-restarting-intermittently-status11segv/

Best regards,



 Comments   
Comment by Carlos Oliveira [ 2020-09-14 ]

I installed a new server for testing and the same error happened with MariaDB 10.3.24.

After additional testing, I found a "WITH RECURSIVE" SQL statement that makes the server process crash with "status=11/SEGV".

The crash also happens outside the application runtime (Node.js with the mysql/mariadb modules): I reproduced it using the dbForge client.

Please find attached new screenshots and the SQL statement where I reproduced the crash.

Comment by Daniel Black [ 2020-09-29 ]

Edit: no longer required

It would be useful to see the output of journalctl -u mariadb.service -n 50, which hopefully contains the full crash information.

The recursive CTE that triggered this is quite large. Can a reduced version reproduce the crash?

Can you include for each table in the CTE:

SHOW CREATE TABLE {tablename};
SHOW INDEX IN {tablename};

and paste or attach your cnf file(s)?
If any of the above are views rather than base tables, then please also include the same output for the underlying tables.

Even better if you can provide a data dump from those tables (that is, if it's not strictly confidential).

If you can do so but don't want to share it publicly, you can upload it to ftp.askmonty.org/private (https://mariadb.com/kb/en/meta/mariadb-ftp-server/).

Comment by Daniel Black [ 2020-09-30 ]

I pulled out your core dump and found the following backtrace.

With a stack this deep, I suspect the server simply ran out of stack space. Perhaps the recursive CTE doesn't actually have a proper bound.

Setting max_recursive_iterations (https://mariadb.com/kb/en/server-system-variables/#max_recursive_iterations) can help prevent such queries from crashing the server.

MDEV-17239 suggests changing its default to around 1000, as Oracle MySQL does.

(gdb) 
(gdb) bt
#0  0x00005629ee50cb3d in sub_select(JOIN*, st_join_table*, bool) ()
#1  0x00005629ee52e02e in JOIN::exec_inner() ()
#2  0x00005629ee52e403 in JOIN::exec() ()
#3  0x00005629ee573bdc in st_select_lex_unit::exec_recursive() ()
#4  0x00005629ee495991 in ?? ()
#5  0x00005629ee4955da in mysql_handle_single_derived(LEX*, TABLE_LIST*, unsigned int) ()
#6  0x00005629ee50c9a8 in st_join_table::preread_init() ()
#7  0x00005629ee50cb42 in sub_select(JOIN*, st_join_table*, bool) ()
#8  0x00005629ee52e02e in JOIN::exec_inner() ()
#9  0x00005629ee52e403 in JOIN::exec() ()
#10 0x00005629ee573bdc in st_select_lex_unit::exec_recursive() ()
#11 0x00005629ee495991 in ?? ()
#12 0x00005629ee4955da in mysql_handle_single_derived(LEX*, TABLE_LIST*, unsigned int) ()
#13 0x00005629ee50c9a8 in st_join_table::preread_init() ()
#14 0x00005629ee50cb42 in sub_select(JOIN*, st_join_table*, bool) ()
#15 0x00005629ee52e02e in JOIN::exec_inner() ()
#16 0x00005629ee52e403 in JOIN::exec() ()
#17 0x00005629ee573bdc in st_select_lex_unit::exec_recursive() ()
#18 0x00005629ee495991 in ?? ()
#19 0x00005629ee4955da in mysql_handle_single_derived(LEX*, TABLE_LIST*, unsigned int) ()
#20 0x00005629ee50c9a8 in st_join_table::preread_init() ()
#21 0x00005629ee50cb42 in sub_select(JOIN*, st_join_table*, bool) ()
#22 0x00005629ee52e02e in JOIN::exec_inner() ()
#23 0x00005629ee52e403 in JOIN::exec() ()
#24 0x00005629ee573bdc in st_select_lex_unit::exec_recursive() ()
#25 0x00005629ee495991 in ?? ()
#26 0x00005629ee4955da in mysql_handle_single_derived(LEX*, TABLE_LIST*, unsigned int) ()
#27 0x00005629ee50c9a8 in st_join_table::preread_init() ()
#28 0x00005629ee50cb42 in sub_select(JOIN*, st_join_table*, bool) ()
#29 0x00005629ee52e02e in JOIN::exec_inner() ()
#30 0x00005629ee52e403 in JOIN::exec() ()
#31 0x00005629ee573bdc in st_select_lex_unit::exec_recursive() ()
#32 0x00005629ee495991 in ?? ()
#33 0x00005629ee4955da in mysql_handle_single_derived(LEX*, TABLE_LIST*, unsigned int) ()
#34 0x00005629ee50c9a8 in st_join_table::preread_init() ()
#35 0x00005629ee50cb42 in sub_select(JOIN*, st_join_table*, bool) ()
#36 0x00005629ee52e02e in JOIN::exec_inner() ()
#37 0x00005629ee52e403 in JOIN::exec() ()
#38 0x00005629ee573bdc in st_select_lex_unit::exec_recursive() ()
#39 0x00005629ee495991 in ?? ()
#40 0x00005629ee4955da in mysql_handle_single_derived(LEX*, TABLE_LIST*, unsigned int) ()
#41 0x00005629ee50c9a8 in st_join_table::preread_init() ()
#42 0x00005629ee50cb42 in sub_select(JOIN*, st_join_table*, bool) ()
#43 0x00005629ee52e02e in JOIN::exec_inner() ()
#44 0x00005629ee52e403 in JOIN::exec() ()
#45 0x00005629ee573bdc in st_select_lex_unit::exec_recursive() ()
#46 0x00005629ee495991 in ?? ()
#47 0x00005629ee4955da in mysql_handle_single_derived(LEX*, TABLE_LIST*, unsigned int) ()
#48 0x00005629ee50c9a8 in st_join_table::preread_init() ()
#49 0x00005629ee50cb42 in sub_select(JOIN*, st_join_table*, bool) ()
....
#2059 0x00005629ee52e02e in JOIN::exec_inner() ()
#2060 0x00005629ee52e403 in JOIN::exec() ()
#2061 0x00005629ee573bdc in st_select_lex_unit::exec_recursive() ()
#2062 0x00005629ee49572e in TABLE_LIST::fill_recursive(THD*) ()
#2063 0x00005629ee495c3f in ?? ()
#2064 0x00005629ee4955da in mysql_handle_single_derived(LEX*, TABLE_LIST*, unsigned int) ()
#2065 0x00005629ee50c9a8 in st_join_table::preread_init() ()
#2066 0x00005629ee50cb42 in sub_select(JOIN*, st_join_table*, bool) ()
#2067 0x00005629ee4fece8 in ?? ()
#2068 0x00005629ee50cc02 in sub_select(JOIN*, st_join_table*, bool) ()
#2069 0x00005629ee52e02e in JOIN::exec_inner() ()
#2070 0x00005629ee52e403 in JOIN::exec() ()
#2071 0x00005629ee5734ac in st_select_lex_unit::exec() ()
#2072 0x00005629ee575862 in mysql_union(THD*, LEX*, select_result*, st_select_lex_unit*, unsigned long) ()
#2073 0x00005629ee52d093 in handle_select(THD*, LEX*, select_result*, unsigned long) ()
#2074 0x00005629ee4c9ad1 in ?? ()
#2075 0x00005629ee4d1c3a in mysql_execute_command(THD*) ()
#2076 0x00005629ee4d908a in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) ()
#2077 0x00005629ee4db4e5 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) ()
#2078 0x00005629ee4dcc64 in do_command(THD*) ()
#2079 0x00005629ee5b9c3e in do_handle_one_connection(CONNECT*) ()
#2080 0x00005629ee5b9cfd in handle_one_connection ()
#2081 0x00007faa89a936db in start_thread (arg=0x7faa8a82c700) at pthread_create.c:463
#2082 0x00007faa884b5a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
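To illustrate what a recursion cap like max_recursive_iterations buys, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MariaDB. SQLite has no max_recursive_iterations variable, so the cap is written into the query as a DEPTH counter instead. The table and column names mirror the CTE quoted in the next comment; the sample rows (which contain a 2 -> 3 -> 2 cycle), the DEPTH column, and the helper names are hypothetical.

```python
import sqlite3

# Hypothetical sample rows; only the table/column names come from the issue.
# Groups 2 and 3 are each other's parent, i.e. the hierarchy has a cycle.
ROWS = [
    # (ID_GROUP_MEMBERSHIP, ID_GROUP_PARENT, ID_GROUP, ID_DOMAIN)
    (1, None, 1, 308),
    (2, 1, 2, 308),
    (3, 2, 3, 308),
    (4, 3, 2, 308),  # 3 -> 2 closes the cycle 2 -> 3 -> 2
]


def make_demo_db():
    """Build an in-memory SQLite stand-in for FIT_ACCESS_GROUP_MEMBERSHIP."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FIT_ACCESS_GROUP_MEMBERSHIP ("
        " ID_GROUP_MEMBERSHIP INTEGER PRIMARY KEY,"
        " ID_GROUP_PARENT INTEGER, ID_GROUP INTEGER, ID_DOMAIN INTEGER)"
    )
    conn.executemany(
        "INSERT INTO FIT_ACCESS_GROUP_MEMBERSHIP VALUES (?, ?, ?, ?)", ROWS
    )
    return conn


# Same shape as the issue's CTE, plus a DEPTH counter that caps recursion.
BOUNDED_CTE = """
WITH RECURSIVE hierarc(ID_GROUP_MEMBERSHIP, ID_GROUP, DEPTH) AS (
    SELECT ID_GROUP_MEMBERSHIP, ID_GROUP, 0
    FROM FIT_ACCESS_GROUP_MEMBERSHIP
    WHERE ID_DOMAIN = ?
    UNION ALL
    SELECT g.ID_GROUP_MEMBERSHIP, g.ID_GROUP, h.DEPTH + 1
    FROM FIT_ACCESS_GROUP_MEMBERSHIP g
    JOIN hierarc h ON g.ID_GROUP_PARENT = h.ID_GROUP
    WHERE h.DEPTH < ?   -- do not expand rows that are already max_depth deep
)
SELECT ID_GROUP_MEMBERSHIP, ID_GROUP, DEPTH FROM hierarc
"""


def bounded_hierarchy(conn, domain, max_depth):
    """Return (membership, group, depth) rows, recursing at most max_depth levels."""
    return conn.execute(BOUNDED_CTE, (domain, max_depth)).fetchall()
```

With these rows, bounded_hierarchy(make_demo_db(), 308, 3) returns 16 rows and stops at depth 3. Without the DEPTH predicate, the UNION ALL over the cycle would keep producing rows indefinitely, which is the situation a server-side cap is meant to catch.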

Comment by Daniel Black [ 2020-09-30 ]

Start of the CTE:

WITH RECURSIVE hierarc AS (SELECT acGrGroup.ID_GROUP_MEMBERSHIP, 
        acGrGroup.ID_GROUP_PARENT, acGrGroup.ID_GROUP, acGrGroup.ID_ROLE, 
        acGrGroup.ID_USER 
    FROM FIT_ACCESS_GROUP_MEMBERSHIP acGrGroup 
    WHERE acGrGroup.ID_DOMAIN = 308 UNION ALL 
    SELECT acGrGroup2.ID_GROUP_MEMBERSHIP, acGrGroup2.ID_GROUP_PARENT, 
        acGrGroup2.ID_GROUP, acGrGroup2.ID_ROLE, acGrGroup2.ID_USER 
    FROM FIT_ACCESS_GROUP_MEMBERSHIP acGrGroup2, hierarc 
    WHERE acGrGroup2.ID_GROUP_PARENT = hierarc.ID_GROUP) 
SELECT ...

Based on this and the stack depth, I think you either have a cyclic parent hierarchy or one that is just very, very deep.

10.5 has a CYCLE clause that prevents following cyclic paths: https://mariadb.com/kb/en/with/
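The idea behind a cycle guard can be sketched without the CYCLE clause by carrying the list of visited groups along with each row and refusing to revisit one. The sketch below again uses Python's sqlite3 module as a stand-in (SQLite has no CYCLE clause); it only approximates the idea and is not MariaDB syntax. The PATH column, the sample rows, and the function name are hypothetical; the table and column names follow the CTE above.

```python
import sqlite3

# Hypothetical rows with a cyclic shape (group 2 -> 3 -> 2); only the
# table and column names come from the CTE in this issue.
ROWS = [
    # (ID_GROUP_MEMBERSHIP, ID_GROUP_PARENT, ID_GROUP, ID_DOMAIN)
    (1, None, 1, 308),
    (2, 1, 2, 308),
    (3, 2, 3, 308),
    (4, 3, 2, 308),
]

# PATH holds the groups visited so far as ",a,b,c,"; the recursive step refuses
# to enter a group that already appears on the path, so cycles are cut off.
CYCLE_SAFE_CTE = """
WITH RECURSIVE hierarc(ID_GROUP_MEMBERSHIP, ID_GROUP, PATH) AS (
    SELECT ID_GROUP_MEMBERSHIP, ID_GROUP, ',' || ID_GROUP || ','
    FROM FIT_ACCESS_GROUP_MEMBERSHIP
    WHERE ID_DOMAIN = ?
    UNION ALL
    SELECT g.ID_GROUP_MEMBERSHIP, g.ID_GROUP, h.PATH || g.ID_GROUP || ','
    FROM FIT_ACCESS_GROUP_MEMBERSHIP g
    JOIN hierarc h ON g.ID_GROUP_PARENT = h.ID_GROUP
    WHERE instr(h.PATH, ',' || g.ID_GROUP || ',') = 0
)
SELECT ID_GROUP_MEMBERSHIP, ID_GROUP, PATH FROM hierarc
"""


def cycle_safe_hierarchy(domain):
    """Load ROWS into an in-memory SQLite table and run the cycle-safe CTE."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE FIT_ACCESS_GROUP_MEMBERSHIP ("
        " ID_GROUP_MEMBERSHIP INTEGER PRIMARY KEY,"
        " ID_GROUP_PARENT INTEGER, ID_GROUP INTEGER, ID_DOMAIN INTEGER)"
    )
    conn.executemany(
        "INSERT INTO FIT_ACCESS_GROUP_MEMBERSHIP VALUES (?, ?, ?, ?)", ROWS
    )
    try:
        return conn.execute(CYCLE_SAFE_CTE, (domain,)).fetchall()
    finally:
        conn.close()
```

With these rows, cycle_safe_hierarchy(308) terminates after 9 rows and no PATH contains the same group twice. A string path is the simplest portable choice; it grows with depth, so it does not help with a hierarchy that is legitimately very deep.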

If it's just very, very deep, increase the thread_stack system variable.

Can you check your data and see if this is the case? Don't worry about the information I asked for yesterday.
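One way to do that check offline is to export the (ID_GROUP_PARENT, ID_GROUP) pairs from FIT_ACCESS_GROUP_MEMBERSHIP and search them for a cycle. Note that the recursive member of the CTE joins on ID_GROUP_PARENT without filtering by ID_DOMAIN, so the pairs should come from the whole table, not only domain 308. The sketch below is a plain iterative depth-first search; the function name and the export step are assumptions, not part of the issue.

```python
from collections import defaultdict

_DONE = object()  # sentinel: a node's child iterator is exhausted


def find_cycle(edges):
    """Return one cycle as a list of group ids (first id repeated at the end),
    or None if the parent -> child graph is acyclic.

    edges: iterable of (parent_id, child_id) pairs, e.g. the
    (ID_GROUP_PARENT, ID_GROUP) columns of each membership row.
    Rows whose parent_id is None are roots and add no edge.
    """
    children = defaultdict(list)
    for parent, child in edges:
        if parent is not None:
            children[parent].append(child)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    color = defaultdict(int)  # every node starts WHITE
    for start in list(children):
        if color[start] != WHITE:
            continue
        # Iterative DFS, so a very deep hierarchy cannot exhaust Python's stack.
        path = [start]
        color[start] = GREY
        iters = [iter(children[start])]
        while iters:
            nxt = next(iters[-1], _DONE)
            if nxt is _DONE:
                color[path.pop()] = BLACK  # all descendants explored
                iters.pop()
            elif color[nxt] == GREY:
                # nxt is already on the current path: that is a cycle.
                return path[path.index(nxt):] + [nxt]
            elif color[nxt] == WHITE:
                color[nxt] = GREY
                path.append(nxt)
                iters.append(iter(children[nxt]))
    return None
```

For example, find_cycle([(None, 1), (1, 2), (2, 3), (3, 2)]) returns [2, 3, 2], while a tree such as [(None, 1), (1, 2), (1, 3), (3, 4)] returns None.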

Comment by Carlos Oliveira [ 2021-08-12 ]

Latest update:

1) On the same server (on-prem) with the same database, after a vanilla upgrade to the MariaDB 10.6 release, the crash no longer happens.

2) Using the same database dump on AWS RDS MariaDB 10.4, the crash does not happen.

Generated at Thu Feb 08 09:24:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.