[MDEV-31886] Server fails to shut down storage engine plugins when thread_stack is set to the smallest negative value Created: 2023-08-10  Updated: 2023-11-28

Status: Confirmed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2
Fix Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2

Type: Bug Priority: Minor
Reporter: Ramesh Sivaraman Assignee: Oleksandr Byelkin
Resolution: Unresolved Votes: 0
Labels: rr-profile-analyzed

Issue Links:
Relates
relates to MDEV-22930 Unnecessary contention on rw_lock_lis... Closed

 Description   

# mysqld options required for replay: --thread-stack=-5000

The crash is happening only on 10.4 build.

Leads to

10.4.31 161ce045a71e306768d4609bdc35788fa5ea2a71 (Debug)

mariadbd: /test/10.4_dbg/storage/innobase/sync/sync0debug.cc:1506: CreateTracker::~CreateTracker(): Assertion `m_files.empty()' failed.

10.4.31 161ce045a71e306768d4609bdc35788fa5ea2a71 (Debug)

Core was generated by `/test/MD090823-mariadb-10.4.31-linux-x86_64-dbg/bin/mariadbd --no-defaults --co'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
[Current thread is 1 (Thread 0x152141918900 (LWP 4028037))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000152141af1859 in __GI_abort () at abort.c:79
#2  0x0000152141af1729 in __assert_fail_base (fmt=0x152141c87588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x559a854316d5 "m_files.empty()", file=0x559a854078f0 "/test/10.4_dbg/storage/innobase/sync/sync0debug.cc", line=1506, function=<optimized out>) at assert.c:92
#3  0x0000152141b02fd6 in __GI___assert_fail (assertion=assertion@entry=0x559a854316d5 "m_files.empty()", file=file@entry=0x559a854078f0 "/test/10.4_dbg/storage/innobase/sync/sync0debug.cc", line=line@entry=1506, function=function@entry=0x559a85408a58 "CreateTracker::~CreateTracker()") at assert.c:101
#4  0x0000559a84da9613 in CreateTracker::~CreateTracker (this=0x559a862caac0 <create_tracker>, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/stl_tree.h:1043
#5  0x0000152141b158a7 in __run_exit_handlers (status=status@entry=1, listp=0x152141cbb718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#6  0x0000152141b15a60 in __GI_exit (status=status@entry=1) at exit.c:139
#7  0x0000559a844ef6ae in start_signal_handler () at /test/10.4_dbg/sql/mysqld.cc:3176
#8  mysqld_main (argc=<optimized out>, argv=<optimized out>) at /test/10.4_dbg/sql/mysqld.cc:5802
#9  0x0000559a844e3b06 in main (argc=<optimized out>, argv=<optimized out>) at /test/10.4_dbg/sql/main.cc:25



 Comments   
Comment by Marko Mäkelä [ 2023-08-10 ]

Please provide a valid rr replay trace for this. I suspect that there is a memory overrun somewhere (likely outside InnoDB), causing srv_sys_space.m_space_id=0 to be overwritten with ~0ULL. The fil_system.sys_space and fil_system.temp_space in the core dump that you provided look reasonable to me.

Comment by Marko Mäkelä [ 2023-08-10 ]

The CreateTracker was removed in MariaDB Server 10.5.5 by MDEV-22930.

Comment by Marko Mäkelä [ 2023-08-10 ]

Thank you, I finally got a working trace of this.

The assertion failure basically complains that create_tracker.register_latch() has been invoked without corresponding create_tracker.deregister_latch(). In the trace that I am looking at, there are 315 elements in create_tracker.m_files. The last one was registered here:

#4  CreateTracker::register_latch (line=103, filename=0x55716fef5888 "/test/10.4_dbg/storage/innobase/btr/btr0defragment.cc", ptr=0x557170da83f0 <btr_defragment_mutex+16>, this=0x557170da5ac0 <create_tracker>)
    at /test/10.4_dbg/storage/innobase/sync/sync0debug.cc:1530
#5  sync_file_created_register (ptr=ptr@entry=0x557170da83f0 <btr_defragment_mutex+16>, filename=filename@entry=0x55716fef5888 "/test/10.4_dbg/storage/innobase/btr/btr0defragment.cc", line=line@entry=103)
    at /test/10.4_dbg/storage/innobase/sync/sync0debug.cc:1635
#6  0x000055716f6cb710 in GenericPolicy<TTASEventMutex<GenericPolicy> >::init (line=103, filename=0x55716fef5888 "/test/10.4_dbg/storage/innobase/btr/btr0defragment.cc", id=LATCH_ID_BTR_DEFRAGMENT_MUTEX, 
    this=0x557170da83f0 <btr_defragment_mutex+16>) at /test/10.4_dbg/storage/innobase/include/sync0types.h:428
#7  PolicyMutex<TTASEventMutex<GenericPolicy> >::init (line=103, filename=0x55716fef5888 "/test/10.4_dbg/storage/innobase/btr/btr0defragment.cc", id=LATCH_ID_BTR_DEFRAGMENT_MUTEX, 
    this=0x557170da83e0 <btr_defragment_mutex>) at /test/10.4_dbg/storage/innobase/include/ib0mutex.h:662
#8  mutex_init<PolicyMutex<TTASEventMutex<GenericPolicy> > > (mutex=mutex@entry=0x557170da83e0 <btr_defragment_mutex>, id=id@entry=LATCH_ID_BTR_DEFRAGMENT_MUTEX, 
    file_name=file_name@entry=0x55716fef5888 "/test/10.4_dbg/storage/innobase/btr/btr0defragment.cc", line=line@entry=103) at /test/10.4_dbg/storage/innobase/include/ut0mutex.h:176
#9  0x000055716f94a4d9 in btr_defragment_init () at /test/10.4_dbg/storage/innobase/btr/btr0defragment.cc:103
#10 0x000055716f86fd3e in srv_start (create_new_db=<optimized out>) at /test/10.4_dbg/storage/innobase/srv/srv0start.cc:2423
#11 0x000055716f64f552 in innodb_init (p=<optimized out>) at /test/10.4_dbg/storage/innobase/handler/ha_innodb.cc:4365
#12 0x000055716f32527c in ha_initialize_handlerton (plugin=0x55717203e250) at /test/10.4_dbg/sql/handler.cc:581
#13 0x000055716f0b8355 in plugin_do_initialize (plugin=plugin@entry=0x55717203e250, state=@0x7ffc3106f23c: 4) at /test/10.4_dbg/sql/sql_plugin.cc:1429
#14 0x000055716f0bdcbe in plugin_initialize (tmp_root=tmp_root@entry=0x7ffc3106f6e0, plugin=0x55717203e250, argc=argc@entry=0x55717052d368 <remaining_argc>, argv=argv@entry=0x557172000660, 
    options_only=options_only@entry=false) at /test/10.4_dbg/sql/sql_plugin.cc:1482
#15 0x000055716f0bf13b in plugin_init (argc=argc@entry=0x55717052d368 <remaining_argc>, argv=<optimized out>, flags=0) at /test/10.4_dbg/sql/sql_plugin.cc:1730
#16 0x000055716efc41db in init_server_components () at /test/10.4_dbg/sql/mysqld.cc:5233
#17 0x000055716efc9acc in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /test/10.4_dbg/sql/mysqld.cc:5772
#18 0x000055716efbeb06 in main (argc=<optimized out>, argv=<optimized out>) at /test/10.4_dbg/sql/main.cc:25

The reason for this check failing is that an orderly shutdown of all plugins is being skipped here:

#0  __run_exit_handlers (status=status@entry=1, listp=0x49dd4a22a718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:40
#1  0x000049dd4a084a60 in __GI_exit (status=status@entry=1) at exit.c:139
#2  0x000055716efca6ae in start_signal_handler () at /test/10.4_dbg/sql/mysqld.cc:3176
#3  mysqld_main (argc=<optimized out>, argv=<optimized out>) at /test/10.4_dbg/sql/mysqld.cc:5802
#4  0x000055716efbeb06 in main (argc=<optimized out>, argv=<optimized out>) at /test/10.4_dbg/sql/main.cc:25

In MariaDB Server 10.5 or later, there may not be any assertion failure, but I think it is still wrong to call exit() without making an effort to shut down the storage engine plugins. I have seen unireg_abort() being invoked in similar cases. It would seem to do the correct thing. ramesh, can you test this patch?

diff --git a/sql/mysqld.cc b/sql/mysqld.cc
index c110baa9634..34397b068c4 100644
--- a/sql/mysqld.cc
+++ b/sql/mysqld.cc
@@ -3173,7 +3173,7 @@ static void start_signal_handler(void)
   {
     sql_print_error("Can't create interrupt-thread (error %d, errno: %d)",
 		    error,errno);
-    exit(1);
+    unireg_abort(1);
   }
   mysql_cond_wait(&COND_start_thread, &LOCK_start_thread);
   mysql_mutex_unlock(&LOCK_start_thread);

Possibly there are other such calls that need to be replaced.

Comment by Ramesh Sivaraman [ 2023-08-10 ]

After applying the patch the server shows {{safe_mutex: Trying to destroy a mutex LOCK_start_thread }} message.

2023-08-10 15:42:24 0 [ERROR] Could not open mysql.plugin table. Some plugins may be not loaded
2023-08-10 15:42:24 0 [Warning] Can't create handle_manager thread (errno: 22 "Invalid argument")
2023-08-10 15:42:24 0 [ERROR] Can't open and lock privilege tables: Thread stack overrun:  3776 bytes used of a -5120 byte stack, and 81920 bytes needed.  Use 'mysqld --thread_stack=#' to specify a bigger stack
2023-08-10 15:42:24 0 [Note] Server socket created on IP: '::'.
2023-08-10 15:42:24 0 [ERROR] Can't create interrupt-thread (error 22, errno: 2)
2023-08-10 15:42:24 0 [ERROR] Aborting
safe_mutex: Trying to destroy a mutex LOCK_start_thread that was locked at /test/10.4_dbg/sql/mysqld.cc, line 3169 at /test/10.4_dbg/sql/mysqld.cc, line 2107

Comment by Ramesh Sivaraman [ 2023-08-10 ]

marko As discussed, after applying following patch it shows memory issue

diff --git a/sql/mysqld.cc b/sql/mysqld.cc
index c110baa9634..f646752e539 100644
--- a/sql/mysqld.cc
+++ b/sql/mysqld.cc
@@ -3173,7 +3173,8 @@ static void start_signal_handler(void)
   {
     sql_print_error("Can't create interrupt-thread (error %d, errno: %d)",
 		    error,errno);
-    exit(1);
+    mysql_mutex_unlock(&LOCK_start_thread);
+    unireg_abort(1);
   }
   mysql_cond_wait(&COND_start_thread, &LOCK_start_thread);
   mysql_mutex_unlock(&LOCK_start_thread);

2023-08-10 16:05:44 0 [ERROR] Could not open mysql.plugin table. Some plugins may be not loaded
2023-08-10 16:05:44 0 [Warning] Can't create handle_manager thread (errno: 22 "Invalid argument")
2023-08-10 16:05:44 0 [ERROR] Can't open and lock privilege tables: Thread stack overrun:  3776 bytes used of a -5120 byte stack, and 81920 bytes needed.  Use 'mysqld --thread_stack=#' to specify a bigger stack
2023-08-10 16:05:44 0 [Note] Server socket created on IP: '::'.
2023-08-10 16:05:44 0 [ERROR] Can't create interrupt-thread (error 22, errno: 0)
2023-08-10 16:05:44 0 [ERROR] Aborting
Warning: Memory not freed: 4328

Generated at Thu Feb 08 10:27:12 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.