[MDEV-19918] Server hangs or crashes while trying to lock mutex when the mutex was already locked upon startup with server_audit and orphan records in mysql.plugin Created: 2019-07-01  Updated: 2021-09-30  Resolved: 2020-07-26

Status: Closed
Project: MariaDB Server
Component/s: Plugin - Audit, Plugins
Affects Version/s: 10.4.7, 10.4.8, 10.4.11, 10.4
Fix Version/s: 10.4.14, 10.5.5

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Alexey Botchkov
Resolution: Fixed Votes: 3
Labels: None

Issue Links:
Blocks
Duplicate
duplicates MDEV-20442 Failed to upgrade from 10.3.9 to 10.4.7 Closed
duplicates MDEV-25864 After upgrading from 10.2.14 to 10.4.... Closed
Relates
relates to MDEV-19917 Spider doesn't remove itself from mys... Closed
relates to MDEV-22406 Galera node freezes when query_respon... Open
relates to MDEV-21211 plugin loading error deadlock with se... Closed

 Description   

When the server is started with --plugin-load-add=server_audit and it has records in mysql.plugin referring non-existent libraries, it hangs on a release build or crashes on a debug build.

10.4 9e0ed0fa

Trying to lock mutex at /data/src/10.4/sql/sql_plugin.cc, line 2399, when the mutex was alread
y locked at /data/src/10.4/sql/sql_plugin.cc, line 1854 in thread T@1
190701 20:58:06 [ERROR] mysqld got signal 6 ;
 
#5  0x00007f1adb13042a in __GI_abort () at abort.c:89
#6  0x00005620a7381932 in safe_mutex_lock (mp=0x5620a80e95e0 <LOCK_plugin>, my_flags=0, file=0x5620a747d330 "/data/src/10.4/sql/sql_plugin.cc", line=2399) at /data/src/10.4/mysys/thr_mutex.c:265
#7  0x00005620a6889ef5 in inline_mysql_mutex_lock (that=0x5620a80e95e0 <LOCK_plugin>, src_file=0x5620a747d330 "/data/src/10.4/sql/sql_plugin.cc", src_line=2399) at /data/src/10.4/include/mysql/psi/mysql_thread.h:710
#8  0x00005620a6890a99 in plugin_foreach_with_mask (thd=0x5620ab294120, func=0x5620a69e269f <acquire_plugins(THD*, plugin_ref, void*)>, type=5, state_mask=8, arg=0x7ffee2db8700) at /data/src/10.4/sql/sql_plugin.cc:2399
#9  0x00005620a69e28a1 in mysql_audit_acquire_plugins (thd=0x5620ab294120, event_class_mask=0x7ffee2db8700) at /data/src/10.4/sql/sql_audit.cc:120
#10 0x00005620a69e2f19 in mysql_audit_notify (thd=0x5620ab294120, event_class=0, event=0x7ffee2db8770) at /data/src/10.4/sql/sql_audit.cc:417
#11 0x00005620a67fd6d7 in mysql_audit_general (thd=0x5620ab294120, event_subtype=1, error_code=1126, msg=0x7ffee2db8fb0 "Can't open shared library '/data/bld/10.4-debug/lib/plugin/bar.so' (errno: 0, cannot open shared object file: No such file or directory)") at /data/src/10.4/sql/sql_audit.h:191
#12 0x00005620a68004e1 in THD::raise_condition (this=0x5620ab294120, sql_errno=1126, sqlstate=0x5620a74935dd "HY000", level=Sql_state_errno_level::WARN_LEVEL_ERROR, ucid=..., msg=0x7ffee2db8fb0 "Can't open shared library '/data/bld/10.4-debug/lib/plugin/bar.so' (errno: 0, cannot open shared object file: No such file or directory)") at /data/src/10.4/sql/sql_class.cc:1076
#13 0x00005620a674e464 in THD::raise_condition (this=0x5620ab294120, sql_errno=1126, sqlstate=0x0, level=Sql_state_errno_level::WARN_LEVEL_ERROR, msg=0x7ffee2db8fb0 "Can't open shared library '/data/bld/10.4-debug/lib/plugin/bar.so' (errno: 0, cannot open shared object file: No such file or directory)") at /data/src/10.4/sql/sql_class.h:4354
#14 0x00005620a67413ff in my_message_sql (error=1126, str=0x7ffee2db8fb0 "Can't open shared library '/data/bld/10.4-debug/lib/plugin/bar.so' (errno: 0, cannot open shared object file: No such file or directory)", MyFlags=64) at /data/src/10.4/sql/mysqld.cc:3371
#15 0x00005620a736d933 in my_error (nr=1126, MyFlags=64) at /data/src/10.4/mysys/my_error.c:125
#16 0x00005620a688b4ce in plugin_dl_add (dl=0x7ffee2db9b50, MyFlags=64) at /data/src/10.4/sql/sql_plugin.cc:753
#17 0x00005620a688c4ff in plugin_add (tmp_root=0x7ffee2dba5c0, if_not_exists=false, name=0x7ffee2db9b40, dl=0x7ffee2db9b50, MyFlags=64) at /data/src/10.4/sql/sql_plugin.cc:1100
#18 0x00005620a688ed03 in plugin_load (tmp_root=0x7ffee2dba5c0) at /data/src/10.4/sql/sql_plugin.cc:1855
#19 0x00005620a688e641 in plugin_init (argc=0x5620a80e5fd8 <remaining_argc>, argv=0x5620aa5f3a48, flags=0) at /data/src/10.4/sql/sql_plugin.cc:1734
#20 0x00005620a6744cce in init_server_components () at /data/src/10.4/sql/mysqld.cc:5202
#21 0x00005620a6745e99 in mysqld_main (argc=12, argv=0x5620aa5f3a48) at /data/src/10.4/sql/mysqld.cc:5725
#22 0x00005620a673a730 in main (argc=12, argv=0x7ffee2dbb008) at /data/src/10.4/sql/main.cc:25

The test case below is artificial, but there is a quite real-life scenario which represents the problem.

  • on Debian/Ubuntu, install mariadb-server and mariadb-plugin-spider packages;
  • add plugin-load-add=server_audit at the end of /etc/mysql/my.cnf;
  • uninstall mariadb-plugin-spider
  • restart the service => it hangs upon restart

Run the test case below with --plugin-load-add=server_audit
It seems important, installing it via INSTALL SONAME doesn't cause the same effect.

insert into mysql.plugin values ('foo','bar.so');
 
--source include/restart_mysqld.inc
 
# Cleanup
delete from mysql.plugin where name = 'foo';

Couldn't reproduce on 10.2-10.3.



 Comments   
Comment by Valerii Kravchuk [ 2019-08-10 ]

There is another, more "natural" way to reproduce similar hang.

1. INSTALL audit plugin "manually":

openxs@ao756:~/dbs/maria10.4$ bin/mysqld_safe --no-defaults --port=3308 --socket=/tmp/mariadb.sock &
[1] 8424
openxs@ao756:~/dbs/maria10.4$ 190810 15:53:56 mysqld_safe Logging to '/home/openxs/dbs/maria10.4/data/ao756.err'.
190810 15:53:56 mysqld_safe Starting mysqld daemon with databases from /home/openxs/dbs/maria10.4/data
 
openxs@ao756:~/dbs/maria10.4$ bin/mysql -uroot --socket=/tmp/mariadb.sock Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.4.8-MariaDB Source distribution
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> select * from mysql.plugin;
Empty set (0,001 sec)
 
MariaDB [(none)]> INSTALL SONAME 'server_audit';
Query OK, 0 rows affected (0,049 sec)
 
MariaDB [(none)]> select * from mysql.plugin;
+--------------+-----------------+
| name | dl |
+--------------+-----------------+
| SERVER_AUDIT | server_audit.so |
+--------------+-----------------+
1 row in set (0,000 sec)
 
MariaDB [(none)]> shutdown;
Query OK, 0 rows affected (0,000 sec)
 
MariaDB [(none)]> exit
Bye
[1]+ Done bin/mysqld_safe --no-defaults --port=3308 --socket=/tmp/mariadb.sock

2. Restart server adding plugin_,load-add to my.cnf or command line:

openxs@ao756:~/dbs/maria10.4$ bin/mysqld_safe --no-defaults --port=3308 --socket=/tmp/mariadb.sock --plugin_load_add=server_audit --server_audit=FORCE_PLUS_PERMANENT --server-audit-logging=1 &
[1] 8861
openxs@ao756:~/dbs/maria10.4$ 190810 15:54:31 mysqld_safe Logging to '/home/openxs/dbs/maria10.4/data/ao756.err'.
190810 15:54:31 mysqld_safe Starting mysqld daemon with databases from /home/openxs/dbs/maria10.4/data
 
openxs@ao756:~/dbs/maria10.4$ bin/mysql -uroot --socket=/tmp/mariadb.sock ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/tmp/mariadb.sock' (2)
openxs@ao756:~/dbs/maria10.4$ tail data/ao756.err
Version: '10.4.8-MariaDB' socket: '/tmp/mariadb.sock' port: 3308 Source distribution
190810 15:54:06 server_audit: MariaDB Audit Plugin version 1.4.8 STARTED.
190810 15:54:06 server_audit: Query cache is enabled with the TABLE events. Some table reads can be veiled.2019-08-10 15:54:18 0 [Note] /home/openxs/dbs/maria10.4/bin/mysqld (initiated by: root[root] @ localhost []): Normal shutdown
2019-08-10 15:54:18 0 [Note] Event Scheduler: Purging the queue. 0 events
190810 15:54:18 server_audit: STOPPED
2019-08-10 15:54:18 0 [Note] /home/openxs/dbs/maria10.4/bin/mysqld: Shutdown complete
 
2019-08-10 15:54:32 0 [Note] Plugin 'FEEDBACK' is disabled.
190810 15:54:32 server_audit: MariaDB Audit Plugin version 1.4.8 STARTED.
190810 15:54:32 server_audit: Query cache is enabled with the TABLE events. Some table reads can be veiled.
190810 15:54:32 server_audit: logging started to the file server_audit.log.
openxs@ao756:~/dbs/maria10.4$

So, it hangs.

3. Attach gdb and see this:

openxs@ao756:~/dbs/maria10.4$ ps aux | grep 10.4 | grep mysqld
openxs 8949 0.0 1.4 470040 56780 pts/21 Sl 15:54 0:00 /home/openxs/dbs/maria10.4/bin/mysqld --no-defaults --basedir=/home/openxs/dbs/maria10.4 --datadir=/home/openxs/dbs/maria10.4/data --plugin-dir=/home/openxs/dbs/maria10.4/lib/plugin --plugin_load_add=server_audit --server_audit=FORCE_PLUS_PERMANENT --server-audit-logging=1 --log-error=/home/openxs/dbs/maria10.4/data/ao756.err --pid-file=ao756.pid --socket=/tmp/mariadb.sock --port=3308
openxs@ao756:~/dbs/maria10.4$ sudo gdb -p 8949
[sudo] password for openxs:
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 8949
[New LWP 8950]
[New LWP 8951]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
__lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135 ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x00007fcefc35defe in __GI___pthread_mutex_lock (
mutex=mutex@entry=0x561661eb7ce0 <LOCK_plugin>)
at ../nptl/pthread_mutex_lock.c:135
#2 0x00005616610b73c4 in inline_mysql_mutex_lock (
src_file=0x5616615db9d0 "/home/openxs/git/server/sql/sql_plugin.cc",
src_line=2405, that=0x561661eb7ce0 <LOCK_plugin>)
at /home/openxs/git/server/include/mysql/psi/mysql_thread.h:712
#3 plugin_foreach_with_mask (thd=thd@entry=0x561662d97a88,
func=func@entry=0x56166117df80 <acquire_plugins(THD*, plugin_ref, void*)>,
type=type@entry=5, state_mask=state_mask@entry=8,
arg=arg@entry=0x7fff8ed7dc00)
at /home/openxs/git/server/sql/sql_plugin.cc:2405
#4 0x000056166117e035 in mysql_audit_acquire_plugins (thd=0x561662d97a88,
event_class_mask=0x7fff8ed7dc00)
at /home/openxs/git/server/sql/sql_audit.cc:120
#5 0x000056166117e52f in mysql_audit_notify (thd=thd@entry=0x561662d97a88,
event_class=event_class@entry=0, event=event@entry=0x7fff8ed7dc60)
at /home/openxs/git/server/sql/sql_audit.cc:433
#6 0x000056166105d477 in mysql_audit_general (event_subtype=1,
msg=0x7fff8ed7e180 "Plugin 'SERVER_AUDIT' already installed",
error_code=1968, thd=0x561662d97a88)
at /home/openxs/git/server/sql/sql_audit.h:192
---Type <return> to continue, or q <return> to quit---
#7 THD::raise_condition (this=this@entry=0x561662d97a88,
sql_errno=sql_errno@entry=1968, sqlstate=0x5616615e31d8 "HY000",
sqlstate@entry=0x0, level=Sql_state_errno_level::WARN_LEVEL_ERROR,
ucid=...,
msg=msg@entry=0x7fff8ed7e180 "Plugin 'SERVER_AUDIT' already installed")
at /home/openxs/git/server/sql/sql_class.cc:1078
#8 0x0000561660fed554 in THD::raise_condition (
msg=0x7fff8ed7e180 "Plugin 'SERVER_AUDIT' already installed",
level=<optimized out>, sqlstate=0x0, sql_errno=1968, this=0x561662d97a88)
at /home/openxs/git/server/sql/sql_class.h:4355
#9 my_message_sql (error=1968,
str=0x7fff8ed7e180 "Plugin 'SERVER_AUDIT' already installed", MyFlags=64)
at /home/openxs/git/server/sql/mysqld.cc:3371
#10 0x000056166151dc5f in my_error (nr=nr@entry=1968, MyFlags=MyFlags@entry=64)
at /home/openxs/git/server/mysys/my_error.c:125
#11 0x00005616610b538a in plugin_add (tmp_root=tmp_root@entry=0x7fff8ed7f060,
if_not_exists=if_not_exists@entry=false, name=name@entry=0x7fff8ed7e6c0,
dl=dl@entry=0x7fff8ed7e6d0, MyFlags=MyFlags@entry=64)
at /home/openxs/git/server/sql/sql_plugin.cc:1095
#12 0x00005616610b5aae in plugin_load (tmp_root=tmp_root@entry=0x7fff8ed7f060)
at /home/openxs/git/server/sql/sql_plugin.cc:1859
#13 0x00005616610ba3dd in plugin_init (
argc=argc@entry=0x561661eb5188 <remaining_argc>, argv=0x561662c96e50,
---Type <return> to continue, or q <return> to quit---
flags=0) at /home/openxs/git/server/sql/sql_plugin.cc:1735
#14 0x0000561660ff1b19 in init_server_components ()
at /home/openxs/git/server/sql/mysqld.cc:5203
#15 0x0000561660ff835a in mysqld_main (argc=12, argv=0x561662c96e50)
at /home/openxs/git/server/sql/mysqld.cc:5726
#16 0x00007fcefb505830 in __libc_start_main (
main=0x561660fd5a00 <main(int, char**)>, argc=12, argv=0x7fff8ed7fc78,
init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>,
stack_end=0x7fff8ed7fc68) at ../csu/libc-start.c:291
#17 0x0000561660fec199 in _start ()
(gdb)

Comment by Sergei Golubchik [ 2019-09-25 ]

No, https://github.com/mariadb/server/commit/4c63b87036da79e7f416a94f4dc2240fd5139ee4 is a quite a dirty hack, please don't push that.

In fact, there is already one fix for crashes like that, in sql_plugin.cc:

  /*
    Pre-acquire audit plugins for events that may potentially occur
    during [UN]INSTALL PLUGIN.
 
    When audit event is triggered, audit subsystem acquires interested
    plugins by walking through plugin list. Evidently plugin list
    iterator protects plugin list by acquiring LOCK_plugin, see
    plugin_foreach_with_mask().
 
    On the other hand [UN]INSTALL PLUGIN is acquiring LOCK_plugin
    rather for a long time.
 
    When audit event is triggered during [UN]INSTALL PLUGIN, plugin
    list iterator acquires the same lock (within the same thread)
    second time.
 
    This hack should be removed when LOCK_plugin is fixed so it
    protects only what it supposed to protect.
 
    See also mysql_uninstall_plugin() and initialize_audit_plugin()
  */
  if (mysql_audit_general_enabled())
    mysql_audit_acquire_plugins(thd, event_class_mask);

(which is a hack too, as the comment admits). But it only works if there's a current THD. So one possible fix for this MDEV would be to create a dummy THD for the initial loading of plugins. In fact, there is such a dummy THD, created in plugin_load(). Just making use of it to preacquire audit plugins should fix this crash.

But a much better and only slightly more complex (if at all) solution would be to get rid of LOCK_plugin completely and to switch to RCU approach for plugin data structures. It means something like

  • there's plugin_mem_root already, put plugin hashes and st_plugin_int structures into this memroot, not DYNAMIC ARRAY
  • create a new structure:

    struct plugin_registry_st {
      DYNAMIC_ARRAY plugin_dl_array, plugin_array;
      static HASH plugin_hash[MYSQL_MAX_PLUGIN_TYPE_NUM];
    };
    

  • although I suspect that DYNAMIC_ARRAYs won't be needed anymore, and an array of hashes will suffice.
  • and a global pointer struct plugin_registry_st *plugin_registry
  • all code that reads plugin structures uses plugin_registry without any mutexes
  • INSTALL/UNINSTALL code copies the complete registry. Modifies the copy. Then uses CAS to atomically install the new registry
  • optionally — serialize INSTALL/UNINSTALL with a mutex to avoid two threads doing this RCU trick in parallel (it won't break anything, but will result in a wasted space in memroot)

this should fix all current and future concurrency problems with LOCK_plugin, and will allow to remove this hack, quoted above, and another one, in plugin_foreach_with_mask(), simplifying all plugin handling code

Comment by Roel Van de Paar [ 2020-03-06 ]

Is MDEV-21873 related (or a duplicate)?

Comment by Nick (Inactive) [ 2020-03-26 ]

From my understanding, MDEV-21873 is related and the current cause of this issue for some customers.

Comment by Volker Klasen [ 2020-03-31 ]

I've got the same issue, but not with audit plugin. I installed auth_pam first using "INSTALL SONAME 'auth_pam';" and then by adding "plugin_load_add = auth_pam" to my.cnf and restarting the server. Strace shows the following:

openat(AT_FDCWD, "./mysql/plugin.MAD", O_RDWR|O_CLOEXEC) = 26
getpid()                                = 608
access("./mysql/plugin.TRG", F_OK)      = -1 ENOENT (No such file or directory)
pread64(26, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 0) = 8192
pread64(26, "\1\0\0\236\371\1\0\1\1\377\312\37\0\3\0\3\v\0pamauth_pam.so"..., 8192, 8192) = 8192
futex(0x5646e36e8020, FUTEX_WAIT_PRIVATE, 2, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
futex(0x5646e36e8020, FUTEX_WAIT_PRIVATE, 2, NULL

If I start the server without plugin_load_add, then I can execute "UNINSTALL SONAME 'auth_pam';" and restart the server with plugin_load_add again.

Cheers,
Volker

Comment by Dan Livingston [ 2020-04-06 ]

I am seeing this behavior on 10.4.12-6 as well.

In my case I see it on first startup (before adding any other plugins) when I fail to not comment out the plugin% parameters in mariadb-enterprise.cnf. I see the same sort of output as above:

pread64(16, "\376\376\t\3\f\2\1\300\0\327\0\207\0\357\0\1\0\0\1\0\0\0\3\3\0\0\1\371\1\0\0\235"..., 448, 0) = 448
open("./mysql/plugin.MAD", O_RDWR|O_CLOEXEC) = 14
access("./mysql/plugin.TRG", F_OK)      = -1 ENOENT (No such file or directory)
pread64(14, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192, 0) = 8192
pread64(14, "\1\0\0y\32\10\0\201\1\377\312\37\0\3\0\v\16\0unix_socketaut"..., 8192, 8192) = 8192
futex(0x563a0e474810, FUTEX_WAIT_PRIVATE, 2, NULL

Comment by Nilnandan Joshi [ 2020-07-21 ]

Agree with last comment, its happening with 10.4.12-6.

In my case, it was with unix_socket plugin. In MariaDB 10.4.3 and later, the unix_socket authentication plugin is installed by default. But if you are upgrading server from 10.3, the entry for that plugin will be in mysql.plugin table.

MariaDB [(none)]> select * from mysql.plugin;
+-----------------------+--------------------------+
| name                  | dl                       |
+-----------------------+--------------------------+
| unix_socket           | auth_socket.so           |

If we'll try to start 10.4 with audit_plugin and above entry , it will be getting hanged at

2020-07-20 22:05:37 0 [Note] InnoDB: Buffer pool(s) load completed at 200720 22:05:37

But if we remove that unix_socket entry from mysql.plugin and then restart the server, its working.

Comment by Alexey Botchkov [ 2020-07-26 ]

https://github.com/MariaDB/server/commit/4968fdbcef1c2d62ed525dffac153cf9f8eb1913

Generated at Thu Feb 08 08:55:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.