[MDEV-16586] Dead-lock in MYSQL_BIN_LOG::reset_logs(..) ? Created: 2018-06-26  Updated: 2020-01-06  Resolved: 2020-01-06

Status: Closed
Project: MariaDB Server
Component/s: Replication, Server
Affects Version/s: 10.3.7, 10.3.8, 10.3.10, 10.3.18
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: David Kedves Assignee: Sachin Setiya (Inactive)
Resolution: Incomplete Votes: 0
Labels: binlog_checkpoint, need_feedback
Environment:

Ubuntu 14.04 (I used an LXC container), CentOS 6.10 Cloud Image (qemu)


Attachments: Text File backtrace.txt     Text File bt.txt     File my.cnf     Text File my.cnf.txt     File mysqld.log     Text File packages.txt     Text File processlist.txt     Text File processlist.txt     Text File psaux.txt     Text File pstree.txt     Text File reproduced_again.txt     Text File rpmqa.txt    
Issue Links:
Relates
relates to MDEV-7402 'reset master' hangs, waits for signa... Closed

 Description   

I've installed MariaDB 10.3.7 and initialized with my own configs (from previous setups), set a root user.. etc, then I have executed

/usr/bin/mysql -NAB -u'root' -p'!password$$' -e "RESET MASTER"

And this command seems to be stuck since hours (actually it is almost half a day since now...).

I have attached my mysqld config, logs and the backtrace of mysqld too, it seems to be waiting for a mutex ... possibly dead-lock internally?

Actually there are two issues here:
1) Server gets stuck in executing 'RESET MASTER'
( The GDB backtrace suggests it is waiting for a mutex, possible dead-lock? )
2) Client never times out? how is that?
( I still have this running since yesterday ~15:16 or something like that... )

Unfortunetely mariadb doesn't provide -dbg packages (or i just don't see them) so i couldn't provide better backtrace atm...



 Comments   
Comment by David Kedves [ 2018-06-26 ]

NOTE: I've reproduced this so-far only once... will retry if it is reproducible (also i'll check if affects other distros as well..)

Comment by David Kedves [ 2018-06-26 ]

reproduced_again.txt

Okay, it seems the issue is only reproducible on Ubuntu 14.04, at least this did not happen in other OS-es to me (CentOS 7, Ubuntu16.04, Debian 8 and 9)
Attached a new backtrace, it shows the exact same locking problem.

Comment by David Kedves [ 2018-07-24 ]

Hey guys, this is still reproducible also with 10.3.8, any news there?

{{
root@replication-ubuntu1404-node1:~# ps aux | grep NAB
root 6800 0.0 0.6 38588 7140 pts/0 S+ Jul23 0:00 /usr/bin/mysql -NAB -uroot -px xxxxxxxxx -e RESET MASTER
root 9738 0.0 0.1 13420 1728 ? S+ 07:47 0:00 grep --color=auto NAB
root@replication-ubuntu1404-node1:~# mysql -uroot -p'!password$$'
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 82
Server version: 10.3.8-MariaDB-1:10.3.8+maria~trusty-log mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show full processlist
-> ;
---------------------------------------------------------------------------------------------

Id User Host db Command Time State Info Progress

---------------------------------------------------------------------------------------------

1 system user   NULL Daemon NULL InnoDB purge worker NULL 0.000
2 system user   NULL Daemon NULL InnoDB purge worker NULL 0.000
3 system user   NULL Daemon NULL InnoDB purge worker NULL 0.000
4 system user   NULL Daemon NULL InnoDB purge coordinator NULL 0.000
5 system user   NULL Daemon NULL InnoDB shutdown handler NULL 0.000
80 root localhost NULL Query 65763 Init RESET MASTER 0.000
82 root localhost NULL Query 0 Init show full processlist 0.000

---------------------------------------------------------------------------------------------
7 rows in set (0.000 sec)

MariaDB [(none)]> Bye
}}
Bactrace shows the same stuff:

{{
Thread 4 (Thread 0x7f6f86a52700 (LWP 5664)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x000055dac1249038 in ?? ()
#2 0x000055dac1256a18 in MYSQL_BIN_LOG::reset_logs(THD*, bool, rpl_gtid*, unsigned int, unsigned long)
()
#3 0x000055dac0fb9954 in reset_master(THD*, rpl_gtid*, unsigned int, unsigned long) ()
#4 0x000055dac108cdd8 in reload_acl_and_cache(THD*, unsigned long long, TABLE_LIST*, int*) ()
#5 0x000055dac0f966d9 in mysql_execute_command(THD*) ()
#6 0x000055dac0f9e562 in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) ()
#7 0x000055dac0fa1068 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) ()
#8 0x000055dac0fa1efe in do_command(THD*) ()
#9 0x000055dac107262a in do_handle_one_connection(CONNECT*) ()
#10 0x000055dac107274d in handle_one_connection ()
#11 0x00007f6f9867d184 in start_thread (arg=0x7f6f86a52700) at pthread_create.c:312
#12 0x00007f6f97da003d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
}}

Comment by David Kedves [ 2018-10-16 ]

Update: this still seems to happen with MariaDB 10.3.10 as well

mariadb-server-10.3 1:10.3.10+maria~trusty

Bactrace still similar:

Thread 5 (Thread 0x7f500c1eb700 (LWP 5251)):
#0 pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1 0x00005574563c6978 in ?? ()
#2 0x00005574563d42d0 in MYSQL_BIN_LOG::reset_logs(THD*, bool, rpl_gtid*, unsigned int, unsigned long)
()
#3 0x0000557456139354 in reset_master(THD*, rpl_gtid*, unsigned int, unsigned long) ()
#4 0x000055745620dce6 in reload_acl_and_cache(THD*, unsigned long long, TABLE_LIST*, int*) ()
#5 0x0000557456115e88 in mysql_execute_command(THD*) ()
#6 0x000055745611dd12 in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) ()
#7 0x0000557456120918 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) ()
#8 0x00005574561217ae in do_command(THD*) ()
#9 0x00005574561f350a in do_handle_one_connection(CONNECT*) ()
#10 0x00005574561f362d in handle_one_connection ()
#11 0x00007f500f62e184 in start_thread (arg=0x7f500c1eb700) at pthread_create.c:312
#12 0x00007f500ed5103d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

And MySQL CLI and mysqld seems to stuck :-S

Comment by Sujatha Sivakumar (Inactive) [ 2019-05-31 ]

Hello David,

Thanks for reporting this issue. It would be great if you can provide a backtrace which includes line numbers and filenames, for all the thread. Can please try installing the debug info packages and provide backtrace during the hang scenario.

https://mariadb.com/kb/en/library/yum/#installing-debug-info-packages-with-yum

Comment by David Kedves [ 2019-10-02 ]

Hi,

bt.txt my.cnf.txt processlist.txt psaux.txt pstree.txt rpmqa.txt

It just happened to me again, this time on CentOS6 (on a qemu cloud image). It didn't "freeze" now on reset logs but on a create user command after that, however in processlist i still see the "reset master" in initializing state.. and i guess this makes the "create user" to be deadlocked somehow when mysqld tries to write into the binlogs... (just guessing..) anyhow, this time i could capture a backtrace with line numbers and everything, please see the new set of attachment to this comment.

Thanks, David

Comment by Sachin Setiya (Inactive) [ 2019-10-06 ]

Hi kedazo

Thanks for providing backtrace , Can you provide mysqld.log file also ?, In your earlier mysqld.log file

2018-06-25 15:20:03 73 [Note] Semi-sync replication initialized for transactions.
2018-06-25 15:20:03 73 [Note] Semi-sync replication enabled on the master.
2018-06-25 15:20:03 0 [Note] Starting ack receiver thread
2018-06-25 15:20:06 79 [Note] Semi-sync replication switched OFF.
2018-06-25 15:20:06 79 [Note] Semi-sync replication disabled on the master.

So is this server master ?, or there is no replication involved at all ?

Comment by David Kedves [ 2019-10-06 ]

Sorry I have destroyed this VM already... well it is a single instance, so that is right, there is no replication involved at all.
I will go and reproduce this tomorrow (it is easily reproducible with this my.cnf, it happens always), I will attach the log file then.

Comment by Sachin Setiya (Inactive) [ 2019-10-06 ]

Hi kedazo,

Thanks ,
I have 3 more questions
1. does server have some data or it was completely new instance
2. If you have some data then , then i think you are , issuing reset master after loading of data.
3. is it possible for you to upload the data on mariadb private server. So that i can simulate it?
https://mariadb.com/kb/en/meta/mariadb-ftp-server/

Comment by Andrei Elkin [ 2019-12-02 ]

The latest bt attachment https://jira.mariadb.org/secure/attachment/45808/backtrace.txt (thanks, kedazo) contains symbols and suggests Thread 4 "RESET MASTER" is waiting for the end of binlog checkpoint request while holing a mutex lock which Thread 2 is waiting for. It's no a deadlock between the two. And it's unclear about the reason of binlog checkpoint response delay or loss. There's no other thread that keeps a running transaction that might be blamed for the delay.

Overall, I think we need some more specific instructions to how to reproduce.

Generated at Thu Feb 08 08:30:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.