[MDEV-27127] slave hung at shutdown Created: 2021-11-26  Updated: 2022-08-31

Status: Open
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.4.17, 10.4.18, 10.4.19, 10.4.20, 10.4.21, 10.4.22
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Jozef Kováč Assignee: Brandon Nesterenko
Resolution: Unresolved Votes: 0
Labels: hang, shutdown
Environment:

Oracle Linux 8.4
Mariadb binary distribution x86_64 for systemd


Attachments: File mariadb_broken_shutdown.log     File mariadb_shutdown_gdb.log     File my.cnf    
Issue Links:
Relates
relates to MDEV-28948 FLUSH BINARY LOGS waits/hangs on mysq... Closed

 Description   

Mariadb stuck(hang) at shutdown approximately 1/3 times with active slave connection, execuiting STOP ALL SLAVES before shutdown helps.

Configuration and strace added to attachment.

Shutdown from error_log (i eventually killed with -9, nothing else works, you can wait for hours)
Nov 26 13:49:34 x1 systemd[1]: Stopping MariaDB 10.4.x database server...
Nov 26 13:49:34 x1 mysqld[3164320]: 2021-11-26 13:49:34 0 [Note] /usr/local/mysql/bin/mysqld (initiated by: unknown): Normal shutdown
Nov 26 13:49:34 x1 mysqld[3164320]: 2021-11-26 13:49:34 0 [Note] Event Scheduler: Purging the queue. 0 events
Nov 26 13:49:34 x1 mysqld[3164320]: 2021-11-26 13:49:34 0 [ERROR] InnoDB: (Duplicate key) writing word node to FTS auxiliary index table `d_8nkc96648k3uln`.`jos_vm_product_param_xref`
Nov 26 13:49:34 x1 mysqld[3164320]: 2021-11-26 13:49:34 0 [Note] InnoDB: FTS optimize thread exiting.
Nov 26 13:54:34 x1 systemd[1]: mariadb.service: State 'stop-sigterm' timed out. Skipping SIGKILL.
Nov 26 13:56:39 x1 mysqld[3164320]: 2021-11-26 13:56:39 24 [ERROR] Master 'x5': Error reading packet from server: Lost connection to MySQL server during query (server_errno=2013)
Nov 26 13:56:39 x1 mysqld[3164320]: 2021-11-26 13:56:39 24 [Note] Master 'x5': Slave I/O thread: Failed reading log event, reconnecting to retry, log 'binary_log.010514' at position 97604371; GTID position '1-13-2385195623,2-14-212>
Nov 26 13:56:39 x1 mysqld[3164320]: 2021-11-26 13:56:39 24 [ERROR] Master 'x5': Slave I/O: error reconnecting to master 'replicator@10.1.1.25:3306' - retry-time: 60 maximum-retries: 86400 message: Can't connect to MySQL server on>
Nov 26 13:57:24 x1 systemd[1]: mariadb.service: Main process exited, code=killed, status=9/KILL
Nov 26 13:57:24 x1 systemd[1]: mariadb.service: Failed with result 'timeout'.



 Comments   
Comment by Marko Mäkelä [ 2021-11-26 ]

Please attach a debugger to the server after you have initiated a shutdown and produce a stack trace of all threads that are executing. In any GNU/Linux system, the following should work:

gdb -ex 'set height 0' -ex 'thread apply all backtrace' -ex 'quit' -p $(pgrep -x mysqld)

On some systems, attaching to an arbitrary process will require special privileges. For example, the kernels shipped by Ubuntu would require that the command be prefixed with sudo.

Comment by Jozef Kováč [ 2021-11-29 ]

Different setup but most likely same problem. Added attachment mariadb_shutdown_gdb.log mariadb_shutdown_gdb.log

Comment by Marko Mäkelä [ 2021-11-29 ]

lathander, thank you. I checked all InnoDB threads in mariadb_shutdown_gdb.log, and they are idle. Apparently the InnoDB shutdown has not been initiated yet. Thread 1 is waiting for a mi->rli.stop_cond in terminate_slave_thread(). Several threads are waiting for COND_bin_log_updated in MYSQL_BIN_LOG::wait_for_update_binlog_end_pos().

Comment by Jozef Kováč [ 2022-08-31 ]

Can`t reproduce in 10.4.25,26 or 10.5.17.

Generated at Thu Feb 08 09:50:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.