Details
-
Task
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
Description
Currently when shutdown happens, innodb_fast_shutdown<>2 snd InnoDB has to roll back some huge transaction, there is no output (but one empty row) until this process is completed. Consider this error log sample:
2019-08-19 16:08:58 139669736232704 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
|
2019-08-19 16:08:58 139669859714816 [Note] InnoDB: FTS optimize thread exiting.
|
2019-08-19 16:08:58 139669736232704 [Note] Event Scheduler: Purging the queue. 0 events
|
2019-08-19 16:08:58 139669751666432 [Note] Error reading relay log event: slave SQL thread was killed
|
2019-08-19 16:08:58 139669770405632 [Note] Slave I/O thread exiting, read up to log 'binlog.1519705', position 10122889; GTID position 0-198067036-22956651797,999-186069024-2
|
2019-08-19 16:08:59 139669751666432 [Note] Slave SQL thread exiting, replication stopped in log 'binlog.1519705' at position 10122889; GTID position '0-198067036-22956651797,999-186069024-2'
|
2019-08-19 16:09:19 139669736232704 [Warning] /usr/sbin/mysqld: Forcing close of thread 4999326 user: 'user1'
|
2019-08-19 16:09:19 139669736232704 [Warning] /usr/sbin/mysqld: Forcing close of thread 4999326 user: ‘user1’
|
|
2019-08-19 17:22:46 139669736232704 [Note] unregister_replicator OK
|
2019-08-19 17:22:46 139669736232704 [Note] InnoDB: Starting shutdown...
|
2019-08-19 17:22:46 139669817751296 [Note] InnoDB: Dumping buffer pool(s) to /mysql/ppc/data/ib_buffer_pool
|
As you can see, we get nothing but one empty row for more than an hour. Backtrace analysis shows that there is only one active thread doing rollback of a huge transaction, but there is no way to find this out without OS level tools like gdb, perf or bpftrace.
How poor DBA should know that something is in progress and there is no hang? Please, add information about the transaction(s) rollback stage (maybe if they work for longer than few seconds at least) like it was done for other stages of normal shutdown above, and consider adding progress reports similar to those that are written upon startup and recovery.
On a separate note, in versions with MDEV-18450 implemented it would be useful to add the output about waiting for replication-related dump threads to be stopped properly. I may be wrong, but cursory review of the related commits does not seem to give me a confidence that there is a proper reporting of the progress on that stage in non-debug binaries.