[MDEV-20391] Add information about the progress of transaction rollback (and waiting for replication threads to complete) to the error log upon shutdown Created: 2019-08-20  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Replication, Storage Engine - InnoDB
Fix Version/s: 10.4

Type: Task Priority: Major
Reporter: Valerii Kravchuk Assignee: Andrei Elkin
Resolution: Unresolved Votes: 1
Labels: innodb, logging, replication


 Description   

Currently when shutdown happens, innodb_fast_shutdown<>2 snd InnoDB has to roll back some huge transaction, there is no output (but one empty row) until this process is completed. Consider this error log sample:

2019-08-19 16:08:58 139669736232704 [Note] /usr/sbin/mysqld (initiated by: unknown): Normal shutdown
2019-08-19 16:08:58 139669859714816 [Note] InnoDB: FTS optimize thread exiting.
2019-08-19 16:08:58 139669736232704 [Note] Event Scheduler: Purging the queue. 0 events
2019-08-19 16:08:58 139669751666432 [Note] Error reading relay log event: slave SQL thread was killed
2019-08-19 16:08:58 139669770405632 [Note] Slave I/O thread exiting, read up to log 'binlog.1519705', position 10122889; GTID position 0-198067036-22956651797,999-186069024-2
2019-08-19 16:08:59 139669751666432 [Note] Slave SQL thread exiting, replication stopped in log 'binlog.1519705' at position 10122889; GTID position '0-198067036-22956651797,999-186069024-2'
2019-08-19 16:09:19 139669736232704 [Warning] /usr/sbin/mysqld: Forcing close of thread 4999326 user: 'user1'
2019-08-19 16:09:19 139669736232704 [Warning] /usr/sbin/mysqld: Forcing close of thread 4999326 user: ‘user1’
 
2019-08-19 17:22:46 139669736232704 [Note] unregister_replicator OK
2019-08-19 17:22:46 139669736232704 [Note] InnoDB: Starting shutdown...
2019-08-19 17:22:46 139669817751296 [Note] InnoDB: Dumping buffer pool(s) to /mysql/ppc/data/ib_buffer_pool

As you can see, we get nothing but one empty row for more than an hour. Backtrace analysis shows that there is only one active thread doing rollback of a huge transaction, but there is no way to find this out without OS level tools like gdb, perf or bpftrace.

How poor DBA should know that something is in progress and there is no hang? Please, add information about the transaction(s) rollback stage (maybe if they work for longer than few seconds at least) like it was done for other stages of normal shutdown above, and consider adding progress reports similar to those that are written upon startup and recovery.

On a separate note, in versions with MDEV-18450 implemented it would be useful to add the output about waiting for replication-related dump threads to be stopped properly. I may be wrong, but cursory review of the related commits does not seem to give me a confidence that there is a proper reporting of the progress on that stage in non-debug binaries.



 Comments   
Comment by Andrei Elkin [ 2022-11-07 ]

ralf.gebhardt, you refer to the 1st part's (of actually two reported issues) related:

2019-08-19 16:09:19 139669736232704 [Warning] /usr/sbin/mysqld: Forcing close of thread 4999326 user: ‘user1’

2019-08-19 17:22:46 139669736232704 [Note] unregister_replicator OK

which is apparently to marko.

I'd rather we untie the two to settle the Dump thread shutdown progress

On a separate note, in versions with MDEV-18450 implemented it would be useful to add the output about waiting for replication-related dump threads to be stopped properly.

in a new MDEV.

Comment by Marko Mäkelä [ 2022-11-07 ]

The InnoDB shutdown could be sped up by setting innodb_fast_shutdown=3 (MDEV-15832). However, then the rollback would be executed in a background thread during the next server startup. InnoDB rollback can be extremely slow; implementing a change of secondary index record format (MDEV-17598) should make it faster.

Comment by Andrei Elkin [ 2022-11-07 ]

valerii, to the MDEV-18450 note actually a progress reporting exists. To activate it one sets log_warnings > 2 to get the current status of a dump thread once in a minute:

2022-11-07 13:54:48 6 [Note] Start binlog_dump to slave_server(2), pos(master-bin.000001, 329), using_gtid(1), gtid('')
2022-11-07 13:56:30 6 [Note] master sends heartbeat message ./master-bin.000001:484
2022-11-07 13:58:12 0 [Note] 10.11/A/RUN/sql/mariadbd (initiated by: root[root] @ localhost [127.0.0.1]): Normal shutdown
2022-11-07 13:58:12 0 [Warning] Dump thread 6 last sent to server 2 binlog file:pos master-bin.000001:688
2022-11-07 13:59:18 0 [Warning] Dump thread 6 last sent to server 2 binlog file:pos master-bin.000001:688
2022-11-07 14:01:26 0 [Warning] Dump thread 6 last sent to server 2 binlog file:pos master-bin.000001:688

(It doesn't keep moving forward in my simulation though)

Please tell us if anything more is needed.

Comment by Valerii Kravchuk [ 2022-11-07 ]

Is the above I miss some number showing the progress achieved over time.(position remains the same), for DBA to predict when the process may end. I think it may, indeed, be a topic for a separate, new replication-related MDEV.

My main request here is for InnoDB rollback progress reporting, hence the Component used.

Comment by Andrei Elkin [ 2022-11-07 ]

valerii, well, my example constant offset was unnatural caused by my gdb simulation (I just stalled the dump thread). To estimate the progress is still possible with finding the END position of binlog. Sure at the shutdown time one would have to
do with external tools e.g ls -s master-bin.000001, and that's what a good warning should provide at once.

We can improve of course.

Generated at Thu Feb 08 08:59:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.