[MDEV-25611] RESET MASTER still causes the server to hang Created: 2021-05-06  Updated: 2023-12-05

Status: Stalled
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.5, 10.6
Fix Version/s: 10.5

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Andrei Elkin
Resolution: Unresolved Votes: 1
Labels: hang, rr-profile, rr-profile-analyzed

Issue Links:
Blocks
is blocked by MDEV-24302 RESET MASTER hangs as Innodb does not... Closed
Relates
relates to MDEV-14462 Confusing error message: ib_logfiles ... Closed
relates to MDEV-25948 Remove log_flush_task Closed
relates to MDEV-18959 Engine transaction recovery through p... Stalled
relates to MDEV-24660 MYSQL_BIN_LOG::cleanup(): Assertion `... Closed

 Description   

We got a hung server:

10.6 2ceadb390309f9c29786bb5ec2a6459dfe88344e

versioning.rpl 'innodb,row,trx_id'       w3 [ fail ]  timeout after 900 seconds
        Test ended at 2021-05-06 15:23:30
 
Test case timeout after 900 seconds
 
== D:/winx64-debug/build/mysql-test/var/3/log/rpl.log == 
include/master-slave.inc
 
 == D:/winx64-debug/build/mysql-test/var/3/tmp/analyze-timeout-mysqld.2.err ==
SHOW PROCESSLIST;
Id	User	Host	db	Command	Time	State	Info	Progress
16	root	localhost:38509	NULL	Query	0	starting	SHOW PROCESSLIST	0.000
 
 == D:/winx64-debug/build/mysql-test/var/3/tmp/analyze-timeout-mysqld.1.err ==
SHOW PROCESSLIST;
Id	User	Host	db	Command	Time	State	Info	Progress
15	root	localhost:37368	test	Query	960	starting	RESET MASTER	0.000
17	root	localhost:38510	NULL	Query	0	starting	SHOW PROCESSLIST	0.000
 
mysqltest failed but provided no output

Among the stack traces, no InnoDB code is executing (only buf_flush_page_cleaner() is in an untimed wait). The culprit for the hang seems to be a thread that is executing RESET MASTER:

10.6 2ceadb390309f9c29786bb5ec2a6459dfe88344e

server!pthread_cond_wait(struct _RTL_CONDITION_VARIABLE * cond = 0x00007fff`622137a8, struct _RTL_CRITICAL_SECTION * mutex = 0x00007fff`62213778) [D:\winx64-debug\build\src\mysys\my_wincond.c @ 91]
server!psi_cond_wait(struct st_mysql_cond * that = 0x00007fff`622137a8, struct st_mysql_mutex * mutex = 0x00007fff`62213778, char * file = 0x00007fff`60671dc8 "D:\winx64-debug\build\src\sql\log.cc", unsigned int line = 0x10f8) [D:\winx64-debug\build\src\mysys\my_thr_init.c @ 596]
server!inline_mysql_cond_wait(struct st_mysql_cond * that = 0x00007fff`622137a8, struct st_mysql_mutex * mutex = 0x00007fff`62213778, char * src_file = 0x00007fff`60671dc8 "D:\winx64-debug\build\src\sql\log.cc", unsigned int src_line = 0x10f8) [D:\winx64-debug\build\src\include\mysql\psi\mysql_thread.h @ 1070]
server!MYSQL_BIN_LOG::reset_logs(class THD * thd = 0x00000227`de152788, bool create_new_log = true, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\log.cc @ 4345]
server!reset_master(class THD * thd = 0x00000227`de152788, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [D:\winx64-debug\build\src\sql\sql_repl.cc @ 3966]
server!reload_acl_and_cache(class THD * thd = 0x00000227`de152788, unsigned int64 options = 0x80, struct TABLE_LIST * tables = 0x00000000`00000000, int * write_to_binlog = 0x00000063`731edb74) [D:\winx64-debug\build\src\sql\sql_reload.cc @ 362]
server!mysql_execute_command(class THD * thd = 0x00000227`de152788) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 5452]
server!mysql_parse(class THD * thd = 0x00000227`de152788, char * rawbuf = 0x00000227`de1ca690 "--- memory read error at address 0x00000227`de1ca690 ---", unsigned int length = 0xc, class Parser_state * parser_state = 0x00000063`731ee950) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 8018]
server!dispatch_command(enum_server_command command = COM_QUERY (0n3), class THD * thd = 0x00000227`de152788, char * packet = 0x00000227`ddfceec9 "--- memory read error at address 0x00000227`ddfceec9 ---", unsigned int packet_length = 0xc, bool blocking = false) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 1899]
server!do_command(class THD * thd = 0x00000227`de152788, bool blocking = false) [D:\winx64-debug\build\src\sql\sql_parse.cc @ 1406]
server!threadpool_process_request(class THD * thd = 0x00000227`de152788) [D:\winx64-debug\build\src\sql\threadpool_common.cc @ 394]
server!tp_callback(struct TP_connection * c = 0x00000227`ddf80e30) [D:\winx64-debug\build\src\sql\threadpool_common.cc @ 203]
server!tp_callback(struct _TP_CALLBACK_INSTANCE * instance = 0x00000063`731ef968, void * context = 0x00000227`ddf80e30) [D:\winx64-debug\build\src\sql\threadpool_win.cc @ 279]
server!io_completion_callback(struct _TP_CALLBACK_INSTANCE * instance = 0x00000063`731ef968, void * context = 0x00000227`ddf80e30, void * overlapped = 0x00000227`ddf80ea0, unsigned long io_result = 0, unsigned int64 nbytes = 0x11, struct _TP_IO * io = 0x00000227`de05a5e0) [D:\winx64-debug\build\src\sql\threadpool_win.cc @ 299]

I think that also 10.5 must be affected by this. On older server versions, it is probably harder to reproduce this, because we never observed MDEV-24302 on older versions than 10.5. This could correlate with scheduling improvements in 10.5 (MDEV-23855) and 10.6 (MDEV-20612, MDEV-25404); earlier, there could have been more scattered calls to write InnoDB log that would wake up RESET MASTER from its wait.



 Comments   
Comment by Marko Mäkelä [ 2021-05-12 ]

I would suggest that this code be refactored. The handlerton::commit_checkpoint_request and trx_commit_complete_for_mysql() should be removed altogether, and InnoDB should notify the binlog via some simpler interface once the transaction state change has become persistent.

Such simplification would make it easy to streamline the transaction commits further in MDEV-14462.

Comment by Marko Mäkelä [ 2021-06-24 ]

For what it is worth, after MDEV-25948 the calls to log_write_up_to() should be more frequent again, and the probability of these hangs might become lower. (Every single update of log_sys.flushed_to_disk_lsn has the potential to ‘rescue’ a hung RESET MASTER.)

Comment by Roel Van de Paar [ 2022-08-08 ]

I have seen mysqladmin shutdown hang, quite easily reproducible/repeatable, whilst RESET MASTER was running. Only a single input SQL thread is necessary. This was using this testcase from MDEV-24660, using RESET MASTER instead of SHUTDOWN and with RESET MASTER repeated many more times in the input SQL, which overall replayed in random SQL order that modified testcase over and over. Elkin has analayzed those rr traces. Reproducibility is 3 to 4 out of every 10 pquery 15 sec trials with this setup. Only mysqld option required is --log-bin). It is better to use the file (linked first in this comment, and here) but for reference the SQL is listed in this comment.

Comment by Andrei Elkin [ 2022-10-13 ]

angelique.sklavounos, what do you about enhancing versioning.rpl to expose the description reported timeout
more reliably?
Currently there's only one BB trace available, with very little details (perhaps it was again RESET MASTER run by rpl_end.inc).
I suggest to create a new test that would run a ver..rpl's body in a (say 10 time, but fit the exec time under the big-test limit) loop each time ended with an explicit RESET MASTER.
We should have that 10 times speed-up in chasing for the cause of the hang.

Comment by Angelique Sklavounos (Inactive) [ 2022-10-24 ]

Hi Elkin, the test body looped 10 times was committed here: https://github.com/MariaDB/server/commit/1e82f34a33387432f1e91c18cfb1dba6ee646009

The test ran against the winx64-debug platform here and passed: https://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/34673

In cross-ref I see no failures for this test. However, there have been no failures for versioning.rpl with "timeout after 900 seconds" since 2021-06-08.

I pushed the looped test against a recent version of 10.6, but I can revert the code back to 2ceadb39 if that would help.

Comment by Marko Mäkelä [ 2022-11-02 ]

To better faciliate testing, I reverted MDEV-25948 and added the test. I had no luck in reproducing the hang, but I did not try really hard.

Comment by Angelique Sklavounos (Inactive) [ 2022-11-04 ]

Branch bb-10.6-MDEV-25611 has the looped x10 versioning.rpl_25611 test, added on top of 762bcb81b5b~1, i.e. the first commit from MDEV-25948.

Dummy commits were made to the branch, and the winx64-debug builder passed each time:
979406 https://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/34935
379d28 https://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/34986
87dd2d https://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/35005
bdfb4c https://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/35044
0a76df https://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/35047

To better faciliate testing, I reverted MDEV-25948 and added the test. I had no luck in reproducing the hang, but I did not try really hard.

With this branch (bb-10.6-MDEV-25948-MDEV-25611), the windx64-debug builder passed each time:
fdb873 https://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/35018
30fcb6 https://buildbot.askmonty.org/buildbot/builders/winx64-debug/builds/35046

Locally I ran ./mtr --repeat=100 --parallel=10 versioning.rpl_25611{,,,} but could not reproduce the hang.

Comment by Angelique Sklavounos (Inactive) [ 2022-11-04 ]

Based off of this comment, I am unconfirming this issue, but will continue to try to reproduce the hang.

Comment by Marko Mäkelä [ 2022-11-15 ]

It seems to me that the debug injection --debug-dbug=d,ib_log_checkpoint_avoid, which should not affect correctness in any way, reproduces some interesting failures.

On an updated test branch, each test in the the following invocation fails for me:

./mtr --parallel=auto --mysqld=--debug-dbug=d,ib_log_checkpoint_avoid --max-test-fail=0 --force \
binlog{,_encryption}.binlog_incident rpl.rpl_skip_incident binlog.binlog_killed_simulate rpl.rpl_loaddata_fatal

The test rpl.rpl_loaddata_fatal will be terminated by some timeout; all other tests fail due to result differences that suggest that fewer binlog records were written.

rpl.rpl_loaddata_fatal 'stmt'            w5 [ fail ]
        Test ended at 2022-11-15 09:04:07
 
CURRENT_TEST: rpl.rpl_loaddata_fatal
mysqltest: In included file "./include/wait_for_slave_param.inc": 
included from ./include/wait_for_slave_sql_error.inc at line 41:
included from ./include/wait_for_slave_sql_error_and_skip.inc at line 57:
included from /mariadb/10.6m/mysql-test/suite/rpl/t/rpl_loaddata_fatal.test at line 21:
At line 119: Timeout in include/wait_for_slave_param.inc

That test has also failed recently on buildbot, possibly also due to a timeout that caused the test harness to terminate the server with SIGABRT:

bb-10.5-andrei-MDEV-29322 e3954fdadfb52a1b1113ef891281cb7dfc42660a

rpl.rpl_loaddata_fatal 'stmt'            w19 [ fail ]
        Test ended at 2022-08-31 19:03:35
 
CURRENT_TEST: rpl.rpl_loaddata_fatal
mysqltest: At line 25: failed in 'select master_pos_wait('master-bin.000001', 1221, 300, '')': 2013: Lost connection to MySQL server during query

Comment by Angelique Sklavounos (Inactive) [ 2022-11-29 ]

Have built 3dd4bad4b3c7681 on a Windows 10 machine and tried running versioning.rpl_25611 with --debug-dbug=d,ib_log_checkpoint_avoid, but as of now the tests have passed.

Comment by Angelique Sklavounos (Inactive) [ 2022-11-29 ]

Tried again, I believe I was able to recreate the hang with RESET MASTER, using --debug-dbug=d,ib_log_checkpoint_avoid. The var dir has been placed here:
https://drive.google.com/file/d/16I1nttAiQDzN1oPbBVRp4R_2C3tfdDkH/view?usp=sharing

10.6 3dd4bad4b3c7681

versioning.rpl_25611 'innodb,mix,timestamp' w2 [ 45 fail ]  timeout after 900 seconds
        Test ended at 2022-11-29 09:15:12
 
Test case timeout after 900 seconds
 
== C:/Users/sepol/mariadb/server/bld/mysql-test/var/2/log/rpl_25611.log ==
Table   Create Table
t1      CREATE TABLE `t1` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL
) ENGINE=ENGINE DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci WITH SYSTEM VERSIONING
connection master;
drop table t1, t2;
create table t1 (i int) with system versioning partition by system_time limit 8 ( partition p1 history, partition p2 history, partition pn current );
insert into t1 values (1);
update t1 set i = 1;
update t1 set i = 0;
connection slave;
connection master;
drop table t1;
connection slave;
connection slave;
include/stop_slave.inc
reset slave;
include/start_slave.inc
reset master;
 
 == C:/Users/sepol/mariadb/server/bld/mysql-test/var/2/tmp/analyze-timeout-mysqld.1.err ==
SHOW PROCESSLIST;
Id      User    Host    db      Command Time    State   Info    Progress
855     root    localhost:54625 NULL    Binlog Dump     921     Master has sent all binlog to slave; waiting for more updates   NULL    0.000
856     root    localhost:58844 NULL    Query   0       starting        SHOW PROCESSLIST        0.000
 
 == C:/Users/sepol/mariadb/server/bld/mysql-test/var/2/tmp/analyze-timeout-mysqld.2.err ==
SHOW PROCESSLIST;
Id      User    Host    db      Command Time    State   Info    Progress
2536    root    localhost:54442 test    Query   921     starting        reset master    0.000
2573    system user             NULL    Slave_IO        921     Waiting for master to send event        NULL    0.000
2574    system user             test    Slave_SQL       921     closing tables  DROP TABLE `t1` /* generated by server */       0.000
2576    root    localhost:58845 NULL    Query   0       starting        SHOW PROCESSLIST        0.000
 
mysqltest failed but provided no output
The result from queries just before the failure was:
< snip >
Table   Create Table
t1      CREATE TABLE `t1` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL
) ENGINE=ENGINE DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci WITH SYSTEM VERSIONING
connection master;
drop table t1, t2;
create table t1 (i int) with system versioning partition by system_time limit 8 ( partition p1 history, partition p2 history, partition pn current );
insert into t1 values (1);
update t1 set i = 1;
update t1 set i = 0;
connection slave;
connection master;
drop table t1;
connection slave;
connection slave;
include/stop_slave.inc
reset slave;
include/start_slave.inc
reset master;
 
 
 - found 'mar3C75.tmp.dmp', moving it to 'C:/Users/sepol/mariadb/server/bld/mysql-test/var/2/log/versioning.rpl_25611-innodb,mix,timestamp'
 - found 'mariadb-test.dmp', moving it to 'C:/Users/sepol/mariadb/server/bld/mysql-test/var/2/log/versioning.rpl_25611-innodb,mix,timestamp'
 - found 'mariadbd.dmp', moving it to 'C:/Users/sepol/mariadb/server/bld/mysql-test/var/2/log/versioning.rpl_25611-innodb,mix,timestamp'
 - found 'mar3C75.tmp.dmp' (0/5)
 
Trying 'cdb' to get a backtrace
Cannot find cdb. Please Install Debugging tools for Windows
from http://www.microsoft.com/whdc/devtools/debugging/install64bit.mspx (native x64 version)
 - found 'mariadb-test.dmp' (1/5)
 
Trying 'cdb' to get a backtrace
Cannot find cdb. Please Install Debugging tools for Windows
from http://www.microsoft.com/whdc/devtools/debugging/install64bit.mspx (native x64 version)
 - found 'mariadbd.dmp' (2/5)
 
Trying 'cdb' to get a backtrace
Cannot find cdb. Please Install Debugging tools for Windows
from http://www.microsoft.com/whdc/devtools/debugging/install64bit.mspx (native x64 version)
 - saving 'C:/Users/sepol/mariadb/server/bld/mysql-test/var/2/log/versioning.rpl_25611-innodb,mix,timestamp/' to 'C:/Users/sepol/mariadb/server/bld/mysql-test/var/log/versioning.rpl_25611-innodb,mix,timestamp/'
 
Only  1  of 15 completed.
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 25301.472 of 3516 seconds executing testcases
 
Failure: Failed 1/45 tests, 97.78% were successful.
 
Failing test(s): versioning.rpl_25611
 
The log files in var/log may give you some hint of what went wrong.
 
If you want to report this error, please read first the documentation
at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html
 
mysql-test-run: *** ERROR: there were failing test cases

Comment by Angelique Sklavounos (Inactive) [ 2023-01-31 ]

The RESET MASTER hang was reproduced without --debug-dbug=d,ib_log_checkpoint_avoid, on a machine shared with Elkin.

Comment by Marko Mäkelä [ 2023-05-16 ]

On an updated test branch, each test in the the following invocation fails for me:

./mtr --parallel=auto --mysqld=--debug-dbug=d,ib_log_checkpoint_avoid --max-test-fail=0 --force \
binlog{,_encryption}.binlog_incident rpl.rpl_skip_incident binlog.binlog_killed_simulate rpl.rpl_loaddata_fatal

I got an idea that perhaps the following patch would fix this, but it was not the case.

diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index dba0352b5ce..e5527d04601 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -4776,7 +4776,7 @@ void log_flush_notify(lsn_t flush_lsn)
   {
     mysql_mutex_lock(&log_requests.mutex);
     pending= log_requests.start.load(std::memory_order_relaxed);
-    log_flush_notify_and_unlock(pending, flush_lsn);
+    log_flush_notify_and_unlock(pending, log_sys.get_flushed_lsn());
   }
 }
 
@@ -4849,7 +4849,7 @@ static void innodb_log_flush_request(void *cookie)
   persistent, we will notify old requests of completion. Note:
   log_flush_notify() may skip some notifications because it is
   basically assuming that the list is in ascending order of LSN. */
-  log_flush_notify(flush_lsn);
+  log_flush_notify(log_sys.get_flushed_lsn());
   commit_checkpoint_notify_ha(cookie);
 }
 

If I omit the option --debug-dbug=d,ib_log_checkpoint_avoid, then all tests will pass.

Comment by Kristian Nielsen [ 2023-05-16 ]

Seems to be a debug_dbug syntax problem.
Using --debug-dbug=d,... resets any other debug_dbug settings, causing some failures.
Using this command I get no failures:

./mtr --parallel=auto --mysqld=--debug-dbug=+d,ib_log_checkpoint_avoid --max-test-fail=0 --force binlog{,_encryption}.binlog_incident rpl.rpl_skip_incident binlog.binlog_killed_simulate rpl.rpl_loaddata_fatal

Maybe it should be re-tried to run some of the other rqg tests to see if they can reproduce the hang?

Comment by Kristian Nielsen [ 2023-05-16 ]

I created a test case that reliably reproduces a hang in RESET MASTER due to waiting for binlog checkpoint response that does not arrive.

https://github.com/MariaDB/server/commit/008df8e4d689eaa205203923432d7696eae7e0f0

The test case disables in InnoDB the code that syncs the InnoDB redo log once per second, otherwise the hang does not occur.
But this is still an interesting data point. It points to a design problem in RESET MASTER (that it can hang if the innodb redo log does not get synced to the end for some reason).
And in case a binlog checkpoint notification gets lost for some reason (unknown at present), this could similarly cause a RESET MASTER hang.

Comment by Marko Mäkelä [ 2023-05-19 ]

elenst provided a list of hangs on Microsoft Windows similar to the one in the Description. The last such failure occurred on 2021-10-13 on a mariadb-10.6.4 based branch. As far as I can tell, the extra changes on top of the mariadb-10.6.4 release tag should not contribute to this. The failure was like this:

innodb.group_commit_crash 'innodb'       w2 [ fail ]  timeout after 900 seconds
SHOW BINLOG EVENTS LIMIT 4,1;
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-bin.000001	#	Query	1	#	use `test`; insert into t1 select * from t2
delete from t1;
SET binlog_format= mixed;
RESET MASTER;

I do not see any hung InnoDB threads. The thread executing RESET MASTER is waiting here:

server!pthread_cond_timedwait(struct _RTL_CONDITION_VARIABLE * cond = 0x00007fff`0adcbb28, struct _RTL_CRITICAL_SECTION * mutex = 0x00007fff`0adcbaf8, struct timespec * abstime = 0x00000000`00000000) [C:UsersjenkinsworkspaceESDebug-WIN-MTRwin_buildmysysmy_wincond.c @ 82]
server!pthread_cond_wait(struct _RTL_CONDITION_VARIABLE * cond = 0x00007fff`0adcbb28, struct _RTL_CRITICAL_SECTION * mutex = 0x00007fff`0adcbaf8) [C:UsersjenkinsworkspaceESDebug-WIN-MTRwin_buildmysysmy_wincond.c @ 91]
server!psi_cond_wait(struct st_mysql_cond * that = 0x00007fff`0adcbb28, struct st_mysql_mutex * mutex = 0x00007fff`0adcbaf8, char * file = 0x00007fff`08c6c2c0 "C:UsersjenkinsworkspaceESDebug-WIN-MTRwin_buildsqllog.cc", unsigned int line = 0x1105) [C:UsersjenkinsworkspaceESDebug-WIN-MTRwin_buildmysysmy_thr_init.c @ 596]
server!inline_mysql_cond_wait(struct st_mysql_cond * that = 0x00007fff`0adcbb28, struct st_mysql_mutex * mutex = 0x00007fff`0adcbaf8, char * src_file = 0x00007fff`08c6c2c0 "C:UsersjenkinsworkspaceESDebug-WIN-MTRwin_buildsqllog.cc", unsigned int src_line = 0x1105) [C:UsersjenkinsworkspaceESDebug-WIN-MTRwin_buildincludemysqlpsimysql_thread.h @ 1070]
server!MYSQL_BIN_LOG::reset_logs(class THD * thd = 0x000002f3`5efac5b8, bool create_new_log = true, struct rpl_gtid * init_state = 0x00000000`00000000, unsigned int init_state_len = 0, unsigned long next_log_number = 0) [C:UsersjenkinsworkspaceESDebug-WIN-MTRwin_buildsqllog.cc @ 4358]

It should be noted that MDEV-25948 was implemented in MariaDB Server 10.6.3 already, so it did not possibly "fix" this, since this failure occurred on a branch that is ahead of mariadb-10.6.4 by a number of commits.

The last failure on a less modified MariaDB Server branch (1 additional commit on a main branch) was right before MDEV-25948 had been fixed in the main branch.

It is unclear to me what appears to have fixed the problem, and I am not convinced that it has been really fixed. The probability of hang could have been lowered possibly due to some change in the replication side, or some change to InnoDB page flushing (which drives log checkpoints and can trigger log writes) that would make it harder to hit the hang.

knielsen, thank you for looking at this and creating a more usable test case.

Comment by Andrei Elkin [ 2023-06-26 ]

By analysis done so far the hang is most probably was (maybe still is) caused by Innodb not flushing redo log to disk even if RESET MASTER "thinks" it should do. And it does not flush 'cos of not having any real data modification to log.
That suggests a simple workaround for the user - to unhang the RM just execute any dummy modification involving Innodb, and
and a straightforward fix - to do the above in the server when a timeout elapses.

Comment by Julien Fritsch [ 2023-12-05 ]

Automated message:
----------------------------
Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

Comment by JiraAutomate [ 2023-12-05 ]

Automated message:
----------------------------
Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

Generated at Thu Feb 08 09:39:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.