Details
- Type: Bug
- Priority: Critical
- Status: Closed
- Resolution: Fixed
- Affects Versions: 10.4.18, 10.5.12, 10.5.15, 10.6.7
- Environment: Windows Server 2012, 2016
Description
Our custom app went through an install on 2021-07-29 in which we dumped the master DB (with master info/position included), imported it into the new slave, and proceeded to run with replication; this starts at line 87 of the attached file.
On 2021-08-04, we upgraded our custom app (this does NOT upgrade MariaDB), which runs the following commands between the times shown:
2021-08-04 10:01:39
stop slave;
CHANGE MASTER TO MASTER_CONNECT_RETRY = 1, MASTER_HEARTBEAT_PERIOD = 90, MASTER_USER = 'mvp_repl_slave', MASTER_PASSWORD = '####';
start slave;
DELETE FROM user WHERE !((User='root' AND Host='localhost') OR (User='mariadb.sys' AND Host='localhost'));
FLUSH PRIVILEGES;
GRANT SELECT, INSERT, UPDATE, DELETE, EXECUTE, CREATE, DROP, CREATE VIEW, SHOW VIEW, FILE, SUPER, REPLICATION CLIENT ON *.* TO mvp_local@'localhost' IDENTIFIED BY '####';
GRANT SELECT, EXECUTE, SUPER, REPLICATION CLIENT ON *.* TO mvp_peer@'192.168.2.2' IDENTIFIED BY '####';
GRANT SELECT, EXECUTE, SUPER, REPLICATION CLIENT ON *.* TO mvp_peer@'192.168.2.3' IDENTIFIED BY '####';
FLUSH PRIVILEGES;
2021-08-04 10:01:52
The app then stops the service (2021-08-04 10:01:57) and restarts it (2021-08-04 10:02:03).
This resulted in the following crash, which has NOT been readily reproducible:
ntdll.dll!RtlpUnWaitCriticalSection()
ntdll.dll!RtlEnterCriticalSection()
ntdll.dll!RtlEnterCriticalSection()
mysqld.exe!mysql_manager_submit()[sql_manager.cc:51]
mysqld.exe!rpl_slave_state::update()[rpl_gtid.cc:358]
mysqld.exe!rpl_load_gtid_slave_state()[rpl_rli.cc:1930]
mysqld.exe!binlog_background_thread()[log.cc:10026]
mysqld.exe!pthread_start()[my_winthread.c:62]
ucrtbase.dll!o_realloc_base()
KERNEL32.DLL!BaseThreadInitThunk()
ntdll.dll!RtlUserThreadStart()
Attachments
Issue Links
- relates to MDEV-33799 mysql_manager_submit Segfault at Startup Still Possible During Recovery (Closed)
Hi Elkin
Thank you very much for that recommendation. With gtid_cleanup_batch_size=1024 I have been unable to reproduce the crash in over 10,000 restarts of an active slave with 16 threads updating the master. So this very much looks like the problem; no doubt Brandon's patch fixes it, and we also have an effective workaround: simply raise gtid_cleanup_batch_size until the likelihood of hitting the race condition is acceptably low.
Thank you both!
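For anyone applying the workaround described above, it can be set at runtime without a restart (a sketch; 1024 is simply the value that worked in the testing reported here, and the right value depends on your workload):

```sql
-- Prune the slave GTID state less frequently, narrowing the window
-- for the startup race condition described in this report.
-- Takes effect immediately on a running server:
SET GLOBAL gtid_cleanup_batch_size = 1024;

-- Confirm the new value:
SHOW GLOBAL VARIABLES LIKE 'gtid_cleanup_batch_size';
```

To make the setting survive a restart, also add `gtid_cleanup_batch_size=1024` under the `[mysqld]` section of the server option file (e.g. my.ini on Windows).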