MariaDB Server

MDEV-26473: mysqld got exception 0xc0000005 (rpl_slave_state/rpl_load_gtid_slave_state)

Details

    Description

      Our custom app went through an install on 2021-07-29 where we dumped the master DB (with master info/pos included), imported it into the new slave, and proceeded to run with replication - this started in line 87 of the attached file.

      On 2021-08-04, we upgraded our custom app (does NOT upgrade MariaDB) which runs the following commands between the times shown:

      2021-08-04 10:01:39

      stop slave;
      CHANGE MASTER TO MASTER_CONNECT_RETRY = 1, MASTER_HEARTBEAT_PERIOD = 90, MASTER_USER = 'mvp_repl_slave', MASTER_PASSWORD = '####';
      start slave;
      DELETE FROM user WHERE !((User='root' AND Host='localhost') OR (User='mariadb.sys' AND Host='localhost'));
      FLUSH PRIVILEGES;
      GRANT SELECT, INSERT, UPDATE, DELETE, EXECUTE, CREATE, DROP, CREATE VIEW, SHOW VIEW, FILE, SUPER, REPLICATION CLIENT ON *.* TO mvp_local@'localhost' IDENTIFIED BY '####';
      GRANT SELECT, EXECUTE, SUPER, REPLICATION CLIENT ON *.* TO mvp_peer@'192.168.2.2' IDENTIFIED BY '####';
      GRANT SELECT, EXECUTE, SUPER, REPLICATION CLIENT ON *.* TO mvp_peer@'192.168.2.3' IDENTIFIED BY '####';
      FLUSH PRIVILEGES;

      2021-08-04 10:01:52

      The app then stops the service (2021-08-04 10:01:57) and restarts it (2021-08-04 10:02:03).

      This resulted in the following error, which has NOT been readily reproducible:

      ntdll.dll!RtlpUnWaitCriticalSection()
      ntdll.dll!RtlEnterCriticalSection()
      ntdll.dll!RtlEnterCriticalSection()
      mysqld.exe!mysql_manager_submit()[sql_manager.cc:51]
      mysqld.exe!rpl_slave_state::update()[rpl_gtid.cc:358]
      mysqld.exe!rpl_load_gtid_slave_state()[rpl_rli.cc:1930]
      mysqld.exe!binlog_background_thread()[log.cc:10026]
      mysqld.exe!pthread_start()[my_winthread.c:62]
      ucrtbase.dll!o_realloc_base()
      KERNEL32.DLL!BaseThreadInitThunk()
      ntdll.dll!RtlUserThreadStart()

      Activity

            juan.vera (Juan) added a comment (edited)

            Hi Elkin,
            Thank you very much for that recommendation. With gtid_cleanup_batch_size=1024 I have been unable to reproduce the crash in over 10,000 restarts of an active slave with 16 threads updating the master. So this very much looks like it was the problem; no doubt Brandon's patch fixes it, and we also have an effective workaround: simply raise gtid_cleanup_batch_size until the chance of hitting the race condition is acceptably low.

            Thank you both!
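            The workaround described above can be sketched as follows. This is an illustrative fragment based on Juan's comment, not an official recommendation; 1024 is simply the value he tested with:

            ```sql
            -- Raise the GTID cleanup batch size so that background garbage
            -- collection of mysql.gtid_slave_pos runs less often, narrowing
            -- the window for the race condition.
            SET GLOBAL gtid_cleanup_batch_size = 1024;
            ```

            To make the setting survive restarts, the same value would also go into the server config file (e.g. gtid_cleanup_batch_size=1024 under [mysqld] in my.ini).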

            Elkin (Andrei Elkin) added a comment

            Review notes are made on GH.

            Elkin (Andrei Elkin) added a comment (edited)

            bnestere: I think

            > larger numbers would only delay the crash

            At first I thought it would prevent any crash, but it actually depends on a number of factors, one of which is the unpredictable pace of the binlog background thread. So in theory it could be lazy at shutdown time while the table holds more than 32K records, and then at restart the garbage collection run during initialization may hit that not-yet-initialized mutex.

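            Elkin's point above can be checked empirically. A sketch (assuming the default slave-state table mysql.gtid_slave_pos) to gauge how many rows are pending cleanup around shutdown time:

            ```sql
            -- If this count is well above gtid_cleanup_batch_size at shutdown,
            -- a garbage-collection run early in the next startup becomes more likely.
            SELECT COUNT(*) AS pending_rows FROM mysql.gtid_slave_pos;
            SHOW GLOBAL VARIABLES LIKE 'gtid_cleanup_batch_size';
            ```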
            dimavn (Dim) added a comment

            @Andrei, if I set gtid_cleanup_batch_size=1024 like @Juan mentioned, does it prevent the crash completely?


            bnestere (Brandon Nesterenko) added a comment

            Hi juan.vera,

            That is correct. And for completeness, this bug should also exist in all released versions of 10.6, 10.7, and 10.8; that is, downgrading within the 10.6+ series will not circumvent it. 10.5.8 is the most recent unaffected version.

            • Brandon

            People

              bnestere (Brandon Nesterenko)
              paddyK (Pat K)
              Votes: 2
              Watchers: 11

