[MDEV-36287] Server crash in SHOW SLAVE STATUS concurrent with STOP SLAVE

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.5.28
Fix Version/s: 12.0.0
Component/s: Replication
Labels: None

Description

The code for SHOW SLAVE STATUS accesses mi->rli.sql_driver_thd->proc_info without holding mi->run_lock. The THD can therefore be destroyed in the middle of the access (for example by a concurrent STOP SLAVE), causing SHOW SLAVE STATUS to read invalid memory and crash the server.
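
As a rough illustration of the locking discipline this implies, here is a minimal, self-contained C++ sketch. It is not the actual patch and does not use MariaDB code: FakeThd, FakeMasterInfo, read_sql_running_state and detach_sql_driver are hypothetical stand-ins, and std::mutex stands in for mi->run_lock. The assumption is simply that the reader and the thread that clears the pointer take the same lock.

```cpp
#include <mutex>
#include <string>

// Hypothetical stand-ins for THD and Master_info; not the MariaDB types.
struct FakeThd {
  const char *proc_info = "";
};

struct FakeMasterInfo {
  std::mutex run_lock;                // plays the role of mi->run_lock
  FakeThd *sql_driver_thd = nullptr;  // plays the role of mi->rli.sql_driver_thd
};

// Reader side (the SHOW SLAVE STATUS role): dereference the THD pointer
// only while holding the lock, and return a copy of the state string.
std::string read_sql_running_state(FakeMasterInfo &mi) {
  std::lock_guard<std::mutex> guard(mi.run_lock);
  const char *state =
      mi.sql_driver_thd ? mi.sql_driver_thd->proc_info : nullptr;
  return state ? state : "";
}

// Writer side (the SQL thread shutdown role): clear the pointer under the
// same lock, before the THD object is destroyed by its owner.
void detach_sql_driver(FakeMasterInfo &mi) {
  std::lock_guard<std::mutex> guard(mi.run_lock);
  mi.sql_driver_thd = nullptr;
}
```

With both sides serialized on the same mutex, the reader sees either a valid THD or a null pointer, never a pointer to an object that is being torn down, provided the owner destroys the THD only after detach_sql_driver() has returned.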

This appeared as an MSAN test failure in Buildbot. The precise test probably doesn't matter, as almost any replication test could hit this rare race:

https://buildbot.mariadb.org/#/builders/640/builds/9922

I could reproduce it with a hacked test case that uses sleeps, plus a small code patch that injects sleeps to make the race easier to hit:

--source include/master-slave.inc
--connection slave1
send STOP SLAVE;
--connection slave
--sleep 0.5
send SHOW SLAVE STATUS;
--connection slave1
reap;
--sleep 0.5
START SLAVE;
--connection slave
reap;
--source include/rpl_end.inc

diff --git a/sql/slave.cc b/sql/slave.cc
index 6f4176f233d..3e4df0ff4e6 100644
--- a/sql/slave.cc
+++ b/sql/slave.cc
@@ -3343,8 +3343,10 @@ static bool send_show_master_info_data(THD *thd, Master_info *mi, bool full,
     // SQL_Remaining_Delay
     // THD::proc_info is not protected by any lock, so we read it once
     // to ensure that we use the same value throughout this function.
+    THD *sql_driver= mi->rli.sql_driver_thd;
+    my_sleep(2000000);
     const char *slave_sql_running_state=
-      mi->rli.sql_driver_thd ? mi->rli.sql_driver_thd->proc_info : "";
+      sql_driver ? sql_driver->proc_info : "";
     if (slave_sql_running_state == stage_sql_thd_waiting_until_delay.m_name)
       time_t t= my_time(0), sql_delay_end= mi->rli.get_sql_delay_end();
@@ -5839,6 +5841,7 @@ pthread_handler_t handle_slave_sql(void *arg)
     could be used by slave through Relay_log_info::save_temporary_tables.
   */
   thd->temporary_tables= 0;
+my_sleep(1000000);
   rli->sql_driver_thd= 0;
   thd->rgi_fake= thd->rgi_slave= NULL;

Setting 10.5 as the target as this is a crashing bug.

People

Assignee: Kristian Nielsen
Reporter: Kristian Nielsen
Votes: 1
Watchers: 3

Dates

Created: 2025-03-13 13:24
Updated: 2025-03-26 09:10
Resolved: 2025-03-15 10:54
