Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
10.5.28
-
None
Description
The code for SHOW SLAVE STATUS accesses mi->rli.sql_driver_thd->proc_info without holding mi->run_lock. This means the THD can be freed in the middle of the access (for example by a concurrent STOP SLAVE), causing SHOW SLAVE STATUS to read invalid memory and crash the server.
This appeared as an MSAN test failure in Buildbot. The precise test probably doesn't matter, as almost any replication test could hit this rare race:
https://buildbot.mariadb.org/#/builders/640/builds/9922
I could reproduce it with a hacked test case that uses sleeps, together with a small code patch that injects sleeps into the server to make the race easier to hit:
--source include/master-slave.inc

--connection slave1
send STOP SLAVE;

--connection slave
--sleep 0.5
send SHOW SLAVE STATUS;

--connection slave1
reap;
--sleep 0.5
START SLAVE;

--connection slave
reap;

--source include/rpl_end.inc

diff --git a/sql/slave.cc b/sql/slave.cc
index 6f4176f233d..3e4df0ff4e6 100644
--- a/sql/slave.cc
+++ b/sql/slave.cc
@@ -3343,8 +3343,10 @@ static bool send_show_master_info_data(THD *thd, Master_info *mi, bool full,
     // SQL_Remaining_Delay
     // THD::proc_info is not protected by any lock, so we read it once
     // to ensure that we use the same value throughout this function.
+    THD *sql_driver= mi->rli.sql_driver_thd;
+    my_sleep(2000000);
     const char *slave_sql_running_state=
-      mi->rli.sql_driver_thd ? mi->rli.sql_driver_thd->proc_info : "";
+      sql_driver ? sql_driver->proc_info : "";
     if (slave_sql_running_state == stage_sql_thd_waiting_until_delay.m_name)
     {
       time_t t= my_time(0), sql_delay_end= mi->rli.get_sql_delay_end();
@@ -5839,6 +5841,7 @@ pthread_handler_t handle_slave_sql(void *arg)
      could be used by slave through Relay_log_info::save_temporary_tables.
   */
   thd->temporary_tables= 0;
+my_sleep(1000000);
   rli->sql_driver_thd= 0;
   thd->rgi_fake= thd->rgi_slave= NULL;

Setting 10.5 as the target as this is a crashing bug.
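For illustration only, the locking pattern whose absence causes the race can be sketched in isolation. This is not the server code and not necessarily how the bug was fixed: Worker and Owner are hypothetical stand-ins for THD and Master_info, and a std::mutex plays the role of mi->run_lock. The sketch assumes the owning thread always clears the pointer under the same lock before it destroys the object.

```cpp
#include <mutex>
#include <string>

// Hypothetical stand-in for THD; only the field relevant to the race.
struct Worker {
  const char *proc_info;
};

// Hypothetical stand-in for Master_info / Relay_log_info.
struct Owner {
  std::mutex run_lock;       // plays the role of mi->run_lock
  Worker *worker = nullptr;  // plays the role of mi->rli.sql_driver_thd

  // Called by the worker thread once its state object is ready.
  void attach(Worker *w) {
    std::lock_guard<std::mutex> guard(run_lock);
    worker = w;
  }

  // Called by the worker thread BEFORE it destroys its state object.
  void detach() {
    std::lock_guard<std::mutex> guard(run_lock);
    worker = nullptr;
  }

  // Reader side (the SHOW SLAVE STATUS analogue): dereference the pointer
  // only while holding the lock, and copy the value out before releasing it.
  std::string running_state() {
    std::lock_guard<std::mutex> guard(run_lock);
    return (worker && worker->proc_info) ? worker->proc_info : "";
  }
};
```

Because detach() cannot complete while running_state() holds the lock, the reader never dereferences a pointer to an object that is being torn down. The injected my_sleep() calls in the patch above widen exactly the window that such a lock would close.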