[MDEV-27069] heap-use-after-free in dict_stats_recalc_pool_del() Created: 2021-11-17  Updated: 2021-11-26  Resolved: 2021-11-18

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.5
Fix Version/s: 10.6.6

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, regression-10.6

Issue Links:
Blocks
Problem/Incident
is caused by MDEV-25919 InnoDB reports misleading lock wait t... Closed

 Description   

origin/10.6 6841d1afddf8c9b433f98980f0ff0c940048b431 2021-11-16T17:15:13+02:00
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 116218][rr 3209544 116222]==3209544==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000e2a68 at pc 0x558a4ffb86ce bp 0x7f93c22965a0 sp 0x7f93c2296590
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 116225][rr 3209544 116227]READ of size 4 at 0x6060000e2a68 thread T11
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 117939]2021-11-17 14:20:01 37 [Warning] Aborted connection 37 to db: 'test' user: 'root' host: 'localhost' (This connection closed normally)
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123458]    #0 0x558a4ffb86cd in dict_stats_recalc_pool_del(unsigned long, bool) /data/Server/10.6F/storage/innobase/dict/dict0stats_bg.cc:231
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123516]    #1 0x558a4f92df1b in ha_innobase::info_low(unsigned int, bool) /data/Server/10.6F/storage/innobase/handler/ha_innodb.cc:14629
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123522]    #2 0x558a4f92f9d9 in ha_innobase::analyze(THD*, st_ha_check_opt*) /data/Server/10.6F/storage/innobase/handler/ha_innodb.cc:14932
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123540]    #3 0x558a4f023557 in handler::ha_analyze(THD*, st_ha_check_opt*) /data/Server/10.6F/sql/handler.cc:5095
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123550]    #4 0x558a4ec62c78 in mysql_admin_table /data/Server/10.6F/sql/sql_admin.cc:1212
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123556]    #5 0x558a4ec651f2 in Sql_cmd_optimize_table::execute(THD*) /data/Server/10.6F/sql/sql_admin.cc:1567
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123572]    #6 0x558a4e84abae in mysql_execute_command(THD*, bool) /data/Server/10.6F/sql/sql_parse.cc:5997
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123580]    #7 0x558a4e85706f in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/Server/10.6F/sql/sql_parse.cc:8030
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123582]    #8 0x558a4e82f2e6 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /data/Server/10.6F/sql/sql_parse.cc:1896
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123584]    #9 0x558a4e82c6be in do_command(THD*, bool) /data/Server/10.6F/sql/sql_parse.cc:1404
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123592]    #10 0x558a4ec2c5f2 in do_handle_one_connection(CONNECT*, bool) /data/Server/10.6F/sql/sql_connect.cc:1418
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123610]    #11 0x558a4ec2be83 in handle_one_connection /data/Server/10.6F/sql/sql_connect.cc:1312
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123618]    #12 0x7f93e1ec3608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
# 2021-11-17T14:22:26 [3189143] | [rr 3209544 123620]    #13 0x7f93e1a99292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)
...
SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/10.6F/storage/innobase/dict/dict0stats_bg.cc:231 in dict_stats_recalc_pool_del(unsigned long, bool)
...
Query: OPTIMIZE TABLE
Status: KILL_TIMEOUT
 
pluto:/data/rqg/results/1637150945/TBR-1244/data/rqg/1637150945/164/1/rr
 
RQG
====
# git clone https://github.com/mleich1/rqg --branch experimental RQG
#
# GIT_SHOW: HEAD -> experimental 5d964598d1a7eb7957b1afa9b5950ef927803070 2021-10-29T15:21:01+02:00
# Warning: This version of RQG contains not yet pushed experimental changes
# and so the commandline might need modifications.
#
# rqg.pl  : Version 4.0.2 (2021-10)
#
# $RQG_HOME/rqg1.pl \
# --gendata \
# --vcols \
# --views \
# --grammar=conf/mariadb/instant_add.yy \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --no-mask \
# --queries=10000000 \
# --seed=random \
# --reporters=Backtrace \
# --reporters=ErrorLog \
# --reporters=Deadlock1 \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--loose-innodb-sync-debug \
# --mysqld=--innodb_stats_persistent=on \
# --mysqld=--innodb_adaptive_hash_index=on \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
# --mysqld=--loose-max-statement-time=30 \
# --threads=9 \
# --mysqld=--innodb-use-native-aio=0 \
# --rr=Extended \
# --rr_options=--wait \
# --vardir_type=slow \     <-- Only known to not yet pushed version.
# --mysqld=--innodb_page_size=16K \
# --mysqld=--innodb-buffer-pool-size=8M \
# --no_mask \
# --mtr-build-thread=<local settings> \
# --batch \    <-- Only known to not yet pushed version.
# --basedir1=<local settings> \
# --script_debug=_nix_ \



 Comments   
Comment by Matthias Leich [ 2021-11-17 ]

The RQG grammar contains too much different DDL/DML. Hence I will apply the automatic test simplifier.

Comment by Marko Mäkelä [ 2021-11-17 ]

In the trace, I can see a race:

continue
watch -l *(char*)0x0c0c8001454e
reverse-continue
reverse-continue
backtrace
thread apply 3 backtrace

Two threads are in a race. At the time the memory was freed, Thread 3 (which would crash a little later) was waiting here:

#12 0x0000558a502b3654 in safe_cond_wait (cond=0x558a52db98a0 <recalc_pool_cond>, mp=0x558a52db97c0 <recalc_pool_mutex>, file=0x558a5128ad00 "/data/Server/10.6F/storage/innobase/dict/dict0stats_bg.cc", line=230) at /data/Server/10.6F/mysys/thr_mutex.c:492
#13 0x0000558a4ffb8690 in dict_stats_recalc_pool_del (id=83, have_mdl_exclusive=false) at /data/Server/10.6F/storage/innobase/dict/dict0stats_bg.cc:230

The memory-freeing Thread 18 was executing the following:

#6  0x0000558a4ffbe7ed in std::vector<recalc, ut_allocator<recalc, true> >::_M_realloc_insert<recalc> (this=0x558a52db9900 <recalc_pool>, __position=
  {id = 3429799609, state = 928432}) at /usr/include/c++/9/bits/vector.tcc:500
#7  0x0000558a4ffbd3b7 in std::vector<recalc, ut_allocator<recalc, true> >::emplace_back<recalc> (this=0x558a52db9900 <recalc_pool>)
    at /usr/include/c++/9/bits/vector.tcc:121
#8  0x0000558a4ffb7c66 in dict_stats_recalc_pool_add (id=74)
    at /data/Server/10.6F/storage/innobase/dict/dict0stats_bg.cc:116

Note: They are operating on different keys. Without looking at details, it would seem that the memory in question is internal to the std::vector container. The _M_realloc_insert() suggests that the original backing store would be freed.

I think that the function dict_stats_recalc_pool_del() must reposition the iterator like this (untested patch):

diff --git a/storage/innobase/dict/dict0stats_bg.cc b/storage/innobase/dict/dict0stats_bg.cc
index 676656122a3..645f5818f57 100644
--- a/storage/innobase/dict/dict0stats_bg.cc
+++ b/storage/innobase/dict/dict0stats_bg.cc
@@ -216,7 +216,7 @@ void dict_stats_recalc_pool_del(table_id_t id, bool have_mdl_exclusive)
 
   mysql_mutex_lock(&recalc_pool_mutex);
 
-  const auto end= recalc_pool.end();
+  auto end= recalc_pool.end();
   auto i= std::find_if(recalc_pool.begin(), end,
                        [&](const recalc &r){return r.id == id;});
   if (i != end)
@@ -227,7 +227,14 @@ void dict_stats_recalc_pool_del(table_id_t id, bool have_mdl_exclusive)
       {
         i->state= recalc::IN_PROGRESS_DELETING;
         do
+        {
           my_cond_wait(&recalc_pool_cond, &recalc_pool_mutex.m_mutex);
+          end= recalc_pool.end();
+          i= std::find_if(recalc_pool.begin(), end,
+                          [&](const recalc &r){return r.id == id;});
+          if (i == end)
+            goto done;
+        }
         while (i->state == recalc::IN_PROGRESS_DELETING);
       }
       /* fall through */
@@ -241,6 +248,7 @@ void dict_stats_recalc_pool_del(table_id_t id, bool have_mdl_exclusive)
     }
   }
 
+done:
   mysql_mutex_unlock(&recalc_pool_mutex);
 }
 

Generated at Thu Feb 08 09:50:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.