[MDEV-29518] ERROR: AddressSanitizer: heap-use-after-free around fil_space_t::name() storage/innobase/fil/fil0fil.cc:3182 Created: 2022-09-12  Updated: 2022-11-08  Resolved: 2022-11-08

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.10
Fix Version/s: 10.10.2, 10.11.1, 10.6.12, 10.7.8, 10.8.7, 10.9.5

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: rr-profile-analyzed

Issue Links:
Relates
relates to MDEV-22343 Remove SYS_TABLESPACES and SYS_DATAFILES Closed

 Description   

origin/bb-10.6-MDEV-29479 b154f0072e619f13c6b311bf5af14736b5cc0a88 2022-09-12T17:09:06+05:30
Per Thiru the bug is not related to his modifications.
 
[rr 1555129 298326][rr 1555129 298330]==1555129==ERROR: AddressSanitizer: heap-use-after-free on address 0x6030001cc930 at pc 0x7f6ccab1fcbd bp 0x7f6caa7a6000 sp 0x7f6caa7a57a8
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 298333][rr 1555129 298335]READ of size 2 at 0x6030001cc930 thread T12
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 298659]2022-09-12  5:14:59 17 [Note] InnoDB: Sync to disk of `test`.`DD` started.
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 298663]2022-09-12  5:14:59 17 [Note] InnoDB: Stopping purge
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 298707]2022-09-12  5:15:00 17 [Note] InnoDB: Writing table metadata to './test/DD.cfg'
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 298715]2022-09-12  5:15:00 17 [Note] InnoDB: Table `test`.`DD` flushed to disk
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 299452]2022-09-12  5:15:00 17 [Note] InnoDB: Deleting the meta-data file './test/DD.cfg'
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 299456]2022-09-12  5:15:00 17 [Note] InnoDB: Resuming purge
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 304482]    #0 0x7f6ccab1fcbc  (/usr/lib/x86_64-linux-gnu/libasan.so.5+0x74cbc)
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 304528]    #1 0x55b90d7dfb70 in fil_space_t::name() const /data/Server/bb-10.6-MDEV-29479/storage/innobase/fil/fil0fil.cc:3182
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 304548]    #2 0x55b90d22a102 in i_s_sys_tablespaces_fill /data/Server/bb-10.6-MDEV-29479/storage/innobase/handler/i_s.cc:6433
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 304554]    #3 0x55b90d22adb8 in i_s_sys_tablespaces_fill_table /data/Server/bb-10.6-MDEV-29479/storage/innobase/handler/i_s.cc:6494
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 304575]    #4 0x55b90c2071c8 in get_schema_tables_result(JOIN*, enum_schema_table_state) /data/Server/bb-10.6-MDEV-29479/sql/sql_show.cc:8877
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 304620]    #5 0x55b90c0e6ba9 in JOIN::exec_inner() /data/Server/bb-10.6-MDEV-29479/sql/sql_select.cc:4741
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 304622]    #6 0x55b90c0e4b89 in JOIN::exec() /data/Server/bb-10.6-MDEV-29479/sql/sql_select.cc:4562
# 2022-09-12T05:23:09 [1551698] | [rr 1555129 304630]    #7 0x55b90c0e878e in mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) /data/Server/bb-10.6-MDEV-29479/sql/sql_select.cc:5041
[rr 1555129 304632]    #8 0x55b90c0bd182 in handle_select(THD*, LEX*, select_result*, unsigned long) /data/Server/bb-10.6-MDEV-29479/sql/sql_select.cc:554
[rr 1555129 304671]    #9 0x55b90c037264 in execute_sqlcom_select /data/Server/bb-10.6-MDEV-29479/sql/sql_parse.cc:6256
[rr 1555129 304673]    #10 0x55b90c02605d in mysql_execute_command(THD*, bool) /data/Server/bb-10.6-MDEV-29479/sql/sql_parse.cc:3946
 
Query (0x62b000118238): SELECT * FROM information_schema.innodb_sys_tablespaces
[rr 1555129 321754]Status: KILL_TIMEOUT
 
sdp:/data/results/1662983009/TBR-1606$ _RR_TRACE_DIR="./1/rr" rr replay --mark-stdio
 
# git clone https://github.com/mleich1/rqg --branch experimental RQG
#
# GIT_SHOW: HEAD -> experimental, origin/experimental 870d6d8aa40c27a3b53e445e8fc86aae51cba19c 2022-08-30T20:57:51+02:00
# rqg.pl  : Version 4.0.5 (2022-04)
#
# $RQG_HOME/rqg.pl \
# --views \
# --grammar=conf/mariadb/partitions_innodb.yy \
# --redefine=conf/mariadb/alter_table.yy \
# --redefine=conf/mariadb/instant_add.yy \
# --redefine=conf/mariadb/modules/alter_table_columns.yy \
# --redefine=conf/mariadb/bulk_insert.yy \
# --redefine=conf/mariadb/modules/foreign_keys.yy \
# --redefine=conf/mariadb/modules/locks.yy \
# --redefine=conf/mariadb/modules/sql_mode.yy \
# --redefine=conf/mariadb/versioning.yy \
# --redefine=conf/mariadb/sequences.yy \
# --redefine=conf/mariadb/modules/locks-10.4-extra.yy \
# --redefine=redefine_MDEV-29479.yy \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --no-mask \
# --queries=10000000 \
# --seed=random \
# --reporters=Backtrace \
# --reporters=ErrorLog \
# --reporters=Deadlock1 \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--plugin-load-add=provider_lzo.so \
# --mysqld=--plugin-load-add=provider_bzip2.so \
# --mysqld=--plugin-load-add=provider_lzma.so \
# --mysqld=--plugin-load-add=provider_snappy.so \
# --mysqld=--plugin-load-add=provider_lz4.so \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--innodb_file_per_table=1 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--loose-innodb-sync-debug \
# --mysqld=--innodb_stats_persistent=on \
# --mysqld=--innodb_adaptive_hash_index=off \
# --mysqld=--log-bin \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=on \
# --mysqld=--loose-max-statement-time=30 \
# --threads=2 \
# --mysqld=--innodb-use-native-aio=0 \
# --mysqld=--loose-gdb \
# --mysqld=--loose-debug-gdb \
# --rr=Extended \
# --rr_options=--wait \
# --mysqld=--loose_innodb_change_buffering=all \
# --mysqld=--innodb_rollback_on_timeout=OFF \
# --vardir_type=fast \
# --mysqld=--innodb_page_size=8K \
# --mysqld=--innodb-buffer-pool-size=8M \
# <local settings>



 Comments   
Comment by Matthias Leich [ 2022-09-12 ]

The bad effect is quite good reproducible and a good candidate for test simplification.

Comment by Marko Mäkelä [ 2022-10-28 ]

Thank you. I wonder if we could avoid the dynamic memory allocation and copying by protecting both this access and fil_space_t::rename() with fil_space_t::latch.

The bug scenario seems to be present in 10.5 and possibly older versions as well. In the 10.5 fil_rename_tablespace(), I think that we would want to replace the second acquisition of fil_system.mutex with an acquisition of an exclusive fil_space_t::latch. That acquisition should be done before acquiring log_sys.mutex.

Can you please try to create a DEBUG_SYNC test for reproducing this? (That test case is unlikely to work after the fix, but I think we need to reproduce the problem.)

I see that MDEV-12266 removed fil_system_t::named_hash already in 10.3, so the logic for renaming should be similar for all versions between 10.3 and 10.5, unless MDEV-23855 or related changes in 10.5 radically changed something.

Comment by Marko Mäkelä [ 2022-10-28 ]

The logic for producing the contents of information_schema.innodb_sys_tablespaces was rewritten in MDEV-22343 in 10.6.0. It is possible that this bug does not affect earlier versions.

Comment by Thirunarayanan Balathandayuthapani [ 2022-10-28 ]

Following patch and test case repeats the issue in asan build:

diff --git a/storage/innobase/handler/i_s.cc b/storage/innobase/handler/i_s.cc
index 82b8968876f..e362dfbe142 100644
--- a/storage/innobase/handler/i_s.cc
+++ b/storage/innobase/handler/i_s.cc
@@ -6433,6 +6433,10 @@ static int i_s_sys_tablespaces_fill(THD *thd, const fil_space_t &s, TABLE *t)
     const auto name= s.name();
     if (name.data())
     {
+#ifdef UNIV_DEBUG
+      if (!strncmp(name.data(), "test/t1", name.size()))
+        DEBUG_SYNC(thd, "i_s_query_tablespace_name");
+#endif /* UNIV_DEBUG */
       OK(f->store(name.data(), name.size(), system_charset_info));
       f->set_notnull();
     }

Test case:

--source include/have_innodb.inc
CREATE TABLE t1(f1 INT NOT NULL)ENGINE=InnoDB;
SET DEBUG_SYNC="i_s_query_tablespace_name SIGNAL con1_signal WAIT_FOR default_signal";
send SELECT name FROM INFORMATION_SCHEMA.INNODB_SYS_TABLESPACES WHERE
name="test/t1";
 
connect(con1,localhost,root,,,);
SET DEBUG_SYNC="now WAIT_FOR con1_signal";
RENAME TABLE t1 to t2;
SET DEBUG_SYNC="now SIGNAL default_signal";
 
connection default;
reap;
DROP TABLE t2;

Patch to fix this issue:

diff --git a/storage/innobase/dict/dict0dict.cc b/storage/innobase/dict/dict0dict.cc
index b74cd043439..43ccf8c4d09 100644
--- a/storage/innobase/dict/dict0dict.cc
+++ b/storage/innobase/dict/dict0dict.cc
@@ -1488,6 +1488,7 @@ dict_table_t::rename_tablespace(span<const char> new_name, bool replace) const
     err= DB_TABLESPACE_EXISTS;
   else
   {
+    space->x_lock();
     err= space->rename(path, true, replace);
     if (data_dir)
     {
@@ -1495,6 +1496,7 @@ dict_table_t::rename_tablespace(span<const char> new_name, bool replace) const
         new_name= {name.m_name, strlen(name.m_name)};
       RemoteDatafile::delete_link_file(new_name);
     }
+    space->x_unlock();
   }
 
   ut_free(path);
diff --git a/storage/innobase/handler/i_s.cc b/storage/innobase/handler/i_s.cc
index 82b8968876f..e362dfbe142 100644
--- a/storage/innobase/handler/i_s.cc
+++ b/storage/innobase/handler/i_s.cc
@@ -6491,7 +6495,9 @@ static int i_s_sys_tablespaces_fill_table(THD *thd, TABLE_LIST *tables, Item*)
     {
       space.reacquire();
       mysql_mutex_unlock(&fil_system.mutex);
+      space.s_lock();
       err= i_s_sys_tablespaces_fill(thd, space, tables->table);
+      space.s_unlock();
       mysql_mutex_lock(&fil_system.mutex);
       space.release();
       if (err)

Comment by Thirunarayanan Balathandayuthapani [ 2022-10-31 ]

In older version, IS query fetches the tablespace name, flags, id from INNODB.SYSTABLESPACES table under dict_sys.mutex.
So the problem won't happen from 10.3-10.5

Comment by Marko Mäkelä [ 2022-10-31 ]

The fix looks OK to me. Obviously, the test case would hang if the fix is applied, so we will be unable to add a regression test for this.

Generated at Thu Feb 08 10:09:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.