[MDEV-27451] gcol.virtual_index_drop fails with LeakSanitizer errors Created: 2022-01-09  Updated: 2022-02-02  Resolved: 2022-01-24

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Tests, Virtual Columns
Affects Version/s: 10.4, 10.5, 10.6, 10.7
Fix Version/s: 10.4.23, 10.5.14, 10.6.6, 10.7.2, 10.8.1

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-19606 Replace std::list with std::forward_list Closed
relates to MDEV-24971 SEGV in in row_create_prebuilt Closed

 Description   

10.7 ce663ad4

gcol.virtual_index_drop 'innodb'         [ pass ]    110
***Warnings generated in error logs during shutdown after running tests: gcol.virtual_index_drop
 
==3377123==ERROR: LeakSanitizer: detected memory leaks
SUMMARY: AddressSanitizer: 48 byte(s) leaked in 1 allocation(s).
Attempting backtrace. You can use the following information to find out
 
==3376918==ERROR: LeakSanitizer: detected memory leaks
 
Direct leak of 48 byte(s) in 1 object(s) allocated from:
    #0 0x8c74ed in malloc (/mnt8t/bld/10.7-asan/bin/mariadbd+0x8c74ed)
    #1 0x28e80fc in ut_allocator<std::_Fwd_list_node<dict_v_idx_t>, true>::allocate(unsigned long, std::_Fwd_list_node<dict_v_idx_t> const*, unsigned int, bool, bool) /data/src/10.7/storage/innobase/include/ut0new.h:375:11
    #2 0x28e7d4e in ut_allocator<std::_Fwd_list_node<dict_v_idx_t>, true>::allocate(unsigned long) /data/src/10.7/storage/innobase/include/ut0new.h:321:41
    #3 0x28e7cfc in std::allocator_traits<ut_allocator<std::_Fwd_list_node<dict_v_idx_t>, true> >::allocate(ut_allocator<std::_Fwd_list_node<dict_v_idx_t>, true>&, unsigned long) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/alloc_traits.h:314:20
    #4 0x28e7c41 in std::_Fwd_list_base<dict_v_idx_t, ut_allocator<dict_v_idx_t, true> >::_M_get_node() /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/forward_list.h:352:15
    #5 0x28e95bf in std::_Fwd_list_node<dict_v_idx_t>* std::_Fwd_list_base<dict_v_idx_t, ut_allocator<dict_v_idx_t, true> >::_M_create_node<dict_v_idx_t>(dict_v_idx_t&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/forward_list.h:360:26
    #6 0x28e92f4 in std::_Fwd_list_node_base* std::_Fwd_list_base<dict_v_idx_t, ut_allocator<dict_v_idx_t, true> >::_M_insert_after<dict_v_idx_t>(std::_Fwd_list_const_iterator<dict_v_idx_t>, dict_v_idx_t&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/forward_list.tcc:55:19
    #7 0x28e6aed in std::forward_list<dict_v_idx_t, ut_allocator<dict_v_idx_t, true> >::push_front(dict_v_idx_t&&) /usr/bin/../lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/forward_list.h:862:15
    #8 0x301e462 in dict_index_add_col(dict_index_t*, dict_table_t const*, dict_col_t*, unsigned long) /data/src/10.7/storage/innobase/dict/dict0dict.cc:2272:20
    #9 0x3032e95 in dict_index_copy(dict_index_t*, dict_index_t const*, unsigned long, unsigned long) /data/src/10.7/storage/innobase/dict/dict0dict.cc:2332:3
    #10 0x301d644 in dict_index_build_internal_non_clust(dict_index_t*) /data/src/10.7/storage/innobase/dict/dict0dict.cc:2619:2
    #11 0x301794c in dict_index_add_to_cache(dict_index_t*&, unsigned long, dict_add_v_col_t const*) /data/src/10.7/storage/innobase/dict/dict0dict.cc:1987:6
    #12 0x2ff39ef in dict_create_index_step(que_thr_t*) /data/src/10.7/storage/innobase/dict/dict0crea.cc:1225:9
    #13 0x2b411ee in que_thr_step(que_thr_t*) /data/src/10.7/storage/innobase/que/que0que.cc:663:9
    #14 0x2b3fbfa in que_run_threads_low(que_thr_t*) /data/src/10.7/storage/innobase/que/que0que.cc:709:25
    #15 0x2b3f79b in que_run_threads(que_thr_t*) /data/src/10.7/storage/innobase/que/que0que.cc:729:2
    #16 0x288845e in create_index_dict(trx_t*, dict_index_t*, fil_encryption_t, unsigned int, dict_add_v_col_t const*) /data/src/10.7/storage/innobase/handler/handler0alter.cc:6080:2
    #17 0x286334d in prepare_inplace_alter_table_dict(Alter_inplace_info*, TABLE const*, TABLE const*, char const*, unsigned long, unsigned long, unsigned long, bool, bool) /data/src/10.7/storage/innobase/handler/handler0alter.cc:6942:12
    #18 0x2849cc8 in ha_innobase::prepare_inplace_alter_table(TABLE*, Alter_inplace_info*) /data/src/10.7/storage/innobase/handler/handler0alter.cc:8276:7
    #19 0x19f0733 in handler::ha_prepare_inplace_alter_table(TABLE*, Alter_inplace_info*) /data/src/10.7/sql/handler.cc:5171:10
    #20 0x11f8dce in mysql_inplace_alter_table(THD*, TABLE_LIST*, TABLE*, TABLE*, Alter_inplace_info*, MDL_request*, st_ddl_log_state*, TRIGGER_RENAME_PARAM*, Alter_table_ctx*) /data/src/10.7/sql/sql_table.cc:7446:20
    #21 0x11e3bdd in mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool, bool) /data/src/10.7/sql/sql_table.cc:10342:16
    #22 0x145da1f in Sql_cmd_alter_table::execute(THD*) /data/src/10.7/sql/sql_alter.cc:544:11
    #23 0xe46743 in mysql_execute_command(THD*, bool) /data/src/10.7/sql/sql_parse.cc:5988:26
    #24 0xe17a62 in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/src/10.7/sql/sql_parse.cc:8027:18
    #25 0xe107e4 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /data/src/10.7/sql/sql_parse.cc:1894:7
    #26 0xe1a749 in do_command(THD*, bool) /data/src/10.7/sql/sql_parse.cc:1402:17
    #27 0x1436f99 in do_handle_one_connection(CONNECT*, bool) /data/src/10.7/sql/sql_connect.cc:1418:11
    #28 0x14365e3 in handle_one_connection /data/src/10.7/sql/sql_connect.cc:1312:5
    #29 0x24e9777 in pfs_spawn_thread /data/src/10.7/storage/perfschema/pfs.cc:2201:3
 
SUMMARY: AddressSanitizer: 48 byte(s) leaked in 1 allocation(s).
220109 18:16:36 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.7.2-MariaDB-debug-log
read_buffer_size=131072
max_used_connections=2
thread_count=0
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x100000
??:0(__interceptor_backtrace)[0x88457b]
mysys/stacktrace.c:212(my_print_stacktrace)[0x351ff49]
sql/signal_handler.cc:226(handle_fatal_signal)[0x19b5df9]
sigaction.c:0(__restore_rt)[0x7fae82c10140]
linux/raise.c:51(__GI_raise)[0x7fae82878ce1]
stdlib/abort.c:81(__GI_abort)[0x7fae82862537]
:0(__sanitizer::Abort())[0x8e51e7]
:0(__sanitizer::Die())[0x8e3b61]
:0(__lsan::HandleLeaks())[0x8f0cf8]
:0(__lsan::DoLeakCheck())[0x8ee461]
stdlib/exit.c:109(__run_exit_handlers)[0x7fae8287b4d7]
:0(__GI_exit)[0x7fae8287b67a]
addr2line: DWARF error: could not find variable specification at offset 1ed1d
mysqld.cc:0(mysqld_exit(int))[0x8fd2aa]
sql/mysqld.cc:5905(mysqld_main(int, char**))[0x904c55]
sql/main.cc:34(main)[0x8f9a92]
csu/libc-start.c:308(__libc_start_main)[0x7fae82863d0a]
??:0(_start)[0x84d47a]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /dev/shm/var_auto_Qa30/mysqld.1/data
Resource Limits:
Fatal signal 11 while backtracing

The test fails since 10.4.19 when it was added.
It may not fail in buildbot and other multi-test runs because some next tests in the chain may hide the failure. For example, in 10.4 buildbot innodb.alter_copy does the trick (by killing the server):

worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
gcol.virtual_index_drop 'innodb'         [ pass ]     70
innodb.alter_copy 'innodb'               [ pass ]  14181
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 14.251 of 17 seconds executing testcases
 
Completed: All 2 tests were successful.

There are likely other tests which do the same.



 Comments   
Comment by Thirunarayanan Balathandayuthapani [ 2022-01-18 ]

This problem should exist in 10.2. InnoDB fails to drop the aborted, corrupted
index while dropping the table. dict_table_close() callers has try_drop parameter
as false.

Comment by Marko Mäkelä [ 2022-01-24 ]

I noticed this on 10.8 on buildbot, so I started narrowed it down. This reproduced it for me:

./mtr --no-reorder gcol.virtual_index_drop innodb.innodb-corrupted-table

All that the second test does is that it shuts down the server. In fact, the leak can be reproduced by running gcol.virtual_index_drop alone.
The critical part of the test is the following:

--source include/have_innodb.inc
--source include/have_debug.inc
 
CREATE TABLE t1(f1 INT, f2 INT AS (f1) VIRTUAL)ENGINE=InnoDB;
INSERT INTO t1(f1) VALUES(1);
SET DEBUG_DBUG="+d,create_index_fail";
SET DEBUG_SYNC="innodb_inplace_alter_table_enter SIGNAL con1_go WAIT_FOR alter_signal";
SEND ALTER TABLE t1 ADD COLUMN f3 INT AS (f1) VIRTUAL, ADD INDEX(f2, f3);
connect(con1,localhost,root,,,);
SET DEBUG_SYNC="now WAIT_FOR con1_go";
BEGIN;
SELECT * FROM t1;
SET DEBUG_SYNC="now SIGNAL alter_signal";
connection default;
--error ER_DUP_ENTRY
reap;
connection con1;
rollback;
connection default;
SHOW CREATE TABLE t1;
DROP TABLE t1;

This section will trigger the memory leak. But, I got a leak reported only starting with 10.4, not with 10.2 or 10.3.

Comment by Marko Mäkelä [ 2022-01-24 ]

The reason why this leak is only present starting with 10.4 is that in MDEV-19606, we took advantage of the C+11 std::forward_list. In the C+98 Standard Template Library, only a doubly-linked list was available; there was a nonstandard extension std::slist that we did not use.

MDEV-19606 itself probably is innocent. The memory should have been freed just fine by dict_v_col_t::detach(const dict_index_t&), which was consulting dict_v_col_t::n_v_indexes.

I currently suspect that this memory leak was introduced by MDEV-23672, which removed dict_v_col_t::n_v_indexes.

Comment by Marko Mäkelä [ 2022-01-24 ]

It turns out that MDEV-23672 is innocent: I get no memory leak with that or its parent commit. The oldest problematic commit that causes trouble with that test is a merge that contains the 10.2 commit cf2c6b7f8db9413e7c5e31e87bb5c799253d8d2e that aimed to fix MDEV-24971. The above test case would start to crash with that 10.2 commit:

10.2 cf2c6b7f8db9413e7c5e31e87bb5c799253d8d2e

mysqld: /mariadb/10.2/storage/innobase/include/dict0mem.h:692: dict_v_col_t* dict_add_v_col_info::add_drop_v_col(mem_heap_t*, dict_v_col_t*, ulint): Assertion `offset < n_v_col' failed.

That problem was already fixed before the merge; we only see the leak:

10.4 5008171b05e0d3b8b5f4af312b94a312281e77c7

gcol.virtual_index_drop 'innodb'         [ pass ]    234
***Warnings generated in error logs during shutdown after running tests: gcol.virtual_index_drop
 
==578316==ERROR: LeakSanitizer: detected memory leaks

While the test was introduced in the merge, the test snippet that I posted in my previous comment works just fine (without memory leaks) before the merge.

Comment by Marko Mäkelä [ 2022-01-24 ]

The following change to the rollback of native ALTER TABLE fixes the memory leak, and I do not think that it could break anything else. All mtr tests passed for the ASAN build.

diff --git a/storage/innobase/handler/handler0alter.cc b/storage/innobase/handler/handler0alter.cc
index 41871cedfe2..e812bfc4928 100644
--- a/storage/innobase/handler/handler0alter.cc
+++ b/storage/innobase/handler/handler0alter.cc
@@ -8914,6 +8914,14 @@ rollback_inplace_alter_table(
 
 			row_mysql_unlock_data_dictionary(prebuilt->trx);
 		}
+
+		if (ctx->add_vcol) {
+			for (ulint i = 0; i < ctx->num_to_add_vcol; i++) {
+				ctx->add_vcol[i].~dict_v_col_t();
+			}
+			ctx->num_to_add_vcol = 0;
+			ctx->add_vcol = nullptr;
+		}
 	}
 
 	/* Reset dict_col_t::ord_part for those columns fail to be indexed,

In MDEV-19606, the pointer indirection for dict_v_col_t::v_indexes was removed. That is, dict_v_col_t would directly encapsulate the object. That in turn means that the v_indexes must be freed by dict_v_col_t::~dict_v_col_t().

I did not debug why 10.2 or 10.3 do not report a memory leak, even though we are not invoking UT_DELETE(v_indexes) in the corresponding code. But, because those versions are soon end-of-life, and the code there is different, and no leak was reproduced for them, I would not touch those versions.

Generated at Thu Feb 08 09:53:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.