Interestingly, running this same testcase again, I ran into MDEV-24911.

10.6.0 03ff588d153f22f00ff00923e82498cbac63505f (Debug)

Core was generated by `/test/MD060321-mariadb-10.6.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core-'.
Program terminated with signal SIGABRT, Aborted.
#0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=6)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
[Current thread is 1 (Thread 0x14e0cffff700 (LWP 2605024))]
(gdb) bt
#0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1 0x0000564caccd43f3 in my_write_core (sig=sig@entry=6) at /test/10.6_dbg/mysys/stacktrace.c:424
#2 0x0000564cac479195 in handle_fatal_signal (sig=6) at /test/10.6_dbg/sql/signal_handler.cc:331
#3 <signal handler called>
#4 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5 0x000014e1697f0859 in __GI_abort () at abort.c:79
#6 0x0000564cacabb6e7 in ib::fatal::~fatal (this=<optimized out>, __in_chrg=<optimized out>) at /test/10.6_dbg/storage/innobase/ut/ut0ut.cc:576
#7 0x0000564caca66ff2 in srv_monitor_task () at /test/10.6_dbg/storage/innobase/include/ut0ut.h:412
#8 0x0000564cacc620f8 in tpool::thread_pool_generic::timer_generic::run (this=0x564caff40630) at /test/10.6_dbg/tpool/tpool_generic.cc:309
#9 tpool::thread_pool_generic::timer_generic::execute (arg=0x564caff40630) at /test/10.6_dbg/tpool/tpool_generic.cc:329
#10 0x0000564cacc63063 in tpool::task::execute (this=0x564caff40670) at /test/10.6_dbg/tpool/task.cc:52
#11 0x0000564cacc61c13 in tpool::thread_pool_generic::worker_main (this=0x564cafb99870, thread_var=0x564cafba8d90) at /test/10.6_dbg/tpool/tpool_generic.cc:546
#12 0x0000564cacc61f4a in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__t=<optimized out>, __f=<optimized out>) at /usr/include/c++/9/bits/invoke.h:89
#13 std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=<optimized out>) at /usr/include/c++/9/bits/invoke.h:95
#14 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=<optimized out>) at /usr/include/c++/9/thread:244
#15 std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=<optimized out>) at /usr/include/c++/9/thread:251
#16 std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=<optimized out>) at /usr/include/c++/9/thread:195
#17 0x000014e169be2d84 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#18 0x000014e169cfe609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#19 0x000014e1698ed293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Another similar testcase:
SET GLOBAL innodb_limit_optimistic_insert_debug=7;
SET GLOBAL innodb_change_buffering_debug=1;
INSERT INTO t1 VALUES(NULL,1), (1,NULL), (NULL,NULL), (1,1), (2,2);
CREATE TABLE t1 (i INT, a CHAR(200), b TEXT, UNIQUE (a), UNIQUE (b (300))) charSET utf8 ROW_FORMAT=DYNAMIC ENGINE=InnoDB;
ALTER TABLE t1 CHANGE a id INT;
SET @utf8_message=CONVERT ('testcase 3.5.8.5 (IF):' using utf8);

I am not sure whether that last query is necessary. If you replay this in the CLI, the ALTER will hang:

10.6.0 03ff588d153f22f00ff00923e82498cbac63505f (Debug)

10.6.0>show processlist;
+----+------+-----------+------+---------+------+---------------------+--------------------------------+----------+
| Id | User | Host      | db   | Command | Time | State               | Info                           | Progress |
+----+------+-----------+------+---------+------+---------------------+--------------------------------+----------+
|  4 | root | localhost | test | Query   |  302 | Rename result table | ALTER TABLE t1 CHANGE a id INT |    0.000 |
|  5 | root | localhost | test | Query   |    0 | starting            | show processlist               |    0.000 |
+----+------+-----------+------+---------+------+---------------------+--------------------------------+----------+
2 rows in set (0.000 sec)

A mysqladmin shutdown attempt will also hang. It can also crash with the same UniqueID (stack signature) as the one above.

Roel, did you try this with cmake -DWITH_ASAN=ON?
Also, does this go away if you use cmake -DWITH_DBUG_TRACE=OFF?

Findings:
1. Tested with TSAN. See the attached error log mdev-25072_tsan.err. Procedure used: start with TSAN (many errors) > write "START" to the error log > execute the SQL of the last testcase above > write "STOP" to the error log > shut down the server. Please refer to the output between START and STOP in the log. There are quite a few issues there; TSAN generated almost 8K lines.
2. Tested with ASAN as requested. Same procedure, same testcase. See the attached error log mdev-25072_asan.err. No relevant errors were seen. Note that the test log has a prior startup/shutdown with an unrelated ASAN InnoDB error.
3. Tested with -DWITH_DBUG_TRACE=OFF as requested, using a 10.6 trunk debug build (with that option off) and the same testcase. The server hangs in the same way:

10.6.0 2e31b2ffe9b443b029e51961ef6b0617df380388 (Debug)

10.6.0>show processlist;
+----+------+-----------+------+---------+------+---------------------+--------------------------------+----------+
| Id | User | Host      | db   | Command | Time | State               | Info                           | Progress |
+----+------+-----------+------+---------+------+---------------------+--------------------------------+----------+
|  4 | root | localhost | test | Query   |  384 | Rename result table | ALTER TABLE t1 CHANGE a id INT |    0.000 |
|  6 | root | localhost | test | Query   |    0 | starting            | show processlist               |    0.000 |
+----+------+-----------+------+---------+------+---------------------+--------------------------------+----------+
2 rows in set (0.000 sec)

Btw, while it is hanging, this is the CLI output/screen at that point in time (and persisting):
10.6.0>ALTER TABLE t1 CHANGE a id INT;
Stage: 2 of 2 'Enabling keys'    0% of stage done

Roel, mdev-25072_asan.err does not look like an ASAN trace. It looks like UBSAN:
/data/builds/10.6_dbg_san/storage/innobase/srv/srv0start.cc:2069:31: runtime error: member call on null pointer of type 'struct fil_space_t'
That error is something that should be reported and fixed separately.

Yes, the trace is a combined ASAN + UBSAN trace. I will log a new ticket for the other issue. Update: the other issue has been logged as MDEV-25239.

Also tested a full run (i.e. waiting until it crashes), but regrettably no further output is generated; the build simply crashes without a core dump, error log output, ...
Discussed with Marko. He will look at the hang first; then we will do additional testing to better define the possible heap corruption.

Further to the previous comment, which tested the last testcase above, I also ran the same procedure with the original testcase, which notably hangs on a different query. This yielded an ASAN trace, though an unresolved one. Oddly, the trace also did not go to the error log but to the console:
=================================================================
==3676958==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 13312 byte(s) in 104 object(s) allocated from:
#0 0x55b5fbf50f46 in __interceptor_calloc (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x4b4f46)
#1 0x55b5fbfcbefa in mysql_init (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x52fefa)
#2 0x55b5fbf9c4fd (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x5004fd)
#3 0x55b5fbf9cc34 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500c34)
#4 0x55b5fbf9cef7 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500ef7)
#5 0x55b5fbfa676f (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x50a76f)
#6 0x55b5fbfaf0c8 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x5130c8)
#7 0x55b5fbfb1e83 in main (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x515e83)
#8 0x14ed087d50b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

Direct leak of 1920 byte(s) in 15 object(s) allocated from:
#0 0x55b5fbf50f46 in __interceptor_calloc (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x4b4f46)
#1 0x55b5fbfcbefa in mysql_init (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x52fefa)
#2 0x55b5fbf9c4fd (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x5004fd)
#3 0x55b5fbf9cc34 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500c34)
#4 0x55b5fbf9cef7 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500ef7)
#5 0x55b5fbf9d1ec (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x5011ec)
#6 0x55b5fbfaf011 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x513011)
#7 0x55b5fbfb1e83 in main (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x515e83)
#8 0x14ed087d50b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

Direct leak of 832 byte(s) in 104 object(s) allocated from:
#0 0x55b5fbf50f46 in __interceptor_calloc (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x4b4f46)
#1 0x55b5fbfcbea0 in mysql_init (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x52fea0)
#2 0x55b5fbf9c4fd (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x5004fd)
#3 0x55b5fbf9cc34 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500c34)
#4 0x55b5fbf9cef7 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500ef7)
#5 0x55b5fbfa676f (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x50a76f)
#6 0x55b5fbfaf0c8 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x5130c8)
#7 0x55b5fbfb1e83 in main (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x515e83)
#8 0x14ed087d50b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

Direct leak of 128 byte(s) in 1 object(s) allocated from:
#0 0x55b5fbf50f46 in __interceptor_calloc (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x4b4f46)
#1 0x55b5fbfcbefa in mysql_init (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x52fefa)
#2 0x55b5fbf9c4fd (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x5004fd)
#3 0x55b5fbf9cc34 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500c34)
#4 0x55b5fbf9cef7 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500ef7)
#5 0x55b5fbf9cf9e in mysql_real_query_for_lazy(char const*, unsigned long) (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500f9e)
#6 0x55b5fbfa6697 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x50a697)
#7 0x55b5fbfaf0c8 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x5130c8)
#8 0x55b5fbfb1e83 in main (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x515e83)
#9 0x14ed087d50b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

Direct leak of 120 byte(s) in 15 object(s) allocated from:
#0 0x55b5fbf50f46 in __interceptor_calloc (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x4b4f46)
#1 0x55b5fbfcbea0 in mysql_init (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x52fea0)
#2 0x55b5fbf9c4fd (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x5004fd)
#3 0x55b5fbf9cc34 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500c34)
#4 0x55b5fbf9cef7 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500ef7)
#5 0x55b5fbf9d1ec (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x5011ec)
#6 0x55b5fbfaf011 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x513011)
#7 0x55b5fbfb1e83 in main (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x515e83)
#8 0x14ed087d50b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x55b5fbf50f46 in __interceptor_calloc (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x4b4f46)
#1 0x55b5fbfcbea0 in mysql_init (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x52fea0)
#2 0x55b5fbf9c4fd (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x5004fd)
#3 0x55b5fbf9cc34 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500c34)
#4 0x55b5fbf9cef7 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500ef7)
#5 0x55b5fbf9cf9e in mysql_real_query_for_lazy(char const*, unsigned long) (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x500f9e)
#6 0x55b5fbfa6697 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x50a697)
#7 0x55b5fbfaf0c8 (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x5130c8)
#8 0x55b5fbfb1e83 in main (/test/UBASAN_MD260121-mariadb-10.6.0-linux-x86_64-dbg/bin/mariadb+0x515e83)
#9 0x14ed087d50b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

SUMMARY: AddressSanitizer: 16320 byte(s) leaked in 240 allocation(s).
./cl: line 4: 3676958 Aborted                 (core dumped)
No idea why the stacks are partly unresolved.

Further checking with nm, gdb, and objdump shows all functions, but none in the 0x500*** range, which may indicate memory corruption (with thanks to valerii) and which likely caused the unresolved stacks above. Given that all frames between main and mysql_init are mangled, the memory corruption may be rather significant/large, which would correspond with the long hang duration before the crash (memory being overwritten in bulk).

I tried to repeat with ./mtr and the following test:
--source include/have_innodb.inc
--source include/have_partition.inc
let $n=10000;
while ($n) {
DROP DATABASE test;
CREATE DATABASE test;
USE test;
SET GLOBAL innodb_limit_optimistic_insert_debug=7;
CREATE TABLE t1 (c1 INT) PARTITION BY HASH (c1) PARTITIONS 15;
SET GLOBAL innodb_change_buffering_debug=1;
DROP TABLE t1;
--error 1630
SELECT SUBSTRING ('00', 1, 1);
dec $n;
}

But the test would actually pass, in 21867 ms of execution time.
I was using the following build options:
cmake -DCMAKE_{C_COMPILER=clang,CXX_COMPILER=clang++}-11 -DCMAKE_C{,XX}_FLAGS='-O2 -march=native -mtune=native -Wno-unused-command-line-argument -fdebug-macro' -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DWITH_DBUG_TRACE=OFF -DCONC_WITH_{UNITTEST,SSL}=OFF -DWITH_EMBEDDED_SERVER=OFF -DWITH_UNIT_TESTS=OFF -DCMAKE_BUILD_TYPE=Debug -DPLUGIN_{ARCHIVE,TOKUDB,MROONGA,OQGRAPH,ROCKSDB,CONNECT,SPIDER}=NO -DWITH_SAFEMALLOC=OFF -DWITH_{ZLIB,SSL}=system ..

Part of the posted trace would be skipped due to -DWITH_DBUG_TRACE=OFF. Another attempt with -DWITH_DBUG_TRACE=ON did not hang either, but took longer (25771 ms).
Roel, please note that SUBSTRING returned an error for me:

10.6 cb545f11169d2425316d96feafc78ac841950e43

ERROR 42000: FUNCTION test.SUBSTRING does not exist. Check the 'Function Name Parsing and Resolution' section in the Reference Manual

Maybe this function must actually be executed for it to be able to corrupt some memory? What is your SQL_MODE? Can you create a test case for ./mtr, or an otherwise self-contained test case?

Roel, this is not a recent 10.6 at all:
commit 27d66d644cf2ebe9201e0362f2050036cce2908a
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date:   Tue Nov 3 10:44:26 2020 +0200

    MENT-411 : Implement wsrep_replicate_aria

Why are you testing a several-months-old version of the 10.6 branch? There are 88 commits on top of this.

marko Hi!
I never test on builds that are several months old.
The build date (of a trunk pull on that day) is 26/2/21, which was 11 days old on the day of testing. Ref the path name, almost always included in my bugs, which lists the build date of trunk at that point in time: "MD260221-mariadb-10.6.0".
Also, the date of commit 27d66d644cf2ebe9201e0362f2050036cce2908a is 25/2/21; I am not sure where the 3/11/20 comes from.
https://github.com/MariaDB/server/commit/27d66d644cf2ebe9201e0362f2050036cce2908a
Also, the most recent testing (5 days ago), for the hang, was on 2e31b2ffe9b443b029e51961ef6b0617df380388 (a 6-day-old build).

Roel, sorry, I mistook the commit date for the push date. I figured out that a space after the SUBSTRING keyword apparently causes a lookup of a non-existent user-defined function:
--error ER_FUNC_INEXISTENT_NAME_COLLISION
SELECT SUBSTRING ('00', 1, 1);

Without a space after SUBSTRING, the built-in function of that name would be invoked.
I was still unable to reproduce the hang on 10.6 e1a514d565189e2d8b5555512963b8bfdba61a09, compiled with either value of -DWITH_DBUG_TRACE, and tested with the space present or absent after the SUBSTRING keyword.

No worries! I have done a lot of testing on this today, without any progress. Here are my current findings and thoughts:
1) The issue does not reproduce on the rr server, neither with a new version nor with a known-to-fail version.
2) The space in SUBSTRING may have some effect (it generates an error or runs the function), but it was likely not the determining factor for the bug: that space was very likely introduced by a testcase-prettifier script during testcase reduction, and the same issue would have existed both before and after the testcase prettification. IOW, it is interesting but likely an unrelated side road.
3) I confirmed that the issue still exists on a trunk build from today, 30/3 (rev 49ddfb63780c55b4133755c3479e13c7ea9a9d7b).

elenst pointed out that I had forgotten the ‘Simon says’ (ENGINE=InnoDB) and was actually testing MyISAM. Based on her advice, I tried a simpler test, but it does not hang for me:
--source include/have_innodb.inc
--source include/have_partition.inc
--disable_query_log
let $n=10000;
while ($n) {
SET GLOBAL innodb_limit_optimistic_insert_debug=7;
CREATE TABLE t1 (c1 INT) ENGINE=InnoDB PARTITION BY HASH (c1) PARTITIONS 15;
SET GLOBAL innodb_change_buffering_debug=1;
DROP TABLE t1;
dec $n;
}

With that test, I was able to repeat some hangs. The dict_sys.mutex is held by Thread 7, which is waiting for something in DBUG while trying to acquire buf_pool.mutex (which is perfectly valid to acquire while holding dict_sys.mutex):
Thread 7 (Thread 0x7fb549f80700 (LWP 2397063)):
#0 0x000055cb71729d4e in DbugParse (cs=<optimized out>, cs@entry=0x7fb51c000ce0, control=<optimized out>, control@entry=0x55cb71971a65 "") at /mariadb/bb-10.6-monty/dbug/dbug.c:703
#1 0x000055cb71729e28 in _db_push_ (control=0x55cb71971a65 "") at /mariadb/bb-10.6-monty/dbug/dbug.c:865
#2 0x000055cb7171086a in safe_mutex_lock (mp=mp@entry=0x55cb7204fcc0 <buf_pool>, my_flags=my_flags@entry=0, file=file@entry=0x55cb71a3d4a8 "/mariadb/bb-10.6-monty/storage/innobase/buf/buf0flu.cc", line=line@entry=1551) at /mariadb/bb-10.6-monty/mysys/thr_mutex.c:236
#3 0x000055cb7170b6ab in psi_mutex_lock (that=0x55cb7204fcc0 <buf_pool>, file=0x55cb71a3d4a8 "/mariadb/bb-10.6-monty/storage/innobase/buf/buf0flu.cc", line=1551) at /mariadb/bb-10.6-monty/mysys/my_thr_init.c:487
#4 0x000055cb715af08f in inline_mysql_mutex_lock (that=<optimized out>, src_file=<optimized out>, src_line=1551) at /mariadb/bb-10.6-monty/include/mysql/psi/mysql_thread.h:746
#5 buf_flush_lists (max_n=max_n@entry=18446744073709551615, lsn=lsn@entry=18446744073709551615) at /mariadb/bb-10.6-monty/storage/innobase/buf/buf0flu.cc:1551
#6 0x000055cb715974e2 in buf_page_get_low (page_id={m_id = 500}, zip_size=<optimized out>, zip_size@entry=32, rw_latch=2, rw_latch@entry=0, guess=<optimized out>, guess@entry=0x7fb549f7b690, mode=<optimized out>, mode@entry=1, mtr=<optimized out>, mtr@entry=0x4, err=0x7fb549f7a9ac, allow_ibuf_merge=<optimized out>) at /mariadb/bb-10.6-monty/storage/innobase/buf/buf0buf.cc:2971
#7 0x000055cb71598f45 in buf_page_get_gen (page_id=<optimized out>, page_id@entry={m_id = 140416614318493}, zip_size=<optimized out>, zip_size@entry=8589934593, rw_latch=<optimized out>, rw_latch@entry=1240991488, guess=<optimized out>, guess@entry=0x7fb549f7b980, mode=<optimized out>, mode@entry=11, mtr=<optimized out>, err=0x2, allow_ibuf_merge=<optimized out>) at /mariadb/bb-10.6-monty/storage/innobase/buf/buf0buf.cc:3103
#8 0x000055cb71556788 in btr_cur_search_to_nth_level_func (index=<optimized out>, index@entry=0x55cb741a2308, level=<optimized out>, level@entry=0, tuple=<optimized out>, tuple@entry=0x7fb51c2f32b8, mode=<optimized out>, mode@entry=PAGE_CUR_LE, latch_mode=<optimized out>, latch_mode@entry=4098, cursor=<optimized out>, cursor@entry=0x7fb549f7b980, ahi_latch=0x0, mtr=0x7fb549f7bfc8, autoinc=0) at /mariadb/bb-10.6-monty/storage/innobase/btr/btr0cur.cc:1610
#9 0x000055cb714a259a in btr_pcur_open_low (index=index@entry=0x55cb741a2308, level=0, tuple=tuple@entry=0x7fb51c2f32b8, mode=PAGE_CUR_LE, latch_mode=latch_mode@entry=4098, cursor=cursor@entry=0x7fb549f7b980, autoinc=0, mtr=mtr@entry=0x7fb549f7bfc8) at /mariadb/bb-10.6-monty/storage/innobase/include/btr0pcur.ic:439
#10 0x000055cb714a2b4b in row_search_index_entry (index=index@entry=0x55cb741a2308, entry=entry@entry=0x7fb51c2f32b8, mode=4098, pcur=pcur@entry=0x7fb549f7b980, mtr=mtr@entry=0x7fb549f7bfc8) at /mariadb/bb-10.6-monty/storage/innobase/row/row0row.cc:1301
#11 0x000055cb714c5461 in row_upd_sec_index_entry (node=0x7fb51c2deed8, thr=0x7fb51c2df720) at /mariadb/bb-10.6-monty/storage/innobase/row/row0upd.cc:2065
#12 row_upd_sec_step (node=0x7fb51c2deed8, thr=0x7fb51c2df720) at /mariadb/bb-10.6-monty/storage/innobase/row/row0upd.cc:2264
#13 row_upd (node=node@entry=0x7fb51c2deed8, thr=<optimized out>, thr@entry=0x7fb51c2df720) at /mariadb/bb-10.6-monty/storage/innobase/row/row0upd.cc:2994
#14 0x000055cb714c3d5e in row_upd_step (thr=thr@entry=0x7fb51c2df720) at /mariadb/bb-10.6-monty/storage/innobase/row/row0upd.cc:3109
#15 0x000055cb7140e1b4 in que_thr_step (thr=<optimized out>) at /mariadb/bb-10.6-monty/storage/innobase/que/que0que.cc:635
#16 que_run_threads_low (thr=0x7fb51c2df720) at /mariadb/bb-10.6-monty/storage/innobase/que/que0que.cc:709
#17 que_run_threads (thr=thr@entry=0x7fb51c2df720) at /mariadb/bb-10.6-monty/storage/innobase/que/que0que.cc:729
#18 0x000055cb7140e9c4 in que_eval_sql (info=info@entry=0x55cb741ca518, sql=<optimized out>, reserve_dict_mutex=false, trx=trx@entry=0x7fb54b21a2a0) at /mariadb/bb-10.6-monty/storage/innobase/que/que0que.cc:779
#19 0x000055cb71470999 in row_drop_table_for_mysql (name=<optimized out>, name@entry=0x7fb549f7cdc0 "test/t1#P#p1", trx=trx@entry=0x7fb54b21a2a0, sqlcom=sqlcom@entry=SQLCOM_DROP_TABLE, create_failed=<optimized out>, nonatomic=<optimized out>) at /mariadb/bb-10.6-monty/storage/innobase/row/row0mysql.cc:3602
#20 0x000055cb713150c6 in ha_innobase::delete_table (this=<optimized out>, name=0x7fb549f7da30 "./test/t1#P#p1", sqlcom=SQLCOM_DROP_TABLE) at /mariadb/bb-10.6-monty/storage/innobase/handler/ha_innodb.cc:13088
#21 0x000055cb71241d0f in ha_partition::del_ren_table (this=<optimized out>, from=0x7fb549f7ed30 "./test/t1", to=<optimized out>) at /mariadb/bb-10.6-monty/sql/ha_partition.cc:2509
#22 0x000055cb70fc1bbe in hton_drop_table (hton=<optimized out>, path=0x1 <error: Cannot access memory at address 0x1>) at /mariadb/bb-10.6-monty/sql/handler.cc:577

In safe_mutex_lock(), the DBUG_PUSH_EMPTY was added to supposedly speed up the code, but I suspect that it is actually slowing down builds with cmake -DWITH_DBUG_TRACE=OFF. Anyway, that should not be the cause of the hang.
Note: buf_pool.mutex is available according to the core dump. Also, Thread 14 is trying to acquire it.
Threads 12 and 13 are both trying to acquire dict_sys.mutex, which is held by Thread 7.
It is unclear to me why Thread 7 would not be able to proceed. I will try to reproduce this with other build options, to see whether SAFE_MUTEX or DBUG are contributing to the hang.

I analyzed the rr replay trace that elenst provided to me. I just now realized that there was no reason for the two SET GLOBAL statements to be inside the loop in my mtr test case. The key here is that
SET GLOBAL innodb_change_buffering_debug=1;
is causing contention. In that trace, rr record appears to have played its trick of unfair scheduling. DROP TABLE t1 would be waiting forever for buf_pool.mutex while holding dict_sys.mutex and dict_sys.latch:

10.6 49ddfb63780c55b4133755c3479e13c7ea9a9d7b

#4 0x0000555a5dbc0a91 in inline_mysql_mutex_lock (that=0x555a5eaaacc0 <buf_pool>, src_file=0x555a5e2aae08 "/home/mdbe/MDEV-25297/10.6/storage/innobase/buf/buf0buf.cc", src_line=2933)
    at /home/mdbe/MDEV-25297/10.6/include/mysql/psi/mysql_thread.h:746
#5 0x0000555a5dbc803c in buf_page_get_low (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=11, mtr=0x7ff674155ed0, err=0x7ff674154c2c, allow_ibuf_merge=true)
    at /home/mdbe/MDEV-25297/10.6/storage/innobase/buf/buf0buf.cc:2933

At least one thread is constantly acquiring and releasing buf_pool.mutex:
#0 buf_page_get_low (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=15, mtr=0x7ff5489f90f0, err=0x7ff5489f7e5c, allow_ibuf_merge=true) at /home/mdbe/MDEV-25297/10.6/storage/innobase/buf/buf0buf.cc:2935
#1 0x0000555a5dbc8804 in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=15, mtr=0x7ff5489f90f0, err=0x7ff5489f7e5c, allow_ibuf_merge=true)
    at /home/mdbe/MDEV-25297/10.6/storage/innobase/buf/buf0buf.cc:3104
#2 0x0000555a5db8d77d in btr_cur_search_to_nth_level_func (index=0x555a61813698, level=0, tuple=0x7ff528004aa8, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x7ff5489f8e50, ahi_latch=0x0, mtr=0x7ff5489f90f0,
    autoinc=0) at /home/mdbe/MDEV-25297/10.6/storage/innobase/btr/btr0cur.cc:1610
#3 0x0000555a5dac4068 in btr_pcur_open_low (index=0x555a61813698, level=0, tuple=0x7ff528004aa8, mode=PAGE_CUR_LE, latch_mode=8194, cursor=0x7ff5489f8e50, autoinc=0, mtr=0x7ff5489f90f0)
    at /home/mdbe/MDEV-25297/10.6/storage/innobase/include/btr0pcur.ic:439
#4 0x0000555a5dac8120 in row_search_index_entry (index=0x555a61813698, entry=0x7ff528004aa8, mode=8194, pcur=0x7ff5489f8e50, mtr=0x7ff5489f90f0)
    at /home/mdbe/MDEV-25297/10.6/storage/innobase/row/row0row.cc:1301
#5 0x0000555a5dabbbbf in row_purge_remove_sec_if_poss_leaf (node=0x555a618210c8, index=0x555a61813698, entry=0x7ff528004aa8) at /home/mdbe/MDEV-25297/10.6/storage/innobase/row/row0purge.cc:473
#6 0x0000555a5dabc0e3 in row_purge_remove_sec_if_poss (node=0x555a618210c8, index=0x555a61813698, entry=0x7ff528004aa8) at /home/mdbe/MDEV-25297/10.6/storage/innobase/row/row0purge.cc:582
#7 0x0000555a5dabc2f7 in row_purge_del_mark (node=0x555a618210c8) at /home/mdbe/MDEV-25297/10.6/storage/innobase/row/row0purge.cc:652
#8 0x0000555a5dabdd27 in row_purge_record_func (node=0x555a618210c8, undo_rec=0x7ff66400aea8 "", thr=0x555a61820ed8, updated_extern=false) at /home/mdbe/MDEV-25297/10.6/storage/innobase/row/row0purge.cc:1066
Both are accessing the same page (334 in the system tablespace), the secondary index ID_IND of SYS_TABLES. Because of the debug setting, they are trying to evict the page from the buffer pool:
#if defined UNIV_DEBUG || defined UNIV_IBUF_DEBUG
re_evict:
	if (mode != BUF_GET_IF_IN_POOL
	    && mode != BUF_GET_IF_IN_POOL_OR_WATCH) {
	} else if (!ibuf_debug) {
	} else if (fil_space_t* space = fil_space_t::get(page_id.space())) {
		/* Try to evict the block from the buffer pool, to use the
		insert buffer (change buffer) as much as possible. */

		mysql_mutex_lock(&buf_pool.mutex);

		fix_block->unfix();

		/* Blocks cannot be relocated or enter or exit the
		buf_pool while we are holding the buf_pool.mutex. */
		const bool evicted = buf_LRU_free_page(&fix_block->page, true);
		space->release();
…
		if (!fix_block->page.oldest_modification()) {
			goto re_evict;
		}
So, both threads would keep acquiring and releasing fil_system.mutex and buf_pool.mutex in a busy loop. I think that the following should fix the livelock between the two threads:
diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
index 93178bada97..ace267db9af 100644
--- a/storage/innobase/buf/buf0buf.cc
+++ b/storage/innobase/buf/buf0buf.cc
@@ -2971,7 +2971,8 @@ buf_page_get_low(
 		buf_flush_lists(ULINT_UNDEFINED, LSN_MAX);
 		buf_flush_wait_batch_end_acquiring_mutex(false);
 
-		if (!fix_block->page.oldest_modification()) {
+		if (fix_block->page.buf_fix_count() == 1
+		    && !fix_block->page.oldest_modification()) {
 			goto re_evict;
 		}
Another problem illustrated by this test case is that the internal watchdog does not display any warning messages before killing the server:
2021-03-31  8:35:56 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.mutex. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/

Roel, are you able to repeat the memory corruption inside DBUG? I was not. I think that one should be filed as a separate ticket.
Does the patch work for you? I think that the livelock has been present since 10.5.7, due to MDEV-23399.

- Verified that a 10.6 debug build from today (1/4/21, at revision 1bd4115841ecded24217e5d753ed4d9822b4cffd) no longer crashes on either testcase.
- Verified with an ASAN/UBSAN build of the same revision that neither testcase produces any ASAN/UBSAN errors.
Thus deemed fixed in full for the time being. If any related issues occur, a new bug will be logged.