[MDEV-29368] Assertion `trx->mysql_thd == thd' failed in innobase_kill_query from process_timers/timer_handler and use-after-poison in innobase_kill_query Created: 2022-08-24  Updated: 2022-12-15  Resolved: 2022-09-29

Status: Closed
Project: MariaDB Server
Component/s: Locking, XA
Affects Version/s: 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.5.18, 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2

Type: Bug Priority: Critical
Reporter: Roel Van de Paar Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: XA, locking, not-10.3, not-10.4, race, regression, sporadic, timer

Issue Links:
Blocks
blocks MDEV-29575 Access to innodb_trx, innodb_locks an... Closed
Duplicate
is duplicated by MDEV-26014 Race condition between KILL and clien... Closed
Problem/Incident
causes MDEV-30084 Shutdown hangs in some tests Closed
Relates
relates to MDEV-17092 ERROR: AddressSanitizer: use-after-po... Closed
relates to MDEV-23536 Race condition between KILL and trans... Closed

 Description   

Sporadic. Usually reproduces every 1 in 200 to 5000. Possibly same as, or connected with, MDEV-26014. Also see MDEV-23536 and MDEV-17092.

Single threaded SQL replay suffices. Readily reproduces at the CLI on debug builds of 10.5 to 10.11. No issues observed on optimized builds, even with longer runs.

Engine=TokuDB can be ignored; it is just a testcase artifact (auto-swapped to InnoDB). Likely a few other artifacts remain (high sporadicity) like the table colums.

# Repeat 200 to 5000 times for issue to trigger
DROP DATABASE test;  # Only here to ensure clean state on each iteration
CREATE DATABASE test;  # Idem
USE test;  # Idem
SET sql_mode='';
CREATE TABLE ti (a INT NOT NULL, b INT UNSIGNED NOT NULL, c CHAR(45), d VARCHAR(57) NOT NULL, e VARBINARY(48), f VARCHAR(97) NOT NULL, g BLOB, h LONGBLOB NOT NULL, id BIGINT NOT NULL, KEY(b), KEY(e), PRIMARY KEY(id)) ENGINE=TokuDB;
XA START 'a';
XA END 'a';
SET @@max_statement_time=0.0001;
SET SESSION pseudo_slave_mode=1;
XA PREPARE 'a';;
XA COMMIT 'a';  # Only here to ensure the next round of SQL works correctly, not required in actual crash sequence.

Leads to:

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Debug)

mysqld: /test/10.11_dbg/storage/innobase/handler/ha_innodb.cc:4969: void innobase_kill_query(handlerton*, THD*, thd_kill_levels): Assertion `trx->mysql_thd == thd' failed.

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Debug)

Core was generated by `/test/MD190822-mariadb-10.11.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
[Current thread is 1 (Thread 0x15366b7a3700 (LWP 2465524))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000015366bb3d859 in __GI_abort () at abort.c:79
#2  0x000015366bb3d729 in __assert_fail_base (fmt=0x15366bcd3588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55723d115a9b "trx->mysql_thd == thd", file=0x55723d1183a8 "/test/10.11_dbg/storage/innobase/handler/ha_innodb.cc", line=4969, function=<optimized out>) at assert.c:92
#3  0x000015366bb4efd6 in __GI___assert_fail (assertion=assertion@entry=0x55723d115a9b "trx->mysql_thd == thd", file=file@entry=0x55723d1183a8 "/test/10.11_dbg/storage/innobase/handler/ha_innodb.cc", line=line@entry=4969, function=function@entry=0x55723d1193b0 "void innobase_kill_query(handlerton*, THD*, thd_kill_levels)") at assert.c:101
#4  0x000055723c902759 in innobase_kill_query (thd=0x153610000db8) at /test/10.11_dbg/storage/innobase/handler/ha_innodb.cc:4969
#5  0x000055723c5fd9ad in kill_handlerton (thd=thd@entry=0x153610000db8, plugin=<optimized out>, level=level@entry=0x15366b7a2c3c) at /test/10.11_dbg/sql/handler.cc:923
#6  0x000055723c33723c in plugin_foreach_with_mask (thd=thd@entry=0x153610000db8, func=func@entry=0x55723c5fd953 <kill_handlerton(THD*, plugin_ref, void*)>, type=type@entry=1, state_mask=state_mask@entry=8, arg=arg@entry=0x15366b7a2c3c) at /test/10.11_dbg/sql/sql_plugin.cc:2511
#7  0x000055723c6011d7 in ha_kill_query (thd=thd@entry=0x153610000db8, level=<optimized out>) at /test/10.11_dbg/sql/handler.cc:930
#8  0x000055723c2ab7a1 in THD::awake_no_mutex (this=this@entry=0x153610000db8, state_to_set=state_to_set@entry=KILL_TIMEOUT) at /test/10.11_dbg/sql/sql_class.cc:1899
#9  0x000055723c2ab9b9 in THD::awake (state_to_set=KILL_TIMEOUT, this=0x153610000db8) at /test/10.11_dbg/sql/sql_class.h:3769
#10 thd_kill_timeout (thd=thd@entry=0x153610000db8) at /test/10.11_dbg/sql/sql_class.cc:628
#11 0x000055723cd42d00 in process_timers (now=<synthetic pointer>) at /test/10.11_dbg/mysys/thr_timer.c:271
#12 timer_handler (arg=<optimized out>) at /test/10.11_dbg/mysys/thr_timer.c:315
#13 0x000015366c04e609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x000015366bc3a133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Bug confirmed present in:
MariaDB: 10.5.18 (dbg), 10.6.10 (dbg), 10.7.6 (dbg), 10.8.5 (dbg), 10.9.2 (dbg), 10.10.2 (dbg), 10.11.0 (dbg)

Bug (or feature/syntax) confirmed not present in:
MariaDB: 10.3.37 (dbg), 10.3.37 (opt), 10.4.27 (dbg), 10.4.27 (opt), 10.5.18 (opt), 10.6.10 (opt), 10.7.6 (opt), 10.8.5 (opt), 10.9.2 (opt), 10.10.2 (opt), 10.11.0 (opt)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.51 (dbg), 5.6.51 (opt), 5.7.38 (dbg), 5.7.38 (opt), 8.0.29 (dbg), 8.0.29 (opt)



 Comments   
Comment by Roel Van de Paar [ 2022-08-24 ]

jplindst and serg fyi

Comment by Marko Mäkelä [ 2022-08-24 ]

Roel, please provide an rr replay trace.

Comment by Roel Van de Paar [ 2022-08-30 ]

marko rr replay trace now available at:

rr:/test/MD060822-mariadb-10.10.0-linux-x86_64-dbg/rr/mysqld-0$ rr replay .

10.10.0 360d99429c97d2bf2e1dc32276e4f3b3f46c9a74 (Debug)

Thread 2 received signal SIGABRT, Aborted.
[Switching to Thread 1124588.1124637]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000014e55744c859 in __GI_abort () at abort.c:79
#2  0x000014e55744c729 in __assert_fail_base (fmt=0x14e5575e2588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x56311df9e67a "trx->mysql_thd == thd", 
    file=0x56311dfa0f90 "/test/10.10_dbg/storage/innobase/handler/ha_innodb.cc", line=4969, function=<optimized out>)
    at assert.c:92
#3  0x000014e55745dfd6 in __GI___assert_fail (assertion=assertion@entry=0x56311df9e67a "trx->mysql_thd == thd", 
    file=file@entry=0x56311dfa0f90 "/test/10.10_dbg/storage/innobase/handler/ha_innodb.cc", line=line@entry=4969, 
    function=function@entry=0x56311dfa1f38 "void innobase_kill_query(handlerton*, THD*, thd_kill_levels)")
    at assert.c:101
#4  0x000056311d77e8f2 in innobase_kill_query (thd=0x4f330072228)
    at /test/10.10_dbg/storage/innobase/handler/ha_innodb.cc:4969
...

Please note that the first continue will take ~30 minutes or more.

Comment by Marko Mäkelä [ 2022-08-30 ]

I analyzed the trace:

ssh rr
rr replay /test/MD060822-mariadb-10.10.0-linux-x86_64-dbg/rr/mysqld-0

continue
frame 4
watch -l trx->mysql_thd
reverse-continue
reverse-continue
backtrace
thread apply 2 backtrace

The assertion would fail in Thread 2. The stack traces reveal when the trx->mysql_thd was changed and what the soon-to-crash thread was doing at that point:

10.10 360d99429c97d2bf2e1dc32276e4f3b3f46c9a74

#0  trx_t::free (this=this@entry=0x56311ef00b80) at /test/10.10_dbg/storage/innobase/trx/trx0trx.cc:391
#1  0x000056311d77baf6 in innobase_close_connection (hton=<optimized out>, thd=0x4f330072228) at /test/10.10_dbg/storage/innobase/handler/ha_innodb.cc:4956
#2  0x000056311d47b4c0 in ha_close_connection (thd=thd@entry=0x4f330072228) at /test/10.10_dbg/sql/handler.cc:906
#3  0x000056311d3de056 in slave_applier_reset_xa_trans (thd=0x4f330072228) at /test/10.10_dbg/sql/xa.cc:1099
#4  trans_xa_prepare (thd=thd@entry=0x4f330072228) at /test/10.10_dbg/sql/xa.cc:548
#5  0x000056311d19b74f in mysql_execute_command (thd=thd@entry=0x4f330072228, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/10.10_dbg/sql/sql_parse.cc:5857
#6  0x000056311d184332 in mysql_parse (thd=thd@entry=0x4f330072228, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x14e555162330) at /test/10.10_dbg/sql/sql_parse.cc:8037
#7  0x000056311d191900 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x4f330072228, packet=packet@entry=0x4f330078d99 "XA PREPARE 'a'", packet_length=packet_length@entry=14, 
    blocking=blocking@entry=true) at /test/10.10_dbg/sql/sql_class.h:1366
#8  0x000056311d194008 in do_command (thd=0x4f330072228, blocking=blocking@entry=true) at /test/10.10_dbg/sql/sql_parse.cc:1407
#9  0x000056311d2f5146 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x563120f36478, put_in_cache=put_in_cache@entry=true) at /test/10.10_dbg/sql/sql_connect.cc:1418
#10 0x000056311d2f564f in handle_one_connection (arg=0x563120f36478) at /test/10.10_dbg/sql/sql_connect.cc:1312
#11 0x0000499e5189b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x000014e557549133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) thread apply 2 backtrace
#0  __GI___pthread_mutex_lock (mutex=<optimized out>) at ../nptl/pthread_mutex_lock.c:159
#1  0x000014e557589291 in __GI___dl_iterate_phdr (callback=0x6ffe15f0, data=0x698a35791680) at dl-iteratephdr.c:40
#2  0x000000006ffe26c1 in _Unwind_Find_FDE () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#3  0x000000006ffde868 in ?? () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#4  0x000000006ffe077b in _Unwind_Backtrace () from /lib/x86_64-linux-gnu/libgcc_s.so.1
#5  0x000014e5575570a6 in __GI___backtrace (array=array@entry=0x698a35791a30, size=size@entry=10) at backtrace.c:116
#6  0x000056311dbd1f41 in sf_malloc (size=size@entry=32, my_flags=my_flags@entry=16) at /test/10.10_dbg/mysys/safemalloc.c:147
#7  0x000056311dbc5073 in my_malloc (key=key@entry=0, size=size@entry=8, my_flags=my_flags@entry=16) at /test/10.10_dbg/mysys/my_malloc.c:90
#8  0x000056311d1af909 in intern_plugin_lock (lex=lex@entry=0x0, rc=rc@entry=0x698a35791be0, state_mask=state_mask@entry=8) at /test/10.10_dbg/sql/sql_plugin.cc:993
#9  0x000056311d1b406f in plugin_foreach_with_mask (thd=thd@entry=0x4f330072228, func=func@entry=0x56311d477d0b <kill_handlerton(THD*, plugin_ref, void*)>, type=type@entry=1, state_mask=state_mask@entry=8, arg=arg@entry=0x698a35791c3c) at /test/10.10_dbg/sql/sql_plugin.cc:2501
#10 0x000056311d47b589 in ha_kill_query (thd=thd@entry=0x4f330072228, level=<optimized out>) at /test/10.10_dbg/sql/handler.cc:930
#11 0x000056311d127e8f in THD::awake_no_mutex (this=this@entry=0x4f330072228, state_to_set=state_to_set@entry=KILL_TIMEOUT) at /test/10.10_dbg/sql/sql_class.cc:1902
#12 0x000056311d1280a7 in THD::awake (state_to_set=KILL_TIMEOUT, this=0x4f330072228) at /test/10.10_dbg/sql/sql_class.h:3794
#13 thd_kill_timeout (thd=thd@entry=0x4f330072228) at /test/10.10_dbg/sql/sql_class.cc:628
#14 0x000056311dbcdb33 in process_timers (now=<synthetic pointer>) at /test/10.10_dbg/mysys/thr_timer.c:271
#15 timer_handler (arg=<optimized out>) at /test/10.10_dbg/mysys/thr_timer.c:315
#16 0x0000499e5189b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#17 0x000014e557549133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I think that the trace could have been a lot shorter if cmake -DWITH_SAFEMALLOC=OFF had been specified for the build. Anyway, we can see that at the time ha_kill_query() was going to invoke innobase_kill_query(), ha_close_connection() had already destroyed the connection.

Is THD::LOCK_thd_data supposed to prevent this race? If it is, then ha_close_connection() should be holding it.

Roel, did you try to reproduce this with AddressSanitizer? I hope that it would report heap-use-after-poison.

I do not think that this race condition can be fixed inside InnoDB.

Comment by Roel Van de Paar [ 2022-08-30 ]

marko Excellent analysis, thank you. You are 100% correct; use-after-poison confirmed using ASAN on 10.11 debug trunk build:

10.11.0 fe1f8f2c6b6f3b8e3383168225f9ae7853028947 (Debug, UBASAN)

Version: '10.11.0-MariaDB-debug'  socket: '/test/UBASAN_MD310822-mariadb-10.11.0-linux-x86_64-dbg/socket.sock'  port: 10011  MariaDB Server
=================================================================
==45728==ERROR: AddressSanitizer: use-after-poison on address 0x14c381b10c10 at pc 0x5571a6803204 bp 0x14c389f9d790 sp 0x14c389f9d780
READ of size 8 at 0x14c381b10c10 thread T1
    #0 0x5571a6803203 in innobase_kill_query /test/10.11_dbg_san/storage/innobase/handler/ha_innodb.cc:4969
    #1 0x5571a4f6de4f in kill_handlerton /test/10.11_dbg_san/sql/handler.cc:923
    #2 0x5571a3875468 in plugin_foreach_with_mask(THD*, char (*)(THD*, st_plugin_int**, void*), int, unsigned int, void*) /test/10.11_dbg_san/sql/sql_plugin.cc:2511
    #3 0x5571a4f8659d in ha_kill_query(THD*, thd_kill_levels) /test/10.11_dbg_san/sql/handler.cc:930
    #4 0x5571a34087d5 in THD::awake_no_mutex(killed_state) /test/10.11_dbg_san/sql/sql_class.cc:1899
    #5 0x5571a3409762 in THD::awake(killed_state) /test/10.11_dbg_san/sql/sql_class.h:3769
    #6 0x5571a3409762 in thd_kill_timeout /test/10.11_dbg_san/sql/sql_class.cc:628
    #7 0x5571a79ca8b8 in process_timers /test/10.11_dbg_san/mysys/thr_timer.c:271
    #8 0x5571a79ca8b8 in timer_handler /test/10.11_dbg_san/mysys/thr_timer.c:315
    #9 0x14c38f541608 in start_thread /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477
    #10 0x14c38e7b6132 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11f132)
 
0x14c381b10c10 is located 5136 bytes inside of 4194304-byte region [0x14c381b0f800,0x14c381f0f800)
allocated by thread T0 here:
    #0 0x5571a2dcaa55 in memalign (/test/UBASAN_MD310822-mariadb-10.11.0-linux-x86_64-dbg/bin/mariadbd+0x7ee2a55)
    #1 0x5571a717e575 in aligned_malloc(unsigned long, unsigned long) /test/10.11_dbg_san/include/aligned.h:26
    #2 0x5571a717e575 in Pool<trx_t, TrxFactory, TrxPoolLock>::Pool(unsigned long) /test/10.11_dbg_san/storage/innobase/include/ut0pool.h:68
    #3 0x5571a717e575 in PoolManager<Pool<trx_t, TrxFactory, TrxPoolLock>, TrxPoolManagerLock>::add_pool(unsigned long) /test/10.11_dbg_san/storage/innobase/include/ut0pool.h:304
    #4 0x5571a715af1b in PoolManager<Pool<trx_t, TrxFactory, TrxPoolLock>, TrxPoolManagerLock>::create() /test/10.11_dbg_san/storage/innobase/include/ut0pool.h:329
    #5 0x5571a715af1b in PoolManager<Pool<trx_t, TrxFactory, TrxPoolLock>, TrxPoolManagerLock>::PoolManager(unsigned long) /test/10.11_dbg_san/storage/innobase/include/ut0pool.h:215
    #6 0x5571a715af1b in trx_pool_init() /test/10.11_dbg_san/storage/innobase/trx/trx0trx.cc:305
    #7 0x5571a703ea97 in srv_boot() /test/10.11_dbg_san/storage/innobase/srv/srv0srv.cc:670
    #8 0x5571a705fa16 in srv_start(bool) /test/10.11_dbg_san/storage/innobase/srv/srv0start.cc:932
    #9 0x5571a6895e01 in innodb_init /test/10.11_dbg_san/storage/innobase/handler/ha_innodb.cc:4231
    #10 0x5571a4f84aeb in ha_initialize_handlerton(st_plugin_int*) /test/10.11_dbg_san/sql/handler.cc:649
    #11 0x5571a388252e in plugin_initialize /test/10.11_dbg_san/sql/sql_plugin.cc:1464
    #12 0x5571a388b82b in plugin_init(int*, char**, int) /test/10.11_dbg_san/sql/sql_plugin.cc:1757
    #13 0x5571a2e210d1 in init_server_components /test/10.11_dbg_san/sql/mysqld.cc:5132
    #14 0x5571a2e33e0f in mysqld_main(int, char**) /test/10.11_dbg_san/sql/mysqld.cc:5750
    #15 0x5571a2e0b81a in main /test/10.11_dbg_san/sql/main.cc:34
    #16 0x14c38e6bb082 in __libc_start_main ../csu/libc-start.c:308
Thread T1 created by T0 here:
    #0 0x5571a2cf7295 in __interceptor_pthread_create (/test/UBASAN_MD310822-mariadb-10.11.0-linux-x86_64-dbg/bin/mariadbd+0x7e0f295)
    #1 0x5571a79caeae in init_thr_timer /test/10.11_dbg_san/mysys/thr_timer.c:90
    #2 0x5571a2e20029 in init_server_components /test/10.11_dbg_san/sql/mysqld.cc:4838
    #3 0x5571a2e33e0f in mysqld_main(int, char**) /test/10.11_dbg_san/sql/mysqld.cc:5750
    #4 0x5571a2e0b81a in main /test/10.11_dbg_san/sql/main.cc:34
    #5 0x14c38e6bb082 in __libc_start_main ../csu/libc-start.c:308
 
SUMMARY: AddressSanitizer: use-after-poison /test/10.11_dbg_san/storage/innobase/handler/ha_innodb.cc:4969 in innobase_kill_query
Shadow bytes around the buggy address:
  0x0298f035a130: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x0298f035a140: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x0298f035a150: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x0298f035a160: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x0298f035a170: 00 00 00 00 00 00 00 00 f7 00 f7 07 00 f7 00 f7
=>0x0298f035a180: f7 f7[f7]f7 f7 04 00 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x0298f035a190: f7 f7 f7 f7 f7 f7 00 f7 f7 f7 00 f7 f7 04 f7 f7
  0x0298f035a1a0: f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7 f7
  0x0298f035a1b0: f7 f7 f7 f7 f7 f7 f7 f7 f7 00 00 f7 00 00 00 00
  0x0298f035a1c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0298f035a1d0: 00 f7 00 00 00 00 00 00 f7 f7 f7 f7 f7 f7 f7 f7
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==45728==ABORTING
220831  9:23:51 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.11.0-MariaDB-debug
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=2
max_threads=10002
thread_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 22162481 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x100000
/test/UBASAN_MD310822-mariadb-10.11.0-linux-x86_64-dbg/bin/mysqld(+0x7e417c0)[0x5571a2d297c0]
asan_interceptors.o:0(__interceptor_backtrace.part.0)[0x5571a79b95b4]
mysys/stacktrace.c:212(my_print_stacktrace)[0x5571a4f66308]
sigaction.c:0(__restore_rt)[0x14c38f54d420]
addr2line: DWARF error: section .debug_info is larger than its filesize! (0x93ef57 vs 0x530ea0)
??:0(gsignal)[0x14c38e6da00b]
??:0(abort)[0x14c38e6b9859]
:0(__sanitizer::Abort())[0x5571a2de7d42]
:0(__sanitizer::Die())[0x5571a2df28ec]
:0(__asan::ScopedInErrorReport::~ScopedInErrorReport())[0x5571a2dd3f7c]
:0(__asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool))[0x5571a2dd39f3]
??:0(__asan_report_load8)[0x5571a2dd483b]
handler/ha_innodb.cc:4969(innobase_kill_query(handlerton*, THD*, thd_kill_levels))[0x5571a6803204]
sql/handler.cc:925(kill_handlerton(THD*, st_plugin_int**, void*))[0x5571a4f6de50]
sql/sql_plugin.cc:2511(plugin_foreach_with_mask(THD*, char (*)(THD*, st_plugin_int**, void*), int, unsigned int, void*))[0x5571a3875469]
sql/handler.cc:928(ha_kill_query(THD*, thd_kill_levels))[0x5571a4f8659e]
sql/sql_class.cc:1899(THD::awake_no_mutex(killed_state))[0x5571a34087d6]
psi/mysql_thread.h:790(thd_kill_timeout)[0x5571a3409763]
mysys/thr_timer.c:277(process_timers)[0x5571a79ca8b9]
nptl/pthread_create.c:478(start_thread)[0x14c38f541609]
addr2line: DWARF error: section .debug_info is larger than its filesize! (0x93ef57 vs 0x530ea0)
??:0(clone)[0x14c38e7b6133]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /test/UBASAN_MD310822-mariadb-10.11.0-linux-x86_64-dbg/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            unlimited            unlimited            bytes
Max core file size        0                    0                    bytes
Max resident set          unlimited            unlimited            bytes
Max processes             unlimited            unlimited            processes
Max open files            1048576              1048576              files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       unlimited            unlimited            signals
Max msgqueue size         unlimited            unlimited            bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: core
 
Kernel version: Linux version 5.15.0-1016-gcp (buildd@lcy02-amd64-087) (gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #21~20.04.1-Ubuntu SMP Fri Aug 5 12:53:07 UTC 2022

Also note the interesting 'addr2line: DWARF error: section .debug_info is larger than its filesize!' errors.

Comment by Sergei Golubchik [ 2022-09-28 ]

This is basically MDEV-23536, that was fixed in 10.3, but the fix was lost in 10.4->10.5 merge

Comment by Marko Mäkelä [ 2022-10-12 ]

On 10.5, I disabled two Galera tests that apparently started to fail because of this fix:

galera.galera_bf_abort_shutdown          [ disabled ]  MDEV-29368 shutdown hang
galera_3nodes.galera_parallel_apply_3nodes [ disabled ]  MDEV-29368 DEBUG_SYNC timeout

Update: this change is innocent. These tests fail in the same way even if I revert these changes:

git revert --no-commit e29fb956145cfa1f4f8c41cafcddea36a20b23aa 6b685ea7b0776430d45b095cb4be3ef0739a3c04{,~}

MDEV-29774 was filed to fix these.

It is also worth noting that a number of galera_bf_ tests are disabled. Based on the names of the tests, they are seemingly testing functionality related to MDEV-23328.

Generated at Thu Feb 08 10:08:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.