[MDEV-23542] Server crashes in thd_clear_errors() Created: 2020-08-23  Updated: 2021-09-09  Resolved: 2021-05-05

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.4.14, 10.4.17
Fix Version/s: 10.4.19, 10.5.10

Type: Bug
Priority: Blocker
Reporter: Valerii Kravchuk
Assignee: Oleksandr Byelkin
Resolution: Fixed
Votes: 1
Labels: crash


 Description   

The server crashes, and the full backtrace shows the following crashing thread:

Thread 1 (Thread 0x7f6ea5be0700 (LWP 22860)):
#0  0x00007f7005f01a71 in pthread_kill () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x0000560e44d517ae in handle_fatal_signal (sig=11) at /home/buildbot/buildbot/build/sql/signal_handler.cc:343
        curr_time = 1598152611
        tm = {tm_sec = 51, tm_min = 16, tm_hour = 12, tm_mday = 23, tm_mon = 7, tm_year = 120, tm_wday = 0, tm_yday = 235, tm_isdst = 0, tm_gmtoff = 32400, tm_zone = 0x560e47c0c460 "KST"}
        print_invalid_query_pointer = false
#2  <signal handler called>
No symbol table info available.
#3  0x0000560e44af622e in thd_clear_errors (thd=thd@entry=0x7f5ce02d3c38) at /home/buildbot/buildbot/build/sql/sql_class.cc:311
No locals.
#4  0x0000560e44afb06f in THD::change_user (this=this@entry=0x7f5ce02d3c38) at /home/buildbot/buildbot/build/sql/sql_class.cc:1420
No locals.
#5  0x0000560e44afb269 in THD::reset_for_reuse (this=this@entry=0x7f5ce02d3c38) at /home/buildbot/buildbot/build/sql/sql_class.cc:1632
No locals.
#6  0x0000560e44c2fbf5 in CONNECT::create_thd (this=this@entry=0x560e496d5308, thd=thd@entry=0x7f5ce02d3c38) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1500
        res = <optimized out>
        thd_reused = true
#7  0x0000560e44a810aa in cache_thread (thd=0x7f5ce02d3c38) at /home/buildbot/buildbot/build/sql/mysqld.cc:2720
        abstime = {tv_sec = 1598152892, tv_nsec = 842931000}
#8  one_thread_per_connection_end (thd=0x7f5ce02d3c38, put_in_cache=<optimized out>) at /home/buildbot/buildbot/build/sql/mysqld.cc:2783
        wsrep_applier = <optimized out>
#9  0x0000560e44c2ff2b in do_handle_one_connection (connect=connect@entry=0x560e4b406c18) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1423
        create_user = <optimized out>
        thr_create_utime = <optimized out>
        thd = 0x7f5ce02d3c38
#10 0x0000560e44c3007d in handle_one_connection (arg=arg@entry=0x560e4b406c18) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1316
        connect = 0x560e4b406c18
#11 0x0000560e4526abdd in pfs_spawn_thread (arg=0x560e9dc031a8) at /home/buildbot/buildbot/build/storage/perfschema/pfs.cc:1869
        typed_arg = 0x560e9dc031a8
        user_arg = 0x560e4b406c18
        pfs = <optimized out>
        user_start_routine = 0x560e44c30040 <handle_one_connection(void*)>
        klass = <optimized out>
#12 0x00007f7005efce75 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#13 0x00007f7004d5a8fd in clone () from /lib64/libc.so.6
No symbol table info available.



 Comments   
Comment by Oleksandr Byelkin [ 2021-01-13 ]

Does the patch above fix the situation?

Comment by Marcos Albe [ 2021-03-25 ]

Hello Oleksandr, Valerii,

We see the same backtrace:

(gdb) bt
+bt
#0  0x00007fa99b92e428 in raise () from /home/marcos.albe/CS0016731/tmp/libc.so.6
#1  0x00007fa99b93002a in abort () from /home/marcos.albe/CS0016731/tmp/libc.so.6
#2  0x00007fa99b9707ea in ?? () from /home/marcos.albe/CS0016731/tmp/libc.so.6
#3  0x00007fa99ba1215c in __fortify_fail () from /home/marcos.albe/CS0016731/tmp/libc.so.6
#4  0x00007fa99ba10160 in __chk_fail () from /home/marcos.albe/CS0016731/tmp/libc.so.6
#5  0x00007fa99ba120a7 in __fdelt_warn () from /home/marcos.albe/CS0016731/tmp/libc.so.6
#6  0x00000099604dcbae in my_addr_resolve (ptr=<optimized out>, loc=loc@entry=0x7f7126fdd0c0) at /home/buildbot/buildbot/build/mariadb-10.4.7/mysys/my_addr_resolve.c:234
#7  0x00000099604c0102 in print_with_addr_resolve (n=<optimized out>, addrs=0x7f7126fdd0e0) at /home/buildbot/buildbot/build/mariadb-10.4.7/mysys/stacktrace.c:254
#8  my_print_stacktrace (stack_bottom=<optimized out>, thread_stack=196608, silent=silent@entry=0 '\000') at /home/buildbot/buildbot/build/mariadb-10.4.7/mysys/stacktrace.c:273
#9  0x000000995ff53b07 in handle_fatal_signal (sig=11) at /home/buildbot/buildbot/build/mariadb-10.4.7/sql/signal_handler.cc:207
#10 <signal handler called>
#11 0x000000995fcef04e in thd_clear_errors (thd=thd@entry=0x7f740c2a8eb8) at /home/buildbot/buildbot/build/mariadb-10.4.7/sql/sql_class.cc:311
#12 0x000000995fcf3f9b in THD::change_user (this=this@entry=0x7f740c2a8eb8) at /home/buildbot/buildbot/build/mariadb-10.4.7/sql/sql_class.cc:1416
#13 0x000000995fcf4179 in THD::reset_for_reuse (this=this@entry=0x7f740c2a8eb8) at /home/buildbot/buildbot/build/mariadb-10.4.7/sql/sql_class.cc:1628
#14 0x000000995fe26214 in CONNECT::create_thd (this=this@entry=0x9a08909468, thd=thd@entry=0x7f740c2a8eb8) at /home/buildbot/buildbot/build/mariadb-10.4.7/sql/sql_connect.cc:1496
#15 0x000000995fc794c1 in cache_thread (thd=0x7f740c2a8eb8) at /home/buildbot/buildbot/build/mariadb-10.4.7/sql/mysqld.cc:2714
#16 one_thread_per_connection_end (thd=0x7f740c2a8eb8, put_in_cache=<optimized out>) at /home/buildbot/buildbot/build/mariadb-10.4.7/sql/mysqld.cc:2777
#17 0x000000995fe265ee in do_handle_one_connection (connect=connect@entry=0x9a0ae41458) at /home/buildbot/buildbot/build/mariadb-10.4.7/sql/sql_connect.cc:1419
#18 0x000000995fe267bd in handle_one_connection (arg=arg@entry=0x9a0ae41458) at /home/buildbot/buildbot/build/mariadb-10.4.7/sql/sql_connect.cc:1306
#19 0x00000099601028d1 in pfs_spawn_thread (arg=0x9a0af8cec8) at /home/buildbot/buildbot/build/mariadb-10.4.7/storage/perfschema/pfs.cc:1862
#20 0x00007fa99c5596ba in start_thread () from /home/marcos.albe/CS0016731/tmp/libpthread.so.0
#21 0x00007fa99ba0041d in clone () from /home/marcos.albe/CS0016731/tmp/libc.so.6

And we indeed see that thd->mysys_var is a null pointer:

(gdb) f 11
+f 11
#11 0x000000995fcef04e in thd_clear_errors (thd=thd@entry=0x7f740c2a8eb8) at /home/buildbot/buildbot/build/mariadb-10.4.7/sql/sql_class.cc:311
311       thd->mysys_var->abort= 0;
(gdb) p thd->mysys_var
+p thd->mysys_var
$3 = (st_my_thread_var *) 0x0
(gdb) p thd->mysys_var->abort
+p thd->mysys_var->abort
Cannot access memory at address 0x98

In the latest release, 10.4.18, we see that the patch (which should indeed solve the problem) has still not been applied.
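
To illustrate the failure mode seen in frame 11, here is a minimal sketch. It is not the actual patch and not MariaDB code: MockThreadVar, MockTHD, clear_errors_guarded and demo are hypothetical stand-ins, and the 0x98 padding is an assumption chosen only to mirror the address gdb reported, which suggests that the abort member sits at that offset from a null base pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for st_my_thread_var; NOT the real MariaDB layout.
// The padding is an assumption chosen so that `abort` lands at offset 0x98,
// the address gdb reported when dereferencing the null pointer.
struct MockThreadVar {
  char padding[0x98];
  int abort;
};
static_assert(offsetof(MockThreadVar, abort) == 0x98,
              "abort is expected at offset 0x98 in this mock");

// Hypothetical stand-in for THD, reduced to the one member that matters here.
struct MockTHD {
  MockThreadVar *mysys_var;
};

// Guarded variant of the statement shown at sql_class.cc:311.
// Returns true if the flag was cleared, false if there was nothing to clear.
bool clear_errors_guarded(MockTHD *thd) {
  if (thd == nullptr || thd->mysys_var == nullptr)
    return false;  // the unguarded statement would fault here (SIGSEGV)
  thd->mysys_var->abort = 0;
  return true;
}

// Small self-check exercising both the attached and the detached case.
void demo() {
  MockThreadVar var{};
  var.abort = 1;
  MockTHD attached{&var};
  MockTHD detached{nullptr};

  assert(clear_errors_guarded(&attached));
  assert(var.abort == 0);
  assert(!clear_errors_guarded(&detached));
}
```

A guard of this kind only avoids the crash. It does not explain why mysys_var is null when a cached THD is reused.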

Comment by Sergei Golubchik [ 2021-04-26 ]

Why would thd->mysys_var be NULL?
