|
Reducer reduced this testcase, over 3 days of machine time, looking for any crash using between 10 to 50 mysqld's at varied times. The issue is highly sporadic. It reduced it to a 339 lines testcase, and on inspection it was a blessing to find that all queries inside the trace were identical!
Execute this testcase in 1 to 250 threads simultaneously (only one required, but you want to push the server towards this issue as it's highly sporadic). Wait for the issue to hit (may take some time - for example 1-2 minutes at 200 threads). Some tests shows that approx 40k to 300k identical query executions are required.
SELECT COUNT(*) FROM INFORMATION_SCHEMA.ALL_PLUGINS;
|
Leads to various crashes:
|
10.4.15 de76bebc5798313e3798a35e02468e42254dd608 (Debug)
|
Core was generated by `/test/MD150920-mariadb-10.4.15-linux-x86_64-dbg/bin/mysqld --no-defaults --core'.
|
Program terminated with signal SIGSEGV, Segmentation fault.
|
#0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=11)
|
at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
|
[Current thread is 1 (Thread 0x14806d85c700 (LWP 2699987))]
|
(gdb) bt
|
#0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
|
#1 0x00005622e886b308 in my_write_core (sig=sig@entry=11) at /test/10.4_dbg/mysys/stacktrace.c:386
|
#2 0x00005622e7ff13c8 in handle_fatal_signal (sig=11) at /test/10.4_dbg/sql/signal_handler.cc:343
|
#3 <signal handler called>
|
#4 __GI___pthread_mutex_lock (mutex=0x120) at ../nptl/pthread_mutex_lock.c:65
|
#5 0x000014806d501035 in ?? () from /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
|
#6 0x000014806d501445 in ?? () from /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
|
#7 0x0000147feb1f7e40 in ?? () from /usr/lib/x86_64-linux-gnu/libtasn1.so.6
|
#8 0x0000147feb1f8e38 in asn1_array2tree () from /usr/lib/x86_64-linux-gnu/libtasn1.so.6
|
#9 0x0000148042dc3d15 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls.so.30
|
#10 0x0000148042da2651 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls.so.30
|
#11 0x000014806d71e783 in call_init (env=0x7ffc43280410, argv=0x7ffc432803b8, argc=10, l=<optimized out>) at dl-init.c:72
|
#12 _dl_init (main_map=main_map@entry=0x148046442a00, argc=10, argv=0x7ffc432803b8, env=0x7ffc43280410) at dl-init.c:119
|
#13 0x000014806d72324f in dl_open_worker (a=a@entry=0x14806d858090) at dl-open.c:522
|
#14 0x000014806bc1a51f in __GI__dl_catch_exception (exception=exception@entry=0x14806d858070, operate=operate@entry=0x14806d722e10 <dl_open_worker>, args=args@entry=0x14806d858090) at dl-error-skeleton.c:196
|
#15 0x000014806d72281a in _dl_open (file=0x14806d858550 "/test/MD150920-mariadb-10.4.15-linux-x86_64-dbg/lib/plugin/client_ed25519.so", mode=-2147483646, caller_dlopen=0x5622e7d4e6dc <plugin_dl_add(LEX_CSTRING const*, myf)+643>, nsid=<optimized out>, argc=10, argv=<optimized out>, env=0x7ffc43280410) at dl-open.c:605
|
#16 0x000014806c5cbf96 in dlopen_doit (a=a@entry=0x14806d8582c0) at dlopen.c:66
|
#17 0x000014806bc1a51f in __GI__dl_catch_exception (exception=exception@entry=0x14806d858260, operate=operate@entry=0x14806c5cbf40 <dlopen_doit>, args=args@entry=0x14806d8582c0) at dl-error-skeleton.c:196
|
#18 0x000014806bc1a5af in __GI__dl_catch_error (objname=objname@entry=0x148046092050, errstring=errstring@entry=0x148046092058, mallocedp=mallocedp@entry=0x148046092048, operate=operate@entry=0x14806c5cbf40 <dlopen_doit>, args=args@entry=0x14806d8582c0) at dl-error-skeleton.c:215
|
#19 0x000014806c5cc745 in _dlerror_run (operate=operate@entry=0x14806c5cbf40 <dlopen_doit>, args=args@entry=0x14806d8582c0) at dlerror.c:162
|
#20 0x000014806c5cc051 in __dlopen (file=file@entry=0x14806d858550 "/test/MD150920-mariadb-10.4.15-linux-x86_64-dbg/lib/plugin/client_ed25519.so", mode=mode@entry=2) at dlopen.c:87
|
#21 0x00005622e7d4e6dc in plugin_dl_add (dl=dl@entry=0x14806d858a40, MyFlags=MyFlags@entry=0) at /test/10.4_dbg/sql/sql_plugin.cc:752
|
#22 0x00005622e7d5046c in plugin_dl_foreach (thd=thd@entry=0x148046015070, dl
|
And
|
10.4.15 de76bebc5798313e3798a35e02468e42254dd608 (Debug)
|
Core was generated by `/test/MD150920-mariadb-10.4.15-linux-x86_64-dbg/bin/mysqld --no-defaults --core'.
|
Program terminated with signal SIGSEGV, Segmentation fault.
|
#0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=11)
|
at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
|
[Current thread is 1 (Thread 0x149a9151d700 (LWP 2721159))]
|
(gdb) bt
|
#0 __pthread_kill (threadid=<optimized out>, signo=signo@entry=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
|
#1 0x000055a224603308 in my_write_core (sig=sig@entry=11) at /test/10.4_dbg/mysys/stacktrace.c:386
|
#2 0x000055a223d893c8 in handle_fatal_signal (sig=11) at /test/10.4_dbg/sql/signal_handler.cc:343
|
#3 <signal handler called>
|
#4 0x0000149a94a08fac in free () from /usr/lib/x86_64-linux-gnu/libjemalloc.so.1
|
#5 0x0000149a689b40fd in ?? () from /usr/lib/x86_64-linux-gnu/libp11-kit.so.0
|
#6 0x0000149a94c50ccc in _dl_close_worker (map=map@entry=0x149a64c42500, force=force@entry=false) at dl-close.c:288
|
#7 0x0000149a94c51afa in _dl_close_worker (force=false, map=0x149a64c42500) at dl-close.c:125
|
#8 _dl_close (_map=0x149a64c42500) at dl-close.c:842
|
#9 0x0000149a9314751f in __GI__dl_catch_exception (exception=exception@entry=0x149a91519270, operate=operate@entry=0x149a93af9070 <dlclose_doit>, args=args@entry=0x149a64c42500) at dl-error-skeleton.c:196
|
#10 0x0000149a931475af in __GI__dl_catch_error (objname=objname@entry=0x149a64c92050, errstring=errstring@entry=0x149a64c92058, mallocedp=mallocedp@entry=0x149a64c92048, operate=operate@entry=0x149a93af9070 <dlclose_doit>, args=args@entry=0x149a64c42500) at dl-error-skeleton.c:215
|
#11 0x0000149a93af9745 in _dlerror_run (operate=operate@entry=0x149a93af9070 <dlclose_doit>, args=0x149a64c42500) at dlerror.c:162
|
#12 0x0000149a93af90b3 in __dlclose (handle=<optimized out>) at dlclose.c:46
|
#13 0x000055a223ae4e21 in free_plugin_mem (p=p@entry=0x149a915193b0) at /test/10.4_dbg/sql/sql_plugin.cc:493
|
#14 0x000055a223ae6a79 in plugin_dl_add (dl=dl@entry=0x149a91519a40, MyFlags=MyFlags@entry=0) at /test/10.4_dbg/sql/sql_plugin.cc:842
|
#15 0x000055a223ae846c in plugin_dl_foreach (thd=thd@entry=0x149a64c15070, dl=dl@entry=0x149a91519a40, func=func@entry=0x55a223b61e02 <show_plugins(THD*, plugin_ref, void*)>, arg=arg@entry=0x149a64c76088) at /test/10.4_dbg/sql/sql_plugin.cc:2527
|
#16 0x000055a223b69820 in fill_all_plugins (thd=0x149a64c15070, tables=<optimized out>, cond=<optimized out>) at /test/10.4_dbg/sql/sql_show.cc:372
|
#17 0x000055a223b71028 in get_schema_tables_result (join=join@entry=0x149a64c6f880, executed_place=executed_place@entry=PROCESSED_BY_JOIN_EXEC) at /test/10.4_dbg/sql/sql_show.cc:8949
|
#18 0x000055a223b4fa6f in JOIN::exec_inner (this=this@entry=0x149a64c6f880) at /test/10.4_dbg/sql/sql_select.cc:4435
|
#19 0x000055a223b503df in JOIN::exec (this=this@entry=0x149a64c6f880) at /test/10.4_dbg/sql/sql_select.cc:4260
|
#20 0x000055a223b4e785 in mysql_select (thd=thd@entry=0x149a64c15070, tables=<optimized out>, wild_num=0, fields=@0x149a64c6d3e8: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x149a64c6d8d0, last = 0x149a64c6d8d0, elements = 1}, <No data fields>}, conds=0x0, og_num=<optimized out>, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2684619520, result=0x149a64c6f858, unit=0x149a64c18f98, select_lex=0x149a64c6d2a0) at /test/10.4_dbg/sql/sql_select.cc:4695
|
#21 0x000055a223b4eab0 in handle_select (thd=thd@entry=0x149a64c15070, lex=lex@entry=0x149a64c18ed8, result=result@entry=0x149a64c6f858, setup_tables_done_option=setup_tables_done_option@entry=0) at /test/10.4_dbg/sql/sql_select.cc:410
|
#22 0x000055a223ac7504 in execute_sqlcom_select (thd=thd@entry=0x149a64c15070, all_tables=0x149a64c6d950) at /test/10.4_dbg/sql/sql_parse.cc:6355
|
#23 0x000055a223ad293c in mysql_execute_command (thd=thd@entry=0x149a64c15070) at /test/10.4_dbg/sql/sql_parse.cc:3889
|
#24 0x000055a223add237 in mysql_parse (thd=thd@entry=0x149a64c15070, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x149a9151c460, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.4_dbg/sql/sql_parse.cc:7896
|
#25 0x000055a223adfac7 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x149a64c15070, packet=packet@entry=0x149a64c57071 "SELECT COUNT(*) FROM INFORMATION_SCHEMA.ALL_PLUGINS", packet_length=packet_length@entry=51, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.4_dbg/sql/sql_parse.cc:1834
|
#26 0x000055a223ae3502 in do_command (thd=0x149a64c15070) at /test/10.4_dbg/sql/sql_parse.cc:1352
|
#27 0x000055a223c1111c in do_handle_one_connection (connect=connect@entry=0x149a91c36830) at /test/10.4_dbg/sql/sql_connect.cc:1412
|
#28 0x000055a223c1123c in handle_one_connection (arg=0x149a91c36830) at /test/10.4_dbg/sql/sql_connect.cc:1316
|
#29 0x0000149a93f876db in start_thread (arg=0x149a9151d700) at pthread_create.c:463
|
#30 0x0000149a93101a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
|
The latter is hit most frequently.
|
|
Further attempts with ASAN show something interesting. This is an ASAN build of 10.4.16:
|
10.4.16 c4f8ccc0f53a4de9ca0cac54392b35348494040c (Optimized)
|
10.4.16>CREATE TABLE t1(id int);
|
Query OK, 0 rows affected (0.060 sec)
|
10.4.16>insert into t1 values (1);
|
Query OK, 1 row affected (0.028 sec)
|
10.4.16>select * from t1;
|
+------+
|
| id |
|
+------+
|
| 1 |
|
+------+
|
1 row in set (0.016 sec)
|
10.4.16>SELECT COUNT(*) FROM INFORMATION_SCHEMA.ALL_PLUGINS;
|
ERROR 2013 (HY000): Lost connection to MySQL server during query
|
10.4.16>exit
|
qa-roel-2:/test/ASAN_MD261020-mariadb-10.4.16-linux-x86_64-opt$ tail -3 log/master.err
|
2020-11-07 18:06:18 0 [Note] /test/ASAN_MD261020-mariadb-10.4.16-linux-x86_64-opt/bin/mysqld: ready for connections.
|
Version: '10.4.16-MariaDB' socket: '/test/ASAN_MD261020-mariadb-10.4.16-linux-x86_64-opt/socket.sock' port: 10267 MariaDB Server
|
==443118==Your application is linked against incompatible ASan runtimes.
|
mysqld effectively terminates, without core, without error log update except the ASAN message. I first took this message at face value, but there is something else at play here.
I had also noticed that during startup, some (customary etc.) ASAN errors were reported, indicating that ASAN may be working correctly after all. This was further confirmed by first testing some queries (ref above), which worked fine.
In other words. the 'Your application is linked against incompatible ASan runtimes' is an event which can be triggered only in a specific scenario, and likely an invalid error message. And, the query SELECT COUNT(*) FROM INFORMATION_SCHEMA.ALL_PLUGINS; somehow triggers that situation.
I next tested with a large flat SQL file with varied queries to further validate normal ASAN operation. Various ASAN errors were produced, but not the 'Your application is linked against incompatible ASan runtimes' output!
I believe this certifies that ASAN is working correctly, but something is causing the - what looks to be incorrect - 'Your application is linked against incompatible ASan runtimes' output and immediate termination.
|
|
The above findings also meant there was now a way to test whether removing a particular plugin "fixes" the issue. Trial-and-error showed that the issue does not seem specific to one particular plugin, or that the issue may be sporadic under ASAN too (somewhat unlikely as I saw zero occurrences of sporadic behavior on retests below, nor during the plugin removal tests).
Results (no=no crash seen, yes=crash seen, alone=nothing else in plugin directories except this file):
No plugins at all: *no* (confirmed twice)
|
client_ed25519.so alone: *yes* (confirmed twice)
|
caching_sha2_password.so alone: *yes* (confirmed twice)
|
ha_rocksdb.so alone: *no* (confirmed twice)
|
ha_spider.so alone: *no* (confirmed twice)
|
handlersocket.so alone: *no* (confirmed twice)
|
file_key_management.so alone: *no* (confirmed twice)
|
many other plugins: not tested yet (ref below)
|
So for the moment only client_ed25519.so and caching_sha2_password.so produce the crash and immediate termination + incorrect ASAN error.
Perhaps it is something (some common class definition or similar) which is used in specific plugins but not in others?
I can do a test per plugin, but it is somewhat labor intensive, so I would probably automate the testing. Let me know if that is needed.
|