Details
- Bug
- Status: Closed
- Major
- Resolution: Duplicate
- 10.4(EOL)
- None
Description
This bug is likely one of the most sporadic issues I have ever worked on. Both elenst and I have observed it in QA runs. It is very hard to reduce and reproduce, so I am uploading everything I have so far in the hope that a developer can find the issue through code and core dump analysis. The stack trace is also very short, so hopefully the analysis is not hard. A full core dump is available. An SQL trace reduced to just over 8K lines is attached as MDEV-23657_8K.sql, but reproducibility with it is ultra low, even though only a single thread is needed.
10.4.15 eae968f62d285de97ed607c87bc131cd863d5d03 (Optimized)
Core was generated by `/test/MD110820-mariadb-10.4.15-linux-x86_64-opt/bin/mysqld --no-defaults --max_'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
[Current thread is 1 (Thread 0x14ad7c0c9700 (LWP 876069))]
(gdb) bt
#0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=11)
    at ../sysdeps/unix/sysv/linux/pthread_kill.c:57
#1  0x000055b679d54a77 in my_write_core (sig=sig@entry=11) at /test/10.4_opt/mysys/stacktrace.c:482
#2  0x000055b67972c62a in handle_fatal_signal (sig=11) at /test/10.4_opt/sql/signal_handler.cc:343
#3  <signal handler called>
#4  malloc_size_and_flag (is_thread_specific=<synthetic pointer>, p=0x9)
    at /test/10.4_opt/mysys/my_malloc.c:43
#5  my_free (ptr=0x9) at /test/10.4_opt/mysys/my_malloc.c:213
#6  0x000055b679606348 in PROFILING::finish_current_query_impl (this=this@entry=0x14ad340042f0)
    at /test/10.4_opt/sql/sql_profile.cc:391
#7  0x000055b679524dc4 in PROFILING::finish_current_query (this=0x14ad340042f0)
    at /test/10.4_opt/sql/sql_profile.h:302
#8  dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x14ad34000c08,
    packet=<optimized out>, packet@entry=0x14ad34924b69 "", packet_length=<optimized out>,
    packet_length@entry=86, is_com_multi=is_com_multi@entry=false,
    is_next_command=is_next_command@entry=false) at /test/10.4_opt/sql/sql_parse.cc:2466
#9  0x000055b679526e04 in do_command (thd=0x14ad34000c08) at /test/10.4_opt/sql/sql_parse.cc:1352
#10 0x000055b679603dbe in do_handle_one_connection (connect=connect@entry=0x55b67bfeb1d8)
    at /test/10.4_opt/sql/sql_connect.cc:1412
#11 0x000055b679603e7d in handle_one_connection (arg=arg@entry=0x55b67bfeb1d8)
    at /test/10.4_opt/sql/sql_connect.cc:1316
#12 0x000055b679c5c4ea in pfs_spawn_thread (arg=0x55b67bf71958)
    at /test/10.4_opt/storage/perfschema/pfs.cc:1869
#13 0x000014ad98ad36db in start_thread (arg=0x14ad7c0c9700) at pthread_create.c:463
#14 0x000014ad97c4da3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
All other details and files (core, mysqld and ldd files, data dir, error log, reduced testcase) are uploaded as an attachment.
elenst had a great idea to change the reducer from an exact bug match to a more generic one, so I next tried to reduce towards 'malloc' appearing in the error log only, rather than looking for a specific unique bug ID match.
This led to the following testcase (exactly as reduced to avoid the risk of non-reproducibility):
DROP DATABASE transforms;CREATE DATABASE transforms;DROP DATABASE test;CREATE DATABASE test;USE test;
DROP DATABASE transforms;
SET @@GLOBAL.OPTIMIZER_SWITCH="duplicateweedout=ON";#ERROR: 1231 - Variable 'optimizer_switch' can't be set to the value of 'duplicateweedout=ON'
ALTER TABLE tab MODIFY COLUMN c3 POLYGON NOT NULL;#ERROR: 1399 - XAER_RMFAIL: The command cannot be executed when global transaction is in the ACTIVE state
SELECT * FROM t1 WHERE c2 BETWEEN '1971-01-01 00:00:01' AND '2038-01-09 03:14:07' ORDER BY c1,c2 DESC;#ERROR: 1054 - Nepoznata kolona 'c2' u 'where clause'
insert into ti (id) values (1) on duplicate key update y = 0, z = 42;#ERROR: 1792 - Cannot execute statement in a READ ONLY transaction
SET @@session.default_master_connection = '1234-5678';#NOERROR
SELECT ST_NUMINTERIORRINGS(1);#ERROR: 4079 - Illegal parameter data type int for operation 'st_numinteriorrings'
change master to IGNORE_SERVER_IDS= ();#NOERROR
SET @@GLOBAL.replicate_wild_ignore_table="";#NOERROR
SELECT SLEEP(3);
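Most lines of the testcase above end in a `#ERROR: <code> - <message>` or `#NOERROR` annotation. When replaying or trimming the testcase it can help to separate the SQL from that annotation. The following is only a rough sketch: the function name `split_annotation` and the regular expression are hypothetical, and it assumes the annotation records the outcome seen when the trace was captured, which the report does not state explicitly.

```python
import re

# Hypothetical pattern for the trailing annotations seen in the testcase;
# it is not taken from the reducer's own code.
_ANNOTATION = re.compile(
    r"^(?P<sql>.*?;)\s*#(?P<status>NOERROR|ERROR: (?P<code>\d+) - (?P<message>.*))$"
)


def split_annotation(line):
    """Split one testcase line into (sql, error_code, message).

    error_code is None when the line has no annotation, 0 for #NOERROR,
    and the numeric error code for #ERROR annotations.
    """
    stripped = line.strip()
    match = _ANNOTATION.match(stripped)
    if match is None:
        # No trailing annotation: return the line unchanged.
        return stripped, None, None
    if match.group("status") == "NOERROR":
        return match.group("sql"), 0, None
    return match.group("sql"), int(match.group("code")), match.group("message")
```

Lines without an annotation, such as the first multi-statement line and the final `SELECT SLEEP(3);`, are returned as-is with `None` for the code.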
|
Quite a few items in this testcase are likely superfluous, such as the initial DROP DATABASE transforms statement. Please note that this testcase only leads to malloc being mentioned in the error log, not to the stack shown above. From memory, I set up the reducer to reduce based on the MySQL client. I expect this issue to be reproducible in high-repetition parallel MTR runs, with minor modifications if required for MTR. The same applies to the 8K SQL testcase (MDEV-23657_8K.sql).
For reference, I have also uploaded the original SQL trace (88K lines) as MDEV-23657_FULL_ORIGINAL.sql.tar.gz. The full set file (see the first comment below) also includes all in-between stages (_out files; the more _out's, the more reduced the file).
MDEV-22706 has a slightly similar stack trace, though I doubt it is related, and it was previously fixed.
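The generic match described above (accepting any run whose error log mentions 'malloc', rather than requiring a specific unique bug ID) can be illustrated roughly as follows. This is a minimal sketch and not the reducer's actual logic: the function names and the plain case-insensitive substring check are assumptions made for illustration.

```python
from pathlib import Path


def log_text_matches(log_text, needle="malloc"):
    """Return True if the error log text mentions `needle` (case-insensitive).

    Hypothetical stand-in for the generic match; the real reducer may
    match differently.
    """
    return needle.lower() in log_text.lower()


def log_file_matches(log_path, needle="malloc"):
    """Same check, reading the error log from disk.

    Undecodable bytes are replaced so a partially corrupt log still scans.
    """
    text = Path(log_path).read_text(errors="replace")
    return log_text_matches(text, needle)
```

A check this loose accepts any failure that mentions malloc, which is consistent with the reduced testcase not reproducing the exact stack shown above.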
Issue Links
- relates to MDEV-23534 SIGSEGV in sf_malloc_usable_size/my_free on SET GLOBAL REPLICATE_DO_TABLE (Closed)