Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Incomplete
-
10.6
Description
The issue described here affects testing regularly. It leads to some wasted work as existing bugs/asserts are incorrectly identified as new issues given the different stack they sporadically produce (ref below).
What happens is this: for various bugs/testcases (have captured different ones below), the original assert will be produced (correct assert + SIGABRT), but the stack will always be in the frames Diagnostics_area::set_error_status from THD::raise_condition from THD::raise_condition from my_message_sql.
An example of such observed asserts (uniqueID's):
table_list->table |SIGABRT|Diagnostics_area::set_error_status|THD::raise_condition|THD::raise_condition|my_message_sql ## MDEV-19569
|
0 |SIGABRT|Diagnostics_area::set_error_status|THD::raise_condition|THD::raise_condition|my_message_sql ## MDEV-17890
|
0 |SIGABRT|Diagnostics_area::set_error_status|THD::raise_condition|THD::raise_condition|my_message_sql ## MDEV-22885
|
row_end |SIGABRT|Diagnostics_area::set_error_status|THD::raise_condition|THD::raise_condition|my_message_sql ## MDEV-22660
|
marked_for_read() |SIGABRT|Diagnostics_area::set_error_status|THD::raise_condition|THD::raise_condition|my_message_sql ## MDEV-24658
|
scale <= precision|SIGABRT|Diagnostics_area::set_error_status|THD::raise_condition|THD::raise_condition|my_message_sql ## MDEV-25317
|
At some point I thought there to be a connection to these bugs, but I no longer believe this. Instead, it seems that (during the server crash/testcase replay), quite sporadically, the Diagnostics_area::set_error_status codepath is followed and the assertion (likely "any assertion", though maybe only for sporadic bugs/testcases) is presented with the Diagnostics_area::set_error_status as crashing frame.
It would seem to make little sense to connect the stacks of these assertions with the bugs in question, though let me know if you think otherwise.
It may be that this expression of the issue is 10.6 only (TBD). I have set it to 10.6 for the time being. The issue seems to have become much more prevalent lately.
It may also be specifically connected with issues which produce many different asserts across different versions like MDEV-25317 and MDEV-22660.
I have seen this Diagnostics_area::set_error_status sporadic issue in 10.5 before too, but the result was different in that the assert was not displayed. The assert now showing could be a result of the small code patch which we do before building as previously discussed with danblack. https://github.com/mariadb-corporation/mariadb-qa/blob/master/build_mdpsms_dbg.sh#L267-L270
The issue seems sporadic and not readily repeatable.
Attachments
Issue Links
- relates to
-
MDEV-17225 Assertion `log_descriptor.bc.buffer->prev_last_lsn == ((LSN)0)' failed in translog_flush_buffers
-
- Open
-
Activity
Very interesting related stack:
10.6.2 06dd151bb86ad5b87d4d46011f36da1289c01074 (Debug) |
Core was generated by `/test/MD050621-mariadb-10.6.2-linux-x86_64-dbg/bin/mysqld --no-defaults --lc-me'.
|
Program terminated with signal SIGABRT, Aborted.
|
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
|
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
|
[Current thread is 1 (Thread 0x14ba9a5ff800 (LWP 2280181))]
|
(gdb) bt
|
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
|
#1 0x000014ba9a7bb859 in __GI_abort () at abort.c:79
|
#2 0x000014ba9a7bb729 in __assert_fail_base (fmt=0x14ba9a951588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
|
assertion=0x563e45193a88 "! is_set() || m_can_overwrite_status",
|
file=0x563e45193830 "/test/10.6_dbg/sql/sql_error.cc", line=444, function=<optimized out>) at assert.c:92
|
#3 0x000014ba9a7ccf36 in __GI___assert_fail (
|
assertion=assertion@entry=0x563e45193a88 "! is_set() || m_can_overwrite_status",
|
file=file@entry=0x563e45193830 "/test/10.6_dbg/sql/sql_error.cc", line=line@entry=444,
|
function=function@entry=0x563e45193a00 "void Diagnostics_area::set_error_status(uint, const char*, const char*, const Sql_user_condition_identity&, const Sql_condition*)") at assert.c:101
|
#4 0x0000563e444a59ab in Diagnostics_area::set_error_status (this=this@entry=0x563e488995a0,
|
sql_errno=sql_errno@entry=6,
|
message=message@entry=0x7fff7f3b65f0 "Error on delete of '/tmp/#sql-temptable-22caf5-2-1.MAD' (Errcode: 2 \"No such file or directory\")", sqlstate=sqlstate@entry=0x563e451b10a8 "HY000", ucid=..., error_condition=0x0)
|
at /test/10.6_dbg/sql/sql_error.h:1015
|
#5 0x0000563e4448526f in THD::raise_condition (this=this@entry=0x563e48893718, sql_errno=sql_errno@entry=6,
|
sqlstate=0x563e451b10a8 "HY000", sqlstate@entry=0x0, level=<optimized out>, ucid=...,
|
msg=msg@entry=0x7fff7f3b65f0 "Error on delete of '/tmp/#sql-temptable-22caf5-2-1.MAD' (Errcode: 2 \"No such file or directory\")") at /test/10.6_dbg/sql/sql_class.cc:1126
|
#6 0x0000563e443e3885 in THD::raise_condition (
|
msg=0x7fff7f3b65f0 "Error on delete of '/tmp/#sql-temptable-22caf5-2-1.MAD' (Errcode: 2 \"No such file or directory\")", level=<optimized out>, sqlstate=0x0, sql_errno=6, this=0x563e48893718) at /test/10.6_dbg/sql/sql_error.h:217
|
#7 my_message_sql (error=6,
|
str=0x7fff7f3b65f0 "Error on delete of '/tmp/#sql-temptable-22caf5-2-1.MAD' (Errcode: 2 \"No such file or directory\")", MyFlags=4) at /test/10.6_dbg/sql/mysqld.cc:3238
|
#8 0x0000563e45020730 in my_error (nr=nr@entry=6, MyFlags=MyFlags@entry=4) at /test/10.6_dbg/mysys/my_error.c:124
|
#9 0x0000563e45020372 in my_delete (name=name@entry=0x7fff7f3b6bf0 "/tmp/#sql-temptable-22caf5-2-1.MAD",
|
MyFlags=MyFlags@entry=16) at /test/10.6_dbg/mysys/my_delete.c:53
|
#10 0x0000563e45029c33 in my_handler_delete_with_symlink (
|
filename=filename@entry=0x7fff7f3b6bf0 "/tmp/#sql-temptable-22caf5-2-1.MAD", sync_dir=sync_dir@entry=16)
|
at /test/10.6_dbg/mysys/my_symlink2.c:190
|
#11 0x0000563e44ac398f in inline_mysql_file_delete_with_symlink (key=0,
|
src_file=src_file@entry=0x563e453a4a60 "/test/10.6_dbg/storage/maria/ma_delete_table.c",
|
src_line=src_line@entry=103, name=name@entry=0x563e488f5a20 "/tmp/#sql-temptable-22caf5-2-1",
|
ext=ext@entry=0x563e4538d2fd ".MAD", flags=flags@entry=16) at /test/10.6_dbg/include/mysql/psi/mysql_file.h:1396
|
#12 0x0000563e44ac3a7f in maria_delete_table_files (name=name@entry=0x563e488f5a20 "/tmp/#sql-temptable-22caf5-2-1",
|
temporary=temporary@entry=1 '\001', flags=flags@entry=16) at /test/10.6_dbg/storage/maria/ma_delete_table.c:103
|
#13 0x0000563e44a4fd0a in ha_maria::drop_table (this=0x563e488f6860,
|
name=0x563e488f5a20 "/tmp/#sql-temptable-22caf5-2-1") at /test/10.6_dbg/storage/maria/ha_maria.cc:2793
|
#14 0x0000563e447dfc6f in handler::ha_drop_table (this=0x563e488f6860,
|
name=0x563e488f5a20 "/tmp/#sql-temptable-22caf5-2-1") at /test/10.6_dbg/sql/handler.cc:5049
|
#15 0x0000563e44556298 in free_tmp_table (thd=thd@entry=0x563e48893718, entry=0x563e488f48e0)
|
at /test/10.6_dbg/sql/sql_select.cc:20289
|
#16 0x0000563e44466f7e in close_thread_tables (thd=thd@entry=0x563e48893718) at /test/10.6_dbg/sql/sql_base.cc:840
|
#17 0x0000563e44505e9f in mysql_execute_command (thd=thd@entry=0x563e48893718) at /test/10.6_dbg/sql/sql_parse.cc:6049
|
#18 0x0000563e444ebadc in mysql_parse (thd=0x563e48893718, rawbuf=<optimized out>, length=<optimized out>,
|
parser_state=parser_state@entry=0x7fff7f3b7510) at /test/10.6_dbg/sql/sql_parse.cc:8016
|
#19 0x0000563e444f86c4 in bootstrap (file=0x563e463bd930 <instrumented_stdin>) at /test/10.6_dbg/sql/sql_class.h:1337
|
#20 0x0000563e443ee0b1 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /test/10.6_dbg/sql/mysqld.cc:5736
|
#21 0x0000563e443dfb36 in main (argc=<optimized out>, argv=<optimized out>) at /test/10.6_dbg/sql/main.cc:34
|
Specifically:
(gdb) f 4
|
#4 0x0000563e444a59ab in Diagnostics_area::set_error_status (this=this@entry=0x563e488995a0,
|
sql_errno=sql_errno@entry=6,
|
message=message@entry=0x7fff7f3b65f0 "Error on delete of '/tmp/#sql-temptable-22caf5-2-1.MAD' (Errcode: 2 \"No such file or directory\")", sqlstate=sqlstate@entry=0x563e451b10a8 "HY000", ucid=..., error_condition=0x0)
|
at /test/10.6_dbg/sql/sql_error.h:1015
|
More interesting findings. Using the testcase from MDEV-22885, was able to capture a stack of the diversion, and interestingly, the same problem is present, though with a different extension MAI instead of MAD: "Error on delete of '/tmp/#sql-temptable-989b6-1-26.MAI' (Errcode: 2 \"No such file or directory\")" in frame 4:
10.6.2 06dd151bb86ad5b87d4d46011f36da1289c01074 (Debug) |
(gdb) bt
|
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
|
#1 0x0000152e8b212859 in __GI_abort () at abort.c:79
|
#2 0x0000152e8b212729 in __assert_fail_base (fmt=0x152e8b3a8588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
|
assertion=0x560907ce0a88 "! is_set() || m_can_overwrite_status",
|
file=0x560907ce0830 "/test/10.6_dbg/sql/sql_error.cc", line=444, function=<optimized out>) at assert.c:92
|
#3 0x0000152e8b223f36 in __GI___assert_fail (
|
assertion=assertion@entry=0x560907ce0a88 "! is_set() || m_can_overwrite_status",
|
file=file@entry=0x560907ce0830 "/test/10.6_dbg/sql/sql_error.cc", line=line@entry=444,
|
function=function@entry=0x560907ce0a00 "void Diagnostics_area::set_error_status(uint, const char*, const char*, const Sql_user_condition_identity&, const Sql_condition*)") at assert.c:101
|
#4 0x0000560906ff29ab in Diagnostics_area::set_error_status (this=this@entry=0x56090aa94850,
|
sql_errno=sql_errno@entry=6,
|
message=message@entry=0x7ffd21b27b20 "Error on delete of '/tmp/#sql-temptable-989b6-1-26.MAI' (Errcode: 2 \"No such file or directory\")", sqlstate=sqlstate@entry=0x560907cfe0a8 "HY000", ucid=..., error_condition=0x0)
|
at /test/10.6_dbg/sql/sql_error.h:1015
|
#5 0x0000560906fd226f in THD::raise_condition (this=this@entry=0x56090aa8e9c8, sql_errno=sql_errno@entry=6,
|
sqlstate=0x560907cfe0a8 "HY000", sqlstate@entry=0x0, level=<optimized out>, ucid=...,
|
msg=msg@entry=0x7ffd21b27b20 "Error on delete of '/tmp/#sql-temptable-989b6-1-26.MAI' (Errcode: 2 \"No such file or directory\")") at /test/10.6_dbg/sql/sql_class.cc:1126
|
#6 0x0000560906f30885 in THD::raise_condition (
|
msg=0x7ffd21b27b20 "Error on delete of '/tmp/#sql-temptable-989b6-1-26.MAI' (Errcode: 2 \"No such file or directory\")", level=<optimized out>, sqlstate=0x0, sql_errno=6, this=0x56090aa8e9c8) at /test/10.6_dbg/sql/sql_error.h:217
|
#7 my_message_sql (error=6,
|
str=0x7ffd21b27b20 "Error on delete of '/tmp/#sql-temptable-989b6-1-26.MAI' (Errcode: 2 \"No such file or directory\")", MyFlags=4) at /test/10.6_dbg/sql/mysqld.cc:3238
|
#8 0x0000560907b6d730 in my_error (nr=nr@entry=6, MyFlags=MyFlags@entry=4) at /test/10.6_dbg/mysys/my_error.c:124
|
#9 0x0000560907b6d372 in my_delete (name=name@entry=0x7ffd21b28120 "/tmp/#sql-temptable-989b6-1-26.MAI",
|
MyFlags=MyFlags@entry=16) at /test/10.6_dbg/mysys/my_delete.c:53
|
#10 0x0000560907b76c33 in my_handler_delete_with_symlink (
|
filename=filename@entry=0x7ffd21b28120 "/tmp/#sql-temptable-989b6-1-26.MAI", sync_dir=sync_dir@entry=16)
|
at /test/10.6_dbg/mysys/my_symlink2.c:190
|
#11 0x000056090761098f in inline_mysql_file_delete_with_symlink (key=0,
|
src_file=src_file@entry=0x560907ef1a60 "/test/10.6_dbg/storage/maria/ma_delete_table.c",
|
src_line=src_line@entry=100, name=name@entry=0x56090b0860e8 "/tmp/#sql-temptable-989b6-1-26",
|
ext=ext@entry=0x560907eda280 ".MAI", flags=flags@entry=16) at /test/10.6_dbg/include/mysql/psi/mysql_file.h:1396
|
#12 0x0000560907610a52 in maria_delete_table_files (name=name@entry=0x56090b0860e8 "/tmp/#sql-temptable-989b6-1-26",
|
temporary=temporary@entry=1 '\001', flags=flags@entry=16) at /test/10.6_dbg/storage/maria/ma_delete_table.c:100
|
#13 0x000056090759cd0a in ha_maria::drop_table (this=0x56090b087780,
|
name=0x56090b0860e8 "/tmp/#sql-temptable-989b6-1-26") at /test/10.6_dbg/storage/maria/ha_maria.cc:2793
|
#14 0x000056090732cc6f in handler::ha_drop_table (this=0x56090b087780,
|
name=0x56090b0860e8 "/tmp/#sql-temptable-989b6-1-26") at /test/10.6_dbg/sql/handler.cc:5049
|
#15 0x00005609070a3298 in free_tmp_table (thd=thd@entry=0x56090aa8e9c8, entry=0x56090b084bd0)
|
at /test/10.6_dbg/sql/sql_select.cc:20289
|
#16 0x0000560906fb3f7e in close_thread_tables (thd=thd@entry=0x56090aa8e9c8) at /test/10.6_dbg/sql/sql_base.cc:840
|
#17 0x0000560907052e9f in mysql_execute_command (thd=thd@entry=0x56090aa8e9c8) at /test/10.6_dbg/sql/sql_parse.cc:6049
|
#18 0x0000560907038adc in mysql_parse (thd=0x56090aa8e9c8, rawbuf=<optimized out>, length=<optimized out>,
|
parser_state=parser_state@entry=0x7ffd21b28a40) at /test/10.6_dbg/sql/sql_parse.cc:8016
|
#19 0x00005609070456c4 in bootstrap (file=0x560908f0a930 <instrumented_stdin>) at /test/10.6_dbg/sql/sql_class.h:1337
|
#20 0x0000560906f3b0b1 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /test/10.6_dbg/sql/mysqld.cc:5736
|
#21 0x0000560906f2cb36 in main (argc=<optimized out>, argv=<optimized out>) at /test/10.6_dbg/sql/main.cc:34
|
I have a suspicion that this bug is much more serious then expected earlier, and that it is indeed a new situation in recent 10.6 revisions only.
When a testcase like the one in MDEV-25890 is being executed against many different versions at the same time, then several/many mysqld's hang, whereas when you test each of those mysqld versions individually the bug doesn't even produce the issue. Thus they seem to affect each other . I have seen this twice now in both MDEV-17225 and in
MDEV-25890.
This is not a bug to fix. Instead, this is a collection of various unrelated bugs. A meta-bug, in a way.
Please, create separate entries for different Diagnostics_area::set_error_status crashes.
You can link them all into this one for some kind of grouping purposes. Or you can drop/close this one completely, as you like.
Understood.
The Diagnostics_area::set_error_status crashes will likely sooner or later show for all assertion bugs, and that in a sporadic fashion (1 in x).
It is as if this code path is hit occasionally due to an as yet unknown sequence of events (and now more often than in the past it looks like).
If no further thoughts, I'll close this for the time being.
The underlying cause for these is more clearly defined in https://jira.mariadb.org/browse/MDEV-22768?focusedCommentId=195197&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-195197
I continue seeing this issue regularly. Here is what we know thus far;
- The issue seems to happen mostly on testcases that have TEMPORARY in table DDL.
- The issue can happen for seemingly any random testcase which eventually crashes the server.
- The issue seems more pronounced (or "only present") when replaying a given crashing testcase against different servers/versions at the same time. Cross-server interaction (with nothing shared) may be a possibility.
- The SIGABRT|Diagnostics_area::set_error_status|THD::raise_condition|THD::raise_condition|my_message_sql assert happens, sporadically, instead of the crash a given testcase would normally generate.
- A fix or way to prevent this specific crash would benefit test work as regularly this SIGABRT will pollute test results, requiring a re-run of the same testcase.
Bit more info (rewritten in another way to make it clear what is happening)
1. I have been seeing various crashes previously in 10.5 in Diagnostics_area::set_error_status|THD::raise_condition|THD::raise_condition|my_message_sql when doing testing. So for example bug xyz would produce - for example for 1 in 50 runs - sporadically this stack, rather than giving the assert or SIGSEGV applicable to the testcase being executed.
2. The issue was not too disturbing for work, given that it happened very infrequently.
3. Recently, two things started happening;
3.1. I am seeing this stack a whole lot more
3.2. Now these asserts will also actually contain the assertion message itself
4. One way to try and nail this down would be to take testcase xyz and repeat it over and over, and check each resulting crash to see if it is the standard crash expected for that testcase, or if it is the diversion into the Diagnostics_area::set_error_status stack. Once found, at least there will be a (sporadic) way to reproduce it reliably.
5. The issue seems much more prevalent in 10.6, and it happens mostly when there is a high test load on the servers (usually seen when testing the same testcase against different versions at the same time).