[MDEV-25874] Various sporadic asserts produced via an alternative Diagnostics_area::set_error_status codepath Created: 2021-06-08 Updated: 2021-12-26 Resolved: 2021-07-10 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Server |
| Affects Version/s: | 10.6 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Roel Van de Paar | Assignee: | Roel Van de Paar |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | affects-tests, cross-mysqld-interaction | ||
| Issue Links: |
|
| Description |
|
The issue described here regularly affects testing. It leads to wasted work, because existing bugs/asserts are incorrectly identified as new issues given the different stack they sporadically produce (ref below). What happens is this: for various bugs/testcases (I have captured different ones below), the original assert is produced (correct assert + SIGABRT), but the stack is then always in the frames Diagnostics_area::set_error_status from THD::raise_condition from THD::raise_condition from my_message_sql. Examples of such observed asserts (uniqueIDs):
At some point I thought there was a connection to these bugs, but I no longer believe this. Instead, it seems that during the server crash/testcase replay, quite sporadically, the Diagnostics_area::set_error_status codepath is followed and the assertion (likely "any assertion", though maybe only for sporadic bugs/testcases) is presented with Diagnostics_area::set_error_status as the crashing frame. It would seem to make little sense to connect the stacks of these assertions with the bugs in question, though let me know if you think otherwise.

It may be that this expression of the issue is 10.6 only (TBD); I have set it to 10.6 for the time being. The issue seems to have become much more prevalent lately. I have seen this sporadic Diagnostics_area::set_error_status issue in 10.5 before too, but the result was different in that the assert was not displayed. The assert showing now could be a result of the small code patch which we apply before building, as previously discussed with danblack: https://github.com/mariadb-corporation/mariadb-qa/blob/master/build_mdpsms_dbg.sh#L267-L270

The issue seems sporadic and not readily repeatable. |
| Comments |
| Comment by Roel Van de Paar [ 2021-06-09 ] |

A bit more info (rewritten to make it clearer what is happening):
1. I have previously seen various crashes in 10.5 in Diagnostics_area::set_error_status|THD::raise_condition|THD::raise_condition|my_message_sql during testing. For example, bug xyz would sporadically (say, in 1 of 50 runs) produce this stack, rather than the assert or SIGSEGV applicable to the testcase being executed. |
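To avoid filing such a run as a new bug, a triage step could check whether a captured stack contains the frame sequence named above. The following is only a minimal sketch: the function names are hypothetical (they are not part of mariadb-qa), and it assumes the stack is available as a '|'-separated string of frame names, as written in this comment.

```python
# Hypothetical triage helper; names are illustrative, not part of mariadb-qa.
# Assumption: a stack signature is a '|'-separated list of frame names.
DIVERSION_FRAMES = (
    "Diagnostics_area::set_error_status",
    "THD::raise_condition",
    "THD::raise_condition",
    "my_message_sql",
)


def find_diversion(frames):
    """Return the index where the diversion frame sequence starts, or -1."""
    n = len(DIVERSION_FRAMES)
    for i in range(len(frames) - n + 1):
        if tuple(frames[i:i + n]) == DIVERSION_FRAMES:
            return i
    return -1


def classify(signature):
    """Label a '|'-separated stack signature as 'diverted' or 'regular'."""
    frames = [f.strip() for f in signature.split("|") if f.strip()]
    return "diverted" if find_diversion(frames) >= 0 else "regular"
```

A run classified as "diverted" would then be compared against the known bug for that testcase before anything new is reported.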
| Comment by Roel Van de Paar [ 2021-06-10 ] |
|
Very interesting related stack:
Specifically:
|
| Comment by Roel Van de Paar [ 2021-06-10 ] |
|
More interesting findings. Using the testcase from MDEV-22885, I was able to capture a stack of the diversion. Interestingly, the same problem is present, though with a different file extension (MAI instead of MAD): "Error on delete of '/tmp/#sql-temptable-989b6-1-26.MAI' (Errcode: 2 \"No such file or directory\")" in frame 4:
|
| Comment by Roel Van de Paar [ 2021-06-10 ] |
|
I have a suspicion that this bug is much more serious than expected earlier, and that it is indeed a new situation in recent 10.6 revisions only. When a testcase like the one in |
| Comment by Sergei Golubchik [ 2021-06-30 ] |
|
This is not a bug to fix. Instead, this is a collection of various unrelated bugs. A meta-bug, in a way. You can link them all into this one for some kind of grouping purposes. Or you can drop/close this one completely, as you like. |
| Comment by Roel Van de Paar [ 2021-07-01 ] |
|
Understood. The Diagnostics_area::set_error_status crashes will likely show up sooner or later for all assertion bugs, and in a sporadic fashion (1 in x). It is as if this code path is hit occasionally due to an as yet unknown sequence of events (and, it seems, more often now than in the past). If there are no further thoughts, I'll close this for the time being. |
| Comment by Roel Van de Paar [ 2021-07-28 ] |
|
The underlying cause for these is more clearly defined in https://jira.mariadb.org/browse/MDEV-22768?focusedCommentId=195197&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-195197 |
| Comment by Roel Van de Paar [ 2021-12-26 ] |
|
I continue to see this issue regularly. Here is what we know thus far:
|