[MDEV-23657] SIGSEGV in malloc_size_and_flag from my_free / PROFILING::finish_current_query (on optimized builds) Created: 2020-09-03 Updated: 2021-04-15 Resolved: 2021-04-15 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | N/A |
| Affects Version/s: | 10.4 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Roel Van de Paar | Assignee: | Sujatha Sivakumar (Inactive) |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Issue Links: |
|
| Description |
|
This bug is likely one of the most sporadic issues I have ever worked on. Both elenst and I have observed this bug in QA runs. It is very hard to reduce and reproduce. I am therefore uploading everything I have so far, in the hope that a developer can find the issue through code and core dump analysis. The stack trace is also very short, so hopefully the bug analysis is not hard. A full core dump is available. An SQL trace reduced to just over 8K lines is attached as
All other details and files (core, mysqld and ldd files, data dir, error log, reduced testcase) are uploaded as an attachment. elenst had a great idea to change the reducer from an exact bug match to a more generic one, so I next tried to reduce towards 'malloc' in the error log only, rather than looking for a specific unique bug ID match. This led to the following testcase (exactly as reduced, to avoid the risk of non-reproducibility):
Quite a few items in this testcase are likely superfluous, such as the initial DROP DATABASE transforms statement. Please note that this testcase leads to malloc being mentioned in the error log only, not to the stack shown above. From memory, I set up reducer to reduce based on the MySQL client: I expect this issue to be reproducible in high-repetition parallel MTR runs, with minor modifications if required for MTR. The same applies for the 8K SQL testcase ( For reference, I have also uploaded the original SQL trace (88K lines) as
|
| Comments |
| Comment by Roel Van de Paar [ 2020-09-03 ] | |||||||||||||||||
|
With thanks to elenst - the shorter version of the testcase seems to lead us back to | |||||||||||||||||
| Comment by Roel Van de Paar [ 2020-09-04 ] | |||||||||||||||||
|
A scan for 'malloc' in the logs while reducing gives different issues:
However, some good news, using a new 10.4 build made today (@ rev: 1cda462f46305daf2a5becb1ed0ce4fcdf3ae404) which includes the patch for In summary, it may be that the patch for | |||||||||||||||||
| Comment by Sujatha Sivakumar (Inactive) [ 2020-09-04 ] | |||||||||||||||||
|
Hello Roel. Thank you for the bug report. I looked into the stack trace, but I could not associate the current issue with | |||||||||||||||||
| Comment by Roel Van de Paar [ 2020-09-04 ] | |||||||||||||||||
|
Thank you sujatha.sivakumar. From the testing with the patched version it "looks" gone, but indeed I also cannot say for sure that it is. Let's keep an eye open. Deadline proposal: if this issue has not been re-observed by 1-1-2021, I think we can close it. | |||||||||||||||||
| Comment by Roel Van de Paar [ 2020-09-04 ] | |||||||||||||||||
|
Small update: elenst is also seeing this particular bug in trials which are not using replicate variables. | |||||||||||||||||
| Comment by Roel Van de Paar [ 2020-11-20 ] | |||||||||||||||||
|
Attempting to run the original (shorter) testcase under ASAN on various versions, including 10.4, yielded no results. | |||||||||||||||||
| Comment by Roel Van de Paar [ 2020-11-20 ] | |||||||||||||||||
|
However, running the longer testcase under ASAN did produce a number of ASAN issues, reported in the error log. sujatha.sivakumar, I uploaded the error log. Would you please have a look to see if any of the issues in there are clearly related to the SIGSEGV? If so, I will reduce towards that particular issue. All I need is a specific string from the ASAN trace to validate against, for example 'Query_arena::set_query_arena'. It looks like most of the issues (like the Query_arena) are known, i.e. | |||||||||||||||||
| Comment by Roel Van de Paar [ 2021-02-26 ] | |||||||||||||||||
|
Hi sujatha.sivakumar. Could you please have a look at the ASAN issues seen, to check whether any of them is clearly related to the SIGSEGV? If so, I can reduce towards that. Thank you. | |||||||||||||||||
| Comment by Sujatha Sivakumar (Inactive) [ 2021-03-01 ] | |||||||||||||||||
|
Hello Roel. Thank you for the details. The SIGSEGV occurs during the cleanup of 'QUERY_PROFILE' objects. Upon enabling 'set profiling=1', a history of executed queries is maintained in a queue. By default 'profiling_history_size=15'. When the number of query elements within the history queue exceeds this size, elements are popped from the history queue.
When history.elements=16 and thd->variables.profiling_history_size=15, an element is popped from the queue as shown below.
The SIGSEGV seems to be caused by an invalid QUERY_PROFILE item being freed. I looked into the master.err file (excluding Query_arena), but I could not find any relation between the ASAN stack traces and the current SIGSEGV. | |||||||||||||||||
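For readers unfamiliar with this code path, the trimming behaviour described above can be illustrated with a minimal standalone sketch. This is not the server code: `QueryProfileSketch` and `ProfilingHistorySketch` are hypothetical names, the method name is borrowed from the stack trace only for readability, and the `std::deque` of `std::unique_ptr` stands in for the server's own queue and `my_free` calls. It only shows the invariant that matters here: each popped entry is freed exactly once, and freeing an entry that is not valid at that point is the kind of fault the stack trace suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a per-query profile entry (not the server's QUERY_PROFILE).
struct QueryProfileSketch {
    std::string query_text;
};

// Bounded history: the oldest entries are dropped once the limit is exceeded.
class ProfilingHistorySketch {
public:
    explicit ProfilingHistorySketch(std::size_t history_size)
        : history_size_(history_size) {}

    // Append a finished query, then pop from the front while over the limit.
    // Returns the number of entries that were dropped by this call.
    std::size_t finish_current_query(std::string query_text) {
        auto entry = std::make_unique<QueryProfileSketch>();
        entry->query_text = std::move(query_text);
        history_.push_back(std::move(entry));

        std::size_t dropped = 0;
        while (history_.size() > history_size_) {
            history_.pop_front();  // unique_ptr releases the entry exactly once
            ++dropped;
        }
        assert(history_.size() <= history_size_);
        return dropped;
    }

    std::size_t size() const { return history_.size(); }

    // Precondition: the history is not empty.
    const std::string& oldest() const { return history_.front()->query_text; }

private:
    std::size_t history_size_;
    std::deque<std::unique_ptr<QueryProfileSketch>> history_;
};
```

With a limit of 15, the first 15 finished queries are simply appended; the 16th brings the queue to 16 elements, so one element is popped and the queue returns to 15, matching the history.elements=16 case described above.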
| Comment by Roel Van de Paar [ 2021-04-15 ] | |||||||||||||||||
|
I did some more testing and could not reproduce this issue. |