[MDEV-33362] stop slave in 10.6.12 with tcmalloc added causes crash Created: 2024-02-02 Updated: 2024-02-07 |
|
| Status: | Needs Feedback |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.6.12 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major |
| Reporter: | Bruno Bear | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Docker on Debian 11 |
||
| Attachments: |
|
| Description |
|
After having memory issues, we added tcmalloc to the official mariadb 10.6.12 docker image and restarted a slave server with it. How we added tcmalloc:
The slave was running fine until we used "stop slave" to switch to a new master. This resulted in an immediate crash: "[ERROR] mysqld got signal 6 ;" |
| Comments |
| Comment by Sergei Golubchik [ 2024-02-02 ] |
|
1. Could you show a few more lines from the log, before the "mysqld got signal 6"? If it's an assert, they'll tell exactly which condition was false.
2. Could you also install binutils? It includes addr2line, so you'll get rid of "Printing to addr2line failed" and instead get a resolved stack trace with file names and line numbers. |
| Comment by Kristian Nielsen [ 2024-02-02 ] |
|
This crash/stacktrace looks very similar to those in MDEV-25633 and In those other MDEVs, the problem was the mixing of different versions of the same library, due to both static and dynamic linking of libraries. But this does not appear to be the case here; the entire stack trace seems to be within the standard system .so dynamic libraries. So while the symptoms seem obviously related, I cannot immediately think of a common root cause. There might be some conflict between libtcmalloc and the system libraries, but again I cannot immediately think what it could be. Also, it seems to be all standard dynamic .so system libraries in use, which should hopefully be compatible (unless the google-perftools are from a 3rd-party repo?). I wonder if there is something wrong in the way the mariadb code uses threads (or <something>) around this that makes it particularly fragile to assertions inside libunwind... but I have no concrete ideas, unfortunately. Maybe Sergei's suggestion can give a few more hints as to what could be wrong. |
| Comment by Bruno Bear [ 2024-02-02 ] |
|
I will try to deliver what Sergei asked for as soon as our master-slave setup is up again. |
| Comment by Bruno Bear [ 2024-02-06 ] |
|
I started the slave again with the tcmalloc image, used "stop slave" and it crashed. |
| Comment by Bruno Bear [ 2024-02-06 ] |
|
I found a solution. |
| Comment by Sergei Golubchik [ 2024-02-06 ] |
|
Well, the idea was to use any other memory allocator, not the system one. Because it'll use a different memory allocation pattern. So yes, I suppose "libtcmalloc_minimal" is different enough, let's see if it helps |
| Comment by Kristian Nielsen [ 2024-02-06 ] |
|
Thanks for the additional stacktrace. From this, we can see that it crashes in __cxa_get_globals, which means it is different from the problem seen in MDEV-25633 and A web search pops up some discussions that suggest some problem with thread local storage, but I wasn't really able to get any idea of what the root cause could be, unfortunately. But it's good that you found a solution with a different libtcmalloc. |
| Comment by Bruno Bear [ 2024-02-07 ] |
|
I only found a solution for "stop slave", because that was the only error I got right after one day. So far I don't know if MariaDB is stable in general with "libtcmalloc_minimal" or if it just fixed that one command.
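
When comparing allocator variants as in this thread, it can help to confirm which allocator library the running server process actually has mapped. The following is a minimal sketch, not taken from the report: it assumes a Linux host where /proc/<pid>/maps is readable, and the helper names and the list of library names it searches for are hypothetical choices made for illustration.

```python
import re
import sys
from pathlib import Path

# Library base names to look for. This list is an assumption for
# illustration; extend it if another allocator is preloaded.
ALLOCATOR_PATTERN = re.compile(r"lib(tcmalloc(?:_minimal)?|jemalloc)\.so")


def allocators_in_maps(maps_text):
    """Return the set of allocator library names found in /proc/<pid>/maps text."""
    found = set()
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        file_name = Path(fields[5].strip()).name
        match = ALLOCATOR_PATTERN.match(file_name)
        if match:
            found.add("lib" + match.group(1))
    return found


def allocators_for_pid(pid):
    """Read /proc/<pid>/maps (Linux only) and report mapped allocator libraries."""
    maps_text = Path(f"/proc/{pid}/maps").read_text()
    return allocators_in_maps(maps_text)


if __name__ == "__main__":
    # Usage: python check_allocator.py <pid of the server process>
    print(sorted(allocators_for_pid(int(sys.argv[1]))) or "no known allocator library mapped")
```

An empty result suggests the process is using the system allocator, or an allocator this sketch does not know about. It says nothing about whether the server is stable with that allocator.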