[MDEV-27156] MariaDB 10.5.13 Galera Node core dumped Created: 2021-12-02 Updated: 2024-01-05 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Galera, Server |
| Affects Version/s: | 10.5.13 |
| Fix Version/s: | 10.5, 10.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Rumen Palov | Assignee: | Jan Lindström |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | crash, galera, replication, untable_to_start_again | ||
| Environment: |
MariaDB 10.5.13, Galera provider 26.4.10, FreeBSD 12 and 13. ZFS storage, 1500G RAM, 64 or 96 cores |
||
| Description |
|
Hello, one of our Galera nodes died hours ago with the following state:
If we try to join it back to the cluster, the same behavior repeats. We first had the same situation with the node that was a GTID replica (slave) of this one. What can we do to supply more useful information? The core dump file is 150G. |
| Comments |
| Comment by Daniel Black [ 2021-12-02 ] |
|
The message looks similar to MDEV-26141. Are you doing similar SQL operations? From the core, can you obtain a backtrace? See https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/#analyzing-a-core-file-with-gdb-on-linux (I assume the Linux method is similar on FreeBSD. Is it compiled with debug symbols there, or are those a separate package? They will be needed to make more sense of it.) That aside, and out of interest, I'm wondering why the core is so big, since MADV_NOCORE is used on the large allocations. How big is the core compared to the resident memory of a running mariadbd process? |
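Daniel's question about core size relies on madvise() advice that asks the kernel to leave a mapping out of core dumps: MADV_NOCORE on FreeBSD, MADV_DONTDUMP on Linux. Below is a minimal Python sketch of that mechanism, assuming Python 3.8+ on a Unix-like system; `no_core_advice` and `allocate_excluded_from_core` are hypothetical helper names, and this is not MariaDB's own code. Pages excluded this way remain resident and still count toward RES, which is why a core file would normally be much smaller than the resident size of the process.

```python
import mmap


def no_core_advice():
    """Return the madvise() flag that excludes pages from core dumps, or None.

    MADV_NOCORE is the FreeBSD name and MADV_DONTDUMP the Linux name; Python's
    mmap module only defines the constants the running platform supports.
    """
    for name in ("MADV_NOCORE", "MADV_DONTDUMP"):
        flag = getattr(mmap, name, None)
        if flag is not None:
            return flag
    return None


def allocate_excluded_from_core(size):
    """Map `size` bytes of anonymous memory and ask the kernel to skip it in cores.

    Hypothetical helper for illustration. Returns (mapping, excluded), where
    `excluded` tells whether the advice could be applied on this platform.
    """
    buf = mmap.mmap(-1, size)  # fileno -1: anonymous memory, not backed by a file
    flag = no_core_advice()
    if flag is None or not hasattr(buf, "madvise"):
        return buf, False
    buf.madvise(flag)  # applies to the whole mapping; raises OSError on failure
    return buf, True
```

The mapping stays fully readable and writable after the call; only the code that writes a core dump skips it.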
| Comment by Rumen Palov [ 2021-12-03 ] |
|
Hello Daniel, the RES memory of the process is between 200G and 500G, depending on whether it is a write-accepting node or not. In our case it was not a write-accepting node. MariaDB was not compiled with debug symbols (the default in the port). The core dump from 10.5.13 was deleted during the production recovery procedure. We have the same situation with 10.5.9: identical output in the error log, and the core dump was preserved. I will try to get a backtrace from it. Cheers |
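The gdb approach on the knowledge base page Daniel linked can be scripted roughly as follows: run gdb non-interactively against the mariadbd binary and the core, and print a full backtrace of every thread. This is a minimal Python sketch; the helper names and all paths are placeholders, and the binary must be the exact mariadbd that produced the core. Because the port build has no debug symbols, the resulting frames will be less informative.

```python
import shlex
import shutil
import subprocess


def gdb_backtrace_cmd(binary, core, gdb="gdb"):
    """Build a non-interactive gdb command printing full backtraces of all threads."""
    return [
        gdb,
        "--batch",                                  # run the commands, then exit
        "--eval-command=thread apply all bt full",  # every thread, with locals if available
        binary,
        core,
    ]


def gdb_backtrace_shell(binary, core):
    """Same command as a single shell-quoted string, for pasting into a terminal."""
    return shlex.join(gdb_backtrace_cmd(binary, core))


def save_backtrace(binary, core, out_path):
    """Run gdb on a core file, write its output to out_path, return gdb's exit code."""
    if shutil.which("gdb") is None:
        raise FileNotFoundError("gdb is not installed")
    with open(out_path, "w") as out:
        result = subprocess.run(
            gdb_backtrace_cmd(binary, core),
            stdout=out,
            stderr=subprocess.STDOUT,  # keep gdb warnings in the same file
        )
    return result.returncode
```

For example, `save_backtrace("/path/to/mariadbd", "/path/to/mariadbd.core", "mariadbd_bt.txt")` would write the backtrace to a text file that can be attached to the issue.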
| Comment by Marko Mäkelä [ 2024-01-05 ] |
|
First, MariaDB Server 10.6 is not supposed to crash on corrupted data, ever since There used to be a problem with the default wsrep_sst_method=rsync, which allowed InnoDB to write to data files while a snapshot transfer (SST) was in progress. This was fixed by me in Every now and then we reproduce (typically in 10.6 or later) and fix (in 10.5 or a later applicable release) some bugs in crash recovery and mariadb-backup. Such bugs could affect all modes other than wsrep_sst_method=mysqldump. |
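Marko's comment separates wsrep_sst_method=mysqldump from every other SST method and notes that rsync is the default. Below is a rough Python sketch for checking which method a node's option file selects. It assumes the server reads the [server], [mysqld], [mariadb] and [galera] groups, parses my.cnf as INI text with configparser, and does not handle !include or !includedir directives; the function and constant names are hypothetical.

```python
import configparser

# Groups assumed to be read by a Galera-enabled server (an assumption for this sketch).
SERVER_GROUPS = ("server", "mysqld", "mariadb", "galera")

DEFAULT_SST_METHOD = "rsync"  # used when no group sets wsrep_sst_method


def effective_sst_method(cnf_text):
    """Return the wsrep_sst_method configured in my.cnf-style text, or the default.

    Dashes and underscores in option names are treated as equivalent. If several
    groups set the option, the last group in SERVER_GROUPS wins (a simplification).
    """
    parser = configparser.ConfigParser(
        allow_no_value=True,            # bare flags such as skip-name-resolve
        strict=False,                   # tolerate repeated options and groups
        interpolation=None,             # values may legitimately contain '%'
        inline_comment_prefixes=("#",), # strip trailing '# ...' comments
    )
    parser.read_string(cnf_text)
    method = DEFAULT_SST_METHOD
    for group in SERVER_GROUPS:
        if not parser.has_section(group):
            continue
        for key, value in parser.items(group):
            if key.replace("-", "_") == "wsrep_sst_method" and value:
                method = value.strip()
    return method


def sst_method_possibly_affected(method):
    """True for every SST method other than mysqldump."""
    return method != "mysqldump"
```

A node whose option file never mentions the option reports `rsync`, so it falls into the group of methods the comment describes as potentially affected.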