[MDEV-24892] mariadb/galera 10.3.27 crashes during certain queries
Created: 2021-02-16  Updated: 2021-03-30  Resolved: 2021-03-30

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST
Affects Version/s: 10.3.27
Fix Version/s: 10.3.29

Type: Bug
Priority: Major
Reporter: Sven Kieske
Assignee: Jan Lindström (Inactive)
Resolution: Fixed
Votes: 2
Labels: None
Environment:

Ubuntu 18.04.5 LTS in Docker 19.03.8 on Ubuntu 18.04.4 with kernel version 5.3.0-40-generic #32
3-node HA setup
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_sst_method = mariabackup


Attachments: os_ctrl_0_debug2021_03_23.tar, os_ctrl_1_debug_2021_03_23.log, os_ctrl_2_debug_2021_03_23.log

 Description   

Hello, we have experienced multiple MariaDB/Galera crashes in the environment described above.

After some searching, I think we hit this bug, but I'm not 100% sure, as I'm not familiar with the source code of Galera or MariaDB:

MDEV-23851

So I figured I would create a separate bug report, in case it is something else (possibly related).

Jira does not let me attach the file (I always get an error), so here's a link to the log:
https://drive.google.com/file/d/1B3EpKL2R4lgB-NOzvnzxd4zhzQqUl4lF/view?usp=sharing

You can easily find the crash by searching for "assertion".

If you need any other/more information regarding these crashes, please don't hesitate to contact me here or via email.

If this is indeed the same bug as MDEV-23851, is there a timeline for the release of 10.3.28, which contains the fix? Even a rough estimate would be really helpful, because we must decide whether it is worthwhile to roll back to a more stable release.

For reference (maybe others hit this bug as well):

This occurred in a pre-production OpenStack environment while setting up Kubernetes clusters via the OpenStack "Magnum" project.

We were also sometimes able to reproduce this issue in our virtual test environment during cluster creation, specifically during network setup for the cluster. It therefore seems that Magnum issues queries that can make Galera/MariaDB crash.



 Comments   
Comment by Jan Lindström (Inactive) [ 2021-03-01 ]

Hi, it is not exactly the same issue as MDEV-23851. To get to the bottom of this, can you run with wsrep-debug=1 and then provide the error logs from both nodes?

Comment by Sven Kieske [ 2021-03-01 ]

Thanks for coming back to me.

Yes, I will try this setting and attempt to reproduce the crash.

Note that we have updated to 10.3.28 in the meantime, but if these bugs are not related, I'm confident we can reproduce the crash.

Comment by Sven Kieske [ 2021-03-23 ]

Hi!

I now have a crash log with wsrep-debug=1 set, though I'm not 100% sure it's the same crash.

Please find the logs from all 3 nodes attached.

os_ctrl_1_debug_2021_03_23.log os_ctrl_2_debug_2021_03_23.log

Note that the os-ctrl-0 log is gzipped and tarred, as it exceeds the maximum upload size of this Jira instance (10 MB).
os_ctrl_0_debug2021_03_23.tar

Comment by Jan Lindström (Inactive) [ 2021-03-30 ]

Fixed in https://jira.mariadb.org/browse/MDEV-24923

Generated at Thu Feb 08 09:33:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.