Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Incomplete
Affects Version: 10.3.31
Environment: CentOS 7 3.10.0-1160.36.2.el7.x86_64
Description
The cluster keeps crashing after random periods of time. The problem has existed for a while, but it has become more frequent since we added a third node. I now believe something is wrong with the MariaDB service itself, as the crash report indicates.
The table which is likely causing the crash is using InnoDB. Structure:
Table: news
Columns:
id int(10) UN AI PK
parent_id int(10) UN
post_user_id int(10) UN
source_user_id int(10) UN
title varchar(255)
header text
content mediumtext
keywords varchar(255)
image_id int(10) UN
type_id int(10) UN
alert_id int(10) UN
can_comment tinyint(3) UN
views int(10) UN
deleted_at datetime
created_at timestamp
updated_at timestamp
movie_id int(10) UN
imdb_id int(10) UN
source_domain varchar(255)
source_url varchar(255)
event_id int(10) UN
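For reference, the column listing above corresponds roughly to the DDL below. This is only a sketch: the column names and types are taken from the listing, while NULL-ability, defaults, secondary indexes, and the character set are not stated in this report and are assumptions.

```sql
-- Approximate reconstruction of the `news` table.
-- Names and types come from the listing above; every NULL / NOT NULL
-- and DEFAULT clause is an assumption, and secondary indexes are omitted.
CREATE TABLE `news` (
  `id`             INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `parent_id`      INT(10) UNSIGNED NULL,
  `post_user_id`   INT(10) UNSIGNED NULL,
  `source_user_id` INT(10) UNSIGNED NULL,
  `title`          VARCHAR(255) NULL,
  `header`         TEXT NULL,
  `content`        MEDIUMTEXT NULL,
  `keywords`       VARCHAR(255) NULL,
  `image_id`       INT(10) UNSIGNED NULL,
  `type_id`        INT(10) UNSIGNED NULL,
  `alert_id`       INT(10) UNSIGNED NULL,
  `can_comment`    TINYINT(3) UNSIGNED NULL,
  `views`          INT(10) UNSIGNED NULL,
  `deleted_at`     DATETIME NULL,
  `created_at`     TIMESTAMP NULL DEFAULT NULL,
  `updated_at`     TIMESTAMP NULL DEFAULT NULL,
  `movie_id`       INT(10) UNSIGNED NULL,
  `imdb_id`        INT(10) UNSIGNED NULL,
  `source_domain`  VARCHAR(255) NULL,
  `source_url`     VARCHAR(255) NULL,
  `event_id`       INT(10) UNSIGNED NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;
```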
According to the error message:
2021-08-21 14:27:19 1 [Note] WSREP: Victim thread:
THD: 132951, mode: local, state: committing, conflict: no conflict, seqno: -1
SQL: UPDATE `news` SET `views`=25499 WHERE `id`=83796
2021-08-21 14:27:19 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:abort_trx():736
2021-08-21 14:27:19 0 [ERROR] WSREP: cancel commit bad exit: 7 514275587
210821 14:27:19 [ERROR] mysqld got signal 6 ;
The 'views' column is updated by queued jobs, so it should be impossible for this update to be executed by multiple instances for one user within x seconds.
All nodes are clones of one master image, meaning that software-wise they are exactly the same VMs. I attached the logs of the other two nodes from the time of the crash.
All connections to the DB are handled by Galera Load Balancer. At first this helped a lot, but now that the new node has been added, the problem has returned.
https://galeracluster.com/library/documentation/glb.html
#server.cnf attached.
#stack-trace attached
#logs attached