[MDEV-32620] MariaDB Galera Cluster crashes after failing to apply replications Created: 2023-10-30  Updated: 2023-10-30

Status: Open
Project: MariaDB Server
Component/s: Data Manipulation - Insert, Galera
Affects Version/s: 10.4.26
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Nguyen Hoang Anh Tu Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: galera
Environment:

MariaDB 10.4.26 Docker official image, with Galera Cluster enabled


Attachments: jiralog-node1, jiralog-node2, jiralog-node3

Description

I have multiple instances of an application that write records into a table, table1, which is defined with d_id int NOT NULL AUTO_INCREMENT, CONSTRAINT pk_d_id PRIMARY KEY (d_id), and CONSTRAINT unq_d UNIQUE (name, field, data). There is a small chance that two application instances insert identical records at the same time with a plain INSERT, which violates CONSTRAINT unq_d UNIQUE (name, field, data). Before inserting, the application issues a SELECT query to check whether the record already exists, and it inserts only if the SELECT returns no rows.
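The check-then-insert flow described above leaves a window between the SELECT and the INSERT. The following is a minimal Python sketch of that window, not the application's actual code. It uses the standard-library sqlite3 module as a stand-in for MariaDB so that it runs anywhere, and the column types of name, field and data are assumptions because they are not stated here. On a Galera cluster the two sessions may also be connected to different nodes, which a single-process example cannot reproduce.

```python
import sqlite3

# Stand-in schema in SQLite syntax. Column types are assumed; only the
# unq_d constraint name and the column list come from the description.
DDL = """
CREATE TABLE table1 (
    d_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    name  TEXT NOT NULL,
    field TEXT NOT NULL,
    data  TEXT NOT NULL,
    CONSTRAINT unq_d UNIQUE (name, field, data)
)
"""
SELECT_SQL = "SELECT d_id FROM table1 WHERE name = ? AND field = ? AND data = ?"
INSERT_SQL = "INSERT INTO table1 (name, field, data) VALUES (?, ?, ?)"


def record_exists(conn, row):
    """Return True if a row with the same (name, field, data) is present."""
    return conn.execute(SELECT_SQL, row).fetchone() is not None


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
row = ("n1", "f1", "d1")  # hypothetical sample values

# Two application instances run their existence check before either inserts,
# so both conclude that the record is missing.
seen_by_a = record_exists(conn, row)
seen_by_b = record_exists(conn, row)

# Both then insert. The second INSERT violates unq_d.
conn.execute(INSERT_SQL, row)
try:
    conn.execute(INSERT_SQL, row)
    second_insert_rejected = False
except sqlite3.IntegrityError:
    second_insert_rejected = True
```

The SELECT does not protect the INSERT: only the unique constraint itself rejects the duplicate, so the application must be prepared for the INSERT to fail.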

Normally the application should get error 1062, "Duplicate entry 'name, field, data' for key 'unq_d'", then roll back and retry, depending on its own logic. This can happen because many application instances run simultaneously.
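The roll-back-and-retry handling of error 1062 could look roughly like the Python sketch below. The helper names run_with_retry and is_duplicate_entry, the attempt count and the delay are all hypothetical. It also assumes a DB-API style connection whose driver exposes the server error code either as an errno attribute or as the first exception argument, which varies between drivers.

```python
import time

ER_DUP_ENTRY = 1062  # server error code for "Duplicate entry ... for key ..."


def is_duplicate_entry(exc):
    """Best-effort check for error 1062 (assumes errno or args[0] holds the code)."""
    code = getattr(exc, "errno", None)
    if code is None and exc.args:
        code = exc.args[0]
    return code == ER_DUP_ENTRY


def run_with_retry(conn, work, attempts=3, delay=0.05):
    """Run work(conn) in a transaction; roll back and retry on error 1062.

    work should repeat the existence check itself, so that a retry sees
    the row committed by the competing instance.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = work(conn)
            conn.commit()
            return result
        except Exception as exc:
            conn.rollback()
            # Re-raise anything that is not a duplicate-key error,
            # and give up once the last attempt has failed.
            if not is_duplicate_entry(exc) or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff
```

This only covers the case where the duplicate-key error reaches the client. It does not help when the failure happens on another node while the write-set is being applied.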

However, there is a significant probability that the application instances do not receive this error. Instead, the error occurs when the write-sets are applied during replication. When that happens, all nodes are disconnected from the cluster and wsrep_local_state_comment is set to Inconsistent.
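For reference, the node state mentioned above can be read with a SHOW STATUS query. The Python sketch below assumes a DB-API style connection to one node; the helper names galera_state and is_synced are hypothetical, and "Synced" is taken as the value a healthy, fully joined node reports.

```python
def galera_state(conn):
    """Return this node's wsrep_local_state_comment, or None if it is not reported."""
    cur = conn.cursor()
    cur.execute("SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'")
    row = cur.fetchone()  # expected shape: (Variable_name, Value)
    return row[1] if row else None


def is_synced(state):
    """True only for the normal operating state of a cluster member."""
    return state == "Synced"
```

Polling each node with such a check would flag the Inconsistent state described here, although it only detects the problem and does not prevent it.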

I expected the error to occur only at the application level. Instead it breaks the whole cluster, which can be disastrous. For your information, I tried setting the wsrep_sync_wait configuration option, and it did not help.

The error logs of the 3 nodes of the MariaDB Galera Cluster are attached.

Please help me with this case. Feel free to ask for more information. Any insight into this is appreciated.


Generated at Thu Feb 08 10:32:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.