Details
-
Bug
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
10.4.26
-
None
-
MariaDB 10.4.26 Docker official image, with Galera Cluster enabled
Description
I have multiple instances of an application that write records into a table called table1. The table has d_id int NOT NULL AUTO_INCREMENT with CONSTRAINT pk_d_id PRIMARY KEY (d_id), and a CONSTRAINT unq_d UNIQUE (name, field, data). There is a small chance that two application instances write identical records into the table at the same time using INSERT statements, which violates CONSTRAINT unq_d UNIQUE (name, field, data). Before inserting, the application issues a SELECT query to check whether the record already exists, and it only inserts if the SELECT returns no rows.
Normally the application should get error 1062, "Duplicate entry 'name, field, data' for key 'unq_d'", then roll back and retry, depending on the logic defined. This can happen because many application instances are running simultaneously.
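The check-then-insert pattern described above can be sketched roughly as follows. This is not the application's actual code: it uses Python's built-in sqlite3 as a stand-in for MariaDB so that it runs anywhere, the column types for name, field and data are assumptions, and insert_if_absent is a hypothetical helper name. It only illustrates where the duplicate-key error (1062 on MariaDB) is expected to reach the application.

```python
import sqlite3

# Stand-in schema. sqlite3 replaces MariaDB purely so the sketch is runnable;
# the types of name/field/data are assumptions (the report does not give them).
DDL = """
CREATE TABLE table1 (
    d_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    name  TEXT NOT NULL,
    field TEXT NOT NULL,
    data  TEXT NOT NULL,
    CONSTRAINT unq_d UNIQUE (name, field, data)
)
"""


def insert_if_absent(conn, name, field, data):
    """Check-then-insert; returns True only if this call inserted the row."""
    cur = conn.cursor()
    cur.execute(
        "SELECT 1 FROM table1 WHERE name = ? AND field = ? AND data = ?",
        (name, field, data),
    )
    if cur.fetchone() is not None:
        return False  # row already visible, so skip the INSERT
    try:
        cur.execute(
            "INSERT INTO table1 (name, field, data) VALUES (?, ?, ?)",
            (name, field, data),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Another writer won the race between the SELECT and the INSERT.
        # On MariaDB this is where error 1062 would be expected.
        conn.rollback()
        return False


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
first = insert_if_absent(conn, "n", "f", "d")   # inserts the row
second = insert_if_absent(conn, "n", "f", "d")  # finds it and skips
```

The SELECT alone cannot close the race window between two writers, so the unique constraint and the error handler remain the real safeguard in this sketch.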
However, there is a significant probability that the application instances do not receive this error. Instead, the error occurs when the write-sets are applied during replication. Specifically, the error causes all the nodes to be disconnected from the cluster, and wsrep_local_state_comment is set to Inconsistent.
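For anyone reproducing this, the node state mentioned above can be read with a small helper like the one below. It is only a sketch: wsrep_state and is_healthy are hypothetical names, and cursor is assumed to be a DB-API cursor from any MariaDB client library that returns rows as tuples (a dict-style cursor would need a different lookup).

```python
def wsrep_state(cursor):
    """Return wsrep_local_state_comment for the node behind `cursor`, or None."""
    cursor.execute("SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment'")
    row = cursor.fetchone()  # assumed shape: (Variable_name, Value)
    return row[1] if row else None


def is_healthy(cursor):
    # "Synced" is the state reported by a node that is a normal cluster member.
    return wsrep_state(cursor) == "Synced"
```

Running this against each of the three nodes after the failure should show the Inconsistent state described in this report.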
What I expected is that the error would only surface at the application level. This can be a disaster since it breaks the whole cluster. For your information, I tried the configuration option wsrep_sync_wait and it did not help.
The error logs of the 3 nodes of the MariaDB Galera Cluster are in the attachment.
Please help me with this case. Feel free to ask for more information. Any insights about this are appreciated.