Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects version: 10.4.12
Environment: Galera cluster, 3 nodes on Debian 9, provider version 26.4.3(r4535)
Description
Hello,
For several months we have had a problem with large data operations (DELETE + LOAD DATA, up to 250,000 rows): at random, the entire cluster crashes. No queries work any more, the cluster behaves as if it were "locked", and we have to bootstrap a node, etc. to recover.
This happens during the day, while other transactions keep arriving from normal traffic on our websites. The same large data operations cause no problem at night.
We have run these large operations for several years without problems. We don't know whether the problem is related to wsrep_load_data_splitting having been OFF by default for several MariaDB versions now?
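For reference, the current value of that setting can be checked on each node with a statement like the one below. This is only a minimal sketch of how we would verify it, not output from our cluster.

```sql
-- Show whether LOAD DATA is split into smaller transactions on this node.
SHOW GLOBAL VARIABLES LIKE 'wsrep_load_data_splitting';
```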
So, for the past month we have been using the new "streaming replication" feature of Galera 4:
SET SESSION wsrep_trx_fragment_unit='rows';
SET SESSION wsrep_trx_fragment_size=10000;
We enable it in the program that performs the large data operations.
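To make the setup concrete, the session run by that program looks roughly like the sketch below. The table name catalog_import, the file path and the field and line terminators are placeholders chosen for illustration, not our real ones, and the sketch assumes each statement runs as its own autocommit transaction.

```sql
-- Hypothetical names: catalog_import and /tmp/catalog.csv are placeholders.
-- Replicate each transaction in fragments of 10000 rows.
SET SESSION wsrep_trx_fragment_unit = 'rows';
SET SESSION wsrep_trx_fragment_size = 10000;

-- The large data operation: empty the table, then reload it from a file.
DELETE FROM catalog_import;
LOAD DATA LOCAL INFILE '/tmp/catalog.csv'
  INTO TABLE catalog_import
  FIELDS TERMINATED BY ';'
  LINES TERMINATED BY '\n';

-- A fragment size of 0 turns streaming replication off again for this session.
SET SESSION wsrep_trx_fragment_size = 0;
```

Because the settings are session-scoped, they only affect the connection that runs the batch and not the normal website traffic.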
Streaming replication seems to be used correctly, because the wsrep_streaming_log table grows every day (in disk space, not in row count, because the rows are deleted automatically).
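This kind of observation can be reproduced with queries like the following. It is a sketch that assumes the table lives in the mysql schema, as it does in Galera 4. Note that table_rows is only an estimate for InnoDB tables.

```sql
-- Rows currently held for in-flight streaming transactions.
SELECT COUNT(*) AS pending_fragments
FROM mysql.wsrep_streaming_log;

-- On-disk footprint of the table as reported by the server.
SELECT table_rows, data_length, index_length, data_free
FROM information_schema.TABLES
WHERE table_schema = 'mysql'
  AND table_name = 'wsrep_streaming_log';
```

A row count near zero together with a growing data_length or data_free would match what we see: space that is allocated but not released after the fragments are deleted.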
Despite using streaming replication, nothing has changed: today a table of 245,000 records was loaded successfully, and 3 minutes later the entire cluster crashed.
How can we solve this problem, which occurs randomly (sometimes 3 times a week, sometimes once a month)?
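If it helps the investigation, the state of a node could be captured the next time the cluster locks up, with statements like these. This is only a suggested sketch: the selection of status variables is our guess at what is relevant, and the statements may not be runnable at all on a node that no longer accepts queries.

```sql
-- Cluster membership, node state, flow control and replication queues.
SHOW GLOBAL STATUS WHERE Variable_name IN (
  'wsrep_cluster_status',
  'wsrep_cluster_size',
  'wsrep_local_state_comment',
  'wsrep_flow_control_paused',
  'wsrep_local_recv_queue',
  'wsrep_local_send_queue'
);

-- What every connection is doing or waiting on at that moment.
SHOW FULL PROCESSLIST;
```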
Thank you