Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.4.14
Fix Version/s: None
Description
I am running a three-node MariaDB/Galera cluster on Ubuntu 16.04 with MariaDB 10.4.14 and Galera 26.4.5. The cluster is set up in "master/slave" mode, such that all client connections go to one (master) node. At times the two slave nodes start generating GRA_*.log files at an alarming rate, on the order of 100k files per day.
Something seems to trigger the situation; that is, I typically find some information in the server log corresponding to the first GRA_*.log file. One such example is:
2020-11-05 13:15:02 69 [Note] InnoDB: BF-BF X lock conflict,mode: 1027 supremum: 0conflicts states: my executing locked committing
RECORD LOCKS space id 251 page no 1933311 n bits 160 index PRIMARY of table `zabbix`.`problem` trx id 3608288917 lock_mode X locks rec but not gap
Record lock, heap no 88 PHYSICAL RECORD: n_fields 16; compact format; info bits 32
 0: len 8; hex 000000000a212cd3; asc !, ;;
 1: len 6; hex 0000d7121e95; asc ;;
 2: len 7; hex 16000980311ea3; asc 1 ;;
 3: len 4; hex 80000000; asc ;;
 4: len 4; hex 80000000; asc ;;
 5: len 8; hex 00000000001af108; asc ;;
 6: len 4; hex dd9fd628; asc (;;
 7: len 4; hex 81c70daa; asc ;;
 8: len 8; hex 00000000135ec5e3; asc ^ ;;
 9: len 4; hex dfa2a5cb; asc ;;
 10: len 4; hex b565950e; asc e ;;
 11: SQL NULL;
 12: len 8; hex 0000000000000000; asc ;;
 13: len 30; hex 5669727475616c20736572766572202f436f6d6d6f6e2f44544c535f6470; asc Virtual server /Common/DTLS_dp; (total 88 bytes);
 14: len 4; hex 80000000; asc ;;
 15: len 4; hex 80000002; asc ;;
Record lock, heap no 89 PHYSICAL RECORD: n_fields 16; compact format; info bits 32
 0: len 8; hex 000000000a212cd1; asc !, ;;
 1: len 6; hex 0000d7121e95; asc ;;
 2: len 7; hex 16000980311dd2; asc 1 ;;
 3: len 4; hex 80000000; asc ;;
 4: len 4; hex 80000000; asc ;;
 5: len 8; hex 00000000001af107; asc ;;
 6: len 4; hex dd9fd627; asc ';;
 7: len 4; hex b89c75a7; asc u ;;
 8: len 8; hex 00000000135ec5dc; asc ^ ;;
 9: len 4; hex dfa2a5cb; asc ;;
 10: len 4; hex b70de8af; asc ;;
 11: SQL NULL;
 12: len 8; hex 0000000000000000; asc ;;
 13: len 30; hex 5669727475616c20736572766572202f436f6d6d6f6e2f64705f53545f47; asc Virtual server /Common/dp_ST_G; (total 83 bytes);
 14: len 4; hex 80000000; asc ;;
 15: len 4; hex 80000002; asc ;;
[121B blob data]
2020-11-05 13:15:02 69 [ERROR] mysqld: Can't find record in 'problem'
2020-11-05 13:15:02 69 [Warning] WSREP: Ignoring error 'Can't find record in 'problem'' on Delete_rows_v1 event. Error_code: 1032
2020-11-05 13:15:02 69 [Warning] Slave SQL: Could not execute Delete_rows_v1 event on table zabbix.problem; Can't find record in 'problem', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 2835058, Internal MariaDB error code: 1032
After this initial event the server generates GRA_*.log files at an alarming rate, 1–2 per second, or about 100k per day. There are no further entries in the server log corresponding to the flood of GRA_*.log files. If this goes unnoticed you end up with millions of GRA_*.log files in /var/lib/mysql within a matter of weeks, and the server grinds to a halt.
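For reference, the simplest way I have found to gauge the extent of the flood (the data directory is /var/lib/mysql here; adjust if yours differs) is to count the files on the affected slave node, roughly like this:

find /var/lib/mysql -maxdepth 1 -name 'GRA_*.log' | wc -l
# repeat a minute later; during an episode the count climbs by roughly 60-120
sleep 60
find /var/lib/mysql -maxdepth 1 -name 'GRA_*.log' | wc -l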
I have not been able to determine reproduction steps; however, it is important to note that if I set wsrep_slave_threads=1 (dynamically, without a server restart) the creation of GRA_*.log files immediately stops. I can then return wsrep_slave_threads to a value >1 and no further GRA_*.log files are created (well, until a few days later when the situation repeats).
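Concretely, the workaround I apply on the affected slave node is roughly the following (the value 4 is only an example; I simply restore whatever wsrep_slave_threads was set to before):

mysql -e "SET GLOBAL wsrep_slave_threads = 1;"
# GRA_*.log creation stops immediately
mysql -e "SET GLOBAL wsrep_slave_threads = 4;"
# no new GRA_*.log files after restoring the original value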
It is also important to note that the cluster remains in sync throughout (wsrep_local_state_comment reports Synced on all nodes), as if all these GRA_*.log files are being generated erroneously.
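For completeness, this is how I check the sync state on each of the three nodes (nothing unusual shows up):

mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"
# Value reports 'Synced' on all three nodes, before, during and after the flood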