Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.4.14
Fix Version/s: None
Description
I am running a three-node MariaDB/Galera cluster on Ubuntu 16.04 with MariaDB 10.4.14 and Galera 26.4.5. The cluster is set up in "master/slave" mode, such that all client connections go to one (master) node. At times the two slave nodes start generating GRA_*.log files at an alarming rate, on the order of 100k files per day.
Something seems to trigger the situation; that is, I typically find some information in the server log corresponding to the first GRA_*.log file. One such example is:
2020-11-05 13:15:02 69 [Note] InnoDB: BF-BF X lock conflict,mode: 1027 supremum: 0conflicts states: my executing locked committing
RECORD LOCKS space id 251 page no 1933311 n bits 160 index PRIMARY of table `zabbix`.`problem` trx id 3608288917 lock_mode X locks rec but not gap
Record lock, heap no 88 PHYSICAL RECORD: n_fields 16; compact format; info bits 32
 0: len 8; hex 000000000a212cd3; asc !, ;;
 1: len 6; hex 0000d7121e95; asc ;;
 2: len 7; hex 16000980311ea3; asc 1 ;;
 3: len 4; hex 80000000; asc ;;
 4: len 4; hex 80000000; asc ;;
 5: len 8; hex 00000000001af108; asc ;;
 6: len 4; hex dd9fd628; asc (;;
 7: len 4; hex 81c70daa; asc ;;
 8: len 8; hex 00000000135ec5e3; asc ^ ;;
 9: len 4; hex dfa2a5cb; asc ;;
 10: len 4; hex b565950e; asc e ;;
 11: SQL NULL;
 12: len 8; hex 0000000000000000; asc ;;
 13: len 30; hex 5669727475616c20736572766572202f436f6d6d6f6e2f44544c535f6470; asc Virtual server /Common/DTLS_dp; (total 88 bytes);
 14: len 4; hex 80000000; asc ;;
 15: len 4; hex 80000002; asc ;;
Record lock, heap no 89 PHYSICAL RECORD: n_fields 16; compact format; info bits 32
 0: len 8; hex 000000000a212cd1; asc !, ;;
 1: len 6; hex 0000d7121e95; asc ;;
 2: len 7; hex 16000980311dd2; asc 1 ;;
 3: len 4; hex 80000000; asc ;;
 4: len 4; hex 80000000; asc ;;
 5: len 8; hex 00000000001af107; asc ;;
 6: len 4; hex dd9fd627; asc ';;
 7: len 4; hex b89c75a7; asc u ;;
 8: len 8; hex 00000000135ec5dc; asc ^ ;;
 9: len 4; hex dfa2a5cb; asc ;;
 10: len 4; hex b70de8af; asc ;;
 11: SQL NULL;
 12: len 8; hex 0000000000000000; asc ;;
 13: len 30; hex 5669727475616c20736572766572202f436f6d6d6f6e2f64705f53545f47; asc Virtual server /Common/dp_ST_G; (total 83 bytes);
 14: len 4; hex 80000000; asc ;;
 15: len 4; hex 80000002; asc ;;
[121B blob data]
2020-11-05 13:15:02 69 [ERROR] mysqld: Can't find record in 'problem'
2020-11-05 13:15:02 69 [Warning] WSREP: Ignoring error 'Can't find record in 'problem'' on Delete_rows_v1 event. Error_code: 1032
2020-11-05 13:15:02 69 [Warning] Slave SQL: Could not execute Delete_rows_v1 event on table zabbix.problem; Can't find record in 'problem', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 2835058, Internal MariaDB error code: 1032
After this initial event the server generates GRA_*.log files at an alarming rate, 1–2 per second, or about 100k per day. There are no further entries in the server log corresponding to the flood of GRA_*.log files. If this goes unnoticed you end up with millions of GRA_*.log files in /var/lib/mysql within a matter of weeks, and the server grinds to a halt.
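For reference, the simplest way I have found to gauge the extent of the flood (the data directory is /var/lib/mysql here; adjust if yours differs) is to count the files on the affected slave node, roughly like this:

find /var/lib/mysql -maxdepth 1 -name 'GRA_*.log' | wc -l
# repeat a minute later; during an episode the count climbs by roughly 60-120
sleep 60
find /var/lib/mysql -maxdepth 1 -name 'GRA_*.log' | wc -l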
I have not been able to determine reproduction steps; however, it is important to note that if I set wsrep_slave_threads=1 (dynamically, without a server restart) the creation of GRA_*.log files immediately stops. I can then return wsrep_slave_threads to a value >1 and no further GRA_*.log files are created (well, until a few days later when the situation repeats).
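Concretely, the workaround I apply on the affected slave node is roughly the following (the value 4 is only an example; I simply restore whatever wsrep_slave_threads was set to before):

mysql -e "SET GLOBAL wsrep_slave_threads = 1;"
# GRA_*.log creation stops immediately
mysql -e "SET GLOBAL wsrep_slave_threads = 4;"
# no new GRA_*.log files after restoring the original value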
It is also important to note that the cluster remains in sync throughout (wsrep_local_state_comment reports Synced on all nodes), as if all these GRA_*.log files are being generated erroneously.
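For completeness, this is how I check the sync state on each of the three nodes (nothing unusual shows up):

mysql -e "SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';"
# Value reports 'Synced' on all three nodes, before, during and after the flood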