  MariaDB Server / MDEV-24159

Millions of GRA_*.log files


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version: 10.4.14
    • Fix Version: N/A
    • Components: Galera, wsrep
    • Labels: None

    Description

      I am running a three-node MariaDB/Galera cluster on Ubuntu 16.04 with MariaDB 10.4.14 and Galera 26.4.5. The cluster is set up in "master/slave" mode such that all connections go to one (master) node. At times the two slave nodes start generating GRA_*.log files at an alarming rate, on the order of 100k files per day.

      Something seems to trigger the situation; that is, I typically find some information in the server log corresponding to the first GRA_*.log file. One such example is:

      2020-11-05 13:15:02 69 [Note] InnoDB: BF-BF X lock conflict,mode: 1027 supremum: 0conflicts states: my executing locked committing
      RECORD LOCKS space id 251 page no 1933311 n bits 160 index PRIMARY of table `zabbix`.`problem` trx id 3608288917 lock_mode X locks rec but not gap
      Record lock, heap no 88 PHYSICAL RECORD: n_fields 16; compact format; info bits 32
       0: len 8; hex 000000000a212cd3; asc      !, ;;
       1: len 6; hex 0000d7121e95; asc       ;;
       2: len 7; hex 16000980311ea3; asc     1  ;;
       3: len 4; hex 80000000; asc     ;;
       4: len 4; hex 80000000; asc     ;;
       5: len 8; hex 00000000001af108; asc         ;;
       6: len 4; hex dd9fd628; asc    (;;
       7: len 4; hex 81c70daa; asc     ;;
       8: len 8; hex 00000000135ec5e3; asc      ^  ;;
       9: len 4; hex dfa2a5cb; asc     ;;
       10: len 4; hex b565950e; asc  e  ;;
       11: SQL NULL;
       12: len 8; hex 0000000000000000; asc         ;;
       13: len 30; hex 5669727475616c20736572766572202f436f6d6d6f6e2f44544c535f6470; asc Virtual server /Common/DTLS_dp; (total 88 bytes);
       14: len 4; hex 80000000; asc     ;;
       15: len 4; hex 80000002; asc     ;;
      Record lock, heap no 89 PHYSICAL RECORD: n_fields 16; compact format; info bits 32
       0: len 8; hex 000000000a212cd1; asc      !, ;;
       1: len 6; hex 0000d7121e95; asc       ;;
       2: len 7; hex 16000980311dd2; asc     1  ;;
       3: len 4; hex 80000000; asc     ;;
       4: len 4; hex 80000000; asc     ;;
       5: len 8; hex 00000000001af107; asc         ;;
       6: len 4; hex dd9fd627; asc    ';;
       7: len 4; hex b89c75a7; asc   u ;;
       8: len 8; hex 00000000135ec5dc; asc      ^  ;;
       9: len 4; hex dfa2a5cb; asc     ;;
       10: len 4; hex b70de8af; asc     ;;
       11: SQL NULL;
       12: len 8; hex 0000000000000000; asc         ;;
       13: len 30; hex 5669727475616c20736572766572202f436f6d6d6f6e2f64705f53545f47; asc Virtual server /Common/dp_ST_G; (total 83 bytes);
       14: len 4; hex 80000000; asc     ;;
       15: len 4; hex 80000002; asc     ;;
      [121B blob data]
      2020-11-05 13:15:02 69 [ERROR] mysqld: Can't find record in 'problem'
      2020-11-05 13:15:02 69 [Warning] WSREP: Ignoring error 'Can't find record in 'problem'' on Delete_rows_v1 event. Error_code: 1032
      2020-11-05 13:15:02 69 [Warning] Slave SQL: Could not execute Delete_rows_v1 event on table zabbix.problem; Can't find record in 'problem', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 2835058, Internal MariaDB error code: 1032
      

      After this initial event the server generates GRA_*.log files at an alarming rate, 1–2 per second, or about 100k per day. There are no further entries in the server log corresponding to the flood of GRA_*.log files. If this goes unnoticed, you'll have millions of GRA_*.log files in /var/lib/mysql within a matter of weeks, and the server will grind to a halt.
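A quick way to gauge the scale of the flood on an affected node (a sketch; `DATADIR` is an assumed override and defaults to the /var/lib/mysql path mentioned above):

```shell
# Count GRA_*.log writeset artifacts in the datadir.
# DATADIR is a hypothetical override; the report's path is /var/lib/mysql.
DATADIR="${DATADIR:-/var/lib/mysql}"
count=$(find "$DATADIR" -maxdepth 1 -name 'GRA_*.log' 2>/dev/null | wc -l)
echo "GRA_*.log files in $DATADIR: $count"
```

Restricting `find` to `-maxdepth 1` keeps the check cheap even when the datadir holds millions of entries.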

      I have not been able to determine reproduction steps; however, it is important to note that if I set wsrep_slave_threads=1 (dynamically, without a server restart) the creation of GRA_*.log files stops immediately. I can then return wsrep_slave_threads to a value >1 and no GRA_*.log files are created (until a few days later, when the situation repeats).
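The workaround described above, expressed as SQL (the restore value of 4 is an illustrative placeholder, not from the report; use whatever wsrep_slave_threads was previously set to):

```sql
-- Drop to a single applier thread; this stops the GRA_*.log flood
SET GLOBAL wsrep_slave_threads = 1;

-- ...once file creation has stopped, restore the previous value
-- (4 here is a placeholder assumption, not taken from the report)
SET GLOBAL wsrep_slave_threads = 4;
```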

      It is also important to note that the cluster remains in sync throughout (wsrep_local_state_comment reports Synced on all nodes), as if all these GRA_*.log files were being generated erroneously.
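The in-sync observation can be verified on each node with the standard Galera status variable named above:

```sql
-- Run on every node; a value of 'Synced' means the node is in sync
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';
```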

      Attachments

        Activity

          People

            Assignee: jplindst Jan Lindström (Inactive)
            Reporter: mreibert Mark Reibert
            Votes: 2
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved:
