  1. MariaDB Server
  2. MDEV-26468

[ERROR] WSREP: invalid state ROLLED_BACK (FATAL)

    Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.3.31
    • Fix Version/s: None
    • Component/s: Galera, wsrep
    • Labels:
    • Environment:
      CentOS 7 3.10.0-1160.36.2.el7.x86_64

      Description

      The cluster keeps crashing at random intervals. The problem has been there for a while, but it got worse after we added a third node. I am now at the point where I believe something is wrong with the MariaDB service itself, as the crash report appears to confirm.

      The table most likely involved in the crash uses InnoDB. Structure:

      Table: news

      Columns:
      id int(10) UN AI PK
      parent_id int(10) UN
      post_user_id int(10) UN
      source_user_id int(10) UN
      title varchar(255)
      header text
      content mediumtext
      keywords varchar(255)
      image_id int(10) UN
      type_id int(10) UN
      alert_id int(10) UN
      can_comment tinyint(3) UN
      views int(10) UN
      deleted_at datetime
      created_at timestamp
      updated_at timestamp
      movie_id int(10) UN
      imdb_id int(10) UN
      source_domain varchar(255)
      source_url varchar(255)
      event_id int(10) UN

      According to the error message:

      2021-08-21 14:27:19 1 [Note] WSREP: Victim thread:
      THD: 132951, mode: local, state: committing, conflict: no conflict, seqno: -1
      SQL: UPDATE `news` SET `views`=25499 WHERE `id`=83796
      2021-08-21 14:27:19 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
      at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:abort_trx():736
      2021-08-21 14:27:19 0 [ERROR] WSREP: cancel commit bad exit: 7 514275587
      210821 14:27:19 [ERROR] mysqld got signal 6 ;
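
      For triage across the three nodes, the victim-thread block can be pulled out of each log with a small script. This is only a sketch: the function name `parse_victims` and the regular expression are assumptions derived from the excerpt above, not part of MariaDB or Galera, and other log layouts may need a different pattern.

```python
import re

# Matches the victim-thread detail line as it appears in the excerpt above.
# The field layout is assumed from that single sample line.
VICTIM_RE = re.compile(
    r"THD:\s*(?P<thd>\d+),\s*mode:\s*(?P<mode>[^,]+),\s*state:\s*(?P<state>[^,]+),"
    r"\s*conflict:\s*(?P<conflict>[^,]+),\s*seqno:\s*(?P<seqno>-?\d+)"
)


def parse_victims(lines):
    """Return one dict per victim-thread detail line found in the log lines."""
    victims = []
    current = None
    for raw in lines:
        line = raw.strip()
        match = VICTIM_RE.search(line)
        if match:
            current = match.groupdict()
            current["thd"] = int(current["thd"])
            current["seqno"] = int(current["seqno"])
            current["sql"] = None
            victims.append(current)
        elif current is not None and line.startswith("SQL:"):
            # The statement is expected on the line right after the THD line.
            current["sql"] = line[len("SQL:"):].strip()
            current = None
    return victims


# Sample input copied from the log excerpt in this report.
SAMPLE = """\
2021-08-21 14:27:19 1 [Note] WSREP: Victim thread:
THD: 132951, mode: local, state: committing, conflict: no conflict, seqno: -1
SQL: UPDATE `news` SET `views`=25499 WHERE `id`=83796
2021-08-21 14:27:19 0 [ERROR] WSREP: invalid state ROLLED_BACK (FATAL)
""".splitlines()

if __name__ == "__main__":
    for victim in parse_victims(SAMPLE):
        print(victim)
```

      Running the same function over the attached node logs would show whether the victim statement is always an UPDATE on `news`, which is what the excerpt suggests but does not prove.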

      The 'views' column is updated by queued jobs, and therefore it should be impossible for this update to be executed by multiple instances for one user within x seconds.

      All nodes are clones of one master image, meaning that software-wise they are exactly the same VMs. I have attached the logs of the other two nodes from the time of the crash.

      All connections to the DB are handled by Galera Load Balancer. At first this helped a lot, but now that the new node has been added, the problem has returned.
      https://galeracluster.com/library/documentation/glb.html

      #server.cnf attached.
      #stack-trace attached
      #logs attached

        Attachments

        1. MySQL stack trace.txt
          7 kB
        2. Node1 MySQL log.txt
          4 kB
        3. Node2 MySQL log.txt
          4 kB
        4. server.cnf
          2 kB

            People

            Assignee:
            Unassigned
            Reporter:
            digitalhuman Victor Angelier
            Votes:
            1
            Watchers:
            3
