MariaDB Server / MDEV-22404

Server Crash after Galera WSREP event received

    Details

      Description

      We have a three-node Galera cluster. One morning, two of the nodes crashed with signal 6 after logging "InnoDB: Rec offset 99, cur1 offset 10327, cur2 offset 15073", and the backtrace includes "/usr/sbin/mysqld(start_wsrep_THD+0x29e)[0x7f41deb3858e]".

      We investigated the related logs and gathered the following information.
      1. Two DB nodes crashed on 4/23 around 8:30 am.
      2. An index of "table A" is corrupted.
      3. Many UPDATE and SELECT statements against "table A" were still being executed before the DB crashed.
      4. On 4/22 around 2:00 am, "table A" was altered to add "column C" VARCHAR(10) NOT NULL DEFAULT 'Y', without specifying an algorithm.
      (Our alter_algorithm value is DEFAULT. According to the MariaDB online documentation,
      "the most efficient available algorithm will usually be used", so it might have been "instant".)
      5. We don't have the corrupted .ibd file, but we do have the SQL dump file.
      6. The data in "column C" of "table A" should be empty,
      but in the SQL dump file the INSERT statements contain weird values for "column C", such as "010045]9`?", "10044]ˏs10", 010049\"?, etc.
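
      For reference, the statement in item 4 had roughly the shape sketched below. The names `table_a` and `column_c` are placeholders, not the real identifiers, and the two statements after the first are only an illustration of how the algorithm could be inspected or pinned. They assume a MariaDB version that supports the `alter_algorithm` variable and the `ALGORITHM` clause, and they are not something we ran.

      ```sql
      -- Placeholder names; the real table definition is in createTable.sql.
      -- What was executed: no ALGORITHM clause, so the server picks one itself.
      ALTER TABLE table_a
        ADD COLUMN column_c VARCHAR(10) NOT NULL DEFAULT 'Y';

      -- Shows the session/global default that applies when no clause is given.
      SHOW VARIABLES LIKE 'alter_algorithm';

      -- Illustrative alternative (assumption, not what we ran): request a full
      -- table copy explicitly so that an instant ADD COLUMN cannot be chosen.
      ALTER TABLE table_a
        ADD COLUMN column_c VARCHAR(10) NOT NULL DEFAULT 'Y',
        ALGORITHM=COPY;
      ```

      Pinning the algorithm would only help rule instant ADD COLUMN in or out as a factor. It does not by itself explain the crash.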

      We would like to know what caused our DB to crash.
      Is this a bug in the InnoDB engine or in Galera Cluster?

      Thanks.

        Attachments

        1. createTable.sql (7 kB)
        2. logWithMemDump.txt (104 kB)
        3. oneNodeCrashLog.txt (2 kB)
        4. Record overlaps another_Log.txt (3 kB)
        5. sqlLogDumpPage.txt (51 kB)

              People

              Assignee:
              jplindst Jan Lindström
              Reporter:
              falcon1631 Chun-Liang Chen