MariaDB Server / MDEV-10301

Signal 11 crash at random times


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 10.1.12, 10.1.14
    • Fix Version/s: 10.1.30
    • Component/s: Galera, Replication
    • Environment:
      Ubuntu 14.04 LTS, amd64, VMware vSphere 6.0, VM v8, 2 vCPU, 6.1G RAM. /var/db volume is 56G used out of 200G total, FS is ext4 with rw,relatime mount flags. Deadline IO scheduler used for /var/db.

      Description

      Background: In February 2016 we migrated from a MariaDB 5.5 active/passive replication cluster to a MariaDB 10.1 Galera active/active cluster with two DB nodes and one arbitrator node.

      This setup was made in preparation for a new DC. The final setup, once the new DC is ready, will be two DB nodes split across two DCs and one arbitrator in a third DC. For now it's all in one DC, with two DB nodes handling queries and one arbitrator doing backups with innobackupex.

      The setup was stable for a while; the first precisely recorded crash occurred on 2016-03-30.

      Some crash times I have recorded are:

      2016-03-30 18:47: signal 11
      2016-04-04 06:37: signal 11
      2016-05-17 02:00: signal 11
      2016-05-25: we upgraded from 10.1.12 to 10.1.14 and the issue seemed resolved until last night.
      2016-06-28 19:41: signal 11

      There are more, equally random, that I have not recorded precisely. The crash happens randomly on either of the two db nodes.

      Each crash has left the node in an unclean state (for example seqno: -1 in grastate.dat), so recovery has always meant removing the datadir and doing a full SST to the crashed node using xtrabackup-v2.
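      The unclean-state check described above can be sketched as a small shell helper: it reads the seqno field from grastate.dat and reports whether the node needs a full SST (seqno of -1 or a missing value after an unclean shutdown) or can rejoin incrementally. The script itself and its output strings are illustrative, not part of the original report; only the grastate.dat format is standard Galera behavior.

```shell
# check_grastate: report whether a Galera node needs a full SST,
# based on the seqno recorded in its grastate.dat.
# A seqno of -1 (or a missing/unreadable file) means the node stopped
# uncleanly and cannot rejoin via IST.
check_grastate() {
    seqno=$(awk -F': *' '$1 == "seqno" { print $2 }' "$1" 2>/dev/null)
    if [ -z "$seqno" ] || [ "$seqno" = "-1" ]; then
        echo "SST required (seqno=${seqno:-missing})"
    else
        echo "IST possible (seqno=$seqno)"
    fi
}
```

      Usage: `check_grastate /var/lib/mysql/grastate.dat`. In the crashes described here this would always print the SST case, matching the observed need for a full state transfer.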

      The server backs an authentication system: many simple read queries for user data, plus a steady stream of simple INSERTs for auth logging. The retained auth logs are what take up 54G of the 56G used on that volume.

      I have attached one crash log from each DB node, from two separate crash times.

      I have also attached my configuration, most of which lives in the file /etc/mysql/conf.d/replication.conf.
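      The attached replication.cnf is not reproduced in this report. For context, the core Galera section of such a file for a two-node cluster with xtrabackup-v2 SST typically looks like the sketch below; every name and value here is an illustrative assumption, not the reporter's actual configuration.

```ini
# Illustrative sketch only -- not the attached replication.cnf.
[mysqld]
binlog_format            = ROW
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2

wsrep_on                 = ON
wsrep_provider           = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_address    = gcomm://db1,db2   # hypothetical node names
wsrep_sst_method         = xtrabackup-v2
```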

      I monitor many metrics on the nodes (TPS, system load, memory use) but can see no deviations in these graphs, except that when the mysqld process crashes, around 3G of RAM (out of 3.7G used) is freed and TPS drops.

        Attachments

        1. crashlog-20160330.txt
          4 kB
          Stefan Midjich
        2. crashlog-20160628.txt
          4 kB
          Stefan Midjich
        3. optimizations.cnf
          0.1 kB
          Stefan Midjich
        4. replication.cnf
          0.7 kB
          Stefan Midjich


              People

              Assignee:
              sachin.setiya.007 Sachin Setiya
              Reporter:
              stemid Stefan Midjich
               Votes:
               2
               Watchers:
               7

                Dates

                Created:
                Updated:
                Resolved: