Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-11135

Adding first node to a busy cluster

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.1.17
    • 10.1(EOL)
    • None
    • None

    Description

      When starting the first JOINER node on a busy DONOR, after the SST is successfully complete, the IST process is started in order to complete the synchronization. It appears as if the IST fails if the galera.cache file is not large enough to hold all the changes that occurred between the start of the SST until the beginning of the IST. This causes the MariaDB instance to fail on the Joining Node and also causes the DONOR node to become Non-Primary:

      On DONOR node innobackup.backup.log file:

      xtrabackup: The latest check point (for incremental): '4321113633769'
      xtrabackup: Stopping log copying thread.
      .161025 16:14:59 >> log scanned up to (4321113633769)

      161025 16:14:59 Executing UNLOCK TABLES
      161025 16:14:59 All tables unlocked
      161025 16:14:59 Backup created in directory '/tmp'
      MySQL binlog position: filename 'binlog.000107', position '408', GTID of the last change '0-3-529,1-3-11806550'
      161025 16:14:59 [00] Streaming backup-my.cnf
      161025 16:14:59 [00] ...done
      161025 16:14:59 [00] Streaming xtrabackup_info
      161025 16:14:59 [00] ...done
      xtrabackup: Transaction log of lsn (4320841669921) to (4321113633769) was copied.
      161025 16:15:02 completed OK!

      This shows the backup completed.

      On the JOINER node:

      Oct 25 16:15:02 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:02 140643296864000 [Note] WSREP: 0.0 (mariadb-03): State transfer to 1.0 (mariadb-plm-01) complete.
      Oct 25 16:15:02 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:02 140643296864000 [Note] WSREP: Member 0.0 (mariadb-03) synced with group.
      Oct 25 16:15:02 mariadb-plm-01 mysqld[70555]: WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20161025 16:15:02.889)
      Oct 25 16:15:02 mariadb-plm-01 mysqld[70555]: WSREP_SST: [INFO] Total time on joiner: 0 seconds (20161025 16:15:02.893)
      Oct 25 16:15:02 mariadb-plm-01 mysqld[70555]: WSREP_SST: [INFO] Removing the sst_in_progress file (20161025 16:15:02.896)
      Oct 25 16:15:02 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:02 140677831268288 [Note] WSREP: SST complete, seqno: 1899034
      Oct 25 16:15:03 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:03 140677831268288 [Note] InnoDB: Using mutexes to ref count buffer pool pages
      Oct 25 16:15:03 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:03 140677831268288 [Note] InnoDB: The InnoDB memory heap is disabled
      Oct 25 16:15:03 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:03 140677831268288 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      Oct 25 16:15:03 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:03 140677831268288 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
      Oct 25 16:15:03 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:03 140677831268288 [Note] InnoDB: Compressed tables use zlib 1.2.8
      Oct 25 16:15:03 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:03 140677831268288 [Note] InnoDB: Using Linux native AIO
      Oct 25 16:15:03 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:03 140677831268288 [Note] InnoDB: Using SSE crc32 instructions
      Oct 25 16:15:03 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:03 140677831268288 [Note] InnoDB: Initializing buffer pool, size = 30.0G
      Oct 25 16:15:04 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:04 140677831268288 [Note] InnoDB: Completed initialization of buffer pool
      Oct 25 16:15:04 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:04 140677831268288 [Note] InnoDB: Highest supported file format is Barracuda.
      Oct 25 16:15:04 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:04 140677831268288 [Warning] InnoDB: The log sequence number in the ibdata files is higher than the log sequence number in the ib_logfiles! Are you sure you are using the right ib_logfiles to start up the database. Log sequence number in the ib_logfiles is 4320323667423, logsequence numbers stamped to ibdata file headers are between 4320836805942 and 4320836805942.
      Oct 25 16:15:04 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:04 140677831268288 [Note] InnoDB: The log sequence numbers 4320836805942 and 4320836805942 in ibdata files do not match the log sequence number 4320323667423 in the ib_logfiles!
      Oct 25 16:15:04 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:04 140677831268288 [Note] InnoDB: Database was not shutdown normally!
      Oct 25 16:15:04 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:04 140677831268288 [Note] InnoDB: Starting crash recovery.
      Oct 25 16:15:04 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:04 140677831268288 [Note] InnoDB: Reading tablespace information from the .ibd files...
      Oct 25 16:15:06 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:06 140677831268288 [Note] InnoDB: Restoring possible half-written data pages
      Oct 25 16:15:06 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:06 140677831268288 [Note] InnoDB: from the doublewrite buffer...
      Oct 25 16:15:06 mariadb-plm-01 mysqld[70555]: 2016-10-25 16:15:06 7ff21c2917c0 InnoDB: Error: page 7 log sequence number 4320605260426
      Oct 25 16:15:06 mariadb-plm-01 mysqld[70555]: InnoDB: is in the future! Current system log sequence number 4320323667423.
      Oct 25 16:15:06 mariadb-plm-01 mysqld[70555]: InnoDB: Your database may be corrupt or you may have copied the InnoDB
      Oct 25 16:15:06 mariadb-plm-01 mysqld[70555]: InnoDB: tablespace but not the InnoDB log files. See
      Oct 25 16:15:06 mariadb-plm-01 mysqld[70555]: InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
      Oct 25 16:15:06 mariadb-plm-01 mysqld[70555]: InnoDB: for more information.

      Back on the DONOR server:

      Oct 25 16:15:02 mariadb-03 mysqld: 2016-10-25 16:15:02 139885163833088 [Note] WSREP: 0.0 (mariadb-03): State transfer to 1.0 (mariadb-plm-01) complete.
      Oct 25 16:15:02 mariadb-03 mysqld: 2016-10-25 16:15:02 139885163833088 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 1900097)
      Oct 25 16:15:02 mariadb-03 mysqld: 2016-10-25 16:15:02 139885163833088 [Note] WSREP: Member 0.0 (mariadb-03) synced with group.
      Oct 25 16:15:02 mariadb-03 mysqld: 2016-10-25 16:15:02 139885163833088 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 1900097)
      Oct 25 16:15:02 mariadb-03 mysqld: 2016-10-25 16:15:02 139919665761024 [Note] WSREP: Synchronized with group, ready for connections
      Oct 25 16:15:02 mariadb-03 mysqld: 2016-10-25 16:15:02 139919665761024 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Oct 25 16:15:02 mariadb-03 mysqld: WSREP_SST: [INFO] Total time on donor: 0 seconds (20161025 16:15:02.833)
      Oct 25 16:15:11 mariadb-03 mysqld: 2016-10-25 16:15:11 139885172225792 [Note] WSREP: (2604c0a4, 'tcp://0.0.0.0:4567') connection to peer 40e10823 with addr tcp://192.168.90.43:4567 timed out, no messages seen in PT3S
      Oct 25 16:15:11 mariadb-03 mysqld: 2016-10-25 16:15:11 139885172225792 [Note] WSREP: (2604c0a4, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.90.43:4567
      Oct 25 16:15:12 mariadb-03 mysqld: 2016-10-25 16:15:12 139885172225792 [Note] WSREP: (2604c0a4, 'tcp://0.0.0.0:4567') reconnecting to 40e10823 (tcp://192.168.90.43:4567), attempt 0
      Oct 25 16:15:13 mariadb-03 mysqld: 2016-10-25 16:15:13 139885172225792 [Note] WSREP: evs::proto(2604c0a4, OPERATIONAL, view_id(REG,2604c0a4,2)) suspecting node: 40e10823
      Oct 25 16:15:13 mariadb-03 mysqld: 2016-10-25 16:15:13 139885172225792 [Note] WSREP: evs::proto(2604c0a4, OPERATIONAL, view_id(REG,2604c0a4,2)) suspected node without join message, declaring inactive
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139885172225792 [Note] WSREP: view(view_id(NON_PRIM,2604c0a4,2) memb

      { Oct 25 16:15:14 mariadb-03 mysqld: #0112604c0a4,0 Oct 25 16:15:14 mariadb-03 mysqld: }

      joined

      { Oct 25 16:15:14 mariadb-03 mysqld: }

      left

      { Oct 25 16:15:14 mariadb-03 mysqld: }

      partitioned

      { Oct 25 16:15:14 mariadb-03 mysqld: #01140e10823,0 Oct 25 16:15:14 mariadb-03 mysqld: }

      )
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139885172225792 [Note] WSREP: view(view_id(NON_PRIM,2604c0a4,3) memb

      { Oct 25 16:15:14 mariadb-03 mysqld: #0112604c0a4,0 Oct 25 16:15:14 mariadb-03 mysqld: }

      joined

      { Oct 25 16:15:14 mariadb-03 mysqld: }

      left

      { Oct 25 16:15:14 mariadb-03 mysqld: }

      partitioned

      { Oct 25 16:15:14 mariadb-03 mysqld: #01140e10823,0 Oct 25 16:15:14 mariadb-03 mysqld: }

      )
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139885163833088 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139885163833088 [Note] WSREP: Flow-control interval: [16, 16]
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139885163833088 [Note] WSREP: Received NON-PRIMARY.
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139885163833088 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 1906499)
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139919665761024 [Note] WSREP: New cluster view: global state: b0728b95-9aae-11e6-982b-da4f6771abcd:1906499, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139919665761024 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139885163833088 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139885163833088 [Note] WSREP: Flow-control interval: [16, 16]
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139885163833088 [Note] WSREP: Received NON-PRIMARY.
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139919665761024 [Note] WSREP: New cluster view: global state: b0728b95-9aae-11e6-982b-da4f6771abcd:1906499, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139919665761024 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Oct 25 16:15:14 mariadb-03 mysqld: 2016-10-25 16:15:14 139846180711168 [ERROR] Slave SQL: Node has dropped from cluster, Gtid 1-3-11814016, Internal MariaDB error code: 1047
      Oct 25 16:15:15 mariadb-03 mysqld: 2016-10-25 16:15:15 139885172225792 [Note] WSREP: (2604c0a4, 'tcp://0.0.0.0:4567') connection to peer 00000000 with addr tcp://192.168.90.43:4567 timed out, no messages seen in PT3S

      Attachments

        Activity

          People

            Unassigned Unassigned
            mnguyen Michael Nguyen
            Votes:
            1 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.