Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9423

cannot add new node to the cluser: Binlog file '/var/log/mysql/mariadb-bin.000001' not found in binlog index, needed for recovery. Aborting.

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.1.10
    • 10.1.15
    • Galera
    • None
    • Linux fm-first 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u1 (2015-12-14) x86_64 GNU/Linux
    • 10.1.15

    Description

      Looks like the master hasn't been shutdown properly and now the new node that tries to connect to this cluster tries to run a recovery , but the binlog file doesn't exist.

      starting the daemon with

      mysqld --wsrep_cluster_address=......
      

      2016-01-15  9:33:23 140341928318912 [Note] mysqld (mysqld 10.1.10-MariaDB-1~jessie-log) starting as process 1 ...
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: Read nil XID from storage engines, skipping position init
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: wsrep_load(): Galera 3.9(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: CRC-32C: using hardware acceleration.
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: Passing config to GCS: base_host = 10.0.0.8; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 3G; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1;
      2016-01-15  9:33:23 140338580059904 [Note] WSREP: Service thread queue flushed.
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: wsrep_sst_grab()
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: Start replication
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: protonet asio version 0
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: Using CRC-32C for message checksums.
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: backend: asio
      2016-01-15  9:33:23 140341928318912 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: restore pc from disk failed
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: GMCast version 0
      2016-01-15  9:33:23 140341928318912 [Warning] WSREP: Failed to resolve tcp://compose_mariadb_1:4567
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: (05b59999, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: (05b59999, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: EVS version 0
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer 'compose_garbd_1:4567,compose_mariadb_1:4567,second_mariadb_1:4567'
      2016-01-15  9:33:23 140341928318912 [Note] WSREP: (05b59999, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
      2016-01-15  9:33:24 140341928318912 [Note] WSREP: declaring 1afe1669 at tcp://10.0.0.9:4567 stable
      2016-01-15  9:33:24 140341928318912 [Note] WSREP: declaring 41c70c8c at tcp://10.0.0.12:4567 stable
      2016-01-15  9:33:24 140341928318912 [Note] WSREP: Node 1afe1669 state prim
      2016-01-15  9:33:25 140341928318912 [Note] WSREP: view(view_id(PRIM,05b59999,96) memb {
      	05b59999,0
      	1afe1669,0
      	41c70c8c,0
      } joined {
      } left {
      } partitioned {
      })
      2016-01-15  9:33:25 140341928318912 [Note] WSREP: save pc into disk
      2016-01-15  9:33:25 140341928318912 [Note] WSREP: gcomm: connected
      2016-01-15  9:33:25 140341928318912 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
      2016-01-15  9:33:25 140341928318912 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
      2016-01-15  9:33:25 140341928318912 [Note] WSREP: Opened channel 'my_wsrep_cluster'
      2016-01-15  9:33:25 140341928318912 [Note] WSREP: Waiting for SST to complete.
      2016-01-15  9:33:25 140338517763840 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
      2016-01-15  9:33:25 140338517763840 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 069bc67c-bb6b-11e5-a9b3-1ad7790ea186
      2016-01-15  9:33:25 140338517763840 [Note] WSREP: STATE EXCHANGE: sent state msg: 069bc67c-bb6b-11e5-a9b3-1ad7790ea186
      2016-01-15  9:33:25 140338517763840 [Note] WSREP: STATE EXCHANGE: got state msg: 069bc67c-bb6b-11e5-a9b3-1ad7790ea186 from 0 ()
      2016-01-15  9:33:25 140338517763840 [Note] WSREP: STATE EXCHANGE: got state msg: 069bc67c-bb6b-11e5-a9b3-1ad7790ea186 from 1 (garb)
      2016-01-15  9:33:25 140338517763840 [Note] WSREP: STATE EXCHANGE: got state msg: 069bc67c-bb6b-11e5-a9b3-1ad7790ea186 from 2 ()
      2016-01-15  9:33:25 140338517763840 [Note] WSREP: Quorum results:
      	version    = 3,
      	component  = PRIMARY,
      	conf_id    = 93,
      	members    = 2/3 (joined/total),
      	act_id     = 205387,
      	last_appl. = -1,
      	protocols  = 0/7/3 (gcs/repl/appl),
      	group UUID = 0600869f-af4f-11e5-8c89-a35fa6cdd2ef
      2016-01-15  9:33:25 140338517763840 [Note] WSREP: Flow-control interval: [28, 28]
      2016-01-15  9:33:25 140338517763840 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 205387)
      2016-01-15  9:33:25 140341928011520 [Note] WSREP: State transfer required:
      	Group state: 0600869f-af4f-11e5-8c89-a35fa6cdd2ef:205387
      	Local state: 00000000-0000-0000-0000-000000000000:-1
      2016-01-15  9:33:25 140341928011520 [Note] WSREP: New cluster view: global state: 0600869f-af4f-11e5-8c89-a35fa6cdd2ef:205387, view# 94: Primary, number of nodes: 3, my index: 0, protocol version 3
      2016-01-15  9:33:25 140341928011520 [Warning] WSREP: Gap in state sequence. Need state transfer.
      2016-01-15  9:33:25 140338488407808 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.0.0.8' --datadir '/var/lib/mysql/'   --parent '1' --binlog '/var/log/mysql/mariadb-bin' '
      WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20160115 09:33:25.515)
      2016-01-15  9:33:25 140341928011520 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.0.0.8:4444/xtrabackup_sst//1
      2016-01-15  9:33:25 140341928011520 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2016-01-15  9:33:25 140341928011520 [Note] WSREP: REPL Protocols: 7 (3, 2)
      2016-01-15  9:33:25 140338580059904 [Note] WSREP: Service thread queue flushed.
      2016-01-15  9:33:25 140341928011520 [Note] WSREP: Assign initial position for certification: 205387, protocol version: 3
      2016-01-15  9:33:25 140338580059904 [Note] WSREP: Service thread queue flushed.
      2016-01-15  9:33:25 140341928011520 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (0600869f-af4f-11e5-8c89-a35fa6cdd2ef): 1 (Operation not permitted)
      	 at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.
      2016-01-15  9:33:25 140338517763840 [Note] WSREP: Member 0.0 () requested state transfer from '*any*'. Selected 2.0 ()(SYNCED) as donor.
      2016-01-15  9:33:25 140338517763840 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 205387)
      2016-01-15  9:33:25 140341928011520 [Note] WSREP: Requesting state transfer: success, donor: 2
      2016-01-15  9:33:27 140338526156544 [Note] WSREP: (05b59999, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2016-01-15 10:25:15 140338517763840 [Note] WSREP: 2.0 (): State transfer to 0.0 () complete.
      2016-01-15 10:25:15 140338517763840 [Note] WSREP: Member 2.0 () synced with group.
      2016-01-15 10:25:20 140341928318912 [Note] WSREP: SST complete, seqno: 205388
      2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Using mutexes to ref count buffer pool pages
      2016-01-15 10:25:21 140341928318912 [Note] InnoDB: The InnoDB memory heap is disabled
      2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Memory barrier is not used
      2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Compressed tables use zlib 1.2.8
      2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Using Linux native AIO
      2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Using SSE crc32 instructions
      2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Initializing buffer pool, size = 9.8G
      2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Completed initialization of buffer pool
      2016-01-15 10:25:22 140341928318912 [Note] InnoDB: Highest supported file format is Barracuda.
      2016-01-15 10:25:22 140341928318912 [Note] InnoDB: 128 rollback segment(s) are active.
      2016-01-15 10:25:22 140341928318912 [Note] InnoDB: Waiting for purge to start
      2016-01-15 10:25:22 140341928318912 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.26-76.0 started; log sequence number 79527883504
      2016-01-15 10:25:22 140318926145280 [Note] InnoDB: Dumping buffer pool(s) not yet started
      2016-01-15 10:25:22 140341928318912 [Note] Plugin 'FEEDBACK' is disabled.
      2016-01-15 10:25:22 140341928318912 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
      2016-01-15 10:25:22 140341928318912 [ERROR] Binlog file '/var/log/mysql/mariadb-bin.000001' not found in binlog index, needed for recovery. Aborting.
      2016-01-15 10:25:22 140341928318912 [ERROR] Crash recovery failed. Either correct the problem (if it's, for example, out of memory error) and restart, or delete (or rename) binary log and start mysqld with --tc-heuristic-recover={commit|rollback}
      2016-01-15 10:25:22 140341928318912 [ERROR] Can't init tc log
      2016-01-15 10:25:22 140341928318912 [ERROR] Aborting
      

      Attachments

        Issue Links

          Activity

            andrew.garner Andrew Garner added a comment -

            Adding FLUSH LOCAL ENGINE LOGS prior to copying the binary log seems to erase this problem. I'm not sure if this is bad MariaDB behavior or if xtrabackup should be expected to flush engine logs here.

            I opened a bug with Percona against percona-xtrabackup that includes a minor patch against xtrabackup 2.3 that resolved this problem for me:

            https://bugs.launchpad.net/percona-xtrabackup/+bug/1570560

            andrew.garner Andrew Garner added a comment - Adding FLUSH LOCAL ENGINE LOGS prior to copying the binary log seems to erase this problem. I'm not sure if this is bad MariaDB behavior or if xtrabackup should be expected to flush engine logs here. I opened a bug with Percona against percona-xtrabackup that includes a minor patch against xtrabackup 2.3 that resolved this problem for me: https://bugs.launchpad.net/percona-xtrabackup/+bug/1570560
            aaronl Aaron Lager added a comment -

            I just wanted to add that I'm also having this issue.
            I'm using xtrabackup-v2
            xtrabackup version 2.4
            MariaDB 10.1.13

            aaronl Aaron Lager added a comment - I just wanted to add that I'm also having this issue. I'm using xtrabackup-v2 xtrabackup version 2.4 MariaDB 10.1.13
            aaronl Aaron Lager added a comment -

            I just reverted back to xtrabackup version 2.2 and I was finally able to get the node to join without error.

            aaronl Aaron Lager added a comment - I just reverted back to xtrabackup version 2.2 and I was finally able to get the node to join without error.
            andrew.garner Andrew Garner added a comment -

            I verified that xtrabackup 2.2 is also affected by this same issue (due to also not running flush engine logs prior to copying the binlog). This is racy so it will sometimes work in the right conditions (under any supported xtrabackup version).

            andrew.garner Andrew Garner added a comment - I verified that xtrabackup 2.2 is also affected by this same issue (due to also not running flush engine logs prior to copying the binlog). This is racy so it will sometimes work in the right conditions (under any supported xtrabackup version).
            nirbhay_c Nirbhay Choubey (Inactive) added a comment - http://lists.askmonty.org/pipermail/commits/2016-June/009505.html

            People

              nirbhay_c Nirbhay Choubey (Inactive)
              vipconsult Krasimir
              Votes:
              4 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.