[MDEV-9423] cannot add new node to the cluster: Binlog file '/var/log/mysql/mariadb-bin.000001' not found in binlog index, needed for recovery. Aborting. Created: 2016-01-15  Updated: 2016-08-21  Resolved: 2016-06-29

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.1.10
Fix Version/s: 10.1.15

Type: Bug Priority: Major
Reporter: Krasimir Assignee: Nirbhay Choubey (Inactive)
Resolution: Fixed Votes: 4
Labels: None
Environment:

Linux fm-first 3.16.0-4-amd64 #1 SMP Debian 3.16.7-ckt20-1+deb8u1 (2015-12-14) x86_64 GNU/Linux


Issue Links:
Duplicate
duplicates MDEV-10188 MariaDB server can't join Galera clus... Closed
Sprint: 10.1.15

 Description   

It looks like the master wasn't shut down properly, and now a new node trying to join this cluster attempts crash recovery, but the binlog file doesn't exist.

Starting the daemon with:

mysqld --wsrep_cluster_address=......

2016-01-15  9:33:23 140341928318912 [Note] mysqld (mysqld 10.1.10-MariaDB-1~jessie-log) starting as process 1 ...
2016-01-15  9:33:23 140341928318912 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-01-15  9:33:23 140341928318912 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2016-01-15  9:33:23 140341928318912 [Note] WSREP: wsrep_load(): Galera 3.9(rXXXX) by Codership Oy <info@codership.com> loaded successfully.
2016-01-15  9:33:23 140341928318912 [Note] WSREP: CRC-32C: using hardware acceleration.
2016-01-15  9:33:23 140341928318912 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2016-01-15  9:33:23 140341928318912 [Note] WSREP: Passing config to GCS: base_host = 10.0.0.8; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.size = 3G; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc.ignore_sb = false; pc.npvo = false; pc.recovery = 1;
2016-01-15  9:33:23 140338580059904 [Note] WSREP: Service thread queue flushed.
2016-01-15  9:33:23 140341928318912 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2016-01-15  9:33:23 140341928318912 [Note] WSREP: wsrep_sst_grab()
2016-01-15  9:33:23 140341928318912 [Note] WSREP: Start replication
2016-01-15  9:33:23 140341928318912 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2016-01-15  9:33:23 140341928318912 [Note] WSREP: protonet asio version 0
2016-01-15  9:33:23 140341928318912 [Note] WSREP: Using CRC-32C for message checksums.
2016-01-15  9:33:23 140341928318912 [Note] WSREP: backend: asio
2016-01-15  9:33:23 140341928318912 [Warning] WSREP: access file(gvwstate.dat) failed(No such file or directory)
2016-01-15  9:33:23 140341928318912 [Note] WSREP: restore pc from disk failed
2016-01-15  9:33:23 140341928318912 [Note] WSREP: GMCast version 0
2016-01-15  9:33:23 140341928318912 [Warning] WSREP: Failed to resolve tcp://compose_mariadb_1:4567
2016-01-15  9:33:23 140341928318912 [Note] WSREP: (05b59999, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-01-15  9:33:23 140341928318912 [Note] WSREP: (05b59999, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-01-15  9:33:23 140341928318912 [Note] WSREP: EVS version 0
2016-01-15  9:33:23 140341928318912 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer 'compose_garbd_1:4567,compose_mariadb_1:4567,second_mariadb_1:4567'
2016-01-15  9:33:23 140341928318912 [Note] WSREP: (05b59999, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
2016-01-15  9:33:24 140341928318912 [Note] WSREP: declaring 1afe1669 at tcp://10.0.0.9:4567 stable
2016-01-15  9:33:24 140341928318912 [Note] WSREP: declaring 41c70c8c at tcp://10.0.0.12:4567 stable
2016-01-15  9:33:24 140341928318912 [Note] WSREP: Node 1afe1669 state prim
2016-01-15  9:33:25 140341928318912 [Note] WSREP: view(view_id(PRIM,05b59999,96) memb {
	05b59999,0
	1afe1669,0
	41c70c8c,0
} joined {
} left {
} partitioned {
})
2016-01-15  9:33:25 140341928318912 [Note] WSREP: save pc into disk
2016-01-15  9:33:25 140341928318912 [Note] WSREP: gcomm: connected
2016-01-15  9:33:25 140341928318912 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-01-15  9:33:25 140341928318912 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-01-15  9:33:25 140341928318912 [Note] WSREP: Opened channel 'my_wsrep_cluster'
2016-01-15  9:33:25 140341928318912 [Note] WSREP: Waiting for SST to complete.
2016-01-15  9:33:25 140338517763840 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
2016-01-15  9:33:25 140338517763840 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 069bc67c-bb6b-11e5-a9b3-1ad7790ea186
2016-01-15  9:33:25 140338517763840 [Note] WSREP: STATE EXCHANGE: sent state msg: 069bc67c-bb6b-11e5-a9b3-1ad7790ea186
2016-01-15  9:33:25 140338517763840 [Note] WSREP: STATE EXCHANGE: got state msg: 069bc67c-bb6b-11e5-a9b3-1ad7790ea186 from 0 ()
2016-01-15  9:33:25 140338517763840 [Note] WSREP: STATE EXCHANGE: got state msg: 069bc67c-bb6b-11e5-a9b3-1ad7790ea186 from 1 (garb)
2016-01-15  9:33:25 140338517763840 [Note] WSREP: STATE EXCHANGE: got state msg: 069bc67c-bb6b-11e5-a9b3-1ad7790ea186 from 2 ()
2016-01-15  9:33:25 140338517763840 [Note] WSREP: Quorum results:
	version    = 3,
	component  = PRIMARY,
	conf_id    = 93,
	members    = 2/3 (joined/total),
	act_id     = 205387,
	last_appl. = -1,
	protocols  = 0/7/3 (gcs/repl/appl),
	group UUID = 0600869f-af4f-11e5-8c89-a35fa6cdd2ef
2016-01-15  9:33:25 140338517763840 [Note] WSREP: Flow-control interval: [28, 28]
2016-01-15  9:33:25 140338517763840 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 205387)
2016-01-15  9:33:25 140341928011520 [Note] WSREP: State transfer required:
	Group state: 0600869f-af4f-11e5-8c89-a35fa6cdd2ef:205387
	Local state: 00000000-0000-0000-0000-000000000000:-1
2016-01-15  9:33:25 140341928011520 [Note] WSREP: New cluster view: global state: 0600869f-af4f-11e5-8c89-a35fa6cdd2ef:205387, view# 94: Primary, number of nodes: 3, my index: 0, protocol version 3
2016-01-15  9:33:25 140341928011520 [Warning] WSREP: Gap in state sequence. Need state transfer.
2016-01-15  9:33:25 140338488407808 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.0.0.8' --datadir '/var/lib/mysql/'   --parent '1' --binlog '/var/log/mysql/mariadb-bin' '
WSREP_SST: [INFO] Logging all stderr of SST/Innobackupex to syslog (20160115 09:33:25.515)
2016-01-15  9:33:25 140341928011520 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.0.0.8:4444/xtrabackup_sst//1
2016-01-15  9:33:25 140341928011520 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-01-15  9:33:25 140341928011520 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-01-15  9:33:25 140338580059904 [Note] WSREP: Service thread queue flushed.
2016-01-15  9:33:25 140341928011520 [Note] WSREP: Assign initial position for certification: 205387, protocol version: 3
2016-01-15  9:33:25 140338580059904 [Note] WSREP: Service thread queue flushed.
2016-01-15  9:33:25 140341928011520 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (0600869f-af4f-11e5-8c89-a35fa6cdd2ef): 1 (Operation not permitted)
	 at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.
2016-01-15  9:33:25 140338517763840 [Note] WSREP: Member 0.0 () requested state transfer from '*any*'. Selected 2.0 ()(SYNCED) as donor.
2016-01-15  9:33:25 140338517763840 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 205387)
2016-01-15  9:33:25 140341928011520 [Note] WSREP: Requesting state transfer: success, donor: 2
2016-01-15  9:33:27 140338526156544 [Note] WSREP: (05b59999, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-01-15 10:25:15 140338517763840 [Note] WSREP: 2.0 (): State transfer to 0.0 () complete.
2016-01-15 10:25:15 140338517763840 [Note] WSREP: Member 2.0 () synced with group.
2016-01-15 10:25:20 140341928318912 [Note] WSREP: SST complete, seqno: 205388
2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2016-01-15 10:25:21 140341928318912 [Note] InnoDB: The InnoDB memory heap is disabled
2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Memory barrier is not used
2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Compressed tables use zlib 1.2.8
2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Using Linux native AIO
2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Using SSE crc32 instructions
2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Initializing buffer pool, size = 9.8G
2016-01-15 10:25:21 140341928318912 [Note] InnoDB: Completed initialization of buffer pool
2016-01-15 10:25:22 140341928318912 [Note] InnoDB: Highest supported file format is Barracuda.
2016-01-15 10:25:22 140341928318912 [Note] InnoDB: 128 rollback segment(s) are active.
2016-01-15 10:25:22 140341928318912 [Note] InnoDB: Waiting for purge to start
2016-01-15 10:25:22 140341928318912 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.26-76.0 started; log sequence number 79527883504
2016-01-15 10:25:22 140318926145280 [Note] InnoDB: Dumping buffer pool(s) not yet started
2016-01-15 10:25:22 140341928318912 [Note] Plugin 'FEEDBACK' is disabled.
2016-01-15 10:25:22 140341928318912 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
2016-01-15 10:25:22 140341928318912 [ERROR] Binlog file '/var/log/mysql/mariadb-bin.000001' not found in binlog index, needed for recovery. Aborting.
2016-01-15 10:25:22 140341928318912 [ERROR] Crash recovery failed. Either correct the problem (if it's, for example, out of memory error) and restart, or delete (or rename) binary log and start mysqld with --tc-heuristic-recover={commit|rollback}
2016-01-15 10:25:22 140341928318912 [ERROR] Can't init tc log
2016-01-15 10:25:22 140341928318912 [ERROR] Aborting



 Comments   
Comment by Elena Stepanova [ 2016-01-16 ]

vipconsult,

How exactly did you initialize the node? Did you copy the entire data directory from an existing node? If so, did you copy the binary logs and binary log index as well?

Comment by Krasimir [ 2016-01-16 ]

No data was copied from the master; the data folder was empty.

Comment by Elena Stepanova [ 2016-01-16 ]

Your data directory is apparently /var/lib/mysql. Did you also clean up /var/log/mysql, where your binary logs are configured to reside?
The error log suggests that the server found some broken remains of binary logs there. Could you please check?

Comment by Krasimir [ 2016-01-16 ]

Both dirs were empty. I think this reference comes from the main master I am trying to replicate from.

I resolved the issue by running RESET MASTER; on the main master and then adding the node again.

So my theory is that the master didn't shut down cleanly the last time, and this affected the new node trying to run recovery.

Comment by Elena Stepanova [ 2016-01-16 ]

The error log looks strange for a message coming from another node, but I cannot rule it out right away. Assigning to nirbhay_c, who can say for sure.

Comment by Andrew Garner [ 2016-04-14 ]

I can reproduce this reliably in my environment on MariaDB 10.1.13 / Percona XtraBackup 2.3.4 on Debian 8 with wsrep_sst_method=xtrabackup-v2. I just create a simple InnoDB table, start a loop writing to the cluster from one node, and force SST on another node; I get results similar to:

[Note] Recovering after a crash using /var/log/mysql/mysql-bin
[ERROR] Binlog file '/var/log/mysql/mysql-bin.000149' not found in binlog index, needed for recovery. Aborting.
[ERROR] Crash recovery failed. Either correct the problem (if it's, for example, out of memory error and restart, or delete (or rename) binary log and start mysqld with --tc-heuristic-recover={commit|rollback}
[ERROR] Can't init tc log
[ERROR] Aborting

Recovery started from an empty MySQL datadir and empty /var/log/mysql/.

I found that commenting out the lines in wsrep_sst_xtrabackup-v2 that copy Percona XtraBackup's archived binary log back to /var/log/mysql/ keeps this from happening. I suppose binlog recovery is triggered by the presence of the binary log that xtrabackup copies from the donor. I'm not sure of the ramifications of skipping that recovery.

Comment by Igor Mazur [ 2016-04-14 ]

I'm getting exactly the same problem (also MariaDB 10.1.13).

2016-04-14 6:59:00 140693871286208 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
2016-04-14 6:59:00 140693871286208 [ERROR] Binlog file '/var/log/mysql/mariadb-bin.000036' not found in binlog index, needed for recovery. Aborting.
2016-04-14 6:59:00 140693871286208 [ERROR] Crash recovery failed. Either correct the problem (if it's, for example, out of memory error) and restart, or delete (or rename) binary log and start mysqld with --tc-heuristic-recover={commit|rollback}
2016-04-14 6:59:00 140693871286208 [ERROR] Can't init tc log
2016-04-14 6:59:00 140693871286208 [ERROR] Aborting

Here is the list of log files on the donor:
drwxr-xr-x 2 999 docker 4096 Apr 5 15:24 mariadb-bin
-rw-rw---- 1 999 docker 81279761 Apr 13 10:41 mariadb-bin.000031
-rw-rw---- 1 999 docker 104857796 Apr 13 17:51 mariadb-bin.000032
-rw-rw---- 1 999 docker 94852709 Apr 13 21:51 mariadb-bin.000033
-rw-rw---- 1 999 docker 104857677 Apr 14 02:05 mariadb-bin.000034
-rw-rw---- 1 999 docker 96645794 Apr 14 06:52 mariadb-bin.000035
-rw-rw---- 1 999 docker 1744580 Apr 14 06:58 mariadb-bin.000036
-rw-rw---- 1 999 docker 22837123 Apr 14 08:17 mariadb-bin.000037
-rw-rw---- 1 999 docker 13260939 Apr 14 08:45 mariadb-bin.000038
-rw-rw---- 1 999 docker 442 Apr 14 08:17 mariadb-bin.index

And on the new node:
drwxr-xr-x 2 999 docker 4096 Apr 14 06:33 mariadb-bin
-rw-rw---- 1 999 docker 361 Apr 14 06:58 mariadb-bin.000037
-rw-rw---- 1 999 docker 34 Apr 14 06:58 mariadb-bin.index
-rw-rw---- 1 999 docker 7 Apr 14 06:33 mariadb-bin.state

For some reason, during the SST the donor rotated the log mariadb-bin.000036; a new log file mariadb-bin.000037 was created and copied to the new node.
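The abort follows from how crash recovery consults the binlog index: the server reads the binlog checkpoint event, then looks that file name up in mariadb-bin.index; if the name is missing, recovery aborts with "Binlog file ... not found in binlog index". A minimal plain-Python sketch of that lookup (the function name and simplified index format are my own illustration, not server code):

```python
def missing_from_index(index_lines, checkpoint_file):
    """Return True if the binlog named by the checkpoint event is
    absent from the binlog index -- the condition that produces
    "Binlog file ... not found in binlog index" at startup."""
    known = {line.strip() for line in index_lines if line.strip()}
    return checkpoint_file not in known

# The donor rotated .000036 -> .000037 mid-SST: only .000037 was copied,
# but its checkpoint event still points at .000036.
index_on_joiner = ["./mariadb-bin.000037\n"]
print(missing_from_index(index_on_joiner, "./mariadb-bin.000036"))  # True
```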

Comment by Igor Mazur [ 2016-04-14 ]

Maybe an important observation:

I used:
wsrep_sst_method = xtrabackup-v2

After changing to:
wsrep_sst_method = xtrabackup

I've successfully added a node to the cluster twice in a row.

Comment by Andrew Garner [ 2016-04-14 ]

Another data point: the binary log bundled in the xtrabackup (v2.3.4) output has only a single binlog checkpoint:

# at 249
#160414 16:21:01 server id 168730634  end_log_pos 288   Gtid list [0-168730634-7392349]
# at 288
#160414 16:21:01 server id 168730634  end_log_pos 327   Binlog checkpoint mysql-bin.000330
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;

Where mysql-bin.000330 corresponds to the previous binary log and the backup image has mysql-bin.000331.

wsrep_sst_rsync generates a binary log with two checkpoints:

# at 249
#700101  0:00:00 server id 168730634  end_log_pos 288 	Gtid list [0-168730634-6670147]
# at 288
#700101  0:00:00 server id 168730634  end_log_pos 327 	Binlog checkpoint mysql-bin.000282
# at 327
#700101  0:00:00 server id 168730634  end_log_pos 366 	Binlog checkpoint mysql-bin.000283
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;

Where this was an earlier run: 000283 corresponds to the current (copied) binlog and 000282 to the previous (uncopied) log.

I assume this is probably some bug in xtrabackup: manually running FTWRL + FLUSH BINARY LOGS always gives me two checkpoints in the latest log. Maybe there's a race condition here that rsync doesn't trip over (perhaps because it holds the lock longer?).
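The one-vs-two checkpoint difference above can be checked mechanically. A small sketch (plain string matching over mysqlbinlog text output, not a real binlog parser) that counts Binlog checkpoint events:

```python
import re

def count_checkpoints(mysqlbinlog_text):
    """Count 'Binlog checkpoint <file>' comment lines in the
    text output of mysqlbinlog."""
    return len(re.findall(r"Binlog checkpoint \S+", mysqlbinlog_text))

# Abbreviated from the outputs quoted above.
xtrabackup_copy = """\
#160414 16:21:01 server id 168730634  end_log_pos 327   Binlog checkpoint mysql-bin.000330
"""
rsync_copy = """\
#700101  0:00:00 server id 168730634  end_log_pos 327   Binlog checkpoint mysql-bin.000282
#700101  0:00:00 server id 168730634  end_log_pos 366   Binlog checkpoint mysql-bin.000283
"""
print(count_checkpoints(xtrabackup_copy), count_checkpoints(rsync_copy))  # 1 2
```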

Comment by Andrew Garner [ 2016-04-14 ]

Actually, there is a race here after all. In my environment, I see up to a second of delay between running FLUSH BINARY LOGS / SHOW MASTER STATUS and the moment the current binary log has its second binlog checkpoint written.

Given this python harness:

import os
import time
 
import MySQLdb
 
def main():
 
    c = MySQLdb.connect(read_default_group='client')
    z = c.cursor()
 
    z.execute('SELECT @@version, @@wsrep_patch_version, @@innodb_version')
    v_mariadb, v_wsrep, v_innodb = z.fetchone()
    print("MariaDB version: %s" % v_mariadb)
    print("WSREP version:   %s" % v_wsrep)
    print("InnoDB version:  %s" % v_innodb)
 
    start = time.time()
    print("FTWRL")
    z.execute('FLUSH TABLES WITH READ LOCK')
    print(" -> %.3f secs" % (time.time() - start))
 
    start = time.time()
    print("FLUSH BINARY LOGS")
    z.execute('FLUSH BINARY LOGS')
    print(" -> %.3f secs" % (time.time() - start))
 
    start = time.time()
    print('SHOW MASTER STATUS')
    z.execute('SHOW MASTER STATUS')
    print(" -> %.3f secs" % (time.time() - start))
 
 
    binlog = z.fetchone()[0]          # current binlog per SHOW MASTER STATUS
    path = '/var/log/mysql/' + binlog
    st = os.stat(path)
    print("[%s] %s : %s" % (time.time(), binlog, st.st_size))
    start = time.time()
    # Busy-wait until the file grows past its initial size, i.e. until the
    # second binlog checkpoint event (observed at offset 366) is written.
    while st.st_size < 366 and os.stat(path).st_size == st.st_size:
        continue
    now = time.time()
    print("[%s] %s size changed after %.3f secs" % (now, binlog, now - start))
    print("[%s] %s : %s" % (time.time(), binlog, os.stat(path).st_size))
 
    start = time.time()
    print("UNLOCK TABLES")
    z.execute('UNLOCK TABLES')
    print(" -> %.3f secs" % (time.time() - start))
 
main()

I see this output:

$ python mdev-9423_race.py
MariaDB version: 10.1.13-MariaDB-1~jessie
WSREP version:   wsrep_25.13
InnoDB version:  5.6.28-76.1
FTWRL
 -> 0.001 secs
FLUSH BINARY LOGS
 -> 0.004 secs
SHOW MASTER STATUS
 -> 0.000 secs
[1460653106.87] mysql-bin.000445 : 327
[1460653107.79] mysql-bin.000445 size changed after 0.916 secs
[1460653107.79] mysql-bin.000445 : 366
UNLOCK TABLES
 -> 0.000 secs

So, under FTWRL, the binary log size changes after FLUSH BINARY LOGS returns and before we run UNLOCK TABLES; if a script copies the binary log too early (even while holding the FTWRL lock), the copy is missing some data.
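A generic version of the wait loop used in the harness above: poll the file until it reaches the expected size (or a timeout expires) instead of copying it immediately after FLUSH BINARY LOGS returns. This is purely illustrative; the function and its parameters are my own, not part of any SST script:

```python
import os
import time

def wait_for_size(path, expected_size, timeout=5.0, poll=0.01):
    """Poll `path` until it is at least `expected_size` bytes or
    `timeout` seconds elapse. Returns the last observed size."""
    deadline = time.monotonic() + timeout
    size = os.stat(path).st_size
    while size < expected_size and time.monotonic() < deadline:
        time.sleep(poll)   # avoid the busy-spin used in the repro harness
        size = os.stat(path).st_size
    return size
```

In the scenario above, expected_size would be the offset of the second binlog checkpoint (366 in my logs); a real fix belongs on the server/xtrabackup side (FLUSH LOCAL ENGINE LOGS), not in a polling loop.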

Comment by Andrew Garner [ 2016-04-14 ]

Adding FLUSH LOCAL ENGINE LOGS prior to copying the binary log seems to eliminate the problem. I'm not sure whether this is bad MariaDB behavior or whether xtrabackup should be expected to flush engine logs here.

I opened a bug with Percona against percona-xtrabackup that includes a minor patch against xtrabackup 2.3 that resolved this problem for me:

https://bugs.launchpad.net/percona-xtrabackup/+bug/1570560

Comment by Aaron Lager [ 2016-04-14 ]

I just wanted to add that I'm also having this issue.
I'm using xtrabackup-v2
xtrabackup version 2.4
MariaDB 10.1.13

Comment by Aaron Lager [ 2016-04-14 ]

I just reverted back to xtrabackup version 2.2 and I was finally able to get the node to join without error.

Comment by Andrew Garner [ 2016-04-26 ]

I verified that xtrabackup 2.2 is also affected by this issue (it likewise does not run FLUSH ENGINE LOGS prior to copying the binlog). The failure is racy, so it will sometimes work under the right conditions (on any supported xtrabackup version).

Comment by Nirbhay Choubey (Inactive) [ 2016-06-28 ]

http://lists.askmonty.org/pipermail/commits/2016-June/009505.html

Generated at Thu Feb 08 07:34:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.