Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15876

Node fails to rejoin cluster after master-slave peer was resumed

    XMLWordPrintable

    Details

      Description

      Node fails to rejoin cluster after master-slave peer was resumed

      note : it's observed with wsrep_sst_method=mariabackup
      problem was not observed with wsrep_sst_method=rsync

      1. Establish MariDB Galera Cluster composed by 3 Master-Master Nodes
      the used sst method is mariabackup

       
      [galera]
       
      binlog_format=ROW
      default-storage-engine=innodb
      innodb_autoinc_lock_mode=2
      bind-address=0.0.0.0
      wsrep_on=ON
      wsrep_provider=/usr/lib64/galera/libgalera_smm.so
      wsrep_cluster_address="gcomm://192.168.104.193,192.168.104.195,192.168.104.196"
       
      ## Galera Cluster Configuration
      wsrep_cluster_name="cluster1"
       
      ## Galera Synchronization Configuration
      wsrep_sst_method=mariabackup
      #wsrep_sst_method=rsync
       
      ## Galera Node Configuration
      wsrep_node_address="192.168.104.193"
      wsrep_node_name="t4w3"
      wsrep_sst_auth=root:1
      
      

      2. Verify that changes performed on 1 node are propagated to the other Nodes
      3. Disjoint the 3rd Node from the cluster
      4. Create Master-Slave peer
      5. Enable log_bin on 1 Node
      [mysqld]
      log-bin
      binlog_format=ROW
      server_id=1000
      6. Restart mysqld on the Master Node
      7. Verify that Master is enabled
      MariaDB [(none)]> show master status ;
      -----------------------------------------------------+

      File Position Binlog_Do_DB Binlog_Ignore_DB

      -----------------------------------------------------+

      t4w5-bin.000002 327    

      -----------------------------------------------------+
      1 row in set (0.000 sec)

      8. Enable Slave on the other Node
      [mysqld]
      server-id=2222
      read-only
      9. Star the slave on other Node
      CHANGE MASTER TO MASTER_HOST='t4w5', MASTER_PORT=3306, MASTER_USER='root', MASTER_PASSWORD='1', MASTER_LOG_FILE='t4w5-bin.000002', MASTER_LOG_POS=327;
      10. Perform any change on the Master Node – for example drop 1 database
      11. Verify that the change is propagated to the Slave – database is dropped
      12. Disable the log_bin on the Master Node – Remove the replication configuration from the master server config
      13. Restart mysql on the Master Node
      14. Verify that Master bin replication is disabled on the Master Node
      MariaDB [(none)]> show master status ;
      Empty set (0.000 sec)
      14. Remove the replication slave configuration from the Slave Node
      15. Disable Slave
      STOP SLAVE;RESET SLAVE ALL;
      16 .Restart mysqld on the slave
      17. Verify that MariaDB galera Cluster is active , synced and composed from 2 Nodes
      18. Rejoin the 3rd Node – operation failed

      [root@t4w6 ~]# service mysql start --wsrep_cluster_address=gcomm://192.168.104.195
      Starting mysql (via systemctl):  Job for mariadb.service failed because the control process exited with error code. See "systemctl status mariadb.service" and "journalctl -xe" for details.
      

      Note : when sst method is mariabackup the disconnected Node fails to rejoin
      after changing the sst to rsync Node succeed to rejoin

      2018-04-13 20:18:44 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
      2018-04-13 20:18:44 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
      2018-04-13 20:18:44 0 [Note] WSREP: wsrep_load(): Galera 25.3.23(r3789) by Codership Oy <info@codership.com> loaded successfully.
      2018-04-13 20:18:44 0 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
      2018-04-13 20:18:44 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
      2018-04-13 20:18:44 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.104.196; base_port = 4567; cert.log_conflic       ts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_ti       meout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout        = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcac       he.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.       fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036       854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.bootstrap = YES;        pc
      2018-04-13 20:18:44 0 [Note] WSREP: GCache history reset: 9f23373a-3e6d-11e8-b625-0646494b946c:0 -> 00000000-0000-0000-0000-000000000000:-1
      2018-04-13 20:18:44 0 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
      2018-04-13 20:18:44 0 [Note] WSREP: wsrep_sst_grab()
      2018-04-13 20:18:44 0 [Note] WSREP: Start replication
      2018-04-13 20:18:44 0 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
      2018-04-13 20:18:44 0 [Note] WSREP: protonet asio version 0
      2018-04-13 20:18:44 0 [Note] WSREP: Using CRC-32C for message checksums.
      2018-04-13 20:18:44 0 [Note] WSREP: backend: asio
      2018-04-13 20:18:44 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
      2018-04-13 20:18:44 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
      2018-04-13 20:18:44 0 [Note] WSREP: restore pc from disk failed
      2018-04-13 20:18:44 0 [Note] WSREP: GMCast version 0
      2018-04-13 20:18:44 0 [Note] WSREP: (b7f9cfc2, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      2018-04-13 20:18:44 0 [Note] WSREP: (b7f9cfc2, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      2018-04-13 20:18:44 0 [Note] WSREP: EVS version 0
      2018-04-13 20:18:44 0 [Note] WSREP: gcomm: connecting to group 'cluster1', peer '192.168.104.191:,192.168.104.193:,192.168.104.195:,192.168.104.196:'
      2018-04-13 20:18:44 0 [Note] WSREP: (b7f9cfc2, 'tcp://0.0.0.0:4567') connection established to b7f9cfc2 tcp://192.168.104.196:4567
      2018-04-13 20:18:44 0 [Warning] WSREP: (b7f9cfc2, 'tcp://0.0.0.0:4567') address 'tcp://192.168.104.196:4567' points to own listening address, blacklis       ting
      2018-04-13 20:18:44 0 [Note] WSREP: (b7f9cfc2, 'tcp://0.0.0.0:4567') connection established to 8148851c tcp://192.168.104.195:4567
      2018-04-13 20:18:44 0 [Note] WSREP: (b7f9cfc2, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
      2018-04-13 20:18:44 0 [Note] WSREP: declaring 8148851c at tcp://192.168.104.195:4567 stable
      2018-04-13 20:18:44 0 [Note] WSREP: Node 8148851c state prim
      2018-04-13 20:18:44 0 [Note] WSREP: view(view_id(PRIM,8148851c,36) memb {
              8148851c,0
              b7f9cfc2,0
      } joined {
      } left {
      } partitioned {
      })
      2018-04-13 20:18:44 0 [Note] WSREP: save pc into disk
      2018-04-13 20:18:44 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.104.191:4567
      2018-04-13 20:18:44 0 [Note] WSREP: discarding pending addr without UUID: tcp://192.168.104.193:4567
      2018-04-13 20:18:45 0 [Note] WSREP: gcomm: connected
      2018-04-13 20:18:45 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
      2018-04-13 20:18:45 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
      2018-04-13 20:18:45 0 [Note] WSREP: Opened channel 'cluster1'
      2018-04-13 20:18:45 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
      2018-04-13 20:18:45 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
      2018-04-13 20:18:45 0 [Note] WSREP: Waiting for SST to complete.
      2018-04-13 20:18:45 0 [Note] WSREP: STATE EXCHANGE: sent state msg: b84716d0-3f3e-11e8-bdcb-ef3d8d36f495
      2018-04-13 20:18:45 0 [Note] WSREP: STATE EXCHANGE: got state msg: b84716d0-3f3e-11e8-bdcb-ef3d8d36f495 from 0 (t4w5)
      2018-04-13 20:18:45 0 [Note] WSREP: STATE EXCHANGE: got state msg: b84716d0-3f3e-11e8-bdcb-ef3d8d36f495 from 1 (t4w6)
      2018-04-13 20:18:45 0 [Note] WSREP: Quorum results:
              version    = 4,
              component  = PRIMARY,
              conf_id    = 35,
              members    = 1/2 (joined/total),
              act_id     = 1,
              last_appl. = -1,
              protocols  = 0/7/3 (gcs/repl/appl),
              group UUID = 9f23373a-3e6d-11e8-b625-0646494b946c
      2018-04-13 20:18:45 0 [Note] WSREP: Flow-control interval: [23, 23]
      2018-04-13 20:18:45 0 [Note] WSREP: Trying to continue unpaused monitor
      2018-04-13 20:18:45 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 1)
      2018-04-13 20:18:45 1 [Note] WSREP: State transfer required:
              Group state: 9f23373a-3e6d-11e8-b625-0646494b946c:1
              Local state: 00000000-0000-0000-0000-000000000000:-1
      2018-04-13 20:18:45 1 [Note] WSREP: New cluster view: global state: 9f23373a-3e6d-11e8-b625-0646494b946c:1, view# 36: Primary, number of nodes: 2, my        index: 1, protocol version 3
      2018-04-13 20:18:45 1 [Warning] WSREP: Gap in state sequence. Need state transfer.
      2018-04-13 20:18:45 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '192.168.104.196' --datadir '/var/lib/mysql/'   --parent        '26540'  '' '
      WSREP_SST: [INFO] Streaming with xbstream (20180413 20:18:45.580)
      WSREP_SST: [INFO] Using socat as streamer (20180413 20:18:45.583)
      WSREP_SST: [INFO] Stale sst_in_progress file: /var/lib/mysql//sst_in_progress (20180413 20:18:45.588)
      WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | pv -f  -i 10 -N joiner -F '%N => Rate:%r Avg:%a Elapsed:%t        %e Bytes: %b %p'  | mbstream -x; RC=( ${PIPESTATUS[@]} ) (20180413 20:18:45.641)
      2018-04-13 20:18:45 1 [Note] WSREP: Prepared SST request: mariabackup|192.168.104.196:4444/xtrabackup_sst//1
      2018-04-13 20:18:45 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2018-04-13 20:18:45 1 [Note] WSREP: REPL Protocols: 7 (3, 2)
      2018-04-13 20:18:45 1 [Note] WSREP: Assign initial position for certification: 1, protocol version: 3
      2018-04-13 20:18:45 0 [Note] WSREP: Service thread queue flushed.
      2018-04-13 20:18:45 1 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does        not match group state UUID (9f23373a-3e6d-11e8-b625-0646494b946c): 1 (Operation not permitted)
               at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
      2018-04-13 20:18:45 0 [Note] WSREP: Member 1.0 (t4w6) requested state transfer from '*any*'. Selected 0.0 (t4w5)(SYNCED) as donor.
      2018-04-13 20:18:45 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 1)
      2018-04-13 20:18:45 1 [Note] WSREP: Requesting state transfer: success, donor: 0
      2018-04-13 20:18:45 1 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 9f23373a-3e6d-11e8-b625-0646494b946c:1
         joiner: => Rate:[ 218 B/s] Avg:[ 218 B/s] Elapsed:0:00:00
      WSREP_SST: [INFO] NOTE: Joiner-Recv-gtid took 1 seconds (20180413 20:18:46.292)
      WSREP_SST: [INFO] WARNING: Stale temporary SST directory: /var/lib/mysql//.sst from previous state transfer. Removing (20180413 20:18:46.316)
      WSREP_SST: [INFO] Proceeding with SST (20180413 20:18:46.418)
      WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20180413 20:18:46.423)
      WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | pv -f  -i 10 -N joiner -F '%N => Rate:%r Avg:%a Elapsed:%t %e Bytes: %b %p'  |        mbstream -x; RC=( ${PIPESTATUS[@]} ) (20180413 20:18:46.423)
      removed ‘/var/lib/mysql/aria_log_control’
      removed ‘/var/lib/mysql/aria_log.00000001’
      removed ‘/var/lib/mysql/ibdata1’
      removed ‘/var/lib/mysql/ib_logfile1’
      removed ‘/var/lib/mysql/mysql.sock’
      WSREP_SST: [INFO] Waiting for SST streaming to complete! (20180413 20:18:46.449)
      2018-04-13 20:18:47 0 [Note] WSREP: (b7f9cfc2, 'tcp://0.0.0.0:4567') connection to peer b7f9cfc2 with addr tcp://192.168.104.196:4567 timed out, no me       ssages seen in PT3S
      2018-04-13 20:18:47 0 [Note] WSREP: (b7f9cfc2, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2018-04-13 20:19:23 0 [Note] WSREP: 0.0 (t4w5): State transfer to 1.0 (t4w6) complete.
      2018-04-13 20:19:23 0 [Note] WSREP: Member 0.0 (t4w5) synced with group.
         joiner: => Rate:[23.7MiB/s] Avg:[23.7MiB/s] Elapsed:0:00:41
      WSREP_SST: [INFO] NOTE: Joiner-Recv-SST took 42 seconds (20180413 20:19:28.174)
      WSREP_SST: [INFO] Preparing the backup at /var/lib/mysql//.sst (20180413 20:19:28.190)
      WSREP_SST: [INFO] Evaluating mariabackup --innobackupex   --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20180413 20:19:28.197)
      WSREP_SST: [INFO] NOTE: Xtrabackup prepare stage took 0 seconds (20180413 20:19:28.679)
      rm: cannot remove ‘/var/lib/mysql//innobackup.prepare.log’: No such file or directory
      rm: cannot remove ‘/var/lib/mysql//innobackup.move.log’: No such file or directory
      WSREP_SST: [INFO] Moving the backup to /var/lib/mysql/ (20180413 20:19:28.686)
      WSREP_SST: [INFO] Evaluating mariabackup --innobackupex       --move-back --force-non-empty-directories ${DATA} &>${DATA}/innobackup.move.log (2018041       3 20:19:28.693)
      WSREP_SST: [ERROR] Cleanup after exit with status:1 (20180413 20:19:28.728)
      2018-04-13 20:19:28 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '192.168.104.196' --datadir '/var/l       ib/mysql/'   --parent '26540'  '' : 1 (Operation not permitted)
      2018-04-13 20:19:28 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
      2018-04-13 20:19:28 0 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
      2018-04-13 20:19:28 0 [ERROR] Aborting
       
      Error in my_thread_global_end(): 1 threads didn't exit
      
      

        Attachments

          Activity

            People

            • Assignee:
              seppo Seppo Jaakola
              Reporter:
              winstone Zdravelina Sokolovska
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated: