Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.3.9, 10.3.11, 10.3.14, 10.3.15
    • 10.3.16, 10.4.6
    • Galera
    • None
    • CentOS 7, running on XenServer hypervisor.

    Description

      Servers went down due to failed query. I bootstrapped cluster and was unable to start the remaining nodes with the following error:

      2018-10-12 20:12:15 0 [Warning] WSREP: SST position can't be set in past. Requested: 92172, Current: 3620686.
      2018-10-12 20:12:15 0 [Warning] WSREP: Can't continue.
      2018-10-12 20:12:15 0 [ERROR] Aborting

      It says the current commit is 3620686, but it was actually 9953033. Not sure why this matters since the cluster was restarted and that number restarted.

      Attachments

        Issue Links

          Activity

            Also getting this error after upgrading to 10.3.10:
            [root@galeradbprod02 obissick@Ghw.com]# galera_new_cluster
            /bin/galera_new_cluster: line 31: return: can only `return' from a function or sourced script

            obissick Oren Bissick (Inactive) added a comment - Also getting this error after upgrading to 10.3.10: [root@galeradbprod02 obissick@Ghw.com] # galera_new_cluster /bin/galera_new_cluster: line 31: return: can only `return' from a function or sourced script

            Issue seems to be with the default value of wsrep_start_position, the default is suppose to be Default Value: 00000000-0000-0000-0000-000000000000:-1; however, it shows this.

            FYI i do not have this defined in my config file.

            obissick Oren Bissick (Inactive) added a comment - Issue seems to be with the default value of wsrep_start_position, the default is suppose to be Default Value: 00000000-0000-0000-0000-000000000000:-1; however, it shows this. FYI i do not have this defined in my config file.

            Not sure where it got that position from, as i said my cluster was beyond that number.

            obissick Oren Bissick (Inactive) added a comment - Not sure where it got that position from, as i said my cluster was beyond that number.

            Same here on 10.3.10+maria~stretch:
            018-10-20 19:29:22 0 [Note] WSREP: SST complete, seqno: 38877
            2018-10-20 19:29:22 0 [Note] InnoDB: Using Linux native AIO
            2018-10-20 19:29:22 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
            2018-10-20 19:29:22 0 [Note] InnoDB: Uses event mutexes
            2018-10-20 19:29:22 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
            2018-10-20 19:29:22 0 [Note] InnoDB: Number of pools: 1
            2018-10-20 19:29:22 0 [Note] InnoDB: Using SSE2 crc32 instructions
            2018-10-20 19:29:22 0 [Note] InnoDB: Initializing buffer pool, total size = 5G, instances = 5, chunk size = 128M
            2018-10-20 19:29:22 0 [Note] InnoDB: Completed initialization of buffer pool
            2018-10-20 19:29:22 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
            2018-10-20 19:29:22 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 1073741824 bytes
            2018-10-20 19:29:22 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 1073741824 bytes
            2018-10-20 19:29:22 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
            2018-10-20 19:29:22 0 [Note] InnoDB: New log files created, LSN=376470769565
            2018-10-20 19:29:23 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
            2018-10-20 19:29:23 0 [Note] InnoDB: Creating shared tablespace for temporary tables
            2018-10-20 19:29:23 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
            2018-10-20 19:29:23 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
            2018-10-20 19:29:23 0 [Note] InnoDB: Waiting for purge to start
            2018-10-20 19:29:23 0 [Note] InnoDB: 10.3.10 started; log sequence number 376470769676; transaction id 588592630
            2018-10-20 19:29:23 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
            2018-10-20 19:29:23 0 [Note] Plugin 'FEEDBACK' is disabled.
            2018-10-20 19:29:23 0 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
            2018-10-20 19:29:23 0 [Note] Starting crash recovery...
            2018-10-20 19:29:23 0 [Note] Crash recovery finished.
            2018-10-20 19:29:23 0 [Note] Server socket created on IP: '0.0.0.0'.
            2018-10-20 19:29:23 0 [Note] WSREP: Signalling provider to continue.
            *2018-10-20 19:29:23 0 [Warning] WSREP: SST position can't be set in past. Requested: 38877, Current: 56220548.
            *2018-10-20 19:29:23 0 [Warning] WSREP: Can't continue.
            2018-10-20 19:29:23 0 [ERROR] Aborting

            i do not understand why the joiner starts a crash recovery after SST..

            I tried:

            • wresp-new-cluster and then restart joiner
            • switch from xtrabackup to mariadbbackup
            • delete galera.cache & grastate.dat on master then new joiner
            • deleted /var/lib/mysql & /var/log/mysql on new joiner

            My Galera cluster is down, only one node can be enabled.

            Any idea ?
            I will try to downgrade to previous 10.3.x next monday.

            chanouha Louis Chanouha added a comment - Same here on 10.3.10+maria~stretch: 018-10-20 19:29:22 0 [Note] WSREP: SST complete, seqno: 38877 2018-10-20 19:29:22 0 [Note] InnoDB: Using Linux native AIO 2018-10-20 19:29:22 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2018-10-20 19:29:22 0 [Note] InnoDB: Uses event mutexes 2018-10-20 19:29:22 0 [Note] InnoDB: Compressed tables use zlib 1.2.8 2018-10-20 19:29:22 0 [Note] InnoDB: Number of pools: 1 2018-10-20 19:29:22 0 [Note] InnoDB: Using SSE2 crc32 instructions 2018-10-20 19:29:22 0 [Note] InnoDB: Initializing buffer pool, total size = 5G, instances = 5, chunk size = 128M 2018-10-20 19:29:22 0 [Note] InnoDB: Completed initialization of buffer pool 2018-10-20 19:29:22 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2018-10-20 19:29:22 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 1073741824 bytes 2018-10-20 19:29:22 0 [Note] InnoDB: Setting log file ./ib_logfile1 size to 1073741824 bytes 2018-10-20 19:29:22 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0 2018-10-20 19:29:22 0 [Note] InnoDB: New log files created, LSN=376470769565 2018-10-20 19:29:23 0 [Note] InnoDB: 128 out of 128 rollback segments are active. 2018-10-20 19:29:23 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2018-10-20 19:29:23 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2018-10-20 19:29:23 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB. 2018-10-20 19:29:23 0 [Note] InnoDB: Waiting for purge to start 2018-10-20 19:29:23 0 [Note] InnoDB: 10.3.10 started; log sequence number 376470769676; transaction id 588592630 2018-10-20 19:29:23 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool 2018-10-20 19:29:23 0 [Note] Plugin 'FEEDBACK' is disabled. 2018-10-20 19:29:23 0 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin 2018-10-20 19:29:23 0 [Note] Starting crash recovery... 2018-10-20 19:29:23 0 [Note] Crash recovery finished. 2018-10-20 19:29:23 0 [Note] Server socket created on IP: '0.0.0.0'. 2018-10-20 19:29:23 0 [Note] WSREP: Signalling provider to continue. *2018-10-20 19:29:23 0 [Warning] WSREP: SST position can't be set in past. Requested: 38877, Current: 56220548. *2018-10-20 19:29:23 0 [Warning] WSREP: Can't continue. 2018-10-20 19:29:23 0 [ERROR] Aborting i do not understand why the joiner starts a crash recovery after SST.. I tried: wresp-new-cluster and then restart joiner switch from xtrabackup to mariadbbackup delete galera.cache & grastate.dat on master then new joiner deleted /var/lib/mysql & /var/log/mysql on new joiner My Galera cluster is down, only one node can be enabled. Any idea ? I will try to downgrade to previous 10.3.x next monday.

            I finally found a solution as mentionned here: https://groups.google.com/forum/#!topic/codership-team/HVl5emCJEIw

            I stopped the only one active node, edited grastate.dat, put 56220548 as seqno. Restarded the node, and then others nodes successfully synced (SST/Mariadbbackup).

            I think this is a bug, --wsrep-new-cluster should start in a clean new cluster state. As i deleted all data in joiner nodes, sync should have been done successfully.

            Sorry for my english

            chanouha Louis Chanouha added a comment - I finally found a solution as mentionned here: https://groups.google.com/forum/#!topic/codership-team/HVl5emCJEIw I stopped the only one active node, edited grastate.dat, put 56220548 as seqno. Restarded the node, and then others nodes successfully synced (SST/Mariadbbackup). I think this is a bug, --wsrep-new-cluster should start in a clean new cluster state. As i deleted all data in joiner nodes, sync should have been done successfully. Sorry for my english

            I used that same solution but in the grastate.dat i put the requested sequence number instead of current.

            obissick Oren Bissick (Inactive) added a comment - I used that same solution but in the grastate.dat i put the requested sequence number instead of current.
            GeoffMontee Geoff Montee (Inactive) added a comment - - edited

            It looks like this warning:

            2019-04-04 12:32:56 0 [Warning] WSREP: SST position can't be set in past. Requested: 86665367, Current:  1103114475.
            

            is printed here:

            https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_sst.cc#L311

            The inconsistent sequence number that the warning is complaining about is read from the storage engine just before that:

            https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_sst.cc#L301

            https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_xid.cc#L155

            Since this inconsistent seqno is read from the storage engine, it's possible that this could be a Mariabackup or InnoDB problem.

            Relevant log file:

            2019-04-04 12:31:18 0 [Note] WSREP: Read nil XID from storage engines, skipping position init
            2019-04-04 12:31:18 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
            2019-04-04 12:31:18 0 [Note] WSREP: wsrep_load(): Galera 25.3.26(r3857) by Codership Oy <info@codership.com> loaded successfully.
            2019-04-04 12:31:18 0 [Note] WSREP: CRC-32C: using hardware acceleration.
            2019-04-04 12:31:18 0 [Note] WSREP: Found saved state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367, safe_to_bootstrap: 0
            2019-04-04 12:31:18 0 [Note] WSREP: Passing config to GCS: base_dir = /mariadb/data/; base_host = 10.1.1.3; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /mariadb_logs/gcache; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /mariadb_logs/gcache/galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 1T; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0;
            2019-04-04 12:31:18 0 [Note] WSREP: GCache history reset: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:0 -> ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367
            2019-04-04 12:31:18 0 [Note] WSREP: Assign initial position for certification: 86665367, protocol version: -1
            2019-04-04 12:31:18 0 [Note] WSREP: wsrep_sst_grab()
            2019-04-04 12:31:18 0 [Note] WSREP: Start replication
            2019-04-04 12:31:18 0 [Note] WSREP: Setting initial position to ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367
            2019-04-04 12:31:18 0 [Note] WSREP: protonet asio version 0
            2019-04-04 12:31:18 0 [Note] WSREP: Using CRC-32C for message checksums.
            2019-04-04 12:31:18 0 [Note] WSREP: backend: asio
            2019-04-04 12:31:18 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
            2019-04-04 12:31:18 0 [Warning] WSREP: access file(/mariadb/data//gvwstate.dat) failed(No such file or directory)
            2019-04-04 12:31:18 0 [Note] WSREP: restore pc from disk failed
            2019-04-04 12:31:18 0 [Note] WSREP: GMCast version 0
            2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
            2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
            2019-04-04 12:31:18 0 [Note] WSREP: EVS version 0
            2019-04-04 12:31:18 0 [Note] WSREP: gcomm: connecting to group 'cluster', peer '10.1.1.2:,10.1.1.3:,10.1.1.4:'
            2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to c843aa31 tcp://10.1.1.3:4567
            2019-04-04 12:31:18 0 [Warning] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') address 'tcp://10.1.1.3:4567' points to own listening address, blacklisting
            2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to d76cc0f1 tcp://10.1.1.4:4567
            2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
            2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to 65b30a89 tcp://10.1.1.2:4567
            2019-04-04 12:31:19 0 [Note] WSREP: declaring 65b30a89 at tcp://10.1.1.2:4567 stable
            2019-04-04 12:31:19 0 [Note] WSREP: declaring d76cc0f1 at tcp://10.1.1.4:4567 stable
            2019-04-04 12:31:19 0 [Note] WSREP: Node 65b30a89 state prim
            2019-04-04 12:31:19 0 [Note] WSREP: view(view_id(PRIM,65b30a89,17) memb {
                    65b30a89,0
                    c843aa31,0
                    d76cc0f1,0
            } joined {
            } left {
            } partitioned {
            })
            2019-04-04 12:31:19 0 [Note] WSREP: save pc into disk
            2019-04-04 12:31:19 0 [Note] WSREP: gcomm: connected
            2019-04-04 12:31:19 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
            2019-04-04 12:31:19 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
            2019-04-04 12:31:19 0 [Note] WSREP: Opened channel 'cluster'
            2019-04-04 12:31:19 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
            2019-04-04 12:31:19 0 [Note] WSREP: Waiting for SST to complete.
            2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
            2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: sent state msg: c890995e-56c4-11e9-a721-6b928fdf9889
            2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 0 (node30)
            2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 2 (node32)
            2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 1 (node31)
            2019-04-04 12:31:19 0 [Note] WSREP: Quorum results:
                    version    = 4,
                    component  = PRIMARY,
                    conf_id    = 16,
                    members    = 2/3 (joined/total),
                    act_id     = 86665368,
                    last_appl. = -1,
                    protocols  = 0/9/3 (gcs/repl/appl),
                    group UUID = ce38d7fa-3dbc-11e9-947f-42b551dbbedc
            2019-04-04 12:31:19 0 [Note] WSREP: Flow-control interval: [28, 28]
            2019-04-04 12:31:19 0 [Note] WSREP: Trying to continue unpaused monitor
            2019-04-04 12:31:19 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 86665368)
            2019-04-04 12:31:19 2 [Note] WSREP: State transfer required:
                    Group state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368
                    Local state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367
            2019-04-04 12:31:19 2 [Note] WSREP: New cluster view: global state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368, view# 17: Primary, number of nodes: 3, my index: 1, protocol version 3
            2019-04-04 12:31:19 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
            2019-04-04 12:31:19 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '10.1.1.3' --datadir '/mariadb/data/'   --parent '21590' --binlog '/binlogs/binlog'  '''
            WSREP_SST: [INFO] Moving /mariadb/data//mariabackup.prepare.log to /tmp/sst_log_archive/mariabackup.prepare.log.2019.04.04-12.31.19.824490466 (20190404 12:31:19.831)
            WSREP_SST: [INFO] Moving /mariadb/data//mariabackup.move.log to /tmp/sst_log_archive/mariabackup.move.log.2019.04.04-12.31.19.824490466 (20190404 12:31:19.842)
            2019-04-04 12:31:21 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection to peer c843aa31 with addr tcp://10.1.1.3:4567 timed out, no messages seen in PT3S
            2019-04-04 12:31:21 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') turning message relay requesting off
            WSREP_SST: [INFO] Streaming with xbstream (20190404 12:31:48.886)
            WSREP_SST: [INFO] Using socat as streamer (20190404 12:31:48.892)
            WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} ) (20190404 12:31:48.949)
            2019-04-04 12:31:49 2 [Note] WSREP: Prepared SST request: mariabackup|10.1.1.3:4444/xtrabackup_sst//1
            2019-04-04 12:31:49 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            2019-04-04 12:31:49 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
            2019-04-04 12:31:49 2 [Note] WSREP: Assign initial position for certification: 86665368, protocol version: 4
            2019-04-04 12:31:49 0 [Note] WSREP: Service thread queue flushed.
            2019-04-04 12:31:49 2 [Note] WSREP: IST receiver addr using tcp://10.1.1.3:4568
            2019-04-04 12:31:49 2 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.1.1.3:4568
            2019-04-04 12:31:49 0 [Note] WSREP: Member 1.0 (node31) requested state transfer from '*any*'. Selected 0.0 (node30)(SYNCED) as donor.
            2019-04-04 12:31:49 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 86665368)
            2019-04-04 12:31:49 2 [Note] WSREP: Requesting state transfer: success, donor: 0
            2019-04-04 12:31:49 2 [Note] WSREP: GCache history reset: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:0 -> ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368
            2019-04-04 12:31:50 0 [Note] WSREP: 0.0 (node30): State transfer to 1.0 (node31) complete.
            2019-04-04 12:31:50 0 [Note] WSREP: Member 0.0 (node30) synced with group.
            WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20190404 12:31:50.123)
            WSREP_SST: [INFO] Galera co-ords from recovery: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367 0 (20190404 12:31:50.129)
            WSREP_SST: [INFO] Total time on joiner: 0 seconds (20190404 12:31:50.133)
            WSREP_SST: [INFO] Removing the sst_in_progress file (20190404 12:31:50.138)
            2019-04-04 12:31:50 0 [Note] WSREP: SST complete, seqno: 86665367
            2019-04-04 12:31:50 0 [Note] InnoDB: Using Linux native AIO
            2019-04-04 12:31:50 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
            2019-04-04 12:31:50 0 [Note] InnoDB: Uses event mutexes
            2019-04-04 12:31:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
            2019-04-04 12:31:50 0 [Note] InnoDB: Number of pools: 1
            2019-04-04 12:31:50 0 [Note] InnoDB: Using SSE2 crc32 instructions
            2019-04-04 12:31:50 0 [Note] InnoDB: Initializing buffer pool, total size = 128G, instances = 16, chunk size = 128M
            2019-04-04 12:31:55 0 [Note] InnoDB: Completed initialization of buffer pool
            2019-04-04 12:31:55 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
            2019-04-04 12:32:55 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
            2019-04-04 12:32:55 0 [Note] InnoDB: Creating shared tablespace for temporary tables
            2019-04-04 12:32:55 0 [Note] InnoDB: Setting file '/mariadb/InnoDB/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
            2019-04-04 12:32:55 0 [Note] InnoDB: File '/mariadb/InnoDB/ibtmp1' size is now 12 MB.
            2019-04-04 12:32:55 0 [Note] InnoDB: Waiting for purge to start
            2019-04-04 12:32:55 0 [Note] InnoDB: 10.3.14 started; log sequence number 51784170981115; transaction id 2265708494
            2019-04-04 12:32:55 0 [Note] InnoDB: Loading buffer pool(s) from /mariadb/InnoDB/ib_buffer_pool
            2019-04-04 12:32:55 0 [Note] Plugin 'FEEDBACK' is disabled.
            2019-04-04 12:32:55 0 [Note] Server socket created on IP: '0.0.0.0'.
            2019-04-04 12:32:56 0 [Note] InnoDB: Buffer pool(s) load completed at 190404 12:32:56
            2019-04-04 12:32:56 0 [Note] WSREP: Signalling provider to continue.
            2019-04-04 12:32:56 0 [Warning] WSREP: SST position can't be set in past. Requested: 86665367, Current:  1103114475.
            2019-04-04 12:32:56 0 [Warning] WSREP: Can't continue.
            2019-04-04 12:32:56 0 [ERROR] Aborting
            

            GeoffMontee Geoff Montee (Inactive) added a comment - - edited It looks like this warning: 2019-04-04 12:32:56 0 [Warning] WSREP: SST position can't be set in past. Requested: 86665367, Current: 1103114475. is printed here: https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_sst.cc#L311 The inconsistent sequence number that the warning is complaining about is read from the storage engine just before that: https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_sst.cc#L301 https://github.com/MariaDB/server/blob/mariadb-10.3.14/sql/wsrep_xid.cc#L155 Since this inconsistent seqno is read from the storage engine, it's possible that this could be a Mariabackup or InnoDB problem. Relevant log file: 2019-04-04 12:31:18 0 [Note] WSREP: Read nil XID from storage engines, skipping position init 2019-04-04 12:31:18 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so' 2019-04-04 12:31:18 0 [Note] WSREP: wsrep_load(): Galera 25.3.26(r3857) by Codership Oy <info@codership.com> loaded successfully. 2019-04-04 12:31:18 0 [Note] WSREP: CRC-32C: using hardware acceleration. 2019-04-04 12:31:18 0 [Note] WSREP: Found saved state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367, safe_to_bootstrap: 0 2019-04-04 12:31:18 0 [Note] WSREP: Passing config to GCS: base_dir = /mariadb/data/; base_host = 10.1.1.3; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /mariadb_logs/gcache; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /mariadb_logs/gcache/galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 1T; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; 2019-04-04 12:31:18 0 [Note] WSREP: GCache history reset: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:0 -> ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367 2019-04-04 12:31:18 0 [Note] WSREP: Assign initial position for certification: 86665367, protocol version: -1 2019-04-04 12:31:18 0 [Note] WSREP: wsrep_sst_grab() 2019-04-04 12:31:18 0 [Note] WSREP: Start replication 2019-04-04 12:31:18 0 [Note] WSREP: Setting initial position to ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367 2019-04-04 12:31:18 0 [Note] WSREP: protonet asio version 0 2019-04-04 12:31:18 0 [Note] WSREP: Using CRC-32C for message checksums. 2019-04-04 12:31:18 0 [Note] WSREP: backend: asio 2019-04-04 12:31:18 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 2019-04-04 12:31:18 0 [Warning] WSREP: access file(/mariadb/data//gvwstate.dat) failed(No such file or directory) 2019-04-04 12:31:18 0 [Note] WSREP: restore pc from disk failed 2019-04-04 12:31:18 0 [Note] WSREP: GMCast version 0 2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567 2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') multicast: , ttl: 1 2019-04-04 12:31:18 0 [Note] WSREP: EVS version 0 2019-04-04 12:31:18 0 [Note] WSREP: gcomm: connecting to group 'cluster', peer '10.1.1.2:,10.1.1.3:,10.1.1.4:' 2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to c843aa31 tcp://10.1.1.3:4567 2019-04-04 12:31:18 0 [Warning] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') address 'tcp://10.1.1.3:4567' points to own listening address, blacklisting 2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to d76cc0f1 tcp://10.1.1.4:4567 2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: 2019-04-04 12:31:18 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection established to 65b30a89 tcp://10.1.1.2:4567 2019-04-04 12:31:19 0 [Note] WSREP: declaring 65b30a89 at tcp://10.1.1.2:4567 stable 2019-04-04 12:31:19 0 [Note] WSREP: declaring d76cc0f1 at tcp://10.1.1.4:4567 stable 2019-04-04 12:31:19 0 [Note] WSREP: Node 65b30a89 state prim 2019-04-04 12:31:19 0 [Note] WSREP: view(view_id(PRIM,65b30a89,17) memb { 65b30a89,0 c843aa31,0 d76cc0f1,0 } joined { } left { } partitioned { }) 2019-04-04 12:31:19 0 [Note] WSREP: save pc into disk 2019-04-04 12:31:19 0 [Note] WSREP: gcomm: connected 2019-04-04 12:31:19 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636 2019-04-04 12:31:19 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0) 2019-04-04 12:31:19 0 [Note] WSREP: Opened channel 'cluster' 2019-04-04 12:31:19 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3 2019-04-04 12:31:19 0 [Note] WSREP: Waiting for SST to complete. 2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID. 2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: sent state msg: c890995e-56c4-11e9-a721-6b928fdf9889 2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 0 (node30) 2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 2 (node32) 2019-04-04 12:31:19 0 [Note] WSREP: STATE EXCHANGE: got state msg: c890995e-56c4-11e9-a721-6b928fdf9889 from 1 (node31) 2019-04-04 12:31:19 0 [Note] WSREP: Quorum results: version = 4, component = PRIMARY, conf_id = 16, members = 2/3 (joined/total), act_id = 86665368, last_appl. = -1, protocols = 0/9/3 (gcs/repl/appl), group UUID = ce38d7fa-3dbc-11e9-947f-42b551dbbedc 2019-04-04 12:31:19 0 [Note] WSREP: Flow-control interval: [28, 28] 2019-04-04 12:31:19 0 [Note] WSREP: Trying to continue unpaused monitor 2019-04-04 12:31:19 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 86665368) 2019-04-04 12:31:19 2 [Note] WSREP: State transfer required: Group state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368 Local state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367 2019-04-04 12:31:19 2 [Note] WSREP: New cluster view: global state: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368, view# 17: Primary, number of nodes: 3, my index: 1, protocol version 3 2019-04-04 12:31:19 2 [Warning] WSREP: Gap in state sequence. Need state transfer. 2019-04-04 12:31:19 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'joiner' --address '10.1.1.3' --datadir '/mariadb/data/' --parent '21590' --binlog '/binlogs/binlog' ''' WSREP_SST: [INFO] Moving /mariadb/data//mariabackup.prepare.log to /tmp/sst_log_archive/mariabackup.prepare.log.2019.04.04-12.31.19.824490466 (20190404 12:31:19.831) WSREP_SST: [INFO] Moving /mariadb/data//mariabackup.move.log to /tmp/sst_log_archive/mariabackup.move.log.2019.04.04-12.31.19.824490466 (20190404 12:31:19.842) 2019-04-04 12:31:21 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') connection to peer c843aa31 with addr tcp://10.1.1.3:4567 timed out, no messages seen in PT3S 2019-04-04 12:31:21 0 [Note] WSREP: (c843aa31, 'tcp://0.0.0.0:4567') turning message relay requesting off WSREP_SST: [INFO] Streaming with xbstream (20190404 12:31:48.886) WSREP_SST: [INFO] Using socat as streamer (20190404 12:31:48.892) WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | mbstream -x; RC=( ${PIPESTATUS[@]} ) (20190404 12:31:48.949) 2019-04-04 12:31:49 2 [Note] WSREP: Prepared SST request: mariabackup|10.1.1.3:4444/xtrabackup_sst//1 2019-04-04 12:31:49 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. 2019-04-04 12:31:49 2 [Note] WSREP: REPL Protocols: 9 (4, 2) 2019-04-04 12:31:49 2 [Note] WSREP: Assign initial position for certification: 86665368, protocol version: 4 2019-04-04 12:31:49 0 [Note] WSREP: Service thread queue flushed. 2019-04-04 12:31:49 2 [Note] WSREP: IST receiver addr using tcp://10.1.1.3:4568 2019-04-04 12:31:49 2 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.1.1.3:4568 2019-04-04 12:31:49 0 [Note] WSREP: Member 1.0 (node31) requested state transfer from '*any*'. Selected 0.0 (node30)(SYNCED) as donor. 2019-04-04 12:31:49 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 86665368) 2019-04-04 12:31:49 2 [Note] WSREP: Requesting state transfer: success, donor: 0 2019-04-04 12:31:49 2 [Note] WSREP: GCache history reset: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:0 -> ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665368 2019-04-04 12:31:50 0 [Note] WSREP: 0.0 (node30): State transfer to 1.0 (node31) complete. 2019-04-04 12:31:50 0 [Note] WSREP: Member 0.0 (node30) synced with group. WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20190404 12:31:50.123) WSREP_SST: [INFO] Galera co-ords from recovery: ce38d7fa-3dbc-11e9-947f-42b551dbbedc:86665367 0 (20190404 12:31:50.129) WSREP_SST: [INFO] Total time on joiner: 0 seconds (20190404 12:31:50.133) WSREP_SST: [INFO] Removing the sst_in_progress file (20190404 12:31:50.138) 2019-04-04 12:31:50 0 [Note] WSREP: SST complete, seqno: 86665367 2019-04-04 12:31:50 0 [Note] InnoDB: Using Linux native AIO 2019-04-04 12:31:50 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2019-04-04 12:31:50 0 [Note] InnoDB: Uses event mutexes 2019-04-04 12:31:50 0 [Note] InnoDB: Compressed tables use zlib 1.2.7 2019-04-04 12:31:50 0 [Note] InnoDB: Number of pools: 1 2019-04-04 12:31:50 0 [Note] InnoDB: Using SSE2 crc32 instructions 2019-04-04 12:31:50 0 [Note] InnoDB: Initializing buffer pool, total size = 128G, instances = 16, chunk size = 128M 2019-04-04 12:31:55 0 [Note] InnoDB: Completed initialization of buffer pool 2019-04-04 12:31:55 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority(). 2019-04-04 12:32:55 0 [Note] InnoDB: 128 out of 128 rollback segments are active. 2019-04-04 12:32:55 0 [Note] InnoDB: Creating shared tablespace for temporary tables 2019-04-04 12:32:55 0 [Note] InnoDB: Setting file '/mariadb/InnoDB/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ... 2019-04-04 12:32:55 0 [Note] InnoDB: File '/mariadb/InnoDB/ibtmp1' size is now 12 MB. 2019-04-04 12:32:55 0 [Note] InnoDB: Waiting for purge to start 2019-04-04 12:32:55 0 [Note] InnoDB: 10.3.14 started; log sequence number 51784170981115; transaction id 2265708494 2019-04-04 12:32:55 0 [Note] InnoDB: Loading buffer pool(s) from /mariadb/InnoDB/ib_buffer_pool 2019-04-04 12:32:55 0 [Note] Plugin 'FEEDBACK' is disabled. 2019-04-04 12:32:55 0 [Note] Server socket created on IP: '0.0.0.0'. 2019-04-04 12:32:56 0 [Note] InnoDB: Buffer pool(s) load completed at 190404 12:32:56 2019-04-04 12:32:56 0 [Note] WSREP: Signalling provider to continue. 2019-04-04 12:32:56 0 [Warning] WSREP: SST position can't be set in past. Requested: 86665367, Current: 1103114475. 2019-04-04 12:32:56 0 [Warning] WSREP: Can't continue. 2019-04-04 12:32:56 0 [ERROR] Aborting

            chanouha,

            I think this is a bug, --wsrep-new-cluster should start in a clean new cluster state. As i deleted all data in joiner nodes, sync should have been done successfully.

            I submitted a specific bug for that here: MDEV-19193

            GeoffMontee Geoff Montee (Inactive) added a comment - chanouha , I think this is a bug, --wsrep-new-cluster should start in a clean new cluster state. As i deleted all data in joiner nodes, sync should have been done successfully. I submitted a specific bug for that here: MDEV-19193

            Please request another review after addressing my comments.

            marko Marko Mäkelä added a comment - Please request another review after addressing my comments.

            The upgrade was broken due to MDEV-15158, which was introduced in MariaDB 10.3.5.

            marko Marko Mäkelä added a comment - The upgrade was broken due to MDEV-15158 , which was introduced in MariaDB 10.3.5.

            Thanks sir, appreciate the input. I think that's sounds about correct, related to mariabackup, as I was also able to duplicate this even doing a manual sst.
            I noticed the coords were wrong and usually off between 1 or 4 positions. Here's what I did to fix it:

            list of errors with wrong coords:

             grep "WSREP: SST position can't be set in past. Requested" /var/log/mysql/error.log
            2021-02-02 21:30:06 140122607290560 [Warning] WSREP: SST position can't be set in past. Requested: 688039012, Current:  688039013.
            2021-02-03 10:18:19 139797349947584 [Warning] WSREP: SST position can't be set in past. Requested: 688264699, Current:  688264701.
            2021-02-03 11:08:51 140313816185024 [Warning] WSREP: SST position can't be set in past. Requested: 688297756, Current:  688297759.
            2021-02-03 13:16:02 140193343183040 [Warning] WSREP: SST position can't be set in past. Requested: 688370258, Current:  688370259.
            2021-02-03 14:26:35 140649485912256 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current:  688408040.
            2021-02-03 14:59:56 140348690200768 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current:  688408040.
            

            Fix:

            1 remove all data from bad node
            2 do streaming backup from good node to bad node:

            mariabackup --backup --parallel=4 --tmpdir=/tmp --stream=mbstream --galera-info --datadir=/var/lib/mysql --user=$user --password=$pass | pigz -c --fast | nc -w 2 badNode 4444
            

            4 prepare, change perms on bad node
            5 get the info from here on bad node

            cat /var/lib/mysql/xtrabackup_galera_info
            bd34378f-525c-11e5-92d5-130282ff70e0:688408036
            

            6 edit on bad node /var/lib/mysql/grastate.dat and add the info from the above command

            # GALERA saved state
            version: 2.1
            uuid:    bd34378f-525c-11e5-92d5-130282ff70e0
            seqno:   688408036
            safe_to_bootstrap: 0
            

            7 change perms on that file so mysql owns it
            8 start mariadb again (this will fail again with the same error)

            2021-02-03 14:59:56 140348690200768 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current:  688408040.
            

            9 edit /var/lib/mysql/grastate.dat again, now replace the seqno with the seqno from the error message above
            from 688408036 to 688408040 and start mariadb again, i.e.

            # GALERA saved state
            version: 2.1
            uuid:    bd34378f-525c-11e5-92d5-130282ff70e0
            seqno:   688408040
            safe_to_bootstrap: 0
            

            dalmeida Daniel Almeida (Inactive) added a comment - Thanks sir, appreciate the input. I think that's sounds about correct, related to mariabackup, as I was also able to duplicate this even doing a manual sst. I noticed the coords were wrong and usually off between 1 or 4 positions. Here's what I did to fix it: list of errors with wrong coords: grep "WSREP: SST position can't be set in past. Requested" /var/log/mysql/error.log 2021-02-02 21:30:06 140122607290560 [Warning] WSREP: SST position can't be set in past. Requested: 688039012, Current: 688039013. 2021-02-03 10:18:19 139797349947584 [Warning] WSREP: SST position can't be set in past. Requested: 688264699, Current: 688264701. 2021-02-03 11:08:51 140313816185024 [Warning] WSREP: SST position can't be set in past. Requested: 688297756, Current: 688297759. 2021-02-03 13:16:02 140193343183040 [Warning] WSREP: SST position can't be set in past. Requested: 688370258, Current: 688370259. 2021-02-03 14:26:35 140649485912256 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current: 688408040. 2021-02-03 14:59:56 140348690200768 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current: 688408040. Fix: 1 remove all data from bad node 2 do streaming backup from good node to bad node: mariabackup --backup --parallel=4 --tmpdir=/tmp --stream=mbstream --galera-info --datadir=/var/lib/mysql --user=$user --password=$pass | pigz -c --fast | nc -w 2 badNode 4444 4 prepare, change perms on bad node 5 get the info from here on bad node cat /var/lib/mysql/xtrabackup_galera_info bd34378f-525c-11e5-92d5-130282ff70e0:688408036 6 edit on bad node /var/lib/mysql/grastate.dat and add the info from the above command # GALERA saved state version: 2.1 uuid: bd34378f-525c-11e5-92d5-130282ff70e0 seqno: 688408036 safe_to_bootstrap: 0 7 change perms on that file so mysql owns it 8 start mariadb again (this will fail again with the same error) 2021-02-03 14:59:56 140348690200768 [Warning] WSREP: SST position can't be set in past. Requested: 688408036, Current: 688408040. 9 edit /var/lib/mysql/grastate.dat again, now replace the seqno with the seqno from the error message above from 688408036 to 688408040 and start mariadb again, i.e. # GALERA saved state version: 2.1 uuid: bd34378f-525c-11e5-92d5-130282ff70e0 seqno: 688408040 safe_to_bootstrap: 0

            People

              sciascid Daniele Sciascia
              obissick Oren Bissick (Inactive)
              Votes:
              3 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.