Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32974

Member fails to join due to old seqno in GTID

Details

    Description

      After upgrading initially from 10.6->10.11->11.0.3 and now to 11.0.4 we see that 2 of the members start up without any issues but the 3rd member(db-0) fails to start up due to an old sequence no.(117376) that i believe is being passed on from the donor.

      We are unable to find the old seqno anywhere except in the `ibdata1` file of the donor(by searching for the hex of it). But not sure how to get rid of this old seqno.
      What we did try is starting db-0 with fresh volume by removing K8s PVC for it, so it does a SST but that fails as well.

      Logs from member db-0:

      [Note] WSREP: SST received
      [Note] WSREP: Server status change joiner -> initializing
      [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      [Note] mysqld: Aria engine: starting recovery
      recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.0 seconds); tables to flush: 2 1(0.0 seconds); 
      [Note] mysqld: Aria engine: recovery done
      [Note] InnoDB: Compressed tables use zlib 1.2.11
      [Note] InnoDB: Number of transaction pools: 1
      [Note] InnoDB: Using crc32 + pclmulqdq instructions
      [Note] InnoDB: Using Linux native AIO
      [Note] InnoDB: Initializing buffer pool, total size = 2.000GiB, chunk size = 32.000MiB
      [Note] InnoDB: Completed initialization of buffer pool
      [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
      [Note] InnoDB: End of log at LSN=180964319
      [Note] InnoDB: Resizing redo log from 12.016KiB to 96.000MiB; LSN=180964319
      [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
      [Note] InnoDB: Reinitializing innodb_undo_tablespaces= 3 from 0
      [Note] InnoDB: Data file .//undo001 did not exist: new to be created
      [Note] InnoDB: Setting file .//undo001 size to 10.000MiB
      [Note] InnoDB: Database physically writes the file full: wait...
      [Note] InnoDB: Data file .//undo002 did not exist: new to be created
      [Note] InnoDB: Setting file .//undo002 size to 10.000MiB
      [Note] InnoDB: Database physically writes the file full: wait...
      [Note] InnoDB: Data file .//undo003 did not exist: new to be created
      [Note] InnoDB: Setting file .//undo003 size to 10.000MiB
      [Note] InnoDB: Database physically writes the file full: wait...
      [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
      [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
      [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
      [Note] InnoDB: log sequence number 180964319; transaction id 73558
      [Note] InnoDB: Loading buffer pool(s) from /bitnami/mariadb/data/ib_buffer_pool
      [Note] InnoDB: Cannot open '/bitnami/mariadb/data/ib_buffer_pool' for reading: No such file or directory
      [Note] Plugin 'FEEDBACK' is disabled.
      [Warning] 'innodb-log-files-in-group' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
      [Warning] 'innodb-file-format' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
      [Note] Recovering after a crash using mysql-bin
      [Note] Starting table crash recovery...
      [Note] Crash table recovery finished.
      [Note] Server socket created on IP: '0.0.0.0'.
      [Warning] 'proxies_priv' entry '@% root@db-0' ignored in --skip-name-resolve mode.
      [Note] WSREP: wsrep_init_schema_and_SR (nil)
      [Note] WSREP: Server initialized
      [Note] WSREP: Server status change initializing -> initialized
      [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      [Note] WSREP: Recovered position from storage: 6aa53efc-db72-11ec-880f-a282ce494905:117376
      [Note] WSREP: Starting applier thread 6
      [Note] WSREP: Starting applier thread 7
      [Note] WSREP: Starting applier thread 8
      [Note] WSREP: Recovered view from SST:
        id: 6aa53efc-db72-11ec-880f-a282ce494905:128934
        status: primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(3):
      	0: 46250027-95c8-11ee-9d94-06eb82413a07, db
      	1: 4ed771b5-941f-11ee-82bf-56ec952abcf6, db
      	2: bfa6e688-941f-11ee-97c5-d323f02068c1, db
       
      [ERROR] WSREP: sst_received failed: SST script passed bogus GTID: 6aa53efc-db72-11ec-880f-a282ce494905:117376. Preceding view GTID: 6aa53efc-db72-11ec-880f-a282ce494905:128934
      [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
      [Note] WSREP: Joiner monitor thread ended with total time 14 sec
      [ERROR] WSREP: Application received wrong state: 
      	Received: 00000000-0000-0000-0000-000000000000
      	Required: 6aa53efc-db72-11ec-880f-a282ce494905
      [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
      [Note] WSREP: ReplicatorSMM::abort()
      [Note] WSREP: Closing send monitor...
      [Note] WSREP: Closed send monitor.
      [Note] WSREP: gcomm: terminating thread
      [Note] WSREP: gcomm: joining thread
      [Note] WSREP: gcomm: closing backend
      [Note] /opt/bitnami/mariadb/sbin/mysqld: ready for connections.
      Version: '11.0.3-MariaDB-log'  socket: '/opt/bitnami/mariadb/tmp/mysql.sock'  port: 3306  Source distribution
      [Note] WSREP: view(view_id(NON_PRIM,46250027-9d94,6636) memb {
      	46250027-9d94,0
      } joined {
      } left {
      } partitioned {
      	4ed771b5-82bf,0
      	bfa6e688-97c5,0
      })
      [Note] WSREP: PC protocol downgrade 1 -> 0
      [Note] WSREP: view((empty))
      [Note] WSREP: gcomm: closed
      [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      [Note] WSREP: Flow-control interval: [128, 160]
      [Note] WSREP: Received NON-PRIMARY.
      [Note] WSREP: Shifting JOINER -> OPEN (TO: 128934)
      [Note] WSREP: New SELF-LEAVE.
      [Note] WSREP: Flow-control interval: [0, 0]
      [Note] WSREP: Received SELF-LEAVE. Closing connection.
      [Note] WSREP: Shifting OPEN -> CLOSED (TO: 128934)
      [Note] WSREP: RECV thread exiting 0: Success
      [Note] WSREP: recv_thread() joined.
      [Note] WSREP: Closing replication queue.
      [Note] WSREP: Closing slave action queue.
      [Note] WSREP: /opt/bitnami/mariadb/sbin/mysqld: Terminated.
      

      Attachments

        1. datadir.tgz
          3.67 MB
        2. node2_before_prepare.tgz
          818 kB
        3. node2_after_prepare.tgz
          812 kB
        4. node1.cnf
          0.8 kB
        5. node2.cnf
          0.8 kB

        Issue Links

          Activity

            ihti Ihtisham ul Haq created issue -
            ihti Ihtisham ul Haq made changes -
            Field Original Value New Value
            Description After upgrading initially from 10.6->10.11->11.0.3 and now to 11.0.4 we see that 2 of the members start up without any issues but the 3rd member(db-0) fails to start up due to an old sequence no.(117376) that i believe is being passed on from the donor.

            Logs from member db-0:

            {code:bash}
            [Note] WSREP: SST received
            [Note] WSREP: Server status change joiner -> initializing
            [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            [Note] mysqld: Aria engine: starting recovery
            recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.0 seconds); tables to flush: 2 1(0.0 seconds);
            [Note] mysqld: Aria engine: recovery done
            [Note] InnoDB: Compressed tables use zlib 1.2.11
            [Note] InnoDB: Number of transaction pools: 1
            [Note] InnoDB: Using crc32 + pclmulqdq instructions
            [Note] InnoDB: Using Linux native AIO
            [Note] InnoDB: Initializing buffer pool, total size = 2.000GiB, chunk size = 32.000MiB
            [Note] InnoDB: Completed initialization of buffer pool
            [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
            [Note] InnoDB: End of log at LSN=180964319
            [Note] InnoDB: Resizing redo log from 12.016KiB to 96.000MiB; LSN=180964319
            [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
            [Note] InnoDB: Reinitializing innodb_undo_tablespaces= 3 from 0
            [Note] InnoDB: Data file .//undo001 did not exist: new to be created
            [Note] InnoDB: Setting file .//undo001 size to 10.000MiB
            [Note] InnoDB: Database physically writes the file full: wait...
            [Note] InnoDB: Data file .//undo002 did not exist: new to be created
            [Note] InnoDB: Setting file .//undo002 size to 10.000MiB
            [Note] InnoDB: Database physically writes the file full: wait...
            [Note] InnoDB: Data file .//undo003 did not exist: new to be created
            [Note] InnoDB: Setting file .//undo003 size to 10.000MiB
            [Note] InnoDB: Database physically writes the file full: wait...
            [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
            [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
            [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
            [Note] InnoDB: log sequence number 180964319; transaction id 73558
            [Note] InnoDB: Loading buffer pool(s) from /bitnami/mariadb/data/ib_buffer_pool
            [Note] InnoDB: Cannot open '/bitnami/mariadb/data/ib_buffer_pool' for reading: No such file or directory
            [Note] Plugin 'FEEDBACK' is disabled.
            [Warning] 'innodb-log-files-in-group' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
            [Warning] 'innodb-file-format' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
            [Note] Recovering after a crash using mysql-bin
            [Note] Starting table crash recovery...
            [Note] Crash table recovery finished.
            [Note] Server socket created on IP: '0.0.0.0'.
            [Warning] 'proxies_priv' entry '@% root@db-0' ignored in --skip-name-resolve mode.
            [Note] WSREP: wsrep_init_schema_and_SR (nil)
            [Note] WSREP: Server initialized
            [Note] WSREP: Server status change initializing -> initialized
            [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            [Note] WSREP: Recovered position from storage: 6aa53efc-db72-11ec-880f-a282ce494905:117376
            [Note] WSREP: Starting applier thread 6
            [Note] WSREP: Starting applier thread 7
            [Note] WSREP: Starting applier thread 8
            [Note] WSREP: Recovered view from SST:
              id: 6aa53efc-db72-11ec-880f-a282ce494905:128934
              status: primary
              protocol_version: 4
              capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
              final: no
              own_index: 0
              members(3):
            0: 46250027-95c8-11ee-9d94-06eb82413a07, db
            1: 4ed771b5-941f-11ee-82bf-56ec952abcf6, db
            2: bfa6e688-941f-11ee-97c5-d323f02068c1, db

            [ERROR] WSREP: sst_received failed: SST script passed bogus GTID: 6aa53efc-db72-11ec-880f-a282ce494905:117376. Preceding view GTID: 6aa53efc-db72-11ec-880f-a282ce494905:128934
            [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
            [Note] WSREP: Joiner monitor thread ended with total time 14 sec
            [ERROR] WSREP: Application received wrong state:
            Received: 00000000-0000-0000-0000-000000000000
            Required: 6aa53efc-db72-11ec-880f-a282ce494905
            [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
            [Note] WSREP: ReplicatorSMM::abort()
            [Note] WSREP: Closing send monitor...
            [Note] WSREP: Closed send monitor.
            [Note] WSREP: gcomm: terminating thread
            [Note] WSREP: gcomm: joining thread
            [Note] WSREP: gcomm: closing backend
            [Note] /opt/bitnami/mariadb/sbin/mysqld: ready for connections.
            Version: '11.0.3-MariaDB-log' socket: '/opt/bitnami/mariadb/tmp/mysql.sock' port: 3306 Source distribution
            [Note] WSREP: view(view_id(NON_PRIM,46250027-9d94,6636) memb {
            46250027-9d94,0
            } joined {
            } left {
            } partitioned {
            4ed771b5-82bf,0
            bfa6e688-97c5,0
            })
            [Note] WSREP: PC protocol downgrade 1 -> 0
            [Note] WSREP: view((empty))
            [Note] WSREP: gcomm: closed
            [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
            [Note] WSREP: Flow-control interval: [128, 160]
            [Note] WSREP: Received NON-PRIMARY.
            [Note] WSREP: Shifting JOINER -> OPEN (TO: 128934)
            [Note] WSREP: New SELF-LEAVE.
            [Note] WSREP: Flow-control interval: [0, 0]
            [Note] WSREP: Received SELF-LEAVE. Closing connection.
            [Note] WSREP: Shifting OPEN -> CLOSED (TO: 128934)
            [Note] WSREP: RECV thread exiting 0: Success
            [Note] WSREP: recv_thread() joined.
            [Note] WSREP: Closing replication queue.
            [Note] WSREP: Closing slave action queue.
            [Note] WSREP: /opt/bitnami/mariadb/sbin/mysqld: Terminated.
            {code}
            After upgrading initially from 10.6->10.11->11.0.3 and now to 11.0.4 we see that 2 of the members start up without any issues but the 3rd member(db-0) fails to start up due to an old sequence no.(117376) that i believe is being passed on from the donor.

            We are unable to fine the old seqno any where except in the `ibdata1` file of the donor(by searching for the hex of it). And not sure how to get rid of this old seqno.
            What we did try is starting db-0 with fresh volume by removing K8s PVC for it, so it does a SST but that fails as well.

            Logs from member db-0:

            {code:bash}
            [Note] WSREP: SST received
            [Note] WSREP: Server status change joiner -> initializing
            [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            [Note] mysqld: Aria engine: starting recovery
            recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.0 seconds); tables to flush: 2 1(0.0 seconds);
            [Note] mysqld: Aria engine: recovery done
            [Note] InnoDB: Compressed tables use zlib 1.2.11
            [Note] InnoDB: Number of transaction pools: 1
            [Note] InnoDB: Using crc32 + pclmulqdq instructions
            [Note] InnoDB: Using Linux native AIO
            [Note] InnoDB: Initializing buffer pool, total size = 2.000GiB, chunk size = 32.000MiB
            [Note] InnoDB: Completed initialization of buffer pool
            [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
            [Note] InnoDB: End of log at LSN=180964319
            [Note] InnoDB: Resizing redo log from 12.016KiB to 96.000MiB; LSN=180964319
            [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
            [Note] InnoDB: Reinitializing innodb_undo_tablespaces= 3 from 0
            [Note] InnoDB: Data file .//undo001 did not exist: new to be created
            [Note] InnoDB: Setting file .//undo001 size to 10.000MiB
            [Note] InnoDB: Database physically writes the file full: wait...
            [Note] InnoDB: Data file .//undo002 did not exist: new to be created
            [Note] InnoDB: Setting file .//undo002 size to 10.000MiB
            [Note] InnoDB: Database physically writes the file full: wait...
            [Note] InnoDB: Data file .//undo003 did not exist: new to be created
            [Note] InnoDB: Setting file .//undo003 size to 10.000MiB
            [Note] InnoDB: Database physically writes the file full: wait...
            [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
            [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
            [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
            [Note] InnoDB: log sequence number 180964319; transaction id 73558
            [Note] InnoDB: Loading buffer pool(s) from /bitnami/mariadb/data/ib_buffer_pool
            [Note] InnoDB: Cannot open '/bitnami/mariadb/data/ib_buffer_pool' for reading: No such file or directory
            [Note] Plugin 'FEEDBACK' is disabled.
            [Warning] 'innodb-log-files-in-group' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
            [Warning] 'innodb-file-format' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
            [Note] Recovering after a crash using mysql-bin
            [Note] Starting table crash recovery...
            [Note] Crash table recovery finished.
            [Note] Server socket created on IP: '0.0.0.0'.
            [Warning] 'proxies_priv' entry '@% root@db-0' ignored in --skip-name-resolve mode.
            [Note] WSREP: wsrep_init_schema_and_SR (nil)
            [Note] WSREP: Server initialized
            [Note] WSREP: Server status change initializing -> initialized
            [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            [Note] WSREP: Recovered position from storage: 6aa53efc-db72-11ec-880f-a282ce494905:117376
            [Note] WSREP: Starting applier thread 6
            [Note] WSREP: Starting applier thread 7
            [Note] WSREP: Starting applier thread 8
            [Note] WSREP: Recovered view from SST:
              id: 6aa53efc-db72-11ec-880f-a282ce494905:128934
              status: primary
              protocol_version: 4
              capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
              final: no
              own_index: 0
              members(3):
            0: 46250027-95c8-11ee-9d94-06eb82413a07, db
            1: 4ed771b5-941f-11ee-82bf-56ec952abcf6, db
            2: bfa6e688-941f-11ee-97c5-d323f02068c1, db

            [ERROR] WSREP: sst_received failed: SST script passed bogus GTID: 6aa53efc-db72-11ec-880f-a282ce494905:117376. Preceding view GTID: 6aa53efc-db72-11ec-880f-a282ce494905:128934
            [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
            [Note] WSREP: Joiner monitor thread ended with total time 14 sec
            [ERROR] WSREP: Application received wrong state:
            Received: 00000000-0000-0000-0000-000000000000
            Required: 6aa53efc-db72-11ec-880f-a282ce494905
            [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
            [Note] WSREP: ReplicatorSMM::abort()
            [Note] WSREP: Closing send monitor...
            [Note] WSREP: Closed send monitor.
            [Note] WSREP: gcomm: terminating thread
            [Note] WSREP: gcomm: joining thread
            [Note] WSREP: gcomm: closing backend
            [Note] /opt/bitnami/mariadb/sbin/mysqld: ready for connections.
            Version: '11.0.3-MariaDB-log' socket: '/opt/bitnami/mariadb/tmp/mysql.sock' port: 3306 Source distribution
            [Note] WSREP: view(view_id(NON_PRIM,46250027-9d94,6636) memb {
            46250027-9d94,0
            } joined {
            } left {
            } partitioned {
            4ed771b5-82bf,0
            bfa6e688-97c5,0
            })
            [Note] WSREP: PC protocol downgrade 1 -> 0
            [Note] WSREP: view((empty))
            [Note] WSREP: gcomm: closed
            [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
            [Note] WSREP: Flow-control interval: [128, 160]
            [Note] WSREP: Received NON-PRIMARY.
            [Note] WSREP: Shifting JOINER -> OPEN (TO: 128934)
            [Note] WSREP: New SELF-LEAVE.
            [Note] WSREP: Flow-control interval: [0, 0]
            [Note] WSREP: Received SELF-LEAVE. Closing connection.
            [Note] WSREP: Shifting OPEN -> CLOSED (TO: 128934)
            [Note] WSREP: RECV thread exiting 0: Success
            [Note] WSREP: recv_thread() joined.
            [Note] WSREP: Closing replication queue.
            [Note] WSREP: Closing slave action queue.
            [Note] WSREP: /opt/bitnami/mariadb/sbin/mysqld: Terminated.
            {code}
            Environment bitnami/mariadb-galera:11.0.4-debian-11-r0 bitnami/mariadb-galera:11.0.4-debian-11-r0 in k8s cluster
            ihti Ihtisham ul Haq made changes -
            Description After upgrading initially from 10.6->10.11->11.0.3 and now to 11.0.4 we see that 2 of the members start up without any issues but the 3rd member(db-0) fails to start up due to an old sequence no.(117376) that i believe is being passed on from the donor.

            We are unable to fine the old seqno any where except in the `ibdata1` file of the donor(by searching for the hex of it). And not sure how to get rid of this old seqno.
            What we did try is starting db-0 with fresh volume by removing K8s PVC for it, so it does a SST but that fails as well.

            Logs from member db-0:

            {code:bash}
            [Note] WSREP: SST received
            [Note] WSREP: Server status change joiner -> initializing
            [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            [Note] mysqld: Aria engine: starting recovery
            recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.0 seconds); tables to flush: 2 1(0.0 seconds);
            [Note] mysqld: Aria engine: recovery done
            [Note] InnoDB: Compressed tables use zlib 1.2.11
            [Note] InnoDB: Number of transaction pools: 1
            [Note] InnoDB: Using crc32 + pclmulqdq instructions
            [Note] InnoDB: Using Linux native AIO
            [Note] InnoDB: Initializing buffer pool, total size = 2.000GiB, chunk size = 32.000MiB
            [Note] InnoDB: Completed initialization of buffer pool
            [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
            [Note] InnoDB: End of log at LSN=180964319
            [Note] InnoDB: Resizing redo log from 12.016KiB to 96.000MiB; LSN=180964319
            [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
            [Note] InnoDB: Reinitializing innodb_undo_tablespaces= 3 from 0
            [Note] InnoDB: Data file .//undo001 did not exist: new to be created
            [Note] InnoDB: Setting file .//undo001 size to 10.000MiB
            [Note] InnoDB: Database physically writes the file full: wait...
            [Note] InnoDB: Data file .//undo002 did not exist: new to be created
            [Note] InnoDB: Setting file .//undo002 size to 10.000MiB
            [Note] InnoDB: Database physically writes the file full: wait...
            [Note] InnoDB: Data file .//undo003 did not exist: new to be created
            [Note] InnoDB: Setting file .//undo003 size to 10.000MiB
            [Note] InnoDB: Database physically writes the file full: wait...
            [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
            [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
            [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
            [Note] InnoDB: log sequence number 180964319; transaction id 73558
            [Note] InnoDB: Loading buffer pool(s) from /bitnami/mariadb/data/ib_buffer_pool
            [Note] InnoDB: Cannot open '/bitnami/mariadb/data/ib_buffer_pool' for reading: No such file or directory
            [Note] Plugin 'FEEDBACK' is disabled.
            [Warning] 'innodb-log-files-in-group' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
            [Warning] 'innodb-file-format' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
            [Note] Recovering after a crash using mysql-bin
            [Note] Starting table crash recovery...
            [Note] Crash table recovery finished.
            [Note] Server socket created on IP: '0.0.0.0'.
            [Warning] 'proxies_priv' entry '@% root@db-0' ignored in --skip-name-resolve mode.
            [Note] WSREP: wsrep_init_schema_and_SR (nil)
            [Note] WSREP: Server initialized
            [Note] WSREP: Server status change initializing -> initialized
            [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            [Note] WSREP: Recovered position from storage: 6aa53efc-db72-11ec-880f-a282ce494905:117376
            [Note] WSREP: Starting applier thread 6
            [Note] WSREP: Starting applier thread 7
            [Note] WSREP: Starting applier thread 8
            [Note] WSREP: Recovered view from SST:
              id: 6aa53efc-db72-11ec-880f-a282ce494905:128934
              status: primary
              protocol_version: 4
              capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
              final: no
              own_index: 0
              members(3):
            0: 46250027-95c8-11ee-9d94-06eb82413a07, db
            1: 4ed771b5-941f-11ee-82bf-56ec952abcf6, db
            2: bfa6e688-941f-11ee-97c5-d323f02068c1, db

            [ERROR] WSREP: sst_received failed: SST script passed bogus GTID: 6aa53efc-db72-11ec-880f-a282ce494905:117376. Preceding view GTID: 6aa53efc-db72-11ec-880f-a282ce494905:128934
            [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
            [Note] WSREP: Joiner monitor thread ended with total time 14 sec
            [ERROR] WSREP: Application received wrong state:
            Received: 00000000-0000-0000-0000-000000000000
            Required: 6aa53efc-db72-11ec-880f-a282ce494905
            [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
            [Note] WSREP: ReplicatorSMM::abort()
            [Note] WSREP: Closing send monitor...
            [Note] WSREP: Closed send monitor.
            [Note] WSREP: gcomm: terminating thread
            [Note] WSREP: gcomm: joining thread
            [Note] WSREP: gcomm: closing backend
            [Note] /opt/bitnami/mariadb/sbin/mysqld: ready for connections.
            Version: '11.0.3-MariaDB-log' socket: '/opt/bitnami/mariadb/tmp/mysql.sock' port: 3306 Source distribution
            [Note] WSREP: view(view_id(NON_PRIM,46250027-9d94,6636) memb {
            46250027-9d94,0
            } joined {
            } left {
            } partitioned {
            4ed771b5-82bf,0
            bfa6e688-97c5,0
            })
            [Note] WSREP: PC protocol downgrade 1 -> 0
            [Note] WSREP: view((empty))
            [Note] WSREP: gcomm: closed
            [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
            [Note] WSREP: Flow-control interval: [128, 160]
            [Note] WSREP: Received NON-PRIMARY.
            [Note] WSREP: Shifting JOINER -> OPEN (TO: 128934)
            [Note] WSREP: New SELF-LEAVE.
            [Note] WSREP: Flow-control interval: [0, 0]
            [Note] WSREP: Received SELF-LEAVE. Closing connection.
            [Note] WSREP: Shifting OPEN -> CLOSED (TO: 128934)
            [Note] WSREP: RECV thread exiting 0: Success
            [Note] WSREP: recv_thread() joined.
            [Note] WSREP: Closing replication queue.
            [Note] WSREP: Closing slave action queue.
            [Note] WSREP: /opt/bitnami/mariadb/sbin/mysqld: Terminated.
            {code}
            After upgrading initially from 10.6->10.11->11.0.3 and now to 11.0.4 we see that 2 of the members start up without any issues but the 3rd member(db-0) fails to start up due to an old sequence no.(117376) that i believe is being passed on from the donor.

            We are unable to find the old seqno anywhere except in the `ibdata1` file of the donor(by searching for the hex of it). But not sure how to get rid of this old seqno.
            What we did try is starting db-0 with fresh volume by removing K8s PVC for it, so it does a SST but that fails as well.

            Logs from member db-0:

            {code:bash}
            [Note] WSREP: SST received
            [Note] WSREP: Server status change joiner -> initializing
            [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            [Note] mysqld: Aria engine: starting recovery
            recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.0 seconds); tables to flush: 2 1(0.0 seconds);
            [Note] mysqld: Aria engine: recovery done
            [Note] InnoDB: Compressed tables use zlib 1.2.11
            [Note] InnoDB: Number of transaction pools: 1
            [Note] InnoDB: Using crc32 + pclmulqdq instructions
            [Note] InnoDB: Using Linux native AIO
            [Note] InnoDB: Initializing buffer pool, total size = 2.000GiB, chunk size = 32.000MiB
            [Note] InnoDB: Completed initialization of buffer pool
            [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
            [Note] InnoDB: End of log at LSN=180964319
            [Note] InnoDB: Resizing redo log from 12.016KiB to 96.000MiB; LSN=180964319
            [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
            [Note] InnoDB: Reinitializing innodb_undo_tablespaces= 3 from 0
            [Note] InnoDB: Data file .//undo001 did not exist: new to be created
            [Note] InnoDB: Setting file .//undo001 size to 10.000MiB
            [Note] InnoDB: Database physically writes the file full: wait...
            [Note] InnoDB: Data file .//undo002 did not exist: new to be created
            [Note] InnoDB: Setting file .//undo002 size to 10.000MiB
            [Note] InnoDB: Database physically writes the file full: wait...
            [Note] InnoDB: Data file .//undo003 did not exist: new to be created
            [Note] InnoDB: Setting file .//undo003 size to 10.000MiB
            [Note] InnoDB: Database physically writes the file full: wait...
            [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
            [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
            [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
            [Note] InnoDB: log sequence number 180964319; transaction id 73558
            [Note] InnoDB: Loading buffer pool(s) from /bitnami/mariadb/data/ib_buffer_pool
            [Note] InnoDB: Cannot open '/bitnami/mariadb/data/ib_buffer_pool' for reading: No such file or directory
            [Note] Plugin 'FEEDBACK' is disabled.
            [Warning] 'innodb-log-files-in-group' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
            [Warning] 'innodb-file-format' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
            [Note] Recovering after a crash using mysql-bin
            [Note] Starting table crash recovery...
            [Note] Crash table recovery finished.
            [Note] Server socket created on IP: '0.0.0.0'.
            [Warning] 'proxies_priv' entry '@% root@db-0' ignored in --skip-name-resolve mode.
            [Note] WSREP: wsrep_init_schema_and_SR (nil)
            [Note] WSREP: Server initialized
            [Note] WSREP: Server status change initializing -> initialized
            [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
            [Note] WSREP: Recovered position from storage: 6aa53efc-db72-11ec-880f-a282ce494905:117376
            [Note] WSREP: Starting applier thread 6
            [Note] WSREP: Starting applier thread 7
            [Note] WSREP: Starting applier thread 8
            [Note] WSREP: Recovered view from SST:
              id: 6aa53efc-db72-11ec-880f-a282ce494905:128934
              status: primary
              protocol_version: 4
              capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
              final: no
              own_index: 0
              members(3):
            0: 46250027-95c8-11ee-9d94-06eb82413a07, db
            1: 4ed771b5-941f-11ee-82bf-56ec952abcf6, db
            2: bfa6e688-941f-11ee-97c5-d323f02068c1, db

            [ERROR] WSREP: sst_received failed: SST script passed bogus GTID: 6aa53efc-db72-11ec-880f-a282ce494905:117376. Preceding view GTID: 6aa53efc-db72-11ec-880f-a282ce494905:128934
            [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
            [Note] WSREP: Joiner monitor thread ended with total time 14 sec
            [ERROR] WSREP: Application received wrong state:
            Received: 00000000-0000-0000-0000-000000000000
            Required: 6aa53efc-db72-11ec-880f-a282ce494905
            [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
            [Note] WSREP: ReplicatorSMM::abort()
            [Note] WSREP: Closing send monitor...
            [Note] WSREP: Closed send monitor.
            [Note] WSREP: gcomm: terminating thread
            [Note] WSREP: gcomm: joining thread
            [Note] WSREP: gcomm: closing backend
            [Note] /opt/bitnami/mariadb/sbin/mysqld: ready for connections.
            Version: '11.0.3-MariaDB-log' socket: '/opt/bitnami/mariadb/tmp/mysql.sock' port: 3306 Source distribution
            [Note] WSREP: view(view_id(NON_PRIM,46250027-9d94,6636) memb {
            46250027-9d94,0
            } joined {
            } left {
            } partitioned {
            4ed771b5-82bf,0
            bfa6e688-97c5,0
            })
            [Note] WSREP: PC protocol downgrade 1 -> 0
            [Note] WSREP: view((empty))
            [Note] WSREP: gcomm: closed
            [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
            [Note] WSREP: Flow-control interval: [128, 160]
            [Note] WSREP: Received NON-PRIMARY.
            [Note] WSREP: Shifting JOINER -> OPEN (TO: 128934)
            [Note] WSREP: New SELF-LEAVE.
            [Note] WSREP: Flow-control interval: [0, 0]
            [Note] WSREP: Received SELF-LEAVE. Closing connection.
            [Note] WSREP: Shifting OPEN -> CLOSED (TO: 128934)
            [Note] WSREP: RECV thread exiting 0: Success
            [Note] WSREP: recv_thread() joined.
            [Note] WSREP: Closing replication queue.
            [Note] WSREP: Closing slave action queue.
            [Note] WSREP: /opt/bitnami/mariadb/sbin/mysqld: Terminated.
            {code}
            rgpublic Ranjan Ghosh added a comment -

            I have the same problem and I didn't even update from a very old version. Somehow I cannot start the second node anymore even if I wipe /var/lib/mysql completely on the joining node. The RSync-SST finishes but then I get: "SST script passed bogus GTID" and the node fails. How can I reset this and get out of this situation?

            rgpublic Ranjan Ghosh added a comment - I have the same problem and I didn't even update from a very old version. Somehow I cannot start the second node anymore even if I wipe /var/lib/mysql completely on the joining node. The RSync-SST finishes but then I get: "SST script passed bogus GTID" and the node fails. How can I reset this and get out of this situation?

            What worked was bootstrapping the cluster. And then manual SST for the affected node(probably the SST on startup would also work but we didn't try).

            We have other databases which have this issue as well(with 11.0.3) and we thought upgrading to 11.0.4 might fix it due this change. But the issue can be seen even with the change, and we don't want to upgrade/restart those due to risk of running in to the same issue.

            ihti Ihtisham ul Haq added a comment - What worked was bootstrapping the cluster. And then manual SST for the affected node(probably the SST on startup would also work but we didn't try). We have other databases which have this issue as well(with 11.0.3) and we thought upgrading to 11.0.4 might fix it due this change . But the issue can be seen even with the change, and we don't want to upgrade/restart those due to risk of running in to the same issue.
            ihti Ihtisham ul Haq made changes -
            Priority Major [ 3 ] Critical [ 2 ]

            @RanjanGhosh Have you been able to reproduce this issue, by any chance?

            ihti Ihtisham ul Haq added a comment - @RanjanGhosh Have you been able to reproduce this issue, by any chance?
            rgpublic Ranjan Ghosh added a comment -

            @ihti: I haven't been able to reproduce it but your tip to bootstrap the whole cluster was also what worked for me. I needed to shut down all nodes and start booting them up one-by-one from scratch. It's quite surprising because it's the first time I can think of that could not be solved by delete /var/lib/mysql on the second node if the first node seems to be running without any problems. Or put differently: You cannot somehow "see" that the first node is in a weird state that doesn't allow the second node to come on. Everything seems normal. The first node is running. It accepts queries etc. And then you try to start the second node and it just won't work. Even if you delete /var/lib/mysql completely - it is impossible to bring up the second node. I tried it multiple times. Only after restarting/bootstrapping the first node, things start to work again.

            Googling didn't immediately yield an answer for me so I hope people will find this now looking for:

            SST script passed bogus GTID

            rgpublic Ranjan Ghosh added a comment - @ihti: I haven't been able to reproduce it but your tip to bootstrap the whole cluster was also what worked for me. I needed to shut down all nodes and start booting them up one-by-one from scratch. It's quite surprising because it's the first time I can think of that could not be solved by delete /var/lib/mysql on the second node if the first node seems to be running without any problems. Or put differently: You cannot somehow "see" that the first node is in a weird state that doesn't allow the second node to come on. Everything seems normal. The first node is running. It accepts queries etc. And then you try to start the second node and it just won't work. Even if you delete /var/lib/mysql completely - it is impossible to bring up the second node. I tried it multiple times. Only after restarting/bootstrapping the first node, things start to work again. Googling didn't immediately yield an answer for me so I hope people will find this now looking for: SST script passed bogus GTID
            ihti Ihtisham ul Haq made changes -
            Priority Critical [ 2 ] Blocker [ 1 ]

            Steps to reproduce the issue:

            1. Run mariadb-galera cluster with 3 peers in version 10.11.6
            We use docker images bitnami/mariadb-galera:10.11.6-debian-11-r0
            ```
            myname="$(hostname -f)";
            export MARIADB_GALERA_NODE_NAME="$myname"
            export MARIADB_EXTRA_FLAGS="--wsrep-sst-receive-address=$myname"
            export MARIADB_GALERA_CLUSTER_ADDRESS="gcomm://$LIST_OF_PEERS
            /opt/bitnami/scripts/mariadb-galera/entrypoint.sh /opt/bitnami/scripts/mariadb-galera/run.sh
            ```
            2. Create some test entries in the database.
            3. Stop one peer and upgrade it to 11.0.4 (e.g. docker image bitnami/mariadb-galera:11.0.4-debian-11-r1)
            4. Create some more entries in the database.
            5. Start the updated peer, it will fail.

            ihti Ihtisham ul Haq added a comment - Steps to reproduce the issue: 1. Run mariadb-galera cluster with 3 peers in version 10.11.6 We use docker images bitnami/mariadb-galera:10.11.6-debian-11-r0 ``` myname="$(hostname -f)"; export MARIADB_GALERA_NODE_NAME="$myname" export MARIADB_EXTRA_FLAGS="--wsrep-sst-receive-address=$myname" export MARIADB_GALERA_CLUSTER_ADDRESS="gcomm://$LIST_OF_PEERS /opt/bitnami/scripts/mariadb-galera/entrypoint.sh /opt/bitnami/scripts/mariadb-galera/run.sh ``` 2. Create some test entries in the database. 3. Stop one peer and upgrade it to 11.0.4 (e.g. docker image bitnami/mariadb-galera:11.0.4-debian-11-r1) 4. Create some more entries in the database. 5. Start the updated peer, it will fail.
            maxlamprecht Max Lamprecht added a comment - - edited

            We have narrowed it down to commit: https://github.com/MariaDB/server/commit/44dce3b2077e64a1efc607668d0d7b42a7c4ee78

            If we set innodb_undo_tablespaces to 0 the start works.

            maxlamprecht Max Lamprecht added a comment - - edited We have narrowed it down to commit: https://github.com/MariaDB/server/commit/44dce3b2077e64a1efc607668d0d7b42a7c4ee78 If we set innodb_undo_tablespaces to 0 the start works.
            serg Sergei Golubchik made changes -
            Priority Blocker [ 1 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Assignee Marko Mäkelä [ marko ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            maxlamprecht Max Lamprecht added a comment -

            We figured out that we are not able to change the innodb_undo_tablespaces setting online to 3 with the mariabackup wsrep_sst method. This is because with mariabackup there is no clean shutdown of the innodb donor.

            We used wsrep_sst_method=mysqldump as a workaround to migrate our galera clusters to the new default

            maxlamprecht Max Lamprecht added a comment - We figured out that we are not able to change the innodb_undo_tablespaces setting online to 3 with the mariabackup wsrep_sst method. This is because with mariabackup there is no clean shutdown of the innodb donor. We used wsrep_sst_method=mysqldump as a workaround to migrate our galera clusters to the new default

            Thank you for the detailed bug report, including clear steps to reproduce.

            MDEV-29986 indeed changed innodb_undo_tablespaces=3 by default. This was possible thanks to MDEV-19229, which allows the parameter to be changed at server startup, provided that the undo logs are logically empty. Usually this is guaranteed by a prior shutdown with innodb_fast_shutdown=0.

            It is unclear to me why the following error would be caused by a change of innodb_undo_tablespaces:

            [ERROR] WSREP: sst_received failed: SST script passed bogus GTID: 6aa53efc-db72-11ec-880f-a282ce494905:117376. Preceding view GTID: 6aa53efc-db72-11ec-880f-a282ce494905:128934
            

            InnoDB does not store any GTID information in its internal data structures, such as undo logs. For normal replication, there is a table mysql.gtid_slave_pos, which is in InnoDB format by default. I am not familiar with the GTID and how it plays with Galera.

            For Galera, we store a wsrep_checkpoint in a rollback segment header. The storage format was last changed in MariaDB Server 10.3, related to MDEV-15132 and MDEV-15158. In MDEV-29986, in the function srv_undo_tablespaces_reinit(), we made effort to preserve this information. Maybe there is some flaw in this logic?

            I’m assigning this to the Galera developers for root cause analysis.

            marko Marko Mäkelä added a comment - Thank you for the detailed bug report, including clear steps to reproduce. MDEV-29986 indeed changed innodb_undo_tablespaces=3 by default. This was possible thanks to MDEV-19229 , which allows the parameter to be changed at server startup, provided that the undo logs are logically empty. Usually this is guaranteed by a prior shutdown with innodb_fast_shutdown=0 . It is unclear to me why the following error would be caused by a change of innodb_undo_tablespaces : [ERROR] WSREP: sst_received failed: SST script passed bogus GTID: 6aa53efc-db72-11ec-880f-a282ce494905:117376. Preceding view GTID: 6aa53efc-db72-11ec-880f-a282ce494905:128934 InnoDB does not store any GTID information in its internal data structures, such as undo logs. For normal replication, there is a table mysql.gtid_slave_pos , which is in InnoDB format by default. I am not familiar with the GTID and how it plays with Galera. For Galera, we store a wsrep_checkpoint in a rollback segment header. The storage format was last changed in MariaDB Server 10.3, related to MDEV-15132 and MDEV-15158 . In MDEV-29986 , in the function srv_undo_tablespaces_reinit() , we made effort to preserve this information. Maybe there is some flaw in this logic? I’m assigning this to the Galera developers for root cause analysis.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Component/s Galera [ 10124 ]
            Component/s Galera SST [ 10121 ]
            Component/s Storage Engine - InnoDB [ 10129 ]
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.4 [ 29301 ]
            Affects Version/s 11.0.2 [ 28706 ]
            Affects Version/s 11.0.1 [ 28548 ]
            Assignee Marko Mäkelä [ marko ] Seppo Jaakola [ seppo ]
            Labels regression upgrade
            janlindstrom Jan Lindström made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            janlindstrom Jan Lindström made changes -
            Attachment node1.err [ 73220 ]
            Attachment node2.err [ 73221 ]
            janlindstrom Jan Lindström made changes -
            Attachment binlog.000002 [ 73222 ]
            Attachment binlog.000003 [ 73223 ]
            Attachment binlog.index [ 73224 ]
            Attachment donor_galera_info [ 73225 ]
            Attachment mariabackup.move.log [ 73226 ]
            Attachment mariabackup.prepare.log [ 73227 ]
            Attachment mariadb_backup_galera_info [ 73228 ]
            Attachment xtrabackup_binlog_pos_innodb [ 73229 ]

            marko I tested with 2-node cluster so that node1 uses 10.11 and node2 uses 11.3. I can reproduce the issue i.e. node2 does not join the cluster. I can't reproduce if wsrep_sst_method=rsync or if innodb-undo-tablespaces=3 used on 10.11. Therefore, either mariabackup or server behaves incorrectly on upgrade 10.11 --> 11.3 and it seems to be related on fact that innodb-undo-tablespaces=3 by default on 11.3. From logs I see that joiner does send correct gtid to donor and this information is sent correctly to joiner. However, for some reason either on mariabackup --prepare or next server startup wrong gtid is read from storage. From extended logs I can see that donor has correct gtid stored in storage at the point when mariabackup --backup is done. I'm not familiar how this creation of undo log tablespaces work on upgrade but based on my analysis something goes wrong.

            janlindstrom Jan Lindström added a comment - marko I tested with 2-node cluster so that node1 uses 10.11 and node2 uses 11.3. I can reproduce the issue i.e. node2 does not join the cluster. I can't reproduce if wsrep_sst_method=rsync or if innodb-undo-tablespaces=3 used on 10.11. Therefore, either mariabackup or server behaves incorrectly on upgrade 10.11 --> 11.3 and it seems to be related on fact that innodb-undo-tablespaces=3 by default on 11.3. From logs I see that joiner does send correct gtid to donor and this information is sent correctly to joiner. However, for some reason either on mariabackup --prepare or next server startup wrong gtid is read from storage. From extended logs I can see that donor has correct gtid stored in storage at the point when mariabackup --backup is done. I'm not familiar how this creation of undo log tablespaces work on upgrade but based on my analysis something goes wrong.
            janlindstrom Jan Lindström made changes -
            Assignee Seppo Jaakola [ seppo ] Marko Mäkelä [ marko ]

            janlindstrom, can you provide rr replay traces that cover the snapshot transfer and the subsequent failure?

            marko Marko Mäkelä added a comment - janlindstrom , can you provide rr replay traces that cover the snapshot transfer and the subsequent failure?
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ JIRAUSER53125 ]
            janlindstrom Jan Lindström made changes -
            Attachment binlog.000002 [ 73222 ]
            janlindstrom Jan Lindström made changes -
            Attachment binlog.000003 [ 73223 ]
            janlindstrom Jan Lindström made changes -
            Attachment binlog.index [ 73224 ]
            janlindstrom Jan Lindström made changes -
            Attachment donor_galera_info [ 73225 ]
            janlindstrom Jan Lindström made changes -
            Attachment mariabackup.move.log [ 73226 ]
            janlindstrom Jan Lindström made changes -
            Attachment mariabackup.prepare.log [ 73227 ]
            janlindstrom Jan Lindström made changes -
            Attachment mariadb_backup_galera_info [ 73228 ]
            janlindstrom Jan Lindström made changes -
            Attachment node1.err [ 73220 ]
            janlindstrom Jan Lindström made changes -
            Attachment node2.err [ 73221 ]
            janlindstrom Jan Lindström made changes -
            Attachment xtrabackup_binlog_pos_innodb [ 73229 ]
            janlindstrom Jan Lindström made changes -
            Attachment datadir.tgz [ 73231 ]

            marko rr is not working on my kernel (or more precisely perf). I have provided full datadir, let me see if I can provide same before and after --prepare.

            janlindstrom Jan Lindström added a comment - marko rr is not working on my kernel (or more precisely perf). I have provided full datadir, let me see if I can provide same before and after --prepare.
            janlindstrom Jan Lindström made changes -
            Attachment node2_before_prepare.tgz [ 73232 ]
            janlindstrom Jan Lindström made changes -
            Attachment node2_after_prepare.tgz [ 73233 ]

            marko Attached node2 datadir before and after mariabackup --prepare

            janlindstrom Jan Lindström added a comment - marko Attached node2 datadir before and after mariabackup --prepare

            janlindstrom, which difference should we be looking at? To diagnose this, I would ideally use an rr replay trace of the wsrep_sst_method=mariabackup donor server as well as a copy of the files created by mariadb-backup --backup, and a detailed description of the fault of the restored backup. It could be useful to have an rr replay trace of the snapshot recipient as well.

            marko Marko Mäkelä added a comment - janlindstrom , which difference should we be looking at? To diagnose this, I would ideally use an rr replay trace of the wsrep_sst_method=mariabackup donor server as well as a copy of the files created by mariadb-backup --backup , and a detailed description of the fault of the restored backup. It could be useful to have an rr replay trace of the snapshot recipient as well.

            janlindstrom, as far as I can tell, node2_before_prepare.tgz contains a backup of a data directory that had been created with a non-default setting innodb_undo_tablespaces=0. My invocation of applying log to it:

            _RR_TRACE_DIR=rr rr record \
            mariadb-backup --prepare --target-dir node2/.sst
            

            finishes as follows:

            2024-03-04 11:42:57 0 [Note] InnoDB: End of log at LSN=76798
            2024-03-04 11:42:57 0 [Note] InnoDB: To recover: 158 pages
            2024-03-04 11:42:57 0 [Note] InnoDB: Last binlog file './binlog.000001', position 1506
            [00] 2024-03-04 11:42:57 Last binlog file ./binlog.000001, position 1506
            [00] 2024-03-04 11:42:57 mariabackup: Recovered WSREP position: e8b3b373-d798-11ee-a970-8f678cb12371:8 domain_id: 0
             
            [00] 2024-03-04 11:42:57 completed OK!
            

            The WSREP position information was fetched by trx_rseg_read_wsrep_checkpoint(), and the extra blank line is part of the msg() call in xb_write_galera_info(). As far as I understand, no information was lost at this point yet. How should I invoke the server on this data directory, and where and how should I check for the unexpected result?

            marko Marko Mäkelä added a comment - janlindstrom , as far as I can tell, node2_before_prepare.tgz contains a backup of a data directory that had been created with a non-default setting innodb_undo_tablespaces=0 . My invocation of applying log to it: _RR_TRACE_DIR=rr rr record \ mariadb-backup --prepare --target-dir node2/.sst finishes as follows: 2024-03-04 11:42:57 0 [Note] InnoDB: End of log at LSN=76798 2024-03-04 11:42:57 0 [Note] InnoDB: To recover: 158 pages 2024-03-04 11:42:57 0 [Note] InnoDB: Last binlog file './binlog.000001', position 1506 [00] 2024-03-04 11:42:57 Last binlog file ./binlog.000001, position 1506 [00] 2024-03-04 11:42:57 mariabackup: Recovered WSREP position: e8b3b373-d798-11ee-a970-8f678cb12371:8 domain_id: 0   [00] 2024-03-04 11:42:57 completed OK! The WSREP position information was fetched by trx_rseg_read_wsrep_checkpoint() , and the extra blank line is part of the msg() call in xb_write_galera_info() . As far as I understand, no information was lost at this point yet. How should I invoke the server on this data directory, and where and how should I check for the unexpected result?

            I see that trx_sys_t::reset_page(), which initializes the page of the first rollback segment in the system tablespace, fails to write back the information that trx_rseg_read_wsrep_checkpoint() is looking for.

            marko Marko Mäkelä added a comment - I see that trx_sys_t::reset_page() , which initializes the page of the first rollback segment in the system tablespace, fails to write back the information that trx_rseg_read_wsrep_checkpoint() is looking for.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.3 [ 28565 ]
            Affects Version/s 10.11 [ 27614 ]
            Assignee Jan Lindström [ JIRAUSER53125 ] Marko Mäkelä [ marko ]

            Sorry, I realized that srv_undo_tablespaces_reinit(), which was added in MDEV-19229, does make an effort to preserve the additional Galera metadata. It is part of the first mini-transaction that will reinitialize the TRX_SYS page and all rollback segment header pages for innodb_undo_tablespaces=0:

            #ifdef WITH_WSREP
                if (!trx_sys.recovered_wsrep_xid.is_null())
                  trx_rseg_update_wsrep_checkpoint(
                    first_rseg_hdr, &trx_sys.recovered_wsrep_xid, &mtr);
            #endif /* WITH_WSREP */
              }
             
              dict_hdr->page.fix();
             
              mtr.commit();
            

            A subsequent mini-transaction will initialize the undo log tablespaces.

            While reviewing the logic, I noticed that we probably fail to allocate all undo tablespace IDs:

              if (fil_assign_new_space_id(&srv_undo_space_id_start))
                mtr.write<4>(*dict_hdr, DICT_HDR + DICT_HDR_MAX_SPACE_ID
                             + dict_hdr->page.frame, srv_undo_space_id_start);
            

            This is only advancing the persistent sequence by 1, instead of advancing it by srv_undo_tablespaces.

            marko Marko Mäkelä added a comment - Sorry, I realized that srv_undo_tablespaces_reinit() , which was added in MDEV-19229 , does make an effort to preserve the additional Galera metadata. It is part of the first mini-transaction that will reinitialize the TRX_SYS page and all rollback segment header pages for innodb_undo_tablespaces=0 : #ifdef WITH_WSREP if (!trx_sys.recovered_wsrep_xid.is_null()) trx_rseg_update_wsrep_checkpoint( first_rseg_hdr, &trx_sys.recovered_wsrep_xid, &mtr); #endif /* WITH_WSREP */ }   dict_hdr->page.fix();   mtr.commit(); A subsequent mini-transaction will initialize the undo log tablespaces. While reviewing the logic, I noticed that we probably fail to allocate all undo tablespace IDs: if (fil_assign_new_space_id(&srv_undo_space_id_start)) mtr.write<4>(*dict_hdr, DICT_HDR + DICT_HDR_MAX_SPACE_ID + dict_hdr->page.frame, srv_undo_space_id_start); This is only advancing the persistent sequence by 1, instead of advancing it by srv_undo_tablespaces .
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ JIRAUSER53125 ]
            janlindstrom Jan Lindström made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            janlindstrom Jan Lindström added a comment - 10.4 : https://github.com/MariaDB/server/pull/3121 10.11 : https://github.com/MariaDB/server/pull/3122
            janlindstrom Jan Lindström made changes -
            Assignee Jan Lindström [ JIRAUSER53125 ] Marko Mäkelä [ marko ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            The bug occurs due to a failure to read the Galera metadata on startup, before the MDEV-19229 logic for changing innodb_undo_tablespaces (MariaDB Server 10.11) is executed.

            It is clear that this needs to be fixed in 10.11. It is also somewhat understandable to fix this in earlier versions. Do we understand the exact impact of this outside MDEV-19229? Could this explain some Galera SST failures or inconsistencies in earlier versions?

            marko Marko Mäkelä added a comment - The bug occurs due to a failure to read the Galera metadata on startup, before the MDEV-19229 logic for changing innodb_undo_tablespaces (MariaDB Server 10.11) is executed. It is clear that this needs to be fixed in 10.11. It is also somewhat understandable to fix this in earlier versions. Do we understand the exact impact of this outside MDEV-19229 ? Could this explain some Galera SST failures or inconsistencies in earlier versions?
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ JIRAUSER53125 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            janlindstrom Jan Lindström added a comment - Review comments addressed in : https://github.com/MariaDB/server/commit/713f6b89be0a63553f1bdf9984d4567aa8de1f50
            janlindstrom Jan Lindström made changes -
            Assignee Jan Lindström [ JIRAUSER53125 ] Marko Mäkelä [ marko ]
            Status Stalled [ 10000 ] In Review [ 10002 ]

            I would like to see a regression test that fails before the code change and passes with the change. The current one was passing for me also without the code change.

            marko Marko Mäkelä added a comment - I would like to see a regression test that fails before the code change and passes with the change. The current one was passing for me also without the code change.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ JIRAUSER53125 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            janlindstrom Jan Lindström added a comment - - edited

            marko It seems I do not know how to create one. If I manually start node1 with innodb-undo-tablespaces=0 , create sst user, table and some data, and finally start node2 with innodb-undo-tablespaces=3 it does fail wihtout the code change. If I use mtr even if I empty node2 datadir totally, I do not see the failure.

            janlindstrom Jan Lindström added a comment - - edited marko It seems I do not know how to create one. If I manually start node1 with innodb-undo-tablespaces=0 , create sst user, table and some data, and finally start node2 with innodb-undo-tablespaces=3 it does fail wihtout the code change. If I use mtr even if I empty node2 datadir totally, I do not see the failure.

            janlindstrom, to review this, I need the exact steps how to reproduce the error. Please update https://github.com/MariaDB/server/pull/3122 with the details.

            marko Marko Mäkelä added a comment - janlindstrom , to review this, I need the exact steps how to reproduce the error. Please update https://github.com/MariaDB/server/pull/3122 with the details.
            janlindstrom Jan Lindström made changes -
            Attachment node1.cnf [ 73323 ]
            Attachment node2.cnf [ 73324 ]

            marko I added steps to pull request. My 3rd test case using only one node and wsrep-recover did not fail when undo tablespaces were increased, I do not know why.

            janlindstrom Jan Lindström added a comment - marko I added steps to pull request. My 3rd test case using only one node and wsrep-recover did not fail when undo tablespaces were increased, I do not know why.
            janlindstrom Jan Lindström made changes -
            Assignee Jan Lindström [ JIRAUSER53125 ] Marko Mäkelä [ marko ]
            Status Stalled [ 10000 ] In Review [ 10002 ]

            I tried to run the steps to reproduce this, but the steps are incomplete. First, I had to create a node2 subdirectory before starting the server using a modified copy of node2.cnf, to avoid the following:

            2024-04-04  9:53:02 0 [Warning] Can't create test file '/dev/shm/galeratest/node2/HOSTNAME.lower-test' (Errcode: 2 "No such file or directory")
            /dev/shm/11.0/sql/mariadbd: Can't change dir to '/dev/shm/galeratest/node2/' (Errcode: 2 "No such file or directory")
            

            Then, the code that the patch would cover would not run because of the following error:

            2024-04-04  9:57:15 0 [Note] WSREP: Joiner monitor thread started to monitor
            sh: 1: wsrep_sst_mariabackup: not found
            

            A third attempt was not much better:

            PATH=/dev/shm/11.0/scripts:/dev/shm/11.0/extra/mariabackup:"$PATH" /dev/shm/11.0/sql/mariadbd --defaults-file=/dev/shm/node2.cnf
            

            2024-04-04 10:01:04 0 [Warning] WSREP: 1.0 (HOSTNAME): State transfer to 0.0 (HOSTNAME) failed: -2 (No such file or directory)
            2024-04-04 10:01:04 0 [ERROR] WSREP: ./gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1172: Will never receive state. Need to abort.
            

            Can you provide a self-contained test case?

            marko Marko Mäkelä added a comment - I tried to run the steps to reproduce this, but the steps are incomplete. First, I had to create a node2 subdirectory before starting the server using a modified copy of node2.cnf , to avoid the following: 2024-04-04 9:53:02 0 [Warning] Can't create test file '/dev/shm/galeratest/node2/HOSTNAME.lower-test' (Errcode: 2 "No such file or directory") /dev/shm/11.0/sql/mariadbd: Can't change dir to '/dev/shm/galeratest/node2/' (Errcode: 2 "No such file or directory") Then, the code that the patch would cover would not run because of the following error: 2024-04-04 9:57:15 0 [Note] WSREP: Joiner monitor thread started to monitor sh: 1: wsrep_sst_mariabackup: not found A third attempt was not much better: PATH=/dev/shm/11.0/scripts:/dev/shm/11.0/extra/mariabackup:"$PATH" /dev/shm/11.0/sql/mariadbd --defaults-file=/dev/shm/node2.cnf 2024-04-04 10:01:04 0 [Warning] WSREP: 1.0 (HOSTNAME): State transfer to 0.0 (HOSTNAME) failed: -2 (No such file or directory) 2024-04-04 10:01:04 0 [ERROR] WSREP: ./gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():1172: Will never receive state. Need to abort. Can you provide a self-contained test case?
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ JIRAUSER53125 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]

            marko Updated my instructions but it depends your setup so can't give test case that will work on your laptop. Your problem most likely is that it can't find necessary binaries i.e. you did not have 10.11 installed.

            janlindstrom Jan Lindström added a comment - marko Updated my instructions but it depends your setup so can't give test case that will work on your laptop. Your problem most likely is that it can't find necessary binaries i.e. you did not have 10.11 installed.
            janlindstrom Jan Lindström made changes -
            Assignee Jan Lindström [ JIRAUSER53125 ] Marko Mäkelä [ marko ]
            Status Stalled [ 10000 ] In Review [ 10002 ]

            I do not (want to) have a VM or Docker setup where I could install ephemeral packages without a risk of breaking my environment. I was thinking if this could be reproduced simply by something like the following:

            diff --git a/mysql-test/suite/galera/t/galera_sst_mariabackup.cnf b/mysql-test/suite/galera/t/galera_sst_mariabackup.cnf
            index 857a4101406..43638ace9b3 100644
            --- a/mysql-test/suite/galera/t/galera_sst_mariabackup.cnf
            +++ b/mysql-test/suite/galera/t/galera_sst_mariabackup.cnf
            @@ -7,9 +7,13 @@ wsrep_debug=1
             
             [mysqld.1]
             wsrep_provider_options='base_port=@mysqld.1.#galera_port;gcache.size=1;pc.ignore_sb=true'
            +innodb_fast_shutdown=0
            +innodb_undo_tablespaces=0
             
             [mysqld.2]
             wsrep_provider_options='base_port=@mysqld.2.#galera_port;gcache.size=1;pc.ignore_sb=true'
            +innodb_fast_shutdown=0
            +innodb_undo_tablespaces=3
             
             [sst]
             transferfmt=@ENV.MTR_GALERA_TFMT
            

            With the above modification to the test, I successfully reproduced an error:

            10.11 af4df93cf855228f094f2a19f7dd0bdc005035cf with the above test patch

            CURRENT_TEST: galera.galera_sst_mariabackup
            mysqltest: In included file "./include/galera_wait_ready.inc": 
            included from /mariadb/10.11/mysql-test/suite/galera/include/start_mysqld.inc at line 16:
            included from ./suite/galera/include/galera_st_kill_slave.inc at line 63:
            included from /mariadb/10.11/mysql-test/suite/galera/t/galera_sst_mariabackup.test at line 15:
            At line 28: "Server did not transition to READY state"
            

            If I apply your patch, the test fails with just a cosmetic error, which I suppose could be fixed by adding an extra restart before the first work step.

            10.11 af4df93cf855228f094f2a19f7dd0bdc005035cf with the 2 patches

            galera.galera_sst_mariabackup 'debug,innodb' [ fail ]  Found warnings/errors in server log file!
                    Test ended at 2024-04-04 17:45:21
            line
            2024-04-04 17:44:40 0 [Warning] InnoDB: Cannot change innodb_undo_tablespaces=3 because previous shutdown was not with innodb_fast_shutdown=0
            2024-04-04 17:45:15 0 [Warning] InnoDB: Cannot change innodb_undo_tablespaces=3 because previous shutdown was not with innodb_fast_shutdown=0
            ^ Found warnings in /dev/shm/10.11/mysql-test/var/log/mysqld.2.err
            

            Please update the pull request with this idea and rebase it to the latest 10.11, so that we can see it in action on our CI systems.

            marko Marko Mäkelä added a comment - I do not (want to) have a VM or Docker setup where I could install ephemeral packages without a risk of breaking my environment. I was thinking if this could be reproduced simply by something like the following: diff --git a/mysql-test/suite/galera/t/galera_sst_mariabackup.cnf b/mysql-test/suite/galera/t/galera_sst_mariabackup.cnf index 857a4101406..43638ace9b3 100644 --- a/mysql-test/suite/galera/t/galera_sst_mariabackup.cnf +++ b/mysql-test/suite/galera/t/galera_sst_mariabackup.cnf @@ -7,9 +7,13 @@ wsrep_debug=1 [mysqld.1] wsrep_provider_options='base_port=@mysqld.1.#galera_port;gcache.size=1;pc.ignore_sb=true' +innodb_fast_shutdown=0 +innodb_undo_tablespaces=0 [mysqld.2] wsrep_provider_options='base_port=@mysqld.2.#galera_port;gcache.size=1;pc.ignore_sb=true' +innodb_fast_shutdown=0 +innodb_undo_tablespaces=3 [sst] transferfmt=@ENV.MTR_GALERA_TFMT With the above modification to the test, I successfully reproduced an error: 10.11 af4df93cf855228f094f2a19f7dd0bdc005035cf with the above test patch CURRENT_TEST: galera.galera_sst_mariabackup mysqltest: In included file "./include/galera_wait_ready.inc": included from /mariadb/10.11/mysql-test/suite/galera/include/start_mysqld.inc at line 16: included from ./suite/galera/include/galera_st_kill_slave.inc at line 63: included from /mariadb/10.11/mysql-test/suite/galera/t/galera_sst_mariabackup.test at line 15: At line 28: "Server did not transition to READY state" If I apply your patch, the test fails with just a cosmetic error, which I suppose could be fixed by adding an extra restart before the first work step. 10.11 af4df93cf855228f094f2a19f7dd0bdc005035cf with the 2 patches galera.galera_sst_mariabackup 'debug,innodb' [ fail ] Found warnings/errors in server log file! Test ended at 2024-04-04 17:45:21 line 2024-04-04 17:44:40 0 [Warning] InnoDB: Cannot change innodb_undo_tablespaces=3 because previous shutdown was not with innodb_fast_shutdown=0 2024-04-04 17:45:15 0 [Warning] InnoDB: Cannot change innodb_undo_tablespaces=3 because previous shutdown was not with innodb_fast_shutdown=0 ^ Found warnings in /dev/shm/10.11/mysql-test/var/log/mysqld.2.err Please update the pull request with this idea and rebase it to the latest 10.11, so that we can see it in action on our CI systems.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ JIRAUSER53125 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]

            marko You are correct on that test does not pass with those settings. However, it does not reproduce original problem i.e. I cant find error message similar to "[ERROR] WSREP: sst_received failed: SST script passed bogus GTID: 6aa53efc-db72-11ec-880f-a282ce494905:117376. Preceding view GTID: 6aa53efc-db72-11ec-880f-a282ce494905:128934". Using manual testing I could reproduce this error. In last test did you verify that InnoDB really did create 3 tablespaces ?

            janlindstrom Jan Lindström added a comment - marko You are correct on that test does not pass with those settings. However, it does not reproduce original problem i.e. I cant find error message similar to " [ERROR] WSREP: sst_received failed: SST script passed bogus GTID: 6aa53efc-db72-11ec-880f-a282ce494905:117376. Preceding view GTID: 6aa53efc-db72-11ec-880f-a282ce494905:128934". Using manual testing I could reproduce this error. In last test did you verify that InnoDB really did create 3 tablespaces ?

            marko I asked help from ramesh to create test case using binary tarball.

            janlindstrom Jan Lindström added a comment - marko I asked help from ramesh to create test case using binary tarball.

            I thought that it was obvious that my suggested change to the existing test does cover the scenario. I reran it for you:

            ./mtr galera.galera_sst_mariabackup
            grep 'InnoDB: Reinitializing innodb_undo_tablespaces' var/log/mysqld.*err
            

            10.11 af4df93cf855228f094f2a19f7dd0bdc005035cf with the 2 patches

            var/log/mysqld.2.err:2024-04-05 16:19:49 0 [Note] InnoDB: Reinitializing innodb_undo_tablespaces= 3 from 0
            var/log/mysqld.2.err:2024-04-05 16:20:24 0 [Note] InnoDB: Reinitializing innodb_undo_tablespaces= 3 from 0
            

            If you want more information, inject an abort() to somewhere in srv_undo_tablespaces_reinit() or simply use rr record and set a breakpoint in rr replay.

            marko Marko Mäkelä added a comment - I thought that it was obvious that my suggested change to the existing test does cover the scenario. I reran it for you: ./mtr galera.galera_sst_mariabackup grep 'InnoDB: Reinitializing innodb_undo_tablespaces' var/log/mysqld.*err 10.11 af4df93cf855228f094f2a19f7dd0bdc005035cf with the 2 patches var/log/mysqld.2.err:2024-04-05 16:19:49 0 [Note] InnoDB: Reinitializing innodb_undo_tablespaces= 3 from 0 var/log/mysqld.2.err:2024-04-05 16:20:24 0 [Note] InnoDB: Reinitializing innodb_undo_tablespaces= 3 from 0 If you want more information, inject an abort() to somewhere in srv_undo_tablespaces_reinit() or simply use rr record and set a breakpoint in rr replay .
            janlindstrom Jan Lindström made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]

            Added suggested test-case and rebased to lastest 10.11

            janlindstrom Jan Lindström added a comment - Added suggested test-case and rebased to lastest 10.11
            janlindstrom Jan Lindström made changes -
            Assignee Jan Lindström [ JIRAUSER53125 ] Marko Mäkelä [ marko ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            janlindstrom As discussed reproduced the issue on debug build

            2024-04-08  8:44:40 0 [Note] /test/gal/GAL_MD010224-mariadb-10.11.7-linux-x86_64-dbg/bin/mariadbd: ready for connections.
            Version: '10.11.7-MariaDB-debug-log'  socket: '/test/gal/node2/node2.sock'  port: 4600  MariaDB Server
            2024-04-08  8:44:40 3 [ERROR] WSREP: sst_received failed: SST script passed bogus GTID: b10e45b8-f56a-11ee-943e-bae70fe3e57d:8. Preceding view GTID: b10e45b8-f56a-11ee-943e-bae70fe3e57d:11
            2024-04-08  8:44:40 3 [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
            2024-04-08  8:44:40 2 [ERROR] WSREP: Application received wrong state: 
                    Received: 00000000-0000-0000-0000-000000000000
                    Required: b10e45b8-f56a-11ee-943e-bae70fe3e57d
            2024-04-08  8:44:40 0 [Note] WSREP: Joiner monitor thread ended with total time 24 sec
            2024-04-08  8:44:40 2 [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
            

            Test case

            BASEDIR=/test/gal/GAL_MD010224-mariadb-10.11.7-linux-x86_64-dbg
            WORKDIR=/test/gal
             
            ${BASEDIR}/bin/mysqladmin -uroot -S${WORKDIR}/node2/node2.sock shutdown
            ${BASEDIR}/bin/mysqladmin -uroot -S${WORKDIR}/node1/node1.sock shutdown
             
            rm -rf $WORKDIR/node1 $WORKDIR/node2
             
            ${BASEDIR}/scripts/mariadb-install-db --no-defaults --force --auth-root-authentication-method=normal  --basedir=${BASEDIR} --datadir=$WORKDIR/node1
            ${BASEDIR}/scripts/mariadb-install-db --no-defaults --force --auth-root-authentication-method=normal  --basedir=${BASEDIR} --datadir=$WORKDIR/node2
             
            ${BASEDIR}/bin/mariadbd --defaults-file=${WORKDIR}/n1.cnf --basedir=${BASEDIR} --wsrep-provider=${BASEDIR}/lib/libgalera_smm.so --datadir=$WORKDIR/node1 --socket=$WORKDIR/node1/node1.sock --log-error=$WORKDIR/node1/node1.err --wsrep-new-cluster --innodb-undo-tablespaces=0 > $WORKDIR/node1/node1.err 2>&1 &
             
            sleep 10
             
            ${BASEDIR}/bin/mariadb -A -uroot -S$WORKDIR/node1/node1.sock
            delete from mysql.user where user='';
            create user mariabackup@'%' identified by 'password';
            grant all on *.* to  mariabackup@'%';
            flush privileges;
            \q
             
            ${BASEDIR}/bin/mariadbd --defaults-file=${WORKDIR}/n2.cnf --basedir=${BASEDIR} --wsrep-provider=${BASEDIR}/lib/libgalera_smm.so --datadir=$WORKDIR/node2 --socket=$WORKDIR/node2/node2.sock --log-error=$WORKDIR/node2/node2.err --innodb-undo-tablespaces=0 > $WORKDIR/node2/node2.err 2>&1 &
             
            sleep 10
             
            ${BASEDIR}/bin/mariadb-admin -uroot -S${WORKDIR}/node2/node2.sock shutdown
             
            ${BASEDIR}/bin/mariadb -A -uroot -S$WORKDIR/node1/node1.sock test -e "create table t1 (id int);"
            ${BASEDIR}/bin/mariadb -A -uroot -S$WORKDIR/node1/node1.sock test -e "insert into t1 values(1),(2),(3);"
             
            rm -rf $WORKDIR/node2/*
             
            ${BASEDIR}/bin/mariadbd --defaults-file=${WORKDIR}/n2.cnf --basedir=${BASEDIR} --wsrep-provider=${BASEDIR}/lib/libgalera_smm.so --datadir=$WORKDIR/node2 --socket=$WORKDIR/node2/node2.sock --log-error=$WORKDIR/node2/node2.err --innodb-undo-tablespaces=3 > $WORKDIR/node2/node2.err 2>&1 &
            

            ramesh Ramesh Sivaraman added a comment - janlindstrom As discussed reproduced the issue on debug build 2024-04-08 8:44:40 0 [Note] /test/gal/GAL_MD010224-mariadb-10.11.7-linux-x86_64-dbg/bin/mariadbd: ready for connections. Version: '10.11.7-MariaDB-debug-log' socket: '/test/gal/node2/node2.sock' port: 4600 MariaDB Server 2024-04-08 8:44:40 3 [ERROR] WSREP: sst_received failed: SST script passed bogus GTID: b10e45b8-f56a-11ee-943e-bae70fe3e57d:8. Preceding view GTID: b10e45b8-f56a-11ee-943e-bae70fe3e57d:11 2024-04-08 8:44:40 3 [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1 2024-04-08 8:44:40 2 [ERROR] WSREP: Application received wrong state: Received: 00000000-0000-0000-0000-000000000000 Required: b10e45b8-f56a-11ee-943e-bae70fe3e57d 2024-04-08 8:44:40 0 [Note] WSREP: Joiner monitor thread ended with total time 24 sec 2024-04-08 8:44:40 2 [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required. Test case BASEDIR=/test/gal/GAL_MD010224-mariadb-10.11.7-linux-x86_64-dbg WORKDIR=/test/gal   ${BASEDIR}/bin/mysqladmin -uroot -S${WORKDIR}/node2/node2.sock shutdown ${BASEDIR}/bin/mysqladmin -uroot -S${WORKDIR}/node1/node1.sock shutdown   rm -rf $WORKDIR/node1 $WORKDIR/node2   ${BASEDIR}/scripts/mariadb-install-db --no-defaults --force --auth-root-authentication-method=normal --basedir=${BASEDIR} --datadir=$WORKDIR/node1 ${BASEDIR}/scripts/mariadb-install-db --no-defaults --force --auth-root-authentication-method=normal --basedir=${BASEDIR} --datadir=$WORKDIR/node2   ${BASEDIR}/bin/mariadbd --defaults-file=${WORKDIR}/n1.cnf --basedir=${BASEDIR} --wsrep-provider=${BASEDIR}/lib/libgalera_smm.so --datadir=$WORKDIR/node1 --socket=$WORKDIR/node1/node1.sock --log-error=$WORKDIR/node1/node1.err --wsrep-new-cluster --innodb-undo-tablespaces=0 > $WORKDIR/node1/node1.err 2>&1 &   sleep 10   ${BASEDIR}/bin/mariadb -A -uroot -S$WORKDIR/node1/node1.sock delete from mysql.user where user=''; create user mariabackup@'%' identified by 'password'; grant all on *.* to mariabackup@'%'; flush privileges; \q   ${BASEDIR}/bin/mariadbd --defaults-file=${WORKDIR}/n2.cnf --basedir=${BASEDIR} --wsrep-provider=${BASEDIR}/lib/libgalera_smm.so --datadir=$WORKDIR/node2 --socket=$WORKDIR/node2/node2.sock --log-error=$WORKDIR/node2/node2.err --innodb-undo-tablespaces=0 > $WORKDIR/node2/node2.err 2>&1 &   sleep 10   ${BASEDIR}/bin/mariadb-admin -uroot -S${WORKDIR}/node2/node2.sock shutdown   ${BASEDIR}/bin/mariadb -A -uroot -S$WORKDIR/node1/node1.sock test -e "create table t1 (id int);" ${BASEDIR}/bin/mariadb -A -uroot -S$WORKDIR/node1/node1.sock test -e "insert into t1 values(1),(2),(3);"   rm -rf $WORKDIR/node2/*   ${BASEDIR}/bin/mariadbd --defaults-file=${WORKDIR}/n2.cnf --basedir=${BASEDIR} --wsrep-provider=${BASEDIR}/lib/libgalera_smm.so --datadir=$WORKDIR/node2 --socket=$WORKDIR/node2/node2.sock --log-error=$WORKDIR/node2/node2.err --innodb-undo-tablespaces=3 > $WORKDIR/node2/node2.err 2>&1 &
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ JIRAUSER53125 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            janlindstrom Jan Lindström made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            janlindstrom Jan Lindström made changes -
            Assignee Jan Lindström [ JIRAUSER53125 ] Marko Mäkelä [ marko ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2024-04-11 09:33:23.0 2024-04-11 09:33:23.005
            marko Marko Mäkelä made changes -
            Fix Version/s 10.11.8 [ 29630 ]
            Fix Version/s 11.0.6 [ 29628 ]
            Fix Version/s 11.1.5 [ 29629 ]
            Fix Version/s 11.2.4 [ 29631 ]
            Fix Version/s 11.4.2 [ 29633 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.0 [ 28320 ]
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.4 [ 29301 ]
            Assignee Marko Mäkelä [ marko ] Jan Lindström [ JIRAUSER53125 ]
            Resolution Fixed [ 1 ]
            Status In Review [ 10002 ] Closed [ 6 ]

            People

              janlindstrom Jan Lindström
              ihti Ihtisham ul Haq
              Votes:
              1 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.