MariaDB Server / MDEV-32974

Member fails to join due to old seqno in GTID

Details

    Description

      After upgrading from 10.6 to 10.11 to 11.0.3, and now to 11.0.4, two of the members start up without any issues, but the third member (db-0) fails to start because of an old sequence number (117376) that I believe is being passed on from the donor.

      We are unable to find the old seqno anywhere except in the `ibdata1` file of the donor (found by searching for its hex representation), and we are not sure how to get rid of it.
      We also tried starting db-0 with a fresh volume by removing its K8s PVC so that it performs an SST, but that fails as well.

      Logs from member db-0:

      [Note] WSREP: SST received
      [Note] WSREP: Server status change joiner -> initializing
      [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      [Note] mysqld: Aria engine: starting recovery
      recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.0 seconds); tables to flush: 2 1(0.0 seconds); 
      [Note] mysqld: Aria engine: recovery done
      [Note] InnoDB: Compressed tables use zlib 1.2.11
      [Note] InnoDB: Number of transaction pools: 1
      [Note] InnoDB: Using crc32 + pclmulqdq instructions
      [Note] InnoDB: Using Linux native AIO
      [Note] InnoDB: Initializing buffer pool, total size = 2.000GiB, chunk size = 32.000MiB
      [Note] InnoDB: Completed initialization of buffer pool
      [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
      [Note] InnoDB: End of log at LSN=180964319
      [Note] InnoDB: Resizing redo log from 12.016KiB to 96.000MiB; LSN=180964319
      [Note] InnoDB: File system buffers for log disabled (block size=512 bytes)
      [Note] InnoDB: Reinitializing innodb_undo_tablespaces= 3 from 0
      [Note] InnoDB: Data file .//undo001 did not exist: new to be created
      [Note] InnoDB: Setting file .//undo001 size to 10.000MiB
      [Note] InnoDB: Database physically writes the file full: wait...
      [Note] InnoDB: Data file .//undo002 did not exist: new to be created
      [Note] InnoDB: Setting file .//undo002 size to 10.000MiB
      [Note] InnoDB: Database physically writes the file full: wait...
      [Note] InnoDB: Data file .//undo003 did not exist: new to be created
      [Note] InnoDB: Setting file .//undo003 size to 10.000MiB
      [Note] InnoDB: Database physically writes the file full: wait...
      [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.
      [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
      [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
      [Note] InnoDB: log sequence number 180964319; transaction id 73558
      [Note] InnoDB: Loading buffer pool(s) from /bitnami/mariadb/data/ib_buffer_pool
      [Note] InnoDB: Cannot open '/bitnami/mariadb/data/ib_buffer_pool' for reading: No such file or directory
      [Note] Plugin 'FEEDBACK' is disabled.
      [Warning] 'innodb-log-files-in-group' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
      [Warning] 'innodb-file-format' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
      [Note] Recovering after a crash using mysql-bin
      [Note] Starting table crash recovery...
      [Note] Crash table recovery finished.
      [Note] Server socket created on IP: '0.0.0.0'.
      [Warning] 'proxies_priv' entry '@% root@db-0' ignored in --skip-name-resolve mode.
      [Note] WSREP: wsrep_init_schema_and_SR (nil)
      [Note] WSREP: Server initialized
      [Note] WSREP: Server status change initializing -> initialized
      [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      [Note] WSREP: Recovered position from storage: 6aa53efc-db72-11ec-880f-a282ce494905:117376
      [Note] WSREP: Starting applier thread 6
      [Note] WSREP: Starting applier thread 7
      [Note] WSREP: Starting applier thread 8
      [Note] WSREP: Recovered view from SST:
        id: 6aa53efc-db72-11ec-880f-a282ce494905:128934
        status: primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(3):
      	0: 46250027-95c8-11ee-9d94-06eb82413a07, db
      	1: 4ed771b5-941f-11ee-82bf-56ec952abcf6, db
      	2: bfa6e688-941f-11ee-97c5-d323f02068c1, db
       
      [ERROR] WSREP: sst_received failed: SST script passed bogus GTID: 6aa53efc-db72-11ec-880f-a282ce494905:117376. Preceding view GTID: 6aa53efc-db72-11ec-880f-a282ce494905:128934
      [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
      [Note] WSREP: Joiner monitor thread ended with total time 14 sec
      [ERROR] WSREP: Application received wrong state: 
      	Received: 00000000-0000-0000-0000-000000000000
      	Required: 6aa53efc-db72-11ec-880f-a282ce494905
      [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
      [Note] WSREP: ReplicatorSMM::abort()
      [Note] WSREP: Closing send monitor...
      [Note] WSREP: Closed send monitor.
      [Note] WSREP: gcomm: terminating thread
      [Note] WSREP: gcomm: joining thread
      [Note] WSREP: gcomm: closing backend
      [Note] /opt/bitnami/mariadb/sbin/mysqld: ready for connections.
      Version: '11.0.3-MariaDB-log'  socket: '/opt/bitnami/mariadb/tmp/mysql.sock'  port: 3306  Source distribution
      [Note] WSREP: view(view_id(NON_PRIM,46250027-9d94,6636) memb {
      	46250027-9d94,0
      } joined {
      } left {
      } partitioned {
      	4ed771b5-82bf,0
      	bfa6e688-97c5,0
      })
      [Note] WSREP: PC protocol downgrade 1 -> 0
      [Note] WSREP: view((empty))
      [Note] WSREP: gcomm: closed
      [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      [Note] WSREP: Flow-control interval: [128, 160]
      [Note] WSREP: Received NON-PRIMARY.
      [Note] WSREP: Shifting JOINER -> OPEN (TO: 128934)
      [Note] WSREP: New SELF-LEAVE.
      [Note] WSREP: Flow-control interval: [0, 0]
      [Note] WSREP: Received SELF-LEAVE. Closing connection.
      [Note] WSREP: Shifting OPEN -> CLOSED (TO: 128934)
      [Note] WSREP: RECV thread exiting 0: Success
      [Note] WSREP: recv_thread() joined.
      [Note] WSREP: Closing replication queue.
      [Note] WSREP: Closing slave action queue.
      [Note] WSREP: /opt/bitnami/mariadb/sbin/mysqld: Terminated.
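
The error in the log above reports that the position recovered from storage after SST (seqno 117376) is older than the view recovered from SST (seqno 128934). As a rough aid for spotting the same pattern in other error logs, the two numbers can be pulled out and compared. This is only a sketch: `check_sst_seqno` is a hypothetical helper, not part of MariaDB or Galera, and it assumes the log contains the "Recovered position from storage" and "Recovered view from SST" messages in the form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MariaDB or Galera): compare the seqno
# recovered from storage after SST with the seqno of the view recovered from SST.
check_sst_seqno() {
    log=$1
    # Matches "... Recovered position from storage: <uuid>:<seqno>"
    recovered=$(sed -n 's/.*Recovered position from storage: [0-9a-f-]*:\([0-9][0-9]*\).*/\1/p' "$log" | tail -n 1)
    # Matches the "id: <uuid>:<seqno>" line printed under "Recovered view from SST:"
    view=$(sed -n 's/^[[:space:]]*id: [0-9a-f-]*:\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$log" | tail -n 1)
    if [ -z "$recovered" ] || [ -z "$view" ]; then
        echo "seqno not found in $log"
        return 2
    fi
    if [ "$recovered" -lt "$view" ]; then
        # Same condition that the "SST script passed bogus GTID" error reports
        echo "stale: recovered=$recovered view=$view"
        return 1
    fi
    echo "ok: recovered=$recovered view=$view"
}
```

Calling `check_sst_seqno` with the path of the joiner's error log prints one summary line and returns non-zero when the recovered position lags behind the view. Only the last occurrence of each message in the file is considered.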
      

      Attachments

        1. datadir.tgz (3.67 MB)
        2. node1.cnf (0.8 kB)
        3. node2_after_prepare.tgz (812 kB)
        4. node2_before_prepare.tgz (818 kB)
        5. node2.cnf (0.8 kB)

          Activity

            marko You are correct that the test does not pass with those settings. However, it does not reproduce the original problem, i.e. I can't find an error message similar to "[ERROR] WSREP: sst_received failed: SST script passed bogus GTID: 6aa53efc-db72-11ec-880f-a282ce494905:117376. Preceding view GTID: 6aa53efc-db72-11ec-880f-a282ce494905:128934". Using manual testing I could reproduce this error. In the last test, did you verify that InnoDB really did create 3 tablespaces?

            (Comment above added by Jan Lindström, janlindstrom.)

            marko I asked ramesh for help creating a test case using a binary tarball.

            (Comment above added by Jan Lindström, janlindstrom.)

            I thought that it was obvious that my suggested change to the existing test does cover the scenario. I reran it for you:

            ./mtr galera.galera_sst_mariabackup
            grep 'InnoDB: Reinitializing innodb_undo_tablespaces' var/log/mysqld.*err
            

            10.11 af4df93cf855228f094f2a19f7dd0bdc005035cf with the 2 patches

            var/log/mysqld.2.err:2024-04-05 16:19:49 0 [Note] InnoDB: Reinitializing innodb_undo_tablespaces= 3 from 0
            var/log/mysqld.2.err:2024-04-05 16:20:24 0 [Note] InnoDB: Reinitializing innodb_undo_tablespaces= 3 from 0
            

            If you want more information, inject an abort() somewhere in srv_undo_tablespaces_reinit(), or simply use rr record and set a breakpoint in rr replay.

            (Comment above added by Marko Mäkelä, marko.)

            Added the suggested test case and rebased to the latest 10.11.

            (Comment above added by Jan Lindström, janlindstrom.)

            janlindstrom As discussed, I reproduced the issue on a debug build:

            2024-04-08  8:44:40 0 [Note] /test/gal/GAL_MD010224-mariadb-10.11.7-linux-x86_64-dbg/bin/mariadbd: ready for connections.
            Version: '10.11.7-MariaDB-debug-log'  socket: '/test/gal/node2/node2.sock'  port: 4600  MariaDB Server
            2024-04-08  8:44:40 3 [ERROR] WSREP: sst_received failed: SST script passed bogus GTID: b10e45b8-f56a-11ee-943e-bae70fe3e57d:8. Preceding view GTID: b10e45b8-f56a-11ee-943e-bae70fe3e57d:11
            2024-04-08  8:44:40 3 [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
            2024-04-08  8:44:40 2 [ERROR] WSREP: Application received wrong state: 
                    Received: 00000000-0000-0000-0000-000000000000
                    Required: b10e45b8-f56a-11ee-943e-bae70fe3e57d
            2024-04-08  8:44:40 0 [Note] WSREP: Joiner monitor thread ended with total time 24 sec
            2024-04-08  8:44:40 2 [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.
            

            Test case

            BASEDIR=/test/gal/GAL_MD010224-mariadb-10.11.7-linux-x86_64-dbg
            WORKDIR=/test/gal
             
            ${BASEDIR}/bin/mysqladmin -uroot -S${WORKDIR}/node2/node2.sock shutdown
            ${BASEDIR}/bin/mysqladmin -uroot -S${WORKDIR}/node1/node1.sock shutdown
             
            rm -rf $WORKDIR/node1 $WORKDIR/node2
             
            ${BASEDIR}/scripts/mariadb-install-db --no-defaults --force --auth-root-authentication-method=normal  --basedir=${BASEDIR} --datadir=$WORKDIR/node1
            ${BASEDIR}/scripts/mariadb-install-db --no-defaults --force --auth-root-authentication-method=normal  --basedir=${BASEDIR} --datadir=$WORKDIR/node2
             
            ${BASEDIR}/bin/mariadbd --defaults-file=${WORKDIR}/n1.cnf --basedir=${BASEDIR} --wsrep-provider=${BASEDIR}/lib/libgalera_smm.so --datadir=$WORKDIR/node1 --socket=$WORKDIR/node1/node1.sock --log-error=$WORKDIR/node1/node1.err --wsrep-new-cluster --innodb-undo-tablespaces=0 > $WORKDIR/node1/node1.err 2>&1 &
             
            sleep 10
             
            # Create the SST user non-interactively so the whole test case can run as a script
            ${BASEDIR}/bin/mariadb -A -uroot -S$WORKDIR/node1/node1.sock -e "delete from mysql.user where user=''; create user mariabackup@'%' identified by 'password'; grant all on *.* to mariabackup@'%'; flush privileges;"
             
            ${BASEDIR}/bin/mariadbd --defaults-file=${WORKDIR}/n2.cnf --basedir=${BASEDIR} --wsrep-provider=${BASEDIR}/lib/libgalera_smm.so --datadir=$WORKDIR/node2 --socket=$WORKDIR/node2/node2.sock --log-error=$WORKDIR/node2/node2.err --innodb-undo-tablespaces=0 > $WORKDIR/node2/node2.err 2>&1 &
             
            sleep 10
             
            ${BASEDIR}/bin/mariadb-admin -uroot -S${WORKDIR}/node2/node2.sock shutdown
             
            ${BASEDIR}/bin/mariadb -A -uroot -S$WORKDIR/node1/node1.sock test -e "create table t1 (id int);"
            ${BASEDIR}/bin/mariadb -A -uroot -S$WORKDIR/node1/node1.sock test -e "insert into t1 values(1),(2),(3);"
             
            rm -rf $WORKDIR/node2/*
             
            ${BASEDIR}/bin/mariadbd --defaults-file=${WORKDIR}/n2.cnf --basedir=${BASEDIR} --wsrep-provider=${BASEDIR}/lib/libgalera_smm.so --datadir=$WORKDIR/node2 --socket=$WORKDIR/node2/node2.sock --log-error=$WORKDIR/node2/node2.err --innodb-undo-tablespaces=3 > $WORKDIR/node2/node2.err 2>&1 &
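
The fixed `sleep 10` calls in the test case above assume that each node is ready within ten seconds, which may not hold on a slow machine or for a longer SST. A possible alternative is to poll until the server answers. The sketch below is an assumption, not part of the original test case: `wait_until` is a hypothetical helper that retries any command once per second up to a timeout.

```shell
#!/bin/sh
# Hypothetical helper: retry a command once per second until it succeeds
# or the timeout (in seconds) expires. Returns 0 on success, 1 on timeout.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        elapsed=$((elapsed + 1))
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Possible use in place of "sleep 10" (paths as in the test case above):
# wait_until 60 ${BASEDIR}/bin/mariadb-admin -uroot -S${WORKDIR}/node1/node1.sock ping
```

Polling with `mariadb-admin ping` only confirms that the server accepts connections. It does not by itself confirm that the node has finished joining the cluster.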
            

            (Comment above, including the log excerpt and test case, added by Ramesh Sivaraman, ramesh.)

            People

              janlindstrom Jan Lindström
              ihti Ihtisham ul Haq
              Votes: 1
              Watchers: 11
