MariaDB Server / MDEV-19983

Galera: Rolling upgrade: Upgraded node 2 cannot rejoin the cluster when rows are added, updated, and removed during the upgrade

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.4.7
    • Fix Version/s: 10.4.9
    • Component/s: Galera
    • Labels:
    • Environment:
      CentOS Linux release 7.6.1810 (Core)

      Description


      During a rolling upgrade of a 3-node Galera cluster from 10.3 to 10.4, the upgraded node 2 cannot rejoin the cluster if rows were added, updated, and removed on the other nodes while it was being upgraded.

      This issue was discovered while testing a rolling upgrade according to "MariaDB 10.4 Cluster Rolling Upgrade - Naive Approach" by Seppo Jaakola: https://docs.google.com/document/d/1z4XTpLpzStWMFaNnrSmiESaIVeCoKhu9Hbb1SrDPf0w

      All binaries are non-debug builds compiled from source:

      MariaDB Server 10.4: branch 10.4, commit 9d6b601e797dd8333340dadaefae09ebafc787db.
      Galera Lib4: branch mariadb-4.x, commit ba337dd0ac281a5e9f29c652a890bd7ad2ac464e.

      MariaDB Server 10.3: branch 10.3, commit 099007c3c92d1405625777fa86d2fba3da1d339c.
      Galera Lib3: branch mariadb-3.x, commit 227e96e457acb60037450bc1e81c45594782e906.

      Steps:

      1. Start 3 MariaDB 10.3 nodes with mtr:
      1.0. export WSREP_PROVIDER=/usr/lib/libgalera_smm_3.so
      1.1. cd mysql-test
      1.2. ./mtr --suite=galera_3nodes --start-and-exit

      NODE 3 UPGRADE:

      2. Copy the [mysqld.3] group from var/my.cnf (attached my.cnf) into a separate configuration file, mysqld.3.cnf (attached mysqld.3.cnf), and make the following edits:

      2.1. Edit:

      [mysqld]
      wsrep_cluster_address='gcomm://127.0.0.1:16003,127.0.0.1:16006,127.0.0.1:16009'
      wsrep_provider=<path to galera 4 library>
      basedir=<10.4 source tree>
      character-sets-dir=<10.4 source tree>/sql/share/charsets
      lc-messages-dir=<10.4 source tree>/sql/share/
      

      2.2. Also add:

      binlog-format=row
      wsrep_sst_method=rsync
      innodb-autoinc-lock-mode=2
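
Step 2 can be partly scripted. The following is a minimal sketch, not the procedure the reporter used: `extract_group` is a hypothetical helper name, and it assumes that each group header in my.cnf sits alone on its own line and that the copied group is renamed to [mysqld] as step 2.1 shows. The path and option edits from steps 2.1 and 2.2 still have to be applied by hand.

```shell
# Print one option group from an ini-style file, renaming its header to [mysqld].
# $1 = group name without brackets (e.g. mysqld.3), $2 = path to the source my.cnf.
extract_group() {
    awk -v want="[$1]" '
        # A new group header starts: remember whether it is the wanted one.
        /^\[/ { inside = ($0 == want); if (inside) print "[mysqld]"; next }
        # Copy option lines only while inside the wanted group.
        inside { print }
    ' "$2"
}

# Possible usage (paths as in the report):
#   extract_group mysqld.3 var/my.cnf > var/mysqld.3.cnf
```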
      

      3. Load test data.

      3.1. Run the client:
      /home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.1.sock

      3.2. Create test table:
      use test;
      create table t (i int primary key auto_increment, j int);

      3.3. Continuously insert rows through each node (one watch loop per node):
      watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16000 -e'insert into test.t(j) values(111)' "
      watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16001 -e'insert into test.t(j) values(222)' "
      watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16002 -e'insert into test.t(j) values(333)' "

      4. Upgrade node 3.

      4.1. Stop the data load for node 3 (-P16002).

      4.2. Stop the server on node 3:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      4.3. Remove some rows on node 1:
      delete from t where i < 10;

      4.4. Update a row on node 1:
      update t set j = 10001 where i = 10;

      4.5. Make sure that wsrep-on is disabled (commented out) in the configuration file:
      vi /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf
      #wsrep-on=1
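
The manual edit in vi could also be scripted. This is a hedged sketch under the assumption that the option appears at most once per file, written as `wsrep-on=` or `wsrep_on=`; the function names are hypothetical. `wsrep_enable` un-comments the line, which is an alternative to appending `wsrep-on=1` again when rejoining the cluster.

```shell
# Comment out an active wsrep-on line in the configuration file given as $1.
wsrep_disable() {
    tmp=$(mktemp) || return 1
    # "&" re-inserts the matched text, so "wsrep-on=1" becomes "#wsrep-on=1".
    sed 's/^[[:space:]]*wsrep[-_]on[[:space:]]*=/#&/' "$1" > "$tmp" &&
        cat "$tmp" > "$1"   # overwrite in place to keep the file's permissions
    rm -f "$tmp"
}

# Remove the leading "#" from a commented-out wsrep-on line in the file given as $1.
wsrep_enable() {
    tmp=$(mktemp) || return 1
    sed 's/^#[[:space:]]*\(wsrep[-_]on[[:space:]]*=\)/\1/' "$1" > "$tmp" &&
        cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Possible usage:
#   wsrep_disable /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf
```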

      4.6. Run 10.4 binaries with 10.3 data:
      /home/stepan/galera/git/10.4/server/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf --wsrep_provider=none

      4.7. Run mysql_upgrade:
      /home/stepan/galera/git/10.4/server/client/mysql_upgrade --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf -uroot -h0 -P16002

      4.8. Stop the Server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      4.9. export PATH=$PATH:/home/stepan/galera/git/10.4/server/scripts

      5. Check upgraded node 3 without the cluster.

      5.1. Start the server:
      /home/stepan/galera/git/10.4/server/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

      5.2. Start the client:
      /home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      Actual result:
      Server version: 10.4.7-MariaDB-log Source distribution

      5.3. Stop the Server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      6. Join node 3 back to the cluster.

      6.1. Add to /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf:

      wsrep-on=1
      

      6.2. Start the server:
      /home/stepan/galera/git/10.4/server/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

      6.3. Check the ports:

      $ sudo netstat -tulpn | grep mysqld
      tcp        0      0 0.0.0.0:16006           0.0.0.0:*               LISTEN      7328/mysqld
      tcp        0      0 0.0.0.0:16009           0.0.0.0:*               LISTEN      10501/mysqld
      tcp        0      0 127.0.0.1:16000         0.0.0.0:*               LISTEN      7327/mysqld
      tcp        0      0 127.0.0.1:16001         0.0.0.0:*               LISTEN      7328/mysqld
      tcp        0      0 127.0.0.1:16002         0.0.0.0:*               LISTEN      10501/mysqld
      tcp        0      0 0.0.0.0:16003           0.0.0.0:*               LISTEN      7327/mysqld
      

      6.4. Start the client:
      /home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      6.5. Check WSREP cluster status:

      SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%';
      +----------------------------+----------------------+
      | Variable_name              | Value                |
      +----------------------------+----------------------+
      | wsrep_cluster_weight       | 3                    |
      | wsrep_cluster_capabilities |                      |
      | wsrep_cluster_conf_id      | 18446744073709551615 |
      | wsrep_cluster_size         | 0                    |
      | wsrep_cluster_state_uuid   |                      |
      | wsrep_cluster_status       | Primary              |
      +----------------------------+----------------------+
      

      6.6. Check the presence of the new 10.4 tables:

      use mysql;
      show tables LIKE 'wsrep_%';
      +---------------------------+
      | Tables_in_mysql (wsrep_%) |
      +---------------------------+
      | wsrep_cluster             |
      | wsrep_cluster_members     |
      | wsrep_streaming_log       |
      +---------------------------+
      

      Actual result:
      Upgraded node 3 has joined the cluster successfully.

      NODE 2 UPGRADE:

      7. Create mysqld.2.cnf from the [mysqld.2] group in the same way as described in step 2.

      8. Load test data.

      8.1. Continuously insert rows through each node (one watch loop per node):
      watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16000 -e'insert into test.t(j) values(111)' "
      watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16001 -e'insert into test.t(j) values(222)' "
      watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16002 -e'insert into test.t(j) values(333)' "

      9. Upgrade node 2.

      9.1. Stop the data load for node 2 (-P16001).

      9.2. Stop the server on node 2:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock

      9.3. Remove some rows on node 1:
      delete from t where i < 20;

      9.4. Update a row on node 1:
      update t set j = 20002 where i = 20;

      9.5. Make sure that wsrep-on is disabled (commented out) in the configuration file:
      vi /home/stepan/mariadb/10.3/mysql-test/var/mysqld.2.cnf
      #wsrep-on=1

      9.6. Run 10.4 binaries with 10.3 data:
      /home/stepan/galera/git/10.4/server/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.2.cnf --wsrep_provider=none

      9.7. Run mysql_upgrade:
      /home/stepan/galera/git/10.4/server/client/mysql_upgrade --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.2.cnf -uroot -h0 -P16001

      9.8. Stop the Server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock

      9.9. export PATH=$PATH:/home/stepan/galera/git/10.4/server/scripts

      10. Check upgraded node 2 without the cluster.

      10.1. Start the server:
      /home/stepan/galera/git/10.4/server/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.2.cnf

      10.2. Start the client:
      /home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock
      Actual result:
      Server version: 10.4.7-MariaDB-log Source distribution

      10.3. Stop the Server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock

      11. Join node 2 back to the cluster.

      11.1. Add to /home/stepan/mariadb/10.3/mysql-test/var/mysqld.2.cnf:

      wsrep-on=1
      

      11.2. Start the server:
      /home/stepan/galera/git/10.4/server/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.2.cnf

      Actual results:

      11.3. Port 16006 (the node 2 address listed in wsrep_cluster_address) is not listening:

      $ sudo netstat -tulpn | grep mysqld
      tcp        0      0 0.0.0.0:16009           0.0.0.0:*               LISTEN      10501/mysqld
      tcp        0      0 127.0.0.1:16000         0.0.0.0:*               LISTEN      7327/mysqld
      tcp        0      0 127.0.0.1:16001         0.0.0.0:*               LISTEN      14576/mysqld
      tcp        0      0 127.0.0.1:16002         0.0.0.0:*               LISTEN      10501/mysqld
      tcp        0      0 0.0.0.0:16003           0.0.0.0:*               LISTEN      7327/mysqld
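
The port check above can be automated with a small filter over the netstat output. This is a sketch only: `is_listening` is a hypothetical helper, and it assumes the column layout of `netstat -tulpn` shown in this report (local address in column 4, state in column 6).

```shell
# Succeed if TCP port $1 is in LISTEN state in `netstat -tulpn` output read from stdin.
is_listening() {
    awk -v port="$1" '
        # Take the part after the last ":" of the local address as the port.
        $6 == "LISTEN" { n = split($4, a, ":"); if (a[n] == port) found = 1 }
        END { exit !found }
    '
}

# Possible usage:
#   sudo netstat -tulpn | grep mysqld | is_listening 16006 || echo "16006 is not listening"
```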
      

      11.4. Run the client on node 2:
      /home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock

      11.5. Node 2 is NOT connected to the cluster:

      SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%';
      +----------------------------+--------------------------------------+
      | Variable_name              | Value                                |
      +----------------------------+--------------------------------------+
      | wsrep_cluster_capabilities |                                      |
      | wsrep_cluster_conf_id      | 18446744073709551615                 |
      | wsrep_cluster_size         | 0                                    |
      | wsrep_cluster_state_uuid   | e3c49d03-a155-11e9-80eb-a614fcadb109 |
      | wsrep_cluster_status       | Disconnected                         |
      +----------------------------+--------------------------------------+
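
To compare nodes without reading the tables by eye, a single value can be pulled out of the boxed client output. A minimal sketch, assuming the table format shown above; `status_value` is a hypothetical helper name.

```shell
# Print the value of status variable $1 from a boxed result table read from stdin.
status_value() {
    awk -F'|' -v name="$1" '
        # Data rows look like "| name | value |"; border rows contain no "|".
        NF >= 3 {
            key = $2; val = $3
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim the column padding
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == name) print val
        }
    '
}

# Possible usage (--table keeps the boxed format when the output is piped):
#   mysql --table -u root -S .../mysqld.2.sock \
#       -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%'" | status_value wsrep_cluster_status
```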
      

      11.6. Data from the test table cannot be selected:

      select * from t;
      ERROR 1047 (08S01): WSREP has not yet prepared node for application use
      

      11.7. mysqld.2.err: [ERROR] WSREP: Failed to JOIN the cluster after SST:

      2019-07-08 11:49:36 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2703)
      2019-07-08 11:49:36 3 [Note] WSREP: Requesting state transfer: success, donor: 0
      2019-07-08 11:49:36 0 [ERROR] WSREP: got asio system error while reading IST stream: asio.system:104
      2019-07-08 11:49:36 0 [ERROR] WSREP: IST didn't contain all write sets, expected last: 2703 last received: -1
      [...]
      2019-07-08 11:49:38 4 [Warning] WSREP: View recovered from stable storage was empty. If the server is doing rolling upgrade from previous version which does not support storing view info into stable storage, this is ok. Otherwise this may be a sign of malfunction.
      2019-07-08 11:49:38 12 [Note] WSREP: Cluster table is empty, not recovering transactions
      2019-07-08 11:49:38 4 [Note] WSREP: SST received: e3c49d03-a155-11e9-80eb-a614fcadb109:1523
      2019-07-08 11:49:38 3 [Warning] WSREP: moving position backwards: 2703 -> 1523
      [...]
      2019-07-08 11:49:38 0 [Note] /home/stepan/galera/git/10.4/server/sql/mysqld: ready for connections.
      Version: '10.4.7-MariaDB-log'  socket: '/home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock'  port: 16001  Source distribution
      2019-07-08 11:49:38 3 [Note] WSREP: Receiving IST: 1180 writesets, seqnos 1524-2703
      2019-07-08 11:49:38 3 [ERROR] WSREP: Receiving IST failed, node restart required: IST receiver reported failure: 71 (Protocol error)
      	 at galera/src/replicator_smm.hpp:pop_front():320. Null event.
      [...]
      2019-07-08 11:49:38 3 [Note] WSREP: IST received: e3c49d03-a155-11e9-80eb-a614fcadb109:-1
      2019-07-08 11:49:38 3 [Warning] WSREP: Sending JOIN failed: -103 (Software caused connection abort). Will retry in new primary component.
      2019-07-08 11:49:38 3 [ERROR] WSREP: Failed to JOIN the cluster after SST
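
The excerpt above was trimmed by hand. A filter along these lines would pull the same WSREP error lines out of the attached logs; this is a sketch that assumes the `[ERROR] WSREP:` prefix seen in this report, and `wsrep_errors` is a hypothetical name.

```shell
# Print only the WSREP error lines of the error log given as $1.
wsrep_errors() {
    # -F matches the pattern as a fixed string, so the brackets are literal.
    grep -F '[ERROR] WSREP:' "$1"
}

# Possible usage:
#   wsrep_errors var/log/mysqld.2.err
```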
      

      11.8. mysqld.3.err:

      2019-07-08 11:49:36 0 [Note] WSREP: async IST sender starting to serve tcp://127.0.0.1:16007 sending 1524-157
      2019-07-08 11:49:36 0 [ERROR] WSREP: async IST sender failed to serve tcp://127.0.0.1:16007: sender send first greater than last: 1524 > 157: 22 (Invalid argument)
      	 at galera/src/ist.cpp:send():783
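
The two logs disagree on the IST range: node 2 expects seqnos 1524-2703, while the donor (node 3) tries to send 1524-157, where the first seqno is greater than the last. The arithmetic can be checked with a small helper. This is an illustration of the comparison the donor's message describes, not Galera's code, and `ist_range_check` is a hypothetical name.

```shell
# Given an IST range "first-last", print how many write sets it covers,
# or report an inverted range and fail.
ist_range_check() {
    first=${1%-*}   # text before the last "-"
    last=${1#*-}    # text after the first "-"
    if [ "$first" -gt "$last" ]; then
        echo "invalid: first $first > last $last"
        return 1
    fi
    echo $((last - first + 1))
}

ist_range_check 1524-2703   # range expected by node 2: prints 1180
ist_range_check 1524-157    # range offered by node 3: prints "invalid: first 1524 > last 157"
```

The first result matches the "1180 writesets" in the mysqld.2.err line above.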
      

      Expected result:
      The upgraded node 2 successfully rejoins the cluster even when rows are added, updated, and removed during the upgrade.

      Other logs are also attached.

        Attachments

        1. stdout.log
          1 kB
        2. mysqld.3.err
          51 kB
        3. mysqld.3.cnf
          1 kB
        4. mysqld.2.err
          49 kB
        5. mysqld.2.cnf
          1 kB
        6. mysqld.1.err
          29 kB
        7. my.cnf
          9 kB

              People

              Assignee:
              stepan.patryshev Stepan Patryshev
              Reporter:
              stepan.patryshev Stepan Patryshev