MDEV-18699 (MariaDB Server)

Galera: Rolling upgrade: Upgraded node is stopped on commit if wsrep_trx_fragment_size > 0


Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.4.3
    • Fix Version/s: 10.4.5
    • Component/s: Galera
    • Environment: CentOS Linux release 7.6.1810 (Core)

    Description


      Galera: Rolling upgrade: a node upgraded to 10.4 is stopped with signal 6 on commit after rejoining a cluster whose other nodes have not yet been upgraded, if wsrep_trx_fragment_size > 0.

      This issue was discovered while testing a rolling upgrade according to "MariaDB 10.4 Cluster Rolling Upgrade - Naive Approach" by Seppo Jaakola: https://docs.google.com/document/d/1z4XTpLpzStWMFaNnrSmiESaIVeCoKhu9Hbb1SrDPf0w

      10.4.3-MariaDB-debug built from sources: commit f0b65102b23f006f596eef35e6e5f4f8b6d8146d
      galera4 lib: Galera 26.4.0, commit 9cdbeb86c330b808571b14270e6428accb899c58

      Steps:

      1. Start 3 MariaDB 10.3 nodes with mtr:
      1.0. export WSREP_PROVIDER=/usr/lib/libgalera_smm_3.so
      1.1. cd mysql-test
      1.2. "./mtr --suite=galera_3nodes --start-and-exit"

      2. Copy the [mysqld.3] group from var/my.cnf (attached as my.cnf) into a separate configuration file, mysqld.3.cnf (attached), and make the following edits:

      2.1. Edit:

      wsrep_cluster_address='gcomm://127.0.0.1:16003,127.0.0.1:16006,127.0.0.1:16009'
      wsrep_provider=<path to galera 4 library>
      basedir=<10.4 source tree>
      character-sets-dir=<10.4 source tree>/sql/share/charsets
      lc-messages-dir=<10.4 source tree>/sql/share/
      

      2.2. Also add:

      binlog-format=row
      wsrep_sst_method=rsync
      innodb-autoinc-lock-mode=2
      

      3.1. Load some data.
      3.2. Stop data loading.

      4. Upgrade node 3.

      4.1. Stop the Server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      4.2. Make sure that wsrep-on is off by commenting it out in mysqld.3.cnf:
      sudo vi /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf
      #wsrep-on=1

      4.3. Start the 10.4 server on the 10.3 data, without a wsrep provider:
      /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf --wsrep_provider=none

      4.4. Run mysql_upgrade:
      /home/stepan/mariadb/10.4/client/mysql_upgrade --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf -uroot -h0 -P16002

      4.5. Stop the Server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      4.6. export PATH=$PATH:/home/stepan/mariadb/10.4/scripts

      5. Check the upgraded node 3 outside the cluster.

      5.1. Start the server:
      /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

      5.2. Start the client:
      /home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      Actual result:
      Server version: 10.4.3-MariaDB-debug-log Source distribution

      5.3. Stop the Server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      6. Join node 3 back to the cluster.

      6.1. Add to /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf:

      wsrep-on=1
      

      6.2. Start the server:
      /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

      7. Check how streaming replication behaves on the partially upgraded cluster.

      7.1. Start a client for each of the three nodes:

      /home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.1.sock
      /home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock
      /home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      7.2. Check with the default wsrep_trx_fragment_size (0, i.e. streaming replication disabled).

      7.2.1. On node 3:

      START TRANSACTION;
      update t set j = 28700 where i = 287;
      update t set j = 28900 where i = 289;
      

      Actual result:
      The rows updated on node 3 are not yet updated on nodes 1 and 2.

      7.2.2. On node 3:

      commit;
      

      Actual result:
      The rows updated on node 3 are updated on nodes 1 and 2 only after the commit.

      7.3. Check with wsrep_trx_fragment_size > 0.

      7.3.1. Set wsrep_trx_fragment_size > 0 on node 3:

      SET SESSION wsrep_trx_fragment_size = 1;
      Query OK, 0 rows affected (0.000 sec)
       
      MariaDB [test]> SHOW VARIABLES LIKE 'wsrep_trx%';
      +-------------------------+-------+
      | Variable_name           | Value |
      +-------------------------+-------+
      | wsrep_trx_fragment_size | 1     |
      | wsrep_trx_fragment_unit | bytes |
      +-------------------------+-------+
      

      7.3.2. On node 3:

      START TRANSACTION;
      update t set j = 28300 where i = 283;
      

      Actual result:
      The row updated on node 3 is already updated on nodes 1 and 2 before the commit.

      7.3.3. On node 3:

      commit;
      

      Actual result:

      Node 3 stopped:
      Client:

      ERROR 2013 (HY000): Lost connection to MySQL server during query

      mysqld.3.err:

      190222 20:57:14 [ERROR] mysqld got signal 6 ;

      Expected result:
      The upgraded node 3 does NOT stop on commit while joined to a cluster whose other nodes have not yet been upgraded, even if wsrep_trx_fragment_size > 0.

      Other log and config files are also attached.

      Attachments

        1. my.cnf (8 kB)
        2. mysqld.1.err (31 kB)
        3. mysqld.2.err (27 kB)
        4. mysqld.3.cnf (1 kB)
        5. mysqld.3.err (60 kB)

        People

          Assignee: Seppo Jaakola
          Reporter: Stepan Patryshev (Inactive)