MariaDB Server / MDEV-18629

Galera: Rolling upgrade: Upgraded node is stopped with signal 6 on attempt to join the cluster

    Details

      Description

      After being upgraded, node 3 aborts with signal 6 (SIGABRT) when it attempts to join a cluster whose other nodes have not yet been upgraded.

      This issue was discovered while testing a rolling upgrade following "MariaDB 10.4 Cluster Rolling Upgrade - Naive Approach" by Seppo Jaakola: https://docs.google.com/document/d/1z4XTpLpzStWMFaNnrSmiESaIVeCoKhu9Hbb1SrDPf0w

      10.4.3-MariaDB-debug built from sources: commit 3f154943db8bc4135fb3c60b8a74f926b65a040b
      galera4 lib: commit 9cdbeb86c330b808571b14270e6428accb899c58

      Steps:

      1. Start 3 MariaDB 10.3 nodes with mtr:
      1.0. export WSREP_PROVIDER=/usr/lib/libgalera_smm_3.so
      1.1. cd mysql-test
      1.2. "./mtr --suite=galera_3nodes --start-and-exit"

      Actual results:
      3 servers are running:

      Started [mysqld.1 - pid: 8432, winpid: 8432] [mysqld.2 - pid: 8433, winpid: 8433] [mysqld.3 - pid: 8434, winpid: 8434]
      worker[1] Using config for test galera_3nodes.galera_certification_ccc
      worker[1] Port and socket path for server(s):
      worker[1] mysqld.1  16000  /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.1.sock
      worker[1] mysqld.2  16001  /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock
      worker[1] mysqld.3  16002  /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock
       
      $ ps -ef | grep mysqld
      stepan     8432      1  1 19:46 pts/2    00:00:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
      stepan     8433      1  1 19:46 pts/2    00:00:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.2 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
      stepan     8434      1  1 19:46 pts/2    00:00:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.3 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
      

      The following ports are in use:

      $ sudo netstat -tulpn | grep mysqld
      tcp        0      0 0.0.0.0:16009           0.0.0.0:*               LISTEN      198749/mysqld
      tcp        0      0 127.0.0.1:16000         0.0.0.0:*               LISTEN      198747/mysqld
      tcp        0      0 127.0.0.1:16001         0.0.0.0:*               LISTEN      198748/mysqld
      tcp        0      0 127.0.0.1:16002         0.0.0.0:*               LISTEN      198749/mysqld
      tcp        0      0 0.0.0.0:16003           0.0.0.0:*               LISTEN      198747/mysqld
      tcp        0      0 0.0.0.0:16006           0.0.0.0:*               LISTEN      198748/mysqld
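
The port check above can be scripted so that it is repeatable after each step. This is a minimal sketch, assuming the `netstat -tulpn` column layout shown in this report; the function names `listening_ports` and `missing_ports` are hypothetical helpers, not part of mtr or MariaDB.

```shell
#!/bin/sh
# listening_ports: read `netstat -tulpn` output on stdin and print the
# local port numbers of LISTEN sockets owned by mysqld, sorted numerically.
listening_ports() {
    awk '$6 == "LISTEN" && $7 ~ /\/mysqld$/ {
             n = split($4, parts, ":")   # local address is host:port
             print parts[n]
         }' | sort -n
}

# missing_ports: print each expected port (given as arguments) that is
# absent from the port list read on stdin; prints nothing if all are there.
missing_ports() {
    found=$(cat)
    for port in "$@"; do
        printf '%s\n' "$found" | grep -qx "$port" || printf '%s\n' "$port"
    done
}
```

For example, `sudo netstat -tulpn | listening_ports | missing_ports 16000 16001 16002 16003 16006 16009` prints nothing while all six ports from this report are open.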
      

      2. Copy the [mysqld.3] group from var/my.cnf (attached as my.cnf) into a separate configuration file, mysqld.3.cnf (attached as mysqld.3.cnf), and make the following edits:

      2.1. Edit:

      wsrep_cluster_address='gcomm://127.0.0.1:16003,127.0.0.1:16006,127.0.0.1:16009'
      wsrep_provider=<path to galera 4 library>
      basedir=<10.4 source tree>
      character-sets-dir=<10.4 source tree>/sql/share/charsets
      lc-messages-dir=<10.4 source tree>/sql/share/
      

      2.2. Also add:

      binlog-format=row
      wsrep_sst_method=rsync
      innodb-autoinc-lock-mode=2
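
The additions in step 2.2 can be applied idempotently, so that re-running the preparation does not duplicate lines. This is a sketch only, assuming mysqld.3.cnf contains just the single copied group (so appending at the end of the file is safe); `ensure_option` and `add_step_2_2_options` are hypothetical helper names.

```shell
#!/bin/sh
# ensure_option: append "key=value" to a config file unless a line
# starting with "key=" (optionally indented) is already present.
ensure_option() {
    file=$1 key=$2 value=$3
    if ! grep -q "^[[:space:]]*${key}=" "$file"; then
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# add_step_2_2_options: apply the three settings listed in step 2.2.
add_step_2_2_options() {
    ensure_option "$1" binlog-format row
    ensure_option "$1" wsrep_sst_method rsync
    ensure_option "$1" innodb-autoinc-lock-mode 2
}
```

Note that the sketch matches the literal spelling only. MariaDB accepts dashes and underscores interchangeably in option names, so a file that already contains `binlog_format=...` would get a second entry.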
      

      3.1. Load some data.
      3.2. Stop data loading.

      4. Upgrade node 3.

      4.1. Stop the Server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      4.2. Make sure that wsrep-on is off:
      sudo vi /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf
      #wsrep-on=1
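
The manual edit in step 4.2 could also be done non-interactively. This is a minimal sketch under the assumption that the option appears on its own line as `wsrep-on=...` or `wsrep_on=...`; `disable_wsrep_on` is a hypothetical helper name.

```shell
#!/bin/sh
# disable_wsrep_on: comment out any active wsrep-on / wsrep_on line in
# the given config file; lines that are already commented stay untouched.
disable_wsrep_on() {
    file=$1
    tmp=$(mktemp) || return 1
    # Write to a temporary file first, then copy back, so the original
    # file keeps its ownership and permissions.
    sed 's/^\([[:space:]]*\)\(wsrep[-_]on[[:space:]]*=\)/\1#\2/' "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

Usage: `disable_wsrep_on /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf`.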

      4.3. Run 10.4 binaries with 10.3 data:
      /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf --wsrep_provider=none

      4.4. Run mysql_upgrade:
      /home/stepan/mariadb/10.4/client/mysql_upgrade --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf -uroot -h0 -P16002

      Actual result:

      Phase 1/7: Checking and upgrading mysql database
      Processing databases
      mysql
      mysql.column_stats                                 OK
      mysql.columns_priv                                 OK
      mysql.db                                           OK
      mysql.event                                        OK
      mysql.func                                         OK
      mysql.gtid_slave_pos                               OK
      mysql.help_category                                OK
      mysql.help_keyword                                 OK
      mysql.help_relation                                OK
      mysql.help_topic                                   OK
      mysql.host                                         OK
      mysql.index_stats                                  OK
      mysql.innodb_index_stats                           OK
      mysql.innodb_table_stats                           OK
      mysql.plugin                                       OK
      mysql.proc                                         OK
      mysql.procs_priv                                   OK
      mysql.proxies_priv                                 OK
      mysql.roles_mapping                                OK
      mysql.servers                                      OK
      mysql.table_stats                                  OK
      mysql.tables_priv                                  OK
      mysql.time_zone                                    OK
      mysql.time_zone_leap_second                        OK
      mysql.time_zone_name                               OK
      mysql.time_zone_transition                         OK
      mysql.time_zone_transition_type                    OK
      mysql.transaction_registry                         OK
      mysql.user                                         OK
      Phase 2/7: Installing used storage engines... Skipped
      Phase 3/7: Fixing views
      Phase 4/7: Running 'mysql_fix_privilege_tables'
      Phase 5/7: Fixing table and database names
      Phase 6/7: Checking and upgrading tables
      Processing databases
      information_schema
      mtr
      mtr.global_suppressions                            OK
      mtr.test_suppressions                              OK
      performance_schema
      test
      test.t                                             OK
      Phase 7/7: Running 'FLUSH PRIVILEGES'
      OK
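
With longer table lists it is easy to overlook a table that did not report OK. The following sketch filters saved mysql_upgrade output, assuming each table is reported on a single line as `db.table STATUS` as in the output above; other output shapes are not handled, and `upgrade_failures` is a hypothetical helper name.

```shell
#!/bin/sh
# upgrade_failures: read mysql_upgrade output on stdin and print every
# "db.table ... STATUS" line whose last field is not OK.
upgrade_failures() {
    awk '$1 ~ /^[^.]+\.[^.]+$/ && NF >= 2 && $NF != "OK" { print }'
}
```

For the run shown above, piping the output through `upgrade_failures` prints nothing, since every table reported OK.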
      

      4.5. Stop the Server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      4.6. export PATH=$PATH:/home/stepan/mariadb/10.4/scripts

      5. Check upgraded node 3 without the cluster.

      5.1. Start the server:
      /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

      5.2. Start the client:
      /home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      Actual result:
      Server version: 10.4.3-MariaDB-debug-log Source distribution

      5.3. Stop the Server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      6. Try to join node 3 back to the cluster.

      6.1. Add to /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf:

      wsrep-on=1
      

      6.2. Start the server:
      /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

      Actual results:

      Upgraded node 3 has stopped:

      $ ps -ef | grep mysqld
      stepan   198747      1  0 12:21 pts/2    00:00:23 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
      stepan   198748      1  0 12:21 pts/2    00:00:23 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.2 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
      stepan   233635 200752  0 14:44 pts/5    00:00:00 grep --color=auto mysqld
      [stepan@cnt7glr11 10.4]$ sudo netstat -tulpn | grep mysqld
      tcp        0      0 127.0.0.1:16000         0.0.0.0:*               LISTEN      198747/mysqld
      tcp        0      0 127.0.0.1:16001         0.0.0.0:*               LISTEN      198748/mysqld
      tcp        0      0 0.0.0.0:16003           0.0.0.0:*               LISTEN      198747/mysqld
      tcp        0      0 0.0.0.0:16006           0.0.0.0:*               LISTEN      198748/mysqld
      

      mysqld.3.err (attached Node3_join_mysqld.3.err):

      2019-02-18 14:44:43 0 [Note] /home/stepan/mariadb/10.4/sql/mysqld: ready for connections.
      Version: '10.4.3-MariaDB-debug-log'  socket: '/home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock'  port: 16002  Source distribution
      mysqld: galera/src/monitor.hpp:219: void galera::Monitor<C>::self_cancel(C&) [with C = galera::ReplicatorSMM::ApplyOrder]: Assertion `obj_seqno > last_left_' failed.
      190218 14:44:43 [ERROR] mysqld got signal 6 ;
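
To locate the failure quickly in the attached logs of all three nodes, the assertion and signal lines can be extracted with grep. This is a sketch that assumes the message formats seen in this report ("Assertion ... failed." and "mysqld got signal N"); `crash_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# crash_lines: print assertion failures and fatal-signal lines found in
# the error log files given as arguments, prefixed with the file name.
# The extra /dev/null argument makes grep print file names even when
# only one log file is passed.
crash_lines() {
    grep -E 'Assertion .* failed\.|mysqld got signal [0-9]+' "$@" /dev/null
}
```

Usage: `crash_lines Node3_join_mysqld.*.err`.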
      

      7. Restart node 3:
      /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

      Actual result:
      Nodes 1 and 2 have stopped:

      mysqld.2.err (attached Node3_restart_mysqld.2.err):

      2019-02-18 15:16:05 1 [Note] WSREP: New cluster view: global state: 7e61d751-2fa7-11e9-87ba-63ada45808dc:397, view# 7: Primary, number of nodes: 3, my index: 0, protocol version 3
      2019-02-18 15:16:05 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-02-18 15:16:05 1 [Note] WSREP: REPL Protocols: 9 (4, 2)
      2019-02-18 15:16:05 1 [Note] WSREP: Assign initial position for certification: 397, protocol version: 4
      2019-02-18 15:16:05 0 [Note] WSREP: Service thread queue flushed.
      mysqld: gcs/src/gcs_group.cpp:1186: int gcs_group_find_donor(const gcs_group_t*, int, int, const char*, int, const gu_uuid_t*, gcs_seqno_t): Assertion `ist_seqno != GCS_SEQNO_ILL' failed.
      190218 15:16:05 [ERROR] mysqld got signal 6 ;
      

      Expected:
      The upgraded node 3 joins the cluster of not yet upgraded nodes and continues to operate normally.

      Other logs and config files are also attached.

        Attachments

        1. my.cnf
          8 kB
        2. mysqld.3.cnf
          1 kB
        3. Node3_join_mysqld.1.err
          26 kB
        4. Node3_join_mysqld.2.err
          21 kB
        5. Node3_join_mysqld.3.err
          42 kB
        6. Node3_restart_mysqld.1.err
          30 kB
        7. Node3_restart_mysqld.2.err
          26 kB
        8. Node3_restart_mysqld.3.err
          53 kB

              People

              • Assignee:
                stepan.patryshev Stepan Patryshev
                Reporter:
                stepan.patryshev Stepan Patryshev