  1. MariaDB Server
  2. MDEV-18629

Galera: Rolling upgrade: Upgraded node aborts with signal 6 when attempting to join the cluster

Details

    Description

      Upgraded node 3 aborts with signal 6 when it attempts to join a cluster whose other nodes have not yet been upgraded.

      This issue was discovered while testing a rolling upgrade according to "MariaDB 10.4 Cluster Rolling Upgrade - Naive Approach" by Seppo Jaakola: https://docs.google.com/document/d/1z4XTpLpzStWMFaNnrSmiESaIVeCoKhu9Hbb1SrDPf0w

      10.4.3-MariaDB-debug built from sources: commit 3f154943db8bc4135fb3c60b8a74f926b65a040b
      galera4 lib: commit 9cdbeb86c330b808571b14270e6428accb899c58

      Steps:

      1. Start 3 MariaDB 10.3 nodes with mtr:
      1.0. export WSREP_PROVIDER=/usr/lib/libgalera_smm_3.so
      1.1. cd mysql-test
      1.2. "./mtr --suite=galera_3nodes --start-and-exit"

      Actual results:
      3 servers are running:

      Started [mysqld.1 - pid: 8432, winpid: 8432] [mysqld.2 - pid: 8433, winpid: 8433] [mysqld.3 - pid: 8434, winpid: 8434]
      worker[1] Using config for test galera_3nodes.galera_certification_ccc
      worker[1] Port and socket path for server(s):
      worker[1] mysqld.1  16000  /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.1.sock
      worker[1] mysqld.2  16001  /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock
      worker[1] mysqld.3  16002  /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock
       
      $ ps -ef | grep mysqld
      stepan     8432      1  1 19:46 pts/2    00:00:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
      stepan     8433      1  1 19:46 pts/2    00:00:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.2 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
      stepan     8434      1  1 19:46 pts/2    00:00:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.3 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
      

      The following ports are in use:

      $ sudo netstat -tulpn | grep mysqld
      tcp        0      0 0.0.0.0:16009           0.0.0.0:*               LISTEN      198749/mysqld
      tcp        0      0 127.0.0.1:16000         0.0.0.0:*               LISTEN      198747/mysqld
      tcp        0      0 127.0.0.1:16001         0.0.0.0:*               LISTEN      198748/mysqld
      tcp        0      0 127.0.0.1:16002         0.0.0.0:*               LISTEN      198749/mysqld
      tcp        0      0 0.0.0.0:16003           0.0.0.0:*               LISTEN      198747/mysqld
      tcp        0      0 0.0.0.0:16006           0.0.0.0:*               LISTEN      198748/mysqld
      
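In the listing above, each mysqld process has one listener bound to 0.0.0.0 (16003, 16006, 16009), and these are the ports that appear in the wsrep_cluster_address used in step 2.1. A minimal sketch of deriving that address from the netstat output follows. The function name `gcomm_address` is hypothetical, and it assumes that every mysqld listener on 0.0.0.0 is a Galera port and that all nodes run on 127.0.0.1, as in this setup.

```shell
# Build a gcomm:// address from `netstat -tulpn` output read on stdin.
# Assumption: mysqld listeners bound to 0.0.0.0 are the Galera ports,
# and all nodes are reachable on 127.0.0.1 (true for this mtr setup).
gcomm_address() {
  # Field 4 is the local address; the last field is "<pid>/mysqld".
  awk '$4 ~ /^0\.0\.0\.0:/ && $NF ~ /mysqld$/ { split($4, a, ":"); print a[2] }' |
    sort -n |
    awk '{ printf "%s127.0.0.1:%s", (NR > 1 ? "," : ""), $1 } END { if (NR) print "" }' |
    sed 's|^|gcomm://|'
}

# Usage (not run here):
#   sudo netstat -tulpn | grep mysqld | gcomm_address
```

The ports are sorted numerically so that the result does not depend on the order in which netstat lists the sockets.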

      2. Copy the [mysqld.3] group from var/my.cnf (attached as my.cnf) into a separate configuration file, mysqld.3.cnf (attached as mysqld.3.cnf), and make the following edits:

      2.1. Edit:

      wsrep_cluster_address='gcomm://127.0.0.1:16003,127.0.0.1:16006,127.0.0.1:16009'
      wsrep_provider=<path to galera 4 library>
      basedir=<10.4 source tree>
      character-sets-dir=<10.4 source tree>/sql/share/charsets
      lc-messages-dir=<10.4 source tree>/sql/share/
      

      2.2. Also add:

      binlog-format=row
      wsrep_sst_method=rsync
      innodb-autoinc-lock-mode=2
      
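Copying the [mysqld.3] group in step 2 can be done by hand, or with a small helper such as the sketch below. The function name `extract_group` is hypothetical. It prints the group header and every line up to the next `[...]` header unchanged. The edits from steps 2.1 and 2.2 still have to be applied afterwards.

```shell
# Print one option group (header included) from a my.cnf-style file.
#   $1 = group name without brackets, e.g. mysqld.3
#   $2 = path to the configuration file
extract_group() {
  awk -v g="[$1]" '
    /^\[/    { in_group = ($0 == g) }  # a new header starts or ends the group
    in_group { print }
  ' "$2"
}

# Usage (not run here):
#   extract_group mysqld.3 var/my.cnf > var/mysqld.3.cnf
```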

      3.1. Load some data.
      3.2. Stop data loading.

      4. Upgrade node 3.

      4.1. Stop the server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      4.2. Make sure that wsrep-on is off by commenting it out in the configuration file:
      sudo vi /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf
      #wsrep-on=1

      4.3. Run 10.4 binaries with 10.3 data:
      /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf --wsrep_provider=none

      4.4. Run mysql_upgrade:
      /home/stepan/mariadb/10.4/client/mysql_upgrade --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf -uroot -h0 -P16002

      Actual result:

      Phase 1/7: Checking and upgrading mysql database
      Processing databases
      mysql
      mysql.column_stats                                 OK
      mysql.columns_priv                                 OK
      mysql.db                                           OK
      mysql.event                                        OK
      mysql.func                                         OK
      mysql.gtid_slave_pos                               OK
      mysql.help_category                                OK
      mysql.help_keyword                                 OK
      mysql.help_relation                                OK
      mysql.help_topic                                   OK
      mysql.host                                         OK
      mysql.index_stats                                  OK
      mysql.innodb_index_stats                           OK
      mysql.innodb_table_stats                           OK
      mysql.plugin                                       OK
      mysql.proc                                         OK
      mysql.procs_priv                                   OK
      mysql.proxies_priv                                 OK
      mysql.roles_mapping                                OK
      mysql.servers                                      OK
      mysql.table_stats                                  OK
      mysql.tables_priv                                  OK
      mysql.time_zone                                    OK
      mysql.time_zone_leap_second                        OK
      mysql.time_zone_name                               OK
      mysql.time_zone_transition                         OK
      mysql.time_zone_transition_type                    OK
      mysql.transaction_registry                         OK
      mysql.user                                         OK
      Phase 2/7: Installing used storage engines... Skipped
      Phase 3/7: Fixing views
      Phase 4/7: Running 'mysql_fix_privilege_tables'
      Phase 5/7: Fixing table and database names
      Phase 6/7: Checking and upgrading tables
      Processing databases
      information_schema
      mtr
      mtr.global_suppressions                            OK
      mtr.test_suppressions                              OK
      performance_schema
      test
      test.t                                             OK
      Phase 7/7: Running 'FLUSH PRIVILEGES'
      OK
      
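With many tables, scanning the mysql_upgrade output by eye is error-prone. The sketch below flags any `schema.table` line whose last field is not `OK`. The function name `check_upgrade_output` is hypothetical, and the check assumes the output has the same shape as the listing above, with a status in the last column of each table line.

```shell
# Read mysql_upgrade output on stdin; report table lines whose status is not OK.
# Exit status: 0 if every table line ends in OK, 1 otherwise.
check_upgrade_output() {
  awk '
    # Table lines look like "mysql.user   OK": first field is schema.table.
    NF >= 2 && $1 ~ /^[A-Za-z0-9_]+\.[A-Za-z0-9_]+$/ && $NF != "OK" {
      bad++
      print "not OK: " $1
    }
    END { exit (bad > 0) }
  '
}

# Usage (not run here):
#   mysql_upgrade ... | check_upgrade_output
```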

      4.5. Stop the Server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      4.6. export PATH=$PATH:/home/stepan/mariadb/10.4/scripts

      5. Check upgraded node 3 without the cluster.

      5.1. Start the server:
      /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

      5.2. Start the client:
      /home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      Actual result:
      Server version: 10.4.3-MariaDB-debug-log Source distribution

      5.3. Stop the Server:
      /home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

      6. Try to join node 3 back to the cluster.

      6.1. Add to /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf:

      wsrep-on=1
      

      6.2. Start the server:
      /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

      Actual results:

      Upgraded node 3 has stopped:

      $ ps -ef | grep mysqld
      stepan   198747      1  0 12:21 pts/2    00:00:23 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
      stepan   198748      1  0 12:21 pts/2    00:00:23 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.2 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
      stepan   233635 200752  0 14:44 pts/5    00:00:00 grep --color=auto mysqld
      [stepan@cnt7glr11 10.4]$ sudo netstat -tulpn | grep mysqld
      tcp        0      0 127.0.0.1:16000         0.0.0.0:*               LISTEN      198747/mysqld
      tcp        0      0 127.0.0.1:16001         0.0.0.0:*               LISTEN      198748/mysqld
      tcp        0      0 0.0.0.0:16003           0.0.0.0:*               LISTEN      198747/mysqld
      tcp        0      0 0.0.0.0:16006           0.0.0.0:*               LISTEN      198748/mysqld
      

      mysqld.3.err (attached Node3_join_mysqld.3.err):

      2019-02-18 14:44:43 0 [Note] /home/stepan/mariadb/10.4/sql/mysqld: ready for connections.
      Version: '10.4.3-MariaDB-debug-log'  socket: '/home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock'  port: 16002  Source distribution
      mysqld: galera/src/monitor.hpp:219: void galera::Monitor<C>::self_cancel(C&) [with C = galera::ReplicatorSMM::ApplyOrder]: Assertion `obj_seqno > last_left_' failed.
      190218 14:44:43 [ERROR] mysqld got signal 6 ;
      
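The attached error logs are long, so it helps to pull out only the lines that show the abort. The sketch below is one way to do that. The function name `crash_lines` is hypothetical, and the two patterns are taken from the log excerpts in this report rather than from any exhaustive list of mysqld crash messages.

```shell
# Print assertion failures and fatal-signal lines from a mysqld error log.
#   $1 = path to the error log
# Exit status is that of grep: 0 if at least one line matched, 1 if none did.
crash_lines() {
  grep -E 'Assertion .* failed|got signal [0-9]+' "$1"
}

# Usage (not run here):
#   crash_lines var/log/mysqld.3.err
```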

      7. Restart node 3:
      /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

      Actual result:
      Nodes 1 and 2 have stopped:

      mysqld.2.err (attached Node3_restart_mysqld.2.err):

      2019-02-18 15:16:05 1 [Note] WSREP: New cluster view: global state: 7e61d751-2fa7-11e9-87ba-63ada45808dc:397, view# 7: Primary, number of nodes: 3, my index: 0, protocol version 3
      2019-02-18 15:16:05 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-02-18 15:16:05 1 [Note] WSREP: REPL Protocols: 9 (4, 2)
      2019-02-18 15:16:05 1 [Note] WSREP: Assign initial position for certification: 397, protocol version: 4
      2019-02-18 15:16:05 0 [Note] WSREP: Service thread queue flushed.
      mysqld: gcs/src/gcs_group.cpp:1186: int gcs_group_find_donor(const gcs_group_t*, int, int, const char*, int, const gu_uuid_t*, gcs_seqno_t): Assertion `ist_seqno != GCS_SEQNO_ILL' failed.
      190218 15:16:05 [ERROR] mysqld got signal 6 ;
      

      Expected:
      Upgraded node 3 joins the cluster of not-yet-upgraded nodes and continues to operate normally.

      Other logs and config files are also attached.

      Attachments

        1. my.cnf (8 kB)
        2. mysqld.3.cnf (1 kB)
        3. Node3_join_mysqld.1.err (26 kB)
        4. Node3_join_mysqld.2.err (21 kB)
        5. Node3_join_mysqld.3.err (42 kB)
        6. Node3_restart_mysqld.1.err (30 kB)
        7. Node3_restart_mysqld.2.err (26 kB)
        8. Node3_restart_mysqld.3.err (53 kB)

          Activity

            stepan.patryshev Stepan Patryshev (Inactive) added a comment:
            I was not testing this with the latest Galera library. I have just checked again with the latest library (commit 9cdbeb86c330b808571b14270e6428accb899c58) and the bug did not reproduce. Closing it.

            People

              stepan.patryshev Stepan Patryshev (Inactive)