[MDEV-18629] Galera: Rolling upgrade: Upgraded node is stopped with signal 6 on attempt to join the cluster
Created: 2019-02-18  Updated: 2019-02-19  Resolved: 2019-02-18

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST
Affects Version/s: 10.4.3
Fix Version/s: 10.4.3

Type: Bug
Priority: Blocker
Reporter: Stepan Patryshev (Inactive)
Assignee: Stepan Patryshev (Inactive)
Resolution: Cannot Reproduce
Votes: 0
Labels: galera, galera_4
Environment: CentOS Linux release 7.6.1810 (Core)

Attachments: Node3_join_mysqld.1.err, Node3_join_mysqld.2.err, Node3_join_mysqld.3.err, Node3_restart_mysqld.1.err, Node3_restart_mysqld.2.err, Node3_restart_mysqld.3.err, my.cnf, mysqld.3.cnf
Issue Links:
Relates
relates to MDEV-18271 Galera 4: test manually rolling upgra... Closed
relates to MDEV-18407 Galera: Rolling upgrade: 10.3 nodes a... Closed
relates to MDEV-18580 Galera: Rolling upgrade: Upgraded nod... Closed

 Description   

The upgraded node 3 aborts with signal 6 when it attempts to join a cluster whose other nodes have not yet been upgraded.

This issue was discovered while testing a rolling upgrade according to "MariaDB 10.4 Cluster Rolling Upgrade - Naive Approach" by Seppo Jaakola: https://docs.google.com/document/d/1z4XTpLpzStWMFaNnrSmiESaIVeCoKhu9Hbb1SrDPf0w

10.4.3-MariaDB-debug built from sources: commit 3f154943db8bc4135fb3c60b8a74f926b65a040b
galera4 lib: commit 9cdbeb86c330b808571b14270e6428accb899c58

Steps:

1. Start 3 MariaDB 10.3 nodes with mtr:
1.0. export WSREP_PROVIDER=/usr/lib/libgalera_smm_3.so
1.1. cd mysql-test
1.2. "./mtr --suite=galera_3nodes --start-and-exit"

Actual results:
3 servers are running:

Started [mysqld.1 - pid: 8432, winpid: 8432] [mysqld.2 - pid: 8433, winpid: 8433] [mysqld.3 - pid: 8434, winpid: 8434]
worker[1] Using config for test galera_3nodes.galera_certification_ccc
worker[1] Port and socket path for server(s):
worker[1] mysqld.1  16000  /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.1.sock
worker[1] mysqld.2  16001  /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock
worker[1] mysqld.3  16002  /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock
 
$ ps -ef | grep mysqld
stepan     8432      1  1 19:46 pts/2    00:00:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
stepan     8433      1  1 19:46 pts/2    00:00:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.2 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
stepan     8434      1  1 19:46 pts/2    00:00:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.3 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300

And the ports are in use:

$ sudo netstat -tulpn | grep mysqld
tcp        0      0 0.0.0.0:16009           0.0.0.0:*               LISTEN      198749/mysqld
tcp        0      0 127.0.0.1:16000         0.0.0.0:*               LISTEN      198747/mysqld
tcp        0      0 127.0.0.1:16001         0.0.0.0:*               LISTEN      198748/mysqld
tcp        0      0 127.0.0.1:16002         0.0.0.0:*               LISTEN      198749/mysqld
tcp        0      0 0.0.0.0:16003           0.0.0.0:*               LISTEN      198747/mysqld
tcp        0      0 0.0.0.0:16006           0.0.0.0:*               LISTEN      198748/mysqld
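
To compare the listening ports between runs without reading the netstat table by eye, the output can be reduced to a sorted port list. This is only a sketch: listening_mysqld_ports is a hypothetical helper name, and it assumes the Linux net-tools column layout shown above (local address in column 4, state in column 6, PID/program in the last column).

```shell
# Hypothetical helper, not part of mtr or MariaDB.
# Reads `netstat -tulpn` output on stdin and prints, one per line and
# sorted numerically, the ports on which a mysqld process is listening.
listening_mysqld_ports() {
    awk '$6 == "LISTEN" && $NF ~ /\/mysqld$/ {
             n = split($4, parts, ":")   # the port is the last ":"-separated part
             print parts[n]
         }' | sort -n
}

# Example (needs privileges to see process names):
#   sudo netstat -tulpn | listening_mysqld_ports
```

For the netstat output above this prints 16000, 16001, 16002, 16003, 16006 and 16009, one per line.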

2. Copy the [mysqld.3] group from var/my.cnf (attached my.cnf) into a separate configuration file, mysqld.3.cnf (attached mysqld.3.cnf), and make the following edits:

2.1. Edit:

wsrep_cluster_address='gcomm://127.0.0.1:16003,127.0.0.1:16006,127.0.0.1:16009'
wsrep_provider=<path to galera 4 library>
basedir=<10.4 source tree>
character-sets-dir=<10.4 source tree>/sql/share/charsets
lc-messages-dir=<10.4 source tree>/sql/share/

2.2. Also add:

binlog-format=row
wsrep_sst_method=rsync
innodb-autoinc-lock-mode=2
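
Before starting the server with the edited file, it may help to confirm that every option from 2.1 and 2.2 is present and not commented out. The following is a minimal sketch: missing_options is a hypothetical helper name, and it assumes one "name=value" option per line. Option names are compared with "_" and "-" treated as equal, which matches how the server reads option files.

```shell
# Hypothetical helper, not part of MariaDB.
# Usage: missing_options <cnf-file> <option>...
# Prints each requested option that has no uncommented "name=value" line
# in the file. "_" and "-" in option names are treated as the same character.
missing_options() {
    cnf=$1
    shift
    for opt in "$@"; do
        want=$(printf '%s' "$opt" | tr '_' '-')
        # Reduce each line to its option name, normalise it, and look for an exact match.
        if ! sed -e 's/[[:space:]]*=.*//' -e 's/^[[:space:]]*//' "$cnf" \
                | tr '_' '-' | grep -qxF -- "$want"; then
            printf '%s\n' "$opt"
        fi
    done
}

# Example:
#   missing_options mysqld.3.cnf wsrep_cluster_address wsrep_provider \
#       binlog-format wsrep_sst_method innodb-autoinc-lock-mode
```

An empty result means all listed options were found. A commented line such as "#wsrep-on=1" is reported as missing.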

3.1. Load some data.
3.2. Stop data loading.

4. Upgrade node 3.

4.1. Stop the server:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

4.2. Make sure that wsrep-on is off by commenting it out in the configuration file:
sudo vi /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf
#wsrep-on=1

4.3. Run 10.4 binaries with 10.3 data:
/home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf --wsrep_provider=none

4.4. Run mysql_upgrade:
/home/stepan/mariadb/10.4/client/mysql_upgrade --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf -uroot -h0 -P16002

Actual result:

Phase 1/7: Checking and upgrading mysql database
Processing databases
mysql
mysql.column_stats                                 OK
mysql.columns_priv                                 OK
mysql.db                                           OK
mysql.event                                        OK
mysql.func                                         OK
mysql.gtid_slave_pos                               OK
mysql.help_category                                OK
mysql.help_keyword                                 OK
mysql.help_relation                                OK
mysql.help_topic                                   OK
mysql.host                                         OK
mysql.index_stats                                  OK
mysql.innodb_index_stats                           OK
mysql.innodb_table_stats                           OK
mysql.plugin                                       OK
mysql.proc                                         OK
mysql.procs_priv                                   OK
mysql.proxies_priv                                 OK
mysql.roles_mapping                                OK
mysql.servers                                      OK
mysql.table_stats                                  OK
mysql.tables_priv                                  OK
mysql.time_zone                                    OK
mysql.time_zone_leap_second                        OK
mysql.time_zone_name                               OK
mysql.time_zone_transition                         OK
mysql.time_zone_transition_type                    OK
mysql.transaction_registry                         OK
mysql.user                                         OK
Phase 2/7: Installing used storage engines... Skipped
Phase 3/7: Fixing views
Phase 4/7: Running 'mysql_fix_privilege_tables'
Phase 5/7: Fixing table and database names
Phase 6/7: Checking and upgrading tables
Processing databases
information_schema
mtr
mtr.global_suppressions                            OK
mtr.test_suppressions                              OK
performance_schema
test
test.t                                             OK
Phase 7/7: Running 'FLUSH PRIVILEGES'
OK
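
With this many tables, a status other than OK is easy to overlook. As a rough sketch, the mysql_upgrade output can be filtered for table lines that do not end in OK. upgrade_failures is a hypothetical helper name, and the filter assumes table lines start with a "db.table" name containing exactly one dot, as in the output above.

```shell
# Hypothetical helper, not part of MariaDB.
# Reads mysql_upgrade output on stdin and prints every line that starts
# with a "db.table" name but does not end in the status "OK". This also
# catches a table name printed alone on a line, followed by messages.
upgrade_failures() {
    awk '$1 ~ /^[^.]+\.[^.]+$/ && $NF != "OK"'
}

# Example (upgrade.log is a placeholder for a saved copy of the output):
#   upgrade_failures < upgrade.log
```

For the output above the filter prints nothing, since every table line ends in OK.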

4.5. Stop the Server:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

4.6. export PATH=$PATH:/home/stepan/mariadb/10.4/scripts

5. Check upgraded node 3 without the cluster.

5.1. Start the server:
/home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

5.2. Start the client:
/home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

Actual result:
Server version: 10.4.3-MariaDB-debug-log Source distribution

5.3. Stop the Server:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

6. Try to join node 3 back to the cluster.

6.1. Add to /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf:

wsrep-on=1

6.2. Start the server:
/home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

Actual results:

Upgraded node 3 has stopped:

$ ps -ef | grep mysqld
stepan   198747      1  0 12:21 pts/2    00:00:23 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
stepan   198748      1  0 12:21 pts/2    00:00:23 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.2 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
stepan   233635 200752  0 14:44 pts/5    00:00:00 grep --color=auto mysqld
[stepan@cnt7glr11 10.4]$ sudo netstat -tulpn | grep mysqld
tcp        0      0 127.0.0.1:16000         0.0.0.0:*               LISTEN      198747/mysqld
tcp        0      0 127.0.0.1:16001         0.0.0.0:*               LISTEN      198748/mysqld
tcp        0      0 0.0.0.0:16003           0.0.0.0:*               LISTEN      198747/mysqld
tcp        0      0 0.0.0.0:16006           0.0.0.0:*               LISTEN      198748/mysqld

mysqld.3.err (attached Node3_join_mysqld.3.err):

2019-02-18 14:44:43 0 [Note] /home/stepan/mariadb/10.4/sql/mysqld: ready for connections.
Version: '10.4.3-MariaDB-debug-log'  socket: '/home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock'  port: 16002  Source distribution
mysqld: galera/src/monitor.hpp:219: void galera::Monitor<C>::self_cancel(C&) [with C = galera::ReplicatorSMM::ApplyOrder]: Assertion `obj_seqno > last_left_' failed.
190218 14:44:43 [ERROR] mysqld got signal 6 ;
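
The two lines that matter in the log are the failed assertion and the fatal signal. A minimal sketch for pulling just those lines out of an error log follows. crash_lines is a hypothetical helper name, and the patterns are taken from the log lines quoted above rather than from any documented log format.

```shell
# Hypothetical helper, not part of MariaDB.
# Reads a mysqld error log on stdin and prints only failed-assertion
# lines and "mysqld got signal N" lines.
crash_lines() {
    grep -E 'Assertion .* failed|mysqld got signal [0-9]+'
}

# Example:
#   crash_lines < mysqld.3.err
```

As with grep, the exit status is non-zero when no such line is found, so it can be used directly in a condition.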

7. Restart node 3:
/home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

Actual result:
Nodes 1 and 2 have stopped:

mysqld.2.err (attached Node3_restart_mysqld.2.err):

2019-02-18 15:16:05 1 [Note] WSREP: New cluster view: global state: 7e61d751-2fa7-11e9-87ba-63ada45808dc:397, view# 7: Primary, number of nodes: 3, my index: 0, protocol version 3
2019-02-18 15:16:05 1 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2019-02-18 15:16:05 1 [Note] WSREP: REPL Protocols: 9 (4, 2)
2019-02-18 15:16:05 1 [Note] WSREP: Assign initial position for certification: 397, protocol version: 4
2019-02-18 15:16:05 0 [Note] WSREP: Service thread queue flushed.
mysqld: gcs/src/gcs_group.cpp:1186: int gcs_group_find_donor(const gcs_group_t*, int, int, const char*, int, const gu_uuid_t*, gcs_seqno_t): Assertion `ist_seqno != GCS_SEQNO_ILL' failed.
190218 15:16:05 [ERROR] mysqld got signal 6 ;
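
Since more than one node can go down at this step, it is useful to see at a glance which error logs contain a fatal signal. This sketch assumes the "mysqld got signal N" wording quoted above, and crashed_logs is a hypothetical helper name.

```shell
# Hypothetical helper, not part of MariaDB.
# Usage: crashed_logs <error-log>...
# Prints the name of every given log file that contains a fatal-signal line.
crashed_logs() {
    grep -lE 'mysqld got signal [0-9]+' -- "$@"
}

# Example, using the attachment names from this issue:
#   crashed_logs Node3_restart_mysqld.1.err Node3_restart_mysqld.2.err \
#       Node3_restart_mysqld.3.err
```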

Expected:
The upgraded node 3 joins the cluster of not yet upgraded nodes and continues to operate normally.

Other logs and config files are also attached.



 Comments   
Comment by Stepan Patryshev (Inactive) [ 2019-02-18 ]

I was testing this with a Galera library that was not the latest.
I've just checked again with the latest library (commit 9cdbeb86c330b808571b14270e6428accb899c58) and the bug did not reproduce. Closing it.
