[MDEV-18580] Galera: Rolling upgrade: Upgraded node 3 is stopped with signal 6 after node 2 shutdown Created: 2019-02-14  Updated: 2019-02-18  Resolved: 2019-02-18

Status: Closed
Project: MariaDB Server
Component/s: Galera, Galera SST
Affects Version/s: 10.4.3
Fix Version/s: 10.4.3

Type: Bug Priority: Critical
Reporter: Stepan Patryshev (Inactive) Assignee: Seppo Jaakola
Resolution: Fixed Votes: 0
Labels: galera, galera_4
Environment:

CentOS Linux release 7.6.1810 (Core)


Attachments: File my.cnf     File mysqld.1.err     File mysqld.2.err     File mysqld.3.cnf     File mysqld.3.err    
Issue Links:
Relates
relates to MDEV-18480 Galera: Rolling upgrade: Assertion `x... Closed
relates to MDEV-18629 Galera: Rolling upgrade: Upgraded nod... Closed
relates to MDEV-18271 Galera 4: test manually rolling upgra... Closed
relates to MDEV-18407 Galera: Rolling upgrade: 10.3 nodes a... Closed

 Description   

This issue was discovered while testing the rolling upgrade fix for MDEV-18407, following "MariaDB 10.4 Cluster Rolling Upgrade - Naive Approach" by Seppo Jaakola: https://docs.google.com/document/d/1z4XTpLpzStWMFaNnrSmiESaIVeCoKhu9Hbb1SrDPf0w

10.4.3-MariaDB-debug built from sources: commit c568e25379600db8af4bd39df4761ba0fbc1a14e
galera4 lib: commit 9cdbeb86c330b808571b14270e6428accb899c58

Steps:

0. Build MariaDB Server 10.3 with Galera 3 and MariaDB Server 10.4 with Galera 4.

0.1. Galera 3.

git clone https://github.com/MariaDB/galera.git galera3
cd galera3
git checkout mariadb-3.x
git submodule init
git submodule update
./scripts/build.sh -d --dl 2
sudo cp libgalera_smm.so /usr/lib/libgalera_smm_3.so

0.2. Server 10.3.

git clone https://github.com/mariadb/server 10.3
cd 10.3
git checkout 10.3
git pull
git clean -d -f -x
cmake . -DCMAKE_BUILD_TYPE=Debug
make -j16

0.3. Galera 4.
The same steps as described for Galera 3 in step 0.1, but check out the mariadb-4.x branch and copy the library under a different name:
sudo cp libgalera_smm.so /usr/lib/libgalera_smm_4.so

0.4. Server 10.4.
The same steps as described for 10.3 in step 0.2, but check out the 10.4 branch.

1. Start 3 MariaDB 10.3 nodes with mtr:
1.0. export WSREP_PROVIDER=/usr/lib/libgalera_smm_3.so
1.1. cd mysql-test
1.2. "./mtr --suite=galera_3nodes --start-and-exit"

Actual results:
3 servers are running:

Started [mysqld.1 - pid: 8432, winpid: 8432] [mysqld.2 - pid: 8433, winpid: 8433] [mysqld.3 - pid: 8434, winpid: 8434]
worker[1] Using config for test galera_3nodes.galera_certification_ccc
worker[1] Port and socket path for server(s):
worker[1] mysqld.1  16000  /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.1.sock
worker[1] mysqld.2  16001  /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock
worker[1] mysqld.3  16002  /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock
 
$ ps -ef | grep mysqld
stepan     8432      1  1 19:46 pts/2    00:00:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
stepan     8433      1  1 19:46 pts/2    00:00:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.2 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
stepan     8434      1  1 19:46 pts/2    00:00:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.3 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300

And the ports are in use:

$ sudo netstat -tulpn | grep mysqld
tcp        0      0 0.0.0.0:16009           0.0.0.0:*               LISTEN      198749/mysqld
tcp        0      0 127.0.0.1:16000         0.0.0.0:*               LISTEN      198747/mysqld
tcp        0      0 127.0.0.1:16001         0.0.0.0:*               LISTEN      198748/mysqld
tcp        0      0 127.0.0.1:16002         0.0.0.0:*               LISTEN      198749/mysqld
tcp        0      0 0.0.0.0:16003           0.0.0.0:*               LISTEN      198747/mysqld
tcp        0      0 0.0.0.0:16006           0.0.0.0:*               LISTEN      198748/mysqld
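
The port check above can also be scripted when the procedure is repeated. The following is a minimal sketch, not part of the original test run: ports_listening is a hypothetical helper name, and it assumes the column layout of the netstat -tulpn output shown above (local address in column 4, state in column 6).

```shell
# Read `netstat -tulpn` lines on stdin and succeed only if every port
# given as an argument has a LISTEN entry.
# Hypothetical helper; assumes the netstat column layout shown in this report.
ports_listening() {
    awk -v ports="$*" '
        BEGIN { n = split(ports, want, " ") }
        # Column 6 is the socket state, column 4 is "address:port".
        $6 == "LISTEN" { sub(/.*:/, "", $4); seen[$4] = 1 }
        END {
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) exit 1
            exit 0
        }
    '
}
```

For example: sudo netstat -tulpn | grep mysqld | ports_listening 16000 16001 16002 16003 16006 16009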

2. Copy the [mysqld.3] group from var/my.cnf (attached my.cnf) into a separate configuration file, mysqld.3.cnf (attached mysqld.3.cnf), and make the following edits:

2.1. Edit:

wsrep_cluster_address='gcomm://127.0.0.1:16003,127.0.0.1:16006,127.0.0.1:16009'
wsrep_provider=<path to galera 4 library>
basedir=<10.4 source tree>
character-sets-dir=<10.4 source tree>/sql/share/charsets
lc-messages-dir=<10.4 source tree>/sql/share/

2.2. Also add:

binlog-format=row
wsrep_sst_method=rsync
innodb-autoinc-lock-mode=2
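
The copy in step 2 can be partly scripted. This is a minimal sketch, assuming var/my.cnf is a plain ini-style file with each [group] header alone on its own line and no trailing whitespace; extract_group is a hypothetical helper, not something shipped with mtr or the server.

```shell
# Print one option group (header line plus its settings) from an
# ini-style my.cnf. Hypothetical helper.
# Usage: extract_group GROUP FILE
extract_group() {
    awk -v want="[$1]" '
        # Every header line switches copying on or off.
        /^\[/ { in_group = ($0 == want) }
        in_group { print }
    ' "$2"
}
```

For example, extract_group mysqld.3 var/my.cnf > var/mysqld.3.cnf writes the group with its header. The edits from 2.1 and 2.2 still have to be made by hand. The header likely needs renaming as well (for instance to [mysqld]), since the later steps start mysqld without --defaults-group-suffix; the attached mysqld.3.cnf remains the reference for what was actually used.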

3. Load some data on each node:

3.1. Run the client:
/home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.1.sock

3.2. Create a table:

use test;
MariaDB [test]> create table t (i int primary key auto_increment, j int);

3.3. Load data for some time:

watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16000 -e'insert into test.t(j) values(1)' "
watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16001 -e'insert into test.t(j) values(1)' "
watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16002 -e'insert into test.t(j) values(1)' "

3.4. Stop loading data.
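
As an alternative to the three watch commands in 3.3, a bounded loop makes the amount of loaded data repeatable. This is a sketch only: load_rows and the MYSQL_CLIENT variable are hypothetical names, the default client path is the one used in this report, and the inserts run one after another rather than concurrently as they do with three parallel watch sessions.

```shell
# Run the INSERT from step 3.3 against each given port, ROUNDS times.
# Hypothetical helper. MYSQL_CLIENT is an assumed override variable;
# it defaults to the 10.3 client path used in this report.
# Usage: load_rows ROUNDS PORT...
load_rows() {
    rounds=$1
    shift
    i=0
    while [ "$i" -lt "$rounds" ]; do
        for port in "$@"; do
            "${MYSQL_CLIENT:-/home/stepan/mariadb/10.3/client/mysql}" \
                -uroot -h0 -P"$port" \
                -e 'insert into test.t(j) values(1)' || return 1
        done
        i=$((i + 1))
    done
}
```

For example, load_rows 100 16000 16001 16002 inserts 100 rows through each of the three nodes and stops at the first failed insert.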

4. Upgrade node 3.

4.1. Stop the Server:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

4.2. Run 10.4 binaries with 10.3 data:
/home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf --wsrep_provider=none

4.3. Run mysql_upgrade:
/home/stepan/mariadb/10.4/client/mysql_upgrade --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf -uroot -h0 -P16002

Actual result:

Phase 1/7: Checking and upgrading mysql database
Processing databases
mysql
mysql.column_stats                                 OK
mysql.columns_priv                                 OK
mysql.db                                           OK
mysql.event                                        OK
mysql.func                                         OK
mysql.gtid_slave_pos                               OK
mysql.help_category                                OK
mysql.help_keyword                                 OK
mysql.help_relation                                OK
mysql.help_topic                                   OK
mysql.host                                         OK
mysql.index_stats                                  OK
mysql.innodb_index_stats                           OK
mysql.innodb_table_stats                           OK
mysql.plugin                                       OK
mysql.proc                                         OK
mysql.procs_priv                                   OK
mysql.proxies_priv                                 OK
mysql.roles_mapping                                OK
mysql.servers                                      OK
mysql.table_stats                                  OK
mysql.tables_priv                                  OK
mysql.time_zone                                    OK
mysql.time_zone_leap_second                        OK
mysql.time_zone_name                               OK
mysql.time_zone_transition                         OK
mysql.time_zone_transition_type                    OK
mysql.transaction_registry                         OK
mysql.user                                         OK
Phase 2/7: Installing used storage engines... Skipped
Phase 3/7: Fixing views
Phase 4/7: Running 'mysql_fix_privilege_tables'
Phase 5/7: Fixing table and database names
Phase 6/7: Checking and upgrading tables
Processing databases
information_schema
mtr
mtr.global_suppressions                            OK
mtr.test_suppressions                              OK
performance_schema
test
test.t                                             OK
Phase 7/7: Running 'FLUSH PRIVILEGES'
OK

4.4. Stop the Server:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

4.5. export PATH=$PATH:/home/stepan/mariadb/10.4/scripts

5. Check upgraded node 3 without the cluster.

5.1. Start the server:
/home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

5.2. Start the client:
/home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

Actual result:
Output: "Server version: 10.4.3-MariaDB-debug-log Source distribution"

5.3. Stop the Server:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

6. Join node 3 back to the cluster.

6.1. Add to mysqld.3.cnf:

wsrep-on=1

6.2. Start the server:
/home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

Actual results:

Node 3 has successfully joined the cluster:

$ ps -ef | grep mysql
stepan   198747      1  0 Feb13 ?        00:06:01 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
stepan   198748      1  0 Feb13 ?        00:06:07 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.2 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
stepan   238233 221088  2 16:30 pts/2    00:00:00 /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf
 
$ sudo netstat -tulpn | grep mysqld
[sudo] password for stepan:
tcp        0      0 0.0.0.0:16009           0.0.0.0:*               LISTEN      238233/mysqld
tcp        0      0 127.0.0.1:16000         0.0.0.0:*               LISTEN      198747/mysqld
tcp        0      0 127.0.0.1:16001         0.0.0.0:*               LISTEN      198748/mysqld
tcp        0      0 127.0.0.1:16002         0.0.0.0:*               LISTEN      238233/mysqld
tcp        0      0 0.0.0.0:16003           0.0.0.0:*               LISTEN      198747/mysqld
tcp        0      0 0.0.0.0:16006           0.0.0.0:*               LISTEN      198748/mysqld

mysqld.3.err:

2019-02-14 16:30:28 3 [Note] WSREP: Server cnt7glr11.localdomain synced with group
2019-02-14 16:30:28 3 [Note] WSREP: Server status change joined -> synced
2019-02-14 16:30:28 3 [Note] WSREP: Synchronized with group, ready for connections

/home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock

MariaDB [mysql]> SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%';
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_cluster_weight       | 3                                    |
| wsrep_cluster_capabilities |                                      |
| wsrep_cluster_conf_id      | 18446744073709551615                 |
| wsrep_cluster_size         | 3                                    |
| wsrep_cluster_state_uuid   | 7e61d751-2fa7-11e9-87ba-63ada45808dc |
| wsrep_cluster_status       | Primary                              |
+----------------------------+--------------------------------------+
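
The status check can be automated when the test is repeated. This is a minimal sketch, assuming the status rows arrive on stdin either in the boxed form shown above or as tab-separated batch output; check_cluster is a hypothetical helper name.

```shell
# Read the output of SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%' on stdin
# and succeed only if the cluster is Primary with the expected size.
# Hypothetical helper.
# Usage: ... | check_cluster EXPECTED_SIZE
check_cluster() {
    awk -v expected="$1" '
        { gsub(/\|/, " ") }    # drop table borders so fields split cleanly
        $1 == "wsrep_cluster_size"   { size = $2 }
        $1 == "wsrep_cluster_status" { status = $2 }
        END { exit !(size == expected && status == "Primary") }
    '
}
```

For example: /home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%'" | check_cluster 3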

New tables are present:

use mysql;
show tables;
 
| wsrep_cluster             |
| wsrep_cluster_members     |
| wsrep_streaming_log       |

7. Attempt to upgrade Node 2.

$ ps -ef | grep mysql
stepan   115852 115523  0 17:18 pts/1    00:00:00 grep --color=auto mysql
stepan   198747      1  0 Feb13 ?        00:06:10 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
stepan   198748      1  0 Feb13 ?        00:06:16 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.2 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
stepan   238233 221088  0 16:30 pts/2    00:00:10 /home/stepan/mariadb/10.4/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf

7.1. Stop node 2:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock

Actual results:
Upgraded node 3 was terminated by signal 6!

$ ps -ef | grep mysqld
stepan   119318 115523  0 17:19 pts/1    00:00:00 grep --color=auto mysqld
stepan   198747      1  0 Feb13 ?        00:06:10 /home/stepan/mariadb/10.3/sql/mysqld --defaults-group-suffix=.1 --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/my.cnf --log-output=file --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --core-file --loose-debug-sync-timeout=300
 
$ sudo netstat -tulpn | grep mysqld
[sudo] password for stepan:
tcp        0      0 127.0.0.1:16000         0.0.0.0:*               LISTEN      198747/mysqld
tcp        0      0 0.0.0.0:16003           0.0.0.0:*               LISTEN      198747/mysqld

mysqld.3.err:

2019-02-14 17:18:36 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
mysqld: /home/stepan/mariadb/10.4/storage/innobase/trx/trx0rseg.cc:92: void trx_rseg_update_wsrep_checkpoint(trx_rsegf_t*, const XID*, mtr_t*): Assertion `xid_seqno > wsrep_seqno' failed.
190214 17:18:36 [ERROR] mysqld got signal 6 ;
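
To spot this failure quickly in the other attached logs, the two patterns above can be searched for directly. This is a minimal sketch: find_crash_lines is a hypothetical helper, and the patterns are taken only from the two lines quoted here, so other crash signatures would not be matched.

```shell
# Print assertion failures and fatal-signal lines from a mysqld error log.
# Hypothetical helper. Exit status is grep's own: 0 if any line matched,
# 1 if none did.
# Usage: find_crash_lines LOGFILE
find_crash_lines() {
    grep -E 'Assertion .* failed|\[ERROR\] mysqld got signal [0-9]+' "$1"
}
```

For example, find_crash_lines mysqld.3.err prints the assertion and signal lines, and the exit status can be used in a script to tell whether the node aborted.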

Expected:
Upgraded node 3 continues to operate successfully.

Other logs and config files are also attached.



 Comments   
Comment by Stepan Patryshev (Inactive) [ 2019-02-15 ]

temeo:
"PR https://github.com/MariaDB/server/pull/1186 should take care of that assertion"

Comment by Stepan Patryshev (Inactive) [ 2019-02-18 ]

It seems to be the same as MDEV-18480.

Comment by Stepan Patryshev (Inactive) [ 2019-02-18 ]

It's fixed. I have checked it with galera lib commit 9cdbeb86c330b808571b14270e6428accb899c58 and MariaDB server commit f0b65102b23f006f596eef35e6e5f4f8b6d8146d.
