Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Version/s: 10.4.7
Environment: CentOS Linux release 7.6.1810 (Core)
Description
Galera: Rolling upgrade: the upgraded node 2 cannot rejoin the cluster when rows are added, updated, and removed during the upgrade.
This issue was discovered while testing a rolling upgrade according to "MariaDB 10.4 Cluster Rolling Upgrade - Naive Approach" by Seppo Jaakola: https://docs.google.com/document/d/1z4XTpLpzStWMFaNnrSmiESaIVeCoKhu9Hbb1SrDPf0w
All binaries are non-debug builds from source:
MariaDB Server 10.4: branch 10.4, commit 9d6b601e797dd8333340dadaefae09ebafc787db.
Galera Lib4: branch mariadb-4.x, commit ba337dd0ac281a5e9f29c652a890bd7ad2ac464e.
MariaDB Server 10.3: branch 10.3, commit 099007c3c92d1405625777fa86d2fba3da1d339c.
Galera Lib3: branch mariadb-3.x, commit 227e96e457acb60037450bc1e81c45594782e906.
Steps:
1. Start 3 MariaDB 10.3 nodes with mtr:
1.0. export WSREP_PROVIDER=/usr/lib/libgalera_smm_3.so
1.1. cd mysql-test
1.2. ./mtr --suite=galera_3nodes --start-and-exit
NODE 3 UPGRADE:
2. Copy the [mysqld.3] group from var/my.cnf (attached my.cnf) into a separate configuration file, mysqld.3.cnf (attached mysqld.3.cnf), and make the following edits:
2.1. Edit:
[mysqld]
wsrep_cluster_address='gcomm://127.0.0.1:16003,127.0.0.1:16006,127.0.0.1:16009'
wsrep_provider=<path to galera 4 library>
basedir=<10.4 source tree>
character-sets-dir=<10.4 source tree>/sql/share/charsets
lc-messages-dir=<10.4 source tree>/sql/share/
2.2. Also add:
binlog-format=row
wsrep_sst_method=rsync
innodb-autoinc-lock-mode=2
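The edits from steps 2.1 and 2.2 can be viewed as one block of options. The following is a minimal sketch that only prints them for review; the function name print_node_overrides and its two arguments (the Galera 4 library path and the 10.4 source tree) are hypothetical and stand in for the placeholders above.

```shell
# Hypothetical helper: print the options from steps 2.1 and 2.2.
# $1: path to the Galera 4 library, $2: 10.4 source tree.
# It prints only; it does not modify any configuration file.
print_node_overrides() {
    provider=$1
    tree=$2
    cat <<EOF
[mysqld]
wsrep_cluster_address='gcomm://127.0.0.1:16003,127.0.0.1:16006,127.0.0.1:16009'
wsrep_provider=$provider
basedir=$tree
character-sets-dir=$tree/sql/share/charsets
lc-messages-dir=$tree/sql/share/
binlog-format=row
wsrep_sst_method=rsync
innodb-autoinc-lock-mode=2
EOF
}
```

The output is meant to be compared against the edited mysqld.3.cnf, not appended to it, because step 2.1 changes values that already exist in the copied group.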
3. Load test data.
3.1. Run the client:
/home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.1.sock
3.2. Create test table:
use test;
create table t (i int primary key auto_increment, j int);
3.3. Load some data for each node:
watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16000 -e'insert into test.t(j) values(111)' "
watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16001 -e'insert into test.t(j) values(222)' "
watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16002 -e'insert into test.t(j) values(333)' "
4. Upgrade node 3.
4.1. Stop data loading for node 3 (-P16002).
4.2. Stop the server on node 3:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock
4.3. Remove some rows on node 1:
delete from t where i < 10;
4.4. Update a row on node 1:
update t set j = 10001 where i = 10;
4.5. Make sure that wsrep-on is off:
vi /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf
#wsrep-on=1
4.6. Run 10.4 binaries with 10.3 data:
/home/stepan/galera/git/10.4/server/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf --wsrep_provider=none
4.7. Run mysql_upgrade:
/home/stepan/galera/git/10.4/server/client/mysql_upgrade --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf -uroot -h0 -P16002
4.8. Stop the Server:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock
4.9. export PATH=$PATH:/home/stepan/galera/git/10.4/server/scripts
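Steps 4.6 to 4.8 depend only on the node number, and the same sequence is repeated for node 2 in step 9. The following is a minimal sketch that prints the commands without running them. The variables OLD and NEW and the functions node_port and upgrade_commands are hypothetical names; the paths and the port numbering (node 1 on 16000) are taken from this report and would differ on another setup.

```shell
# Paths as used in this report; adjust for another setup.
OLD=/home/stepan/mariadb/10.3
NEW=/home/stepan/galera/git/10.4/server

# node_port N: client port of mtr node N in this report (node 1 -> 16000).
node_port() {
    echo $((15999 + $1))
}

# upgrade_commands N: print (do not run) the commands of steps 4.6-4.8
# for node N, one per line.
upgrade_commands() {
    cnf=$OLD/mysql-test/var/mysqld.$1.cnf
    sock=$OLD/mysql-test/var/tmp/mysqld.$1.sock
    echo "$NEW/sql/mysqld --defaults-file=$cnf --wsrep_provider=none"
    echo "$NEW/client/mysql_upgrade --defaults-file=$cnf -uroot -h0 -P$(node_port "$1")"
    echo "$OLD/client/mysqladmin -u root shutdown -S $sock"
}
```

Calling upgrade_commands 3 prints the three commands listed in steps 4.6, 4.7, and 4.8. The commands are printed rather than executed because the server in step 4.6 has to keep running while mysql_upgrade is executed.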
5. Check upgraded node 3 without the cluster.
5.1. Start the server:
/home/stepan/galera/git/10.4/server/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf
5.2. Start the client:
/home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock
Actual result:
Server version: 10.4.7-MariaDB-log Source distribution
5.3. Stop the Server:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock
6. Join node 3 back to the cluster.
6.1. Add to /home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf:
wsrep-on=1
6.2. Start the server:
/home/stepan/galera/git/10.4/server/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.3.cnf
6.3. Check the ports:
$ sudo netstat -tulpn | grep mysqld
tcp 0 0 0.0.0.0:16006 0.0.0.0:* LISTEN 7328/mysqld
tcp 0 0 0.0.0.0:16009 0.0.0.0:* LISTEN 10501/mysqld
tcp 0 0 127.0.0.1:16000 0.0.0.0:* LISTEN 7327/mysqld
tcp 0 0 127.0.0.1:16001 0.0.0.0:* LISTEN 7328/mysqld
tcp 0 0 127.0.0.1:16002 0.0.0.0:* LISTEN 10501/mysqld
tcp 0 0 0.0.0.0:16003 0.0.0.0:* LISTEN 7327/mysqld
6.4. Start the client:
/home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.3.sock
6.5. Check WSREP cluster status:
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%';
+----------------------------+----------------------+
| Variable_name              | Value                |
+----------------------------+----------------------+
| wsrep_cluster_weight       | 3                    |
| wsrep_cluster_capabilities |                      |
| wsrep_cluster_conf_id      | 18446744073709551615 |
| wsrep_cluster_size         | 0                    |
| wsrep_cluster_state_uuid   |                      |
| wsrep_cluster_status       | Primary              |
+----------------------------+----------------------+
6.6. Check the presence of the new 10.4 tables:
use mysql;
show tables LIKE 'wsrep_%';
+---------------------------+
| Tables_in_mysql (wsrep_%) |
+---------------------------+
| wsrep_cluster             |
| wsrep_cluster_members     |
| wsrep_streaming_log       |
+---------------------------+
Actual result:
Upgraded node 3 has joined the cluster successfully.
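The checks in steps 6.3 and 6.5 can be scripted. The following is a minimal sketch, assuming the netstat output and the client's status table are captured as text in the form shown in those steps. The function names port_listening and wsrep_status_value are hypothetical helpers, not part of MariaDB or Galera.

```shell
# Hypothetical helpers for checking a node after it rejoins.

# port_listening PORT: succeeds if the netstat text on stdin contains a
# LISTEN socket whose local address ends in :PORT.
port_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit found ? 0 : 1 }
    '
}

# wsrep_status_value NAME: print the value of status variable NAME from
# the table printed by the mysql client (read from stdin).
wsrep_status_value() {
    awk -F'|' -v name="$1" '
        {
            key = $2; val = $3
            gsub(/^ +| +$/, "", key)   # trim the column padding
            gsub(/^ +| +$/, "", val)
            if (key == name) print val
        }
    '
}
```

For example, piping the table from step 6.5 into wsrep_status_value wsrep_cluster_status prints Primary, and piping the netstat output from step 6.3 into port_listening 16009 succeeds.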
NODE 2 UPGRADE:
7. Create mysqld.2.cnf from the [mysqld.2] group in the same way as described in step 2.
8.1. Load some data for each node:
watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16000 -e'insert into test.t(j) values(111)' "
watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16001 -e'insert into test.t(j) values(222)' "
watch "/home/stepan/mariadb/10.3/client/mysql -uroot -h0 -P16002 -e'insert into test.t(j) values(333)' "
9. Upgrade node 2.
9.1. Stop data loading for node 2 (-P16001).
9.2. Stop the server on node 2:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock
9.3. Remove some rows on node 1:
delete from t where i < 20;
9.4. Update a row on node 1:
update t set j = 20002 where i = 20;
9.5. Make sure that wsrep-on is off:
vi /home/stepan/mariadb/10.3/mysql-test/var/mysqld.2.cnf
#wsrep-on=1
9.6. Run 10.4 binaries with 10.3 data:
/home/stepan/galera/git/10.4/server/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.2.cnf --wsrep_provider=none
9.7. Run mysql_upgrade:
/home/stepan/galera/git/10.4/server/client/mysql_upgrade --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.2.cnf -uroot -h0 -P16001
9.8. Stop the Server:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock
9.9. export PATH=$PATH:/home/stepan/galera/git/10.4/server/scripts
10. Check upgraded node 2 without the cluster.
10.1. Start the server:
/home/stepan/galera/git/10.4/server/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.2.cnf
10.2. Start the client:
/home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock
Actual result:
Server version: 10.4.7-MariaDB-log Source distribution
10.3. Stop the Server:
/home/stepan/mariadb/10.3/client/mysqladmin -u root shutdown -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock
11. Join node 2 back to the cluster.
11.1. Add to /home/stepan/mariadb/10.3/mysql-test/var/mysqld.2.cnf:
wsrep-on=1
11.2. Start the server:
/home/stepan/galera/git/10.4/server/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3/mysql-test/var/mysqld.2.cnf
Actual results:
11.3. The port 16006 is not used:
$ sudo netstat -tulpn | grep mysqld
tcp 0 0 0.0.0.0:16009 0.0.0.0:* LISTEN 10501/mysqld
tcp 0 0 127.0.0.1:16000 0.0.0.0:* LISTEN 7327/mysqld
tcp 0 0 127.0.0.1:16001 0.0.0.0:* LISTEN 14576/mysqld
tcp 0 0 127.0.0.1:16002 0.0.0.0:* LISTEN 10501/mysqld
tcp 0 0 0.0.0.0:16003 0.0.0.0:* LISTEN 7327/mysqld
11.4. Run the client on node 2:
/home/stepan/mariadb/10.3/client/mysql -u root -S /home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock
11.5. Node 2 is NOT connected to the cluster:
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%';
+----------------------------+--------------------------------------+
| Variable_name              | Value                                |
+----------------------------+--------------------------------------+
| wsrep_cluster_capabilities |                                      |
| wsrep_cluster_conf_id      | 18446744073709551615                 |
| wsrep_cluster_size         | 0                                    |
| wsrep_cluster_state_uuid   | e3c49d03-a155-11e9-80eb-a614fcadb109 |
| wsrep_cluster_status       | Disconnected                         |
+----------------------------+--------------------------------------+
11.6. Data from the test table cannot be selected:
select * from t;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use
11.7. mysqld.2.err: [ERROR] WSREP: Failed to JOIN the cluster after SST:
2019-07-08 11:49:36 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2703)
2019-07-08 11:49:36 3 [Note] WSREP: Requesting state transfer: success, donor: 0
2019-07-08 11:49:36 0 [ERROR] WSREP: got asio system error while reading IST stream: asio.system:104
2019-07-08 11:49:36 0 [ERROR] WSREP: IST didn't contain all write sets, expected last: 2703 last received: -1
[...]
2019-07-08 11:49:38 4 [Warning] WSREP: View recovered from stable storage was empty. If the server is doing rolling upgrade from previous version which does not support storing view info into stable storage, this is ok. Otherwise this may be a sign of malfunction.
2019-07-08 11:49:38 12 [Note] WSREP: Cluster table is empty, not recovering transactions
2019-07-08 11:49:38 4 [Note] WSREP: SST received: e3c49d03-a155-11e9-80eb-a614fcadb109:1523
2019-07-08 11:49:38 3 [Warning] WSREP: moving position backwards: 2703 -> 1523
[...]
2019-07-08 11:49:38 0 [Note] /home/stepan/galera/git/10.4/server/sql/mysqld: ready for connections.
Version: '10.4.7-MariaDB-log' socket: '/home/stepan/mariadb/10.3/mysql-test/var/tmp/mysqld.2.sock' port: 16001 Source distribution
2019-07-08 11:49:38 3 [Note] WSREP: Receiving IST: 1180 writesets, seqnos 1524-2703
2019-07-08 11:49:38 3 [ERROR] WSREP: Receiving IST failed, node restart required: IST receiver reported failure: 71 (Protocol error)
    at galera/src/replicator_smm.hpp:pop_front():320. Null event.
[...]
2019-07-08 11:49:38 3 [Note] WSREP: IST received: e3c49d03-a155-11e9-80eb-a614fcadb109:-1
2019-07-08 11:49:38 3 [Warning] WSREP: Sending JOIN failed: -103 (Software caused connection abort). Will retry in new primary component.
2019-07-08 11:49:38 3 [ERROR] WSREP: Failed to JOIN the cluster after SST
11.8. mysqld.3.err:
2019-07-08 11:49:36 0 [Note] WSREP: async IST sender starting to serve tcp://127.0.0.1:16007 sending 1524-157
2019-07-08 11:49:36 0 [ERROR] WSREP: async IST sender failed to serve tcp://127.0.0.1:16007: sender send first greater than last: 1524 > 157: 22 (Invalid argument)
    at galera/src/ist.cpp:send():783
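The two error logs disagree on the IST range: node 2 expects seqnos 1524-2703, while node 3 offers 1524-157. The following is a minimal sketch of the arithmetic and of a sanity check on the offered range, assuming the log line format shown above. The function names are hypothetical helpers for reading these logs, not Galera APIs, and the sketch does not identify the root cause.

```shell
# Hypothetical helpers for comparing the IST ranges in the two logs.

# donor_ist_range: read a donor error log on stdin and print "FIRST LAST"
# for each "async IST sender starting to serve ... sending FIRST-LAST" line.
donor_ist_range() {
    sed -n 's/.*async IST sender starting to serve .* sending \([0-9][0-9]*\)-\([0-9][0-9]*\).*/\1 \2/p'
}

# ist_range_valid FIRST LAST: succeeds only if FIRST <= LAST.
ist_range_valid() {
    [ "$1" -le "$2" ]
}

# ist_writeset_count FIRST LAST: number of write sets in an inclusive range.
ist_writeset_count() {
    echo $(($2 - $1 + 1))
}
```

With the values from mysqld.2.err, ist_writeset_count 1524 2703 prints 1180, which matches the "1180 writesets" reported by the joiner. With the values from mysqld.3.err, ist_range_valid 1524 157 fails, which corresponds to the donor's "first greater than last" error.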
Expected result:
The upgraded node 2 successfully rejoins the cluster when rows are added, updated, and removed during the upgrade.
Other logs are also attached.
Issue Links
relates to
- MDEV-29246 WSREP_CLUSTER_SIZE at 0 after rolling update a node from 10.3 to 10.4 (Closed)
- MDEV-18271 Galera 4: test manually rolling upgrade to Server 10.4 + Galera 4 (Closed)