[MDEV-22723] Data loss when performing rolling upgrade from 10.3.23-MariaDB to 10.4.13-MariaDB Created: 2020-05-26  Updated: 2022-08-04  Resolved: 2020-08-20

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.3.23, 10.4.13
Fix Version/s: 10.3.26, 10.4.16

Type: Bug Priority: Critical
Reporter: Massimo Assignee: Alexey
Resolution: Fixed Votes: 3
Labels: None
Environment:

OS: CentOS Linux release 7.6.1810 (Core)


Attachments: File 200612_mysqld.1.err     File 200612_mysqld.2.err     File 200612_mysqld.3.err     Zip Archive 200709_patgal_output.zip     Zip Archive 20200713_MDEV-22723_patgal_no_errors.zip     Zip Archive 20200714_MDEV-22723_mdb_no_errors.zip     Zip Archive 20200714_MDEV-22723_patgal_no_errors.zip     Zip Archive 20200720_MDEV-22723_CentOS_7.5_no_errors.zip     Zip Archive 20200723_MDEV-22723_data_loss.zip     HTML File error_log_mdb1     File error_log_mdb2.after_upgrade     File mysqld_new.2.cnf     File mysqld_old.1.cnf     File mysqld_old.2.cnf     File mysqld_old.3.cnf     Text File node1_bootsrapped_10.3.23.log     File node1_bootsrapped_10.3.23.log.rtf     File node2_upgraded.log.rtf     Text File node2_upgraded_10.4.13.log     File server.cnf_mdb1     File server.cnf_mdb2    
Issue Links:
Relates
relates to MDEV-29246 WSREP_CLUSTER_SIZE at 0 after rolling... Closed
relates to MDEV-20439 WSREP_CLUSTER_SIZE at 0 after rolling... Closed
relates to MDEV-22745 node crash on upgrade from 10.3 to 10... Closed

 Description   

We created a full Galera cluster of 3 nodes (mdb1, mdb2, mdb3), all running 10.3.23.
We gracefully shut down mdb3 to force an IST and to check how writes on 10.3.23 affect the 10.4.13 node. We also re-tested with all 3 servers up, with the same result.

Create a schema and a table on mdb1; everything propagates to the other nodes.

  • stop mdb2, then yum remove the MariaDB and Galera RPMs
  • install MariaDB 10.4 from the new repo and update my.cnf to point to the right wsrep_provider
  • set wsrep_on=OFF in my.cnf
  • start mdb2
  • run mysql_upgrade -s
  • stop mdb2
  • set wsrep_on=ON in my.cnf
  • start mdb2

At this point the Galera status variables on mdb2 are:

MariaDB mdb2 [pippo]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name                 | Value                                                                                                                                          |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid        | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0                                                                                                           |
| wsrep_protocol_version        | -1                                                                                                                                             |
| wsrep_last_committed          | 65                                                                                                                                             |
| wsrep_replicated              | 0                                                                                                                                              |
| wsrep_replicated_bytes        | 0                                                                                                                                              |
| wsrep_repl_keys               | 0                                                                                                                                              |
| wsrep_repl_keys_bytes         | 0                                                                                                                                              |
| wsrep_repl_data_bytes         | 0                                                                                                                                              |
| wsrep_repl_other_bytes        | 0                                                                                                                                              |
| wsrep_received                | 3                                                                                                                                              |
| wsrep_received_bytes          | 208                                                                                                                                            |
| wsrep_local_commits           | 0                                                                                                                                              |
| wsrep_local_cert_failures     | 0                                                                                                                                              |
| wsrep_local_replays           | 0                                                                                                                                              |
| wsrep_local_send_queue        | 0                                                                                                                                              |
| wsrep_local_send_queue_max    | 1                                                                                                                                              |
| wsrep_local_send_queue_min    | 0                                                                                                                                              |
| wsrep_local_send_queue_avg    | 0                                                                                                                                              |
| wsrep_local_recv_queue        | 0                                                                                                                                              |
| wsrep_local_recv_queue_max    | 1                                                                                                                                              |
| wsrep_local_recv_queue_min    | 0                                                                                                                                              |
| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
| wsrep_local_cached_downto     | 64                                                                                                                                             |
| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
| wsrep_flow_control_paused     | 0                                                                                                                                              |
| wsrep_flow_control_sent       | 0                                                                                                                                              |
| wsrep_flow_control_recv       | 0                                                                                                                                              |
| wsrep_cert_deps_distance      | 0                                                                                                                                              |
| wsrep_apply_oooe              | 0.5                                                                                                                                            |
| wsrep_apply_oool              | 0                                                                                                                                              |
| wsrep_apply_window            | 1.5                                                                                                                                            |
| wsrep_commit_oooe             | 0                                                                                                                                              |
| wsrep_commit_oool             | 0                                                                                                                                              |
| wsrep_commit_window           | 1                                                                                                                                              |
| wsrep_local_state             | 4                                                                                                                                              |
| wsrep_local_state_comment     | Synced                                                                                                                                         |
| wsrep_cert_index_size         | 0                                                                                                                                              |
| wsrep_causal_reads            | 0                                                                                                                                              |
| wsrep_cert_interval           | 0                                                                                                                                              |
| wsrep_open_transactions       | 0                                                                                                                                              |
| wsrep_open_connections        | 0                                                                                                                                              |
| wsrep_incoming_addresses      | AUTO,10.0.1.13:3306                                                                                                                            |
| wsrep_cluster_weight          | 2                                                                                                                                              |
| wsrep_desync_count            | 0                                                                                                                                              |
| wsrep_evs_delayed             |                                                                                                                                                |
| wsrep_evs_evict_list          |                                                                                                                                                |
| wsrep_evs_repl_latency        | 0.000325151/0.00176008/0.00607075/0.00193032/7                                                                                                 |
| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
| wsrep_gcomm_uuid              | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4                                                                                                           |
| wsrep_applier_thread_count    | 32                                                                                                                                             |
| wsrep_cluster_capabilities    |                                                                                                                                                |
| wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |
| wsrep_cluster_size            | 0                                                                                                                                              |
| wsrep_cluster_state_uuid      |                                                                                                                                                |
| wsrep_cluster_status          | Primary                                                                                                                                        |
| wsrep_connected               | ON                                                                                                                                             |
| wsrep_local_bf_aborts         | 0                                                                                                                                              |
| wsrep_local_index             | 18446744073709551615                                                                                                                           |
| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name           | Galera                                                                                                                                         |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
| wsrep_ready                   | ON                                                                                                                                             |
| wsrep_rollbacker_thread_count | 1                                                                                                                                              |
| wsrep_thread_count            | 33                                                                                                                                             |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.001 sec)

NOTE THAT:

wsrep_cluster_status          | Primary
wsrep_local_state_comment     | Synced
wsrep_local_index             | 18446744073709551615
wsrep_cluster_size            | 0
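
The inconsistent state called out above can be detected mechanically. The following is a minimal Python sketch, not part of the original report: it parses pasted `SHOW GLOBAL STATUS` client output and flags a node that reports Primary/Synced while its cluster view values look unset. The names `parse_status_table`, `find_view_anomalies` and `UNSET_U64` are hypothetical. Reading 18446744073709551615 (2^64 - 1, which is what -1 looks like in an unsigned 64-bit field) as "not set" is an assumption based on the output shown here, not a confirmed description of Galera internals.

```python
# 2**64 - 1: what an unsigned 64-bit field shows when it holds -1.
# Treating it as "unset" is an assumption drawn from the report's output.
UNSET_U64 = 2**64 - 1


def parse_status_table(text):
    """Turn pasted `SHOW GLOBAL STATUS` client output into a {name: value} dict."""
    status = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Data rows look like "| name | value |"; border rows start with "+".
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Variable_name":
            status[cells[0]] = cells[1]
    return status


def find_view_anomalies(status):
    """List the view values that look unset on a node claiming Primary/Synced."""
    claims_healthy = (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_local_state_comment") == "Synced"
    )
    problems = []
    if not claims_healthy:
        # The node does not claim to be healthy, so there is no contradiction.
        return problems
    if int(status.get("wsrep_cluster_size", "0")) == 0:
        problems.append("wsrep_cluster_size is 0")
    if int(status.get("wsrep_local_index", "0")) == UNSET_U64:
        problems.append("wsrep_local_index is 2**64 - 1")
    if not status.get("wsrep_cluster_state_uuid"):
        problems.append("wsrep_cluster_state_uuid is empty")
    return problems
```

Run against the output above, such a check would report all three conditions; an empty list means the node's view values are at least plausible, not that replication has been verified.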

Looking at the error log, the server is ready for connections after an IST.

At this point the 'master' mdb1 takes a write that does not get replicated. On mdb2:

MariaDB mdb2 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk   |
+----+---------------+--------+
|  1 |           123 | aaaa   |
|  3 |           222 | eeeeaa |
|  4 |      34523452 | e4r4r4 |
+----+---------------+--------+

WHILE ON THE MASTER:

MariaDB mdb1 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk   |
+----+---------------+--------+
|  1 |           123 | aaaa   |
|  3 |           222 | eeeeaa |
|  4 |      34523452 | e4r4r4 |
+----+---------------+--------+
3 rows in set (0.001 sec)
 
MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
Query OK, 1 row affected (0.015 sec)
 
MariaDB mdb1 [pippo]> select * from evento4;
+----+---------------+--------------+
| Id | IdDispositivo | kkkk         |
+----+---------------+--------------+
|  1 |           123 | aaaa         |
|  3 |           222 | eeeeaa       |
|  4 |      34523452 | e4r4r4       |
|  6 |             3 | non tireplic |
+----+---------------+--------------+
4 rows in set (0.001 sec)

The INSERT not being replicated is presumably a consequence of wsrep_cluster_size=0 and wsrep_local_index=18446744073709551615.

At this point we restart mdb2 to fix the status:

[root@mdb2 my.cnf.d]# systemctl restart  mariadb
[root@mdb2 my.cnf.d]# mysql
 
MariaDB md2 [(none)]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name                 | Value                                                                                                                                          |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid        | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0                                                                                                           |
| wsrep_protocol_version        | 9                                                                                                                                              |
| wsrep_last_committed          | 66                                                                                                                                             |
| wsrep_replicated              | 0                                                                                                                                              |
| wsrep_replicated_bytes        | 0                                                                                                                                              |
| wsrep_repl_keys               | 0                                                                                                                                              |
| wsrep_repl_keys_bytes         | 0                                                                                                                                              |
| wsrep_repl_data_bytes         | 0                                                                                                                                              |
| wsrep_repl_other_bytes        | 0                                                                                                                                              |
| wsrep_received                | 2                                                                                                                                              |
| wsrep_received_bytes          | 200                                                                                                                                            |
| wsrep_local_commits           | 0                                                                                                                                              |
| wsrep_local_cert_failures     | 0                                                                                                                                              |
| wsrep_local_replays           | 0                                                                                                                                              |
| wsrep_local_send_queue        | 0                                                                                                                                              |
| wsrep_local_send_queue_max    | 1                                                                                                                                              |
| wsrep_local_send_queue_min    | 0                                                                                                                                              |
| wsrep_local_send_queue_avg    | 0                                                                                                                                              |
| wsrep_local_recv_queue        | 0                                                                                                                                              |
| wsrep_local_recv_queue_max    | 1                                                                                                                                              |
| wsrep_local_recv_queue_min    | 0                                                                                                                                              |
| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
| wsrep_local_cached_downto     | 64                                                                                                                                             |
| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
| wsrep_flow_control_paused     | 0                                                                                                                                              |
| wsrep_flow_control_sent       | 0                                                                                                                                              |
| wsrep_flow_control_recv       | 0                                                                                                                                              |
| wsrep_cert_deps_distance      | 0                                                                                                                                              |
| wsrep_apply_oooe              | 0                                                                                                                                              |
| wsrep_apply_oool              | 0                                                                                                                                              |
| wsrep_apply_window            | 0                                                                                                                                              |
| wsrep_commit_oooe             | 0                                                                                                                                              |
| wsrep_commit_oool             | 0                                                                                                                                              |
| wsrep_commit_window           | 0                                                                                                                                              |
| wsrep_local_state             | 4                                                                                                                                              |
| wsrep_local_state_comment     | Synced                                                                                                                                         |
| wsrep_cert_index_size         | 0                                                                                                                                              |
| wsrep_causal_reads            | 0                                                                                                                                              |
| wsrep_cert_interval           | 0                                                                                                                                              |
| wsrep_open_transactions       | 0                                                                                                                                              |
| wsrep_open_connections        | 0                                                                                                                                              |
| wsrep_incoming_addresses      | 10.0.1.13:3306,AUTO                                                                                                                            |
| wsrep_cluster_weight          | 2                                                                                                                                              |
| wsrep_desync_count            | 0                                                                                                                                              |
| wsrep_evs_delayed             |                                                                                                                                                |
| wsrep_evs_evict_list          |                                                                                                                                                |
| wsrep_evs_repl_latency        | 0.000853237/0.001923/0.00333681/0.0010427/3                                                                                                    |
| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
| wsrep_gcomm_uuid              | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6                                                                                                           |
| wsrep_applier_thread_count    | 32                                                                                                                                             |
| wsrep_cluster_capabilities    |                                                                                                                                                |
| wsrep_cluster_conf_id         | 6                                                                                                                                              |
| wsrep_cluster_size            | 2                                                                                                                                              |
| wsrep_cluster_state_uuid      | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0                                                                                                           |
| wsrep_cluster_status          | Primary                                                                                                                                        |
| wsrep_connected               | ON                                                                                                                                             |
| wsrep_local_bf_aborts         | 0                                                                                                                                              |
| wsrep_local_index             | 1                                                                                                                                              |
| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name           | Galera                                                                                                                                         |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
| wsrep_ready                   | ON                                                                                                                                             |
| wsrep_rollbacker_thread_count | 1                                                                                                                                              |
| wsrep_thread_count            | 33                                                                                                                                             |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.002 sec)

NOTE that the status is now OK:

wsrep_cluster_status          | Primary
wsrep_local_state_comment     | Synced
wsrep_local_index             | 1
wsrep_cluster_size            | 2

But when we check the data on mdb2, where we expect the new row to be present:

MariaDB mdb2 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk   |
+----+---------------+--------+
|  1 |           123 | aaaa   |
|  3 |           222 | eeeeaa |
|  4 |      34523452 | e4r4r4 |
+----+---------------+--------+
3 rows in set (0.001 sec)

The row is not there.

Writes made after this point are all replicated. So data is lost from the moment the first IST completes until the node is restarted and regains the correct cluster status.
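
To make the loss concrete, the result sets from the two nodes can be compared by key. This is a minimal Python sketch under stated assumptions: the helper name `missing_rows` is hypothetical, the rows are copied from the SELECT output in this report, and `Id` is assumed to identify a row uniquely (the reporter's table definition is not shown).

```python
def missing_rows(reference, candidate, key="Id"):
    """Return the rows of `reference` whose key does not appear in `candidate`."""
    present = {row[key] for row in candidate}
    return [row for row in reference if row[key] not in present]


# Rows as shown on mdb1 after the INSERT.
mdb1 = [
    {"Id": 1, "IdDispositivo": 123, "kkkk": "aaaa"},
    {"Id": 3, "IdDispositivo": 222, "kkkk": "eeeeaa"},
    {"Id": 4, "IdDispositivo": 34523452, "kkkk": "e4r4r4"},
    {"Id": 6, "IdDispositivo": 3, "kkkk": "non tireplic"},
]
# Rows as shown on mdb2, before and after its restart.
mdb2 = mdb1[:3]

print(missing_rows(mdb1, mdb2))
```

With the data above, the only row reported as missing on mdb2 is the one with Id 6, the row inserted on mdb1 while mdb2 showed wsrep_cluster_size=0.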



 Comments   
Comment by Rick Pizzi [ 2020-05-27 ]

Looks related to https://jira.mariadb.org/browse/MDEV-19983

Comment by Stepan Patryshev (Inactive) [ 2020-06-12 ]

I have managed to reproduce it only partially. I have not observed any data loss during a node upgrade. But I got these strange values: wsrep_local_index = 18446744073709551615 and wsrep_cluster_size = 0.

Release builds 10.3.23 + Galera 25.3.29(rb0f34b0) and 10.4.13 + Galera 26.4.4(rae24803).

Steps:

1. ./mtr --suite=galera_3nodes --start-and-exit
2. Restart all nodes one by one with separate config files: Node1, Node2, Node3.
3. create table evento4 (Id int primary key auto_increment, IdDispositivo int, kkkk varchar(255));
4. insert into evento4(IdDispositivo, kkkk) values(123, 'aaaa');
insert into evento4(IdDispositivo, kkkk) values(222, 'eeeeaa');
insert into evento4(IdDispositivo, kkkk) values(34523452, 'e4r4r4 ');
5. Stop Node 2.
6. Set wsrep-on=OFF and run Node 2 on 10.4.13 binaries with Node2 new config.
7. Perform mysql_upgrade -s.
8. Stop Node 2.
9. Node 3: insert into evento4(IdDispositivo, kkkk) values(777777, 'While Node 2 was upgrading');
select * from evento4;

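+----+---------------+----------------------------+
| Id | IdDispositivo | kkkk                       |
+----+---------------+----------------------------+
|  2 |           123 | aaaa                       |
|  5 |           222 | eeeeaa                     |
|  8 |      34523452 | e4r4r4                     |
| 10 |        777777 | While Node 2 was upgrading |
+----+---------------+----------------------------+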

10. Start Node 2 with wsrep-on=ON.

11. New data appeared on Node 2:
select * from evento4;

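+----+---------------+----------------------------+
| Id | IdDispositivo | kkkk                       |
+----+---------------+----------------------------+
|  2 |           123 | aaaa                       |
|  5 |           222 | eeeeaa                     |
|  8 |      34523452 | e4r4r4                     |
| 10 |        777777 | While Node 2 was upgrading |
+----+---------------+----------------------------+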

But:

show global status like 'wsrep%';
 
 
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name                 | Value                                                                                                                                          |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid        | be36cf8b-acb6-11ea-aa2c-e3149c2ff908                                                                                                           |
| wsrep_protocol_version        | 9                                                                                                                                              |
| wsrep_last_committed          | 6                                                                                                                                              |
| wsrep_replicated              | 0                                                                                                                                              |
| wsrep_replicated_bytes        | 0                                                                                                                                              |
| wsrep_repl_keys               | 0                                                                                                                                              |
| wsrep_repl_keys_bytes         | 0                                                                                                                                              |
| wsrep_repl_data_bytes         | 0                                                                                                                                              |
| wsrep_repl_other_bytes        | 0                                                                                                                                              |
| wsrep_received                | 3                                                                                                                                              |
| wsrep_received_bytes          | 288                                                                                                                                            |
| wsrep_local_commits           | 0                                                                                                                                              |
| wsrep_local_cert_failures     | 0                                                                                                                                              |
| wsrep_local_replays           | 0                                                                                                                                              |
| wsrep_local_send_queue        | 0                                                                                                                                              |
| wsrep_local_send_queue_max    | 1                                                                                                                                              |
| wsrep_local_send_queue_min    | 0                                                                                                                                              |
| wsrep_local_send_queue_avg    | 0                                                                                                                                              |
| wsrep_local_recv_queue        | 0                                                                                                                                              |
| wsrep_local_recv_queue_max    | 1                                                                                                                                              |
| wsrep_local_recv_queue_min    | 0                                                                                                                                              |
| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
| wsrep_local_cached_downto     | 6                                                                                                                                              |
| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
| wsrep_flow_control_paused     | 0                                                                                                                                              |
| wsrep_flow_control_sent       | 0                                                                                                                                              |
| wsrep_flow_control_recv       | 0                                                                                                                                              |
| wsrep_cert_deps_distance      | 0                                                                                                                                              |
| wsrep_apply_oooe              | 0                                                                                                                                              |
| wsrep_apply_oool              | 0                                                                                                                                              |
| wsrep_apply_window            | 1                                                                                                                                              |
| wsrep_commit_oooe             | 0                                                                                                                                              |
| wsrep_commit_oool             | 0                                                                                                                                              |
| wsrep_commit_window           | 1                                                                                                                                              |
| wsrep_local_state             | 4                                                                                                                                              |
| wsrep_local_state_comment     | Synced                                                                                                                                         |
| wsrep_cert_index_size         | 0                                                                                                                                              |
| wsrep_causal_reads            | 0                                                                                                                                              |
| wsrep_cert_interval           | 0                                                                                                                                              |
| wsrep_open_transactions       | 0                                                                                                                                              |
| wsrep_open_connections        | 0                                                                                                                                              |
| wsrep_incoming_addresses      | 127.0.0.1:16002,127.0.0.1:16000,127.0.0.1:16001                                                                                                |
| wsrep_cluster_weight          | 3                                                                                                                                              |
| wsrep_desync_count            | 0                                                                                                                                              |
| wsrep_evs_delayed             |                                                                                                                                                |
| wsrep_evs_evict_list          |                                                                                                                                                |
| wsrep_evs_repl_latency        | 0.000293552/0.000366098/0.000521759/7.98882e-05/5                                                                                              |
| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
| wsrep_gcomm_uuid              | e05a4078-acc3-11ea-9394-8ba782d6f291                                                                                                           |
| wsrep_applier_thread_count    | 32                                                                                                                                             |
| wsrep_cluster_capabilities    |                                                                                                                                                |
| wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |
| wsrep_cluster_size            | 0                                                                                                                                              |
| wsrep_cluster_state_uuid      |                                                                                                                                                |
| wsrep_cluster_status          | Primary                                                                                                                                        |
| wsrep_connected               | ON                                                                                                                                             |
| wsrep_local_bf_aborts         | 0                                                                                                                                              |
| wsrep_local_index             | 18446744073709551615                                                                                                                           |
| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name           | Galera                                                                                                                                         |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
| wsrep_provider_version        | 26.4.4(rae24803)                                                                                                                               |
| wsrep_ready                   | ON                                                                                                                                             |
| wsrep_rollbacker_thread_count | 1                                                                                                                                              |
| wsrep_thread_count            | 33                                                                                                                                             |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.001 sec)

wsrep_cluster_status Primary
wsrep_local_state_comment Synced
wsrep_local_index 18446744073709551615
wsrep_cluster_size 0

12. On Node 3: insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
13. New data are replicated to Node 2:
select * from evento4;

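+----+---------------+----------------------------+
| Id | IdDispositivo | kkkk                       |
+----+---------------+----------------------------+
|  2 |           123 | aaaa                       |
|  5 |           222 | eeeeaa                     |
|  8 |      34523452 | e4r4r4                     |
| 10 |        777777 | While Node 2 was upgrading |
| 13 |             3 | non tireplic               |
+----+---------------+----------------------------+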

14. Restart Node 2.
15. On Node 2:

show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name                 | Value                                                                                                                                          |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid        | be36cf8b-acb6-11ea-aa2c-e3149c2ff908                                                                                                           |
| wsrep_protocol_version        | 9                                                                                                                                              |
| wsrep_last_committed          | 7                                                                                                                                              |
| wsrep_replicated              | 0                                                                                                                                              |
| wsrep_replicated_bytes        | 0                                                                                                                                              |
| wsrep_repl_keys               | 0                                                                                                                                              |
| wsrep_repl_keys_bytes         | 0                                                                                                                                              |
| wsrep_repl_data_bytes         | 0                                                                                                                                              |
| wsrep_repl_other_bytes        | 0                                                                                                                                              |
| wsrep_received                | 2                                                                                                                                              |
| wsrep_received_bytes          | 280                                                                                                                                            |
| wsrep_local_commits           | 0                                                                                                                                              |
| wsrep_local_cert_failures     | 0                                                                                                                                              |
| wsrep_local_replays           | 0                                                                                                                                              |
| wsrep_local_send_queue        | 0                                                                                                                                              |
| wsrep_local_send_queue_max    | 1                                                                                                                                              |
| wsrep_local_send_queue_min    | 0                                                                                                                                              |
| wsrep_local_send_queue_avg    | 0                                                                                                                                              |
| wsrep_local_recv_queue        | 0                                                                                                                                              |
| wsrep_local_recv_queue_max    | 1                                                                                                                                              |
| wsrep_local_recv_queue_min    | 0                                                                                                                                              |
| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
| wsrep_local_cached_downto     | 6                                                                                                                                              |
| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
| wsrep_flow_control_paused     | 0                                                                                                                                              |
| wsrep_flow_control_sent       | 0                                                                                                                                              |
| wsrep_flow_control_recv       | 0                                                                                                                                              |
| wsrep_cert_deps_distance      | 0                                                                                                                                              |
| wsrep_apply_oooe              | 0                                                                                                                                              |
| wsrep_apply_oool              | 0                                                                                                                                              |
| wsrep_apply_window            | 0                                                                                                                                              |
| wsrep_commit_oooe             | 0                                                                                                                                              |
| wsrep_commit_oool             | 0                                                                                                                                              |
| wsrep_commit_window           | 0                                                                                                                                              |
| wsrep_local_state             | 4                                                                                                                                              |
| wsrep_local_state_comment     | Synced                                                                                                                                         |
| wsrep_cert_index_size         | 0                                                                                                                                              |
| wsrep_causal_reads            | 0                                                                                                                                              |
| wsrep_cert_interval           | 0                                                                                                                                              |
| wsrep_open_transactions       | 0                                                                                                                                              |
| wsrep_open_connections        | 0                                                                                                                                              |
| wsrep_incoming_addresses      | 127.0.0.1:16002,127.0.0.1:16000,127.0.0.1:16001                                                                                                |
| wsrep_cluster_weight          | 3                                                                                                                                              |
| wsrep_desync_count            | 0                                                                                                                                              |
| wsrep_evs_delayed             |                                                                                                                                                |
| wsrep_evs_evict_list          |                                                                                                                                                |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |
| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
| wsrep_gcomm_uuid              | a2c23b72-acc8-11ea-afe5-cbd8cb9a86ed                                                                                                           |
| wsrep_applier_thread_count    | 32                                                                                                                                             |
| wsrep_cluster_capabilities    |                                                                                                                                                |
| wsrep_cluster_conf_id         | 17                                                                                                                                             |
| wsrep_cluster_size            | 3                                                                                                                                              |
| wsrep_cluster_state_uuid      | be36cf8b-acb6-11ea-aa2c-e3149c2ff908                                                                                                           |
| wsrep_cluster_status          | Primary                                                                                                                                        |
| wsrep_connected               | ON                                                                                                                                             |
| wsrep_local_bf_aborts         | 0                                                                                                                                              |
| wsrep_local_index             | 2                                                                                                                                              |
| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name           | Galera                                                                                                                                         |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
| wsrep_provider_version        | 26.4.4(rae24803)                                                                                                                               |
| wsrep_ready                   | ON                                                                                                                                             |
| wsrep_rollbacker_thread_count | 1                                                                                                                                              |
| wsrep_thread_count            | 33                                                                                                                                             |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.001 sec)

wsrep_cluster_status Primary
wsrep_local_state_comment Synced
wsrep_local_index 2
wsrep_cluster_size 3
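
The difference between the two status summaries (step 11 versus step 15) can be checked mechanically. The following is a minimal Python sketch and not part of the original test: the function names parse_summary and find_inconsistencies are hypothetical, and the rule applied (a node reporting Primary and Synced should not show a cluster size of 0 or a local index of 2**64 - 1) is an assumption drawn only from the two outputs above. It looks at the four summary variables and nothing else.

```python
# Illustrative sketch only; the names and the rule are assumptions,
# not MariaDB or Galera APIs.

UINT64_MAX = 2**64 - 1  # 18446744073709551615, i.e. -1 stored as unsigned 64-bit


def parse_summary(text):
    """Parse lines of the form 'variable_name value' into a dict of strings."""
    status = {}
    for line in text.strip().splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        status[name] = value
    return status


def find_inconsistencies(status):
    """Return a list of problems for a node that claims to be Primary and Synced."""
    problems = []
    claims_healthy = (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_local_state_comment") == "Synced"
    )
    if not claims_healthy:
        return problems  # the check only applies to nodes reporting Primary/Synced
    if int(status.get("wsrep_cluster_size", "0")) == 0:
        problems.append("wsrep_cluster_size is 0")
    if int(status.get("wsrep_local_index", str(UINT64_MAX))) == UINT64_MAX:
        problems.append("wsrep_local_index is 2**64 - 1")
    return problems


# Summary after the first start with wsrep-on=ON (step 11).
after_first_start = parse_summary("""
wsrep_cluster_status Primary
wsrep_local_state_comment Synced
wsrep_local_index 18446744073709551615
wsrep_cluster_size 0
""")

# Summary after restarting Node 2 (step 15).
after_restart = parse_summary("""
wsrep_cluster_status Primary
wsrep_local_state_comment Synced
wsrep_local_index 2
wsrep_cluster_size 3
""")

print(find_inconsistencies(after_first_start))
print(find_inconsistencies(after_restart))
```

With the values from this test, the first summary is flagged on both variables and the second one is clean, which matches what is seen before and after the restart of Node 2.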

Server logs: Node 1, Node 2, Node 3.

I also tried with one node stopped, and without inserting data on Node 1 (which remained joined to the cluster) while Node 2 was being upgraded, but there was no data loss in either case.

Comment by Rick Pizzi [ 2020-06-15 ]

Data loss is there, as documented in the original description. We reproduced it many times.

Comment by Rick Pizzi [ 2020-06-15 ]

I have re-tested this in my own lab (the original bug report was from Massimo; I'm on the same team).

I confirm the bug exists, and we don't understand why it is not happening to you.

Exact steps to reproduce:

1. Install 3 nodes with the latest 10.3; I used 10.3.23, wsrep version 25.3.28(r3875).
2. Create a table and insert data into it.

Situation after the 2 steps above:

node1>create table dataloss (id int not null auto_increment primary key, value int);
Query OK, 0 rows affected (0.025 sec)
 
node1>insert into dataloss (value) values (1), (2), (3);
Query OK, 3 rows affected (0.003 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
node1>select * from dataloss;
+----+-------+
| id | value |
+----+-------+
|  2 |     1 |
|  5 |     2 |
|  8 |     3 |
+----+-------+
3 rows in set (0.000 sec)
 
node1>show global status like 'wsrep%';
+-------------------------------+------------------------------------------+
| Variable_name                 | Value                                    |
+-------------------------------+------------------------------------------+
| wsrep_applier_thread_count    | 8                                        |
| wsrep_apply_oooe              | 0.000000                                 |
| wsrep_apply_oool              | 0.000000                                 |
| wsrep_apply_window            | 1.000000                                 |
| wsrep_causal_reads            | 0                                        |
| wsrep_cert_deps_distance      | 1.000000                                 |
| wsrep_cert_index_size         | 5                                        |
| wsrep_cert_interval           | 0.000000                                 |
| wsrep_cluster_conf_id         | 19                                       |
| wsrep_cluster_size            | 3                                        |
| wsrep_cluster_state_uuid      | cf61cf68-aef7-11ea-88db-1bc466429584     |
| wsrep_cluster_status          | Primary                                  |
| wsrep_cluster_weight          | 3                                        |
| wsrep_commit_oooe             | 0.000000                                 |
| wsrep_commit_oool             | 0.000000                                 |
| wsrep_commit_window           | 1.000000                                 |
| wsrep_connected               | ON                                       |
| wsrep_desync_count            | 0                                        |
| wsrep_evs_delayed             |                                          |
| wsrep_evs_evict_list          |                                          |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                |
| wsrep_evs_state               | OPERATIONAL                              |
| wsrep_flow_control_paused     | 0.000000                                 |
| wsrep_flow_control_paused_ns  | 0                                        |
| wsrep_flow_control_recv       | 0                                        |
| wsrep_flow_control_sent       | 0                                        |
| wsrep_gcomm_uuid              | 66883d21-af01-11ea-a6eb-260a9c0d8490     |
| wsrep_incoming_addresses      | AUTO,192.168.2.90:3306,192.168.2.92:3306 |
| wsrep_last_committed          | 8                                        |
| wsrep_local_bf_aborts         | 0                                        |
| wsrep_local_cached_downto     | 6                                        |
| wsrep_local_cert_failures     | 0                                        |
| wsrep_local_commits           | 1                                        |
| wsrep_local_index             | 1                                        |
| wsrep_local_recv_queue        | 0                                        |
| wsrep_local_recv_queue_avg    | 0.000000                                 |
| wsrep_local_recv_queue_max    | 1                                        |
| wsrep_local_recv_queue_min    | 0                                        |
| wsrep_local_replays           | 0                                        |
| wsrep_local_send_queue        | 0                                        |
| wsrep_local_send_queue_avg    | 0.000000                                 |
| wsrep_local_send_queue_max    | 1                                        |
| wsrep_local_send_queue_min    | 0                                        |
| wsrep_local_state             | 4                                        |
| wsrep_local_state_comment     | Synced                                   |
| wsrep_local_state_uuid        | cf61cf68-aef7-11ea-88db-1bc466429584     |
| wsrep_open_connections        | 0                                        |
| wsrep_open_transactions       | 0                                        |
| wsrep_protocol_version        | 9                                        |
| wsrep_provider_name           | Galera                                   |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>        |
| wsrep_provider_version        | 25.3.28(r3875)                           |
| wsrep_ready                   | ON                                       |
| wsrep_received                | 4                                        |
| wsrep_received_bytes          | 755                                      |
| wsrep_repl_data_bytes         | 978                                      |
| wsrep_repl_keys               | 9                                        |
| wsrep_repl_keys_bytes         | 144                                      |
| wsrep_repl_other_bytes        | 0                                        |
| wsrep_replicated              | 3                                        |
| wsrep_replicated_bytes        | 1328                                     |
| wsrep_rollbacker_thread_count | 1                                        |
| wsrep_thread_count            | 9                                        |
+-------------------------------+------------------------------------------+
63 rows in set (0.001 sec)
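
The id values 2, 5, 8 in the listing above are consistent with Galera's auto-increment control. As a hedged illustration, assuming wsrep_auto_increment_control is at its default (ON), so that auto_increment_increment equals wsrep_cluster_size and auto_increment_offset equals wsrep_local_index + 1, the Python sketch below reproduces the sequence from the values shown for node1 (cluster size 3, local index 1). The function name next_auto_increment is hypothetical; it models the usual offset + k * increment rule and is not the server's actual code.

```python
# Illustrative sketch only; models the offset + k * increment rule
# under the assumptions stated above.

def next_auto_increment(current_max, increment, offset):
    """Smallest value of the form offset + k * increment (k >= 0) above current_max."""
    if current_max < offset:
        return offset
    k = (current_max - offset) // increment + 1
    return offset + k * increment


cluster_size = 3  # wsrep_cluster_size on node1
local_index = 1   # wsrep_local_index on node1

increment = cluster_size  # assumed auto_increment_increment
offset = local_index + 1  # assumed auto_increment_offset

ids = []
current_max = 0
for _ in range(3):  # three rows inserted on node1
    current_max = next_auto_increment(current_max, increment, offset)
    ids.append(current_max)

print(ids)  # [2, 5, 8]
```

The same rule with offset 1 gives 10 and then 13 after 8, which would match the Id values inserted from Node 3 in the earlier test if that node had offset 1. That is an inference; the outputs do not state it.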

3. On node 2, shut down and upgrade to the latest 10.4; I used 10.4.13, wsrep 26.4.4(r4599).

When you restart that node, you see unexpected values for wsrep_cluster_size and wsrep_local_index:

MariaDB [(none)]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name                 | Value                                                                                                                                          |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid        | cf61cf68-aef7-11ea-88db-1bc466429584                                                                                                           |
| wsrep_protocol_version        | -1                                                                                                                                             |
| wsrep_last_committed          | 8                                                                                                                                              |
| wsrep_replicated              | 0                                                                                                                                              |
| wsrep_replicated_bytes        | 0                                                                                                                                              |
| wsrep_repl_keys               | 0                                                                                                                                              |
| wsrep_repl_keys_bytes         | 0                                                                                                                                              |
| wsrep_repl_data_bytes         | 0                                                                                                                                              |
| wsrep_repl_other_bytes        | 0                                                                                                                                              |
| wsrep_received                | 3                                                                                                                                              |
| wsrep_received_bytes          | 288                                                                                                                                            |
| wsrep_local_commits           | 0                                                                                                                                              |
| wsrep_local_cert_failures     | 0                                                                                                                                              |
| wsrep_local_replays           | 0                                                                                                                                              |
| wsrep_local_send_queue        | 0                                                                                                                                              |
| wsrep_local_send_queue_max    | 1                                                                                                                                              |
| wsrep_local_send_queue_min    | 0                                                                                                                                              |
| wsrep_local_send_queue_avg    | 0                                                                                                                                              |
| wsrep_local_recv_queue        | 0                                                                                                                                              |
| wsrep_local_recv_queue_max    | 1                                                                                                                                              |
| wsrep_local_recv_queue_min    | 0                                                                                                                                              |
| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
| wsrep_local_cached_downto     | -1                                                                                                                                             |
| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
| wsrep_flow_control_paused     | 0                                                                                                                                              |
| wsrep_flow_control_sent       | 0                                                                                                                                              |
| wsrep_flow_control_recv       | 0                                                                                                                                              |
| wsrep_cert_deps_distance      | 0                                                                                                                                              |
| wsrep_apply_oooe              | 0                                                                                                                                              |
| wsrep_apply_oool              | 0                                                                                                                                              |
| wsrep_apply_window            | 0                                                                                                                                              |
| wsrep_commit_oooe             | 0                                                                                                                                              |
| wsrep_commit_oool             | 0                                                                                                                                              |
| wsrep_commit_window           | 0                                                                                                                                              |
| wsrep_local_state             | 4                                                                                                                                              |
| wsrep_local_state_comment     | Synced                                                                                                                                         |
| wsrep_cert_index_size         | 0                                                                                                                                              |
| wsrep_causal_reads            | 0                                                                                                                                              |
| wsrep_cert_interval           | 0                                                                                                                                              |
| wsrep_open_transactions       | 0                                                                                                                                              |
| wsrep_open_connections        | 0                                                                                                                                              |
| wsrep_incoming_addresses      | AUTO,192.168.2.90:3306,192.168.2.92:3306                                                                                                       |
| wsrep_cluster_weight          | 3                                                                                                                                              |
| wsrep_desync_count            | 0                                                                                                                                              |
| wsrep_evs_delayed             |                                                                                                                                                |
| wsrep_evs_evict_list          |                                                                                                                                                |
| wsrep_evs_repl_latency        | 0.000567644/0.00112438/0.00173288/0.000348106/7                                                                                                |
| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
| wsrep_gcomm_uuid              | 043aaa1a-af04-11ea-9292-9a42c9f9c38d                                                                                                           |
| wsrep_applier_thread_count    | 8                                                                                                                                              |
| wsrep_cluster_capabilities    |                                                                                                                                                |
| wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |
| wsrep_cluster_size            | 0                                                                                                                                              |
| wsrep_cluster_state_uuid      |                                                                                                                                                |
| wsrep_cluster_status          | Primary                                                                                                                                        |
| wsrep_connected               | ON                                                                                                                                             |
| wsrep_local_bf_aborts         | 0                                                                                                                                              |
| wsrep_local_index             | 18446744073709551615                                                                                                                           |
| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name           | Galera                                                                                                                                         |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
| wsrep_ready                   | ON                                                                                                                                             |
| wsrep_rollbacker_thread_count | 1                                                                                                                                              |
| wsrep_thread_count            | 9                                                                                                                                              |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.001 sec)
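
The dump above contains values that look like uninitialised sentinels rather than real cluster state: wsrep_protocol_version is -1, wsrep_cluster_size is 0, and both wsrep_cluster_conf_id and wsrep_local_index are 18446744073709551615 (2^64 - 1, which is -1 read as an unsigned 64-bit integer). Below is a minimal shell sketch for spotting them in saved status output. The function name check_wsrep_status is hypothetical, and treating exactly these values as suspicious is an assumption drawn from this report, not an official health check.

```shell
# Hypothetical helper (not part of MariaDB or Galera): reads table-formatted
# `show global status like 'wsrep%'` output on stdin and prints one line per
# variable whose value matches a sentinel seen in this report.
# Values are compared as strings, so awk never rounds the 64-bit number.
check_wsrep_status() {
  awk -F'|' '
    NF >= 3 {
      name = $2; value = $3
      gsub(/^[ \t]+|[ \t]+$/, "", name)    # trim column padding
      gsub(/^[ \t]+|[ \t]+$/, "", value)
      bad = 0
      if (name == "wsrep_cluster_size" && value == "0") bad = 1
      if (name == "wsrep_protocol_version" && value == "-1") bad = 1
      if ((name == "wsrep_local_index" || name == "wsrep_cluster_conf_id") && value == "18446744073709551615") bad = 1
      if (bad) print "SUSPECT " name "=" value
    }
  '
}
```

One possible use is mysql -t -e "show global status like 'wsrep%'" | check_wsrep_status, where -t keeps the table layout even though stdout is a pipe. A node in the state shown above would produce SUSPECT lines; a healthy node would produce no output.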

Recheck the content of table dataloss on all 3 nodes:

node1>select * from dataloss;
+----+-------+
| id | value |
+----+-------+
|  2 |     1 |
|  5 |     2 |
|  8 |     3 |
+----+-------+
3 rows in set (0.001 sec)
 
node2> select * from dataloss;
+----+-------+
| id | value |
+----+-------+
|  2 |     1 |
|  5 |     2 |
|  8 |     3 |
+----+-------+
3 rows in set (0.001 sec)
 
node3>select * from dataloss;
+----+-------+
| id | value |
+----+-------+
|  2 |     1 |
|  5 |     2 |
|  8 |     3 |
+----+-------+
3 rows in set (0.000 sec)
 

Now insert a row on node1, verify it has been added:

node1>insert into dataloss (value) values (4);
Query OK, 1 row affected (0.002 sec)
 
node1>select * from dataloss;
+----+-------+
| id | value |
+----+-------+
|  2 |     1 |
|  5 |     2 |
|  8 |     3 |
| 11 |     4 |
+----+-------+
4 rows in set (0.000 sec)

If you check on node2, that row is missing:

node2> select * from dataloss;
+----+-------+
| id | value |
+----+-------+
|  2 |     1 |
|  5 |     2 |
|  8 |     3 |
+----+-------+
3 rows in set (0.000 sec)

On node 3, the row is there:

node3>select * from dataloss;
+----+-------+
| id | value |
+----+-------+
|  2 |     1 |
|  5 |     2 |
|  8 |     3 |
| 11 |     4 |
+----+-------+
4 rows in set (0.000 sec)

Any other row inserted while node 2 is in this state never reaches it either: this is data loss.
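
To check mechanically which rows a node is missing, the primary-key lists of two nodes can be compared. The sketch below assumes each node's ids have already been saved to a file, one id per line; the function name missing_ids and the file names are hypothetical.

```shell
# Hypothetical helper: prints every id present in the reference file ($1)
# but absent from the file of the node under test ($2).
# The files could be produced with something like
#   mysql -h node1 -N -B -e 'select id from dataloss' <schema> > node1.ids
# where the host name and schema are placeholders for your environment.
missing_ids() {
  awk '
    NF == 0 { next }                          # ignore blank lines
    FILENAME == ARGV[1] { have[$1] = 1; next } # first file: node under test
    !($1 in have) { print $1 }                # second file: reference node
  ' "$2" "$1"
}
```

With the tables shown above, missing_ids node1.ids node2.ids would report id 11, and swapping the arguments would report nothing.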

Then, if you restart node2 once more, the wsrep status clears and looks good:

Redirecting to /bin/systemctl stop mariadb.service
[root@docker2 ~]# service mariadb start
Redirecting to /bin/systemctl start mariadb.service
[root@docker2 ~]# mysql -A
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 20
Server version: 10.4.13-MariaDB-log MariaDB Server
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
node2> show global status like 'wsrep_local_index';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| wsrep_local_index | 2     |
+-------------------+-------+
1 row in set (0.001 sec)

Now, if I insert a new row on node1, it is correctly propagated to all nodes, but the previously inserted row (id 11) is still missing on node2:

node1>insert into dataloss (value) values (5);
Query OK, 1 row affected (0.003 sec)
node1>select * from dataloss;
+----+-------+
| id | value |
+----+-------+
|  2 |     1 |
|  5 |     2 |
|  8 |     3 |
| 11 |     4 |
| 16 |     5 |
+----+-------+
5 rows in set (0.000 sec)
 
node2> select * from dataloss;
+----+-------+
| id | value |
+----+-------+
|  2 |     1 |
|  5 |     2 |
|  8 |     3 |
| 16 |     5 |
+----+-------+
4 rows in set (0.000 sec)
 
node3>select * from dataloss;
+----+-------+
| id | value |
+----+-------+
|  2 |     1 |
|  5 |     2 |
|  8 |     3 |
| 11 |     4 |
| 16 |     5 |
+----+-------+
5 rows in set (0.000 sec)

So, please re-test the above scenario to verify that there is actual data loss and that it is not only a problem of bad variable display.

Thanks
Rick

Comment by Rick Pizzi [ 2020-06-15 ]

stepan.patryshev Please check the above.

Comment by MikaH [ 2020-06-16 ]

Tested with the rolling-upgrade method on a three-node cluster where all nodes ran 10.3.23 (on CentOS 7.6); node2 was then upgraded:

node1> MariaDB [test]> create table dataloss (id int not null auto_increment primary key, value int); 
MariaDB [test]> insert into dataloss (value) values (1), (2), (3);
Query OK, 3 rows affected (0.006 sec)
Records: 3  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select * from dataloss;
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
+----+-------+
3 rows in set (0.001 sec)

Status on node1:

MariaDB [test]> show global status like 'wsrep%cluster_size%';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.002 sec)
MariaDB [test]> show global status like 'wsrep%size%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| wsrep_cert_index_size | 3     |
| wsrep_cluster_size    | 3     |
+-----------------------+-------+
2 rows in set (0.002 sec)

Status on node2 before upgrade:

MariaDB [(none)]> select * from test.dataloss;
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
+----+-------+
4 rows in set (0.001 sec)

Perform node2 upgrade:

# Copy configs to safe place:
mkdir /root/configs/
/bin/cp -p /etc/my.cnf.d/*cnf /root/configs/.
# Stop and remove old rpm's:
systemctl stop mariadb && rpm -qai|grep -e Maria -e galera |grep Name | awk '{print "yum remove " $3 " -y"}'|bash
# Then install new rpm's and Selinux-policyfiles:
yum localinstall rpmsfor10.4.13/*rpm -y && semodule -v -i selinux/*.pp
# Copy configs back:
/bin/cp -p /root/configs/*cnf /etc/my.cnf.d/.
# Add needed link, start MariaDB and run mysql_upgrade:
ln -s /usr/lib64/galera-4 /usr/lib64/galera && systemctl start mariadb && mysql_upgrade -uroot -p --skip-write-binlog

Status after node2 upgrade:

[root@galera2 ~]# mysql -uroot
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 852
Server version: 10.4.13-MariaDB-log MariaDB Server
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> show global status like 'wsrep%cluster_size%';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.002 sec)
 
MariaDB [(none)]> show global status like 'wsrep%size%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| wsrep_cert_index_size | 3     |
| wsrep_cluster_size    | 3     |
+-----------------------+-------+
2 rows in set (0.002 sec)
 
MariaDB [(none)]>

Inserting data on node1:

MariaDB [test]> insert into dataloss (value) values (4);
Query OK, 1 row affected (0.004 sec)
 
MariaDB [test]> select * from dataloss;
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
| 12 |     4 |
+----+-------+
4 rows in set (0.000 sec)

Status on node2 after data inserted on node1:

MariaDB [(none)]> select * from test.dataloss;
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
| 12 |     4 |
+----+-------+
4 rows in set (0.000 sec)
MariaDB [(none)]>

No data loss with this method.

Comment by Rick Pizzi [ 2020-06-16 ]

If node2 came up with the correct cluster index, it may have performed an SST.
Please post logs...

Comment by MikaH [ 2020-06-16 ]

Here are the logs:
node2_upgraded_10.4.13.log node1_bootsrapped_10.3.23.log

Comment by Rick Pizzi [ 2020-06-16 ]

Your log is mangled. I would suggest you follow my steps exactly, and you should get the same results. We did this in multiple labs with the same result.

Comment by Stepan Patryshev (Inactive) [ 2020-06-17 ]

rpizzi Thank you for the detailed steps. I have retested with these steps and the wsrep version 25.3.28(r3875) you mentioned, but unfortunately I still have not observed any data loss or server crash.

Comment by Stepan Patryshev (Inactive) [ 2020-07-09 ]

rpizzi I have followed your steps with standard installed packages on separate VMs but still have not managed to reproduce it. I do not know what the key difference is. Can you please share exactly how you upgrade the server, just in case?

Comment by Rick Pizzi [ 2020-07-09 ]

The steps are outlined above https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156703&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-156703 and are more than detailed.

Can you please post the output of your session when running the above commands here?

Comment by Stepan Patryshev (Inactive) [ 2020-07-10 ]

rpizzi Here is my session output. There are separate sessions for the MariaDB client and for the console itself.

Comment by Rick Pizzi [ 2020-07-13 ]

stepan.patryshev from these files we can't infer whether the correct sequence of steps has been executed.

Can you provide evidence that you get different results when reproducing the steps I outlined above?
As I already mentioned, two people on my team, in two separate environments, can reproduce it 100% of the time.

Please, try once again, and provide a single output with all the steps done in sequence, like I did above.

Thanks

Rick

Comment by Massimo [ 2020-07-13 ]

Please do not use the test schema, and add the steps, configuration, and error logs of all the nodes. From the log it isn't clear what you have done.

Comment by Stepan Patryshev (Inactive) [ 2020-07-13 ]

rpizzi I have run through the steps again without any failures. Please find attached all logs and cnf files.

Steps:

1. Install 3 nodes with MariaDB 10.3.23 on CentOS Linux release 7.8.2003 (Core), wsrep version 25.3.29(r3902).

2. On Node1 create a table and insert data in it.

[root@patgal1 ~]# mysql -pr -e'CREATE DATABASE d;create table d.dataloss (id int not null auto_increment primary key, value int);insert into d.dataloss (value) values (1), (2), (3);'
[root@patgal1 ~]# mysql -pr -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  1 |     1 |
|  4 |     2 |
|  7 |     3 |
+----+-------+
 
 
[root@patgal2 ~]# mysql -pr -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  1 |     1 |
|  4 |     2 |
|  7 |     3 |
+----+-------+
 
 
[root@patgal3 ~]# mysql -pr -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  1 |     1 |
|  4 |     2 |
|  7 |     3 |
+----+-------+

Status after the above:

Node1:

[root@patgal1 ~]# mysql -pr -e'show global status like "wsrep%";'
+-------------------------------+-------------------------------------------------------+
| Variable_name                 | Value                                                 |
+-------------------------------+-------------------------------------------------------+
| wsrep_applier_thread_count    | 1                                                     |
| wsrep_apply_oooe              | 0.000000                                              |
| wsrep_apply_oool              | 0.000000                                              |
| wsrep_apply_window            | 1.000000                                              |
| wsrep_causal_reads            | 0                                                     |
| wsrep_cert_deps_distance      | 1.000000                                              |
| wsrep_cert_index_size         | 5                                                     |
| wsrep_cert_interval           | 0.000000                                              |
| wsrep_cluster_conf_id         | 3                                                     |
| wsrep_cluster_size            | 3                                                     |
| wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
| wsrep_cluster_status          | Primary                                               |
| wsrep_cluster_weight          | 3                                                     |
| wsrep_commit_oooe             | 0.000000                                              |
| wsrep_commit_oool             | 0.000000                                              |
| wsrep_commit_window           | 1.000000                                              |
| wsrep_connected               | ON                                                    |
| wsrep_desync_count            | 0                                                     |
| wsrep_evs_delayed             |                                                       |
| wsrep_evs_evict_list          |                                                       |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                             |
| wsrep_evs_state               | OPERATIONAL                                           |
| wsrep_flow_control_paused     | 0.000000                                              |
| wsrep_flow_control_paused_ns  | 0                                                     |
| wsrep_flow_control_recv       | 0                                                     |
| wsrep_flow_control_sent       | 0                                                     |
| wsrep_gcomm_uuid              | f1120258-c51e-11ea-8b48-cb8ed6394b53                  |
| wsrep_incoming_addresses      | 172.20.3.101:3306,172.20.3.102:3306,172.20.3.103:3306 |
| wsrep_last_committed          | 24                                                    |
| wsrep_local_bf_aborts         | 0                                                     |
| wsrep_local_cached_downto     | 22                                                    |
| wsrep_local_cert_failures     | 0                                                     |
| wsrep_local_commits           | 1                                                     |
| wsrep_local_index             | 0                                                     |
| wsrep_local_recv_queue        | 0                                                     |
| wsrep_local_recv_queue_avg    | 0.000000                                              |
| wsrep_local_recv_queue_max    | 1                                                     |
| wsrep_local_recv_queue_min    | 0                                                     |
| wsrep_local_replays           | 0                                                     |
| wsrep_local_send_queue        | 0                                                     |
| wsrep_local_send_queue_avg    | 0.000000                                              |
| wsrep_local_send_queue_max    | 1                                                     |
| wsrep_local_send_queue_min    | 0                                                     |
| wsrep_local_state             | 4                                                     |
| wsrep_local_state_comment     | Synced                                                |
| wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
| wsrep_open_connections        | 0                                                     |
| wsrep_open_transactions       | 0                                                     |
| wsrep_protocol_version        | 9                                                     |
| wsrep_provider_name           | Galera                                                |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                     |
| wsrep_provider_version        | 25.3.29(r3902)                                        |
| wsrep_ready                   | ON                                                    |
| wsrep_received                | 4                                                     |
| wsrep_received_bytes          | 626                                                   |
| wsrep_repl_data_bytes         | 969                                                   |
| wsrep_repl_keys               | 8                                                     |
| wsrep_repl_keys_bytes         | 136                                                   |
| wsrep_repl_other_bytes        | 0                                                     |
| wsrep_replicated              | 3                                                     |
| wsrep_replicated_bytes        | 1312                                                  |
| wsrep_rollbacker_thread_count | 1                                                     |
| wsrep_thread_count            | 2                                                     |
+-------------------------------+-------------------------------------------------------+

Node2:

[root@patgal2 ~]# mysql -pr -e'show global status like "wsrep%";'
+-------------------------------+-------------------------------------------------------+
| Variable_name                 | Value                                                 |
+-------------------------------+-------------------------------------------------------+
| wsrep_applier_thread_count    | 1                                                     |
| wsrep_apply_oooe              | 0.000000                                              |
| wsrep_apply_oool              | 0.000000                                              |
| wsrep_apply_window            | 1.000000                                              |
| wsrep_causal_reads            | 0                                                     |
| wsrep_cert_deps_distance      | 1.000000                                              |
| wsrep_cert_index_size         | 5                                                     |
| wsrep_cert_interval           | 0.000000                                              |
| wsrep_cluster_conf_id         | 3                                                     |
| wsrep_cluster_size            | 3                                                     |
| wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
| wsrep_cluster_status          | Primary                                               |
| wsrep_cluster_weight          | 3                                                     |
| wsrep_commit_oooe             | 0.000000                                              |
| wsrep_commit_oool             | 0.000000                                              |
| wsrep_commit_window           | 1.000000                                              |
| wsrep_connected               | ON                                                    |
| wsrep_desync_count            | 0                                                     |
| wsrep_evs_delayed             |                                                       |
| wsrep_evs_evict_list          |                                                       |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                             |
| wsrep_evs_state               | OPERATIONAL                                           |
| wsrep_flow_control_paused     | 0.000000                                              |
| wsrep_flow_control_paused_ns  | 0                                                     |
| wsrep_flow_control_recv       | 0                                                     |
| wsrep_flow_control_sent       | 0                                                     |
| wsrep_gcomm_uuid              | f8c46db5-c51e-11ea-8095-6ffbd7cfa539                  |
| wsrep_incoming_addresses      | 172.20.3.101:3306,172.20.3.102:3306,172.20.3.103:3306 |
| wsrep_last_committed          | 24                                                    |
| wsrep_local_bf_aborts         | 0                                                     |
| wsrep_local_cached_downto     | 22                                                    |
| wsrep_local_cert_failures     | 0                                                     |
| wsrep_local_commits           | 0                                                     |
| wsrep_local_index             | 1                                                     |
| wsrep_local_recv_queue        | 0                                                     |
| wsrep_local_recv_queue_avg    | 0.000000                                              |
| wsrep_local_recv_queue_max    | 1                                                     |
| wsrep_local_recv_queue_min    | 0                                                     |
| wsrep_local_replays           | 0                                                     |
| wsrep_local_send_queue        | 0                                                     |
| wsrep_local_send_queue_avg    | 0.000000                                              |
| wsrep_local_send_queue_max    | 1                                                     |
| wsrep_local_send_queue_min    | 0                                                     |
| wsrep_local_state             | 4                                                     |
| wsrep_local_state_comment     | Synced                                                |
| wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
| wsrep_open_connections        | 0                                                     |
| wsrep_open_transactions       | 0                                                     |
| wsrep_protocol_version        | 9                                                     |
| wsrep_provider_name           | Galera                                                |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                     |
| wsrep_provider_version        | 25.3.29(r3902)                                        |
| wsrep_ready                   | ON                                                    |
| wsrep_received                | 6                                                     |
| wsrep_received_bytes          | 1803                                                  |
| wsrep_repl_data_bytes         | 0                                                     |
| wsrep_repl_keys               | 0                                                     |
| wsrep_repl_keys_bytes         | 0                                                     |
| wsrep_repl_other_bytes        | 0                                                     |
| wsrep_replicated              | 0                                                     |
| wsrep_replicated_bytes        | 0                                                     |
| wsrep_rollbacker_thread_count | 1                                                     |
| wsrep_thread_count            | 2                                                     |
+-------------------------------+-------------------------------------------------------+

3. On Node2 set wsrep_on=OFF, shut down and upgrade to 10.4.13, wsrep 26.4.4(r4599).

4. Join upgraded Node2 to the cluster:

[root@patgal2 ~]# mysql -pr -e'show global status like "wsrep%";'
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name                 | Value                                                                                                                                          |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                                                                                                           |
| wsrep_protocol_version        | 9                                                                                                                                              |
| wsrep_last_committed          | 24                                                                                                                                             |
| wsrep_replicated              | 0                                                                                                                                              |
| wsrep_replicated_bytes        | 0                                                                                                                                              |
| wsrep_repl_keys               | 0                                                                                                                                              |
| wsrep_repl_keys_bytes         | 0                                                                                                                                              |
| wsrep_repl_data_bytes         | 0                                                                                                                                              |
| wsrep_repl_other_bytes        | 0                                                                                                                                              |
| wsrep_received                | 2                                                                                                                                              |
| wsrep_received_bytes          | 280                                                                                                                                            |
| wsrep_local_commits           | 0                                                                                                                                              |
| wsrep_local_cert_failures     | 0                                                                                                                                              |
| wsrep_local_replays           | 0                                                                                                                                              |
| wsrep_local_send_queue        | 0                                                                                                                                              |
| wsrep_local_send_queue_max    | 1                                                                                                                                              |
| wsrep_local_send_queue_min    | 0                                                                                                                                              |
| wsrep_local_send_queue_avg    | 0                                                                                                                                              |
| wsrep_local_recv_queue        | 0                                                                                                                                              |
| wsrep_local_recv_queue_max    | 1                                                                                                                                              |
| wsrep_local_recv_queue_min    | 0                                                                                                                                              |
| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
| wsrep_local_cached_downto     | -1                                                                                                                                             |
| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
| wsrep_flow_control_paused     | 0                                                                                                                                              |
| wsrep_flow_control_sent       | 0                                                                                                                                              |
| wsrep_flow_control_recv       | 0                                                                                                                                              |
| wsrep_cert_deps_distance      | 0                                                                                                                                              |
| wsrep_apply_oooe              | 0                                                                                                                                              |
| wsrep_apply_oool              | 0                                                                                                                                              |
| wsrep_apply_window            | 0                                                                                                                                              |
| wsrep_commit_oooe             | 0                                                                                                                                              |
| wsrep_commit_oool             | 0                                                                                                                                              |
| wsrep_commit_window           | 0                                                                                                                                              |
| wsrep_local_state             | 4                                                                                                                                              |
| wsrep_local_state_comment     | Synced                                                                                                                                         |
| wsrep_cert_index_size         | 0                                                                                                                                              |
| wsrep_causal_reads            | 0                                                                                                                                              |
| wsrep_cert_interval           | 0                                                                                                                                              |
| wsrep_open_transactions       | 0                                                                                                                                              |
| wsrep_open_connections        | 0                                                                                                                                              |
| wsrep_incoming_addresses      | AUTO,172.20.3.101:3306,172.20.3.103:3306                                                                                                       |
| wsrep_cluster_weight          | 3                                                                                                                                              |
| wsrep_desync_count            | 0                                                                                                                                              |
| wsrep_evs_delayed             |                                                                                                                                                |
| wsrep_evs_evict_list          |                                                                                                                                                |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |
| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
| wsrep_gcomm_uuid              | 332a2e12-c525-11ea-be26-4ed9b6694f67                                                                                                           |
| wsrep_applier_thread_count    | 1                                                                                                                                              |
| wsrep_cluster_capabilities    |                                                                                                                                                |
| wsrep_cluster_conf_id         | 10                                                                                                                                             |
| wsrep_cluster_size            | 3                                                                                                                                              |
| wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                                                                                                           |
| wsrep_cluster_status          | Primary                                                                                                                                        |
| wsrep_connected               | ON                                                                                                                                             |
| wsrep_local_bf_aborts         | 0                                                                                                                                              |
| wsrep_local_index             | 0                                                                                                                                              |
| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name           | Galera                                                                                                                                         |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
| wsrep_ready                   | ON                                                                                                                                             |
| wsrep_rollbacker_thread_count | 1                                                                                                                                              |
| wsrep_thread_count            | 2                                                                                                                                              |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

wsrep_cluster_size and wsrep_local_index on Node2:

wsrep_cluster_size 3
wsrep_local_index 0
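
The two values can also be pulled out of a saved copy of the status table instead of being read by eye. A minimal sketch, assuming the table was captured in the format shown above; the function name `wsrep_value` is hypothetical and not part of any MariaDB tooling:

```shell
# Hypothetical helper: print the value of one status variable from the
# table that `show global status like "wsrep%";` produces.
#   $1    = variable name, e.g. wsrep_cluster_size
#   stdin = the table rows ("| name | value |")
wsrep_value() {
    awk -F'|' -v name="$1" '
        NF >= 3 {
            key = $2; val = $3
            # strip the padding around both columns
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            if (key == name) print val
        }'
}
```

When piping straight from the client, add `--table`, because the mysql client switches to tab-separated output when its stdout is not a terminal: `mysql -pr --table -e'show global status like "wsrep%";' | wsrep_value wsrep_cluster_size`.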

5. Recheck the content of table dataloss on 3 nodes:

[root@patgal1 ~]# mysql -pr -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  1 |     1 |
|  4 |     2 |
|  7 |     3 |
+----+-------+
 
[root@patgal2 ~]# mysql -pr -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  1 |     1 |
|  4 |     2 |
|  7 |     3 |
+----+-------+
 
[root@patgal3 ~]# mysql -pr -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  1 |     1 |
|  4 |     2 |
|  7 |     3 |
+----+-------+

6. Insert a row on Node1, verify it has been added and replicated to Node2 and Node3:

[root@patgal1 ~]# mysql -pr -e'insert into d.dataloss (value) values (4);'
[root@patgal1 ~]# mysql -pr -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  1 |     1 |
|  4 |     2 |
|  7 |     3 |
| 11 |     4 |
+----+-------+
 
[root@patgal2 ~]# mysql -pr -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  1 |     1 |
|  4 |     2 |
|  7 |     3 |
| 11 |     4 |
+----+-------+
 
[root@patgal3 ~]# mysql -pr -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  1 |     1 |
|  4 |     2 |
|  7 |     3 |
| 11 |     4 |
+----+-------+

As you can see, there are no related errors and no data loss here.
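
The three-way comparison above can be automated once each node's query output is saved to a file. A minimal sketch; the function name `outputs_match` and the file names in the usage comment are hypothetical, and collecting the files over ssh is an assumption about the test setup:

```shell
# Hypothetical helper: succeed only if every saved query output is
# byte-identical to the first one.
#   $1, $2, ... = files holding the output of the same SELECT on each node
outputs_match() {
    first=$1
    shift
    for other in "$@"; do
        cmp -s "$first" "$other" || return 1
    done
}

# Possible usage (assumes root ssh access to the three test hosts):
#   for n in 1 2 3; do
#       ssh "root@patgal$n" "mysql -pr -e 'select * from d.dataloss;'" > "node$n.out"
#   done
#   outputs_match node1.out node2.out node3.out || echo "nodes differ"
```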

Comment by Rick Pizzi [ 2020-07-14 ]

You aren't reproducing the issue.

Can you please spell out step 3 in detail?
When you say:

 3. On Node2 set wsrep_on=OFF, shut down and upgrade to 10.4.13, wsrep 26.4.4(r4599).

We would like to see the exact steps used to do this, as this is probably where you are doing things
differently. Please paste the relevant part of the history file.

Thanks
Rick

Comment by Rick Pizzi [ 2020-07-14 ]

OK, by looking at the output of the patgal2 session (both yesterday and the other day) we see this:

[root@patgal2 ~]# systemctl stop mariadb
[root@patgal2 ~]# systemctl start mariadb
[root@patgal2 ~]# 
[root@patgal2 ~]# systemctl stop mariadb

Basically, after upgrading node 2 to 10.4 you start the server with wsrep ON, run mysql_upgrade, then shut down, set wsrep to OFF and start again. This is not what we specified in the ticket.

Please repeat the EXACT steps we posted. In other words: after upgrading the packages you need to start with WSREP OFF, not ON.

Thanks
Rick

Comment by Rick Pizzi [ 2020-07-14 ]

Re-reading the entire ticket, I see that there was some confusion about this WSREP_ON = OFF thing: Massimo (the original bug submitter) said to start with it off, run the upgrade, stop, and start with it on, while in my test I don't touch it at all.

The bottom line of all this is: the FIRST time you start MariaDB on node2 with WSREP enabled, you get that weird cluster index and cluster_size=0, and it is at that moment that any data inserted on the other nodes does not reach node2.

If you start node2 twice with WSREP enabled the problem does not appear because the 2nd restart (which you always seem to do, see above) "clears" the weird situation.

So, once again, to properly test this DO NOT touch the WSREP_ON variable, leave it on, but after upgrading the packages start node2 only once, not twice. You will see the weird cluster index and size values. In that situation any row inserted on the other nodes is lost (it does not reach node2).
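
The symptom described above (cluster_size=0 right after the first start) can be checked mechanically on the upgraded node. A minimal sketch, assuming the status is read in the client's tab-separated batch format; the function name `cluster_size_is_sane` is hypothetical:

```shell
# Hypothetical check: succeed only if wsrep_cluster_size is present and
# greater than zero.
#   stdin = "name<TAB>value" lines, as printed by `mysql -N -B`
cluster_size_is_sane() {
    size=$(awk -F'\t' '$1 == "wsrep_cluster_size" { print $2 }')
    [ -n "$size" ] && [ "$size" -gt 0 ]
}

# Possible usage on node2, right after its first start on 10.4:
#   mysql -pr -N -B -e 'show global status like "wsrep_cluster_size";' \
#       | cluster_size_is_sane || echo "wsrep_cluster_size is 0 or missing"
```

Running the check before inserting rows on the other nodes shows whether the test actually hit the state in question.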

Comment by Stepan Patryshev (Inactive) [ 2020-07-14 ]

@rpizzi You are wrong here. As you can see in "20200713_patgal2_output.log", line 165 shows "wsrep_on=OFF" before the upgraded server is started. The only difference is that I set it even before the upgrade.
And in "20200713_patgal2.err" the first run of 10.4.13 is on line 494: "2020-07-13 19:13:38 0 [Note] InnoDB: 10.4.13 started", and the first attempt to load the WSREP provider on 10.4.13 is logged later, on line 515: "2020-07-13 19:19:39 0 [Note] WSREP: Loading provider".
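
Log positions like the ones quoted above can be located with grep. A minimal sketch; the function name `first_line_of` is hypothetical, and `-m 1` assumes GNU or BSD grep:

```shell
# Hypothetical helper: print the line number of the first line in a log
# that contains a fixed string.
#   $1 = string to look for, $2 = log file
first_line_of() {
    grep -n -F -m 1 -- "$1" "$2" | cut -d: -f1
}

# Possible usage against the attached error log:
#   first_line_of 'InnoDB: 10.4.13 started' 20200713_patgal2.err
#   first_line_of 'WSREP: Loading provider' 20200713_patgal2.err
```

Note that the second search returns the first provider load anywhere in the file, so on a log that also covers the 10.3.23 runs it has to be restricted to the lines after the 10.4.13 start.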
And here is the history fragment:

  262  systemctl start mariadb
  263  mysql -pr -e'select * from d.dataloss;'
  264  mysql -pr -e'show global status like "wsrep%";'
  265  systemctl stop mariadb
  266  vi /etc/my.cnf.d/server2.cnf
  267  cat /etc/yum.repos.d/mariadb.repo
  268  curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=mariadb-10.4
  269  cat /etc/yum.repos.d/mariadb.repo
  270  yum list installed | grep galera
  271  yum list installed | grep MariaDB
  272  sudo yum remove MariaDB-server galera MariaDB-backup MariaDB-client MariaDB-common
  273  yum list installed | grep galera
  274  yum list installed | grep MariaDB
  275  yum install MariaDB-server galera MariaDB-backup MariaDB-client MariaDB-common
  276  yum list installed | grep MariaDB
  277  yum list installed | grep galera
  278  systemctl start mariadb
  279  mysql_upgrade -s
  280  mysql_upgrade -s -pr
  281  systemctl stop mariadb
  282  vi /etc/my.cnf.d/server.cnf
  283  vi /etc/my.cnf.d/server2.cnf
  284  systemctl start mariadb
  285  vi /etc/my.cnf.d/server2.cnf
  286  systemctl start mariadb
  287  mysql -pr -e'show global status like "wsrep%";'
  288  mysql -pr -e'select * from d.dataloss;'

Anyway, I will try to follow your steps more closely.

Comment by Rick Pizzi [ 2020-07-14 ]

To verify the bug DO NOT start node2 more than once after upgrading. That's it.
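
Whether a test run met this condition can be read off the shell history. A minimal sketch that counts server starts after the most recent package install; the function name `starts_after_install` is hypothetical, and the patterns assume the yum and systemctl commands used in this ticket:

```shell
# Hypothetical helper: count `systemctl start mariadb` commands that
# follow the most recent `yum install` in a shell history.
#   stdin = output of `history` or a saved history file
starts_after_install() {
    awk '
        /yum install/                     { seen = 1; count = 0; next }
        seen && /systemctl start mariadb/ { count++ }
        END                               { print count + 0 }'
}

# Possible usage:
#   history | starts_after_install
```

Fed the history fragment posted earlier in this ticket, it prints 3 (entries 278, 284 and 286); a run that follows the instruction above would print 1.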

Comment by Stepan Patryshev (Inactive) [ 2020-07-14 ]

@rpizzi It has not helped. I did not change WSREP_ON at all and started the upgraded server only once. It passed again without any failures or data loss. Please share the exact steps you use to install and update the packages. Please find attached all logs and cnf files.

Steps:

1. Install 3 nodes with MariaDB 10.3.23 on CentOS Linux release 7.8.2003 (Core), wsrep version 25.3.29(r3902).

2. On Node1 create a table and insert data in it.

[root@patgal1 ~]# mysql -e'create database d;'
[root@patgal1 ~]# mysql -e'create table d.dataloss (id int not null auto_increment primary key, value int);'
[root@patgal1 ~]# mysql -e'insert into d.dataloss (value) values (1), (2), (3);'
 
[root@patgal1 ~]# mysql -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
+----+-------+

2.1. Check that data are propagated successfully to other nodes:

[root@patgal2 ~]# mysql -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
+----+-------+
 
[root@patgal3 ~]# mysql -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
+----+-------+

2.2. Cluster status after the above:

Node1:

[root@patgal1 ~]# mysql -e'show global status like "wsrep%";'
+-------------------------------+-------------------------------------------------------+
| Variable_name                 | Value                                                 |
+-------------------------------+-------------------------------------------------------+
| wsrep_applier_thread_count    | 1                                                     |
| wsrep_apply_oooe              | 0.000000                                              |
| wsrep_apply_oool              | 0.000000                                              |
| wsrep_apply_window            | 1.000000                                              |
| wsrep_causal_reads            | 0                                                     |
| wsrep_cert_deps_distance      | 1.000000                                              |
| wsrep_cert_index_size         | 5                                                     |
| wsrep_cert_interval           | 0.000000                                              |
| wsrep_cluster_conf_id         | 3                                                     |
| wsrep_cluster_size            | 3                                                     |
| wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
| wsrep_cluster_status          | Primary                                               |
| wsrep_cluster_weight          | 3                                                     |
| wsrep_commit_oooe             | 0.000000                                              |
| wsrep_commit_oool             | 0.000000                                              |
| wsrep_commit_window           | 1.000000                                              |
| wsrep_connected               | ON                                                    |
| wsrep_desync_count            | 0                                                     |
| wsrep_evs_delayed             |                                                       |
| wsrep_evs_evict_list          |                                                       |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                             |
| wsrep_evs_state               | OPERATIONAL                                           |
| wsrep_flow_control_paused     | 0.000000                                              |
| wsrep_flow_control_paused_ns  | 0                                                     |
| wsrep_flow_control_recv       | 0                                                     |
| wsrep_flow_control_sent       | 0                                                     |
| wsrep_gcomm_uuid              | fed13746-c5b4-11ea-a5fe-a6a8e8ca175a                  |
| wsrep_incoming_addresses      | 172.20.3.102:3306,172.20.3.103:3306,172.20.3.101:3306 |
| wsrep_last_committed          | 6                                                     |
| wsrep_local_bf_aborts         | 0                                                     |
| wsrep_local_cached_downto     | 4                                                     |
| wsrep_local_cert_failures     | 0                                                     |
| wsrep_local_commits           | 1                                                     |
| wsrep_local_index             | 2                                                     |
| wsrep_local_recv_queue        | 0                                                     |
| wsrep_local_recv_queue_avg    | 0.000000                                              |
| wsrep_local_recv_queue_max    | 1                                                     |
| wsrep_local_recv_queue_min    | 0                                                     |
| wsrep_local_replays           | 0                                                     |
| wsrep_local_send_queue        | 0                                                     |
| wsrep_local_send_queue_avg    | 0.000000                                              |
| wsrep_local_send_queue_max    | 1                                                     |
| wsrep_local_send_queue_min    | 0                                                     |
| wsrep_local_state             | 4                                                     |
| wsrep_local_state_comment     | Synced                                                |
| wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
| wsrep_open_connections        | 0                                                     |
| wsrep_open_transactions       | 0                                                     |
| wsrep_protocol_version        | 9                                                     |
| wsrep_provider_name           | Galera                                                |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                     |
| wsrep_provider_version        | 25.3.29(r3902)                                        |
| wsrep_ready                   | ON                                                    |
| wsrep_received                | 10                                                    |
| wsrep_received_bytes          | 782                                                   |
| wsrep_repl_data_bytes         | 969                                                   |
| wsrep_repl_keys               | 8                                                     |
| wsrep_repl_keys_bytes         | 136                                                   |
| wsrep_repl_other_bytes        | 0                                                     |
| wsrep_replicated              | 3                                                     |
| wsrep_replicated_bytes        | 1312                                                  |
| wsrep_rollbacker_thread_count | 1                                                     |
| wsrep_thread_count            | 2                                                     |
+-------------------------------+-------------------------------------------------------+

Node2:

[root@patgal2 ~]# mysql -e'show global status like "wsrep%";'
+-------------------------------+-------------------------------------------------------+
| Variable_name                 | Value                                                 |
+-------------------------------+-------------------------------------------------------+
| wsrep_applier_thread_count    | 1                                                     |
| wsrep_apply_oooe              | 0.000000                                              |
| wsrep_apply_oool              | 0.000000                                              |
| wsrep_apply_window            | 1.000000                                              |
| wsrep_causal_reads            | 0                                                     |
| wsrep_cert_deps_distance      | 1.000000                                              |
| wsrep_cert_index_size         | 5                                                     |
| wsrep_cert_interval           | 0.000000                                              |
| wsrep_cluster_conf_id         | 3                                                     |
| wsrep_cluster_size            | 3                                                     |
| wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
| wsrep_cluster_status          | Primary                                               |
| wsrep_cluster_weight          | 3                                                     |
| wsrep_commit_oooe             | 0.000000                                              |
| wsrep_commit_oool             | 0.000000                                              |
| wsrep_commit_window           | 1.000000                                              |
| wsrep_connected               | ON                                                    |
| wsrep_desync_count            | 0                                                     |
| wsrep_evs_delayed             |                                                       |
| wsrep_evs_evict_list          |                                                       |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                             |
| wsrep_evs_state               | OPERATIONAL                                           |
| wsrep_flow_control_paused     | 0.000000                                              |
| wsrep_flow_control_paused_ns  | 0                                                     |
| wsrep_flow_control_recv       | 0                                                     |
| wsrep_flow_control_sent       | 0                                                     |
| wsrep_gcomm_uuid              | 11a7b1fd-c5b5-11ea-9a59-5e4e35dabad1                  |
| wsrep_incoming_addresses      | 172.20.3.102:3306,172.20.3.103:3306,172.20.3.101:3306 |
| wsrep_last_committed          | 6                                                     |
| wsrep_local_bf_aborts         | 0                                                     |
| wsrep_local_cached_downto     | 4                                                     |
| wsrep_local_cert_failures     | 0                                                     |
| wsrep_local_commits           | 0                                                     |
| wsrep_local_index             | 0                                                     |
| wsrep_local_recv_queue        | 0                                                     |
| wsrep_local_recv_queue_avg    | 0.142857                                              |
| wsrep_local_recv_queue_max    | 2                                                     |
| wsrep_local_recv_queue_min    | 0                                                     |
| wsrep_local_replays           | 0                                                     |
| wsrep_local_send_queue        | 0                                                     |
| wsrep_local_send_queue_avg    | 0.000000                                              |
| wsrep_local_send_queue_max    | 1                                                     |
| wsrep_local_send_queue_min    | 0                                                     |
| wsrep_local_state             | 4                                                     |
| wsrep_local_state_comment     | Synced                                                |
| wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
| wsrep_open_connections        | 0                                                     |
| wsrep_open_transactions       | 0                                                     |
| wsrep_protocol_version        | 9                                                     |
| wsrep_provider_name           | Galera                                                |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                     |
| wsrep_provider_version        | 25.3.29(r3902)                                        |
| wsrep_ready                   | ON                                                    |
| wsrep_received                | 7                                                     |
| wsrep_received_bytes          | 1811                                                  |
| wsrep_repl_data_bytes         | 0                                                     |
| wsrep_repl_keys               | 0                                                     |
| wsrep_repl_keys_bytes         | 0                                                     |
| wsrep_repl_other_bytes        | 0                                                     |
| wsrep_replicated              | 0                                                     |
| wsrep_replicated_bytes        | 0                                                     |
| wsrep_rollbacker_thread_count | 1                                                     |
| wsrep_thread_count            | 2                                                     |
+-------------------------------+-------------------------------------------------------+

3. On Node2 shut down and upgrade to 10.4.13, wsrep 26.4.4(r4599).

3.1. systemctl stop mariadb
3.2. curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=mariadb-10.4
3.3. yum remove MariaDB galera
3.4. yum install MariaDB galera
3.5. rm /etc/my.cnf.d/server.cnf
3.6. Update "wsrep_provider" value to "/usr/lib64/galera-4/libgalera_smm.so" in "/etc/my.cnf.d/server2.cnf".
3.7. systemctl start mariadb

3.8. mysql_upgrade -s

The --upgrade-system-tables option was used, user tables won't be touched.
Phase 1/7: Checking and upgrading mysql database
Processing databases
mysql
mysql.column_stats                                 OK
mysql.columns_priv                                 OK
mysql.db                                           OK
mysql.event                                        OK
mysql.func                                         OK
mysql.gtid_slave_pos                               OK
mysql.help_category                                OK
mysql.help_keyword                                 OK
mysql.help_relation                                OK
mysql.help_topic                                   OK
mysql.host                                         OK
mysql.index_stats                                  OK
mysql.innodb_index_stats                           OK
mysql.innodb_table_stats                           OK
mysql.plugin                                       OK
mysql.proc                                         OK
mysql.procs_priv                                   OK
mysql.proxies_priv                                 OK
mysql.roles_mapping                                OK
mysql.servers                                      OK
mysql.table_stats                                  OK
mysql.tables_priv                                  OK
mysql.time_zone                                    OK
mysql.time_zone_leap_second                        OK
mysql.time_zone_name                               OK
mysql.time_zone_transition                         OK
mysql.time_zone_transition_type                    OK
mysql.transaction_registry                         OK
mysql.user                                         OK
mysql.wsrep_cluster                                OK
mysql.wsrep_cluster_members                        OK
mysql.wsrep_streaming_log                          OK
Phase 2/7: Installing used storage engines... Skipped
Phase 3/7: Fixing views... Skipped
Phase 4/7: Running 'mysql_fix_privilege_tables'
Phase 5/7: Fixing table and database names ... Skipped
Phase 6/7: Checking and upgrading tables... Skipped
Phase 7/7: Running 'FLUSH PRIVILEGES'
OK

4. Check the cluster status on Node2:

[root@patgal2 ~]# mysql -e'show global status like "wsrep%";'
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name                 | Value                                                                                                                                          |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                                                                                                           |
| wsrep_protocol_version        | 9                                                                                                                                              |
| wsrep_last_committed          | 6                                                                                                                                              |
| wsrep_replicated              | 0                                                                                                                                              |
| wsrep_replicated_bytes        | 0                                                                                                                                              |
| wsrep_repl_keys               | 0                                                                                                                                              |
| wsrep_repl_keys_bytes         | 0                                                                                                                                              |
| wsrep_repl_data_bytes         | 0                                                                                                                                              |
| wsrep_repl_other_bytes        | 0                                                                                                                                              |
| wsrep_received                | 2                                                                                                                                              |
| wsrep_received_bytes          | 280                                                                                                                                            |
| wsrep_local_commits           | 0                                                                                                                                              |
| wsrep_local_cert_failures     | 0                                                                                                                                              |
| wsrep_local_replays           | 0                                                                                                                                              |
| wsrep_local_send_queue        | 0                                                                                                                                              |
| wsrep_local_send_queue_max    | 1                                                                                                                                              |
| wsrep_local_send_queue_min    | 0                                                                                                                                              |
| wsrep_local_send_queue_avg    | 0                                                                                                                                              |
| wsrep_local_recv_queue        | 0                                                                                                                                              |
| wsrep_local_recv_queue_max    | 1                                                                                                                                              |
| wsrep_local_recv_queue_min    | 0                                                                                                                                              |
| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
| wsrep_local_cached_downto     | -1                                                                                                                                             |
| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
| wsrep_flow_control_paused     | 0                                                                                                                                              |
| wsrep_flow_control_sent       | 0                                                                                                                                              |
| wsrep_flow_control_recv       | 0                                                                                                                                              |
| wsrep_cert_deps_distance      | 0                                                                                                                                              |
| wsrep_apply_oooe              | 0                                                                                                                                              |
| wsrep_apply_oool              | 0                                                                                                                                              |
| wsrep_apply_window            | 0                                                                                                                                              |
| wsrep_commit_oooe             | 0                                                                                                                                              |
| wsrep_commit_oool             | 0                                                                                                                                              |
| wsrep_commit_window           | 0                                                                                                                                              |
| wsrep_local_state             | 4                                                                                                                                              |
| wsrep_local_state_comment     | Synced                                                                                                                                         |
| wsrep_cert_index_size         | 0                                                                                                                                              |
| wsrep_causal_reads            | 0                                                                                                                                              |
| wsrep_cert_interval           | 0                                                                                                                                              |
| wsrep_open_transactions       | 0                                                                                                                                              |
| wsrep_open_connections        | 0                                                                                                                                              |
| wsrep_incoming_addresses      | 172.20.3.103:3306,AUTO,172.20.3.101:3306                                                                                                       |
| wsrep_cluster_weight          | 3                                                                                                                                              |
| wsrep_desync_count            | 0                                                                                                                                              |
| wsrep_evs_delayed             |                                                                                                                                                |
| wsrep_evs_evict_list          |                                                                                                                                                |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |
| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
| wsrep_gcomm_uuid              | 4a75dc41-c5ba-11ea-a6f4-4b9ef7fb8a13                                                                                                           |
| wsrep_applier_thread_count    | 1                                                                                                                                              |
| wsrep_cluster_capabilities    |                                                                                                                                                |
| wsrep_cluster_conf_id         | 6                                                                                                                                              |
| wsrep_cluster_size            | 3                                                                                                                                              |
| wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                                                                                                           |
| wsrep_cluster_status          | Primary                                                                                                                                        |
| wsrep_connected               | ON                                                                                                                                             |
| wsrep_local_bf_aborts         | 0                                                                                                                                              |
| wsrep_local_index             | 1                                                                                                                                              |
| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name           | Galera                                                                                                                                         |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
| wsrep_ready                   | ON                                                                                                                                             |
| wsrep_rollbacker_thread_count | 1                                                                                                                                              |
| wsrep_thread_count            | 2                                                                                                                                              |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

wsrep_cluster_size and wsrep_local_index on Node2:

wsrep_cluster_size 3
wsrep_local_index 1
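
The two values above can be pulled out of the status table with a small filter rather than read by eye. This is a minimal sketch, assuming the bordered table layout that `mysql -e` prints above; `wsrep_value` is a hypothetical helper name, not part of MariaDB or Galera.

```shell
# wsrep_value NAME: read a `show global status` table on stdin and print
# the value of the status variable NAME (hypothetical helper).
wsrep_value() {
    awk -F'|' -v name="$1" '
        NF >= 3 {
            key = $2; val = $3
            gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim padding around the name
            gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim padding around the value
            if (key == name) { print val; exit }
        }'
}
```

Possible usage on a node: `mysql -e'show global status like "wsrep%";' | wsrep_value wsrep_cluster_size`.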

5. Recheck the content of table dataloss on 3 nodes:

[root@patgal1 ~]# mysql -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
+----+-------+
 
[root@patgal2 ~]# mysql -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
+----+-------+
 
[root@patgal3 ~]# mysql -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
+----+-------+

6. Insert a row on Node1, verify it has been added and replicated to Node2 and Node3:

[root@patgal1 ~]# mysql -e'insert into d.dataloss (value) values (4);'
 
[root@patgal1 ~]# mysql -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
| 12 |     4 |
+----+-------+
 
[root@patgal2 ~]# mysql -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
| 12 |     4 |
+----+-------+
 
[root@patgal3 ~]# mysql -e'select * from d.dataloss;'
+----+-------+
| id | value |
+----+-------+
|  3 |     1 |
|  6 |     2 |
|  9 |     3 |
| 12 |     4 |
+----+-------+

Here is the shell history fragment for Node2:

  211  date
  212  ps -ef | grep mysqld
  213  systemctl start mariadb
  214  mysql -e'select * from d.dataloss;'
  215  mysql -e'show global status like "wsrep%";'
  216  systemctl stop mariadb
  217  cat /etc/yum.repos.d/mariadb.repo
  218  curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=mariadb-10.4
  219  cat /etc/yum.repos.d/mariadb.repo
  220  yum list installed | grep galera
  221  yum list installed | grep MariaDB
  222  yum remove MariaDB galera
  223  yum list installed | grep galera
  224  yum list installed | grep MariaDB
  225  yum install MariaDB galera
  226  yum list installed | grep MariaDB
  227  yum list installed | grep galera
  228  rm /etc/my.cnf.d/server.cnf
  229  vi /etc/my.cnf.d/server2.cnf
  230  cat /etc/my.cnf.d/server2.cnf
  231  ls -al /usr/lib64/galera-4/libgalera_smm.so
  232  systemctl start mariadb
  233  mysql_upgrade -s
  234  mysql -e'show global status like "wsrep%";'
  235  mysql -e'select * from d.dataloss;'

Comment by Massimo [ 2020-07-14 ]

From what I understand of your steps, you perform the INSERT while all the nodes are up, regardless of version. The node you upgraded never performs an IST, because you are not writing to the cluster while node2 is down. You need to see node2 request and perform an IST because it does not yet have all the data.

Comment by Rick Pizzi [ 2020-07-14 ]

It doesn't happen because, in the test you ran, you never get the node with cluster_size=0 and the weird index id.
But you originally got that: https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156489&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-156489

Comment by Rick Pizzi [ 2020-07-14 ]

stepan.patryshev is there a reason why you don't use the conf file we supplied when trying this test, and use a different one that you built yourself? This is not a good way of testing bugs if you ask me. Please, try with the files we have supplied.

Thank you!

Comment by Stepan Patryshev (Inactive) [ 2020-07-14 ]

massimo.disaro Why should IST take place if, according to the steps from the description and especially the more detailed ones by @rpizzi (see https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156703&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-156703 ), the INSERT is performed while the upgraded node 2 is running with WSREP_ON=ON?
Please tell me exactly what I should try differently, if you have a specific idea.

Comment by Stepan Patryshev (Inactive) [ 2020-07-14 ]

rpizzi OK, that is what I was going to try next: using your config files. During my first tests with the "./mtr --suite=galera_3nodes --start-and-exit" simulation, I tried to carry over as many of the relevant settings from the attached configs as possible. When I moved to a cluster of three VMs with installed packages, I decided to first try only the configs that I had managed to adjust to get the cluster running.

Comment by Stepan Patryshev (Inactive) [ 2020-07-14 ]

@rpizzi I went through the steps again with the original configs (Node1 and Node2), without any data loss or failures. Only the IP addresses were changed. But I see there are some newer config files attached here.
The steps were exactly the same as described in my previous test.
PFA all logs and cnf files.

Comment by Rick Pizzi [ 2020-07-16 ]

I'm stumped, especially because you were able to get the cluster size 0 in your first attempt, and now you don't get that anymore.
How that is possible is beyond me.

Comment by Rick Pizzi [ 2020-07-16 ]

What OS are you running on the VMs?

Comment by Stepan Patryshev (Inactive) [ 2020-07-16 ]

rpizzi CentOS Linux release 7.8.2003 (Core).

Comment by Rick Pizzi [ 2020-07-16 ]

Maybe that's the difference. Both the customer and my lab are on CentOS Linux release 7.5.1804 (Core).
Can you please retry on that OS version?

Thanks
Rick

Comment by Rick Pizzi [ 2020-07-16 ]

I think Massimo used 7.6 but customer has 7.5 so please test on that. Thanks

Comment by Stepan Patryshev (Inactive) [ 2020-07-20 ]

@rpizzi I went through the steps again on CentOS 7.5.1804, without any data loss or failures.
The steps were exactly the same as described here, with only two small modifications:
3.3. yum remove MariaDB-server MariaDB-client MariaDB-backup galera
3.4. yum install MariaDB-common MariaDB-compat MariaDB-server MariaDB-backup MariaDB-client galera
PFA all logs and cnf files.

Comment by Rick Pizzi [ 2020-07-22 ]

This is really odd.
Do you think you can retry with mtr?
And see if you still get the cluster_size=0 you got at the beginning?
Because that's the situation where data loss happens.

See your comment below:

https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156489&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-156489
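
Since the data loss is tied to a node reporting cluster_size=0, a test script could stop as soon as it sees that state instead of continuing with inserts. This is a minimal sketch, assuming the tab-separated, header-less output of `mysql -N -B`; `cluster_size_ok` is a hypothetical helper name, not part of MariaDB or Galera.

```shell
# cluster_size_ok: read "name<TAB>value" lines on stdin and succeed only
# when wsrep_cluster_size is present, numeric and greater than 0.
cluster_size_ok() {
    size=$(awk '$1 == "wsrep_cluster_size" { print $2; exit }')
    case "$size" in
        ''|*[!0-9]*) return 1 ;;   # missing or non-numeric value
    esac
    [ "$size" -gt 0 ]
}
```

Possible usage on the upgraded node: `mysql -N -B -e"show global status like 'wsrep_cluster_size'" | cluster_size_ok || echo "node reports cluster_size=0"`.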

Comment by Stepan Patryshev (Inactive) [ 2020-07-23 ]

@rpizzi It's really strange, but I have managed to reproduce the data loss (though not a crash) using only my MTR scenario described here. I used Galera 25.3.28(r3875).
PFA all logs and cnf files. Please ignore the errors in mysqld.2.err around 22:17; I forgot to shut down a node and tried to start it again.

Comment by Stepan Patryshev (Inactive) [ 2020-07-27 ]

Here are the detailed steps I used to reproduce the data loss.
Release builds 10.3.23 + Galera 25.3.28(r3875) and 10.4.13 + Galera 26.4.4(r4599). PFA all logs and cnf files.

Steps:

1. ./mtr --suite=galera_3nodes --start-and-exit
2. Restart all nodes one by one with separate config files from here.

The cluster status on Node1 is:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"show global status like 'wsrep%';"
 
+-------------------------------+-------------------------------------------------+
| Variable_name                 | Value                                           |
+-------------------------------+-------------------------------------------------+
| wsrep_applier_thread_count    | 32                                              |
| wsrep_apply_oooe              | 0.000000                                        |
| wsrep_apply_oool              | 0.000000                                        |
| wsrep_apply_window            | 0.000000                                        |
| wsrep_causal_reads            | 0                                               |
| wsrep_cert_deps_distance      | 0.000000                                        |
| wsrep_cert_index_size         | 0                                               |
| wsrep_cert_interval           | 0.000000                                        |
| wsrep_cluster_conf_id         | 8                                               |
| wsrep_cluster_size            | 3                                               |
| wsrep_cluster_state_uuid      | 335ea557-cd0b-11ea-bce5-1b40dbec53a7            |
| wsrep_cluster_status          | Primary                                         |
| wsrep_cluster_weight          | 3                                               |
| wsrep_commit_oooe             | 0.000000                                        |
| wsrep_commit_oool             | 0.000000                                        |
| wsrep_commit_window           | 0.000000                                        |
| wsrep_connected               | ON                                              |
| wsrep_desync_count            | 0                                               |
| wsrep_evs_delayed             |                                                 |
| wsrep_evs_evict_list          |                                                 |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                       |
| wsrep_evs_state               | OPERATIONAL                                     |
| wsrep_flow_control_paused     | 0.000000                                        |
| wsrep_flow_control_paused_ns  | 0                                               |
| wsrep_flow_control_recv       | 0                                               |
| wsrep_flow_control_sent       | 0                                               |
| wsrep_gcomm_uuid              | 0f038d23-cd0d-11ea-acd2-b7ff4121c102            |
| wsrep_incoming_addresses      | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002 |
| wsrep_last_committed          | 0                                               |
| wsrep_local_bf_aborts         | 0                                               |
| wsrep_local_cached_downto     | 18446744073709551615                            |
| wsrep_local_cert_failures     | 0                                               |
| wsrep_local_commits           | 0                                               |
| wsrep_local_index             | 0                                               |
| wsrep_local_recv_queue        | 0                                               |
| wsrep_local_recv_queue_avg    | 0.000000                                        |
| wsrep_local_recv_queue_max    | 1                                               |
| wsrep_local_recv_queue_min    | 0                                               |
| wsrep_local_replays           | 0                                               |
| wsrep_local_send_queue        | 0                                               |
| wsrep_local_send_queue_avg    | 0.000000                                        |
| wsrep_local_send_queue_max    | 1                                               |
| wsrep_local_send_queue_min    | 0                                               |
| wsrep_local_state             | 4                                               |
| wsrep_local_state_comment     | Synced                                          |
| wsrep_local_state_uuid        | 335ea557-cd0b-11ea-bce5-1b40dbec53a7            |
| wsrep_open_connections        | 0                                               |
| wsrep_open_transactions       | 0                                               |
| wsrep_protocol_version        | 9                                               |
| wsrep_provider_name           | Galera                                          |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>               |
| wsrep_provider_version        | 25.3.28(r3875)                                  |
| wsrep_ready                   | ON                                              |
| wsrep_received                | 2                                               |
| wsrep_received_bytes          | 270                                             |
| wsrep_repl_data_bytes         | 0                                               |
| wsrep_repl_keys               | 0                                               |
| wsrep_repl_keys_bytes         | 0                                               |
| wsrep_repl_other_bytes        | 0                                               |
| wsrep_replicated              | 0                                               |
| wsrep_replicated_bytes        | 0                                               |
| wsrep_rollbacker_thread_count | 1                                               |
| wsrep_thread_count            | 33                                              |
+-------------------------------+-------------------------------------------------+

3. On the Node1 create a database and a table:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"create database d; create table d.evento4 (Id int primary key auto_increment, IdDispositivo int, kkkk varchar(255));"

4. On the Node1 insert 3 rows:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4(IdDispositivo, kkkk) values(123, 'aaaa'); insert into d.evento4(IdDispositivo, kkkk) values(222, 'eeeeaa'); insert into d.evento4(IdDispositivo, kkkk) values(34523452, 'e4r4r4 ');"

The data has been propagated to the whole cluster:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"select * from d.evento4;"
 
+----+---------------+---------+
| Id | IdDispositivo | kkkk    |
+----+---------------+---------+
|  1 |           123 | aaaa    |
|  4 |           222 | eeeeaa  |
|  7 |      34523452 | e4r4r4  |
+----+---------------+---------+
 
/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"
 
+----+---------------+---------+
| Id | IdDispositivo | kkkk    |
+----+---------------+---------+
|  1 |           123 | aaaa    |
|  4 |           222 | eeeeaa  |
|  7 |      34523452 | e4r4r4  |
+----+---------------+---------+
 
/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;"
 
+----+---------------+---------+
| Id | IdDispositivo | kkkk    |
+----+---------------+---------+
|  1 |           123 | aaaa    |
|  4 |           222 | eeeeaa  |
|  7 |      34523452 | e4r4r4  |
+----+---------------+---------+
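
The three listings above can also be compared mechanically rather than by eye. This is a minimal sketch, assuming the MTR client and socket paths used in the commands above; `dump_node` and `all_equal` are hypothetical helper names, not part of MariaDB or MTR.

```shell
# Paths copied from the commands above; adjust for another build tree.
BASE=/home/stepan/mariadb/10.3.23
MYSQL="$BASE/client/mysql"

# dump_node N: print d.evento4 from node N through its MTR socket.
dump_node() {
    "$MYSQL" -u root -S "$BASE/mysql-test/var/tmp/mysqld.$1.sock" \
        -e "select * from d.evento4 order by Id;"
}

# all_equal A B ...: succeed only when every argument equals the first one.
all_equal() {
    first=$1
    for item in "$@"; do
        [ "$item" = "$first" ] || return 1
    done
    return 0
}

# Usage (needs the running cluster):
#   all_equal "$(dump_node 1)" "$(dump_node 2)" "$(dump_node 3)" \
#       && echo "nodes agree" || echo "nodes differ"
```

Comparing the full query output keeps the check independent of the table layout; the same call can be repeated after each later step.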

5. Stop Node 2.
6. To check that IST works, insert 1 row on Node1 while Node2 is off:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4(IdDispositivo, kkkk) values(888, 'While Node 2 is OFF');"

The new row is added on the Node1 and Node3:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"select * from d.evento4;"
+----+---------------+---------------------+
| Id | IdDispositivo | kkkk                |
+----+---------------+---------------------+
|  1 |           123 | aaaa                |
|  4 |           222 | eeeeaa              |
|  7 |      34523452 | e4r4r4              |
| 11 |           888 | While Node 2 is OFF |
+----+---------------+---------------------+
 
[stepan@cnt7glr11 mysql-test]$ /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;"
+----+---------------+---------------------+
| Id | IdDispositivo | kkkk                |
+----+---------------+---------------------+
|  1 |           123 | aaaa                |
|  4 |           222 | eeeeaa              |
|  7 |      34523452 | e4r4r4              |
| 11 |           888 | While Node 2 is OFF |
+----+---------------+---------------------+

7. Start the Node2.

The new row is added on Node2 successfully:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"
 
+----+---------------+---------------------+
| Id | IdDispositivo | kkkk                |
+----+---------------+---------------------+
|  1 |           123 | aaaa                |
|  4 |           222 | eeeeaa              |
|  7 |      34523452 | e4r4r4              |
| 11 |           888 | While Node 2 is OFF |
+----+---------------+---------------------+

8. Check the cluster status on the Node2:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"show global status like 'wsrep%'"
 
+-------------------------------+-------------------------------------------------+
| Variable_name                 | Value                                           |
+-------------------------------+-------------------------------------------------+
| wsrep_applier_thread_count    | 32                                              |
| wsrep_apply_oooe              | 0.000000                                        |
| wsrep_apply_oool              | 0.000000                                        |
| wsrep_apply_window            | 1.000000                                        |
| wsrep_causal_reads            | 0                                               |
| wsrep_cert_deps_distance      | 0.000000                                        |
| wsrep_cert_index_size         | 0                                               |
| wsrep_cert_interval           | 0.000000                                        |
| wsrep_cluster_conf_id         | 10                                              |
| wsrep_cluster_size            | 3                                               |
| wsrep_cluster_state_uuid      | 335ea557-cd0b-11ea-bce5-1b40dbec53a7            |
| wsrep_cluster_status          | Primary                                         |
| wsrep_cluster_weight          | 3                                               |
| wsrep_commit_oooe             | 0.000000                                        |
| wsrep_commit_oool             | 0.000000                                        |
| wsrep_commit_window           | 1.000000                                        |
| wsrep_connected               | ON                                              |
| wsrep_desync_count            | 0                                               |
| wsrep_evs_delayed             |                                                 |
| wsrep_evs_evict_list          |                                                 |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                       |
| wsrep_evs_state               | OPERATIONAL                                     |
| wsrep_flow_control_paused     | 0.000000                                        |
| wsrep_flow_control_paused_ns  | 0                                               |
| wsrep_flow_control_recv       | 0                                               |
| wsrep_flow_control_sent       | 0                                               |
| wsrep_gcomm_uuid              | 96685da8-cd17-11ea-be6f-4399d680ab4c            |
| wsrep_incoming_addresses      | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002 |
| wsrep_last_committed          | 6                                               |
| wsrep_local_bf_aborts         | 0                                               |
| wsrep_local_cached_downto     | 18446744073709551615                            |
| wsrep_local_cert_failures     | 0                                               |
| wsrep_local_commits           | 0                                               |
| wsrep_local_index             | 1                                               |
| wsrep_local_recv_queue        | 0                                               |
| wsrep_local_recv_queue_avg    | 0.000000                                        |
| wsrep_local_recv_queue_max    | 1                                               |
| wsrep_local_recv_queue_min    | 0                                               |
| wsrep_local_replays           | 0                                               |
| wsrep_local_send_queue        | 0                                               |
| wsrep_local_send_queue_avg    | 0.000000                                        |
| wsrep_local_send_queue_max    | 1                                               |
| wsrep_local_send_queue_min    | 0                                               |
| wsrep_local_state             | 4                                               |
| wsrep_local_state_comment     | Synced                                          |
| wsrep_local_state_uuid        | 335ea557-cd0b-11ea-bce5-1b40dbec53a7            |
| wsrep_open_connections        | 0                                               |
| wsrep_open_transactions       | 0                                               |
| wsrep_protocol_version        | 9                                               |
| wsrep_provider_name           | Galera                                          |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>               |
| wsrep_provider_version        | 25.3.28(r3875)                                  |
| wsrep_ready                   | ON                                              |
| wsrep_received                | 3                                               |
| wsrep_received_bytes          | 278                                             |
| wsrep_repl_data_bytes         | 0                                               |
| wsrep_repl_keys               | 0                                               |
| wsrep_repl_keys_bytes         | 0                                               |
| wsrep_repl_other_bytes        | 0                                               |
| wsrep_replicated              | 0                                               |
| wsrep_replicated_bytes        | 0                                               |
| wsrep_rollbacker_thread_count | 1                                               |
| wsrep_thread_count            | 33                                              |
+-------------------------------+-------------------------------------------------+

Note that wsrep_local_index = 1.

9. Stop Node 2.

10. Set wsrep-on=OFF and run Node 2 on the 10.4.13 binaries with a new config containing paths to the 10.4.13 resources (cnf files here).

/home/stepan/mariadb/10.4.13/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3.23/mysql-test/var/mysqld_new.2.cnf &
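
Steps 10 and 14 both come down to flipping one option in the node's cnf file. A minimal sketch of that edit, assuming the file already contains a wsrep-on (or wsrep_on) line of its own; the helper name set_wsrep_on is hypothetical and not part of any MariaDB tool:

```python
import re

# Matches an existing "wsrep-on = value" or "wsrep_on = value" line.
# Dashes and underscores are interchangeable in option names.
_WSREP_ON = re.compile(
    r"^([ \t]*wsrep[-_]on[ \t]*=[ \t]*)\S+(.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def set_wsrep_on(cnf_text, enabled):
    """Return cnf_text with the value of the existing wsrep-on line replaced."""
    value = "ON" if enabled else "OFF"
    new_text, count = _WSREP_ON.subn(
        lambda m: m.group(1) + value + m.group(2), cnf_text
    )
    if count == 0:
        # Assumption: we never add the option ourselves, we only toggle it.
        raise ValueError("no wsrep-on setting found in the config text")
    return new_text
```

The function only works on text, so reading and writing the actual cnf file (and restarting the node afterwards) is left to the caller.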

11. Perform mysql_upgrade -s.
12. Stop Node 2.
13. Insert 1 new row on the Node1:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4(IdDispositivo, kkkk) values(777777, 'While Node 2 was upgrading');"
 
/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"select * from d.evento4;"
+----+---------------+----------------------------+
| Id | IdDispositivo | kkkk                       |
+----+---------------+----------------------------+
|  1 |           123 | aaaa                       |
|  4 |           222 | eeeeaa                     |
|  7 |      34523452 | e4r4r4                     |
| 11 |           888 | While Node 2 is OFF        |
| 13 |        777777 | While Node 2 was upgrading |
+----+---------------+----------------------------+

14. Set wsrep-on=ON and run Node2.
15. Check that the new row has also been added on Node 2:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;" 
+----+---------------+----------------------------+
| Id | IdDispositivo | kkkk                       |
+----+---------------+----------------------------+
|  1 |           123 | aaaa                       |
|  4 |           222 | eeeeaa                     |
|  7 |      34523452 | e4r4r4                     |
| 11 |           888 | While Node 2 is OFF        |
| 13 |        777777 | While Node 2 was upgrading |
+----+---------------+----------------------------+

16. Check the wsrep variables on the Node2:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"show global status like 'wsrep%'"
 
 
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name                 | Value                                                                                                                                          |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid        | 335ea557-cd0b-11ea-bce5-1b40dbec53a7                                                                                                           |
| wsrep_protocol_version        | -1                                                                                                                                             |
| wsrep_last_committed          | 7                                                                                                                                              |
| wsrep_replicated              | 0                                                                                                                                              |
| wsrep_replicated_bytes        | 0                                                                                                                                              |
| wsrep_repl_keys               | 0                                                                                                                                              |
| wsrep_repl_keys_bytes         | 0                                                                                                                                              |
| wsrep_repl_data_bytes         | 0                                                                                                                                              |
| wsrep_repl_other_bytes        | 0                                                                                                                                              |
| wsrep_received                | 3                                                                                                                                              |
| wsrep_received_bytes          | 288                                                                                                                                            |
| wsrep_local_commits           | 0                                                                                                                                              |
| wsrep_local_cert_failures     | 0                                                                                                                                              |
| wsrep_local_replays           | 0                                                                                                                                              |
| wsrep_local_send_queue        | 0                                                                                                                                              |
| wsrep_local_send_queue_max    | 2                                                                                                                                              |
| wsrep_local_send_queue_min    | 0                                                                                                                                              |
| wsrep_local_send_queue_avg    | 0.333333                                                                                                                                       |
| wsrep_local_recv_queue        | 0                                                                                                                                              |
| wsrep_local_recv_queue_max    | 1                                                                                                                                              |
| wsrep_local_recv_queue_min    | 0                                                                                                                                              |
| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
| wsrep_local_cached_downto     | 7                                                                                                                                              |
| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
| wsrep_flow_control_paused     | 0                                                                                                                                              |
| wsrep_flow_control_sent       | 0                                                                                                                                              |
| wsrep_flow_control_recv       | 0                                                                                                                                              |
| wsrep_cert_deps_distance      | 0                                                                                                                                              |
| wsrep_apply_oooe              | 0                                                                                                                                              |
| wsrep_apply_oool              | 0                                                                                                                                              |
| wsrep_apply_window            | 1                                                                                                                                              |
| wsrep_commit_oooe             | 0                                                                                                                                              |
| wsrep_commit_oool             | 0                                                                                                                                              |
| wsrep_commit_window           | 1                                                                                                                                              |
| wsrep_local_state             | 4                                                                                                                                              |
| wsrep_local_state_comment     | Synced                                                                                                                                         |
| wsrep_cert_index_size         | 0                                                                                                                                              |
| wsrep_causal_reads            | 0                                                                                                                                              |
| wsrep_cert_interval           | 0                                                                                                                                              |
| wsrep_open_transactions       | 0                                                                                                                                              |
| wsrep_open_connections        | 0                                                                                                                                              |
| wsrep_incoming_addresses      | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002                                                                                                |
| wsrep_cluster_weight          | 3                                                                                                                                              |
| wsrep_desync_count            | 0                                                                                                                                              |
| wsrep_evs_delayed             |                                                                                                                                                |
| wsrep_evs_evict_list          |                                                                                                                                                |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |
| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
| wsrep_gcomm_uuid              | 11fd46cc-cd1b-11ea-8f5d-7efdb4c94287                                                                                                           |
| wsrep_applier_thread_count    | 32                                                                                                                                             |
| wsrep_cluster_capabilities    |                                                                                                                                                |
| wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |
| wsrep_cluster_size            | 0                                                                                                                                              |
| wsrep_cluster_state_uuid      |                                                                                                                                                |
| wsrep_cluster_status          | Primary                                                                                                                                        |
| wsrep_connected               | ON                                                                                                                                             |
| wsrep_local_bf_aborts         | 0                                                                                                                                              |
| wsrep_local_index             | 18446744073709551615                                                                                                                           |
| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name           | Galera                                                                                                                                         |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
| wsrep_ready                   | ON                                                                                                                                             |
| wsrep_rollbacker_thread_count | 1                                                                                                                                              |
| wsrep_thread_count            | 33                                                                                                                                             |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

Pay attention:

wsrep_cluster_status Primary
wsrep_local_state_comment Synced
wsrep_local_index 18446744073709551615
wsrep_cluster_size 0
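
The value 18446744073709551615 is 2^64-1, which is how -1 looks when printed as an unsigned 64-bit integer. A rough sketch of how this state could be spotted automatically from the output of show global status like 'wsrep%'. The function names are hypothetical, and the check is only derived from the values seen in this step; it is not an official Galera health criterion:

```python
UINT64_MAX = 2**64 - 1  # -1 printed as an unsigned 64-bit value


def parse_status(table_text):
    """Turn the two-column ASCII table printed by the mysql client into a dict."""
    status = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and blank lines
        name, sep, value = line.strip("|").partition("|")
        name = name.strip()
        if not sep or name == "Variable_name":
            continue  # skip malformed lines and the header row
        status[name] = value.strip()
    return status


def membership_problems(status):
    """List reasons why a node may not really be part of the cluster."""
    problems = []
    if status.get("wsrep_cluster_size") == "0":
        problems.append("wsrep_cluster_size is 0")
    if status.get("wsrep_local_index") == str(UINT64_MAX):
        problems.append("wsrep_local_index is 2^64-1 (i.e. -1)")
    return problems
```

Fed with the table from step 16, membership_problems reports both conditions even though wsrep_cluster_status and wsrep_local_state_comment look healthy.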

17. Insert 1 row on the Node1 again:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (3,'non tireplic');"

The new row has been replicated to the Node3:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;"
+----+---------------+----------------------------+
| Id | IdDispositivo | kkkk                       |
+----+---------------+----------------------------+
|  1 |           123 | aaaa                       |
|  4 |           222 | eeeeaa                     |
|  7 |      34523452 | e4r4r4                     |
| 11 |           888 | While Node 2 is OFF        |
| 13 |        777777 | While Node 2 was upgrading |
| 16 |             3 | non tireplic               |
+----+---------------+----------------------------+

But it has NOT been replicated to the Node2:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"
+----+---------------+----------------------------+
| Id | IdDispositivo | kkkk                       |
+----+---------------+----------------------------+
|  1 |           123 | aaaa                       |
|  4 |           222 | eeeeaa                     |
|  7 |      34523452 | e4r4r4                     |
| 11 |           888 | While Node 2 is OFF        |
| 13 |        777777 | While Node 2 was upgrading |
+----+---------------+----------------------------+
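
Comparing the two outputs by eye works for a handful of rows. For larger tables, the same comparison could be sketched as below. This assumes the plain ASCII table format of the mysql client and cell values that contain no '|' character; the function names are hypothetical:

```python
def parse_rows(table_text):
    """Parse a mysql client ASCII result table into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ borders and anything that is not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first row of the table holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def missing_ids(reference_table, candidate_table, key="Id"):
    """Return key values present in the reference output but absent from the candidate output."""
    have = {row[key] for row in parse_rows(candidate_table)}
    return [row[key] for row in parse_rows(reference_table) if row[key] not in have]
```

With the Node 3 output as the reference and the Node 2 output as the candidate, the result for this step is the single Id 16.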

18. Repeat with one more insert on Node 1:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (666,'Lost data');"

And again the new row has been replicated to the Node3:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;"
+----+---------------+----------------------------+
| Id | IdDispositivo | kkkk                       |
+----+---------------+----------------------------+
|  1 |           123 | aaaa                       |
|  4 |           222 | eeeeaa                     |
|  7 |      34523452 | e4r4r4                     |
| 11 |           888 | While Node 2 is OFF        |
| 13 |        777777 | While Node 2 was upgrading |
| 16 |             3 | non tireplic               |
| 19 |           666 | Lost data                  |
+----+---------------+----------------------------+

But it has NOT been replicated to the Node2:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"
+----+---------------+----------------------------+
| Id | IdDispositivo | kkkk                       |
+----+---------------+----------------------------+
|  1 |           123 | aaaa                       |
|  4 |           222 | eeeeaa                     |
|  7 |      34523452 | e4r4r4                     |
| 11 |           888 | While Node 2 is OFF        |
| 13 |        777777 | While Node 2 was upgrading |
+----+---------------+----------------------------+

19. Restart the Node2.
Check the wsrep variables on the Node2:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"show global status like 'wsrep%';" 
 
 
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name                 | Value                                                                                                                                          |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid        | 335ea557-cd0b-11ea-bce5-1b40dbec53a7                                                                                                           |
| wsrep_protocol_version        | 9                                                                                                                                              |
| wsrep_last_committed          | 9                                                                                                                                              |
| wsrep_replicated              | 0                                                                                                                                              |
| wsrep_replicated_bytes        | 0                                                                                                                                              |
| wsrep_repl_keys               | 0                                                                                                                                              |
| wsrep_repl_keys_bytes         | 0                                                                                                                                              |
| wsrep_repl_data_bytes         | 0                                                                                                                                              |
| wsrep_repl_other_bytes        | 0                                                                                                                                              |
| wsrep_received                | 2                                                                                                                                              |
| wsrep_received_bytes          | 280                                                                                                                                            |
| wsrep_local_commits           | 0                                                                                                                                              |
| wsrep_local_cert_failures     | 0                                                                                                                                              |
| wsrep_local_replays           | 0                                                                                                                                              |
| wsrep_local_send_queue        | 0                                                                                                                                              |
| wsrep_local_send_queue_max    | 1                                                                                                                                              |
| wsrep_local_send_queue_min    | 0                                                                                                                                              |
| wsrep_local_send_queue_avg    | 0                                                                                                                                              |
| wsrep_local_recv_queue        | 0                                                                                                                                              |
| wsrep_local_recv_queue_max    | 1                                                                                                                                              |
| wsrep_local_recv_queue_min    | 0                                                                                                                                              |
| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
| wsrep_local_cached_downto     | 7                                                                                                                                              |
| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
| wsrep_flow_control_paused     | 0                                                                                                                                              |
| wsrep_flow_control_sent       | 0                                                                                                                                              |
| wsrep_flow_control_recv       | 0                                                                                                                                              |
| wsrep_cert_deps_distance      | 0                                                                                                                                              |
| wsrep_apply_oooe              | 0                                                                                                                                              |
| wsrep_apply_oool              | 0                                                                                                                                              |
| wsrep_apply_window            | 0                                                                                                                                              |
| wsrep_commit_oooe             | 0                                                                                                                                              |
| wsrep_commit_oool             | 0                                                                                                                                              |
| wsrep_commit_window           | 0                                                                                                                                              |
| wsrep_local_state             | 4                                                                                                                                              |
| wsrep_local_state_comment     | Synced                                                                                                                                         |
| wsrep_cert_index_size         | 0                                                                                                                                              |
| wsrep_causal_reads            | 0                                                                                                                                              |
| wsrep_cert_interval           | 0                                                                                                                                              |
| wsrep_open_transactions       | 0                                                                                                                                              |
| wsrep_open_connections        | 0                                                                                                                                              |
| wsrep_incoming_addresses      | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002                                                                                                |
| wsrep_cluster_weight          | 3                                                                                                                                              |
| wsrep_desync_count            | 0                                                                                                                                              |
| wsrep_evs_delayed             |                                                                                                                                                |
| wsrep_evs_evict_list          |                                                                                                                                                |
| wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |
| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
| wsrep_gcomm_uuid              | 39969b6c-cd1f-11ea-abde-7b7ed790f75c                                                                                                           |
| wsrep_applier_thread_count    | 32                                                                                                                                             |
| wsrep_cluster_capabilities    |                                                                                                                                                |
| wsrep_cluster_conf_id         | 14                                                                                                                                             |
| wsrep_cluster_size            | 3                                                                                                                                              |
| wsrep_cluster_state_uuid      | 335ea557-cd0b-11ea-bce5-1b40dbec53a7                                                                                                           |
| wsrep_cluster_status          | Primary                                                                                                                                        |
| wsrep_connected               | ON                                                                                                                                             |
| wsrep_local_bf_aborts         | 0                                                                                                                                              |
| wsrep_local_index             | 1                                                                                                                                              |
| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name           | Galera                                                                                                                                         |
| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
| wsrep_ready                   | ON                                                                                                                                             |
| wsrep_rollbacker_thread_count | 1                                                                                                                                              |
| wsrep_thread_count            | 33                                                                                                                                             |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

Everything looks OK:

wsrep_cluster_status Primary
wsrep_local_state_comment Synced
wsrep_local_index 1
wsrep_cluster_size 3
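
The manual check above can be automated. The following is a minimal sketch, assuming the `SHOW STATUS LIKE 'wsrep%'` output has been captured as text in the pipe-delimited format shown; the function names `parse_status_table` and `health_problems` and the `EXPECTED` mapping are hypothetical and are not part of MariaDB or Galera.

```python
# Hypothetical helper: turn the pipe-delimited SHOW STATUS output into a dict
# and compare a few wsrep values against what a healthy node should report.

EXPECTED = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_connected": "ON",
    "wsrep_ready": "ON",
}


def parse_status_table(text):
    """Return {variable_name: value} from mysql client table output."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +-----+ borders and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Variable_name":
            status[cells[0]] = cells[1]
    return status


def health_problems(status, expected_size):
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    for name, want in EXPECTED.items():
        got = status.get(name)
        if got != want:
            problems.append(f"{name}: expected {want!r}, got {got!r}")
    size = status.get("wsrep_cluster_size")
    if size != str(expected_size):
        problems.append(
            f"wsrep_cluster_size: expected {expected_size}, got {size!r}"
        )
    return problems
```

Calling `health_problems(parse_status_table(output), expected_size=3)` on the table above should return an empty list. Note that such a check only reports the node's own view of the cluster; it would not by itself detect rows that were silently skipped.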

20. Insert a new row on Node1:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (555,'After Node restart');"

The new row was successfully replicated to Node3:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"                   
+----+---------------+----------------------------+
| Id | IdDispositivo | kkkk                       |
+----+---------------+----------------------------+
|  1 |           123 | aaaa                       |
|  4 |           222 | eeeeaa                     |
|  7 |      34523452 | e4r4r4                     |
| 11 |           888 | While Node 2 is OFF        |
| 13 |        777777 | While Node 2 was upgrading |
| 22 |           555 | After Node restart         |
+----+---------------+----------------------------+

Comment by Alexey [ 2020-07-29 ]

OK, I think I know what the problem is, or at least where it is solved.

Massimo's node 2 log has the following:

wsrep loader: [INFO] wsrep_load(): Galera 26.4.4(r4599) by Codership Oy <info@codership.com> loaded successfully.
...
2020-05-25 22:25:17 19 [Warning] WSREP: trx protocol version: 4 does not match certification protocol version: -1

As you may guess, the last line spells bad news: the node cannot apply writesets. It is caused by a bug that was fixed in commit 02ad0e11 on April 1, well after release 4.4 was tagged, and the fix was merged into the MariaDB Galera fork in commit ae24803 on April 9.

Stepan's log has:

wsrep loader: [INFO] wsrep_load(): Galera 26.4.4(rae24803) by Codership Oy <info@codership.com> loaded successfully.

That's why Stepan can't reproduce the bug: he's using a different Galera binary.

In any case, this bug (and many others) is fixed in the 4.5 release tag. All MariaDB 10.4 users should switch to it. It will save a lot of trouble.
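
To check other error logs for the same symptom, a scan along the following lines may help. This is a minimal sketch: the two regular expressions are derived only from the log lines quoted in this comment, and the names `PROVIDER_RE`, `MISMATCH_RE` and `scan_log` are hypothetical. Other Galera builds may format these messages differently.

```python
import re

# Patterns based on the two log lines quoted above (assumed format).
PROVIDER_RE = re.compile(r"wsrep_load\(\): Galera (\S+?)\((\w+)\)")
MISMATCH_RE = re.compile(
    r"trx protocol version: (-?\d+) does not match "
    r"certification protocol version: (-?\d+)"
)


def scan_log(lines):
    """Return ((version, revision) or None, [(trx_ver, cert_ver), ...])."""
    provider = None
    mismatches = []
    for line in lines:
        match = PROVIDER_RE.search(line)
        if match:
            # Keep the most recently loaded provider, e.g. ("26.4.4", "r4599").
            provider = (match.group(1), match.group(2))
            continue
        match = MISMATCH_RE.search(line)
        if match:
            mismatches.append((int(match.group(1)), int(match.group(2))))
    return provider, mismatches
```

A non-empty mismatch list together with a provider revision of `r4599` would correspond to the situation in Massimo's node 2 log; a revision of `rae24803` corresponds to the binary Stepan used.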

Comment by Stepan Patryshev (Inactive) [ 2020-07-29 ]

Yurchenko, I hope you are right, but I used Galera 26.4.4(r4599) on 20.07.2020 and there was no data loss.

Comment by Alexey [ 2020-07-29 ]

julien.fritsch
Yes, it is fixed in later Galera releases.

stepan.patryshev
On 20.07.2020 there was a mistake in the reproduction of the case: in Massimo's case, node 2 was missing 2 events and had to perform a state transfer. In your case, it seems there were no updates to the cluster during the node 2 upgrade: it was shut down at seqno 7 and brought back while the cluster was still at seqno 7. So there was no state transfer, and that is a different code path.

And yes, I found out why in Massimo's case some transactions were lost:

[Warning] WSREP: trx protocol version: 4 does not match certification protocol version: -1

is a warning because, during the upgrade of the last node and the resulting protocol bump, we can expect to receive a writeset with an old protocol version, and in that case it is simply supposed to fail certification on all nodes. The problem (fixed in the commit I mentioned above) was that the protocol version was not updated in total order (in fact, it was not updated at all). As a result, all transactions that failed certification on node 2 (and were therefore skipped) passed certification on node 1 and were committed. In the end, both nodes believed they had successfully processed all events and were on the same page regarding the last seqno. That's why the missing events went unnoticed.

However, when node 2 was restarted, it rejoined the cluster without a state transfer, the bug was not triggered, and it could continue to apply transactions.
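
The failure mode described above can be illustrated with a toy model. This is a deliberately simplified sketch, not Galera's actual certification code: the class `ToyNode`, the function `replicate` and the rule "certify only if versions are equal" are all assumptions made for illustration. It shows only how two nodes can agree on the last seqno while holding different data.

```python
# Toy model (hypothetical): each node certifies a writeset only when the
# writeset's protocol version equals the node's certification version.

class ToyNode:
    def __init__(self, name, cert_version):
        self.name = name
        self.cert_version = cert_version
        self.last_seqno = 0
        self.committed = []

    def receive(self, seqno, trx_version, payload):
        # The seqno advances whether or not certification succeeds,
        # so a skipped writeset leaves no trace in the seqno itself.
        self.last_seqno = seqno
        if trx_version == self.cert_version:
            self.committed.append(payload)
            return True
        return False  # failed certification: writeset is skipped


def replicate(nodes, writesets, trx_version):
    """Deliver every writeset to every node in the same total order."""
    for seqno, payload in enumerate(writesets, start=1):
        for node in nodes:
            node.receive(seqno, trx_version, payload)


node1 = ToyNode("node1", cert_version=4)   # version was set correctly
node2 = ToyNode("node2", cert_version=-1)  # version was never updated
replicate([node1, node2], ["row 11", "row 13"], trx_version=4)

print(node1.last_seqno == node2.last_seqno)  # same seqno on both nodes
print(node1.committed, node2.committed)      # but different contents
```

In this model both nodes end at the same seqno while `node2.committed` stays empty, which mirrors why the missing rows went unnoticed until the table contents were compared directly.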

Comment by Stepan Patryshev (Inactive) [ 2020-07-30 ]

Yurchenko, thank you for the clarifications. But I want to note that rpizzi reproduced it without updating data during the Node2 upgrade: steps are here.

Comment by Stepan Patryshev (Inactive) [ 2020-08-19 ]

I have verified that with Galera 26.4.5(rb3764ab) and 25.3.30(r827e681) there was no data loss or crash. The steps were the same as those that reproduced the bug on 23.07.2020 with 25.3.28(r3875) and 26.4.4(r4599).

But the strange wsrep values were still present right after the upgraded node joined the cluster for the first time:

wsrep_local_index 18446744073709551615
wsrep_cluster_size 0

Comment by Stepan Patryshev (Inactive) [ 2020-08-20 ]

Closing as fixed.

Generated at Thu Feb 08 09:16:58 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.