[MDEV-22723] Data loss when performing rolling upgrade from 10.3.23-MariaDB to 10.4.13-MariaDB - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.3.23, 10.4.13
Fix Version/s: 10.3.26, 10.4.16
Component/s: Galera
Labels:
None
Environment:
OS: CentOS Linux release 7.6.1810 (Core)

Description

Create a schema and a table on mdb1. all propagate

stop mdb2 . yum remove the rpm of Mariadb and galera.
install from new repo of Mariadb 10.4 and update my.cnf to the right wsrep_provider
set wsrep_on=OFF on my.cnf
start mdb2
perform mysql_upgrade -s
stop mdb2
set wsrep_on=ON on my.cnf
start mbd2

At this point the status galera variables on mdb2:

MariaDB mdb2 [pippo]> show global status like 'wsrep%';

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| Variable_name                 | Value                                                                                                                                          |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| wsrep_local_state_uuid        | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0                                                                                                           |

| wsrep_protocol_version        | -1                                                                                                                                             |

| wsrep_last_committed          | 65                                                                                                                                             |

| wsrep_replicated              | 0                                                                                                                                              |

| wsrep_replicated_bytes        | 0                                                                                                                                              |

| wsrep_repl_keys               | 0                                                                                                                                              |

| wsrep_repl_keys_bytes         | 0                                                                                                                                              |

| wsrep_repl_data_bytes         | 0                                                                                                                                              |

| wsrep_repl_other_bytes        | 0                                                                                                                                              |

| wsrep_received                | 3                                                                                                                                              |

| wsrep_received_bytes          | 208                                                                                                                                            |

| wsrep_local_commits           | 0                                                                                                                                              |

| wsrep_local_cert_failures     | 0                                                                                                                                              |

| wsrep_local_replays           | 0                                                                                                                                              |

| wsrep_local_send_queue        | 0                                                                                                                                              |

| wsrep_local_send_queue_max    | 1                                                                                                                                              |

| wsrep_local_send_queue_min    | 0                                                                                                                                              |

| wsrep_local_send_queue_avg    | 0                                                                                                                                              |

| wsrep_local_recv_queue        | 0                                                                                                                                              |

| wsrep_local_recv_queue_max    | 1                                                                                                                                              |

| wsrep_local_recv_queue_min    | 0                                                                                                                                              |

| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |

| wsrep_local_cached_downto     | 64                                                                                                                                             |

| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |

| wsrep_flow_control_paused     | 0                                                                                                                                              |

| wsrep_flow_control_sent       | 0                                                                                                                                              |

| wsrep_flow_control_recv       | 0                                                                                                                                              |

| wsrep_cert_deps_distance      | 0                                                                                                                                              |

| wsrep_apply_oooe              | 0.5                                                                                                                                            |

| wsrep_apply_oool              | 0                                                                                                                                              |

| wsrep_apply_window            | 1.5                                                                                                                                            |

| wsrep_commit_oooe             | 0                                                                                                                                              |

| wsrep_commit_oool             | 0                                                                                                                                              |

| wsrep_commit_window           | 1                                                                                                                                              |

| wsrep_local_state             | 4                                                                                                                                              |

| wsrep_local_state_comment     | Synced                                                                                                                                         |

| wsrep_cert_index_size         | 0                                                                                                                                              |

| wsrep_causal_reads            | 0                                                                                                                                              |

| wsrep_cert_interval           | 0                                                                                                                                              |

| wsrep_open_transactions       | 0                                                                                                                                              |

| wsrep_open_connections        | 0                                                                                                                                              |

| wsrep_incoming_addresses      | AUTO,10.0.1.13:3306                                                                                                                            |

| wsrep_cluster_weight          | 2                                                                                                                                              |

| wsrep_desync_count            | 0                                                                                                                                              |

| wsrep_evs_delayed             |                                                                                                                                                |

| wsrep_evs_evict_list          |                                                                                                                                                |

| wsrep_evs_repl_latency        | 0.000325151/0.00176008/0.00607075/0.00193032/7                                                                                                 |

| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |

| wsrep_gcomm_uuid              | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4                                                                                                           |

| wsrep_applier_thread_count    | 32                                                                                                                                             |

| wsrep_cluster_capabilities    |                                                                                                                                                |

| wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |

| wsrep_cluster_size            | 0                                                                                                                                              |

| wsrep_cluster_state_uuid      |                                                                                                                                                |

| wsrep_cluster_status          | Primary                                                                                                                                        |

| wsrep_connected               | ON                                                                                                                                             |

| wsrep_local_bf_aborts         | 0                                                                                                                                              |

| wsrep_local_index             | 18446744073709551615                                                                                                                           |

| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |

| wsrep_provider_name           | Galera                                                                                                                                         |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |

| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |

| wsrep_ready                   | ON                                                                                                                                             |

| wsrep_rollbacker_thread_count | 1                                                                                                                                              |

| wsrep_thread_count            | 33                                                                                                                                             |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

65 rows in set (0.001 sec)

NOTE THAT :

wsrep_cluster_status          | Primary

wsrep_local_state_comment     | Synced

wsrep_local_index             | 18446744073709551615

wsrep_cluster_size            | 0

Looking at the error log, the server is ready for connections after a IST

At this point the 'master' mdb1 have a write that are not getting replicate:

MariaDB mdb2 [pippo]> select * from evento4;

+----+---------------+--------+

| Id | IdDispositivo | kkkk   |

+----+---------------+--------+

|  1 |           123 | aaaa   |

|  3 |           222 | eeeeaa |

|  4 |      34523452 | e4r4r4 |

+----+---------------+--------+

WHILE ON THE MASTER:

MariaDB mdb1 [pippo]> select * from evento4;

+----+---------------+--------+

| Id | IdDispositivo | kkkk   |

+----+---------------+--------+

|  1 |           123 | aaaa   |

|  3 |           222 | eeeeaa |

|  4 |      34523452 | e4r4r4 |

+----+---------------+--------+

3 rows in set (0.001 sec)

MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');

Query OK, 1 row affected (0.015 sec)

MariaDB mdb1 [pippo]> select * from evento4;

+----+---------------+--------------+

| Id | IdDispositivo | kkkk         |

+----+---------------+--------------+

|  1 |           123 | aaaa         |

|  3 |           222 | eeeeaa       |

|  4 |      34523452 | e4r4r4       |

|  6 |             3 | non tireplic |

+----+---------------+--------------+

4 rows in set (0.001 sec)

The fact that INSERT not getting replicate could be indeed cause the cluster_size=0 and wsrep_local_index= 18446744073709551615, obviously so

AT THIS point we restart mdb2 to fix the status:

[root@mdb2 my.cnf.d]# systemctl restart  mariadb

[root@mdb2 my.cnf.d]# mysql

MariaDB md2 [(none)]> show global status like 'wsrep%';

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| Variable_name                 | Value                                                                                                                                          |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| wsrep_local_state_uuid        | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0                                                                                                           |

| wsrep_protocol_version        | 9                                                                                                                                              |

| wsrep_last_committed          | 66                                                                                                                                             |

| wsrep_replicated              | 0                                                                                                                                              |

| wsrep_replicated_bytes        | 0                                                                                                                                              |

| wsrep_repl_keys               | 0                                                                                                                                              |

| wsrep_repl_keys_bytes         | 0                                                                                                                                              |

| wsrep_repl_data_bytes         | 0                                                                                                                                              |

| wsrep_repl_other_bytes        | 0                                                                                                                                              |

| wsrep_received                | 2                                                                                                                                              |

| wsrep_received_bytes          | 200                                                                                                                                            |

| wsrep_local_commits           | 0                                                                                                                                              |

| wsrep_local_cert_failures     | 0                                                                                                                                              |

| wsrep_local_replays           | 0                                                                                                                                              |

| wsrep_local_send_queue        | 0                                                                                                                                              |

| wsrep_local_send_queue_max    | 1                                                                                                                                              |

| wsrep_local_send_queue_min    | 0                                                                                                                                              |

| wsrep_local_send_queue_avg    | 0                                                                                                                                              |

| wsrep_local_recv_queue        | 0                                                                                                                                              |

| wsrep_local_recv_queue_max    | 1                                                                                                                                              |

| wsrep_local_recv_queue_min    | 0                                                                                                                                              |

| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |

| wsrep_local_cached_downto     | 64                                                                                                                                             |

| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |

| wsrep_flow_control_paused     | 0                                                                                                                                              |

| wsrep_flow_control_sent       | 0                                                                                                                                              |

| wsrep_flow_control_recv       | 0                                                                                                                                              |

| wsrep_cert_deps_distance      | 0                                                                                                                                              |

| wsrep_apply_oooe              | 0                                                                                                                                              |

| wsrep_apply_oool              | 0                                                                                                                                              |

| wsrep_apply_window            | 0                                                                                                                                              |

| wsrep_commit_oooe             | 0                                                                                                                                              |

| wsrep_commit_oool             | 0                                                                                                                                              |

| wsrep_commit_window           | 0                                                                                                                                              |

| wsrep_local_state             | 4                                                                                                                                              |

| wsrep_local_state_comment     | Synced                                                                                                                                         |

| wsrep_cert_index_size         | 0                                                                                                                                              |

| wsrep_causal_reads            | 0                                                                                                                                              |

| wsrep_cert_interval           | 0                                                                                                                                              |

| wsrep_open_transactions       | 0                                                                                                                                              |

| wsrep_open_connections        | 0                                                                                                                                              |

| wsrep_incoming_addresses      | 10.0.1.13:3306,AUTO                                                                                                                            |

| wsrep_cluster_weight          | 2                                                                                                                                              |

| wsrep_desync_count            | 0                                                                                                                                              |

| wsrep_evs_delayed             |                                                                                                                                                |

| wsrep_evs_evict_list          |                                                                                                                                                |

| wsrep_evs_repl_latency        | 0.000853237/0.001923/0.00333681/0.0010427/3                                                                                                    |

| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |

| wsrep_gcomm_uuid              | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6                                                                                                           |

| wsrep_applier_thread_count    | 32                                                                                                                                             |

| wsrep_cluster_capabilities    |                                                                                                                                                |

| wsrep_cluster_conf_id         | 6                                                                                                                                              |

| wsrep_cluster_size            | 2                                                                                                                                              |

| wsrep_cluster_state_uuid      | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0                                                                                                           |

| wsrep_cluster_status          | Primary                                                                                                                                        |

| wsrep_connected               | ON                                                                                                                                             |

| wsrep_local_bf_aborts         | 0                                                                                                                                              |

| wsrep_local_index             | 1                                                                                                                                              |

| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |

| wsrep_provider_name           | Galera                                                                                                                                         |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |

| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |

| wsrep_ready                   | ON                                                                                                                                             |

| wsrep_rollbacker_thread_count | 1                                                                                                                                              |

| wsrep_thread_count            | 33                                                                                                                                             |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

65 rows in set (0.002 sec)

NOTE now the status is ok:

wsrep_local_index             | 1

wsrep_cluster_status          | Primary

wsrep_local_state_comment     | Synced

wsrep_local_index             | 1

but when we check the data we expect the new row should be present:

MariaDB mdb2 [pippo]> select * from evento4;

+----+---------------+--------+

| Id | IdDispositivo | kkkk   |

+----+---------------+--------+

|  1 |           123 | aaaa   |

|  3 |           222 | eeeeaa |

|  4 |      34523452 | e4r4r4 |

+----+---------------+--------+

3 rows in set (0.001 sec)

The row is not there.

If we write after this moment all is getting replicate. So the data loss is after the first IST complete until a new restart is done and got the status of the cluster back.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

200612_mysqld.1.err
62 kB
2020-06-12 17:42
200612_mysqld.2.err
121 kB
2020-06-12 17:42
200612_mysqld.3.err
70 kB
2020-06-12 17:42
200709_patgal_output.zip
15 kB
2020-07-10 08:52
20200713_MDEV-22723_patgal_no_errors.zip
35 kB
2020-07-13 17:39
20200714_MDEV-22723_mdb_no_errors.zip
32 kB
2020-07-14 16:22
20200714_MDEV-22723_patgal_no_errors.zip
28 kB
2020-07-14 10:52
20200720_MDEV-22723_CentOS_7.5_no_errors.zip
24 kB
2020-07-20 18:27
20200723_MDEV-22723_data_loss.zip
43 kB
2020-07-23 20:48
error_log_mdb1
23 kB
2020-05-26 15:03
error_log_mdb2.after_upgrade
87 kB
2020-05-26 15:03
mysqld_new.2.cnf
2 kB
2020-06-12 17:42
mysqld_old.1.cnf
2 kB
2020-06-12 17:42
mysqld_old.2.cnf
2 kB
2020-06-12 17:42
mysqld_old.3.cnf
2 kB
2020-06-12 17:42
node1_bootsrapped_10.3.23.log
91 kB
2020-06-16 10:18
node1_bootsrapped_10.3.23.log.rtf
93 kB
2020-06-16 10:15
node2_upgraded_10.4.13.log
14 kB
2020-06-16 10:18
node2_upgraded.log.rtf
14 kB
2020-06-16 10:15
server.cnf_mdb1
2 kB
2020-05-26 15:03
server.cnf_mdb2
2 kB
2020-05-26 15:03

Issue Links

relates to

MDEV-29246 WSREP_CLUSTER_SIZE at 0 after rolling update a node from 10.3 to 10.4

Closed

MDEV-20439 WSREP_CLUSTER_SIZE at 0 after rolling update a node

Closed

MDEV-22745 node crash on upgrade from 10.3 to 10.4 writing on the 10.4 node

Closed

Activity

Ascending order - Click to sort in descending order

Massimo created issue - 2020-05-26 15:03

Rick Pizzi (Inactive) made changes - 2020-05-26 15:10

Field	Original Value	New Value
Description	Creating a full galera cluster of 10.3.23 with 3 nodes mdb1,mdb2,mdb3 10.3.23 version. We gently showdown mdb3 to check the interaction between writing on 10.3.23 and effect on 10.4. , to enforce IST . We also re-tested with all 3 servers up , same result. Create a schema and a table on mdb1. all propagate - stop mdb2 . yum remove the rpm of Mariadb and galera. - install from new repo of Mariadb 10.4 and update my.cnf to the right wsrep_provider - set wsrep_on=OFF on my.cnf - start mdb2 - perform mysql_upgrade -s - stop mdb2 - set wsrep_on=ON on my.cnf - start mbd2 At this point the status galera variables on mdb2: MariaDB mdb2 [pippo]> show global status like 'wsrep%'; +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ \| Variable_name \| Value \| +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ \| wsrep_local_state_uuid \| 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 \| \| wsrep_protocol_version \| -1 \| \| wsrep_last_committed \| 65 \| \| wsrep_replicated \| 0 \| \| wsrep_replicated_bytes \| 0 \| \| wsrep_repl_keys \| 0 \| \| wsrep_repl_keys_bytes \| 0 \| \| wsrep_repl_data_bytes \| 0 \| \| wsrep_repl_other_bytes \| 0 \| \| wsrep_received \| 3 \| \| wsrep_received_bytes \| 208 \| \| wsrep_local_commits \| 0 \| \| wsrep_local_cert_failures \| 0 \| \| wsrep_local_replays \| 0 \| \| wsrep_local_send_queue \| 0 \| \| wsrep_local_send_queue_max \| 1 \| \| wsrep_local_send_queue_min \| 0 \| \| wsrep_local_send_queue_avg \| 0 \| \| wsrep_local_recv_queue \| 0 \| \| wsrep_local_recv_queue_max \| 1 \| \| wsrep_local_recv_queue_min \| 0 \| \| wsrep_local_recv_queue_avg \| 0 \| \| wsrep_local_cached_downto \| 64 \| \| wsrep_flow_control_paused_ns \| 0 \| \| wsrep_flow_control_paused \| 0 \| \| wsrep_flow_control_sent \| 0 \| \| wsrep_flow_control_recv \| 0 \| \| wsrep_cert_deps_distance \| 0 \| \| wsrep_apply_oooe \| 0.5 \| \| wsrep_apply_oool \| 0 \| \| wsrep_apply_window \| 1.5 \| \| wsrep_commit_oooe \| 0 \| \| wsrep_commit_oool \| 0 \| \| wsrep_commit_window \| 1 \| \| wsrep_local_state \| 4 \| \| wsrep_local_state_comment \| Synced \| \| wsrep_cert_index_size \| 0 \| \| wsrep_causal_reads \| 0 \| \| wsrep_cert_interval \| 0 \| \| wsrep_open_transactions \| 0 \| \| wsrep_open_connections \| 0 \| \| wsrep_incoming_addresses \| AUTO,10.0.1.13:3306 \| \| wsrep_cluster_weight \| 2 \| \| wsrep_desync_count \| 0 \| \| wsrep_evs_delayed \| \| \| wsrep_evs_evict_list \| \| \| wsrep_evs_repl_latency \| 0.000325151/0.00176008/0.00607075/0.00193032/7 \| \| wsrep_evs_state \| OPERATIONAL \| \| wsrep_gcomm_uuid \| 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4 \| \| wsrep_applier_thread_count \| 32 \| \| wsrep_cluster_capabilities \| \| \| wsrep_cluster_conf_id \| 18446744073709551615 \| \| wsrep_cluster_size \| 0 \| \| wsrep_cluster_state_uuid \| \| \| wsrep_cluster_status \| Primary \| \| wsrep_connected \| ON \| \| wsrep_local_bf_aborts \| 0 \| \| wsrep_local_index \| 18446744073709551615 \| \| wsrep_provider_capabilities \| :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: \| \| wsrep_provider_name \| Galera \| \| wsrep_provider_vendor \| Codership Oy <info@codership.com> \| \| wsrep_provider_version \| 26.4.4(r4599) \| \| wsrep_ready \| ON \| \| wsrep_rollbacker_thread_count \| 1 \| \| wsrep_thread_count \| 33 \| +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ 65 rows in set (0.001 sec) NOTE THAT : wsrep_cluster_status \| Primary wsrep_local_state_comment \| Synced wsrep_local_index \| 18446744073709551615 wsrep_cluster_size \| 0 Looking at the error log, the server is ready for connections after a IST At this point the 'master' mdb1 have a write that are not getting replicate: MariaDB mdb2 [pippo]> select * from evento4; +----+---------------+--------+ \| Id \| IdDispositivo \| kkkk \| +----+---------------+--------+ \| 1 \| 123 \| aaaa \| \| 3 \| 222 \| eeeeaa \| \| 4 \| 34523452 \| e4r4r4 \| +----+---------------+--------+ WHILE ON THE MASTER: MariaDB mdb1 [pippo]> select * from evento4; +----+---------------+--------+ \| Id \| IdDispositivo \| kkkk \| +----+---------------+--------+ \| 1 \| 123 \| aaaa \| \| 3 \| 222 \| eeeeaa \| \| 4 \| 34523452 \| e4r4r4 \| +----+---------------+--------+ 3 rows in set (0.001 sec) MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic'); Query OK, 1 row affected (0.015 sec) MariaDB mdb1 [pippo]> select * from evento4; +----+---------------+--------------+ \| Id \| IdDispositivo \| kkkk \| +----+---------------+--------------+ \| 1 \| 123 \| aaaa \| \| 3 \| 222 \| eeeeaa \| \| 4 \| 34523452 \| e4r4r4 \| \| 6 \| 3 \| non tireplic \| +----+---------------+--------------+ 4 rows in set (0.001 sec) The fact that INSERT not getting replicate could be indeed cause the cluster_size=0 and wsrep_local_index= 18446744073709551615, obviously so AT THIS point we restart mdb2 to fix the status: [root@mdb2 my.cnf.d]# systemctl restart mariadb [root@mdb2 my.cnf.d]# mysql MariaDB md2 [(none)]> show global status like 'wsrep%'; +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ \| Variable_name \| Value \| +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ \| wsrep_local_state_uuid \| 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 \| \| wsrep_protocol_version \| 9 \| \| wsrep_last_committed \| 66 \| \| wsrep_replicated \| 0 \| \| wsrep_replicated_bytes \| 0 \| \| wsrep_repl_keys \| 0 \| \| wsrep_repl_keys_bytes \| 0 \| \| wsrep_repl_data_bytes \| 0 \| \| wsrep_repl_other_bytes \| 0 \| \| wsrep_received \| 2 \| \| wsrep_received_bytes \| 200 \| \| wsrep_local_commits \| 0 \| \| wsrep_local_cert_failures \| 0 \| \| wsrep_local_replays \| 0 \| \| wsrep_local_send_queue \| 0 \| \| wsrep_local_send_queue_max \| 1 \| \| wsrep_local_send_queue_min \| 0 \| \| wsrep_local_send_queue_avg \| 0 \| \| wsrep_local_recv_queue \| 0 \| \| wsrep_local_recv_queue_max \| 1 \| \| wsrep_local_recv_queue_min \| 0 \| \| wsrep_local_recv_queue_avg \| 0 \| \| wsrep_local_cached_downto \| 64 \| \| wsrep_flow_control_paused_ns \| 0 \| \| wsrep_flow_control_paused \| 0 \| \| wsrep_flow_control_sent \| 0 \| \| wsrep_flow_control_recv \| 0 \| \| wsrep_cert_deps_distance \| 0 \| \| wsrep_apply_oooe \| 0 \| \| wsrep_apply_oool \| 0 \| \| wsrep_apply_window \| 0 \| \| wsrep_commit_oooe \| 0 \| \| wsrep_commit_oool \| 0 \| \| wsrep_commit_window \| 0 \| \| wsrep_local_state \| 4 \| \| wsrep_local_state_comment \| Synced \| \| wsrep_cert_index_size \| 0 \| \| wsrep_causal_reads \| 0 \| \| wsrep_cert_interval \| 0 \| \| wsrep_open_transactions \| 0 \| \| wsrep_open_connections \| 0 \| \| wsrep_incoming_addresses \| 10.0.1.13:3306,AUTO \| \| wsrep_cluster_weight \| 2 \| \| wsrep_desync_count \| 0 \| \| wsrep_evs_delayed \| \| \| wsrep_evs_evict_list \| \| \| wsrep_evs_repl_latency \| 0.000853237/0.001923/0.00333681/0.0010427/3 \| \| wsrep_evs_state \| OPERATIONAL \| \| wsrep_gcomm_uuid \| ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6 \| \| wsrep_applier_thread_count \| 32 \| \| wsrep_cluster_capabilities \| \| \| wsrep_cluster_conf_id \| 6 \| \| wsrep_cluster_size \| 2 \| \| wsrep_cluster_state_uuid \| 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 \| \| wsrep_cluster_status \| Primary \| \| wsrep_connected \| ON \| \| wsrep_local_bf_aborts \| 0 \| \| wsrep_local_index \| 1 \| \| wsrep_provider_capabilities \| :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: \| \| wsrep_provider_name \| Galera \| \| wsrep_provider_vendor \| Codership Oy <info@codership.com> \| \| wsrep_provider_version \| 26.4.4(r4599) \| \| wsrep_ready \| ON \| \| wsrep_rollbacker_thread_count \| 1 \| \| wsrep_thread_count \| 33 \| +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ 65 rows in set (0.002 sec) NOTE now the status is ok: wsrep_local_index \| 1 wsrep_cluster_status \| Primary wsrep_local_state_comment \| Synced wsrep_local_index \| 1 but when we check the data we expect the new row should be present: MariaDB mdb2 [pippo]> select * from evento4; +----+---------------+--------+ \| Id \| IdDispositivo \| kkkk \| +----+---------------+--------+ \| 1 \| 123 \| aaaa \| \| 3 \| 222 \| eeeeaa \| \| 4 \| 34523452 \| e4r4r4 \| +----+---------------+--------+ 3 rows in set (0.001 sec) The row is not there. If we write after this moment all is getting replicate. So the data loss is after the first IST complete until a new restart is done and got the status of the cluster back.	Creating a full galera cluster of 10.3.23 with 3 nodes mdb1,mdb2,mdb3 10.3.23 version. We gently shut mdb3 to check the interaction between writing on 10.3.23 and effect on 10.4. , to enforce IST . We also re-tested with all 3 servers up , same result. Create a schema and a table on mdb1. all propagate - stop mdb2 . yum remove the rpm of Mariadb and galera. - install from new repo of Mariadb 10.4 and update my.cnf to the right wsrep_provider - set wsrep_on=OFF on my.cnf - start mdb2 - perform mysql_upgrade -s - stop mdb2 - set wsrep_on=ON on my.cnf - start mbd2 At this point the status galera variables on mdb2: MariaDB mdb2 [pippo]> show global status like 'wsrep%'; +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ \| Variable_name \| Value \| +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ \| wsrep_local_state_uuid \| 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 \| \| wsrep_protocol_version \| -1 \| \| wsrep_last_committed \| 65 \| \| wsrep_replicated \| 0 \| \| wsrep_replicated_bytes \| 0 \| \| wsrep_repl_keys \| 0 \| \| wsrep_repl_keys_bytes \| 0 \| \| wsrep_repl_data_bytes \| 0 \| \| wsrep_repl_other_bytes \| 0 \| \| wsrep_received \| 3 \| \| wsrep_received_bytes \| 208 \| \| wsrep_local_commits \| 0 \| \| wsrep_local_cert_failures \| 0 \| \| wsrep_local_replays \| 0 \| \| wsrep_local_send_queue \| 0 \| \| wsrep_local_send_queue_max \| 1 \| \| wsrep_local_send_queue_min \| 0 \| \| wsrep_local_send_queue_avg \| 0 \| \| wsrep_local_recv_queue \| 0 \| \| wsrep_local_recv_queue_max \| 1 \| \| wsrep_local_recv_queue_min \| 0 \| \| wsrep_local_recv_queue_avg \| 0 \| \| wsrep_local_cached_downto \| 64 \| \| wsrep_flow_control_paused_ns \| 0 \| \| wsrep_flow_control_paused \| 0 \| \| wsrep_flow_control_sent \| 0 \| \| wsrep_flow_control_recv \| 0 \| \| wsrep_cert_deps_distance \| 0 \| \| wsrep_apply_oooe \| 0.5 \| \| wsrep_apply_oool \| 0 \| \| wsrep_apply_window \| 1.5 \| \| wsrep_commit_oooe \| 0 \| \| wsrep_commit_oool \| 0 \| \| wsrep_commit_window \| 1 \| \| wsrep_local_state \| 4 \| \| wsrep_local_state_comment \| Synced \| \| wsrep_cert_index_size \| 0 \| \| wsrep_causal_reads \| 0 \| \| wsrep_cert_interval \| 0 \| \| wsrep_open_transactions \| 0 \| \| wsrep_open_connections \| 0 \| \| wsrep_incoming_addresses \| AUTO,10.0.1.13:3306 \| \| wsrep_cluster_weight \| 2 \| \| wsrep_desync_count \| 0 \| \| wsrep_evs_delayed \| \| \| wsrep_evs_evict_list \| \| \| wsrep_evs_repl_latency \| 0.000325151/0.00176008/0.00607075/0.00193032/7 \| \| wsrep_evs_state \| OPERATIONAL \| \| wsrep_gcomm_uuid \| 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4 \| \| wsrep_applier_thread_count \| 32 \| \| wsrep_cluster_capabilities \| \| \| wsrep_cluster_conf_id \| 18446744073709551615 \| \| wsrep_cluster_size \| 0 \| \| wsrep_cluster_state_uuid \| \| \| wsrep_cluster_status \| Primary \| \| wsrep_connected \| ON \| \| wsrep_local_bf_aborts \| 0 \| \| wsrep_local_index \| 18446744073709551615 \| \| wsrep_provider_capabilities \| :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: \| \| wsrep_provider_name \| Galera \| \| wsrep_provider_vendor \| Codership Oy <info@codership.com> \| \| wsrep_provider_version \| 26.4.4(r4599) \| \| wsrep_ready \| ON \| \| wsrep_rollbacker_thread_count \| 1 \| \| wsrep_thread_count \| 33 \| +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ 65 rows in set (0.001 sec) NOTE THAT : wsrep_cluster_status \| Primary wsrep_local_state_comment \| Synced wsrep_local_index \| 18446744073709551615 wsrep_cluster_size \| 0 Looking at the error log, the server is ready for connections after a IST At this point the 'master' mdb1 have a write that are not getting replicate: MariaDB mdb2 [pippo]> select * from evento4; +----+---------------+--------+ \| Id \| IdDispositivo \| kkkk \| +----+---------------+--------+ \| 1 \| 123 \| aaaa \| \| 3 \| 222 \| eeeeaa \| \| 4 \| 34523452 \| e4r4r4 \| +----+---------------+--------+ WHILE ON THE MASTER: MariaDB mdb1 [pippo]> select * from evento4; +----+---------------+--------+ \| Id \| IdDispositivo \| kkkk \| +----+---------------+--------+ \| 1 \| 123 \| aaaa \| \| 3 \| 222 \| eeeeaa \| \| 4 \| 34523452 \| e4r4r4 \| +----+---------------+--------+ 3 rows in set (0.001 sec) MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic'); Query OK, 1 row affected (0.015 sec) MariaDB mdb1 [pippo]> select * from evento4; +----+---------------+--------------+ \| Id \| IdDispositivo \| kkkk \| +----+---------------+--------------+ \| 1 \| 123 \| aaaa \| \| 3 \| 222 \| eeeeaa \| \| 4 \| 34523452 \| e4r4r4 \| \| 6 \| 3 \| non tireplic \| +----+---------------+--------------+ 4 rows in set (0.001 sec) The fact that INSERT not getting replicate could be indeed cause the cluster_size=0 and wsrep_local_index= 18446744073709551615, obviously so AT THIS point we restart mdb2 to fix the status: [root@mdb2 my.cnf.d]# systemctl restart mariadb [root@mdb2 my.cnf.d]# mysql MariaDB md2 [(none)]> show global status like 'wsrep%'; +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ \| Variable_name \| Value \| +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ \| wsrep_local_state_uuid \| 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 \| \| wsrep_protocol_version \| 9 \| \| wsrep_last_committed \| 66 \| \| wsrep_replicated \| 0 \| \| wsrep_replicated_bytes \| 0 \| \| wsrep_repl_keys \| 0 \| \| wsrep_repl_keys_bytes \| 0 \| \| wsrep_repl_data_bytes \| 0 \| \| wsrep_repl_other_bytes \| 0 \| \| wsrep_received \| 2 \| \| wsrep_received_bytes \| 200 \| \| wsrep_local_commits \| 0 \| \| wsrep_local_cert_failures \| 0 \| \| wsrep_local_replays \| 0 \| \| wsrep_local_send_queue \| 0 \| \| wsrep_local_send_queue_max \| 1 \| \| wsrep_local_send_queue_min \| 0 \| \| wsrep_local_send_queue_avg \| 0 \| \| wsrep_local_recv_queue \| 0 \| \| wsrep_local_recv_queue_max \| 1 \| \| wsrep_local_recv_queue_min \| 0 \| \| wsrep_local_recv_queue_avg \| 0 \| \| wsrep_local_cached_downto \| 64 \| \| wsrep_flow_control_paused_ns \| 0 \| \| wsrep_flow_control_paused \| 0 \| \| wsrep_flow_control_sent \| 0 \| \| wsrep_flow_control_recv \| 0 \| \| wsrep_cert_deps_distance \| 0 \| \| wsrep_apply_oooe \| 0 \| \| wsrep_apply_oool \| 0 \| \| wsrep_apply_window \| 0 \| \| wsrep_commit_oooe \| 0 \| \| wsrep_commit_oool \| 0 \| \| wsrep_commit_window \| 0 \| \| wsrep_local_state \| 4 \| \| wsrep_local_state_comment \| Synced \| \| wsrep_cert_index_size \| 0 \| \| wsrep_causal_reads \| 0 \| \| wsrep_cert_interval \| 0 \| \| wsrep_open_transactions \| 0 \| \| wsrep_open_connections \| 0 \| \| wsrep_incoming_addresses \| 10.0.1.13:3306,AUTO \| \| wsrep_cluster_weight \| 2 \| \| wsrep_desync_count \| 0 \| \| wsrep_evs_delayed \| \| \| wsrep_evs_evict_list \| \| \| wsrep_evs_repl_latency \| 0.000853237/0.001923/0.00333681/0.0010427/3 \| \| wsrep_evs_state \| OPERATIONAL \| \| wsrep_gcomm_uuid \| ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6 \| \| wsrep_applier_thread_count \| 32 \| \| wsrep_cluster_capabilities \| \| \| wsrep_cluster_conf_id \| 6 \| \| wsrep_cluster_size \| 2 \| \| wsrep_cluster_state_uuid \| 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 \| \| wsrep_cluster_status \| Primary \| \| wsrep_connected \| ON \| \| wsrep_local_bf_aborts \| 0 \| \| wsrep_local_index \| 1 \| \| wsrep_provider_capabilities \| :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: \| \| wsrep_provider_name \| Galera \| \| wsrep_provider_vendor \| Codership Oy <info@codership.com> \| \| wsrep_provider_version \| 26.4.4(r4599) \| \| wsrep_ready \| ON \| \| wsrep_rollbacker_thread_count \| 1 \| \| wsrep_thread_count \| 33 \| +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ 65 rows in set (0.002 sec) NOTE now the status is ok: wsrep_local_index \| 1 wsrep_cluster_status \| Primary wsrep_local_state_comment \| Synced wsrep_local_index \| 1 but when we check the data we expect the new row should be present: MariaDB mdb2 [pippo]> select * from evento4; +----+---------------+--------+ \| Id \| IdDispositivo \| kkkk \| +----+---------------+--------+ \| 1 \| 123 \| aaaa \| \| 3 \| 222 \| eeeeaa \| \| 4 \| 34523452 \| e4r4r4 \| +----+---------------+--------+ 3 rows in set (0.001 sec) The row is not there. If we write after this moment all is getting replicate. So the data loss is after the first IST complete until a new restart is done and got the status of the cluster back.

Massimo made changes - 2020-05-26 15:13

Description

Creating a full galera cluster of 10.3.23 with 3 nodes
mdb1,mdb2,mdb3 10.3.23 version.
We gently shut mdb3 to check the interaction between writing on 10.3.23 and effect on 10.4. , to enforce IST . We also re-tested with all 3 servers up , same result.

Create a schema and a table on mdb1. all propagate

- stop mdb2 . yum remove the rpm of Mariadb and galera.
- install from new repo of Mariadb 10.4 and update my.cnf to the right wsrep_provider
- set wsrep_on=OFF on my.cnf
- start mdb2
- perform mysql_upgrade -s
- stop mdb2
- set wsrep_on=ON on my.cnf
- start mbd2

At this point the status galera variables on mdb2:

MariaDB mdb2 [pippo]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_protocol_version | -1 |
| wsrep_last_committed | 65 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 3 |
| wsrep_received_bytes | 208 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0 |
| wsrep_local_cached_downto | 64 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0 |
| wsrep_apply_oooe | 0.5 |
| wsrep_apply_oool | 0 |
| wsrep_apply_window | 1.5 |
| wsrep_commit_oooe | 0 |
| wsrep_commit_oool | 0 |
| wsrep_commit_window | 1 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0 |
| wsrep_open_transactions | 0 |
| wsrep_open_connections | 0 |
| wsrep_incoming_addresses | AUTO,10.0.1.13:3306 |
| wsrep_cluster_weight | 2 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000325151/0.00176008/0.00607075/0.00193032/7 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4 |
| wsrep_applier_thread_count | 32 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 0 |
| wsrep_cluster_state_uuid | |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 18446744073709551615 |
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 26.4.4(r4599) |
| wsrep_ready | ON |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 33 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.001 sec)

NOTE THAT :
wsrep_cluster_status | Primary
wsrep_local_state_comment | Synced
wsrep_local_index | 18446744073709551615
wsrep_cluster_size | 0

Looking at the error log, the server is ready for connections after a IST

At this point the 'master' mdb1 have a write that are not getting replicate:

MariaDB mdb2 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+

WHILE ON THE MASTER:

MariaDB mdb1 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
3 rows in set (0.001 sec)

MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
Query OK, 1 row affected (0.015 sec)

MariaDB mdb1 [pippo]> select * from evento4;
+----+---------------+--------------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
| 6 | 3 | non tireplic |
+----+---------------+--------------+
4 rows in set (0.001 sec)

The fact that INSERT not getting replicate could be indeed cause the cluster_size=0 and wsrep_local_index= 18446744073709551615, obviously so

AT THIS point we restart mdb2 to fix the status:

[root@mdb2 my.cnf.d]# systemctl restart mariadb
[root@mdb2 my.cnf.d]# mysql

MariaDB md2 [(none)]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_protocol_version | 9 |
| wsrep_last_committed | 66 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 2 |
| wsrep_received_bytes | 200 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0 |
| wsrep_local_cached_downto | 64 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0 |
| wsrep_apply_oooe | 0 |
| wsrep_apply_oool | 0 |
| wsrep_apply_window | 0 |
| wsrep_commit_oooe | 0 |
| wsrep_commit_oool | 0 |
| wsrep_commit_window | 0 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0 |
| wsrep_open_transactions | 0 |
| wsrep_open_connections | 0 |
| wsrep_incoming_addresses | 10.0.1.13:3306,AUTO |
| wsrep_cluster_weight | 2 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000853237/0.001923/0.00333681/0.0010427/3 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6 |
| wsrep_applier_thread_count | 32 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 6 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 1 |
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 26.4.4(r4599) |
| wsrep_ready | ON |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 33 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.002 sec)

NOTE now the status is ok:

wsrep_local_index | 1
wsrep_cluster_status | Primary
wsrep_local_state_comment | Synced
wsrep_local_index | 1

but when we check the data we expect the new row should be present:

MariaDB mdb2 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
3 rows in set (0.001 sec)

The row is not there.

If we write after this moment all is getting replicate. So the data loss is after the first IST complete until a new restart is done and got the status of the cluster back.

Creating a full galera cluster of 10.3.23 with 3 nodes
mdb1,mdb2,mdb3 10.3.23 version.
We gently shutdown mdb3 to check the interaction between writing on 10.3.23 and effect on 10.4. , to enforce IST . We also re-tested with all 3 servers up , same result.

Create a schema and a table on mdb1. all propagate

- stop mdb2 . yum remove the rpm of Mariadb and galera.
- install from new repo of Mariadb 10.4 and update my.cnf to the right wsrep_provider
- set wsrep_on=OFF on my.cnf
- start mdb2
- perform mysql_upgrade -s
- stop mdb2
- set wsrep_on=ON on my.cnf
- start mbd2

At this point the status galera variables on mdb2:

MariaDB mdb2 [pippo]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_protocol_version | -1 |
| wsrep_last_committed | 65 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 3 |
| wsrep_received_bytes | 208 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0 |
| wsrep_local_cached_downto | 64 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0 |
| wsrep_apply_oooe | 0.5 |
| wsrep_apply_oool | 0 |
| wsrep_apply_window | 1.5 |
| wsrep_commit_oooe | 0 |
| wsrep_commit_oool | 0 |
| wsrep_commit_window | 1 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0 |
| wsrep_open_transactions | 0 |
| wsrep_open_connections | 0 |
| wsrep_incoming_addresses | AUTO,10.0.1.13:3306 |
| wsrep_cluster_weight | 2 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000325151/0.00176008/0.00607075/0.00193032/7 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4 |
| wsrep_applier_thread_count | 32 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 0 |
| wsrep_cluster_state_uuid | |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 18446744073709551615 |
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 26.4.4(r4599) |
| wsrep_ready | ON |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 33 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.001 sec)

NOTE THAT :
wsrep_cluster_status | Primary
wsrep_local_state_comment | Synced
wsrep_local_index | 18446744073709551615
wsrep_cluster_size | 0

Looking at the error log, the server is ready for connections after a IST

At this point the 'master' mdb1 have a write that are not getting replicate:

MariaDB mdb2 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+

WHILE ON THE MASTER:

MariaDB mdb1 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
3 rows in set (0.001 sec)

MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
Query OK, 1 row affected (0.015 sec)

MariaDB mdb1 [pippo]> select * from evento4;
+----+---------------+--------------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
| 6 | 3 | non tireplic |
+----+---------------+--------------+
4 rows in set (0.001 sec)

The fact that INSERT not getting replicate could be indeed cause the cluster_size=0 and wsrep_local_index= 18446744073709551615, obviously so

AT THIS point we restart mdb2 to fix the status:

[root@mdb2 my.cnf.d]# systemctl restart mariadb
[root@mdb2 my.cnf.d]# mysql

MariaDB md2 [(none)]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_protocol_version | 9 |
| wsrep_last_committed | 66 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 2 |
| wsrep_received_bytes | 200 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0 |
| wsrep_local_cached_downto | 64 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0 |
| wsrep_apply_oooe | 0 |
| wsrep_apply_oool | 0 |
| wsrep_apply_window | 0 |
| wsrep_commit_oooe | 0 |
| wsrep_commit_oool | 0 |
| wsrep_commit_window | 0 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0 |
| wsrep_open_transactions | 0 |
| wsrep_open_connections | 0 |
| wsrep_incoming_addresses | 10.0.1.13:3306,AUTO |
| wsrep_cluster_weight | 2 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000853237/0.001923/0.00333681/0.0010427/3 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6 |
| wsrep_applier_thread_count | 32 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 6 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 1 |
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 26.4.4(r4599) |
| wsrep_ready | ON |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 33 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.002 sec)

NOTE now the status is ok:

wsrep_local_index | 1
wsrep_cluster_status | Primary
wsrep_local_state_comment | Synced
wsrep_local_index | 1

but when we check the data we expect the new row should be present:

MariaDB mdb2 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
3 rows in set (0.001 sec)

The row is not there.

If we write after this moment all is getting replicate. So the data loss is after the first IST complete until a new restart is done and got the status of the cluster back.

Massimo made changes - 2020-05-26 15:16

Priority

Major [ 3 ]

Critical [ 2 ]

Rick Pizzi (Inactive) added a comment - 2020-05-27 07:46

Looks related to https://jira.mariadb.org/browse/MDEV-19983

Rick Pizzi (Inactive) added a comment - 2020-05-27 07:46 Looks related to https://jira.mariadb.org/browse/MDEV-19983

Sergei Golubchik made changes - 2020-05-27 19:06

Description

Creating a full galera cluster of 10.3.23 with 3 nodes
mdb1,mdb2,mdb3 10.3.23 version.
We gently shutdown mdb3 to check the interaction between writing on 10.3.23 and effect on 10.4. , to enforce IST . We also re-tested with all 3 servers up , same result.

Create a schema and a table on mdb1. all propagate

- stop mdb2 . yum remove the rpm of Mariadb and galera.
- install from new repo of Mariadb 10.4 and update my.cnf to the right wsrep_provider
- set wsrep_on=OFF on my.cnf
- start mdb2
- perform mysql_upgrade -s
- stop mdb2
- set wsrep_on=ON on my.cnf
- start mbd2

At this point the status galera variables on mdb2:

{noformat}
MariaDB mdb2 [pippo]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_protocol_version | -1 |
| wsrep_last_committed | 65 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 3 |
| wsrep_received_bytes | 208 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0 |
| wsrep_local_cached_downto | 64 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0 |
| wsrep_apply_oooe | 0.5 |
| wsrep_apply_oool | 0 |
| wsrep_apply_window | 1.5 |
| wsrep_commit_oooe | 0 |
| wsrep_commit_oool | 0 |
| wsrep_commit_window | 1 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0 |
| wsrep_open_transactions | 0 |
| wsrep_open_connections | 0 |
| wsrep_incoming_addresses | AUTO,10.0.1.13:3306 |
| wsrep_cluster_weight | 2 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000325151/0.00176008/0.00607075/0.00193032/7 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4 |
| wsrep_applier_thread_count | 32 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 0 |
| wsrep_cluster_state_uuid | |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 18446744073709551615 |
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 26.4.4(r4599) |
| wsrep_ready | ON |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 33 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.001 sec)
{noformat}
NOTE THAT :
{noformat}
wsrep_cluster_status | Primary
wsrep_local_state_comment | Synced
wsrep_local_index | 18446744073709551615
wsrep_cluster_size | 0
{noformat}
Looking at the error log, the server is ready for connections after a IST

At this point the 'master' mdb1 have a write that are not getting replicate:
{noformat}
MariaDB mdb2 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
{noformat}
WHILE ON THE MASTER:
{noformat}
MariaDB mdb1 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
3 rows in set (0.001 sec)

MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
Query OK, 1 row affected (0.015 sec)

MariaDB mdb1 [pippo]> select * from evento4;
+----+---------------+--------------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
| 6 | 3 | non tireplic |
+----+---------------+--------------+
4 rows in set (0.001 sec)
{noformat}
The fact that INSERT not getting replicate could be indeed cause the cluster_size=0 and wsrep_local_index= 18446744073709551615, obviously so

AT THIS point we restart mdb2 to fix the status:
{noformat}
[root@mdb2 my.cnf.d]# systemctl restart mariadb
[root@mdb2 my.cnf.d]# mysql

MariaDB md2 [(none)]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_protocol_version | 9 |
| wsrep_last_committed | 66 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 2 |
| wsrep_received_bytes | 200 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0 |
| wsrep_local_cached_downto | 64 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0 |
| wsrep_apply_oooe | 0 |
| wsrep_apply_oool | 0 |
| wsrep_apply_window | 0 |
| wsrep_commit_oooe | 0 |
| wsrep_commit_oool | 0 |
| wsrep_commit_window | 0 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0 |
| wsrep_open_transactions | 0 |
| wsrep_open_connections | 0 |
| wsrep_incoming_addresses | 10.0.1.13:3306,AUTO |
| wsrep_cluster_weight | 2 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000853237/0.001923/0.00333681/0.0010427/3 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6 |
| wsrep_applier_thread_count | 32 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 6 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 1 |
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 26.4.4(r4599) |
| wsrep_ready | ON |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 33 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.002 sec)
{noformat}
NOTE now the status is ok:

{noformat}
wsrep_local_index | 1
wsrep_cluster_status | Primary
wsrep_local_state_comment | Synced
wsrep_local_index | 1
{noformat}
but when we check the data we expect the new row should be present:
{noformat}
MariaDB mdb2 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
3 rows in set (0.001 sec)
{noformat}
The row is not there.

If we write after this moment all is getting replicate. So the data loss is after the first IST complete until a new restart is done and got the status of the cluster back.

Elena Stepanova made changes - 2020-05-29 18:38

Fix Version/s		10.3 [ 22126 ]
Assignee		Jan Lindström [ jplindst ]

Jan Lindström (Inactive) made changes - 2020-05-29 18:57

Assignee

Jan Lindström [ jplindst ]

Stepan Patryshev [ stepan.patryshev ]

Stepan Patryshev (Inactive) made changes - 2020-06-10 15:24

Status

Open [ 1 ]

In Progress [ 3 ]

Stepan Patryshev (Inactive) made changes - 2020-06-10 16:31

Description

Creating a full galera cluster of 10.3.23 with 3 nodes
mdb1,mdb2,mdb3 10.3.23 version.
We gently shutdown mdb3 to check the interaction between writing on 10.3.23 and effect on 10.4. , to enforce IST . We also re-tested with all 3 servers up , same result.

Create a schema and a table on mdb1. all propagate

- stop mdb2 . yum remove the rpm of Mariadb and galera.
- install from new repo of Mariadb 10.4 and update my.cnf to the right wsrep_provider
- set wsrep_on=OFF on my.cnf
- start mdb2
- perform mysql_upgrade -s
- stop mdb2
- set wsrep_on=ON on my.cnf
- start mbd2

At this point the status galera variables on mdb2:

{noformat}
MariaDB mdb2 [pippo]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_protocol_version | -1 |
| wsrep_last_committed | 65 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 3 |
| wsrep_received_bytes | 208 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0 |
| wsrep_local_cached_downto | 64 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0 |
| wsrep_apply_oooe | 0.5 |
| wsrep_apply_oool | 0 |
| wsrep_apply_window | 1.5 |
| wsrep_commit_oooe | 0 |
| wsrep_commit_oool | 0 |
| wsrep_commit_window | 1 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0 |
| wsrep_open_transactions | 0 |
| wsrep_open_connections | 0 |
| wsrep_incoming_addresses | AUTO,10.0.1.13:3306 |
| wsrep_cluster_weight | 2 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000325151/0.00176008/0.00607075/0.00193032/7 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4 |
| wsrep_applier_thread_count | 32 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 0 |
| wsrep_cluster_state_uuid | |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 18446744073709551615 |
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 26.4.4(r4599) |
| wsrep_ready | ON |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 33 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.001 sec)
{noformat}
NOTE THAT :
{noformat}
wsrep_cluster_status | Primary
wsrep_local_state_comment | Synced
wsrep_local_index | 18446744073709551615
wsrep_cluster_size | 0
{noformat}
Looking at the error log, the server is ready for connections after a IST

At this point the 'master' mdb1 have a write that are not getting replicate:
{noformat}
MariaDB mdb2 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
{noformat}
WHILE ON THE MASTER:
{noformat}
MariaDB mdb1 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
3 rows in set (0.001 sec)

MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
Query OK, 1 row affected (0.015 sec)

MariaDB mdb1 [pippo]> select * from evento4;
+----+---------------+--------------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
| 6 | 3 | non tireplic |
+----+---------------+--------------+
4 rows in set (0.001 sec)
{noformat}
The fact that INSERT not getting replicate could be indeed cause the cluster_size=0 and wsrep_local_index= 18446744073709551615, obviously so

AT THIS point we restart mdb2 to fix the status:
{noformat}
[root@mdb2 my.cnf.d]# systemctl restart mariadb
[root@mdb2 my.cnf.d]# mysql

MariaDB md2 [(none)]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_protocol_version | 9 |
| wsrep_last_committed | 66 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 2 |
| wsrep_received_bytes | 200 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0 |
| wsrep_local_cached_downto | 64 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0 |
| wsrep_apply_oooe | 0 |
| wsrep_apply_oool | 0 |
| wsrep_apply_window | 0 |
| wsrep_commit_oooe | 0 |
| wsrep_commit_oool | 0 |
| wsrep_commit_window | 0 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0 |
| wsrep_open_transactions | 0 |
| wsrep_open_connections | 0 |
| wsrep_incoming_addresses | 10.0.1.13:3306,AUTO |
| wsrep_cluster_weight | 2 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000853237/0.001923/0.00333681/0.0010427/3 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6 |
| wsrep_applier_thread_count | 32 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 6 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 1 |
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 26.4.4(r4599) |
| wsrep_ready | ON |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 33 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.002 sec)
{noformat}
NOTE now the status is ok:

{noformat}
wsrep_local_index | 1
wsrep_cluster_status | Primary
wsrep_local_state_comment | Synced
wsrep_local_index | 1
{noformat}
but when we check the data we expect the new row should be present:
{noformat}
MariaDB mdb2 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
3 rows in set (0.001 sec)
{noformat}
The row is not there.

If we write after this moment all is getting replicate. So the data loss is after the first IST complete until a new restart is done and got the status of the cluster back.

Creating a full galera cluster of 10.3.23 with 3 nodes
mdb1,mdb2,mdb3 10.3.23 version.
We gently shutdown mdb3 to check the interaction between writing on 10.3.23 and effect on 10.4.13, to enforce IST . We also re-tested with all 3 servers up , same result.

Create a schema and a table on mdb1. all propagate

- stop mdb2 . yum remove the rpm of Mariadb and galera.
- install from new repo of Mariadb 10.4 and update my.cnf to the right wsrep_provider
- set wsrep_on=OFF on my.cnf
- start mdb2
- perform mysql_upgrade -s
- stop mdb2
- set wsrep_on=ON on my.cnf
- start mbd2

At this point the status galera variables on mdb2:

{noformat}
MariaDB mdb2 [pippo]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_protocol_version | -1 |
| wsrep_last_committed | 65 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 3 |
| wsrep_received_bytes | 208 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0 |
| wsrep_local_cached_downto | 64 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0 |
| wsrep_apply_oooe | 0.5 |
| wsrep_apply_oool | 0 |
| wsrep_apply_window | 1.5 |
| wsrep_commit_oooe | 0 |
| wsrep_commit_oool | 0 |
| wsrep_commit_window | 1 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0 |
| wsrep_open_transactions | 0 |
| wsrep_open_connections | 0 |
| wsrep_incoming_addresses | AUTO,10.0.1.13:3306 |
| wsrep_cluster_weight | 2 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000325151/0.00176008/0.00607075/0.00193032/7 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4 |
| wsrep_applier_thread_count | 32 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18446744073709551615 |
| wsrep_cluster_size | 0 |
| wsrep_cluster_state_uuid | |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 18446744073709551615 |
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 26.4.4(r4599) |
| wsrep_ready | ON |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 33 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.001 sec)
{noformat}
NOTE THAT :
{noformat}
wsrep_cluster_status | Primary
wsrep_local_state_comment | Synced
wsrep_local_index | 18446744073709551615
wsrep_cluster_size | 0
{noformat}
Looking at the error log, the server is ready for connections after a IST

At this point the 'master' mdb1 have a write that are not getting replicate:
{noformat}
MariaDB mdb2 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
{noformat}
WHILE ON THE MASTER:
{noformat}
MariaDB mdb1 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
3 rows in set (0.001 sec)

MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
Query OK, 1 row affected (0.015 sec)

MariaDB mdb1 [pippo]> select * from evento4;
+----+---------------+--------------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
| 6 | 3 | non tireplic |
+----+---------------+--------------+
4 rows in set (0.001 sec)
{noformat}
The fact that INSERT not getting replicate could be indeed cause the cluster_size=0 and wsrep_local_index= 18446744073709551615, obviously so

AT THIS point we restart mdb2 to fix the status:
{noformat}
[root@mdb2 my.cnf.d]# systemctl restart mariadb
[root@mdb2 my.cnf.d]# mysql

MariaDB md2 [(none)]> show global status like 'wsrep%';
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_protocol_version | 9 |
| wsrep_last_committed | 66 |
| wsrep_replicated | 0 |
| wsrep_replicated_bytes | 0 |
| wsrep_repl_keys | 0 |
| wsrep_repl_keys_bytes | 0 |
| wsrep_repl_data_bytes | 0 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 2 |
| wsrep_received_bytes | 200 |
| wsrep_local_commits | 0 |
| wsrep_local_cert_failures | 0 |
| wsrep_local_replays | 0 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 1 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0 |
| wsrep_local_recv_queue | 0 |
| wsrep_local_recv_queue_max | 1 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0 |
| wsrep_local_cached_downto | 64 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_cert_deps_distance | 0 |
| wsrep_apply_oooe | 0 |
| wsrep_apply_oool | 0 |
| wsrep_apply_window | 0 |
| wsrep_commit_oooe | 0 |
| wsrep_commit_oool | 0 |
| wsrep_commit_window | 0 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 0 |
| wsrep_causal_reads | 0 |
| wsrep_cert_interval | 0 |
| wsrep_open_transactions | 0 |
| wsrep_open_connections | 0 |
| wsrep_incoming_addresses | 10.0.1.13:3306,AUTO |
| wsrep_cluster_weight | 2 |
| wsrep_desync_count | 0 |
| wsrep_evs_delayed | |
| wsrep_evs_evict_list | |
| wsrep_evs_repl_latency | 0.000853237/0.001923/0.00333681/0.0010427/3 |
| wsrep_evs_state | OPERATIONAL |
| wsrep_gcomm_uuid | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6 |
| wsrep_applier_thread_count | 32 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 6 |
| wsrep_cluster_size | 2 |
| wsrep_cluster_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
| wsrep_cluster_status | Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 0 |
| wsrep_local_index | 1 |
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy <info@codership.com> |
| wsrep_provider_version | 26.4.4(r4599) |
| wsrep_ready | ON |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 33 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
65 rows in set (0.002 sec)
{noformat}
NOTE now the status is ok:

{noformat}
wsrep_local_index | 1
wsrep_cluster_status | Primary
wsrep_local_state_comment | Synced
wsrep_local_index | 1
{noformat}
but when we check the data we expect the new row should be present:
{noformat}
MariaDB mdb2 [pippo]> select * from evento4;
+----+---------------+--------+
| Id | IdDispositivo | kkkk |
+----+---------------+--------+
| 1 | 123 | aaaa |
| 3 | 222 | eeeeaa |
| 4 | 34523452 | e4r4r4 |
+----+---------------+--------+
3 rows in set (0.001 sec)
{noformat}
The row is not there.

If we write after this moment all is getting replicate. So the data loss is after the first IST complete until a new restart is done and got the status of the cluster back.

Stepan Patryshev (Inactive) made changes - 2020-06-12 17:42

Attachment		200612_mysqld.1.err [ 52173 ]
Attachment		200612_mysqld.2.err [ 52174 ]
Attachment		200612_mysqld.3.err [ 52175 ]
Attachment		mysqld_new.2.cnf [ 52176 ]
Attachment		mysqld_old.3.cnf [ 52177 ]
Attachment		mysqld_old.2.cnf [ 52178 ]
Attachment		mysqld_old.1.cnf [ 52179 ]

Stepan Patryshev (Inactive) added a comment - 2020-06-12 17:42 - edited

I have managed to reproduce it only partially. I have not observed any data loss during a node upgrade. But I got these strange values: wsrep_local_index = 18446744073709551615 and wsrep_cluster_size = 0.

Release builds 10.3.23 + Galera 25.3.29(rb0f34b0) and 10.4.13 + Galera 26.4.4(rae24803).

Steps:

1. ./mtr --suite=galera_3nodes --start-and-exit
2. Restart all nodes one by one with separate config files: Node1, Node2, Node3.
3. create table evento4 (Id int primary key auto_increment, IdDispositivo int, kkkk varchar(255));
4. insert into evento4(IdDispositivo, kkkk) values(123, 'aaaa');
insert into evento4(IdDispositivo, kkkk) values(222, 'eeeeaa');
insert into evento4(IdDispositivo, kkkk) values(34523452, 'e4r4r4 ');
5. Stop Node 2.
6. Set wsrep-on=OFF and run Node 2 on 10.4.13 binaries with Node2 new config.
7. Perform mysql_upgrade -s.
8. Stop Node 2.
9. Node 3: insert into evento4(IdDispositivo, kkkk) values(777777, 'While Node 2 was upgrading');
select * from evento4;

Id	IdDispositivo	kkkk
2	123	aaaa
5	222	eeeeaa
8	34523452	e4r4r4
10	777777	While Node 2 was upgrading

10. Start Node 2 with wsrep-on=ON.

11. New data appeared on Node 2:
select * from evento4;

Id	IdDispositivo	kkkk
2	123	aaaa
5	222	eeeeaa
8	34523452	e4r4r4
10	777777	While Node 2 was upgrading

But:

show global status like 'wsrep%';

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| Variable_name                 | Value                                                                                                                                          |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| wsrep_local_state_uuid        | be36cf8b-acb6-11ea-aa2c-e3149c2ff908                                                                                                           |

| wsrep_protocol_version        | 9                                                                                                                                              |

| wsrep_last_committed          | 6                                                                                                                                              |

| wsrep_replicated              | 0                                                                                                                                              |

| wsrep_replicated_bytes        | 0                                                                                                                                              |

| wsrep_repl_keys               | 0                                                                                                                                              |

| wsrep_repl_keys_bytes         | 0                                                                                                                                              |

| wsrep_repl_data_bytes         | 0                                                                                                                                              |

| wsrep_repl_other_bytes        | 0                                                                                                                                              |

| wsrep_received                | 3                                                                                                                                              |

| wsrep_received_bytes          | 288                                                                                                                                            |

| wsrep_local_commits           | 0                                                                                                                                              |

| wsrep_local_cert_failures     | 0                                                                                                                                              |

| wsrep_local_replays           | 0                                                                                                                                              |

| wsrep_local_send_queue        | 0                                                                                                                                              |

| wsrep_local_send_queue_max    | 1                                                                                                                                              |

| wsrep_local_send_queue_min    | 0                                                                                                                                              |

| wsrep_local_send_queue_avg    | 0                                                                                                                                              |

| wsrep_local_recv_queue        | 0                                                                                                                                              |

| wsrep_local_recv_queue_max    | 1                                                                                                                                              |

| wsrep_local_recv_queue_min    | 0                                                                                                                                              |

| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |

| wsrep_local_cached_downto     | 6                                                                                                                                              |

| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |

| wsrep_flow_control_paused     | 0                                                                                                                                              |

| wsrep_flow_control_sent       | 0                                                                                                                                              |

| wsrep_flow_control_recv       | 0                                                                                                                                              |

| wsrep_cert_deps_distance      | 0                                                                                                                                              |

| wsrep_apply_oooe              | 0                                                                                                                                              |

| wsrep_apply_oool              | 0                                                                                                                                              |

| wsrep_apply_window            | 1                                                                                                                                              |

| wsrep_commit_oooe             | 0                                                                                                                                              |

| wsrep_commit_oool             | 0                                                                                                                                              |

| wsrep_commit_window           | 1                                                                                                                                              |

| wsrep_local_state             | 4                                                                                                                                              |

| wsrep_local_state_comment     | Synced                                                                                                                                         |

| wsrep_cert_index_size         | 0                                                                                                                                              |

| wsrep_causal_reads            | 0                                                                                                                                              |

| wsrep_cert_interval           | 0                                                                                                                                              |

| wsrep_open_transactions       | 0                                                                                                                                              |

| wsrep_open_connections        | 0                                                                                                                                              |

| wsrep_incoming_addresses      | 127.0.0.1:16002,127.0.0.1:16000,127.0.0.1:16001                                                                                                |

| wsrep_cluster_weight          | 3                                                                                                                                              |

| wsrep_desync_count            | 0                                                                                                                                              |

| wsrep_evs_delayed             |                                                                                                                                                |

| wsrep_evs_evict_list          |                                                                                                                                                |

| wsrep_evs_repl_latency        | 0.000293552/0.000366098/0.000521759/7.98882e-05/5                                                                                              |

| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |

| wsrep_gcomm_uuid              | e05a4078-acc3-11ea-9394-8ba782d6f291                                                                                                           |

| wsrep_applier_thread_count    | 32                                                                                                                                             |

| wsrep_cluster_capabilities    |                                                                                                                                                |

| wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |

| wsrep_cluster_size            | 0                                                                                                                                              |

| wsrep_cluster_state_uuid      |                                                                                                                                                |

| wsrep_cluster_status          | Primary                                                                                                                                        |

| wsrep_connected               | ON                                                                                                                                             |

| wsrep_local_bf_aborts         | 0                                                                                                                                              |

| wsrep_local_index             | 18446744073709551615                                                                                                                           |

| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |

| wsrep_provider_name           | Galera                                                                                                                                         |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |

| wsrep_provider_version        | 26.4.4(rae24803)                                                                                                                               |

| wsrep_ready                   | ON                                                                                                                                             |

| wsrep_rollbacker_thread_count | 1                                                                                                                                              |

| wsrep_thread_count            | 33                                                                                                                                             |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

65 rows in set (0.001 sec)

wsrep_cluster_status	Primary
wsrep_local_state_comment	Synced
wsrep_local_index	18446744073709551615
wsrep_cluster_size	0

12. On node 3: insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
13. New data are replicated to Node 2:
select * from evento4;

Id	IdDispositivo	kkkk
2	123	aaaa
5	222	eeeeaa
8	34523452	e4r4r4
10	777777	While Node 2 was upgrading
13	3	non tireplic

14. Restart Node 2.
15. On Node 2:

show global status like 'wsrep%';

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| Variable_name                 | Value                                                                                                                                          |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| wsrep_local_state_uuid        | be36cf8b-acb6-11ea-aa2c-e3149c2ff908                                                                                                           |

| wsrep_protocol_version        | 9                                                                                                                                              |

| wsrep_last_committed          | 7                                                                                                                                              |

| wsrep_replicated              | 0                                                                                                                                              |

| wsrep_replicated_bytes        | 0                                                                                                                                              |

| wsrep_repl_keys               | 0                                                                                                                                              |

| wsrep_repl_keys_bytes         | 0                                                                                                                                              |

| wsrep_repl_data_bytes         | 0                                                                                                                                              |

| wsrep_repl_other_bytes        | 0                                                                                                                                              |

| wsrep_received                | 2                                                                                                                                              |

| wsrep_received_bytes          | 280                                                                                                                                            |

| wsrep_local_commits           | 0                                                                                                                                              |

| wsrep_local_cert_failures     | 0                                                                                                                                              |

| wsrep_local_replays           | 0                                                                                                                                              |

| wsrep_local_send_queue        | 0                                                                                                                                              |

| wsrep_local_send_queue_max    | 1                                                                                                                                              |

| wsrep_local_send_queue_min    | 0                                                                                                                                              |

| wsrep_local_send_queue_avg    | 0                                                                                                                                              |

| wsrep_local_recv_queue        | 0                                                                                                                                              |

| wsrep_local_recv_queue_max    | 1                                                                                                                                              |

| wsrep_local_recv_queue_min    | 0                                                                                                                                              |

| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |

| wsrep_local_cached_downto     | 6                                                                                                                                              |

| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |

| wsrep_flow_control_paused     | 0                                                                                                                                              |

| wsrep_flow_control_sent       | 0                                                                                                                                              |

| wsrep_flow_control_recv       | 0                                                                                                                                              |

| wsrep_cert_deps_distance      | 0                                                                                                                                              |

| wsrep_apply_oooe              | 0                                                                                                                                              |

| wsrep_apply_oool              | 0                                                                                                                                              |

| wsrep_apply_window            | 0                                                                                                                                              |

| wsrep_commit_oooe             | 0                                                                                                                                              |

| wsrep_commit_oool             | 0                                                                                                                                              |

| wsrep_commit_window           | 0                                                                                                                                              |

| wsrep_local_state             | 4                                                                                                                                              |

| wsrep_local_state_comment     | Synced                                                                                                                                         |

| wsrep_cert_index_size         | 0                                                                                                                                              |

| wsrep_causal_reads            | 0                                                                                                                                              |

| wsrep_cert_interval           | 0                                                                                                                                              |

| wsrep_open_transactions       | 0                                                                                                                                              |

| wsrep_open_connections        | 0                                                                                                                                              |

| wsrep_incoming_addresses      | 127.0.0.1:16002,127.0.0.1:16000,127.0.0.1:16001                                                                                                |

| wsrep_cluster_weight          | 3                                                                                                                                              |

| wsrep_desync_count            | 0                                                                                                                                              |

| wsrep_evs_delayed             |                                                                                                                                                |

| wsrep_evs_evict_list          |                                                                                                                                                |

| wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |

| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |

| wsrep_gcomm_uuid              | a2c23b72-acc8-11ea-afe5-cbd8cb9a86ed                                                                                                           |

| wsrep_applier_thread_count    | 32                                                                                                                                             |

| wsrep_cluster_capabilities    |                                                                                                                                                |

| wsrep_cluster_conf_id         | 17                                                                                                                                             |

| wsrep_cluster_size            | 3                                                                                                                                              |

| wsrep_cluster_state_uuid      | be36cf8b-acb6-11ea-aa2c-e3149c2ff908                                                                                                           |

| wsrep_cluster_status          | Primary                                                                                                                                        |

| wsrep_connected               | ON                                                                                                                                             |

| wsrep_local_bf_aborts         | 0                                                                                                                                              |

| wsrep_local_index             | 2                                                                                                                                              |

| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |

| wsrep_provider_name           | Galera                                                                                                                                         |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |

| wsrep_provider_version        | 26.4.4(rae24803)                                                                                                                               |

| wsrep_ready                   | ON                                                                                                                                             |

| wsrep_rollbacker_thread_count | 1                                                                                                                                              |

| wsrep_thread_count            | 33                                                                                                                                             |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

65 rows in set (0.001 sec)

wsrep_cluster_status	Primary
wsrep_local_state_comment	Synced
wsrep_local_index	2
wsrep_cluster_size	3

Server logs: Node 1, Node 2, Node 3.

I also have tried with one node stopped and without data population on Node 1 joined to the cluster during upgrading Node 2, but there were no any data loss anyway.

Stepan Patryshev (Inactive) added a comment - 2020-06-12 17:42 - edited I have managed to reproduce it only partially. I have not observed any data loss during a node upgrade. But I got these strange values: wsrep_local_index = 18446744073709551615 and wsrep_cluster_size = 0. Release builds 10.3.23 + Galera 25.3.29(rb0f34b0) and 10.4.13 + Galera 26.4.4(rae24803). Steps: 1. ./mtr --suite=galera_3nodes --start-and-exit 2. Restart all nodes one by one with separate config files: Node1 , Node2 , Node3 . 3. create table evento4 (Id int primary key auto_increment, IdDispositivo int, kkkk varchar(255)); 4. insert into evento4(IdDispositivo, kkkk) values(123, 'aaaa'); insert into evento4(IdDispositivo, kkkk) values(222, 'eeeeaa'); insert into evento4(IdDispositivo, kkkk) values(34523452, 'e4r4r4 '); 5. Stop Node 2. 6. Set wsrep-on=OFF and run Node 2 on 10.4.13 binaries with Node2 new config . 7. Perform mysql_upgrade -s. 8. Stop Node 2. 9. Node 3: insert into evento4(IdDispositivo, kkkk) values(777777, 'While Node 2 was upgrading'); select * from evento4; Id IdDispositivo kkkk 2 123 aaaa 5 222 eeeeaa 8 34523452 e4r4r4 10 777777 While Node 2 was upgrading 10. Start Node 2 with wsrep-on=ON. 11. New data appeared on Node 2: select * from evento4; Id IdDispositivo kkkk 2 123 aaaa 5 222 eeeeaa 8 34523452 e4r4r4 10 777777 While Node 2 was upgrading But: show global status like 'wsrep%'; +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | Variable_name | Value | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | wsrep_local_state_uuid | be36cf8b-acb6-11ea-aa2c-e3149c2ff908 | | wsrep_protocol_version | 9 | | wsrep_last_committed | 6 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_received | 3 | | wsrep_received_bytes | 288 | | wsrep_local_commits | 0 | | wsrep_local_cert_failures | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_send_queue_avg | 0 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_recv_queue_avg | 0 | | wsrep_local_cached_downto | 6 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_paused | 0 | | wsrep_flow_control_sent | 0 | | wsrep_flow_control_recv | 0 | | wsrep_cert_deps_distance | 0 | | wsrep_apply_oooe | 0 | | wsrep_apply_oool | 0 | | wsrep_apply_window | 1 | | wsrep_commit_oooe | 0 | | wsrep_commit_oool | 0 | | wsrep_commit_window | 1 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_cert_index_size | 0 | | wsrep_causal_reads | 0 | | wsrep_cert_interval | 0 | | wsrep_open_transactions | 0 | | wsrep_open_connections | 0 | | wsrep_incoming_addresses | 127.0.0.1:16002,127.0.0.1:16000,127.0.0.1:16001 | | wsrep_cluster_weight | 3 | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0.000293552/0.000366098/0.000521759/7.98882e-05/5 | | wsrep_evs_state | OPERATIONAL | | wsrep_gcomm_uuid | e05a4078-acc3-11ea-9394-8ba782d6f291 | | wsrep_applier_thread_count | 32 | | wsrep_cluster_capabilities | | | wsrep_cluster_conf_id | 18446744073709551615 | | wsrep_cluster_size | 0 | | wsrep_cluster_state_uuid | | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_bf_aborts | 0 | | wsrep_local_index | 18446744073709551615 | | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 26.4.4(rae24803) | | wsrep_ready | ON | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 33 | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ 65 rows in set (0.001 sec) wsrep_cluster_status Primary wsrep_local_state_comment Synced wsrep_local_index 18446744073709551615 wsrep_cluster_size 0 12. On node 3: insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic'); 13. New data are replicated to Node 2: select * from evento4; Id IdDispositivo kkkk 2 123 aaaa 5 222 eeeeaa 8 34523452 e4r4r4 10 777777 While Node 2 was upgrading 13 3 non tireplic 14. Restart Node 2. 15. On Node 2: show global status like 'wsrep%'; +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | Variable_name | Value | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | wsrep_local_state_uuid | be36cf8b-acb6-11ea-aa2c-e3149c2ff908 | | wsrep_protocol_version | 9 | | wsrep_last_committed | 7 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_received | 2 | | wsrep_received_bytes | 280 | | wsrep_local_commits | 0 | | wsrep_local_cert_failures | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_send_queue_avg | 0 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_recv_queue_avg | 0 | | wsrep_local_cached_downto | 6 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_paused | 0 | | wsrep_flow_control_sent | 0 | | wsrep_flow_control_recv | 0 | | wsrep_cert_deps_distance | 0 | | wsrep_apply_oooe | 0 | | wsrep_apply_oool | 0 | | wsrep_apply_window | 0 | | wsrep_commit_oooe | 0 | | wsrep_commit_oool | 0 | | wsrep_commit_window | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_cert_index_size | 0 | | wsrep_causal_reads | 0 | | wsrep_cert_interval | 0 | | wsrep_open_transactions | 0 | | wsrep_open_connections | 0 | | wsrep_incoming_addresses | 127.0.0.1:16002,127.0.0.1:16000,127.0.0.1:16001 | | wsrep_cluster_weight | 3 | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_gcomm_uuid | a2c23b72-acc8-11ea-afe5-cbd8cb9a86ed | | wsrep_applier_thread_count | 32 | | wsrep_cluster_capabilities | | | wsrep_cluster_conf_id | 17 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | be36cf8b-acb6-11ea-aa2c-e3149c2ff908 | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_bf_aborts | 0 | | wsrep_local_index | 2 | | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 26.4.4(rae24803) | | wsrep_ready | ON | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 33 | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ 65 rows in set (0.001 sec) wsrep_cluster_status Primary wsrep_local_state_comment Synced wsrep_local_index 2 wsrep_cluster_size 3 Server logs : Node 1 , Node 2 , Node 3 . I also have tried with one node stopped and without data population on Node 1 joined to the cluster during upgrading Node 2, but there were no any data loss anyway.

Stepan Patryshev (Inactive) made changes - 2020-06-12 17:57

Assignee

Stepan Patryshev [ stepan.patryshev ]

Seppo Jaakola [ seppo ]

Stepan Patryshev (Inactive) made changes - 2020-06-12 17:58

Assignee

Seppo Jaakola [ seppo ]

Stepan Patryshev [ stepan.patryshev ]

Stepan Patryshev (Inactive) made changes - 2020-06-12 17:58

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Stepan Patryshev (Inactive) made changes - 2020-06-12 17:58

Assignee

Stepan Patryshev [ stepan.patryshev ]

Seppo Jaakola [ seppo ]

Stepan Patryshev (Inactive) made changes - 2020-06-12 17:59

Affects Version/s		10.4.13 [ 24223 ]
Affects Version/s	10.3.14 [ 23216 ]

Stepan Patryshev (Inactive) made changes - 2020-06-12 18:00

Fix Version/s

10.4 [ 22408 ]

Rick Pizzi (Inactive) added a comment - 2020-06-15 07:49

Data loss is there, as documented in original description. We reproduced it many times.

Rick Pizzi (Inactive) added a comment - 2020-06-15 07:49 Data loss is there, as documented in original description. We reproduced it many times.

Rick Pizzi (Inactive) added a comment - 2020-06-15 12:51

I have re-tested this in my own lab (the original bug report was from Massimo, I'm in same team).

I confirm the bug exist and we don't understand why it is not happening to you.

Exact steps to reproduce:

1. install 3 nodes with latest 10.3, i used 10.3.23, wsrep version 25.3.28(r3875)
2. create a table and insert data in it.

Situation after 2 steps above:

node1>create table dataloss (id int not null auto_increment primary key, value int);

Query OK, 0 rows affected (0.025 sec)

node1>insert into dataloss (value) values (1), (2), (3);

Query OK, 3 rows affected (0.003 sec)

Records: 3  Duplicates: 0  Warnings: 0

node1>select * from dataloss;

+----+-------+

| id | value |

+----+-------+

|  2 |     1 |

|  5 |     2 |

|  8 |     3 |

+----+-------+

3 rows in set (0.000 sec)

node1>show global status like 'wsrep%';

+-------------------------------+------------------------------------------+

| Variable_name                 | Value                                    |

+-------------------------------+------------------------------------------+

| wsrep_applier_thread_count    | 8                                        |

| wsrep_apply_oooe              | 0.000000                                 |

| wsrep_apply_oool              | 0.000000                                 |

| wsrep_apply_window            | 1.000000                                 |

| wsrep_causal_reads            | 0                                        |

| wsrep_cert_deps_distance      | 1.000000                                 |

| wsrep_cert_index_size         | 5                                        |

| wsrep_cert_interval           | 0.000000                                 |

| wsrep_cluster_conf_id         | 19                                       |

| wsrep_cluster_size            | 3                                        |

| wsrep_cluster_state_uuid      | cf61cf68-aef7-11ea-88db-1bc466429584     |

| wsrep_cluster_status          | Primary                                  |

| wsrep_cluster_weight          | 3                                        |

| wsrep_commit_oooe             | 0.000000                                 |

| wsrep_commit_oool             | 0.000000                                 |

| wsrep_commit_window           | 1.000000                                 |

| wsrep_connected               | ON                                       |

| wsrep_desync_count            | 0                                        |

| wsrep_evs_delayed             |                                          |

| wsrep_evs_evict_list          |                                          |

| wsrep_evs_repl_latency        | 0/0/0/0/0                                |

| wsrep_evs_state               | OPERATIONAL                              |

| wsrep_flow_control_paused     | 0.000000                                 |

| wsrep_flow_control_paused_ns  | 0                                        |

| wsrep_flow_control_recv       | 0                                        |

| wsrep_flow_control_sent       | 0                                        |

| wsrep_gcomm_uuid              | 66883d21-af01-11ea-a6eb-260a9c0d8490     |

| wsrep_incoming_addresses      | AUTO,192.168.2.90:3306,192.168.2.92:3306 |

| wsrep_last_committed          | 8                                        |

| wsrep_local_bf_aborts         | 0                                        |

| wsrep_local_cached_downto     | 6                                        |

| wsrep_local_cert_failures     | 0                                        |

| wsrep_local_commits           | 1                                        |

| wsrep_local_index             | 1                                        |

| wsrep_local_recv_queue        | 0                                        |

| wsrep_local_recv_queue_avg    | 0.000000                                 |

| wsrep_local_recv_queue_max    | 1                                        |

| wsrep_local_recv_queue_min    | 0                                        |

| wsrep_local_replays           | 0                                        |

| wsrep_local_send_queue        | 0                                        |

| wsrep_local_send_queue_avg    | 0.000000                                 |

| wsrep_local_send_queue_max    | 1                                        |

| wsrep_local_send_queue_min    | 0                                        |

| wsrep_local_state             | 4                                        |

| wsrep_local_state_comment     | Synced                                   |

| wsrep_local_state_uuid        | cf61cf68-aef7-11ea-88db-1bc466429584     |

| wsrep_open_connections        | 0                                        |

| wsrep_open_transactions       | 0                                        |

| wsrep_protocol_version        | 9                                        |

| wsrep_provider_name           | Galera                                   |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>        |

| wsrep_provider_version        | 25.3.28(r3875)                           |

| wsrep_ready                   | ON                                       |

| wsrep_received                | 4                                        |

| wsrep_received_bytes          | 755                                      |

| wsrep_repl_data_bytes         | 978                                      |

| wsrep_repl_keys               | 9                                        |

| wsrep_repl_keys_bytes         | 144                                      |

| wsrep_repl_other_bytes        | 0                                        |

| wsrep_replicated              | 3                                        |

| wsrep_replicated_bytes        | 1328                                     |

| wsrep_rollbacker_thread_count | 1                                        |

| wsrep_thread_count            | 9                                        |

+-------------------------------+------------------------------------------+

63 rows in set (0.001 sec)

3. on node 2, shut down and upgrade to latest 10.4, I used 10.4.13, wsrep 26.4.4(r4599)

When you restart that node, you see weird values for cluster_size and cluster_local_index:

MariaDB [(none)]> show global status like 'wsrep%';

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| Variable_name                 | Value                                                                                                                                          |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| wsrep_local_state_uuid        | cf61cf68-aef7-11ea-88db-1bc466429584                                                                                                           |

| wsrep_protocol_version        | -1                                                                                                                                             |

| wsrep_last_committed          | 8                                                                                                                                              |

| wsrep_replicated              | 0                                                                                                                                              |

| wsrep_replicated_bytes        | 0                                                                                                                                              |

| wsrep_repl_keys               | 0                                                                                                                                              |

| wsrep_repl_keys_bytes         | 0                                                                                                                                              |

| wsrep_repl_data_bytes         | 0                                                                                                                                              |

| wsrep_repl_other_bytes        | 0                                                                                                                                              |

| wsrep_received                | 3                                                                                                                                              |

| wsrep_received_bytes          | 288                                                                                                                                            |

| wsrep_local_commits           | 0                                                                                                                                              |

| wsrep_local_cert_failures     | 0                                                                                                                                              |

| wsrep_local_replays           | 0                                                                                                                                              |

| wsrep_local_send_queue        | 0                                                                                                                                              |

| wsrep_local_send_queue_max    | 1                                                                                                                                              |

| wsrep_local_send_queue_min    | 0                                                                                                                                              |

| wsrep_local_send_queue_avg    | 0                                                                                                                                              |

| wsrep_local_recv_queue        | 0                                                                                                                                              |

| wsrep_local_recv_queue_max    | 1                                                                                                                                              |

| wsrep_local_recv_queue_min    | 0                                                                                                                                              |

| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |

| wsrep_local_cached_downto     | -1                                                                                                                                             |

| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |

| wsrep_flow_control_paused     | 0                                                                                                                                              |

| wsrep_flow_control_sent       | 0                                                                                                                                              |

| wsrep_flow_control_recv       | 0                                                                                                                                              |

| wsrep_cert_deps_distance      | 0                                                                                                                                              |

| wsrep_apply_oooe              | 0                                                                                                                                              |

| wsrep_apply_oool              | 0                                                                                                                                              |

| wsrep_apply_window            | 0                                                                                                                                              |

| wsrep_commit_oooe             | 0                                                                                                                                              |

| wsrep_commit_oool             | 0                                                                                                                                              |

| wsrep_commit_window           | 0                                                                                                                                              |

| wsrep_local_state             | 4                                                                                                                                              |

| wsrep_local_state_comment     | Synced                                                                                                                                         |

| wsrep_cert_index_size         | 0                                                                                                                                              |

| wsrep_causal_reads            | 0                                                                                                                                              |

| wsrep_cert_interval           | 0                                                                                                                                              |

| wsrep_open_transactions       | 0                                                                                                                                              |

| wsrep_open_connections        | 0                                                                                                                                              |

| wsrep_incoming_addresses      | AUTO,192.168.2.90:3306,192.168.2.92:3306                                                                                                       |

| wsrep_cluster_weight          | 3                                                                                                                                              |

| wsrep_desync_count            | 0                                                                                                                                              |

| wsrep_evs_delayed             |                                                                                                                                                |

| wsrep_evs_evict_list          |                                                                                                                                                |

| wsrep_evs_repl_latency        | 0.000567644/0.00112438/0.00173288/0.000348106/7                                                                                                |

| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |

| wsrep_gcomm_uuid              | 043aaa1a-af04-11ea-9292-9a42c9f9c38d                                                                                                           |

| wsrep_applier_thread_count    | 8                                                                                                                                              |

| wsrep_cluster_capabilities    |                                                                                                                                                |

| wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |

| wsrep_cluster_size            | 0                                                                                                                                              |

| wsrep_cluster_state_uuid      |                                                                                                                                                |

| wsrep_cluster_status          | Primary                                                                                                                                        |

| wsrep_connected               | ON                                                                                                                                             |

| wsrep_local_bf_aborts         | 0                                                                                                                                              |

| wsrep_local_index             | 18446744073709551615                                                                                                                           |

| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |

| wsrep_provider_name           | Galera                                                                                                                                         |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |

| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |

| wsrep_ready                   | ON                                                                                                                                             |

| wsrep_rollbacker_thread_count | 1                                                                                                                                              |

| wsrep_thread_count            | 9                                                                                                                                              |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

65 rows in set (0.001 sec)

Recheck the content of table dataloss on 3 nodes:

node1>select * from dataloss;

+----+-------+

| id | value |

+----+-------+

|  2 |     1 |

|  5 |     2 |

|  8 |     3 |

+----+-------+

3 rows in set (0.001 sec)

node2> select * from dataloss;

+----+-------+

| id | value |

+----+-------+

|  2 |     1 |

|  5 |     2 |

|  8 |     3 |

+----+-------+

3 rows in set (0.001 sec)

node3>select * from dataloss;

+----+-------+

| id | value |

+----+-------+

|  2 |     1 |

|  5 |     2 |

|  8 |     3 |

+----+-------+

3 rows in set (0.000 sec)

Now insert a row on node1, verify it has been added:

node1>insert into dataloss (value) values (4);

Query OK, 1 row affected (0.002 sec)

node1>select * from dataloss;

+----+-------+

| id | value |

+----+-------+

|  2 |     1 |

|  5 |     2 |

|  8 |     3 |

| 11 |     4 |

+----+-------+

4 rows in set (0.000 sec)

If you check on node2, that row is not there and it's lost:

noed2> select * from dataloss;

+----+-------+

| id | value |

+----+-------+

|  2 |     1 |

|  5 |     2 |

|  8 |     3 |

+----+-------+

3 rows in set (0.000 sec)

On node 3, the row is there:

node3>select * from dataloss;

+----+-------+

| id | value |

+----+-------+

|  2 |     1 |

|  5 |     2 |

|  8 |     3 |

| 11 |     4 |

+----+-------+

4 rows in set (0.000 sec)

Any other row inserted in this situation never reaches node 2 - it's data loss.

Then if you reboot the node2 once more, the wsrep config clears and looks good:

Redirecting to /bin/systemctl stop mariadb.service

[root@docker2 ~]# service mariadb start

Redirecting to /bin/systemctl start mariadb.service

[root@docker2 ~]# mysql -A

Welcome to the MariaDB monitor.  Commands end with ; or \g.

Your MariaDB connection id is 20

Server version: 10.4.13-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

node2> show global status like 'wsrep_local_index';

+-------------------+-------+

| Variable_name     | Value |

+-------------------+-------+

| wsrep_local_index | 2     |

+-------------------+-------+

1 row in set (0.001 sec)

Now, if I insert a new row on node1, it is correctly propagated to all nodes, but the row previously inserted is lost:

node1>insert into dataloss (value) values (5);

Query OK, 1 row affected (0.003 sec)

node1>select * from dataloss;

+----+-------+

| id | value |

+----+-------+

|  2 |     1 |

|  5 |     2 |

|  8 |     3 |

| 11 |     4 |

| 16 |     5 |

+----+-------+

5 rows in set (0.000 sec)

node2> select * from dataloss;

+----+-------+

| id | value |

+----+-------+

|  2 |     1 |

|  5 |     2 |

|  8 |     3 |

| 16 |     5 |

+----+-------+

4 rows in set (0.000 sec)

node3>select * from dataloss;

+----+-------+

| id | value |

+----+-------+

|  2 |     1 |

|  5 |     2 |

|  8 |     3 |

| 11 |     4 |

| 16 |     5 |

+----+-------+

5 rows in set (0.000 sec)

So, please re-test the above scenario to verify that there is actual data loss and it's not only a problem of bad variable display

Thanks
RIck

Rick Pizzi (Inactive) added a comment - 2020-06-15 12:51 I have re-tested this in my own lab (the original bug report was from Massimo, I'm in same team). I confirm the bug exist and we don't understand why it is not happening to you. Exact steps to reproduce: 1. install 3 nodes with latest 10.3, i used 10.3.23, wsrep version 25.3.28(r3875) 2. create a table and insert data in it. Situation after 2 steps above: node1>create table dataloss (id int not null auto_increment primary key, value int); Query OK, 0 rows affected (0.025 sec) node1>insert into dataloss (value) values (1), (2), (3); Query OK, 3 rows affected (0.003 sec) Records: 3 Duplicates: 0 Warnings: 0 node1>select * from dataloss; +----+-------+ | id | value | +----+-------+ | 2 | 1 | | 5 | 2 | | 8 | 3 | +----+-------+ 3 rows in set (0.000 sec) node1>show global status like 'wsrep%'; +-------------------------------+------------------------------------------+ | Variable_name | Value | +-------------------------------+------------------------------------------+ | wsrep_applier_thread_count | 8 | | wsrep_apply_oooe | 0.000000 | | wsrep_apply_oool | 0.000000 | | wsrep_apply_window | 1.000000 | | wsrep_causal_reads | 0 | | wsrep_cert_deps_distance | 1.000000 | | wsrep_cert_index_size | 5 | | wsrep_cert_interval | 0.000000 | | wsrep_cluster_conf_id | 19 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | cf61cf68-aef7-11ea-88db-1bc466429584 | | wsrep_cluster_status | Primary | | wsrep_cluster_weight | 3 | | wsrep_commit_oooe | 0.000000 | | wsrep_commit_oool | 0.000000 | | wsrep_commit_window | 1.000000 | | wsrep_connected | ON | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_flow_control_paused | 0.000000 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_recv | 0 | | wsrep_flow_control_sent | 0 | | wsrep_gcomm_uuid | 66883d21-af01-11ea-a6eb-260a9c0d8490 | | wsrep_incoming_addresses | AUTO,192.168.2.90:3306,192.168.2.92:3306 | | wsrep_last_committed | 8 | | wsrep_local_bf_aborts | 0 | | wsrep_local_cached_downto | 6 | | wsrep_local_cert_failures | 0 | | wsrep_local_commits | 1 | | wsrep_local_index | 1 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_avg | 0.000000 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_avg | 0.000000 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_local_state_uuid | cf61cf68-aef7-11ea-88db-1bc466429584 | | wsrep_open_connections | 0 | | wsrep_open_transactions | 0 | | wsrep_protocol_version | 9 | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 25.3.28(r3875) | | wsrep_ready | ON | | wsrep_received | 4 | | wsrep_received_bytes | 755 | | wsrep_repl_data_bytes | 978 | | wsrep_repl_keys | 9 | | wsrep_repl_keys_bytes | 144 | | wsrep_repl_other_bytes | 0 | | wsrep_replicated | 3 | | wsrep_replicated_bytes | 1328 | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 9 | +-------------------------------+------------------------------------------+ 63 rows in set (0.001 sec) 3. on node 2, shut down and upgrade to latest 10.4, I used 10.4.13, wsrep 26.4.4(r4599) When you restart that node, you see weird values for cluster_size and cluster_local_index: MariaDB [(none)]> show global status like 'wsrep%'; +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | Variable_name | Value | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | wsrep_local_state_uuid | cf61cf68-aef7-11ea-88db-1bc466429584 | | wsrep_protocol_version | -1 | | wsrep_last_committed | 8 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_received | 3 | | wsrep_received_bytes | 288 | | wsrep_local_commits | 0 | | wsrep_local_cert_failures | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_send_queue_avg | 0 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_recv_queue_avg | 0 | | wsrep_local_cached_downto | -1 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_paused | 0 | | wsrep_flow_control_sent | 0 | | wsrep_flow_control_recv | 0 | | wsrep_cert_deps_distance | 0 | | wsrep_apply_oooe | 0 | | wsrep_apply_oool | 0 | | wsrep_apply_window | 0 | | wsrep_commit_oooe | 0 | | wsrep_commit_oool | 0 | | wsrep_commit_window | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_cert_index_size | 0 | | wsrep_causal_reads | 0 | | wsrep_cert_interval | 0 | | wsrep_open_transactions | 0 | | wsrep_open_connections | 0 | | wsrep_incoming_addresses | AUTO,192.168.2.90:3306,192.168.2.92:3306 | | wsrep_cluster_weight | 3 | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0.000567644/0.00112438/0.00173288/0.000348106/7 | | wsrep_evs_state | OPERATIONAL | | wsrep_gcomm_uuid | 043aaa1a-af04-11ea-9292-9a42c9f9c38d | | wsrep_applier_thread_count | 8 | | wsrep_cluster_capabilities | | | wsrep_cluster_conf_id | 18446744073709551615 | | wsrep_cluster_size | 0 | | wsrep_cluster_state_uuid | | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_bf_aborts | 0 | | wsrep_local_index | 18446744073709551615 | | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 26.4.4(r4599) | | wsrep_ready | ON | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 9 | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ 65 rows in set (0.001 sec) Recheck the content of table dataloss on 3 nodes: node1>select * from dataloss; +----+-------+ | id | value | +----+-------+ | 2 | 1 | | 5 | 2 | | 8 | 3 | +----+-------+ 3 rows in set (0.001 sec) node2> select * from dataloss; +----+-------+ | id | value | +----+-------+ | 2 | 1 | | 5 | 2 | | 8 | 3 | +----+-------+ 3 rows in set (0.001 sec) node3>select * from dataloss; +----+-------+ | id | value | +----+-------+ | 2 | 1 | | 5 | 2 | | 8 | 3 | +----+-------+ 3 rows in set (0.000 sec) Now insert a row on node1, verify it has been added: node1>insert into dataloss (value) values (4); Query OK, 1 row affected (0.002 sec) node1>select * from dataloss; +----+-------+ | id | value | +----+-------+ | 2 | 1 | | 5 | 2 | | 8 | 3 | | 11 | 4 | +----+-------+ 4 rows in set (0.000 sec) If you check on node2, that row is not there and it's lost: noed2> select * from dataloss; +----+-------+ | id | value | +----+-------+ | 2 | 1 | | 5 | 2 | | 8 | 3 | +----+-------+ 3 rows in set (0.000 sec) On node 3, the row is there: node3>select * from dataloss; +----+-------+ | id | value | +----+-------+ | 2 | 1 | | 5 | 2 | | 8 | 3 | | 11 | 4 | +----+-------+ 4 rows in set (0.000 sec) Any other row inserted in this situation never reaches node 2 - it's data loss. Then if you reboot the node2 once more, the wsrep config clears and looks good: Redirecting to /bin/systemctl stop mariadb.service [root@docker2 ~]# service mariadb start Redirecting to /bin/systemctl start mariadb.service [root@docker2 ~]# mysql -A Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 20 Server version: 10.4.13-MariaDB-log MariaDB Server Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. node2> show global status like 'wsrep_local_index'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | wsrep_local_index | 2 | +-------------------+-------+ 1 row in set (0.001 sec) Now, if I insert a new row on node1, it is correctly propagated to all nodes, but the row previously inserted is lost: node1>insert into dataloss (value) values (5); Query OK, 1 row affected (0.003 sec) node1>select * from dataloss; +----+-------+ | id | value | +----+-------+ | 2 | 1 | | 5 | 2 | | 8 | 3 | | 11 | 4 | | 16 | 5 | +----+-------+ 5 rows in set (0.000 sec) node2> select * from dataloss; +----+-------+ | id | value | +----+-------+ | 2 | 1 | | 5 | 2 | | 8 | 3 | | 16 | 5 | +----+-------+ 4 rows in set (0.000 sec) node3>select * from dataloss; +----+-------+ | id | value | +----+-------+ | 2 | 1 | | 5 | 2 | | 8 | 3 | | 11 | 4 | | 16 | 5 | +----+-------+ 5 rows in set (0.000 sec) So, please re-test the above scenario to verify that there is actual data loss and it's not only a problem of bad variable display Thanks RIck

Rick Pizzi (Inactive) added a comment - 2020-06-15 12:54

stepan.patryshev Please check the above.

Rick Pizzi (Inactive) added a comment - 2020-06-15 12:54 stepan.patryshev Please check the above.

MikaH added a comment - 2020-06-16 09:53

tested with rolling-update method. Three node cluster where nodes were 10.3.23 (on Centos 7.6). Node2 upgraded:

node1> MariaDB [test]> create table dataloss (id int not null auto_increment primary key, value int);

MariaDB [test]> insert into dataloss (value) values (1), (2), (3);

Query OK, 3 rows affected (0.006 sec)

Records: 3  Duplicates: 0  Warnings: 0

MariaDB [test]> select * from dataloss;

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

+----+-------+

3 rows in set (0.001 sec)

Status on node1:

MariaDB [test]> show global status like 'wsrep%cluster_size%';

+--------------------+-------+

| Variable_name      | Value |

+--------------------+-------+

| wsrep_cluster_size | 3     |

+--------------------+-------+

1 row in set (0.002 sec)

MariaDB [test]> show global status like 'wsrep%size%';

+-----------------------+-------+

| Variable_name         | Value |

+-----------------------+-------+

| wsrep_cert_index_size | 3     |

| wsrep_cluster_size    | 3     |

+-----------------------+-------+

2 rows in set (0.002 sec)

Status on node2 before upgrade:

MariaDB [(none)]> select * from test.dataloss;

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

+----+-------+

4 rows in set (0.001 sec)

Perform node2 upgrade:

# Copy configs to safe place:

mkdir /root/configs/

/bin/cp -p /etc/my.cnf.d/*cnf /root/configs/.

# Stop and remove old rpm's:

systemctl stop mariadb && rpm -qai|grep -e Maria -e galera |grep Name | awk '{print "yum remove " $3 " -y"}'|bash

# Then install new rpm's and Selinux-policyfiles:

yum localinstall rpmsfor10.4.13/*rpm -y && semodule -v -i selinux/*.pp

# Copy configs back:

/bin/cp -p /root/configs/*cnf /etc/my.cnf.d/.

# Add needed link, start MariaDB and run mysql_upgrade:

ln -s /usr/lib64/galera-4 /usr/lib64/galera && systemctl start mariadb && mysql_upgrade -uroot -p --skip-write-binlog

Status after node2 upgrade:

[root@galera2 ~]# mysql -uroot

Welcome to the MariaDB monitor.  Commands end with ; or \g.

Your MariaDB connection id is 852

Server version: 10.4.13-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show global status like 'wsrep%cluster_size%';

+--------------------+-------+

| Variable_name      | Value |

+--------------------+-------+

| wsrep_cluster_size | 3     |

+--------------------+-------+

1 row in set (0.002 sec)

MariaDB [(none)]> show global status like 'wsrep%size%';

+-----------------------+-------+

| Variable_name         | Value |

+-----------------------+-------+

| wsrep_cert_index_size | 3     |

| wsrep_cluster_size    | 3     |

+-----------------------+-------+

2 rows in set (0.002 sec)

MariaDB [(none)]>

Inserting on node1 data:

MariaDB [test]> insert into dataloss (value) values (4);

Query OK, 1 row affected (0.004 sec)

MariaDB [test]> select * from dataloss;

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

| 12 |     4 |

+----+-------+

4 rows in set (0.000 sec)

Status on node2 after data inserted on node1:

MariaDB [(none)]> select * from test.dataloss;

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

| 12 |     4 |

+----+-------+

4 rows in set (0.000 sec)

MariaDB [(none)]>

No data loss with this method

MikaH added a comment - 2020-06-16 09:53 tested with rolling-update method. Three node cluster where nodes were 10.3.23 (on Centos 7.6). Node2 upgraded: node1> MariaDB [test]> create table dataloss (id int not null auto_increment primary key, value int); MariaDB [test]> insert into dataloss (value) values (1), (2), (3); Query OK, 3 rows affected (0.006 sec) Records: 3 Duplicates: 0 Warnings: 0 MariaDB [test]> select * from dataloss; +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | +----+-------+ 3 rows in set (0.001 sec) Status on node1: MariaDB [test]> show global status like 'wsrep%cluster_size%'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.002 sec) MariaDB [test]> show global status like 'wsrep%size%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | wsrep_cert_index_size | 3 | | wsrep_cluster_size | 3 | +-----------------------+-------+ 2 rows in set (0.002 sec) Status on node2 before upgrade: MariaDB [(none)]> select * from test.dataloss; +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | +----+-------+ 4 rows in set (0.001 sec) Perform node2 upgrade: # Copy configs to safe place: mkdir /root/configs/ /bin/cp -p /etc/my.cnf.d/*cnf /root/configs/. # Stop and remove old rpm's: systemctl stop mariadb && rpm -qai|grep -e Maria -e galera |grep Name | awk '{print "yum remove " $3 " -y"}'|bash # Then install new rpm's and Selinux-policyfiles: yum localinstall rpmsfor10.4.13/*rpm -y && semodule -v -i selinux/*.pp # Copy configs back: /bin/cp -p /root/configs/*cnf /etc/my.cnf.d/. # Add needed link, start MariaDB and run mysql_upgrade: ln -s /usr/lib64/galera-4 /usr/lib64/galera && systemctl start mariadb && mysql_upgrade -uroot -p --skip-write-binlog Status after node2 upgrade: [root@galera2 ~]# mysql -uroot Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 852 Server version: 10.4.13-MariaDB-log MariaDB Server Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [(none)]> show global status like 'wsrep%cluster_size%'; +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+ 1 row in set (0.002 sec) MariaDB [(none)]> show global status like 'wsrep%size%'; +-----------------------+-------+ | Variable_name | Value | +-----------------------+-------+ | wsrep_cert_index_size | 3 | | wsrep_cluster_size | 3 | +-----------------------+-------+ 2 rows in set (0.002 sec) MariaDB [(none)]> Inserting on node1 data: MariaDB [test]> insert into dataloss (value) values (4); Query OK, 1 row affected (0.004 sec) MariaDB [test]> select * from dataloss; +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | | 12 | 4 | +----+-------+ 4 rows in set (0.000 sec) Status on node2 after data inserted on node1: MariaDB [(none)]> select * from test.dataloss; +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | | 12 | 4 | +----+-------+ 4 rows in set (0.000 sec) MariaDB [(none)]> No data loss with this method

Rick Pizzi (Inactive) added a comment - 2020-06-16 09:59 - edited

If node2 came up with correct cluster index it could be it has performed an SST.
Please post logs...

Rick Pizzi (Inactive) added a comment - 2020-06-16 09:59 - edited If node2 came up with correct cluster index it could be it has performed an SST. Please post logs...

Massimo made changes - 2020-06-16 10:02

Comment

[ [~mihaQ] you are doing the wrong test. the insert and the data loss are happening when the node2 is down. on your test you write when the node join already the cluster ]

MikaH made changes - 2020-06-16 10:15

Attachment		node1_bootsrapped_10.3.23.log.rtf [ 52193 ]
Attachment		node2_upgraded.log.rtf [ 52194 ]

MikaH made changes - 2020-06-16 10:18

Attachment		node2_upgraded_10.4.13.log [ 52195 ]
Attachment		node1_bootsrapped_10.3.23.log [ 52196 ]

MikaH added a comment - 2020-06-16 10:18

Here are the logs:
node2_upgraded_10.4.13.log node1_bootsrapped_10.3.23.log

MikaH added a comment - 2020-06-16 10:18 Here are the logs: node2_upgraded_10.4.13.log node1_bootsrapped_10.3.23.log

Rick Pizzi (Inactive) added a comment - 2020-06-16 10:28

Your log is mangled. I would suggest you follow exactly my steps and you should get the same results. We did this in multiple labs with same result.

Rick Pizzi (Inactive) added a comment - 2020-06-16 10:28 Your log is mangled. I would suggest you follow exactly my steps and you should get the same results. We did this in multiple labs with same result.

Stepan Patryshev (Inactive) made changes - 2020-06-17 17:16

Link

This issue relates to ~~MDEV-22745~~ [ ~~MDEV-22745~~ ]

Stepan Patryshev (Inactive) added a comment - 2020-06-17 17:21 - edited

rpizzi Thank you for the detailed steps. I have retested it with wsrep version 25.3.28(r3875) you mentioned and these steps, but unfortunately still have not got any data loss or a server crash.

Stepan Patryshev (Inactive) added a comment - 2020-06-17 17:21 - edited rpizzi Thank you for the detailed steps. I have retested it with wsrep version 25.3.28(r3875) you mentioned and these steps, but unfortunately still have not got any data loss or a server crash.

Stepan Patryshev (Inactive) added a comment - 2020-07-09 13:12

rpizziI have passed your steps with standard installed packages on separate VMs but still have not managed to reproduce it. Do not know what is the key difference. Can you please share the steps how exactly do you update the server just in case?

Stepan Patryshev (Inactive) added a comment - 2020-07-09 13:12 rpizzi I have passed your steps with standard installed packages on separate VMs but still have not managed to reproduce it. Do not know what is the key difference. Can you please share the steps how exactly do you update the server just in case?

Rick Pizzi (Inactive) added a comment - 2020-07-09 19:54

The steps are outlined above https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156703&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-156703 and are more than detailed.

Can you please post the output of your session when running the above commands here?

Rick Pizzi (Inactive) added a comment - 2020-07-09 19:54 The steps are outlined above https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156703&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-156703 and are more than detailed. Can you please post the output of your session when running the above commands here?

Stepan Patryshev (Inactive) made changes - 2020-07-10 08:52

Attachment

200709_patgal_output.zip [ 52743 ]

Stepan Patryshev (Inactive) added a comment - 2020-07-10 08:52

rpizzi Here you are my sessions output. There are different sessions for MariaDB client and for the console itself.

Stepan Patryshev (Inactive) added a comment - 2020-07-10 08:52 rpizzi Here you are my sessions output . There are different sessions for MariaDB client and for the console itself.

Rick Pizzi (Inactive) added a comment - 2020-07-13 12:19

stepan.patryshev from these files we can't infer whether the correct sequence of steps has been executed.

Can you provide evidence that reproducing the steps I have outlined above you get different results?
As I already mentioned, two in my team on two separate environments can reproduce it just fine and 100% of the time.

Please, try once again, and provide a single output with all the steps done in sequence, like I did above.

Thanks

Rick

Rick Pizzi (Inactive) added a comment - 2020-07-13 12:19 stepan.patryshev from these files we can't infer whether the correct sequence of steps has been executed. Can you provide evidence that reproducing the steps I have outlined above you get different results? As I already mentioned, two in my team on two separate environments can reproduce it just fine and 100% of the time. Please, try once again, and provide a single output with all the steps done in sequence, like I did above. Thanks Rick

Massimo added a comment - 2020-07-13 12:27

please do not use the test schema as well and add the steps, conf and error log of all the nodes. Looking at the log isnt clear what you have done

Massimo added a comment - 2020-07-13 12:27 please do not use the test schema as well and add the steps, conf and error log of all the nodes. Looking at the log isnt clear what you have done

Stepan Patryshev (Inactive) made changes - 2020-07-13 17:39

Attachment

20200713_MDEV-22723_patgal_no_errors.zip [ 52773 ]

Stepan Patryshev (Inactive) added a comment - 2020-07-13 17:40 - edited

rpizzi I have passed the steps again without any failures. PFA all logs and cnf files.

Steps:

1. Install 3 nodes with MariaDB 10.3.23 on CentOS Linux release 7.8.2003 (Core), wsrep version 25.3.29(r3902).

2. On Node1 create a table and insert data in it.

[root@patgal1 ~]# mysql -pr -e'CREATE DATABASE d;create table d.dataloss (id int not null auto_increment primary key, value int);insert into d.dataloss (value) values (1), (2), (3);'

[root@patgal1 ~]# mysql -pr -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  1 |     1 |

|  4 |     2 |

|  7 |     3 |

+----+-------+

[root@patgal2 ~]# mysql -pr -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  1 |     1 |

|  4 |     2 |

|  7 |     3 |

+----+-------+

[root@patgal3 ~]# mysql -pr -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  1 |     1 |

|  4 |     2 |

|  7 |     3 |

+----+-------+

Situation after above:

Node1:

[root@patgal1 ~]# mysql -pr -e'show global status like "wsrep%";'

+-------------------------------+-------------------------------------------------------+

| Variable_name                 | Value                                                 |

+-------------------------------+-------------------------------------------------------+

| wsrep_applier_thread_count    | 1                                                     |

| wsrep_apply_oooe              | 0.000000                                              |

| wsrep_apply_oool              | 0.000000                                              |

| wsrep_apply_window            | 1.000000                                              |

| wsrep_causal_reads            | 0                                                     |

| wsrep_cert_deps_distance      | 1.000000                                              |

| wsrep_cert_index_size         | 5                                                     |

| wsrep_cert_interval           | 0.000000                                              |

| wsrep_cluster_conf_id         | 3                                                     |

| wsrep_cluster_size            | 3                                                     |

| wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |

| wsrep_cluster_status          | Primary                                               |

| wsrep_cluster_weight          | 3                                                     |

| wsrep_commit_oooe             | 0.000000                                              |

| wsrep_commit_oool             | 0.000000                                              |

| wsrep_commit_window           | 1.000000                                              |

| wsrep_connected               | ON                                                    |

| wsrep_desync_count            | 0                                                     |

| wsrep_evs_delayed             |                                                       |

| wsrep_evs_evict_list          |                                                       |

| wsrep_evs_repl_latency        | 0/0/0/0/0                                             |

| wsrep_evs_state               | OPERATIONAL                                           |

| wsrep_flow_control_paused     | 0.000000                                              |

| wsrep_flow_control_paused_ns  | 0                                                     |

| wsrep_flow_control_recv       | 0                                                     |

| wsrep_flow_control_sent       | 0                                                     |

| wsrep_gcomm_uuid              | f1120258-c51e-11ea-8b48-cb8ed6394b53                  |

| wsrep_incoming_addresses      | 172.20.3.101:3306,172.20.3.102:3306,172.20.3.103:3306 |

| wsrep_last_committed          | 24                                                    |

| wsrep_local_bf_aborts         | 0                                                     |

| wsrep_local_cached_downto     | 22                                                    |

| wsrep_local_cert_failures     | 0                                                     |

| wsrep_local_commits           | 1                                                     |

| wsrep_local_index             | 0                                                     |

| wsrep_local_recv_queue        | 0                                                     |

| wsrep_local_recv_queue_avg    | 0.000000                                              |

| wsrep_local_recv_queue_max    | 1                                                     |

| wsrep_local_recv_queue_min    | 0                                                     |

| wsrep_local_replays           | 0                                                     |

| wsrep_local_send_queue        | 0                                                     |

| wsrep_local_send_queue_avg    | 0.000000                                              |

| wsrep_local_send_queue_max    | 1                                                     |

| wsrep_local_send_queue_min    | 0                                                     |

| wsrep_local_state             | 4                                                     |

| wsrep_local_state_comment     | Synced                                                |

| wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |

| wsrep_open_connections        | 0                                                     |

| wsrep_open_transactions       | 0                                                     |

| wsrep_protocol_version        | 9                                                     |

| wsrep_provider_name           | Galera                                                |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                     |

| wsrep_provider_version        | 25.3.29(r3902)                                        |

| wsrep_ready                   | ON                                                    |

| wsrep_received                | 4                                                     |

| wsrep_received_bytes          | 626                                                   |

| wsrep_repl_data_bytes         | 969                                                   |

| wsrep_repl_keys               | 8                                                     |

| wsrep_repl_keys_bytes         | 136                                                   |

| wsrep_repl_other_bytes        | 0                                                     |

| wsrep_replicated              | 3                                                     |

| wsrep_replicated_bytes        | 1312                                                  |

| wsrep_rollbacker_thread_count | 1                                                     |

| wsrep_thread_count            | 2                                                     |

+-------------------------------+-------------------------------------------------------+

Node2:

[root@patgal2 ~]# mysql -pr -e'show global status like "wsrep%";'

+-------------------------------+-------------------------------------------------------+

| Variable_name                 | Value                                                 |

+-------------------------------+-------------------------------------------------------+

| wsrep_applier_thread_count    | 1                                                     |

| wsrep_apply_oooe              | 0.000000                                              |

| wsrep_apply_oool              | 0.000000                                              |

| wsrep_apply_window            | 1.000000                                              |

| wsrep_causal_reads            | 0                                                     |

| wsrep_cert_deps_distance      | 1.000000                                              |

| wsrep_cert_index_size         | 5                                                     |

| wsrep_cert_interval           | 0.000000                                              |

| wsrep_cluster_conf_id         | 3                                                     |

| wsrep_cluster_size            | 3                                                     |

| wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |

| wsrep_cluster_status          | Primary                                               |

| wsrep_cluster_weight          | 3                                                     |

| wsrep_commit_oooe             | 0.000000                                              |

| wsrep_commit_oool             | 0.000000                                              |

| wsrep_commit_window           | 1.000000                                              |

| wsrep_connected               | ON                                                    |

| wsrep_desync_count            | 0                                                     |

| wsrep_evs_delayed             |                                                       |

| wsrep_evs_evict_list          |                                                       |

| wsrep_evs_repl_latency        | 0/0/0/0/0                                             |

| wsrep_evs_state               | OPERATIONAL                                           |

| wsrep_flow_control_paused     | 0.000000                                              |

| wsrep_flow_control_paused_ns  | 0                                                     |

| wsrep_flow_control_recv       | 0                                                     |

| wsrep_flow_control_sent       | 0                                                     |

| wsrep_gcomm_uuid              | f8c46db5-c51e-11ea-8095-6ffbd7cfa539                  |

| wsrep_incoming_addresses      | 172.20.3.101:3306,172.20.3.102:3306,172.20.3.103:3306 |

| wsrep_last_committed          | 24                                                    |

| wsrep_local_bf_aborts         | 0                                                     |

| wsrep_local_cached_downto     | 22                                                    |

| wsrep_local_cert_failures     | 0                                                     |

| wsrep_local_commits           | 0                                                     |

| wsrep_local_index             | 1                                                     |

| wsrep_local_recv_queue        | 0                                                     |

| wsrep_local_recv_queue_avg    | 0.000000                                              |

| wsrep_local_recv_queue_max    | 1                                                     |

| wsrep_local_recv_queue_min    | 0                                                     |

| wsrep_local_replays           | 0                                                     |

| wsrep_local_send_queue        | 0                                                     |

| wsrep_local_send_queue_avg    | 0.000000                                              |

| wsrep_local_send_queue_max    | 1                                                     |

| wsrep_local_send_queue_min    | 0                                                     |

| wsrep_local_state             | 4                                                     |

| wsrep_local_state_comment     | Synced                                                |

| wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |

| wsrep_open_connections        | 0                                                     |

| wsrep_open_transactions       | 0                                                     |

| wsrep_protocol_version        | 9                                                     |

| wsrep_provider_name           | Galera                                                |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                     |

| wsrep_provider_version        | 25.3.29(r3902)                                        |

| wsrep_ready                   | ON                                                    |

| wsrep_received                | 6                                                     |

| wsrep_received_bytes          | 1803                                                  |

| wsrep_repl_data_bytes         | 0                                                     |

| wsrep_repl_keys               | 0                                                     |

| wsrep_repl_keys_bytes         | 0                                                     |

| wsrep_repl_other_bytes        | 0                                                     |

| wsrep_replicated              | 0                                                     |

| wsrep_replicated_bytes        | 0                                                     |

| wsrep_rollbacker_thread_count | 1                                                     |

| wsrep_thread_count            | 2                                                     |

+-------------------------------+-------------------------------------------------------+

3. On Node2 set wsrep_on=OFF, shut down and upgrade to 10.4.13, wsrep 26.4.4(r4599).

4. Join upgraded Node2 to the cluster:

[root@patgal2 ~]# mysql -pr -e'show global status like "wsrep%";'

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| Variable_name                 | Value                                                                                                                                          |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                                                                                                           |

| wsrep_protocol_version        | 9                                                                                                                                              |

| wsrep_last_committed          | 24                                                                                                                                             |

| wsrep_replicated              | 0                                                                                                                                              |

| wsrep_replicated_bytes        | 0                                                                                                                                              |

| wsrep_repl_keys               | 0                                                                                                                                              |

| wsrep_repl_keys_bytes         | 0                                                                                                                                              |

| wsrep_repl_data_bytes         | 0                                                                                                                                              |

| wsrep_repl_other_bytes        | 0                                                                                                                                              |

| wsrep_received                | 2                                                                                                                                              |

| wsrep_received_bytes          | 280                                                                                                                                            |

| wsrep_local_commits           | 0                                                                                                                                              |

| wsrep_local_cert_failures     | 0                                                                                                                                              |

| wsrep_local_replays           | 0                                                                                                                                              |

| wsrep_local_send_queue        | 0                                                                                                                                              |

| wsrep_local_send_queue_max    | 1                                                                                                                                              |

| wsrep_local_send_queue_min    | 0                                                                                                                                              |

| wsrep_local_send_queue_avg    | 0                                                                                                                                              |

| wsrep_local_recv_queue        | 0                                                                                                                                              |

| wsrep_local_recv_queue_max    | 1                                                                                                                                              |

| wsrep_local_recv_queue_min    | 0                                                                                                                                              |

| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |

| wsrep_local_cached_downto     | -1                                                                                                                                             |

| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |

| wsrep_flow_control_paused     | 0                                                                                                                                              |

| wsrep_flow_control_sent       | 0                                                                                                                                              |

| wsrep_flow_control_recv       | 0                                                                                                                                              |

| wsrep_cert_deps_distance      | 0                                                                                                                                              |

| wsrep_apply_oooe              | 0                                                                                                                                              |

| wsrep_apply_oool              | 0                                                                                                                                              |

| wsrep_apply_window            | 0                                                                                                                                              |

| wsrep_commit_oooe             | 0                                                                                                                                              |

| wsrep_commit_oool             | 0                                                                                                                                              |

| wsrep_commit_window           | 0                                                                                                                                              |

| wsrep_local_state             | 4                                                                                                                                              |

| wsrep_local_state_comment     | Synced                                                                                                                                         |

| wsrep_cert_index_size         | 0                                                                                                                                              |

| wsrep_causal_reads            | 0                                                                                                                                              |

| wsrep_cert_interval           | 0                                                                                                                                              |

| wsrep_open_transactions       | 0                                                                                                                                              |

| wsrep_open_connections        | 0                                                                                                                                              |

| wsrep_incoming_addresses      | AUTO,172.20.3.101:3306,172.20.3.103:3306                                                                                                       |

| wsrep_cluster_weight          | 3                                                                                                                                              |

| wsrep_desync_count            | 0                                                                                                                                              |

| wsrep_evs_delayed             |                                                                                                                                                |

| wsrep_evs_evict_list          |                                                                                                                                                |

| wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |

| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |

| wsrep_gcomm_uuid              | 332a2e12-c525-11ea-be26-4ed9b6694f67                                                                                                           |

| wsrep_applier_thread_count    | 1                                                                                                                                              |

| wsrep_cluster_capabilities    |                                                                                                                                                |

| wsrep_cluster_conf_id         | 10                                                                                                                                             |

| wsrep_cluster_size            | 3                                                                                                                                              |

| wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                                                                                                           |

| wsrep_cluster_status          | Primary                                                                                                                                        |

| wsrep_connected               | ON                                                                                                                                             |

| wsrep_local_bf_aborts         | 0                                                                                                                                              |

| wsrep_local_index             | 0                                                                                                                                              |

| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |

| wsrep_provider_name           | Galera                                                                                                                                         |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |

| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |

| wsrep_ready                   | ON                                                                                                                                             |

| wsrep_rollbacker_thread_count | 1                                                                                                                                              |

| wsrep_thread_count            | 2                                                                                                                                              |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

wsrep_cluster_size and wsrep_local_index on Node2:

wsrep_cluster_size	3
wsrep_local_index	0

5. Recheck the content of table dataloss on 3 nodes:

[root@patgal1 ~]# mysql -pr -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  1 |     1 |

|  4 |     2 |

|  7 |     3 |

+----+-------+

[root@patgal2 ~]# mysql -pr -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  1 |     1 |

|  4 |     2 |

|  7 |     3 |

+----+-------+

[root@patgal3 ~]# mysql -pr -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  1 |     1 |

|  4 |     2 |

|  7 |     3 |

+----+-------+

6. Insert a row on Node1, verify it has been added and replicated to Node2 and Node3:

[root@patgal1 ~]# mysql -pr -e'insert into d.dataloss (value) values (4);'

[root@patgal1 ~]# mysql -pr -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  1 |     1 |

|  4 |     2 |

|  7 |     3 |

| 11 |     4 |

+----+-------+

[root@patgal2 ~]# mysql -pr -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  1 |     1 |

|  4 |     2 |

|  7 |     3 |

| 11 |     4 |

+----+-------+

[root@patgal3 ~]# mysql -pr -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  1 |     1 |

|  4 |     2 |

|  7 |     3 |

| 11 |     4 |

+----+-------+

As you may see there are no any related errors or data loss here.

Stepan Patryshev (Inactive) added a comment - 2020-07-13 17:40 - edited rpizzi I have passed the steps again without any failures. PFA all logs and cnf files . Steps: 1. Install 3 nodes with MariaDB 10.3.23 on CentOS Linux release 7.8.2003 (Core), wsrep version 25.3.29(r3902). 2. On Node1 create a table and insert data in it. [root@patgal1 ~]# mysql -pr -e'CREATE DATABASE d;create table d.dataloss (id int not null auto_increment primary key, value int);insert into d.dataloss (value) values (1), (2), (3);' [root@patgal1 ~]# mysql -pr -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 1 | 1 | | 4 | 2 | | 7 | 3 | +----+-------+ [root@patgal2 ~]# mysql -pr -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 1 | 1 | | 4 | 2 | | 7 | 3 | +----+-------+ [root@patgal3 ~]# mysql -pr -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 1 | 1 | | 4 | 2 | | 7 | 3 | +----+-------+ Situation after above: Node1: [root@patgal1 ~]# mysql -pr -e'show global status like "wsrep%";' +-------------------------------+-------------------------------------------------------+ | Variable_name | Value | +-------------------------------+-------------------------------------------------------+ | wsrep_applier_thread_count | 1 | | wsrep_apply_oooe | 0.000000 | | wsrep_apply_oool | 0.000000 | | wsrep_apply_window | 1.000000 | | wsrep_causal_reads | 0 | | wsrep_cert_deps_distance | 1.000000 | | wsrep_cert_index_size | 5 | | wsrep_cert_interval | 0.000000 | | wsrep_cluster_conf_id | 3 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 499f4d1e-b249-11ea-abeb-764a6a38b248 | | wsrep_cluster_status | Primary | | wsrep_cluster_weight | 3 | | wsrep_commit_oooe | 0.000000 | | wsrep_commit_oool | 0.000000 | | wsrep_commit_window | 1.000000 | | wsrep_connected | ON | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_flow_control_paused | 0.000000 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_recv | 0 | | wsrep_flow_control_sent | 0 | | wsrep_gcomm_uuid | f1120258-c51e-11ea-8b48-cb8ed6394b53 | | wsrep_incoming_addresses | 172.20.3.101:3306,172.20.3.102:3306,172.20.3.103:3306 | | wsrep_last_committed | 24 | | wsrep_local_bf_aborts | 0 | | wsrep_local_cached_downto | 22 | | wsrep_local_cert_failures | 0 | | wsrep_local_commits | 1 | | wsrep_local_index | 0 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_avg | 0.000000 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_avg | 0.000000 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_local_state_uuid | 499f4d1e-b249-11ea-abeb-764a6a38b248 | | wsrep_open_connections | 0 | | wsrep_open_transactions | 0 | | wsrep_protocol_version | 9 | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 25.3.29(r3902) | | wsrep_ready | ON | | wsrep_received | 4 | | wsrep_received_bytes | 626 | | wsrep_repl_data_bytes | 969 | | wsrep_repl_keys | 8 | | wsrep_repl_keys_bytes | 136 | | wsrep_repl_other_bytes | 0 | | wsrep_replicated | 3 | | wsrep_replicated_bytes | 1312 | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 2 | +-------------------------------+-------------------------------------------------------+ Node2: [root@patgal2 ~]# mysql -pr -e'show global status like "wsrep%";' +-------------------------------+-------------------------------------------------------+ | Variable_name | Value | +-------------------------------+-------------------------------------------------------+ | wsrep_applier_thread_count | 1 | | wsrep_apply_oooe | 0.000000 | | wsrep_apply_oool | 0.000000 | | wsrep_apply_window | 1.000000 | | wsrep_causal_reads | 0 | | wsrep_cert_deps_distance | 1.000000 | | wsrep_cert_index_size | 5 | | wsrep_cert_interval | 0.000000 | | wsrep_cluster_conf_id | 3 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 499f4d1e-b249-11ea-abeb-764a6a38b248 | | wsrep_cluster_status | Primary | | wsrep_cluster_weight | 3 | | wsrep_commit_oooe | 0.000000 | | wsrep_commit_oool | 0.000000 | | wsrep_commit_window | 1.000000 | | wsrep_connected | ON | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_flow_control_paused | 0.000000 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_recv | 0 | | wsrep_flow_control_sent | 0 | | wsrep_gcomm_uuid | f8c46db5-c51e-11ea-8095-6ffbd7cfa539 | | wsrep_incoming_addresses | 172.20.3.101:3306,172.20.3.102:3306,172.20.3.103:3306 | | wsrep_last_committed | 24 | | wsrep_local_bf_aborts | 0 | | wsrep_local_cached_downto | 22 | | wsrep_local_cert_failures | 0 | | wsrep_local_commits | 0 | | wsrep_local_index | 1 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_avg | 0.000000 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_avg | 0.000000 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_local_state_uuid | 499f4d1e-b249-11ea-abeb-764a6a38b248 | | wsrep_open_connections | 0 | | wsrep_open_transactions | 0 | | wsrep_protocol_version | 9 | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 25.3.29(r3902) | | wsrep_ready | ON | | wsrep_received | 6 | | wsrep_received_bytes | 1803 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 2 | +-------------------------------+-------------------------------------------------------+ 3. On Node2 set wsrep_on=OFF, shut down and upgrade to 10.4.13, wsrep 26.4.4(r4599). 4. Join upgraded Node2 to the cluster: [root@patgal2 ~]# mysql -pr -e'show global status like "wsrep%";' +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | Variable_name | Value | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | wsrep_local_state_uuid | 499f4d1e-b249-11ea-abeb-764a6a38b248 | | wsrep_protocol_version | 9 | | wsrep_last_committed | 24 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_received | 2 | | wsrep_received_bytes | 280 | | wsrep_local_commits | 0 | | wsrep_local_cert_failures | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_send_queue_avg | 0 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_recv_queue_avg | 0 | | wsrep_local_cached_downto | -1 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_paused | 0 | | wsrep_flow_control_sent | 0 | | wsrep_flow_control_recv | 0 | | wsrep_cert_deps_distance | 0 | | wsrep_apply_oooe | 0 | | wsrep_apply_oool | 0 | | wsrep_apply_window | 0 | | wsrep_commit_oooe | 0 | | wsrep_commit_oool | 0 | | wsrep_commit_window | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_cert_index_size | 0 | | wsrep_causal_reads | 0 | | wsrep_cert_interval | 0 | | wsrep_open_transactions | 0 | | wsrep_open_connections | 0 | | wsrep_incoming_addresses | AUTO,172.20.3.101:3306,172.20.3.103:3306 | | wsrep_cluster_weight | 3 | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_gcomm_uuid | 332a2e12-c525-11ea-be26-4ed9b6694f67 | | wsrep_applier_thread_count | 1 | | wsrep_cluster_capabilities | | | wsrep_cluster_conf_id | 10 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 499f4d1e-b249-11ea-abeb-764a6a38b248 | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_bf_aborts | 0 | | wsrep_local_index | 0 | | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 26.4.4(r4599) | | wsrep_ready | ON | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 2 | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ wsrep_cluster_size and wsrep_local_index on Node2: wsrep_cluster_size 3 wsrep_local_index 0 5. Recheck the content of table dataloss on 3 nodes: [root@patgal1 ~]# mysql -pr -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 1 | 1 | | 4 | 2 | | 7 | 3 | +----+-------+ [root@patgal2 ~]# mysql -pr -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 1 | 1 | | 4 | 2 | | 7 | 3 | +----+-------+ [root@patgal3 ~]# mysql -pr -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 1 | 1 | | 4 | 2 | | 7 | 3 | +----+-------+ 6. Insert a row on Node1, verify it has been added and replicated to Node2 and Node3: [root@patgal1 ~]# mysql -pr -e'insert into d.dataloss (value) values (4);' [root@patgal1 ~]# mysql -pr -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 1 | 1 | | 4 | 2 | | 7 | 3 | | 11 | 4 | +----+-------+ [root@patgal2 ~]# mysql -pr -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 1 | 1 | | 4 | 2 | | 7 | 3 | | 11 | 4 | +----+-------+ [root@patgal3 ~]# mysql -pr -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 1 | 1 | | 4 | 2 | | 7 | 3 | | 11 | 4 | +----+-------+ As you may see there are no any related errors or data loss here.

Rick Pizzi (Inactive) added a comment - 2020-07-14 06:12

You aren't reproducing the issue.

Can you please explicit step 3 in details?
When you say:

 3. On Node2 set wsrep_on=OFF, shut down and upgrade to 10.4.13, wsrep 26.4.4(r4599).

We would like to see the exact steps used to do this as this is where you probably are doing things
differently. Please paste relevant part of history file.

Thanks
Rick

Rick Pizzi (Inactive) added a comment - 2020-07-14 06:12 You aren't reproducing the issue. Can you please explicit step 3 in details? When you say: 3. On Node2 set wsrep_on=OFF, shut down and upgrade to 10.4.13, wsrep 26.4.4(r4599). We would like to see the exact steps used to do this as this is where you probably are doing things differently. Please paste relevant part of history file. Thanks Rick

Rick Pizzi (Inactive) added a comment - 2020-07-14 06:28

OK, by looking at the output of the patgal2 session (both yesterday and the other day) we see this:

root@patgal2 ~]#   systemctl stop mariadb

[root@patgal2 ~]# systemctl start mariadb

[root@patgal2 ~]#

[root@patgal2 ~]# systemctl stop mariadb

Basically after upgrading node 2 to 10.4 you start the server with wsrep ON, run mysql_upgrade then shut down and set wsrep to OFF and start again. This is not what we have specified in the ticket.

Please repeat EXACT steps we have posted. In other words: after upgrading packages you need to start with WSREP OFF not ON.

Thanks
Rick

Rick Pizzi (Inactive) added a comment - 2020-07-14 06:28 OK, by looking at the output of the patgal2 session (both yesterday and the other day) we see this: root@patgal2 ~]# systemctl stop mariadb [root@patgal2 ~]# systemctl start mariadb [root@patgal2 ~]# [root@patgal2 ~]# systemctl stop mariadb Basically after upgrading node 2 to 10.4 you start the server with wsrep ON, run mysql_upgrade then shut down and set wsrep to OFF and start again. This is not what we have specified in the ticket. Please repeat EXACT steps we have posted. In other words: after upgrading packages you need to start with WSREP OFF not ON. Thanks Rick

Rick Pizzi (Inactive) added a comment - 2020-07-14 07:40

Re-reading the entire ticket I see that there was some confusion about this WSREP_ON = OFF thing, as Massimo (original bug submitter) said to start with off, run upgrade, stop and start with on, while in my test I don't play with that at all.

The bottom line of all this is: the FIRST time you start MariaDB on node2 with WSREP enabled, you get that weird cluster index and cluster_size=0 and it is in that moment that any data inserted in other nodes does not reach node2.

If you start node2 twice with WSREP enabled the problem does not appear because the 2nd restart (which you always seem to do, see above) "clears" the weird situation.

So, once again, to properly test this DO NOT touch the WSREP_ON variable, leave it on, but after upgrading packages start node2 only once, not twice. You will see the weird cluster index and size values - in that situation you will see that any row inserted on other nodes is lost (does not reach node2)

Rick Pizzi (Inactive) added a comment - 2020-07-14 07:40 Re-reading the entire ticket I see that there was some confusion about this WSREP_ON = OFF thing, as Massimo (original bug submitter) said to start with off, run upgrade, stop and start with on, while in my test I don't play with that at all. The bottom line of all this is: the FIRST time you start MariaDB on node2 with WSREP enabled, you get that weird cluster index and cluster_size=0 and it is in that moment that any data inserted in other nodes does not reach node2 . If you start node2 twice with WSREP enabled the problem does not appear because the 2nd restart (which you always seem to do, see above) "clears" the weird situation. So, once again, to properly test this DO NOT touch the WSREP_ON variable, leave it on, but after upgrading packages start node2 only once, not twice . You will see the weird cluster index and size values - in that situation you will see that any row inserted on other nodes is lost (does not reach node2)

Stepan Patryshev (Inactive) added a comment - 2020-07-14 07:51 - edited

@rpizzi You are wrong here. As you may see in "20200713_patgal2_output.log" on the line 165 there is "wsrep_on=OFF" before running upgraded server. The only diference is that I did it even before upgrade.
And in "20200713_patgal2.err" the first run of 10.4.13 is on the line 494: "2020-07-13 19:13:38 0 [Note] InnoDB: 10.4.13 started", and the 1-st attemt to load WSREP provider on 10.4.13 logged later on the line 515 "2020-07-13 19:19:39 0 [Note] WSREP: Loading provider".
And here you are the history fragment:

  262  systemctl start mariadb

  263  mysql -pr -e'select * from d.dataloss;'

  264  mysql -pr -e'show global status like "wsrep%";'

  265  systemctl stop mariadb

  266  vi /etc/my.cnf.d/server2.cnf

  267  cat /etc/yum.repos.d/mariadb.repo

  268  curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=mariadb-10.4

  269  cat /etc/yum.repos.d/mariadb.repo

  270  yum list installed | grep galera

  271  yum list installed | grep MariaDB

  272  sudo yum remove MariaDB-server galera MariaDB-backup MariaDB-client MariaDB-common

  273  yum list installed | grep galera

  274  yum list installed | grep MariaDB

  275  yum install MariaDB-server galera MariaDB-backup MariaDB-client MariaDB-common

  276  yum list installed | grep MariaDB

  277  yum list installed | grep galera

  278  systemctl start mariadb

  279  mysql_upgrade -s

  280  mysql_upgrade -s -pr

  281  systemctl stop mariadb

  282  vi /etc/my.cnf.d/server.cnf

  283  vi /etc/my.cnf.d/server2.cnf

  284  systemctl start mariadb

  285  vi /etc/my.cnf.d/server2.cnf

  286  systemctl start mariadb

  287  mysql -pr -e'show global status like "wsrep%";'

  288  mysql -pr -e'select * from d.dataloss;'

Anyway I will try to do it more closer to your steps.

Stepan Patryshev (Inactive) added a comment - 2020-07-14 07:51 - edited @ rpizzi You are wrong here. As you may see in "20200713_patgal2_output.log" on the line 165 there is "wsrep_on=OFF" before running upgraded server. The only diference is that I did it even before upgrade. And in "20200713_patgal2.err" the first run of 10.4.13 is on the line 494: "2020-07-13 19:13:38 0 [Note] InnoDB: 10.4.13 started", and the 1-st attemt to load WSREP provider on 10.4.13 logged later on the line 515 "2020-07-13 19:19:39 0 [Note] WSREP: Loading provider". And here you are the history fragment: 262 systemctl start mariadb 263 mysql -pr -e'select * from d.dataloss;' 264 mysql -pr -e'show global status like "wsrep%";' 265 systemctl stop mariadb 266 vi /etc/my.cnf.d/server2.cnf 267 cat /etc/yum.repos.d/mariadb.repo 268 curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=mariadb-10.4 269 cat /etc/yum.repos.d/mariadb.repo 270 yum list installed | grep galera 271 yum list installed | grep MariaDB 272 sudo yum remove MariaDB-server galera MariaDB-backup MariaDB-client MariaDB-common 273 yum list installed | grep galera 274 yum list installed | grep MariaDB 275 yum install MariaDB-server galera MariaDB-backup MariaDB-client MariaDB-common 276 yum list installed | grep MariaDB 277 yum list installed | grep galera 278 systemctl start mariadb 279 mysql_upgrade -s 280 mysql_upgrade -s -pr 281 systemctl stop mariadb 282 vi /etc/my.cnf.d/server.cnf 283 vi /etc/my.cnf.d/server2.cnf 284 systemctl start mariadb 285 vi /etc/my.cnf.d/server2.cnf 286 systemctl start mariadb 287 mysql -pr -e'show global status like "wsrep%";' 288 mysql -pr -e'select * from d.dataloss;' Anyway I will try to do it more closer to your steps.

Rick Pizzi (Inactive) added a comment - 2020-07-14 08:17

To verify the bug DO NOT start node2 more than once after upgrading. That's it.

Rick Pizzi (Inactive) added a comment - 2020-07-14 08:17 To verify the bug DO NOT start node2 more than once after upgrading. That's it.

Stepan Patryshev (Inactive) made changes - 2020-07-14 10:27

Attachment

20200714_MDEV-22723_patgal_no_errors.zip [ 52793 ]

Stepan Patryshev (Inactive) made changes - 2020-07-14 10:36

Attachment

20200714_MDEV-22723_patgal_no_errors.zip [ 52794 ]

Stepan Patryshev (Inactive) made changes - 2020-07-14 10:37

Attachment

20200714_~~MDEV-22723~~_patgal_no_errors.zip [ 52793 ]

Stepan Patryshev (Inactive) made changes - 2020-07-14 10:52

Attachment

20200714_~~MDEV-22723~~_patgal_no_errors.zip [ 52794 ]

Stepan Patryshev (Inactive) made changes - 2020-07-14 10:52

Attachment

20200714_MDEV-22723_patgal_no_errors.zip [ 52796 ]

Stepan Patryshev (Inactive) added a comment - 2020-07-14 10:58 - edited

@rpizzi It has not helped. I have not changed WSREP_ON at all and run the upgraded server only once. And it has passed again without any failures or data loss. Please, share exact steps how do you install and update packages. PFA all logs and cnf files.

Steps:

1. Install 3 nodes with MariaDB 10.3.23 on CentOS Linux release 7.8.2003 (Core), wsrep version 25.3.29(r3902).

2. On Node1 create a table and insert data in it.

[root@patgal1 ~]# mysql -e'create database d;'

[root@patgal1 ~]# mysql -e'create table d.dataloss (id int not null auto_increment primary key, value int)

;'

[root@patgal1 ~]# mysql -e'insert into d.dataloss (value) values (1), (2), (3);'

[root@patgal1 ~]# mysql -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

+----+-------+

2.1. Check that data are propagated successfully to other nodes:

[root@patgal2 ~]# mysql -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

+----+-------+

[root@patgal3 ~]# mysql -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

+----+-------+

2.2. Situation after above:

Node1:

[root@patgal1 ~]# mysql -e'show global status like "wsrep%";'

+-------------------------------+-------------------------------------------------------+

| Variable_name                 | Value                                                 |

+-------------------------------+-------------------------------------------------------+

| wsrep_applier_thread_count    | 1                                                     |

| wsrep_apply_oooe              | 0.000000                                              |

| wsrep_apply_oool              | 0.000000                                              |

| wsrep_apply_window            | 1.000000                                              |

| wsrep_causal_reads            | 0                                                     |

| wsrep_cert_deps_distance      | 1.000000                                              |

| wsrep_cert_index_size         | 5                                                     |

| wsrep_cert_interval           | 0.000000                                              |

| wsrep_cluster_conf_id         | 3                                                     |

| wsrep_cluster_size            | 3                                                     |

| wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |

| wsrep_cluster_status          | Primary                                               |

| wsrep_cluster_weight          | 3                                                     |

| wsrep_commit_oooe             | 0.000000                                              |

| wsrep_commit_oool             | 0.000000                                              |

| wsrep_commit_window           | 1.000000                                              |

| wsrep_connected               | ON                                                    |

| wsrep_desync_count            | 0                                                     |

| wsrep_evs_delayed             |                                                       |

| wsrep_evs_evict_list          |                                                       |

| wsrep_evs_repl_latency        | 0/0/0/0/0                                             |

| wsrep_evs_state               | OPERATIONAL                                           |

| wsrep_flow_control_paused     | 0.000000                                              |

| wsrep_flow_control_paused_ns  | 0                                                     |

| wsrep_flow_control_recv       | 0                                                     |

| wsrep_flow_control_sent       | 0                                                     |

| wsrep_gcomm_uuid              | fed13746-c5b4-11ea-a5fe-a6a8e8ca175a                  |

| wsrep_incoming_addresses      | 172.20.3.102:3306,172.20.3.103:3306,172.20.3.101:3306 |

| wsrep_last_committed          | 6                                                     |

| wsrep_local_bf_aborts         | 0                                                     |

| wsrep_local_cached_downto     | 4                                                     |

| wsrep_local_cert_failures     | 0                                                     |

| wsrep_local_commits           | 1                                                     |

| wsrep_local_index             | 2                                                     |

| wsrep_local_recv_queue        | 0                                                     |

| wsrep_local_recv_queue_avg    | 0.000000                                              |

| wsrep_local_recv_queue_max    | 1                                                     |

| wsrep_local_recv_queue_min    | 0                                                     |

| wsrep_local_replays           | 0                                                     |

| wsrep_local_send_queue        | 0                                                     |

| wsrep_local_send_queue_avg    | 0.000000                                              |

| wsrep_local_send_queue_max    | 1                                                     |

| wsrep_local_send_queue_min    | 0                                                     |

| wsrep_local_state             | 4                                                     |

| wsrep_local_state_comment     | Synced                                                |

| wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |

| wsrep_open_connections        | 0                                                     |

| wsrep_open_transactions       | 0                                                     |

| wsrep_protocol_version        | 9                                                     |

| wsrep_provider_name           | Galera                                                |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                     |

| wsrep_provider_version        | 25.3.29(r3902)                                        |

| wsrep_ready                   | ON                                                    |

| wsrep_received                | 10                                                    |

| wsrep_received_bytes          | 782                                                   |

| wsrep_repl_data_bytes         | 969                                                   |

| wsrep_repl_keys               | 8                                                     |

| wsrep_repl_keys_bytes         | 136                                                   |

| wsrep_repl_other_bytes        | 0                                                     |

| wsrep_replicated              | 3                                                     |

| wsrep_replicated_bytes        | 1312                                                  |

| wsrep_rollbacker_thread_count | 1                                                     |

| wsrep_thread_count            | 2                                                     |

+-------------------------------+-------------------------------------------------------+

Node2:

[root@patgal2 ~]# mysql -e'show global status like "wsrep%";'

+-------------------------------+-------------------------------------------------------+

| Variable_name                 | Value                                                 |

+-------------------------------+-------------------------------------------------------+

| wsrep_applier_thread_count    | 1                                                     |

| wsrep_apply_oooe              | 0.000000                                              |

| wsrep_apply_oool              | 0.000000                                              |

| wsrep_apply_window            | 1.000000                                              |

| wsrep_causal_reads            | 0                                                     |

| wsrep_cert_deps_distance      | 1.000000                                              |

| wsrep_cert_index_size         | 5                                                     |

| wsrep_cert_interval           | 0.000000                                              |

| wsrep_cluster_conf_id         | 3                                                     |

| wsrep_cluster_size            | 3                                                     |

| wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |

| wsrep_cluster_status          | Primary                                               |

| wsrep_cluster_weight          | 3                                                     |

| wsrep_commit_oooe             | 0.000000                                              |

| wsrep_commit_oool             | 0.000000                                              |

| wsrep_commit_window           | 1.000000                                              |

| wsrep_connected               | ON                                                    |

| wsrep_desync_count            | 0                                                     |

| wsrep_evs_delayed             |                                                       |

| wsrep_evs_evict_list          |                                                       |

| wsrep_evs_repl_latency        | 0/0/0/0/0                                             |

| wsrep_evs_state               | OPERATIONAL                                           |

| wsrep_flow_control_paused     | 0.000000                                              |

| wsrep_flow_control_paused_ns  | 0                                                     |

| wsrep_flow_control_recv       | 0                                                     |

| wsrep_flow_control_sent       | 0                                                     |

| wsrep_gcomm_uuid              | 11a7b1fd-c5b5-11ea-9a59-5e4e35dabad1                  |

| wsrep_incoming_addresses      | 172.20.3.102:3306,172.20.3.103:3306,172.20.3.101:3306 |

| wsrep_last_committed          | 6                                                     |

| wsrep_local_bf_aborts         | 0                                                     |

| wsrep_local_cached_downto     | 4                                                     |

| wsrep_local_cert_failures     | 0                                                     |

| wsrep_local_commits           | 0                                                     |

| wsrep_local_index             | 0                                                     |

| wsrep_local_recv_queue        | 0                                                     |

| wsrep_local_recv_queue_avg    | 0.142857                                              |

| wsrep_local_recv_queue_max    | 2                                                     |

| wsrep_local_recv_queue_min    | 0                                                     |

| wsrep_local_replays           | 0                                                     |

| wsrep_local_send_queue        | 0                                                     |

| wsrep_local_send_queue_avg    | 0.000000                                              |

| wsrep_local_send_queue_max    | 1                                                     |

| wsrep_local_send_queue_min    | 0                                                     |

| wsrep_local_state             | 4                                                     |

| wsrep_local_state_comment     | Synced                                                |

| wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |

| wsrep_open_connections        | 0                                                     |

| wsrep_open_transactions       | 0                                                     |

| wsrep_protocol_version        | 9                                                     |

| wsrep_provider_name           | Galera                                                |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                     |

| wsrep_provider_version        | 25.3.29(r3902)                                        |

| wsrep_ready                   | ON                                                    |

| wsrep_received                | 7                                                     |

| wsrep_received_bytes          | 1811                                                  |

| wsrep_repl_data_bytes         | 0                                                     |

| wsrep_repl_keys               | 0                                                     |

| wsrep_repl_keys_bytes         | 0                                                     |

| wsrep_repl_other_bytes        | 0                                                     |

| wsrep_replicated              | 0                                                     |

| wsrep_replicated_bytes        | 0                                                     |

| wsrep_rollbacker_thread_count | 1                                                     |

| wsrep_thread_count            | 2                                                     |

+-------------------------------+-------------------------------------------------------+

3. On Node2 shut down and upgrade to 10.4.13, wsrep 26.4.4(r4599).

3.1. systemctl stop mariadb
3.2. https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s – --mariadb-server-version=mariadb-10.4
3.3. yum remove MariaDB galera
3.4. yum install MariaDB galera
3.5. rm /etc/my.cnf.d/server.cnf
3.6. Update "wsrep_provider" value to "/usr/lib64/galera-4/libgalera_smm.so" in "/etc/my.cnf.d/server2.cnf".
3.7. systemctl start mariadb

3.8. mysql_upgrade -s

The --upgrade-system-tables option was used, user tables won't be touched.

Phase 1/7: Checking and upgrading mysql database

Processing databases

mysql

mysql.column_stats                                 OK

mysql.columns_priv                                 OK

mysql.db                                           OK

mysql.event                                        OK

mysql.func                                         OK

mysql.gtid_slave_pos                               OK

mysql.help_category                                OK

mysql.help_keyword                                 OK

mysql.help_relation                                OK

mysql.help_topic                                   OK

mysql.host                                         OK

mysql.index_stats                                  OK

mysql.innodb_index_stats                           OK

mysql.innodb_table_stats                           OK

mysql.plugin                                       OK

mysql.proc                                         OK

mysql.procs_priv                                   OK

mysql.proxies_priv                                 OK

mysql.roles_mapping                                OK

mysql.servers                                      OK

mysql.table_stats                                  OK

mysql.tables_priv                                  OK

mysql.time_zone                                    OK

mysql.time_zone_leap_second                        OK

mysql.time_zone_name                               OK

mysql.time_zone_transition                         OK

mysql.time_zone_transition_type                    OK

mysql.transaction_registry                         OK

mysql.user                                         OK

mysql.wsrep_cluster                                OK

mysql.wsrep_cluster_members                        OK

mysql.wsrep_streaming_log                          OK

Phase 2/7: Installing used storage engines... Skipped

Phase 3/7: Fixing views... Skipped

Phase 4/7: Running 'mysql_fix_privilege_tables'

Phase 5/7: Fixing table and database names ... Skipped

Phase 6/7: Checking and upgrading tables... Skipped

Phase 7/7: Running 'FLUSH PRIVILEGES'

OK

[root@patgal2 ~]# mysql -e'show global status like "wsrep%";'

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| Variable_name                 | Value                                                                                                                                          |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                                                                                                           |

| wsrep_protocol_version        | 9                                                                                                                                              |

| wsrep_last_committed          | 6                                                                                                                                              |

| wsrep_replicated              | 0                                                                                                                                              |

| wsrep_replicated_bytes        | 0                                                                                                                                              |

| wsrep_repl_keys               | 0                                                                                                                                              |

| wsrep_repl_keys_bytes         | 0                                                                                                                                              |

| wsrep_repl_data_bytes         | 0                                                                                                                                              |

| wsrep_repl_other_bytes        | 0                                                                                                                                              |

| wsrep_received                | 2                                                                                                                                              |

| wsrep_received_bytes          | 280                                                                                                                                            |

| wsrep_local_commits           | 0                                                                                                                                              |

| wsrep_local_cert_failures     | 0                                                                                                                                              |

| wsrep_local_replays           | 0                                                                                                                                              |

| wsrep_local_send_queue        | 0                                                                                                                                              |

| wsrep_local_send_queue_max    | 1                                                                                                                                              |

| wsrep_local_send_queue_min    | 0                                                                                                                                              |

| wsrep_local_send_queue_avg    | 0                                                                                                                                              |

| wsrep_local_recv_queue        | 0                                                                                                                                              |

| wsrep_local_recv_queue_max    | 1                                                                                                                                              |

| wsrep_local_recv_queue_min    | 0                                                                                                                                              |

| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |

| wsrep_local_cached_downto     | -1                                                                                                                                             |

| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |

| wsrep_flow_control_paused     | 0                                                                                                                                              |

| wsrep_flow_control_sent       | 0                                                                                                                                              |

| wsrep_flow_control_recv       | 0                                                                                                                                              |

| wsrep_cert_deps_distance      | 0                                                                                                                                              |

| wsrep_apply_oooe              | 0                                                                                                                                              |

| wsrep_apply_oool              | 0                                                                                                                                              |

| wsrep_apply_window            | 0                                                                                                                                              |

| wsrep_commit_oooe             | 0                                                                                                                                              |

| wsrep_commit_oool             | 0                                                                                                                                              |

| wsrep_commit_window           | 0                                                                                                                                              |

| wsrep_local_state             | 4                                                                                                                                              |

| wsrep_local_state_comment     | Synced                                                                                                                                         |

| wsrep_cert_index_size         | 0                                                                                                                                              |

| wsrep_causal_reads            | 0                                                                                                                                              |

| wsrep_cert_interval           | 0                                                                                                                                              |

| wsrep_open_transactions       | 0                                                                                                                                              |

| wsrep_open_connections        | 0                                                                                                                                              |

| wsrep_incoming_addresses      | 172.20.3.103:3306,AUTO,172.20.3.101:3306                                                                                                       |

| wsrep_cluster_weight          | 3                                                                                                                                              |

| wsrep_desync_count            | 0                                                                                                                                              |

| wsrep_evs_delayed             |                                                                                                                                                |

| wsrep_evs_evict_list          |                                                                                                                                                |

| wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |

| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |

| wsrep_gcomm_uuid              | 4a75dc41-c5ba-11ea-a6f4-4b9ef7fb8a13                                                                                                           |

| wsrep_applier_thread_count    | 1                                                                                                                                              |

| wsrep_cluster_capabilities    |                                                                                                                                                |

| wsrep_cluster_conf_id         | 6                                                                                                                                              |

| wsrep_cluster_size            | 3                                                                                                                                              |

| wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                                                                                                           |

| wsrep_cluster_status          | Primary                                                                                                                                        |

| wsrep_connected               | ON                                                                                                                                             |

| wsrep_local_bf_aborts         | 0                                                                                                                                              |

| wsrep_local_index             | 1                                                                                                                                              |

| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |

| wsrep_provider_name           | Galera                                                                                                                                         |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |

| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |

| wsrep_ready                   | ON                                                                                                                                             |

| wsrep_rollbacker_thread_count | 1                                                                                                                                              |

| wsrep_thread_count            | 2                                                                                                                                              |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

wsrep_cluster_size and wsrep_local_index on Node2:

wsrep_cluster_size	3
wsrep_local_index	1

5. Recheck the content of table dataloss on 3 nodes:

root@patgal1 ~]# mysql -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

+----+-------+

[root@patgal2 ~]# mysql -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

+----+-------+

[root@patgal3 ~]# mysql -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

+----+-------+

6. Insert a row on Node1, verify it has been added and replicated to Node2 and Node3:

[root@patgal1 ~]# mysql -e'insert into d.dataloss (value) values (4);'

[root@patgal1 ~]# mysql -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

| 12 |     4 |

+----+-------+

[root@patgal2 ~]# mysql -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

| 12 |     4 |

+----+-------+

[root@patgal3 ~]# mysql -e'select * from d.dataloss;'

+----+-------+

| id | value |

+----+-------+

|  3 |     1 |

|  6 |     2 |

|  9 |     3 |

| 12 |     4 |

+----+-------+

And here you are the history fragment for the Node2:

  211  date

  212  ps -ef | grep mysqld

  213  systemctl start mariadb

  214  mysql -e'select * from d.dataloss;'

  215  mysql -e'show global status like "wsrep%";'

  216  systemctl stop mariadb

  217  cat /etc/yum.repos.d/mariadb.repo

  218  curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=mariadb-10.4

  219  cat /etc/yum.repos.d/mariadb.repo

  220  yum list installed | grep galera

  221  yum list installed | grep MariaDB

  222  yum remove MariaDB galera

  223  yum list installed | grep galera

  224  yum list installed | grep MariaDB

  225  yum install MariaDB galera

  226  yum list installed | grep MariaDB

  227  yum list installed | grep galera

  228  rm /etc/my.cnf.d/server.cnf

  229  vi /etc/my.cnf.d/server2.cnf

  230  cat /etc/my.cnf.d/server2.cnf

  231  ls -al /usr/lib64/galera-4/libgalera_smm.so

  232  systemctl start mariadb

  233  mysql_upgrade -s

  234  mysql -e'show global status like "wsrep%";'

  235  mysql -e'select * from d.dataloss;'

Stepan Patryshev (Inactive) added a comment - 2020-07-14 10:58 - edited @ rpizzi It has not helped. I have not changed WSREP_ON at all and run the upgraded server only once. And it has passed again without any failures or data loss. Please, share exact steps how do you install and update packages. PFA all logs and cnf files . Steps: 1. Install 3 nodes with MariaDB 10.3.23 on CentOS Linux release 7.8.2003 (Core), wsrep version 25.3.29(r3902). 2. On Node1 create a table and insert data in it. [root@patgal1 ~]# mysql -e'create database d;' [root@patgal1 ~]# mysql -e'create table d.dataloss (id int not null auto_increment primary key, value int) ;' [root@patgal1 ~]# mysql -e'insert into d.dataloss (value) values (1), (2), (3);' [root@patgal1 ~]# mysql -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | +----+-------+ 2.1. Check that data are propagated successfully to other nodes: [root@patgal2 ~]# mysql -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | +----+-------+ [root@patgal3 ~]# mysql -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | +----+-------+ 2.2. Situation after above: Node1: [root@patgal1 ~]# mysql -e'show global status like "wsrep%";' +-------------------------------+-------------------------------------------------------+ | Variable_name | Value | +-------------------------------+-------------------------------------------------------+ | wsrep_applier_thread_count | 1 | | wsrep_apply_oooe | 0.000000 | | wsrep_apply_oool | 0.000000 | | wsrep_apply_window | 1.000000 | | wsrep_causal_reads | 0 | | wsrep_cert_deps_distance | 1.000000 | | wsrep_cert_index_size | 5 | | wsrep_cert_interval | 0.000000 | | wsrep_cluster_conf_id | 3 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 499f4d1e-b249-11ea-abeb-764a6a38b248 | | wsrep_cluster_status | Primary | | wsrep_cluster_weight | 3 | | wsrep_commit_oooe | 0.000000 | | wsrep_commit_oool | 0.000000 | | wsrep_commit_window | 1.000000 | | wsrep_connected | ON | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_flow_control_paused | 0.000000 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_recv | 0 | | wsrep_flow_control_sent | 0 | | wsrep_gcomm_uuid | fed13746-c5b4-11ea-a5fe-a6a8e8ca175a | | wsrep_incoming_addresses | 172.20.3.102:3306,172.20.3.103:3306,172.20.3.101:3306 | | wsrep_last_committed | 6 | | wsrep_local_bf_aborts | 0 | | wsrep_local_cached_downto | 4 | | wsrep_local_cert_failures | 0 | | wsrep_local_commits | 1 | | wsrep_local_index | 2 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_avg | 0.000000 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_avg | 0.000000 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_local_state_uuid | 499f4d1e-b249-11ea-abeb-764a6a38b248 | | wsrep_open_connections | 0 | | wsrep_open_transactions | 0 | | wsrep_protocol_version | 9 | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 25.3.29(r3902) | | wsrep_ready | ON | | wsrep_received | 10 | | wsrep_received_bytes | 782 | | wsrep_repl_data_bytes | 969 | | wsrep_repl_keys | 8 | | wsrep_repl_keys_bytes | 136 | | wsrep_repl_other_bytes | 0 | | wsrep_replicated | 3 | | wsrep_replicated_bytes | 1312 | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 2 | +-------------------------------+-------------------------------------------------------+ Node2: [root@patgal2 ~]# mysql -e'show global status like "wsrep%";' +-------------------------------+-------------------------------------------------------+ | Variable_name | Value | +-------------------------------+-------------------------------------------------------+ | wsrep_applier_thread_count | 1 | | wsrep_apply_oooe | 0.000000 | | wsrep_apply_oool | 0.000000 | | wsrep_apply_window | 1.000000 | | wsrep_causal_reads | 0 | | wsrep_cert_deps_distance | 1.000000 | | wsrep_cert_index_size | 5 | | wsrep_cert_interval | 0.000000 | | wsrep_cluster_conf_id | 3 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 499f4d1e-b249-11ea-abeb-764a6a38b248 | | wsrep_cluster_status | Primary | | wsrep_cluster_weight | 3 | | wsrep_commit_oooe | 0.000000 | | wsrep_commit_oool | 0.000000 | | wsrep_commit_window | 1.000000 | | wsrep_connected | ON | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_flow_control_paused | 0.000000 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_recv | 0 | | wsrep_flow_control_sent | 0 | | wsrep_gcomm_uuid | 11a7b1fd-c5b5-11ea-9a59-5e4e35dabad1 | | wsrep_incoming_addresses | 172.20.3.102:3306,172.20.3.103:3306,172.20.3.101:3306 | | wsrep_last_committed | 6 | | wsrep_local_bf_aborts | 0 | | wsrep_local_cached_downto | 4 | | wsrep_local_cert_failures | 0 | | wsrep_local_commits | 0 | | wsrep_local_index | 0 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_avg | 0.142857 | | wsrep_local_recv_queue_max | 2 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_avg | 0.000000 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_local_state_uuid | 499f4d1e-b249-11ea-abeb-764a6a38b248 | | wsrep_open_connections | 0 | | wsrep_open_transactions | 0 | | wsrep_protocol_version | 9 | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 25.3.29(r3902) | | wsrep_ready | ON | | wsrep_received | 7 | | wsrep_received_bytes | 1811 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 2 | +-------------------------------+-------------------------------------------------------+ 3. On Node2 shut down and upgrade to 10.4.13, wsrep 26.4.4(r4599). 3.1. systemctl stop mariadb 3.2. https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s – --mariadb-server-version=mariadb-10.4 3.3. yum remove MariaDB galera 3.4. yum install MariaDB galera 3.5. rm /etc/my.cnf.d/server.cnf 3.6. Update "wsrep_provider" value to "/usr/lib64/galera-4/libgalera_smm.so" in "/etc/my.cnf.d/server2.cnf". 3.7. systemctl start mariadb 3.8. mysql_upgrade -s The --upgrade-system-tables option was used, user tables won't be touched. Phase 1/7: Checking and upgrading mysql database Processing databases mysql mysql.column_stats OK mysql.columns_priv OK mysql.db OK mysql.event OK mysql.func OK mysql.gtid_slave_pos OK mysql.help_category OK mysql.help_keyword OK mysql.help_relation OK mysql.help_topic OK mysql.host OK mysql.index_stats OK mysql.innodb_index_stats OK mysql.innodb_table_stats OK mysql.plugin OK mysql.proc OK mysql.procs_priv OK mysql.proxies_priv OK mysql.roles_mapping OK mysql.servers OK mysql.table_stats OK mysql.tables_priv OK mysql.time_zone OK mysql.time_zone_leap_second OK mysql.time_zone_name OK mysql.time_zone_transition OK mysql.time_zone_transition_type OK mysql.transaction_registry OK mysql.user OK mysql.wsrep_cluster OK mysql.wsrep_cluster_members OK mysql.wsrep_streaming_log OK Phase 2/7: Installing used storage engines... Skipped Phase 3/7: Fixing views... Skipped Phase 4/7: Running 'mysql_fix_privilege_tables' Phase 5/7: Fixing table and database names ... Skipped Phase 6/7: Checking and upgrading tables... Skipped Phase 7/7: Running 'FLUSH PRIVILEGES' OK 4. [root@patgal2 ~]# mysql -e'show global status like "wsrep%";' +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | Variable_name | Value | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | wsrep_local_state_uuid | 499f4d1e-b249-11ea-abeb-764a6a38b248 | | wsrep_protocol_version | 9 | | wsrep_last_committed | 6 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_received | 2 | | wsrep_received_bytes | 280 | | wsrep_local_commits | 0 | | wsrep_local_cert_failures | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_send_queue_avg | 0 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_recv_queue_avg | 0 | | wsrep_local_cached_downto | -1 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_paused | 0 | | wsrep_flow_control_sent | 0 | | wsrep_flow_control_recv | 0 | | wsrep_cert_deps_distance | 0 | | wsrep_apply_oooe | 0 | | wsrep_apply_oool | 0 | | wsrep_apply_window | 0 | | wsrep_commit_oooe | 0 | | wsrep_commit_oool | 0 | | wsrep_commit_window | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_cert_index_size | 0 | | wsrep_causal_reads | 0 | | wsrep_cert_interval | 0 | | wsrep_open_transactions | 0 | | wsrep_open_connections | 0 | | wsrep_incoming_addresses | 172.20.3.103:3306,AUTO,172.20.3.101:3306 | | wsrep_cluster_weight | 3 | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_gcomm_uuid | 4a75dc41-c5ba-11ea-a6f4-4b9ef7fb8a13 | | wsrep_applier_thread_count | 1 | | wsrep_cluster_capabilities | | | wsrep_cluster_conf_id | 6 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 499f4d1e-b249-11ea-abeb-764a6a38b248 | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_bf_aborts | 0 | | wsrep_local_index | 1 | | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 26.4.4(r4599) | | wsrep_ready | ON | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 2 | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ wsrep_cluster_size and wsrep_local_index on Node2: wsrep_cluster_size 3 wsrep_local_index 1 5. Recheck the content of table dataloss on 3 nodes: root@patgal1 ~]# mysql -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | +----+-------+ [root@patgal2 ~]# mysql -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | +----+-------+ [root@patgal3 ~]# mysql -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | +----+-------+ 6. Insert a row on Node1, verify it has been added and replicated to Node2 and Node3: [root@patgal1 ~]# mysql -e'insert into d.dataloss (value) values (4);' [root@patgal1 ~]# mysql -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | | 12 | 4 | +----+-------+ [root@patgal2 ~]# mysql -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | | 12 | 4 | +----+-------+ [root@patgal3 ~]# mysql -e'select * from d.dataloss;' +----+-------+ | id | value | +----+-------+ | 3 | 1 | | 6 | 2 | | 9 | 3 | | 12 | 4 | +----+-------+ And here you are the history fragment for the Node2: 211 date 212 ps -ef | grep mysqld 213 systemctl start mariadb 214 mysql -e'select * from d.dataloss;' 215 mysql -e'show global status like "wsrep%";' 216 systemctl stop mariadb 217 cat /etc/yum.repos.d/mariadb.repo 218 curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=mariadb-10.4 219 cat /etc/yum.repos.d/mariadb.repo 220 yum list installed | grep galera 221 yum list installed | grep MariaDB 222 yum remove MariaDB galera 223 yum list installed | grep galera 224 yum list installed | grep MariaDB 225 yum install MariaDB galera 226 yum list installed | grep MariaDB 227 yum list installed | grep galera 228 rm /etc/my.cnf.d/server.cnf 229 vi /etc/my.cnf.d/server2.cnf 230 cat /etc/my.cnf.d/server2.cnf 231 ls -al /usr/lib64/galera-4/libgalera_smm.so 232 systemctl start mariadb 233 mysql_upgrade -s 234 mysql -e'show global status like "wsrep%";' 235 mysql -e'select * from d.dataloss;'

Massimo added a comment - 2020-07-14 12:54

For what i could understood from your steps, you are performing the INSERT, when all the nodes are up, nomatter which version. There is not IST perform from the node that you have upgrade, cause you are not writing there while the node2 is down. You have to see that the node2 request and perform an IST cause it has not all the data yet.

Massimo added a comment - 2020-07-14 12:54 For what i could understood from your steps, you are performing the INSERT, when all the nodes are up, nomatter which version. There is not IST perform from the node that you have upgrade, cause you are not writing there while the node2 is down. You have to see that the node2 request and perform an IST cause it has not all the data yet.

Rick Pizzi (Inactive) added a comment - 2020-07-14 13:51

It doesn't happen because in this test you have done, you do not get the node with cluster_size=0 and weird index id.
But you originally got that: https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156489&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-156489

Rick Pizzi (Inactive) added a comment - 2020-07-14 13:51 It doesn't happen because in this test you have done, you do not get the node with cluster_size=0 and weird index id. But you originally got that: https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156489&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-156489

Rick Pizzi (Inactive) added a comment - 2020-07-14 13:56

stepan.patryshev is there a reason why you don't use the conf file we supplied when trying this test, and use a different one that you built yourself? This is not a good way of testing bugs if you ask me. Please, try with the files we have supplied.

Thank you!

Rick Pizzi (Inactive) added a comment - 2020-07-14 13:56 stepan.patryshev is there a reason why you don't use the conf file we supplied when trying this test, and use a different one that you built yourself? This is not a good way of testing bugs if you ask me. Please, try with the files we have supplied. Thank you!

Stepan Patryshev (Inactive) added a comment - 2020-07-14 14:05

massimo.disaro Why IST should take place if according to the steps from the description and especially from the more detailed ones by @rpizzi (see https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156703&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-156703 ) INSERT is performed when upgraded node 2 is run with WSREP_ON=ON?
Please, point me what exactly should I try differently if you have any certain idea.

Stepan Patryshev (Inactive) added a comment - 2020-07-14 14:05 massimo.disaro Why IST should take place if according to the steps from the description and especially from the more detailed ones by @ rpizzi (see https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156703&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-156703 ) INSERT is performed when upgraded node 2 is run with WSREP_ON=ON? Please, point me what exactly should I try differently if you have any certain idea.

Stepan Patryshev (Inactive) added a comment - 2020-07-14 14:15

rpizzi Ok, that is what I was going to try next - to use your config files. When I used "./mtr --suite=galera_3nodes --start-and-exit" simulation during my first tests I tried to get maximum related stuff from the attached configs. But when I moved to the cluster with three VMs and installed packages I decided to try first just only configs which I managed to adjust and run the cluster.

Stepan Patryshev (Inactive) added a comment - 2020-07-14 14:15 rpizzi Ok, that is what I was going to try next - to use your config files. When I used "./mtr --suite=galera_3nodes --start-and-exit" simulation during my first tests I tried to get maximum related stuff from the attached configs. But when I moved to the cluster with three VMs and installed packages I decided to try first just only configs which I managed to adjust and run the cluster.

Stepan Patryshev (Inactive) made changes - 2020-07-14 16:22

Attachment

20200714_MDEV-22723_mdb_no_errors.zip [ 52806 ]

Stepan Patryshev (Inactive) added a comment - 2020-07-14 16:33 - edited

@rpizzi I have passed the steps again without any data loss or failures with the original configs: Node1 and Node2. Just changed only ip addresses. But I see there are some newer config files attached here.
Steps were exactly the same as described in my previous test.
PFA all logs and cnf files.

Stepan Patryshev (Inactive) added a comment - 2020-07-14 16:33 - edited @ rpizzi I have passed the steps again without any data loss or failures with the original configs: Node1 and Node2 . Just changed only ip addresses. But I see there are some newer config files attached here. Steps were exactly the same as described in my previous test . PFA all logs and cnf files .

Rick Pizzi (Inactive) added a comment - 2020-07-16 13:12

I'm stumped, especially because you were able to get the cluster size 0 in your first attempt, and now you don't get that anymore.
How is that possible is beyond me.

Rick Pizzi (Inactive) added a comment - 2020-07-16 13:12 I'm stumped, especially because you were able to get the cluster size 0 in your first attempt, and now you don't get that anymore. How is that possible is beyond me.

Rick Pizzi (Inactive) added a comment - 2020-07-16 13:16

What OS are you running on the VMs?

Rick Pizzi (Inactive) added a comment - 2020-07-16 13:16 What OS are you running on the VMs?

Stepan Patryshev (Inactive) added a comment - 2020-07-16 14:53

rpizzi CentOS Linux release 7.8.2003 (Core).

Stepan Patryshev (Inactive) added a comment - 2020-07-16 14:53 rpizzi CentOS Linux release 7.8.2003 (Core).

Rick Pizzi (Inactive) added a comment - 2020-07-16 14:59

Maybe that's the difference. Both customer and my lab is on CentOS Linux release 7.5.1804 (Core) .
Can you please retry on that OS version?

Thanks
Rick

Rick Pizzi (Inactive) added a comment - 2020-07-16 14:59 Maybe that's the difference. Both customer and my lab is on CentOS Linux release 7.5.1804 (Core) . Can you please retry on that OS version? Thanks Rick

Rick Pizzi (Inactive) added a comment - 2020-07-16 15:00

I think Massimo used 7.6 but customer has 7.5 so please test on that. Thanks

Rick Pizzi (Inactive) added a comment - 2020-07-16 15:00 I think Massimo used 7.6 but customer has 7.5 so please test on that. Thanks

Alexey made changes - 2020-07-16 18:20

Assignee

Seppo Jaakola [ seppo ]

Alexey [ yurchenko ]

Stepan Patryshev (Inactive) made changes - 2020-07-20 18:27

Attachment

20200720_MDEV-22723_CentOS_7.5_no_errors.zip [ 52884 ]

Stepan Patryshev (Inactive) added a comment - 2020-07-20 18:27 - edited

@rpizzi I have passed the steps again without any data loss or failures on CentOS 7.5.1804.
Steps were exactly the same as described here. Just small steps modifications were here:
3.3. yum remove MariaDB-server MariaDB-client MariaDB-backup galera
3.4. yum install MariaDB-common MariaDB-compat MariaDB-server MariaDB-backup MariaDB-client galera
PFA all logs and cnf files.

Stepan Patryshev (Inactive) added a comment - 2020-07-20 18:27 - edited @ rpizzi I have passed the steps again without any data loss or failures on CentOS 7.5.1804. Steps were exactly the same as described here . Just small steps modifications were here: 3.3. yum remove MariaDB-server MariaDB-client MariaDB-backup galera 3.4. yum install MariaDB-common MariaDB-compat MariaDB-server MariaDB-backup MariaDB-client galera PFA all logs and cnf files .

Alexey made changes - 2020-07-22 16:25

Status

Stalled [ 10000 ]

In Progress [ 3 ]

Rick Pizzi (Inactive) added a comment - 2020-07-22 16:51

This is really odd.
Do you think you can retry with mtr?
And see if you still got the cluster_size=0 you got at the beginning?
Because that's the situation where data loss happens.

See your comment below:

https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156489&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-156489

Rick Pizzi (Inactive) added a comment - 2020-07-22 16:51 This is really odd. Do you think you can retry with mtr? And see if you still got the cluster_size=0 you got at the beginning? Because that's the situation where data loss happens. See your comment below: https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156489&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-156489

Stepan Patryshev (Inactive) made changes - 2020-07-23 20:48

Attachment

20200723_MDEV-22723_data_loss.zip [ 52936 ]

Stepan Patryshev (Inactive) added a comment - 2020-07-23 20:48 - edited

@rpizzi It's really strange, but I have managed to reproduce the data loss, but not a crash, just with my scenario using MTR described here. I used Galera 25.3.28(r3875).
PFA all logs and cnf files. Please, ignore errors in mysqld.2.err around 22:17, I just forgot to shutdown a node and tried to run it again.

Stepan Patryshev (Inactive) added a comment - 2020-07-23 20:48 - edited @ rpizzi It's really strange, but I have managed to reproduce the data loss, but not a crash, just with my scenario using MTR described here . I used Galera 25.3.28(r3875). PFA all logs and cnf files . Please, ignore errors in mysqld.2.err around 22:17, I just forgot to shutdown a node and tried to run it again.

Stepan Patryshev (Inactive) added a comment - 2020-07-27 11:43 - edited

There are the detailed steps how I reproduced the data loss.
Release builds 10.3.23 + Galera 25.3.28(r3875) and 10.4.13 + Galera 26.4.4(r4599). PFA all logs and cnf files.

Steps:

1. ./mtr --suite=galera_3nodes --start-and-exit
2. Restart all nodes one by one with separate config files from here.

The cluster status on Node1 is:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"show global status like 'wsrep%';"

+-------------------------------+-------------------------------------------------+

| Variable_name                 | Value                                           |

+-------------------------------+-------------------------------------------------+

| wsrep_applier_thread_count    | 32                                              |

| wsrep_apply_oooe              | 0.000000                                        |

| wsrep_apply_oool              | 0.000000                                        |

| wsrep_apply_window            | 0.000000                                        |

| wsrep_causal_reads            | 0                                               |

| wsrep_cert_deps_distance      | 0.000000                                        |

| wsrep_cert_index_size         | 0                                               |

| wsrep_cert_interval           | 0.000000                                        |

| wsrep_cluster_conf_id         | 8                                               |

| wsrep_cluster_size            | 3                                               |

| wsrep_cluster_state_uuid      | 335ea557-cd0b-11ea-bce5-1b40dbec53a7            |

| wsrep_cluster_status          | Primary                                         |

| wsrep_cluster_weight          | 3                                               |

| wsrep_commit_oooe             | 0.000000                                        |

| wsrep_commit_oool             | 0.000000                                        |

| wsrep_commit_window           | 0.000000                                        |

| wsrep_connected               | ON                                              |

| wsrep_desync_count            | 0                                               |

| wsrep_evs_delayed             |                                                 |

| wsrep_evs_evict_list          |                                                 |

| wsrep_evs_repl_latency        | 0/0/0/0/0                                       |

| wsrep_evs_state               | OPERATIONAL                                     |

| wsrep_flow_control_paused     | 0.000000                                        |

| wsrep_flow_control_paused_ns  | 0                                               |

| wsrep_flow_control_recv       | 0                                               |

| wsrep_flow_control_sent       | 0                                               |

| wsrep_gcomm_uuid              | 0f038d23-cd0d-11ea-acd2-b7ff4121c102            |

| wsrep_incoming_addresses      | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002 |

| wsrep_last_committed          | 0                                               |

| wsrep_local_bf_aborts         | 0                                               |

| wsrep_local_cached_downto     | 18446744073709551615                            |

| wsrep_local_cert_failures     | 0                                               |

| wsrep_local_commits           | 0                                               |

| wsrep_local_index             | 0                                               |

| wsrep_local_recv_queue        | 0                                               |

| wsrep_local_recv_queue_avg    | 0.000000                                        |

| wsrep_local_recv_queue_max    | 1                                               |

| wsrep_local_recv_queue_min    | 0                                               |

| wsrep_local_replays           | 0                                               |

| wsrep_local_send_queue        | 0                                               |

| wsrep_local_send_queue_avg    | 0.000000                                        |

| wsrep_local_send_queue_max    | 1                                               |

| wsrep_local_send_queue_min    | 0                                               |

| wsrep_local_state             | 4                                               |

| wsrep_local_state_comment     | Synced                                          |

| wsrep_local_state_uuid        | 335ea557-cd0b-11ea-bce5-1b40dbec53a7            |

| wsrep_open_connections        | 0                                               |

| wsrep_open_transactions       | 0                                               |

| wsrep_protocol_version        | 9                                               |

| wsrep_provider_name           | Galera                                          |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>               |

| wsrep_provider_version        | 25.3.28(r3875)                                  |

| wsrep_ready                   | ON                                              |

| wsrep_received                | 2                                               |

| wsrep_received_bytes          | 270                                             |

| wsrep_repl_data_bytes         | 0                                               |

| wsrep_repl_keys               | 0                                               |

| wsrep_repl_keys_bytes         | 0                                               |

| wsrep_repl_other_bytes        | 0                                               |

| wsrep_replicated              | 0                                               |

| wsrep_replicated_bytes        | 0                                               |

| wsrep_rollbacker_thread_count | 1                                               |

| wsrep_thread_count            | 33                                              |

+-------------------------------+-------------------------------------------------+

3. On the Node1 create a database and a table:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"create database d; create table d.evento4 (Id int primary key auto_increment, IdDispositivo int, kkkk varchar(255));"

4. On the Node1 insert 3 rows:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4(IdDispositivo, kkkk) values(123, 'aaaa'); insert into d.evento4(IdDispositivo, kkkk) values(222, 'eeeeaa'); insert into d.evento4(IdDispositivo, kkkk) values(34523452, 'e4r4r4 ');"

Data have been propageted to all the cluster:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"select * from d.evento4;"

+----+---------------+---------+

| Id | IdDispositivo | kkkk    |

+----+---------------+---------+

|  1 |           123 | aaaa    |

|  4 |           222 | eeeeaa  |

|  7 |      34523452 | e4r4r4  |

+----+---------------+---------+

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"

+----+---------------+---------+

| Id | IdDispositivo | kkkk    |

+----+---------------+---------+

|  1 |           123 | aaaa    |

|  4 |           222 | eeeeaa  |

|  7 |      34523452 | e4r4r4  |

+----+---------------+---------+

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;"

+----+---------------+---------+

| Id | IdDispositivo | kkkk    |

+----+---------------+---------+

|  1 |           123 | aaaa    |

|  4 |           222 | eeeeaa  |

|  7 |      34523452 | e4r4r4  |

+----+---------------+---------+

5. Stop Node 2.
6. To check that IST works while Node2 is off insert 1 row on the Node1:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4(IdDispositivo, kkkk) values(888, 'While Node 2 is OFF');"

The new row is added on the Node1 and Node3:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"select * from d.evento4;"

+----+---------------+---------------------+

| Id | IdDispositivo | kkkk                |

+----+---------------+---------------------+

|  1 |           123 | aaaa                |

|  4 |           222 | eeeeaa              |

|  7 |      34523452 | e4r4r4              |

| 11 |           888 | While Node 2 is OFF |

+----+---------------+---------------------+

[stepan@cnt7glr11 mysql-test]$ /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;"

+----+---------------+---------------------+

| Id | IdDispositivo | kkkk                |

+----+---------------+---------------------+

|  1 |           123 | aaaa                |

|  4 |           222 | eeeeaa              |

|  7 |      34523452 | e4r4r4              |

| 11 |           888 | While Node 2 is OFF |

+----+---------------+---------------------+

7. Start the Node2.

The new row is added on the Node2 successfully :

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"

+----+---------------+---------------------+

| Id | IdDispositivo | kkkk                |

+----+---------------+---------------------+

|  1 |           123 | aaaa                |

|  4 |           222 | eeeeaa              |

|  7 |      34523452 | e4r4r4              |

| 11 |           888 | While Node 2 is OFF |

+----+---------------+---------------------+

8. Check the cluster status on the Node2:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"show global status like 'wsrep%'"

+-------------------------------+-------------------------------------------------+

| Variable_name                 | Value                                           |

+-------------------------------+-------------------------------------------------+

| wsrep_applier_thread_count    | 32                                              |

| wsrep_apply_oooe              | 0.000000                                        |

| wsrep_apply_oool              | 0.000000                                        |

| wsrep_apply_window            | 1.000000                                        |

| wsrep_causal_reads            | 0                                               |

| wsrep_cert_deps_distance      | 0.000000                                        |

| wsrep_cert_index_size         | 0                                               |

| wsrep_cert_interval           | 0.000000                                        |

| wsrep_cluster_conf_id         | 10                                              |

| wsrep_cluster_size            | 3                                               |

| wsrep_cluster_state_uuid      | 335ea557-cd0b-11ea-bce5-1b40dbec53a7            |

| wsrep_cluster_status          | Primary                                         |

| wsrep_cluster_weight          | 3                                               |

| wsrep_commit_oooe             | 0.000000                                        |

| wsrep_commit_oool             | 0.000000                                        |

| wsrep_commit_window           | 1.000000                                        |

| wsrep_connected               | ON                                              |

| wsrep_desync_count            | 0                                               |

| wsrep_evs_delayed             |                                                 |

| wsrep_evs_evict_list          |                                                 |

| wsrep_evs_repl_latency        | 0/0/0/0/0                                       |

| wsrep_evs_state               | OPERATIONAL                                     |

| wsrep_flow_control_paused     | 0.000000                                        |

| wsrep_flow_control_paused_ns  | 0                                               |

| wsrep_flow_control_recv       | 0                                               |

| wsrep_flow_control_sent       | 0                                               |

| wsrep_gcomm_uuid              | 96685da8-cd17-11ea-be6f-4399d680ab4c            |

| wsrep_incoming_addresses      | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002 |

| wsrep_last_committed          | 6                                               |

| wsrep_local_bf_aborts         | 0                                               |

| wsrep_local_cached_downto     | 18446744073709551615                            |

| wsrep_local_cert_failures     | 0                                               |

| wsrep_local_commits           | 0                                               |

| wsrep_local_index             | 1                                               |

| wsrep_local_recv_queue        | 0                                               |

| wsrep_local_recv_queue_avg    | 0.000000                                        |

| wsrep_local_recv_queue_max    | 1                                               |

| wsrep_local_recv_queue_min    | 0                                               |

| wsrep_local_replays           | 0                                               |

| wsrep_local_send_queue        | 0                                               |

| wsrep_local_send_queue_avg    | 0.000000                                        |

| wsrep_local_send_queue_max    | 1                                               |

| wsrep_local_send_queue_min    | 0                                               |

| wsrep_local_state             | 4                                               |

| wsrep_local_state_comment     | Synced                                          |

| wsrep_local_state_uuid        | 335ea557-cd0b-11ea-bce5-1b40dbec53a7            |

| wsrep_open_connections        | 0                                               |

| wsrep_open_transactions       | 0                                               |

| wsrep_protocol_version        | 9                                               |

| wsrep_provider_name           | Galera                                          |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>               |

| wsrep_provider_version        | 25.3.28(r3875)                                  |

| wsrep_ready                   | ON                                              |

| wsrep_received                | 3                                               |

| wsrep_received_bytes          | 278                                             |

| wsrep_repl_data_bytes         | 0                                               |

| wsrep_repl_keys               | 0                                               |

| wsrep_repl_keys_bytes         | 0                                               |

| wsrep_repl_other_bytes        | 0                                               |

| wsrep_replicated              | 0                                               |

| wsrep_replicated_bytes        | 0                                               |

| wsrep_rollbacker_thread_count | 1                                               |

| wsrep_thread_count            | 33                                              |

+-------------------------------+-------------------------------------------------+

Pay attention that wsrep_local_index = 1.

9. Stop Node 2.

10. Set wsrep-on=OFF and run Node2 on 10.4.13 binaries with new config containing paths to 10.4.13 resources (cnf files here).

/home/stepan/mariadb/10.4.13/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3.23/mysql-test/var/mysqld_new.2.cnf &

11. Perform mysql_upgrade -s.
12. Stop Node 2.
13. Insert 1 new row on the Node1:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4(IdDispositivo, kkkk) values(777777, 'While Node 2 was upgrading');"

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"select * from d.evento4;"                                       +----+---------------+----------------------------+

| Id | IdDispositivo | kkkk                       |

+----+---------------+----------------------------+

|  1 |           123 | aaaa                       |

|  4 |           222 | eeeeaa                     |

|  7 |      34523452 | e4r4r4                     |

| 11 |           888 | While Node 2 is OFF        |

| 13 |        777777 | While Node 2 was upgrading |

+----+---------------+----------------------------+

14. Set wsrep-on=ON and run Node2.
15. Check that the new row is added to the Node2 also:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"

+----+---------------+----------------------------+

| Id | IdDispositivo | kkkk                       |

+----+---------------+----------------------------+

|  1 |           123 | aaaa                       |

|  4 |           222 | eeeeaa                     |

|  7 |      34523452 | e4r4r4                     |

| 11 |           888 | While Node 2 is OFF        |

| 13 |        777777 | While Node 2 was upgrading |

+----+---------------+----------------------------+

16. Check the wsrep variables on the Node2:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"show global status like 'wsrep%'"

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| Variable_name                 | Value                                                                                                                                          |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| wsrep_local_state_uuid        | 335ea557-cd0b-11ea-bce5-1b40dbec53a7                                                                                                           |

| wsrep_protocol_version        | -1                                                                                                                                             |

| wsrep_last_committed          | 7                                                                                                                                              |

| wsrep_replicated              | 0                                                                                                                                              |

| wsrep_replicated_bytes        | 0                                                                                                                                              |

| wsrep_repl_keys               | 0                                                                                                                                              |

| wsrep_repl_keys_bytes         | 0                                                                                                                                              |

| wsrep_repl_data_bytes         | 0                                                                                                                                              |

| wsrep_repl_other_bytes        | 0                                                                                                                                              |

| wsrep_received                | 3                                                                                                                                              |

| wsrep_received_bytes          | 288                                                                                                                                            |

| wsrep_local_commits           | 0                                                                                                                                              |

| wsrep_local_cert_failures     | 0                                                                                                                                              |

| wsrep_local_replays           | 0                                                                                                                                              |

| wsrep_local_send_queue        | 0                                                                                                                                              |

| wsrep_local_send_queue_max    | 2                                                                                                                                              |

| wsrep_local_send_queue_min    | 0                                                                                                                                              |

| wsrep_local_send_queue_avg    | 0.333333                                                                                                                                       |

| wsrep_local_recv_queue        | 0                                                                                                                                              |

| wsrep_local_recv_queue_max    | 1                                                                                                                                              |

| wsrep_local_recv_queue_min    | 0                                                                                                                                              |

| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |

| wsrep_local_cached_downto     | 7                                                                                                                                              |

| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |

| wsrep_flow_control_paused     | 0                                                                                                                                              |

| wsrep_flow_control_sent       | 0                                                                                                                                              |

| wsrep_flow_control_recv       | 0                                                                                                                                              |

| wsrep_cert_deps_distance      | 0                                                                                                                                              |

| wsrep_apply_oooe              | 0                                                                                                                                              |

| wsrep_apply_oool              | 0                                                                                                                                              |

| wsrep_apply_window            | 1                                                                                                                                              |

| wsrep_commit_oooe             | 0                                                                                                                                              |

| wsrep_commit_oool             | 0                                                                                                                                              |

| wsrep_commit_window           | 1                                                                                                                                              |

| wsrep_local_state             | 4                                                                                                                                              |

| wsrep_local_state_comment     | Synced                                                                                                                                         |

| wsrep_cert_index_size         | 0                                                                                                                                              |

| wsrep_causal_reads            | 0                                                                                                                                              |

| wsrep_cert_interval           | 0                                                                                                                                              |

| wsrep_open_transactions       | 0                                                                                                                                              |

| wsrep_open_connections        | 0                                                                                                                                              |

| wsrep_incoming_addresses      | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002                                                                                                |

| wsrep_cluster_weight          | 3                                                                                                                                              |

| wsrep_desync_count            | 0                                                                                                                                              |

| wsrep_evs_delayed             |                                                                                                                                                |

| wsrep_evs_evict_list          |                                                                                                                                                |

| wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |

| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |

| wsrep_gcomm_uuid              | 11fd46cc-cd1b-11ea-8f5d-7efdb4c94287                                                                                                           |

| wsrep_applier_thread_count    | 32                                                                                                                                             |

| wsrep_cluster_capabilities    |                                                                                                                                                |

| wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |

| wsrep_cluster_size            | 0                                                                                                                                              |

| wsrep_cluster_state_uuid      |                                                                                                                                                |

| wsrep_cluster_status          | Primary                                                                                                                                        |

| wsrep_connected               | ON                                                                                                                                             |

| wsrep_local_bf_aborts         | 0                                                                                                                                              |

| wsrep_local_index             | 18446744073709551615                                                                                                                           |

| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |

| wsrep_provider_name           | Galera                                                                                                                                         |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |

| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |

| wsrep_ready                   | ON                                                                                                                                             |

| wsrep_rollbacker_thread_count | 1                                                                                                                                              |

| wsrep_thread_count            | 33                                                                                                                                             |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

Pay attention:

wsrep_cluster_status	Primary
wsrep_local_state_comment	Synced
wsrep_local_index	18446744073709551615
wsrep_cluster_size	0

17. Insert 1 row on the Node1 again:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (3,'non tireplic');"

The new row has been replicated to the Node3:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;"

+----+---------------+----------------------------+

| Id | IdDispositivo | kkkk                       |

+----+---------------+----------------------------+

|  1 |           123 | aaaa                       |

|  4 |           222 | eeeeaa                     |

|  7 |      34523452 | e4r4r4                     |

| 11 |           888 | While Node 2 is OFF        |

| 13 |        777777 | While Node 2 was upgrading |

| 16 |             3 | non tireplic               |

+----+---------------+----------------------------+

But it has NOT been replicated to the Node2:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"

+----+---------------+----------------------------+

| Id | IdDispositivo | kkkk                       |

+----+---------------+----------------------------+

|  1 |           123 | aaaa                       |

|  4 |           222 | eeeeaa                     |

|  7 |      34523452 | e4r4r4                     |

| 11 |           888 | While Node 2 is OFF        |

| 13 |        777777 | While Node 2 was upgrading |

+----+---------------+----------------------------+

18. Just one more insert on the Node1 to repeat:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (666,'Lost data');"

And again the new row has been replicated to the Node3:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;"                                       +----+---------------+----------------------------+

| Id | IdDispositivo | kkkk                       |

+----+---------------+----------------------------+

|  1 |           123 | aaaa                       |

|  4 |           222 | eeeeaa                     |

|  7 |      34523452 | e4r4r4                     |

| 11 |           888 | While Node 2 is OFF        |

| 13 |        777777 | While Node 2 was upgrading |

| 16 |             3 | non tireplic               |

| 19 |           666 | Lost data                  |

+----+---------------+----------------------------+

But it has NOT been replicated to the Node2:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"                                       +----+---------------+----------------------------+

| Id | IdDispositivo | kkkk                       |

+----+---------------+----------------------------+

|  1 |           123 | aaaa                       |

|  4 |           222 | eeeeaa                     |

|  7 |      34523452 | e4r4r4                     |

| 11 |           888 | While Node 2 is OFF        |

| 13 |        777777 | While Node 2 was upgrading |

+----+---------------+----------------------------+

19. Restart the Node2.
Check the wsrep variables on the Node2:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"show global status like 'wsrep%';"

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| Variable_name                 | Value                                                                                                                                          |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

| wsrep_local_state_uuid        | 335ea557-cd0b-11ea-bce5-1b40dbec53a7                                                                                                           |

| wsrep_protocol_version        | 9                                                                                                                                              |

| wsrep_last_committed          | 9                                                                                                                                              |

| wsrep_replicated              | 0                                                                                                                                              |

| wsrep_replicated_bytes        | 0                                                                                                                                              |

| wsrep_repl_keys               | 0                                                                                                                                              |

| wsrep_repl_keys_bytes         | 0                                                                                                                                              |

| wsrep_repl_data_bytes         | 0                                                                                                                                              |

| wsrep_repl_other_bytes        | 0                                                                                                                                              |

| wsrep_received                | 2                                                                                                                                              |

| wsrep_received_bytes          | 280                                                                                                                                            |

| wsrep_local_commits           | 0                                                                                                                                              |

| wsrep_local_cert_failures     | 0                                                                                                                                              |

| wsrep_local_replays           | 0                                                                                                                                              |

| wsrep_local_send_queue        | 0                                                                                                                                              |

| wsrep_local_send_queue_max    | 1                                                                                                                                              |

| wsrep_local_send_queue_min    | 0                                                                                                                                              |

| wsrep_local_send_queue_avg    | 0                                                                                                                                              |

| wsrep_local_recv_queue        | 0                                                                                                                                              |

| wsrep_local_recv_queue_max    | 1                                                                                                                                              |

| wsrep_local_recv_queue_min    | 0                                                                                                                                              |

| wsrep_local_recv_queue_avg    | 0                                                                                                                                              |

| wsrep_local_cached_downto     | 7                                                                                                                                              |

| wsrep_flow_control_paused_ns  | 0                                                                                                                                              |

| wsrep_flow_control_paused     | 0                                                                                                                                              |

| wsrep_flow_control_sent       | 0                                                                                                                                              |

| wsrep_flow_control_recv       | 0                                                                                                                                              |

| wsrep_cert_deps_distance      | 0                                                                                                                                              |

| wsrep_apply_oooe              | 0                                                                                                                                              |

| wsrep_apply_oool              | 0                                                                                                                                              |

| wsrep_apply_window            | 0                                                                                                                                              |

| wsrep_commit_oooe             | 0                                                                                                                                              |

| wsrep_commit_oool             | 0                                                                                                                                              |

| wsrep_commit_window           | 0                                                                                                                                              |

| wsrep_local_state             | 4                                                                                                                                              |

| wsrep_local_state_comment     | Synced                                                                                                                                         |

| wsrep_cert_index_size         | 0                                                                                                                                              |

| wsrep_causal_reads            | 0                                                                                                                                              |

| wsrep_cert_interval           | 0                                                                                                                                              |

| wsrep_open_transactions       | 0                                                                                                                                              |

| wsrep_open_connections        | 0                                                                                                                                              |

| wsrep_incoming_addresses      | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002                                                                                                |

| wsrep_cluster_weight          | 3                                                                                                                                              |

| wsrep_desync_count            | 0                                                                                                                                              |

| wsrep_evs_delayed             |                                                                                                                                                |

| wsrep_evs_evict_list          |                                                                                                                                                |

| wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |

| wsrep_evs_state               | OPERATIONAL                                                                                                                                    |

| wsrep_gcomm_uuid              | 39969b6c-cd1f-11ea-abde-7b7ed790f75c                                                                                                           |

| wsrep_applier_thread_count    | 32                                                                                                                                             |

| wsrep_cluster_capabilities    |                                                                                                                                                |

| wsrep_cluster_conf_id         | 14                                                                                                                                             |

| wsrep_cluster_size            | 3                                                                                                                                              |

| wsrep_cluster_state_uuid      | 335ea557-cd0b-11ea-bce5-1b40dbec53a7                                                                                                           |

| wsrep_cluster_status          | Primary                                                                                                                                        |

| wsrep_connected               | ON                                                                                                                                             |

| wsrep_local_bf_aborts         | 0                                                                                                                                              |

| wsrep_local_index             | 1                                                                                                                                              |

| wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |

| wsrep_provider_name           | Galera                                                                                                                                         |

| wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |

| wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |

| wsrep_ready                   | ON                                                                                                                                             |

| wsrep_rollbacker_thread_count | 1                                                                                                                                              |

| wsrep_thread_count            | 33                                                                                                                                             |

+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+

All seems ok:

wsrep_cluster_status	Primary
wsrep_local_state_comment	Synced
wsrep_local_index	1
wsrep_cluster_size	3

20. Insert the new row on the Node1:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (555,'After Node restart');"

And the new row has been successfully replicated to the Node3:

/home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"

+----+---------------+----------------------------+

| Id | IdDispositivo | kkkk                       |

+----+---------------+----------------------------+

|  1 |           123 | aaaa                       |

|  4 |           222 | eeeeaa                     |

|  7 |      34523452 | e4r4r4                     |

| 11 |           888 | While Node 2 is OFF        |

| 13 |        777777 | While Node 2 was upgrading |

| 22 |           555 | After Node restart         |

+----+---------------+----------------------------+

Stepan Patryshev (Inactive) added a comment - 2020-07-27 11:43 - edited There are the detailed steps how I reproduced the data loss. Release builds 10.3.23 + Galera 25.3.28(r3875) and 10.4.13 + Galera 26.4.4(r4599). PFA all logs and cnf files . Steps: 1. ./mtr --suite=galera_3nodes --start-and-exit 2. Restart all nodes one by one with separate config files from here . The cluster status on Node1 is: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"show global status like 'wsrep%';" +-------------------------------+-------------------------------------------------+ | Variable_name | Value | +-------------------------------+-------------------------------------------------+ | wsrep_applier_thread_count | 32 | | wsrep_apply_oooe | 0.000000 | | wsrep_apply_oool | 0.000000 | | wsrep_apply_window | 0.000000 | | wsrep_causal_reads | 0 | | wsrep_cert_deps_distance | 0.000000 | | wsrep_cert_index_size | 0 | | wsrep_cert_interval | 0.000000 | | wsrep_cluster_conf_id | 8 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 335ea557-cd0b-11ea-bce5-1b40dbec53a7 | | wsrep_cluster_status | Primary | | wsrep_cluster_weight | 3 | | wsrep_commit_oooe | 0.000000 | | wsrep_commit_oool | 0.000000 | | wsrep_commit_window | 0.000000 | | wsrep_connected | ON | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_flow_control_paused | 0.000000 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_recv | 0 | | wsrep_flow_control_sent | 0 | | wsrep_gcomm_uuid | 0f038d23-cd0d-11ea-acd2-b7ff4121c102 | | wsrep_incoming_addresses | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002 | | wsrep_last_committed | 0 | | wsrep_local_bf_aborts | 0 | | wsrep_local_cached_downto | 18446744073709551615 | | wsrep_local_cert_failures | 0 | | wsrep_local_commits | 0 | | wsrep_local_index | 0 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_avg | 0.000000 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_avg | 0.000000 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_local_state_uuid | 335ea557-cd0b-11ea-bce5-1b40dbec53a7 | | wsrep_open_connections | 0 | | wsrep_open_transactions | 0 | | wsrep_protocol_version | 9 | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 25.3.28(r3875) | | wsrep_ready | ON | | wsrep_received | 2 | | wsrep_received_bytes | 270 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 33 | +-------------------------------+-------------------------------------------------+ 3. On the Node1 create a database and a table: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"create database d; create table d.evento4 (Id int primary key auto_increment, IdDispositivo int, kkkk varchar(255));" 4. On the Node1 insert 3 rows: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4(IdDispositivo, kkkk) values(123, 'aaaa'); insert into d.evento4(IdDispositivo, kkkk) values(222, 'eeeeaa'); insert into d.evento4(IdDispositivo, kkkk) values(34523452, 'e4r4r4 ');" Data have been propageted to all the cluster: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"select * from d.evento4;" +----+---------------+---------+ | Id | IdDispositivo | kkkk | +----+---------------+---------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | +----+---------------+---------+ /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;" +----+---------------+---------+ | Id | IdDispositivo | kkkk | +----+---------------+---------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | +----+---------------+---------+ /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;" +----+---------------+---------+ | Id | IdDispositivo | kkkk | +----+---------------+---------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | +----+---------------+---------+ 5. Stop Node 2. 6. To check that IST works while Node2 is off insert 1 row on the Node1: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4(IdDispositivo, kkkk) values(888, 'While Node 2 is OFF');" The new row is added on the Node1 and Node3: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"select * from d.evento4;" +----+---------------+---------------------+ | Id | IdDispositivo | kkkk | +----+---------------+---------------------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | | 11 | 888 | While Node 2 is OFF | +----+---------------+---------------------+ [stepan@cnt7glr11 mysql-test]$ /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;" +----+---------------+---------------------+ | Id | IdDispositivo | kkkk | +----+---------------+---------------------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | | 11 | 888 | While Node 2 is OFF | +----+---------------+---------------------+ 7. Start the Node2. The new row is added on the Node2 successfully : /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;" +----+---------------+---------------------+ | Id | IdDispositivo | kkkk | +----+---------------+---------------------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | | 11 | 888 | While Node 2 is OFF | +----+---------------+---------------------+ 8. Check the cluster status on the Node2: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"show global status like 'wsrep%'" +-------------------------------+-------------------------------------------------+ | Variable_name | Value | +-------------------------------+-------------------------------------------------+ | wsrep_applier_thread_count | 32 | | wsrep_apply_oooe | 0.000000 | | wsrep_apply_oool | 0.000000 | | wsrep_apply_window | 1.000000 | | wsrep_causal_reads | 0 | | wsrep_cert_deps_distance | 0.000000 | | wsrep_cert_index_size | 0 | | wsrep_cert_interval | 0.000000 | | wsrep_cluster_conf_id | 10 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 335ea557-cd0b-11ea-bce5-1b40dbec53a7 | | wsrep_cluster_status | Primary | | wsrep_cluster_weight | 3 | | wsrep_commit_oooe | 0.000000 | | wsrep_commit_oool | 0.000000 | | wsrep_commit_window | 1.000000 | | wsrep_connected | ON | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_flow_control_paused | 0.000000 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_recv | 0 | | wsrep_flow_control_sent | 0 | | wsrep_gcomm_uuid | 96685da8-cd17-11ea-be6f-4399d680ab4c | | wsrep_incoming_addresses | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002 | | wsrep_last_committed | 6 | | wsrep_local_bf_aborts | 0 | | wsrep_local_cached_downto | 18446744073709551615 | | wsrep_local_cert_failures | 0 | | wsrep_local_commits | 0 | | wsrep_local_index | 1 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_avg | 0.000000 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_avg | 0.000000 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_local_state_uuid | 335ea557-cd0b-11ea-bce5-1b40dbec53a7 | | wsrep_open_connections | 0 | | wsrep_open_transactions | 0 | | wsrep_protocol_version | 9 | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 25.3.28(r3875) | | wsrep_ready | ON | | wsrep_received | 3 | | wsrep_received_bytes | 278 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 33 | +-------------------------------+-------------------------------------------------+ Pay attention that wsrep_local_index = 1. 9. Stop Node 2. 10. Set wsrep-on=OFF and run Node2 on 10.4.13 binaries with new config containing paths to 10.4.13 resources ( cnf files here ). /home/stepan/mariadb/10.4.13/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3.23/mysql-test/var/mysqld_new.2.cnf & 11. Perform mysql_upgrade -s. 12. Stop Node 2. 13. Insert 1 new row on the Node1: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4(IdDispositivo, kkkk) values(777777, 'While Node 2 was upgrading');" /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"select * from d.evento4;" +----+---------------+----------------------------+ | Id | IdDispositivo | kkkk | +----+---------------+----------------------------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | | 11 | 888 | While Node 2 is OFF | | 13 | 777777 | While Node 2 was upgrading | +----+---------------+----------------------------+ 14. Set wsrep-on=ON and run Node2. 15. Check that the new row is added to the Node2 also: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;" +----+---------------+----------------------------+ | Id | IdDispositivo | kkkk | +----+---------------+----------------------------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | | 11 | 888 | While Node 2 is OFF | | 13 | 777777 | While Node 2 was upgrading | +----+---------------+----------------------------+ 16. Check the wsrep variables on the Node2: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"show global status like 'wsrep%'" +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | Variable_name | Value | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | wsrep_local_state_uuid | 335ea557-cd0b-11ea-bce5-1b40dbec53a7 | | wsrep_protocol_version | -1 | | wsrep_last_committed | 7 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_received | 3 | | wsrep_received_bytes | 288 | | wsrep_local_commits | 0 | | wsrep_local_cert_failures | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_max | 2 | | wsrep_local_send_queue_min | 0 | | wsrep_local_send_queue_avg | 0.333333 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_recv_queue_avg | 0 | | wsrep_local_cached_downto | 7 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_paused | 0 | | wsrep_flow_control_sent | 0 | | wsrep_flow_control_recv | 0 | | wsrep_cert_deps_distance | 0 | | wsrep_apply_oooe | 0 | | wsrep_apply_oool | 0 | | wsrep_apply_window | 1 | | wsrep_commit_oooe | 0 | | wsrep_commit_oool | 0 | | wsrep_commit_window | 1 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_cert_index_size | 0 | | wsrep_causal_reads | 0 | | wsrep_cert_interval | 0 | | wsrep_open_transactions | 0 | | wsrep_open_connections | 0 | | wsrep_incoming_addresses | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002 | | wsrep_cluster_weight | 3 | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_gcomm_uuid | 11fd46cc-cd1b-11ea-8f5d-7efdb4c94287 | | wsrep_applier_thread_count | 32 | | wsrep_cluster_capabilities | | | wsrep_cluster_conf_id | 18446744073709551615 | | wsrep_cluster_size | 0 | | wsrep_cluster_state_uuid | | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_bf_aborts | 0 | | wsrep_local_index | 18446744073709551615 | | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 26.4.4(r4599) | | wsrep_ready | ON | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 33 | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ Pay attention: wsrep_cluster_status Primary wsrep_local_state_comment Synced wsrep_local_index 18446744073709551615 wsrep_cluster_size 0 17. Insert 1 row on the Node1 again: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (3,'non tireplic');" The new row has been replicated to the Node3: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;" +----+---------------+----------------------------+ | Id | IdDispositivo | kkkk | +----+---------------+----------------------------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | | 11 | 888 | While Node 2 is OFF | | 13 | 777777 | While Node 2 was upgrading | | 16 | 3 | non tireplic | +----+---------------+----------------------------+ But it has NOT been replicated to the Node2: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;" +----+---------------+----------------------------+ | Id | IdDispositivo | kkkk | +----+---------------+----------------------------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | | 11 | 888 | While Node 2 is OFF | | 13 | 777777 | While Node 2 was upgrading | +----+---------------+----------------------------+ 18. Just one more insert on the Node1 to repeat: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (666,'Lost data');" And again the new row has been replicated to the Node3: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;" +----+---------------+----------------------------+ | Id | IdDispositivo | kkkk | +----+---------------+----------------------------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | | 11 | 888 | While Node 2 is OFF | | 13 | 777777 | While Node 2 was upgrading | | 16 | 3 | non tireplic | | 19 | 666 | Lost data | +----+---------------+----------------------------+ But it has NOT been replicated to the Node2: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;" +----+---------------+----------------------------+ | Id | IdDispositivo | kkkk | +----+---------------+----------------------------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | | 11 | 888 | While Node 2 is OFF | | 13 | 777777 | While Node 2 was upgrading | +----+---------------+----------------------------+ 19. Restart the Node2. Check the wsrep variables on the Node2: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"show global status like 'wsrep%';" +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | Variable_name | Value | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ | wsrep_local_state_uuid | 335ea557-cd0b-11ea-bce5-1b40dbec53a7 | | wsrep_protocol_version | 9 | | wsrep_last_committed | 9 | | wsrep_replicated | 0 | | wsrep_replicated_bytes | 0 | | wsrep_repl_keys | 0 | | wsrep_repl_keys_bytes | 0 | | wsrep_repl_data_bytes | 0 | | wsrep_repl_other_bytes | 0 | | wsrep_received | 2 | | wsrep_received_bytes | 280 | | wsrep_local_commits | 0 | | wsrep_local_cert_failures | 0 | | wsrep_local_replays | 0 | | wsrep_local_send_queue | 0 | | wsrep_local_send_queue_max | 1 | | wsrep_local_send_queue_min | 0 | | wsrep_local_send_queue_avg | 0 | | wsrep_local_recv_queue | 0 | | wsrep_local_recv_queue_max | 1 | | wsrep_local_recv_queue_min | 0 | | wsrep_local_recv_queue_avg | 0 | | wsrep_local_cached_downto | 7 | | wsrep_flow_control_paused_ns | 0 | | wsrep_flow_control_paused | 0 | | wsrep_flow_control_sent | 0 | | wsrep_flow_control_recv | 0 | | wsrep_cert_deps_distance | 0 | | wsrep_apply_oooe | 0 | | wsrep_apply_oool | 0 | | wsrep_apply_window | 0 | | wsrep_commit_oooe | 0 | | wsrep_commit_oool | 0 | | wsrep_commit_window | 0 | | wsrep_local_state | 4 | | wsrep_local_state_comment | Synced | | wsrep_cert_index_size | 0 | | wsrep_causal_reads | 0 | | wsrep_cert_interval | 0 | | wsrep_open_transactions | 0 | | wsrep_open_connections | 0 | | wsrep_incoming_addresses | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002 | | wsrep_cluster_weight | 3 | | wsrep_desync_count | 0 | | wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_gcomm_uuid | 39969b6c-cd1f-11ea-abde-7b7ed790f75c | | wsrep_applier_thread_count | 32 | | wsrep_cluster_capabilities | | | wsrep_cluster_conf_id | 14 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 335ea557-cd0b-11ea-bce5-1b40dbec53a7 | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_bf_aborts | 0 | | wsrep_local_index | 1 | | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 26.4.4(r4599) | | wsrep_ready | ON | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 33 | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ All seems ok: wsrep_cluster_status Primary wsrep_local_state_comment Synced wsrep_local_index 1 wsrep_cluster_size 3 20. Insert the new row on the Node1: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (555,'After Node restart');" And the new row has been successfully replicated to the Node3: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;" +----+---------------+----------------------------+ | Id | IdDispositivo | kkkk | +----+---------------+----------------------------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | | 11 | 888 | While Node 2 is OFF | | 13 | 777777 | While Node 2 was upgrading | | 22 | 555 | After Node restart | +----+---------------+----------------------------+

Alexey added a comment - 2020-07-29 13:28

Ok, I think I know what is the problem, at least where it is solved.

Massimo's node 2 log has the following

wsrep loader: [INFO] wsrep_load(): Galera 26.4.4(r4599) by Codership Oy <info@codership.com> loaded successfully.

...

2020-05-25 22:25:17 19 [Warning] WSREP: trx protocol version: 4 does not match certification protocol version: -1

As you may guess the last line spells bad news - the node cannot apply writesets. It is caused by a bug that was fixed in commit 02ad0e11 on April 1, way after release 4.4 was tagged and was merged into MariaDB Galera fork in commit ae24803 on April 9.

Stepan's log has

wsrep loader: [INFO] wsrep_load(): Galera 26.4.4(rae24803) by Codership Oy <info@codership.com> loaded successfully.

That's why Stepan can't reproduce the bug, he's using a different Galera binary.

In any case this bug (and many other) is fixed in 4.5 release tag. All MariaDB 10.4 users should switch to it. It will solve a lot of trouble.

Alexey added a comment - 2020-07-29 13:28 Ok, I think I know what is the problem, at least where it is solved. Massimo's node 2 log has the following wsrep loader: [INFO] wsrep_load(): Galera 26.4.4(r4599) by Codership Oy <info@codership.com> loaded successfully. ... 2020-05-25 22:25:17 19 [Warning] WSREP: trx protocol version: 4 does not match certification protocol version: -1 As you may guess the last line spells bad news - the node cannot apply writesets. It is caused by a bug that was fixed in commit 02ad0e11 on April 1, way after release 4.4 was tagged and was merged into MariaDB Galera fork in commit ae24803 on April 9. Stepan's log has wsrep loader: [INFO] wsrep_load(): Galera 26.4.4(rae24803) by Codership Oy <info@codership.com> loaded successfully. That's why Stepan can't reproduce the bug, he's using a different Galera binary. In any case this bug (and many other) is fixed in 4.5 release tag. All MariaDB 10.4 users should switch to it. It will solve a lot of trouble.

Stepan Patryshev (Inactive) added a comment - 2020-07-29 15:22 - edited

Yurchenko I hope you are right, but I used Galera 26.4.4(r4599) on 20.07.2020 and there was no data loss.

Stepan Patryshev (Inactive) added a comment - 2020-07-29 15:22 - edited Yurchenko I hope you are right, but I used Galera 26.4.4(r4599) on 20.07.2020 and there was no data loss.

Alexey added a comment - 2020-07-29 18:08

julien.fritsch
Yes, it is fixed in later Galera releases.

stepan.patryshev
On 20.07.2020 there was a mistake in case reproduction: in Massimo's case node 2 was missing 2 events and had to perform state transfer. In your case it seems there were no updates to the cluster during node 2 upgrade: it was shut down at seqno 7 and was brought back - cluster still had seqno 7. So there was no state transfer and it is a different code path.

And yes, I found out why in Massimo's case some transactions were lost:

[Warning] WSREP: trx protocol version: 4 does not match certification protocol version: -1

is a warning because we can expect during upgrade of the last node and protocol bump to get a writeset with an old protocol and in that case it simply is supposed to fail certification - on all nodes. The problem (that was fixed in the commit I mentioned above) was that protocol version was not updated in total order (it was not updated at all). As a result all transactions that failed certification on node 2 (and thus were skipped), perfectly passed certification on node 1 and thus were committed. In the end both nodes believed that they have successfully processed all events and are on the same page regarding last seqno. That's why those missing events went unnoticed.

However when node 2 was restarted, it rejoined the cluster without state transfer, the bug was not triggered, and it could continue to apply transactions.

Alexey added a comment - 2020-07-29 18:08 julien.fritsch Yes, it is fixed in later Galera releases. stepan.patryshev On 20.07.2020 there was a mistake in case reproduction: in Massimo's case node 2 was missing 2 events and had to perform state transfer. In your case it seems there were no updates to the cluster during node 2 upgrade: it was shut down at seqno 7 and was brought back - cluster still had seqno 7. So there was no state transfer and it is a different code path. And yes, I found out why in Massimo's case some transactions were lost: [Warning] WSREP: trx protocol version: 4 does not match certification protocol version: -1 is a warning because we can expect during upgrade of the last node and protocol bump to get a writeset with an old protocol and in that case it simply is supposed to fail certification - on all nodes. The problem (that was fixed in the commit I mentioned above) was that protocol version was not updated in total order (it was not updated at all). As a result all transactions that failed certification on node 2 (and thus were skipped), perfectly passed certification on node 1 and thus were committed. In the end both nodes believed that they have successfully processed all events and are on the same page regarding last seqno. That's why those missing events went unnoticed. However when node 2 was restarted, it rejoined the cluster without state transfer, the bug was not triggered, and it could continue to apply transactions.

Stepan Patryshev (Inactive) added a comment - 2020-07-30 09:29

Yurchenko Thank you for the clarifications. But I want to note that rpizzi reproduced it without updating data during Node2 upgrade: steps are here.

Stepan Patryshev (Inactive) added a comment - 2020-07-30 09:29 Yurchenko Thank you for the clarifications. But I want to note that rpizzi reproduced it without updating data during Node2 upgrade: steps are here .

Stepan Patryshev (Inactive) made changes - 2020-08-04 06:53

Link

This issue relates to ~~MDEV-20439~~ [ ~~MDEV-20439~~ ]

Stepan Patryshev (Inactive) added a comment - 2020-08-19 18:32

I have verified that using Galera 26.4.5(rb3764ab) and 25.3.30(r827e681) there were no any data loss or crash. The steps were the same which reproduced the bug on 23.07.2020 with 25.3.28(r3875) and 26.4.4(r4599).

But the strange wsrep values still presented just after the first time upgraded node joined the cluster:

wsrep_local_index	18446744073709551615
wsrep_cluster_size	0

Stepan Patryshev (Inactive) added a comment - 2020-08-19 18:32 I have verified that using Galera 26.4.5(rb3764ab) and 25.3.30(r827e681) there were no any data loss or crash. The steps were the same which reproduced the bug on 23.07.2020 with 25.3.28(r3875) and 26.4.4(r4599). But the strange wsrep values still presented just after the first time upgraded node joined the cluster: wsrep_local_index 18446744073709551615 wsrep_cluster_size 0

Stepan Patryshev (Inactive) added a comment - 2020-08-20 06:05

Closing as fixed.

Stepan Patryshev (Inactive) added a comment - 2020-08-20 06:05 Closing as fixed.

Stepan Patryshev (Inactive) made changes - 2020-08-20 06:05

Fix Version/s		10.3.25 [ 24506 ]
Fix Version/s		10.4.15 [ 24507 ]
Fix Version/s	10.3 [ 22126 ]
Fix Version/s	10.4 [ 22408 ]
Resolution		Fixed [ 1 ]
Status	In Progress [ 3 ]	Closed [ 6 ]

Ralf Gebhardt made changes - 2020-10-06 18:40

Fix Version/s

10.4.16 [ 25020 ]

Ralf Gebhardt made changes - 2020-10-06 18:42

Fix Version/s

10.4.15 [ 24507 ]

Ralf Gebhardt made changes - 2020-10-06 18:43

Fix Version/s

10.3.26 [ 25021 ]

Ralf Gebhardt made changes - 2020-10-06 18:45

Fix Version/s

10.3.25 [ 24506 ]

Sergei Golubchik made changes - 2021-12-06 21:51

Workflow

MariaDB v3 [ 109176 ]

MariaDB v4 [ 157862 ]

Claudio Nanni made changes - 2022-08-04 14:59

Link

This issue relates to ~~MDEV-29246~~ [ ~~MDEV-29246~~ ]

Jira Automation (IT) made changes - 2024-07-04 03:05

Zendesk Related Tickets

183937

People

Assignee:: Alexey

Reporter:: Massimo

Votes:: 3 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 2020-05-26 15:03

Updated:: 2024-07-07 22:00

Resolved:: 2020-08-20 06:05

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.