  1. MariaDB Server
  2. MDEV-22723

Data loss when performing rolling upgrade from 10.3.23-MariaDB to 10.4.13-MariaDB

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.3.23, 10.4.13
    • Fix Version/s: 10.3.26, 10.4.16
    • Component/s: Galera
    • Labels: None
    • Environment: OS: CentOS Linux release 7.6.1810 (Core)

    Description

      We created a full Galera cluster with 3 nodes (mdb1, mdb2, mdb3), all running 10.3.23.
      We gracefully shut down mdb3 to force an IST and to check how writes on 10.3.23 affect 10.4.13. We also re-tested with all 3 servers up, with the same result.

      Create a schema and a table on mdb1. Everything propagates.

      • stop mdb2, then yum remove the MariaDB and Galera RPMs
      • install MariaDB 10.4 from the new repo and update my.cnf to point to the right wsrep_provider
      • set wsrep_on=OFF in my.cnf
      • start mdb2
      • run mysql_upgrade -s
      • stop mdb2
      • set wsrep_on=ON in my.cnf
      • start mdb2

      At this point the Galera status variables on mdb2 are:
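
The two wsrep_on steps above amount to flipping a single setting in my.cnf between restarts. A minimal sketch of that edit, assuming the option appears as an uncommented `wsrep_on=...` line; the function name `set_wsrep_on` and the sample config (including the provider path) are hypothetical and not taken from the attached files:

```python
import re

# Matches an uncommented "wsrep_on = <value>" line (also the wsrep-on spelling).
_WSREP_ON = re.compile(r"^([ \t]*wsrep[_-]on[ \t]*=[ \t]*)\S+",
                       re.MULTILINE | re.IGNORECASE)


def set_wsrep_on(cnf_text: str, enabled: bool) -> str:
    """Return cnf_text with wsrep_on forced to ON or OFF."""
    value = "ON" if enabled else "OFF"
    new_text, count = _WSREP_ON.subn(lambda m: m.group(1) + value, cnf_text)
    if count == 0:
        # Refuse to guess where the option should go.
        raise ValueError("no wsrep_on line found in config text")
    return new_text


# Hypothetical config fragment, for illustration only.
SAMPLE_CNF = "[galera]\nwsrep_on=ON\nwsrep_provider=/path/to/provider.so\n"

if __name__ == "__main__":
    print(set_wsrep_on(SAMPLE_CNF, False))
```

The sketch only rewrites text; stopping and starting the server around the change is still done by hand as in the list above.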

      MariaDB mdb2 [pippo]> show global status like 'wsrep%';
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      | Variable_name                 | Value                                                                                                                                          |
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      | wsrep_local_state_uuid        | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0                                                                                                           |
      | wsrep_protocol_version        | -1                                                                                                                                             |
      | wsrep_last_committed          | 65                                                                                                                                             |
      | wsrep_replicated              | 0                                                                                                                                              |
      | wsrep_replicated_bytes        | 0                                                                                                                                              |
      | wsrep_repl_keys               | 0                                                                                                                                              |
      | wsrep_repl_keys_bytes         | 0                                                                                                                                              |
      | wsrep_repl_data_bytes         | 0                                                                                                                                              |
      | wsrep_repl_other_bytes        | 0                                                                                                                                              |
      | wsrep_received                | 3                                                                                                                                              |
      | wsrep_received_bytes          | 208                                                                                                                                            |
      | wsrep_local_commits           | 0                                                                                                                                              |
      | wsrep_local_cert_failures     | 0                                                                                                                                              |
      | wsrep_local_replays           | 0                                                                                                                                              |
      | wsrep_local_send_queue        | 0                                                                                                                                              |
      | wsrep_local_send_queue_max    | 1                                                                                                                                              |
      | wsrep_local_send_queue_min    | 0                                                                                                                                              |
      | wsrep_local_send_queue_avg    | 0                                                                                                                                              |
      | wsrep_local_recv_queue        | 0                                                                                                                                              |
      | wsrep_local_recv_queue_max    | 1                                                                                                                                              |
      | wsrep_local_recv_queue_min    | 0                                                                                                                                              |
      | wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
      | wsrep_local_cached_downto     | 64                                                                                                                                             |
      | wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
      | wsrep_flow_control_paused     | 0                                                                                                                                              |
      | wsrep_flow_control_sent       | 0                                                                                                                                              |
      | wsrep_flow_control_recv       | 0                                                                                                                                              |
      | wsrep_cert_deps_distance      | 0                                                                                                                                              |
      | wsrep_apply_oooe              | 0.5                                                                                                                                            |
      | wsrep_apply_oool              | 0                                                                                                                                              |
      | wsrep_apply_window            | 1.5                                                                                                                                            |
      | wsrep_commit_oooe             | 0                                                                                                                                              |
      | wsrep_commit_oool             | 0                                                                                                                                              |
      | wsrep_commit_window           | 1                                                                                                                                              |
      | wsrep_local_state             | 4                                                                                                                                              |
      | wsrep_local_state_comment     | Synced                                                                                                                                         |
      | wsrep_cert_index_size         | 0                                                                                                                                              |
      | wsrep_causal_reads            | 0                                                                                                                                              |
      | wsrep_cert_interval           | 0                                                                                                                                              |
      | wsrep_open_transactions       | 0                                                                                                                                              |
      | wsrep_open_connections        | 0                                                                                                                                              |
      | wsrep_incoming_addresses      | AUTO,10.0.1.13:3306                                                                                                                            |
      | wsrep_cluster_weight          | 2                                                                                                                                              |
      | wsrep_desync_count            | 0                                                                                                                                              |
      | wsrep_evs_delayed             |                                                                                                                                                |
      | wsrep_evs_evict_list          |                                                                                                                                                |
      | wsrep_evs_repl_latency        | 0.000325151/0.00176008/0.00607075/0.00193032/7                                                                                                 |
      | wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
      | wsrep_gcomm_uuid              | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4                                                                                                           |
      | wsrep_applier_thread_count    | 32                                                                                                                                             |
      | wsrep_cluster_capabilities    |                                                                                                                                                |
      | wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |
      | wsrep_cluster_size            | 0                                                                                                                                              |
      | wsrep_cluster_state_uuid      |                                                                                                                                                |
      | wsrep_cluster_status          | Primary                                                                                                                                        |
      | wsrep_connected               | ON                                                                                                                                             |
      | wsrep_local_bf_aborts         | 0                                                                                                                                              |
      | wsrep_local_index             | 18446744073709551615                                                                                                                           |
      | wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
      | wsrep_provider_name           | Galera                                                                                                                                         |
      | wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
      | wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
      | wsrep_ready                   | ON                                                                                                                                             |
      | wsrep_rollbacker_thread_count | 1                                                                                                                                              |
      | wsrep_thread_count            | 33                                                                                                                                             |
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      65 rows in set (0.001 sec)
      

      NOTE THAT:

      wsrep_cluster_status          | Primary
      wsrep_local_state_comment     | Synced
      wsrep_local_index             | 18446744073709551615
      wsrep_cluster_size            | 0
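
The combination noted above (Primary and Synced, yet a cluster size of 0 and a local index of 18446744073709551615, which is 2^64 - 1, the value of -1 stored as an unsigned 64-bit integer) can be detected mechanically. A minimal sketch, assuming the status is available as the text table printed by the mysql client; the function names are hypothetical and the rule is only the one described in this report, not an official Galera health check:

```python
UINT64_MAX = 2**64 - 1  # 18446744073709551615, i.e. -1 as unsigned 64-bit


def parse_status(table_text):
    """Parse the ASCII table printed by SHOW GLOBAL STATUS into a dict."""
    status = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts and the row-count footer
        cells = [c.strip() for c in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Variable_name":
            status[cells[0]] = cells[1]
    return status


def suspicious_fields(status):
    """Return the variables that contradict a Primary/Synced claim."""
    claims_healthy = (status.get("wsrep_cluster_status") == "Primary"
                      and status.get("wsrep_local_state_comment") == "Synced")
    if not claims_healthy:
        return []
    bad = []
    if status.get("wsrep_cluster_size") == "0":
        bad.append("wsrep_cluster_size")
    if status.get("wsrep_local_index") == str(UINT64_MAX):
        bad.append("wsrep_local_index")
    return bad


# Values copied from the status output in this report.
SAMPLE = """
| wsrep_local_state_comment     | Synced               |
| wsrep_cluster_size            | 0                    |
| wsrep_cluster_status          | Primary              |
| wsrep_local_index             | 18446744073709551615 |
"""

if __name__ == "__main__":
    print(suspicious_fields(parse_status(SAMPLE)))
```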
      

      According to the error log, the server is ready for connections after an IST.

      At this point a write on the 'master' mdb1 does not get replicated. On mdb2:

      MariaDB mdb2 [pippo]> select * from evento4;
      +----+---------------+--------+
      | Id | IdDispositivo | kkkk   |
      +----+---------------+--------+
      |  1 |           123 | aaaa   |
      |  3 |           222 | eeeeaa |
      |  4 |      34523452 | e4r4r4 |
      +----+---------------+--------+
      

      WHILE ON THE MASTER:

      MariaDB mdb1 [pippo]> select * from evento4;
      +----+---------------+--------+
      | Id | IdDispositivo | kkkk   |
      +----+---------------+--------+
      |  1 |           123 | aaaa   |
      |  3 |           222 | eeeeaa |
      |  4 |      34523452 | e4r4r4 |
      +----+---------------+--------+
      3 rows in set (0.001 sec)
       
      MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
      Query OK, 1 row affected (0.015 sec)
       
      MariaDB mdb1 [pippo]> select * from evento4;
      +----+---------------+--------------+
      | Id | IdDispositivo | kkkk         |
      +----+---------------+--------------+
      |  1 |           123 | aaaa         |
      |  3 |           222 | eeeeaa       |
      |  4 |      34523452 | e4r4r4       |
      |  6 |             3 | non tireplic |
      +----+---------------+--------------+
      4 rows in set (0.001 sec)
      

      The INSERT not being replicated could indeed be caused by wsrep_cluster_size=0 and wsrep_local_index=18446744073709551615.

      At this point we restart mdb2 to fix the status:

      [root@mdb2 my.cnf.d]# systemctl restart  mariadb
      [root@mdb2 my.cnf.d]# mysql
       
      MariaDB md2 [(none)]> show global status like 'wsrep%';
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      | Variable_name                 | Value                                                                                                                                          |
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      | wsrep_local_state_uuid        | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0                                                                                                           |
      | wsrep_protocol_version        | 9                                                                                                                                              |
      | wsrep_last_committed          | 66                                                                                                                                             |
      | wsrep_replicated              | 0                                                                                                                                              |
      | wsrep_replicated_bytes        | 0                                                                                                                                              |
      | wsrep_repl_keys               | 0                                                                                                                                              |
      | wsrep_repl_keys_bytes         | 0                                                                                                                                              |
      | wsrep_repl_data_bytes         | 0                                                                                                                                              |
      | wsrep_repl_other_bytes        | 0                                                                                                                                              |
      | wsrep_received                | 2                                                                                                                                              |
      | wsrep_received_bytes          | 200                                                                                                                                            |
      | wsrep_local_commits           | 0                                                                                                                                              |
      | wsrep_local_cert_failures     | 0                                                                                                                                              |
      | wsrep_local_replays           | 0                                                                                                                                              |
      | wsrep_local_send_queue        | 0                                                                                                                                              |
      | wsrep_local_send_queue_max    | 1                                                                                                                                              |
      | wsrep_local_send_queue_min    | 0                                                                                                                                              |
      | wsrep_local_send_queue_avg    | 0                                                                                                                                              |
      | wsrep_local_recv_queue        | 0                                                                                                                                              |
      | wsrep_local_recv_queue_max    | 1                                                                                                                                              |
      | wsrep_local_recv_queue_min    | 0                                                                                                                                              |
      | wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
      | wsrep_local_cached_downto     | 64                                                                                                                                             |
      | wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
      | wsrep_flow_control_paused     | 0                                                                                                                                              |
      | wsrep_flow_control_sent       | 0                                                                                                                                              |
      | wsrep_flow_control_recv       | 0                                                                                                                                              |
      | wsrep_cert_deps_distance      | 0                                                                                                                                              |
      | wsrep_apply_oooe              | 0                                                                                                                                              |
      | wsrep_apply_oool              | 0                                                                                                                                              |
      | wsrep_apply_window            | 0                                                                                                                                              |
      | wsrep_commit_oooe             | 0                                                                                                                                              |
      | wsrep_commit_oool             | 0                                                                                                                                              |
      | wsrep_commit_window           | 0                                                                                                                                              |
      | wsrep_local_state             | 4                                                                                                                                              |
      | wsrep_local_state_comment     | Synced                                                                                                                                         |
      | wsrep_cert_index_size         | 0                                                                                                                                              |
      | wsrep_causal_reads            | 0                                                                                                                                              |
      | wsrep_cert_interval           | 0                                                                                                                                              |
      | wsrep_open_transactions       | 0                                                                                                                                              |
      | wsrep_open_connections        | 0                                                                                                                                              |
      | wsrep_incoming_addresses      | 10.0.1.13:3306,AUTO                                                                                                                            |
      | wsrep_cluster_weight          | 2                                                                                                                                              |
      | wsrep_desync_count            | 0                                                                                                                                              |
      | wsrep_evs_delayed             |                                                                                                                                                |
      | wsrep_evs_evict_list          |                                                                                                                                                |
      | wsrep_evs_repl_latency        | 0.000853237/0.001923/0.00333681/0.0010427/3                                                                                                    |
      | wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
      | wsrep_gcomm_uuid              | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6                                                                                                           |
      | wsrep_applier_thread_count    | 32                                                                                                                                             |
      | wsrep_cluster_capabilities    |                                                                                                                                                |
      | wsrep_cluster_conf_id         | 6                                                                                                                                              |
      | wsrep_cluster_size            | 2                                                                                                                                              |
      | wsrep_cluster_state_uuid      | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0                                                                                                           |
      | wsrep_cluster_status          | Primary                                                                                                                                        |
      | wsrep_connected               | ON                                                                                                                                             |
      | wsrep_local_bf_aborts         | 0                                                                                                                                              |
      | wsrep_local_index             | 1                                                                                                                                              |
      | wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
      | wsrep_provider_name           | Galera                                                                                                                                         |
      | wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
      | wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
      | wsrep_ready                   | ON                                                                                                                                             |
      | wsrep_rollbacker_thread_count | 1                                                                                                                                              |
      | wsrep_thread_count            | 33                                                                                                                                             |
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      65 rows in set (0.002 sec)
      

      NOTE that the status is now OK:

      wsrep_cluster_size            | 2
      wsrep_cluster_status          | Primary
      wsrep_local_state_comment     | Synced
      wsrep_local_index             | 1
      

      However, when we check the data on mdb2, where we expect the new row to be present:

      MariaDB mdb2 [pippo]> select * from evento4;
      +----+---------------+--------+
      | Id | IdDispositivo | kkkk   |
      +----+---------------+--------+
      |  1 |           123 | aaaa   |
      |  3 |           222 | eeeeaa |
      |  4 |      34523452 | e4r4r4 |
      +----+---------------+--------+
      3 rows in set (0.001 sec)
      

      The row is not there.

      Writes made after this point are all replicated. So the data loss covers the window from the completion of the first IST until the node is restarted and regains the correct cluster status.
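
The divergence between the two nodes can be confirmed by diffing the result sets of the same SELECT. A minimal sketch, assuming the rows have already been fetched from each node as tuples; the helper name `missing_rows` is hypothetical and the data is copied from the outputs above:

```python
def missing_rows(reference_rows, candidate_rows):
    """Rows present on the reference node but absent on the candidate."""
    seen = set(candidate_rows)
    return [row for row in reference_rows if row not in seen]


# evento4 as seen on mdb1 after the INSERT.
MDB1_ROWS = [
    (1, 123, "aaaa"),
    (3, 222, "eeeeaa"),
    (4, 34523452, "e4r4r4"),
    (6, 3, "non tireplic"),
]

# evento4 as seen on mdb2 after the restart.
MDB2_ROWS = [
    (1, 123, "aaaa"),
    (3, 222, "eeeeaa"),
    (4, 34523452, "e4r4r4"),
]

if __name__ == "__main__":
    print(missing_rows(MDB1_ROWS, MDB2_ROWS))  # [(6, 3, 'non tireplic')]
```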

      Attachments

        1. 200612_mysqld.1.err
          62 kB
        2. 200612_mysqld.2.err
          121 kB
        3. 200612_mysqld.3.err
          70 kB
        4. 200709_patgal_output.zip
          15 kB
        5. 20200713_MDEV-22723_patgal_no_errors.zip
          35 kB
        6. 20200714_MDEV-22723_mdb_no_errors.zip
          32 kB
        7. 20200714_MDEV-22723_patgal_no_errors.zip
          28 kB
        8. 20200720_MDEV-22723_CentOS_7.5_no_errors.zip
          24 kB
        9. 20200723_MDEV-22723_data_loss.zip
          43 kB
        10. error_log_mdb1
          23 kB
        11. error_log_mdb2.after_upgrade
          87 kB
        12. mysqld_new.2.cnf
          2 kB
        13. mysqld_old.1.cnf
          2 kB
        14. mysqld_old.2.cnf
          2 kB
        15. mysqld_old.3.cnf
          2 kB
        16. node1_bootsrapped_10.3.23.log
          91 kB
        17. node1_bootsrapped_10.3.23.log.rtf
          93 kB
        18. node2_upgraded_10.4.13.log
          14 kB
        19. node2_upgraded.log.rtf
          14 kB
        20. server.cnf_mdb1
          2 kB
        21. server.cnf_mdb2
          2 kB

        Issue Links

          Activity

            massimo.disaro Massimo created issue -
            rpizzi Rick Pizzi (Inactive) made changes -
            | wsrep_incoming_addresses | AUTO,10.0.1.13:3306 |
            | wsrep_cluster_weight | 2 |
            | wsrep_desync_count | 0 |
            | wsrep_evs_delayed | |
            | wsrep_evs_evict_list | |
            | wsrep_evs_repl_latency | 0.000325151/0.00176008/0.00607075/0.00193032/7 |
            | wsrep_evs_state | OPERATIONAL |
            | wsrep_gcomm_uuid | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4 |
            | wsrep_applier_thread_count | 32 |
            | wsrep_cluster_capabilities | |
            | wsrep_cluster_conf_id | 18446744073709551615 |
            | wsrep_cluster_size | 0 |
            | wsrep_cluster_state_uuid | |
            | wsrep_cluster_status | Primary |
            | wsrep_connected | ON |
            | wsrep_local_bf_aborts | 0 |
            | wsrep_local_index | 18446744073709551615 |
            | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name | Galera |
            | wsrep_provider_vendor | Codership Oy <info@codership.com> |
            | wsrep_provider_version | 26.4.4(r4599) |
            | wsrep_ready | ON |
            | wsrep_rollbacker_thread_count | 1 |
            | wsrep_thread_count | 33 |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            65 rows in set (0.001 sec)

            Note that:
            wsrep_cluster_status | Primary
            wsrep_local_state_comment | Synced
            wsrep_local_index | 18446744073709551615
            wsrep_cluster_size | 0
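
The combination highlighted above (Primary and Synced, yet no cluster membership) can be checked mechanically. This is a minimal sketch: `cluster_view_problems` is a hypothetical helper, not part of MariaDB or Galera. The value 18446744073709551615 equals 2**64 - 1, which is what -1 looks like when read as an unsigned 64-bit integer; treating it as an "unset" marker is our reading, not something the report states.

```python
# Hypothetical helper, not part of MariaDB or Galera.
# 18446744073709551615 == 2**64 - 1, i.e. -1 read as an unsigned 64-bit integer.
UNSET_U64 = 2**64 - 1


def cluster_view_problems(status):
    """List the reasons a node's view of the cluster looks uninitialized."""
    problems = []
    if int(status["wsrep_cluster_size"]) == 0:
        problems.append("wsrep_cluster_size is 0")
    if int(status["wsrep_local_index"]) == UNSET_U64:
        problems.append("wsrep_local_index is unset (2**64 - 1)")
    return problems


# Values copied from the status output above.
mdb2_after_upgrade = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_local_index": "18446744073709551615",
    "wsrep_cluster_size": "0",
}

for problem in cluster_view_problems(mdb2_after_upgrade):
    print(problem)
```

A node that reports Synced but still yields entries from this check is in the state described here and probably should not be trusted to be receiving writes.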

            According to the error log, the server is ready for connections after an IST.


            At this point, a write made on the 'master' mdb1 is not replicated:

            MariaDB mdb2 [pippo]> select * from evento4;
            +----+---------------+--------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            +----+---------------+--------+

            While on the master:

            MariaDB mdb1 [pippo]> select * from evento4;
            +----+---------------+--------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            +----+---------------+--------+
            3 rows in set (0.001 sec)

            MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
            Query OK, 1 row affected (0.015 sec)

            MariaDB mdb1 [pippo]> select * from evento4;
            +----+---------------+--------------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            | 6 | 3 | non tireplic |
            +----+---------------+--------------+
            4 rows in set (0.001 sec)

            The INSERT not being replicated could indeed be caused by wsrep_cluster_size=0 and wsrep_local_index=18446744073709551615.

            At this point we restart mdb2 to fix the status:

            [root@mdb2 my.cnf.d]# systemctl restart mariadb
            [root@mdb2 my.cnf.d]# mysql

            MariaDB md2 [(none)]> show global status like 'wsrep%';
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name | Value |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
            | wsrep_protocol_version | 9 |
            | wsrep_last_committed | 66 |
            | wsrep_replicated | 0 |
            | wsrep_replicated_bytes | 0 |
            | wsrep_repl_keys | 0 |
            | wsrep_repl_keys_bytes | 0 |
            | wsrep_repl_data_bytes | 0 |
            | wsrep_repl_other_bytes | 0 |
            | wsrep_received | 2 |
            | wsrep_received_bytes | 200 |
            | wsrep_local_commits | 0 |
            | wsrep_local_cert_failures | 0 |
            | wsrep_local_replays | 0 |
            | wsrep_local_send_queue | 0 |
            | wsrep_local_send_queue_max | 1 |
            | wsrep_local_send_queue_min | 0 |
            | wsrep_local_send_queue_avg | 0 |
            | wsrep_local_recv_queue | 0 |
            | wsrep_local_recv_queue_max | 1 |
            | wsrep_local_recv_queue_min | 0 |
            | wsrep_local_recv_queue_avg | 0 |
            | wsrep_local_cached_downto | 64 |
            | wsrep_flow_control_paused_ns | 0 |
            | wsrep_flow_control_paused | 0 |
            | wsrep_flow_control_sent | 0 |
            | wsrep_flow_control_recv | 0 |
            | wsrep_cert_deps_distance | 0 |
            | wsrep_apply_oooe | 0 |
            | wsrep_apply_oool | 0 |
            | wsrep_apply_window | 0 |
            | wsrep_commit_oooe | 0 |
            | wsrep_commit_oool | 0 |
            | wsrep_commit_window | 0 |
            | wsrep_local_state | 4 |
            | wsrep_local_state_comment | Synced |
            | wsrep_cert_index_size | 0 |
            | wsrep_causal_reads | 0 |
            | wsrep_cert_interval | 0 |
            | wsrep_open_transactions | 0 |
            | wsrep_open_connections | 0 |
            | wsrep_incoming_addresses | 10.0.1.13:3306,AUTO |
            | wsrep_cluster_weight | 2 |
            | wsrep_desync_count | 0 |
            | wsrep_evs_delayed | |
            | wsrep_evs_evict_list | |
            | wsrep_evs_repl_latency | 0.000853237/0.001923/0.00333681/0.0010427/3 |
            | wsrep_evs_state | OPERATIONAL |
            | wsrep_gcomm_uuid | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6 |
            | wsrep_applier_thread_count | 32 |
            | wsrep_cluster_capabilities | |
            | wsrep_cluster_conf_id | 6 |
            | wsrep_cluster_size | 2 |
            | wsrep_cluster_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
            | wsrep_cluster_status | Primary |
            | wsrep_connected | ON |
            | wsrep_local_bf_aborts | 0 |
            | wsrep_local_index | 1 |
            | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name | Galera |
            | wsrep_provider_vendor | Codership Oy <info@codership.com> |
            | wsrep_provider_version | 26.4.4(r4599) |
            | wsrep_ready | ON |
            | wsrep_rollbacker_thread_count | 1 |
            | wsrep_thread_count | 33 |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            65 rows in set (0.002 sec)

            Note that the status is now OK:

            wsrep_cluster_size | 2
            wsrep_cluster_status | Primary
            wsrep_local_state_comment | Synced
            wsrep_local_index | 1
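
To see which values actually changed across the restart, the two status outputs can be diffed. A minimal sketch, using a subset of the values copied from the two outputs above; `changed_values` is a hypothetical helper name.

```python
def changed_values(before, after):
    """Map each key whose value differs to its (before, after) pair."""
    return {
        key: (before[key], after[key])
        for key in before
        if key in after and before[key] != after[key]
    }


# Subset of the status output on mdb2 before the restart.
before_restart = {
    "wsrep_protocol_version": "-1",
    "wsrep_last_committed": "65",
    "wsrep_cluster_conf_id": "18446744073709551615",
    "wsrep_cluster_size": "0",
    "wsrep_cluster_state_uuid": "",
    "wsrep_local_index": "18446744073709551615",
    "wsrep_local_state_comment": "Synced",
}

# The same variables after the restart.
after_restart = {
    "wsrep_protocol_version": "9",
    "wsrep_last_committed": "66",
    "wsrep_cluster_conf_id": "6",
    "wsrep_cluster_size": "2",
    "wsrep_cluster_state_uuid": "86a3014e-9e9d-11ea-8f7d-829b023fcaf0",
    "wsrep_local_index": "1",
    "wsrep_local_state_comment": "Synced",
}

for name, (old, new) in changed_values(before_restart, after_restart).items():
    print(f"{name}: {old!r} -> {new!r}")
```

In this subset, wsrep_local_state_comment is the only variable that does not change: it reads Synced in both outputs, so on its own it does not distinguish the broken state from the healthy one.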

            But when we check the data, where we expect the new row to be present:

            MariaDB mdb2 [pippo]> select * from evento4;
            +----+---------------+--------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            +----+---------------+--------+
            3 rows in set (0.001 sec)

            The row is not there.

            Writes made after this point are all replicated. So the data loss covers the window from completion of the first IST until the node is restarted and regains the correct cluster status.
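
The divergence between the two nodes can be confirmed by comparing the result sets directly. A minimal sketch with the rows copied from the SELECT outputs above; `missing_rows` is a hypothetical helper, and in practice the rows would be fetched from each node with a client library rather than typed in.

```python
def missing_rows(source_rows, replica_rows):
    """Return rows present on the source but absent on the replica, sorted by Id."""
    return sorted(set(source_rows) - set(replica_rows))


# (Id, IdDispositivo, kkkk) as returned by "select * from evento4" on mdb1.
mdb1_rows = [
    (1, 123, "aaaa"),
    (3, 222, "eeeeaa"),
    (4, 34523452, "e4r4r4"),
    (6, 3, "non tireplic"),
]

# The same query on mdb2 after the restart.
mdb2_rows = [
    (1, 123, "aaaa"),
    (3, 222, "eeeeaa"),
    (4, 34523452, "e4r4r4"),
]

print(missing_rows(mdb1_rows, mdb2_rows))
```

For large tables a checksum per chunk would be more practical than a full set difference; the set difference is used here only because it names the exact lost row.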








            massimo.disaro Massimo made changes -
            Description Creating a full galera cluster of 10.3.23 with 3 nodes
            mdb1,mdb2,mdb3 10.3.23 version.
            We gently shut mdb3 to check the interaction between writing on 10.3.23 and effect on 10.4. , to enforce IST . We also re-tested with all 3 servers up , same result.

            Create a schema and a table on mdb1. all propagate

            - stop mdb2 . yum remove the rpm of Mariadb and galera.
            - install from new repo of Mariadb 10.4 and update my.cnf to the right wsrep_provider
            - set wsrep_on=OFF on my.cnf
            - start mdb2
            - perform mysql_upgrade -s
            - stop mdb2
            - set wsrep_on=ON on my.cnf
            - start mbd2

            At this point the status galera variables on mdb2:


            MariaDB mdb2 [pippo]> show global status like 'wsrep%';
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name | Value |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
            | wsrep_protocol_version | -1 |
            | wsrep_last_committed | 65 |
            | wsrep_replicated | 0 |
            | wsrep_replicated_bytes | 0 |
            | wsrep_repl_keys | 0 |
            | wsrep_repl_keys_bytes | 0 |
            | wsrep_repl_data_bytes | 0 |
            | wsrep_repl_other_bytes | 0 |
            | wsrep_received | 3 |
            | wsrep_received_bytes | 208 |
            | wsrep_local_commits | 0 |
            | wsrep_local_cert_failures | 0 |
            | wsrep_local_replays | 0 |
            | wsrep_local_send_queue | 0 |
            | wsrep_local_send_queue_max | 1 |
            | wsrep_local_send_queue_min | 0 |
            | wsrep_local_send_queue_avg | 0 |
            | wsrep_local_recv_queue | 0 |
            | wsrep_local_recv_queue_max | 1 |
            | wsrep_local_recv_queue_min | 0 |
            | wsrep_local_recv_queue_avg | 0 |
            | wsrep_local_cached_downto | 64 |
            | wsrep_flow_control_paused_ns | 0 |
            | wsrep_flow_control_paused | 0 |
            | wsrep_flow_control_sent | 0 |
            | wsrep_flow_control_recv | 0 |
            | wsrep_cert_deps_distance | 0 |
            | wsrep_apply_oooe | 0.5 |
            | wsrep_apply_oool | 0 |
            | wsrep_apply_window | 1.5 |
            | wsrep_commit_oooe | 0 |
            | wsrep_commit_oool | 0 |
            | wsrep_commit_window | 1 |
            | wsrep_local_state | 4 |
            | wsrep_local_state_comment | Synced |
            | wsrep_cert_index_size | 0 |
            | wsrep_causal_reads | 0 |
            | wsrep_cert_interval | 0 |
            | wsrep_open_transactions | 0 |
            | wsrep_open_connections | 0 |
            | wsrep_incoming_addresses | AUTO,10.0.1.13:3306 |
            | wsrep_cluster_weight | 2 |
            | wsrep_desync_count | 0 |
            | wsrep_evs_delayed | |
            | wsrep_evs_evict_list | |
            | wsrep_evs_repl_latency | 0.000325151/0.00176008/0.00607075/0.00193032/7 |
            | wsrep_evs_state | OPERATIONAL |
            | wsrep_gcomm_uuid | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4 |
            | wsrep_applier_thread_count | 32 |
            | wsrep_cluster_capabilities | |
            | wsrep_cluster_conf_id | 18446744073709551615 |
            | wsrep_cluster_size | 0 |
            | wsrep_cluster_state_uuid | |
            | wsrep_cluster_status | Primary |
            | wsrep_connected | ON |
            | wsrep_local_bf_aborts | 0 |
            | wsrep_local_index | 18446744073709551615 |
            | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name | Galera |
            | wsrep_provider_vendor | Codership Oy <info@codership.com> |
            | wsrep_provider_version | 26.4.4(r4599) |
            | wsrep_ready | ON |
            | wsrep_rollbacker_thread_count | 1 |
            | wsrep_thread_count | 33 |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            65 rows in set (0.001 sec)

            NOTE THAT :
            wsrep_cluster_status | Primary
            wsrep_local_state_comment | Synced
            wsrep_local_index | 18446744073709551615
            wsrep_cluster_size | 0

            Looking at the error log, the server is ready for connections after a IST


            At this point the 'master' mdb1 have a write that are not getting replicate:

            MariaDB mdb2 [pippo]> select * from evento4;
            +----+---------------+--------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            +----+---------------+--------+

            WHILE ON THE MASTER:

            MariaDB mdb1 [pippo]> select * from evento4;
            +----+---------------+--------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            +----+---------------+--------+
            3 rows in set (0.001 sec)

            MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
            Query OK, 1 row affected (0.015 sec)

            MariaDB mdb1 [pippo]> select * from evento4;
            +----+---------------+--------------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            | 6 | 3 | non tireplic |
            +----+---------------+--------------+
            4 rows in set (0.001 sec)

            The fact that INSERT not getting replicate could be indeed cause the cluster_size=0 and wsrep_local_index= 18446744073709551615, obviously so

            AT THIS point we restart mdb2 to fix the status:

            [root@mdb2 my.cnf.d]# systemctl restart mariadb
            [root@mdb2 my.cnf.d]# mysql

            MariaDB md2 [(none)]> show global status like 'wsrep%';
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name | Value |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
            | wsrep_protocol_version | 9 |
            | wsrep_last_committed | 66 |
            | wsrep_replicated | 0 |
            | wsrep_replicated_bytes | 0 |
            | wsrep_repl_keys | 0 |
            | wsrep_repl_keys_bytes | 0 |
            | wsrep_repl_data_bytes | 0 |
            | wsrep_repl_other_bytes | 0 |
            | wsrep_received | 2 |
            | wsrep_received_bytes | 200 |
            | wsrep_local_commits | 0 |
            | wsrep_local_cert_failures | 0 |
            | wsrep_local_replays | 0 |
            | wsrep_local_send_queue | 0 |
            | wsrep_local_send_queue_max | 1 |
            | wsrep_local_send_queue_min | 0 |
            | wsrep_local_send_queue_avg | 0 |
            | wsrep_local_recv_queue | 0 |
            | wsrep_local_recv_queue_max | 1 |
            | wsrep_local_recv_queue_min | 0 |
            | wsrep_local_recv_queue_avg | 0 |
            | wsrep_local_cached_downto | 64 |
            | wsrep_flow_control_paused_ns | 0 |
            | wsrep_flow_control_paused | 0 |
            | wsrep_flow_control_sent | 0 |
            | wsrep_flow_control_recv | 0 |
            | wsrep_cert_deps_distance | 0 |
            | wsrep_apply_oooe | 0 |
            | wsrep_apply_oool | 0 |
            | wsrep_apply_window | 0 |
            | wsrep_commit_oooe | 0 |
            | wsrep_commit_oool | 0 |
            | wsrep_commit_window | 0 |
            | wsrep_local_state | 4 |
            | wsrep_local_state_comment | Synced |
            | wsrep_cert_index_size | 0 |
            | wsrep_causal_reads | 0 |
            | wsrep_cert_interval | 0 |
            | wsrep_open_transactions | 0 |
            | wsrep_open_connections | 0 |
            | wsrep_incoming_addresses | 10.0.1.13:3306,AUTO |
            | wsrep_cluster_weight | 2 |
            | wsrep_desync_count | 0 |
            | wsrep_evs_delayed | |
            | wsrep_evs_evict_list | |
            | wsrep_evs_repl_latency | 0.000853237/0.001923/0.00333681/0.0010427/3 |
            | wsrep_evs_state | OPERATIONAL |
            | wsrep_gcomm_uuid | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6 |
            | wsrep_applier_thread_count | 32 |
            | wsrep_cluster_capabilities | |
            | wsrep_cluster_conf_id | 6 |
            | wsrep_cluster_size | 2 |
            | wsrep_cluster_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
            | wsrep_cluster_status | Primary |
            | wsrep_connected | ON |
            | wsrep_local_bf_aborts | 0 |
            | wsrep_local_index | 1 |
            | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name | Galera |
            | wsrep_provider_vendor | Codership Oy <info@codership.com> |
            | wsrep_provider_version | 26.4.4(r4599) |
            | wsrep_ready | ON |
            | wsrep_rollbacker_thread_count | 1 |
            | wsrep_thread_count | 33 |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            65 rows in set (0.002 sec)

            NOTE: the status is now OK:

            wsrep_cluster_size | 2
            wsrep_cluster_status | Primary
            wsrep_local_state_comment | Synced
            wsrep_local_index | 1

            But when we check the data, where we expect the new row to be present:

            MariaDB mdb2 [pippo]> select * from evento4;
            +----+---------------+--------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            +----+---------------+--------+
            3 rows in set (0.001 sec)

            The row is not there.

            Writes made after this point are replicated. So the data loss covers the window from the completion of the first IST until mdb2 is restarted and regains the correct cluster status.








            massimo.disaro Massimo made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            rpizzi Rick Pizzi (Inactive) added a comment - Looks related to https://jira.mariadb.org/browse/MDEV-19983
            serg Sergei Golubchik made changes -
            Description
            We created a full Galera cluster of 3 nodes (mdb1, mdb2, mdb3), all running 10.3.23.
            We gracefully shut down mdb3 to check how writes on 10.3.23 affect a 10.4 node, and to enforce IST. We also re-tested with all 3 servers up, with the same result.

            Create a schema and a table on mdb1. Everything propagates.

            - stop mdb2, then yum remove the MariaDB and Galera RPMs
            - install MariaDB 10.4 from the new repo and update my.cnf to point to the right wsrep_provider
            - set wsrep_on=OFF in my.cnf
            - start mdb2
            - run mysql_upgrade -s
            - stop mdb2
            - set wsrep_on=ON in my.cnf
            - start mdb2

            At this point, the Galera status variables on mdb2 are:

            {noformat}
            MariaDB mdb2 [pippo]> show global status like 'wsrep%';
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name | Value |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
            | wsrep_protocol_version | -1 |
            | wsrep_last_committed | 65 |
            | wsrep_replicated | 0 |
            | wsrep_replicated_bytes | 0 |
            | wsrep_repl_keys | 0 |
            | wsrep_repl_keys_bytes | 0 |
            | wsrep_repl_data_bytes | 0 |
            | wsrep_repl_other_bytes | 0 |
            | wsrep_received | 3 |
            | wsrep_received_bytes | 208 |
            | wsrep_local_commits | 0 |
            | wsrep_local_cert_failures | 0 |
            | wsrep_local_replays | 0 |
            | wsrep_local_send_queue | 0 |
            | wsrep_local_send_queue_max | 1 |
            | wsrep_local_send_queue_min | 0 |
            | wsrep_local_send_queue_avg | 0 |
            | wsrep_local_recv_queue | 0 |
            | wsrep_local_recv_queue_max | 1 |
            | wsrep_local_recv_queue_min | 0 |
            | wsrep_local_recv_queue_avg | 0 |
            | wsrep_local_cached_downto | 64 |
            | wsrep_flow_control_paused_ns | 0 |
            | wsrep_flow_control_paused | 0 |
            | wsrep_flow_control_sent | 0 |
            | wsrep_flow_control_recv | 0 |
            | wsrep_cert_deps_distance | 0 |
            | wsrep_apply_oooe | 0.5 |
            | wsrep_apply_oool | 0 |
            | wsrep_apply_window | 1.5 |
            | wsrep_commit_oooe | 0 |
            | wsrep_commit_oool | 0 |
            | wsrep_commit_window | 1 |
            | wsrep_local_state | 4 |
            | wsrep_local_state_comment | Synced |
            | wsrep_cert_index_size | 0 |
            | wsrep_causal_reads | 0 |
            | wsrep_cert_interval | 0 |
            | wsrep_open_transactions | 0 |
            | wsrep_open_connections | 0 |
            | wsrep_incoming_addresses | AUTO,10.0.1.13:3306 |
            | wsrep_cluster_weight | 2 |
            | wsrep_desync_count | 0 |
            | wsrep_evs_delayed | |
            | wsrep_evs_evict_list | |
            | wsrep_evs_repl_latency | 0.000325151/0.00176008/0.00607075/0.00193032/7 |
            | wsrep_evs_state | OPERATIONAL |
            | wsrep_gcomm_uuid | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4 |
            | wsrep_applier_thread_count | 32 |
            | wsrep_cluster_capabilities | |
            | wsrep_cluster_conf_id | 18446744073709551615 |
            | wsrep_cluster_size | 0 |
            | wsrep_cluster_state_uuid | |
            | wsrep_cluster_status | Primary |
            | wsrep_connected | ON |
            | wsrep_local_bf_aborts | 0 |
            | wsrep_local_index | 18446744073709551615 |
            | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name | Galera |
            | wsrep_provider_vendor | Codership Oy <info@codership.com> |
            | wsrep_provider_version | 26.4.4(r4599) |
            | wsrep_ready | ON |
            | wsrep_rollbacker_thread_count | 1 |
            | wsrep_thread_count | 33 |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            65 rows in set (0.001 sec)
            {noformat}
            NOTE THAT:
            {noformat}
            wsrep_cluster_status | Primary
            wsrep_local_state_comment | Synced
            wsrep_local_index | 18446744073709551615
            wsrep_cluster_size | 0
            {noformat}
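
            The four values above contradict each other: the node claims to be Primary and Synced while reporting a cluster size of 0 and a local index of 18446744073709551615, which is 2^64 - 1 (that is, -1 stored in an unsigned 64-bit field). A minimal sketch of a check that flags this combination from pasted SHOW GLOBAL STATUS output follows. The function names and the choice of which variables to inspect are assumptions made for illustration; they are not part of MariaDB or Galera.

```python
# 18446744073709551615 == 2**64 - 1, i.e. -1 stored in an unsigned 64-bit field.
UINT64_MAX = 2**64 - 1


def parse_status(text):
    """Parse 'name | value' lines (with or without outer pipes) into a dict."""
    status = {}
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0].startswith("wsrep_"):
            status[cells[0]] = cells[1]
    return status


def inconsistent_view(status):
    """List the reasons why a node claiming Primary/Synced looks inconsistent."""
    reasons = []
    claims_healthy = (
        status.get("wsrep_cluster_status") == "Primary"
        and status.get("wsrep_local_state_comment") == "Synced"
    )
    if not claims_healthy:
        return reasons
    if status.get("wsrep_cluster_size") == "0":
        reasons.append("wsrep_cluster_size is 0")
    if status.get("wsrep_local_index") == str(UINT64_MAX):
        reasons.append("wsrep_local_index is 2**64 - 1")
    if status.get("wsrep_protocol_version") == "-1":
        reasons.append("wsrep_protocol_version is -1")
    return reasons


# Values copied from the report (mdb2 after the first start with wsrep_on=ON).
before_restart = """
wsrep_cluster_status | Primary
wsrep_local_state_comment | Synced
wsrep_local_index | 18446744073709551615
wsrep_cluster_size | 0
"""
print(inconsistent_view(parse_status(before_restart)))
```

            The check only reads the status text, so it can be run against output captured from any node without connecting to the server.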
            According to the error log, the server is ready for connections after an IST.

            At this point, a write on the 'master' mdb1 is not replicated to mdb2:
            {noformat}
            MariaDB mdb2 [pippo]> select * from evento4;
            +----+---------------+--------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            +----+---------------+--------+
            {noformat}
            WHILE ON THE MASTER:
            {noformat}
            MariaDB mdb1 [pippo]> select * from evento4;
            +----+---------------+--------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            +----+---------------+--------+
            3 rows in set (0.001 sec)

            MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
            Query OK, 1 row affected (0.015 sec)

            MariaDB mdb1 [pippo]> select * from evento4;
            +----+---------------+--------------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            | 6 | 3 | non tireplic |
            +----+---------------+--------------+
            4 rows in set (0.001 sec)
            {noformat}
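
            The two result sets above differ by exactly one row. As a rough illustration of how such a divergence could be detected mechanically, the sketch below compares rows fetched from both nodes by primary key. The row tuples are copied from the output above; the helper name and the assumption that Id is the first column and the primary key are hypothetical.

```python
def missing_rows(donor_rows, joiner_rows, key=0):
    """Return donor rows whose key column value is absent on the joiner."""
    joiner_keys = {row[key] for row in joiner_rows}
    return [row for row in donor_rows if row[key] not in joiner_keys]


# Rows of evento4 as (Id, IdDispositivo, kkkk), taken from the report.
mdb1_rows = [
    (1, 123, "aaaa"),
    (3, 222, "eeeeaa"),
    (4, 34523452, "e4r4r4"),
    (6, 3, "non tireplic"),
]
mdb2_rows = [
    (1, 123, "aaaa"),
    (3, 222, "eeeeaa"),
    (4, 34523452, "e4r4r4"),
]

print(missing_rows(mdb1_rows, mdb2_rows))
```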
            The INSERT not being replicated could indeed be caused by wsrep_cluster_size=0 and wsrep_local_index=18446744073709551615 on mdb2.

            At this point we restart mdb2 to fix the status:
            {noformat}
            [root@mdb2 my.cnf.d]# systemctl restart mariadb
            [root@mdb2 my.cnf.d]# mysql

            MariaDB md2 [(none)]> show global status like 'wsrep%';
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name | Value |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
            | wsrep_protocol_version | 9 |
            | wsrep_last_committed | 66 |
            | wsrep_replicated | 0 |
            | wsrep_replicated_bytes | 0 |
            | wsrep_repl_keys | 0 |
            | wsrep_repl_keys_bytes | 0 |
            | wsrep_repl_data_bytes | 0 |
            | wsrep_repl_other_bytes | 0 |
            | wsrep_received | 2 |
            | wsrep_received_bytes | 200 |
            | wsrep_local_commits | 0 |
            | wsrep_local_cert_failures | 0 |
            | wsrep_local_replays | 0 |
            | wsrep_local_send_queue | 0 |
            | wsrep_local_send_queue_max | 1 |
            | wsrep_local_send_queue_min | 0 |
            | wsrep_local_send_queue_avg | 0 |
            | wsrep_local_recv_queue | 0 |
            | wsrep_local_recv_queue_max | 1 |
            | wsrep_local_recv_queue_min | 0 |
            | wsrep_local_recv_queue_avg | 0 |
            | wsrep_local_cached_downto | 64 |
            | wsrep_flow_control_paused_ns | 0 |
            | wsrep_flow_control_paused | 0 |
            | wsrep_flow_control_sent | 0 |
            | wsrep_flow_control_recv | 0 |
            | wsrep_cert_deps_distance | 0 |
            | wsrep_apply_oooe | 0 |
            | wsrep_apply_oool | 0 |
            | wsrep_apply_window | 0 |
            | wsrep_commit_oooe | 0 |
            | wsrep_commit_oool | 0 |
            | wsrep_commit_window | 0 |
            | wsrep_local_state | 4 |
            | wsrep_local_state_comment | Synced |
            | wsrep_cert_index_size | 0 |
            | wsrep_causal_reads | 0 |
            | wsrep_cert_interval | 0 |
            | wsrep_open_transactions | 0 |
            | wsrep_open_connections | 0 |
            | wsrep_incoming_addresses | 10.0.1.13:3306,AUTO |
            | wsrep_cluster_weight | 2 |
            | wsrep_desync_count | 0 |
            | wsrep_evs_delayed | |
            | wsrep_evs_evict_list | |
            | wsrep_evs_repl_latency | 0.000853237/0.001923/0.00333681/0.0010427/3 |
            | wsrep_evs_state | OPERATIONAL |
            | wsrep_gcomm_uuid | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6 |
            | wsrep_applier_thread_count | 32 |
            | wsrep_cluster_capabilities | |
            | wsrep_cluster_conf_id | 6 |
            | wsrep_cluster_size | 2 |
            | wsrep_cluster_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
            | wsrep_cluster_status | Primary |
            | wsrep_connected | ON |
            | wsrep_local_bf_aborts | 0 |
            | wsrep_local_index | 1 |
            | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name | Galera |
            | wsrep_provider_vendor | Codership Oy <info@codership.com> |
            | wsrep_provider_version | 26.4.4(r4599) |
            | wsrep_ready | ON |
            | wsrep_rollbacker_thread_count | 1 |
            | wsrep_thread_count | 33 |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            65 rows in set (0.002 sec)
            {noformat}
            Note that the status is now OK:

            {noformat}
            wsrep_cluster_size | 2
            wsrep_cluster_status | Primary
            wsrep_local_state_comment | Synced
            wsrep_local_index | 1
            {noformat}
            But when we check the data, where we expect the new row to be present:
            {noformat}
            MariaDB mdb2 [pippo]> select * from evento4;
            +----+---------------+--------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            +----+---------------+--------+
            3 rows in set (0.001 sec)
            {noformat}
            The row is not there.

            Writes made after this point are all replicated. So the data loss covers the window from the completion of the first IST until the node is restarted and regains the correct cluster status.
            elenst Elena Stepanova made changes -
            Fix Version/s 10.3 [ 22126 ]
            Assignee Jan Lindström [ jplindst ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Stepan Patryshev [ stepan.patryshev ]
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Description Creating a full galera cluster of 10.3.23 with 3 nodes
            mdb1, mdb2 and mdb3, all on version 10.3.23.
            We gracefully shut down mdb3 to check how writes on 10.3.23 affect 10.4.13 and to force an IST. We also re-tested with all 3 servers up, with the same result.

            Create a schema and a table on mdb1. Everything propagates.

            - Stop mdb2. Remove the MariaDB and Galera RPMs with yum.
            - Install MariaDB 10.4 from the new repo and update my.cnf to point to the right wsrep_provider.
            - Set wsrep_on=OFF in my.cnf.
            - Start mdb2.
            - Run mysql_upgrade -s.
            - Stop mdb2.
            - Set wsrep_on=ON in my.cnf.
            - Start mdb2.

            At this point the Galera status variables on mdb2 are:

            {noformat}
            MariaDB mdb2 [pippo]> show global status like 'wsrep%';
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name | Value |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
            | wsrep_protocol_version | -1 |
            | wsrep_last_committed | 65 |
            | wsrep_replicated | 0 |
            | wsrep_replicated_bytes | 0 |
            | wsrep_repl_keys | 0 |
            | wsrep_repl_keys_bytes | 0 |
            | wsrep_repl_data_bytes | 0 |
            | wsrep_repl_other_bytes | 0 |
            | wsrep_received | 3 |
            | wsrep_received_bytes | 208 |
            | wsrep_local_commits | 0 |
            | wsrep_local_cert_failures | 0 |
            | wsrep_local_replays | 0 |
            | wsrep_local_send_queue | 0 |
            | wsrep_local_send_queue_max | 1 |
            | wsrep_local_send_queue_min | 0 |
            | wsrep_local_send_queue_avg | 0 |
            | wsrep_local_recv_queue | 0 |
            | wsrep_local_recv_queue_max | 1 |
            | wsrep_local_recv_queue_min | 0 |
            | wsrep_local_recv_queue_avg | 0 |
            | wsrep_local_cached_downto | 64 |
            | wsrep_flow_control_paused_ns | 0 |
            | wsrep_flow_control_paused | 0 |
            | wsrep_flow_control_sent | 0 |
            | wsrep_flow_control_recv | 0 |
            | wsrep_cert_deps_distance | 0 |
            | wsrep_apply_oooe | 0.5 |
            | wsrep_apply_oool | 0 |
            | wsrep_apply_window | 1.5 |
            | wsrep_commit_oooe | 0 |
            | wsrep_commit_oool | 0 |
            | wsrep_commit_window | 1 |
            | wsrep_local_state | 4 |
            | wsrep_local_state_comment | Synced |
            | wsrep_cert_index_size | 0 |
            | wsrep_causal_reads | 0 |
            | wsrep_cert_interval | 0 |
            | wsrep_open_transactions | 0 |
            | wsrep_open_connections | 0 |
            | wsrep_incoming_addresses | AUTO,10.0.1.13:3306 |
            | wsrep_cluster_weight | 2 |
            | wsrep_desync_count | 0 |
            | wsrep_evs_delayed | |
            | wsrep_evs_evict_list | |
            | wsrep_evs_repl_latency | 0.000325151/0.00176008/0.00607075/0.00193032/7 |
            | wsrep_evs_state | OPERATIONAL |
            | wsrep_gcomm_uuid | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4 |
            | wsrep_applier_thread_count | 32 |
            | wsrep_cluster_capabilities | |
            | wsrep_cluster_conf_id | 18446744073709551615 |
            | wsrep_cluster_size | 0 |
            | wsrep_cluster_state_uuid | |
            | wsrep_cluster_status | Primary |
            | wsrep_connected | ON |
            | wsrep_local_bf_aborts | 0 |
            | wsrep_local_index | 18446744073709551615 |
            | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name | Galera |
            | wsrep_provider_vendor | Codership Oy <info@codership.com> |
            | wsrep_provider_version | 26.4.4(r4599) |
            | wsrep_ready | ON |
            | wsrep_rollbacker_thread_count | 1 |
            | wsrep_thread_count | 33 |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            65 rows in set (0.001 sec)
            {noformat}
            NOTE THAT :
            {noformat}
            wsrep_cluster_status | Primary
            wsrep_local_state_comment | Synced
            wsrep_local_index | 18446744073709551615
            wsrep_cluster_size | 0
            {noformat}
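The value 18446744073709551615 is 2^64 - 1, which is what -1 looks like when printed as an unsigned 64-bit integer, so the index was most likely never set. A minimal sketch of a sanity check over the status variables, assuming they have already been fetched into a dict of strings; the function name and the choice of checks are illustrative only and are not part of MariaDB or Galera:

```python
UINT64_MAX = 2**64 - 1  # 18446744073709551615, i.e. -1 printed as unsigned 64-bit


def wsrep_status_problems(status):
    """Return a list of reasons why a node's wsrep status looks inconsistent.

    `status` maps variable names to string values, in the shape returned by
    SHOW GLOBAL STATUS LIKE 'wsrep%'. The checks are assumptions based on
    the values seen in this report, not an official health definition.
    """
    problems = []
    if int(status.get("wsrep_cluster_size", "0")) < 1:
        problems.append("wsrep_cluster_size is 0")
    if int(status.get("wsrep_local_index", str(UINT64_MAX))) == UINT64_MAX:
        problems.append("wsrep_local_index is unset (2^64 - 1)")
    if int(status.get("wsrep_protocol_version", "-1")) < 0:
        problems.append("wsrep_protocol_version is negative")
    if not status.get("wsrep_cluster_state_uuid"):
        problems.append("wsrep_cluster_state_uuid is empty")
    return problems


# Values copied from the status output taken right after the upgrade.
after_upgrade = {
    "wsrep_cluster_size": "0",
    "wsrep_local_index": "18446744073709551615",
    "wsrep_protocol_version": "-1",
    "wsrep_cluster_state_uuid": "",
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
}
# Values copied from the status output taken after restarting mdb2.
after_restart = {
    "wsrep_cluster_size": "2",
    "wsrep_local_index": "1",
    "wsrep_protocol_version": "9",
    "wsrep_cluster_state_uuid": "86a3014e-9e9d-11ea-8f7d-829b023fcaf0",
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
}

print(wsrep_status_problems(after_upgrade))
print(wsrep_status_problems(after_restart))
```

wsrep_cluster_status = Primary and wsrep_local_state_comment = Synced are reported in both cases, so checking those two variables alone would not catch the problem.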
            According to the error log, the server is ready for connections after an IST.

            At this point a write on the 'master' mdb1 is not replicated to mdb2:
            {noformat}
            MariaDB mdb2 [pippo]> select * from evento4;
            +----+---------------+--------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            +----+---------------+--------+
            {noformat}
            WHILE ON THE MASTER:
            {noformat}
            MariaDB mdb1 [pippo]> select * from evento4;
            +----+---------------+--------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            +----+---------------+--------+
            3 rows in set (0.001 sec)

            MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
            Query OK, 1 row affected (0.015 sec)

            MariaDB mdb1 [pippo]> select * from evento4;
            +----+---------------+--------------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            | 6 | 3 | non tireplic |
            +----+---------------+--------------+
            4 rows in set (0.001 sec)
            {noformat}
            The INSERT is most likely not replicated because of wsrep_cluster_size = 0 and wsrep_local_index = 18446744073709551615.

            At this point we restart mdb2 to fix the status:
            {noformat}
            [root@mdb2 my.cnf.d]# systemctl restart mariadb
            [root@mdb2 my.cnf.d]# mysql

            MariaDB md2 [(none)]> show global status like 'wsrep%';
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name | Value |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
            | wsrep_protocol_version | 9 |
            | wsrep_last_committed | 66 |
            | wsrep_replicated | 0 |
            | wsrep_replicated_bytes | 0 |
            | wsrep_repl_keys | 0 |
            | wsrep_repl_keys_bytes | 0 |
            | wsrep_repl_data_bytes | 0 |
            | wsrep_repl_other_bytes | 0 |
            | wsrep_received | 2 |
            | wsrep_received_bytes | 200 |
            | wsrep_local_commits | 0 |
            | wsrep_local_cert_failures | 0 |
            | wsrep_local_replays | 0 |
            | wsrep_local_send_queue | 0 |
            | wsrep_local_send_queue_max | 1 |
            | wsrep_local_send_queue_min | 0 |
            | wsrep_local_send_queue_avg | 0 |
            | wsrep_local_recv_queue | 0 |
            | wsrep_local_recv_queue_max | 1 |
            | wsrep_local_recv_queue_min | 0 |
            | wsrep_local_recv_queue_avg | 0 |
            | wsrep_local_cached_downto | 64 |
            | wsrep_flow_control_paused_ns | 0 |
            | wsrep_flow_control_paused | 0 |
            | wsrep_flow_control_sent | 0 |
            | wsrep_flow_control_recv | 0 |
            | wsrep_cert_deps_distance | 0 |
            | wsrep_apply_oooe | 0 |
            | wsrep_apply_oool | 0 |
            | wsrep_apply_window | 0 |
            | wsrep_commit_oooe | 0 |
            | wsrep_commit_oool | 0 |
            | wsrep_commit_window | 0 |
            | wsrep_local_state | 4 |
            | wsrep_local_state_comment | Synced |
            | wsrep_cert_index_size | 0 |
            | wsrep_causal_reads | 0 |
            | wsrep_cert_interval | 0 |
            | wsrep_open_transactions | 0 |
            | wsrep_open_connections | 0 |
            | wsrep_incoming_addresses | 10.0.1.13:3306,AUTO |
            | wsrep_cluster_weight | 2 |
            | wsrep_desync_count | 0 |
            | wsrep_evs_delayed | |
            | wsrep_evs_evict_list | |
            | wsrep_evs_repl_latency | 0.000853237/0.001923/0.00333681/0.0010427/3 |
            | wsrep_evs_state | OPERATIONAL |
            | wsrep_gcomm_uuid | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6 |
            | wsrep_applier_thread_count | 32 |
            | wsrep_cluster_capabilities | |
            | wsrep_cluster_conf_id | 6 |
            | wsrep_cluster_size | 2 |
            | wsrep_cluster_state_uuid | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0 |
            | wsrep_cluster_status | Primary |
            | wsrep_connected | ON |
            | wsrep_local_bf_aborts | 0 |
            | wsrep_local_index | 1 |
            | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name | Galera |
            | wsrep_provider_vendor | Codership Oy <info@codership.com> |
            | wsrep_provider_version | 26.4.4(r4599) |
            | wsrep_ready | ON |
            | wsrep_rollbacker_thread_count | 1 |
            | wsrep_thread_count | 33 |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            65 rows in set (0.002 sec)
            {noformat}
            Note that the status is now OK:

            {noformat}
            wsrep_cluster_size | 2
            wsrep_cluster_status | Primary
            wsrep_local_state_comment | Synced
            wsrep_local_index | 1
            {noformat}
            But when we check the data, where we expect the new row to be present:
            {noformat}
            MariaDB mdb2 [pippo]> select * from evento4;
            +----+---------------+--------+
            | Id | IdDispositivo | kkkk |
            +----+---------------+--------+
            | 1 | 123 | aaaa |
            | 3 | 222 | eeeeaa |
            | 4 | 34523452 | e4r4r4 |
            +----+---------------+--------+
            3 rows in set (0.001 sec)
            {noformat}
            The row is not there.

            Writes made after this point are all replicated. So the data loss covers the window from the completion of the first IST until the node is restarted and regains the correct cluster status.
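One way to find the rows lost in that window is to compare primary keys between the two nodes. A minimal sketch, assuming the rows of evento4 have already been fetched from each node as (Id, IdDispositivo, kkkk) tuples; the helper name is hypothetical and the fetching itself is left out:

```python
def missing_rows(source_rows, target_rows):
    """Return rows present on the source node but absent on the target node,
    matched by primary key (the first column of each row)."""
    target_ids = {row[0] for row in target_rows}
    return [row for row in source_rows if row[0] not in target_ids]


# Rows as shown in the SELECT outputs of this report.
mdb1_rows = [
    (1, 123, "aaaa"),
    (3, 222, "eeeeaa"),
    (4, 34523452, "e4r4r4"),
    (6, 3, "non tireplic"),
]
mdb2_rows = [
    (1, 123, "aaaa"),
    (3, 222, "eeeeaa"),
    (4, 34523452, "e4r4r4"),
]

# Print every row that exists on mdb1 but never reached mdb2.
for row in missing_rows(mdb1_rows, mdb2_rows):
    print(row)
```

Matching on the primary key only is a deliberate simplification: it finds rows that never arrived, but not rows whose contents differ between nodes.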
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Attachment 200612_mysqld.1.err [ 52173 ]
            Attachment 200612_mysqld.2.err [ 52174 ]
            Attachment 200612_mysqld.3.err [ 52175 ]
            Attachment mysqld_new.2.cnf [ 52176 ]
            Attachment mysqld_old.3.cnf [ 52177 ]
            Attachment mysqld_old.2.cnf [ 52178 ]
            Attachment mysqld_old.1.cnf [ 52179 ]

            I have managed to reproduce it only partially. I have not observed any data loss during a node upgrade. But I got these strange values: wsrep_local_index = 18446744073709551615 and wsrep_cluster_size = 0.

            Release builds 10.3.23 + Galera 25.3.29(rb0f34b0) and 10.4.13 + Galera 26.4.4(rae24803).

            Steps:

            1. ./mtr --suite=galera_3nodes --start-and-exit
            2. Restart all nodes one by one with separate config files: Node1, Node2, Node3.
            3. create table evento4 (Id int primary key auto_increment, IdDispositivo int, kkkk varchar(255));
            4. insert into evento4(IdDispositivo, kkkk) values(123, 'aaaa');
            insert into evento4(IdDispositivo, kkkk) values(222, 'eeeeaa');
            insert into evento4(IdDispositivo, kkkk) values(34523452, 'e4r4r4 ');
            5. Stop Node 2.
            6. Set wsrep-on=OFF and run Node 2 on 10.4.13 binaries with Node2 new config.
            7. Perform mysql_upgrade -s.
            8. Stop Node 2.
            9. Node 3: insert into evento4(IdDispositivo, kkkk) values(777777, 'While Node 2 was upgrading');
            select * from evento4;

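            +----+---------------+----------------------------+
            | Id | IdDispositivo | kkkk                       |
            +----+---------------+----------------------------+
            |  2 |           123 | aaaa                       |
            |  5 |           222 | eeeeaa                     |
            |  8 |      34523452 | e4r4r4                     |
            | 10 |        777777 | While Node 2 was upgrading |
            +----+---------------+----------------------------+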

            10. Start Node 2 with wsrep-on=ON.

            11. New data appeared on Node 2:
            select * from evento4;

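            +----+---------------+----------------------------+
            | Id | IdDispositivo | kkkk                       |
            +----+---------------+----------------------------+
            |  2 |           123 | aaaa                       |
            |  5 |           222 | eeeeaa                     |
            |  8 |      34523452 | e4r4r4                     |
            | 10 |        777777 | While Node 2 was upgrading |
            +----+---------------+----------------------------+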

            But:

            show global status like 'wsrep%';
             
             
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name                 | Value                                                                                                                                          |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid        | be36cf8b-acb6-11ea-aa2c-e3149c2ff908                                                                                                           |
            | wsrep_protocol_version        | 9                                                                                                                                              |
            | wsrep_last_committed          | 6                                                                                                                                              |
            | wsrep_replicated              | 0                                                                                                                                              |
            | wsrep_replicated_bytes        | 0                                                                                                                                              |
            | wsrep_repl_keys               | 0                                                                                                                                              |
            | wsrep_repl_keys_bytes         | 0                                                                                                                                              |
            | wsrep_repl_data_bytes         | 0                                                                                                                                              |
            | wsrep_repl_other_bytes        | 0                                                                                                                                              |
            | wsrep_received                | 3                                                                                                                                              |
            | wsrep_received_bytes          | 288                                                                                                                                            |
            | wsrep_local_commits           | 0                                                                                                                                              |
            | wsrep_local_cert_failures     | 0                                                                                                                                              |
            | wsrep_local_replays           | 0                                                                                                                                              |
            | wsrep_local_send_queue        | 0                                                                                                                                              |
            | wsrep_local_send_queue_max    | 1                                                                                                                                              |
            | wsrep_local_send_queue_min    | 0                                                                                                                                              |
            | wsrep_local_send_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_recv_queue        | 0                                                                                                                                              |
            | wsrep_local_recv_queue_max    | 1                                                                                                                                              |
            | wsrep_local_recv_queue_min    | 0                                                                                                                                              |
            | wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_cached_downto     | 6                                                                                                                                              |
            | wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
            | wsrep_flow_control_paused     | 0                                                                                                                                              |
            | wsrep_flow_control_sent       | 0                                                                                                                                              |
            | wsrep_flow_control_recv       | 0                                                                                                                                              |
            | wsrep_cert_deps_distance      | 0                                                                                                                                              |
            | wsrep_apply_oooe              | 0                                                                                                                                              |
            | wsrep_apply_oool              | 0                                                                                                                                              |
            | wsrep_apply_window            | 1                                                                                                                                              |
            | wsrep_commit_oooe             | 0                                                                                                                                              |
            | wsrep_commit_oool             | 0                                                                                                                                              |
            | wsrep_commit_window           | 1                                                                                                                                              |
            | wsrep_local_state             | 4                                                                                                                                              |
            | wsrep_local_state_comment     | Synced                                                                                                                                         |
            | wsrep_cert_index_size         | 0                                                                                                                                              |
            | wsrep_causal_reads            | 0                                                                                                                                              |
            | wsrep_cert_interval           | 0                                                                                                                                              |
            | wsrep_open_transactions       | 0                                                                                                                                              |
            | wsrep_open_connections        | 0                                                                                                                                              |
            | wsrep_incoming_addresses      | 127.0.0.1:16002,127.0.0.1:16000,127.0.0.1:16001                                                                                                |
            | wsrep_cluster_weight          | 3                                                                                                                                              |
            | wsrep_desync_count            | 0                                                                                                                                              |
            | wsrep_evs_delayed             |                                                                                                                                                |
            | wsrep_evs_evict_list          |                                                                                                                                                |
            | wsrep_evs_repl_latency        | 0.000293552/0.000366098/0.000521759/7.98882e-05/5                                                                                              |
            | wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
            | wsrep_gcomm_uuid              | e05a4078-acc3-11ea-9394-8ba782d6f291                                                                                                           |
            | wsrep_applier_thread_count    | 32                                                                                                                                             |
            | wsrep_cluster_capabilities    |                                                                                                                                                |
            | wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |
            | wsrep_cluster_size            | 0                                                                                                                                              |
            | wsrep_cluster_state_uuid      |                                                                                                                                                |
            | wsrep_cluster_status          | Primary                                                                                                                                        |
            | wsrep_connected               | ON                                                                                                                                             |
            | wsrep_local_bf_aborts         | 0                                                                                                                                              |
            | wsrep_local_index             | 18446744073709551615                                                                                                                           |
            | wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name           | Galera                                                                                                                                         |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
            | wsrep_provider_version        | 26.4.4(rae24803)                                                                                                                               |
            | wsrep_ready                   | ON                                                                                                                                             |
            | wsrep_rollbacker_thread_count | 1                                                                                                                                              |
            | wsrep_thread_count            | 33                                                                                                                                             |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            65 rows in set (0.001 sec)
            

            wsrep_cluster_status Primary
            wsrep_local_state_comment Synced
            wsrep_local_index 18446744073709551615
            wsrep_cluster_size 0

            12. On Node 3: insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
            13. New data are replicated to Node 2:
            select * from evento4;

            Id IdDispositivo kkkk
            2 123 aaaa
            5 222 eeeeaa
            8 34523452 e4r4r4
            10 777777 While Node 2 was upgrading
            13 3 non tireplic

            14. Restart Node 2.
            15. On Node 2:

            show global status like 'wsrep%';
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name                 | Value                                                                                                                                          |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid        | be36cf8b-acb6-11ea-aa2c-e3149c2ff908                                                                                                           |
            | wsrep_protocol_version        | 9                                                                                                                                              |
            | wsrep_last_committed          | 7                                                                                                                                              |
            | wsrep_replicated              | 0                                                                                                                                              |
            | wsrep_replicated_bytes        | 0                                                                                                                                              |
            | wsrep_repl_keys               | 0                                                                                                                                              |
            | wsrep_repl_keys_bytes         | 0                                                                                                                                              |
            | wsrep_repl_data_bytes         | 0                                                                                                                                              |
            | wsrep_repl_other_bytes        | 0                                                                                                                                              |
            | wsrep_received                | 2                                                                                                                                              |
            | wsrep_received_bytes          | 280                                                                                                                                            |
            | wsrep_local_commits           | 0                                                                                                                                              |
            | wsrep_local_cert_failures     | 0                                                                                                                                              |
            | wsrep_local_replays           | 0                                                                                                                                              |
            | wsrep_local_send_queue        | 0                                                                                                                                              |
            | wsrep_local_send_queue_max    | 1                                                                                                                                              |
            | wsrep_local_send_queue_min    | 0                                                                                                                                              |
            | wsrep_local_send_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_recv_queue        | 0                                                                                                                                              |
            | wsrep_local_recv_queue_max    | 1                                                                                                                                              |
            | wsrep_local_recv_queue_min    | 0                                                                                                                                              |
            | wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_cached_downto     | 6                                                                                                                                              |
            | wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
            | wsrep_flow_control_paused     | 0                                                                                                                                              |
            | wsrep_flow_control_sent       | 0                                                                                                                                              |
            | wsrep_flow_control_recv       | 0                                                                                                                                              |
            | wsrep_cert_deps_distance      | 0                                                                                                                                              |
            | wsrep_apply_oooe              | 0                                                                                                                                              |
            | wsrep_apply_oool              | 0                                                                                                                                              |
            | wsrep_apply_window            | 0                                                                                                                                              |
            | wsrep_commit_oooe             | 0                                                                                                                                              |
            | wsrep_commit_oool             | 0                                                                                                                                              |
            | wsrep_commit_window           | 0                                                                                                                                              |
            | wsrep_local_state             | 4                                                                                                                                              |
            | wsrep_local_state_comment     | Synced                                                                                                                                         |
            | wsrep_cert_index_size         | 0                                                                                                                                              |
            | wsrep_causal_reads            | 0                                                                                                                                              |
            | wsrep_cert_interval           | 0                                                                                                                                              |
            | wsrep_open_transactions       | 0                                                                                                                                              |
            | wsrep_open_connections        | 0                                                                                                                                              |
            | wsrep_incoming_addresses      | 127.0.0.1:16002,127.0.0.1:16000,127.0.0.1:16001                                                                                                |
            | wsrep_cluster_weight          | 3                                                                                                                                              |
            | wsrep_desync_count            | 0                                                                                                                                              |
            | wsrep_evs_delayed             |                                                                                                                                                |
            | wsrep_evs_evict_list          |                                                                                                                                                |
            | wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |
            | wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
            | wsrep_gcomm_uuid              | a2c23b72-acc8-11ea-afe5-cbd8cb9a86ed                                                                                                           |
            | wsrep_applier_thread_count    | 32                                                                                                                                             |
            | wsrep_cluster_capabilities    |                                                                                                                                                |
            | wsrep_cluster_conf_id         | 17                                                                                                                                             |
            | wsrep_cluster_size            | 3                                                                                                                                              |
            | wsrep_cluster_state_uuid      | be36cf8b-acb6-11ea-aa2c-e3149c2ff908                                                                                                           |
            | wsrep_cluster_status          | Primary                                                                                                                                        |
            | wsrep_connected               | ON                                                                                                                                             |
            | wsrep_local_bf_aborts         | 0                                                                                                                                              |
            | wsrep_local_index             | 2                                                                                                                                              |
            | wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name           | Galera                                                                                                                                         |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
            | wsrep_provider_version        | 26.4.4(rae24803)                                                                                                                               |
            | wsrep_ready                   | ON                                                                                                                                             |
            | wsrep_rollbacker_thread_count | 1                                                                                                                                              |
            | wsrep_thread_count            | 33                                                                                                                                             |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            65 rows in set (0.001 sec)
            

            wsrep_cluster_status Primary
            wsrep_local_state_comment Synced
            wsrep_local_index 2
            wsrep_cluster_size 3

            Server logs: Node 1, Node 2, Node 3.

            I also tried with one node stopped, and without populating data on Node 1 (joined to the cluster) while Node 2 was being upgraded, but there was still no data loss.
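
The odd values from step 11 share a pattern: 18446744073709551615 is 2^64 - 1, which is what -1 looks like when printed as an unsigned 64-bit integer. That suggests the cluster view fields were never filled in after the first start, although this is an inference from the numbers and not something the logs confirm. A minimal Python sketch of that check follows; the helper names `as_signed64` and `find_view_anomalies` are hypothetical and not part of MariaDB or Galera, and the dictionaries simply copy the values quoted in steps 11 and 15.

```python
# Sketch only: flag wsrep status values that contradict a node reporting
# itself as Primary/Synced. Helper names are hypothetical.

UINT64_MAX = 2**64 - 1  # 18446744073709551615, i.e. -1 as unsigned 64-bit


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a signed one."""
    return value - 2**64 if value >= 2**63 else value


def find_view_anomalies(status: dict) -> list:
    """Return the variables that look unset on a Primary/Synced node."""
    anomalies = []
    healthy = (status.get("wsrep_cluster_status") == "Primary"
               and status.get("wsrep_local_state_comment") == "Synced")
    if not healthy:
        return anomalies  # only judge nodes that claim to be healthy
    if int(status.get("wsrep_cluster_size", "0")) == 0:
        anomalies.append("wsrep_cluster_size")
    for name in ("wsrep_local_index", "wsrep_cluster_conf_id"):
        if as_signed64(int(status.get(name, "0"))) < 0:
            anomalies.append(name)
    if not status.get("wsrep_cluster_state_uuid"):
        anomalies.append("wsrep_cluster_state_uuid")
    return anomalies


# Values copied from the step 11 output (first start with wsrep-on=ON).
step11 = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_cluster_size": "0",
    "wsrep_local_index": "18446744073709551615",
    "wsrep_cluster_conf_id": "18446744073709551615",
    "wsrep_cluster_state_uuid": "",
}

# Values copied from the step 15 output (after restarting Node 2).
step15 = {
    "wsrep_cluster_status": "Primary",
    "wsrep_local_state_comment": "Synced",
    "wsrep_cluster_size": "3",
    "wsrep_local_index": "2",
    "wsrep_cluster_conf_id": "17",
    "wsrep_cluster_state_uuid": "be36cf8b-acb6-11ea-aa2c-e3149c2ff908",
}

print(as_signed64(UINT64_MAX))
print(find_view_anomalies(step11))
print(find_view_anomalies(step15))
```

Run against the two snapshots, the check flags four variables for step 11 and none for step 15, which matches the observation that a restart of Node 2 clears the strange values.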

            stepan.patryshev Stepan Patryshev (Inactive) added a comment - edited
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Assignee Stepan Patryshev [ stepan.patryshev ] Seppo Jaakola [ seppo ]
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Assignee Seppo Jaakola [ seppo ] Stepan Patryshev [ stepan.patryshev ]
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Assignee Stepan Patryshev [ stepan.patryshev ] Seppo Jaakola [ seppo ]
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Affects Version/s 10.4.13 [ 24223 ]
            Affects Version/s 10.3.14 [ 23216 ]
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Fix Version/s 10.4 [ 22408 ]

            Data loss is there, as documented in original description. We reproduced it many times.

            rpizzi Rick Pizzi (Inactive) added a comment

            I have re-tested this in my own lab (the original bug report was from Massimo; I'm on the same team).

            I confirm the bug exists, and we don't understand why it is not happening to you.

            Exact steps to reproduce:

            1. Install 3 nodes with the latest 10.3; I used 10.3.23, wsrep version 25.3.28(r3875).
            2. Create a table and insert data into it.

            Situation after the two steps above:

            node1>create table dataloss (id int not null auto_increment primary key, value int);
            Query OK, 0 rows affected (0.025 sec)
             
            node1>insert into dataloss (value) values (1), (2), (3);
            Query OK, 3 rows affected (0.003 sec)
            Records: 3  Duplicates: 0  Warnings: 0
             
            node1>select * from dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  2 |     1 |
            |  5 |     2 |
            |  8 |     3 |
            +----+-------+
            3 rows in set (0.000 sec)
             
            node1>show global status like 'wsrep%';
            +-------------------------------+------------------------------------------+
            | Variable_name                 | Value                                    |
            +-------------------------------+------------------------------------------+
            | wsrep_applier_thread_count    | 8                                        |
            | wsrep_apply_oooe              | 0.000000                                 |
            | wsrep_apply_oool              | 0.000000                                 |
            | wsrep_apply_window            | 1.000000                                 |
            | wsrep_causal_reads            | 0                                        |
            | wsrep_cert_deps_distance      | 1.000000                                 |
            | wsrep_cert_index_size         | 5                                        |
            | wsrep_cert_interval           | 0.000000                                 |
            | wsrep_cluster_conf_id         | 19                                       |
            | wsrep_cluster_size            | 3                                        |
            | wsrep_cluster_state_uuid      | cf61cf68-aef7-11ea-88db-1bc466429584     |
            | wsrep_cluster_status          | Primary                                  |
            | wsrep_cluster_weight          | 3                                        |
            | wsrep_commit_oooe             | 0.000000                                 |
            | wsrep_commit_oool             | 0.000000                                 |
            | wsrep_commit_window           | 1.000000                                 |
            | wsrep_connected               | ON                                       |
            | wsrep_desync_count            | 0                                        |
            | wsrep_evs_delayed             |                                          |
            | wsrep_evs_evict_list          |                                          |
            | wsrep_evs_repl_latency        | 0/0/0/0/0                                |
            | wsrep_evs_state               | OPERATIONAL                              |
            | wsrep_flow_control_paused     | 0.000000                                 |
            | wsrep_flow_control_paused_ns  | 0                                        |
            | wsrep_flow_control_recv       | 0                                        |
            | wsrep_flow_control_sent       | 0                                        |
            | wsrep_gcomm_uuid              | 66883d21-af01-11ea-a6eb-260a9c0d8490     |
            | wsrep_incoming_addresses      | AUTO,192.168.2.90:3306,192.168.2.92:3306 |
            | wsrep_last_committed          | 8                                        |
            | wsrep_local_bf_aborts         | 0                                        |
            | wsrep_local_cached_downto     | 6                                        |
            | wsrep_local_cert_failures     | 0                                        |
            | wsrep_local_commits           | 1                                        |
            | wsrep_local_index             | 1                                        |
            | wsrep_local_recv_queue        | 0                                        |
            | wsrep_local_recv_queue_avg    | 0.000000                                 |
            | wsrep_local_recv_queue_max    | 1                                        |
            | wsrep_local_recv_queue_min    | 0                                        |
            | wsrep_local_replays           | 0                                        |
            | wsrep_local_send_queue        | 0                                        |
            | wsrep_local_send_queue_avg    | 0.000000                                 |
            | wsrep_local_send_queue_max    | 1                                        |
            | wsrep_local_send_queue_min    | 0                                        |
            | wsrep_local_state             | 4                                        |
            | wsrep_local_state_comment     | Synced                                   |
            | wsrep_local_state_uuid        | cf61cf68-aef7-11ea-88db-1bc466429584     |
            | wsrep_open_connections        | 0                                        |
            | wsrep_open_transactions       | 0                                        |
            | wsrep_protocol_version        | 9                                        |
            | wsrep_provider_name           | Galera                                   |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>        |
            | wsrep_provider_version        | 25.3.28(r3875)                           |
            | wsrep_ready                   | ON                                       |
            | wsrep_received                | 4                                        |
            | wsrep_received_bytes          | 755                                      |
            | wsrep_repl_data_bytes         | 978                                      |
            | wsrep_repl_keys               | 9                                        |
            | wsrep_repl_keys_bytes         | 144                                      |
            | wsrep_repl_other_bytes        | 0                                        |
            | wsrep_replicated              | 3                                        |
            | wsrep_replicated_bytes        | 1328                                     |
            | wsrep_rollbacker_thread_count | 1                                        |
            | wsrep_thread_count            | 9                                        |
            +-------------------------------+------------------------------------------+
            63 rows in set (0.001 sec)
            

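            3. On node 2, shut down and upgrade to the latest 10.4; I used 10.4.13, wsrep 26.4.4(r4599).

            When you restart that node, you see weird values for wsrep_cluster_size and wsrep_local_index (wsrep_cluster_conf_id, wsrep_protocol_version and wsrep_cluster_state_uuid look wrong as well):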

            MariaDB [(none)]> show global status like 'wsrep%';
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name                 | Value                                                                                                                                          |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid        | cf61cf68-aef7-11ea-88db-1bc466429584                                                                                                           |
            | wsrep_protocol_version        | -1                                                                                                                                             |
            | wsrep_last_committed          | 8                                                                                                                                              |
            | wsrep_replicated              | 0                                                                                                                                              |
            | wsrep_replicated_bytes        | 0                                                                                                                                              |
            | wsrep_repl_keys               | 0                                                                                                                                              |
            | wsrep_repl_keys_bytes         | 0                                                                                                                                              |
            | wsrep_repl_data_bytes         | 0                                                                                                                                              |
            | wsrep_repl_other_bytes        | 0                                                                                                                                              |
            | wsrep_received                | 3                                                                                                                                              |
            | wsrep_received_bytes          | 288                                                                                                                                            |
            | wsrep_local_commits           | 0                                                                                                                                              |
            | wsrep_local_cert_failures     | 0                                                                                                                                              |
            | wsrep_local_replays           | 0                                                                                                                                              |
            | wsrep_local_send_queue        | 0                                                                                                                                              |
            | wsrep_local_send_queue_max    | 1                                                                                                                                              |
            | wsrep_local_send_queue_min    | 0                                                                                                                                              |
            | wsrep_local_send_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_recv_queue        | 0                                                                                                                                              |
            | wsrep_local_recv_queue_max    | 1                                                                                                                                              |
            | wsrep_local_recv_queue_min    | 0                                                                                                                                              |
            | wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_cached_downto     | -1                                                                                                                                             |
            | wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
            | wsrep_flow_control_paused     | 0                                                                                                                                              |
            | wsrep_flow_control_sent       | 0                                                                                                                                              |
            | wsrep_flow_control_recv       | 0                                                                                                                                              |
            | wsrep_cert_deps_distance      | 0                                                                                                                                              |
            | wsrep_apply_oooe              | 0                                                                                                                                              |
            | wsrep_apply_oool              | 0                                                                                                                                              |
            | wsrep_apply_window            | 0                                                                                                                                              |
            | wsrep_commit_oooe             | 0                                                                                                                                              |
            | wsrep_commit_oool             | 0                                                                                                                                              |
            | wsrep_commit_window           | 0                                                                                                                                              |
            | wsrep_local_state             | 4                                                                                                                                              |
            | wsrep_local_state_comment     | Synced                                                                                                                                         |
            | wsrep_cert_index_size         | 0                                                                                                                                              |
            | wsrep_causal_reads            | 0                                                                                                                                              |
            | wsrep_cert_interval           | 0                                                                                                                                              |
            | wsrep_open_transactions       | 0                                                                                                                                              |
            | wsrep_open_connections        | 0                                                                                                                                              |
            | wsrep_incoming_addresses      | AUTO,192.168.2.90:3306,192.168.2.92:3306                                                                                                       |
            | wsrep_cluster_weight          | 3                                                                                                                                              |
            | wsrep_desync_count            | 0                                                                                                                                              |
            | wsrep_evs_delayed             |                                                                                                                                                |
            | wsrep_evs_evict_list          |                                                                                                                                                |
            | wsrep_evs_repl_latency        | 0.000567644/0.00112438/0.00173288/0.000348106/7                                                                                                |
            | wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
            | wsrep_gcomm_uuid              | 043aaa1a-af04-11ea-9292-9a42c9f9c38d                                                                                                           |
            | wsrep_applier_thread_count    | 8                                                                                                                                              |
            | wsrep_cluster_capabilities    |                                                                                                                                                |
            | wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |
            | wsrep_cluster_size            | 0                                                                                                                                              |
            | wsrep_cluster_state_uuid      |                                                                                                                                                |
            | wsrep_cluster_status          | Primary                                                                                                                                        |
            | wsrep_connected               | ON                                                                                                                                             |
            | wsrep_local_bf_aborts         | 0                                                                                                                                              |
            | wsrep_local_index             | 18446744073709551615                                                                                                                           |
            | wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name           | Galera                                                                                                                                         |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
            | wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
            | wsrep_ready                   | ON                                                                                                                                             |
            | wsrep_rollbacker_thread_count | 1                                                                                                                                              |
            | wsrep_thread_count            | 9                                                                                                                                              |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            65 rows in set (0.001 sec)
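
The suspicious values in the output above can be checked mechanically. Note that 18446744073709551615 is 2^64 - 1, which is what -1 looks like when printed as an unsigned 64-bit integer, so these counters look unset rather than genuinely large. The following is a minimal sketch, not part of the original report: check_wsrep_view is a hypothetical helper name, and it assumes tab-separated "name<TAB>value" input such as the mysql client prints in batch mode.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): reads "name<TAB>value" lines on
# stdin and flags the values seen on the upgraded node in this ticket.
# Prints "ok" and returns 0, or prints "suspect: <names>" and returns 1.
check_wsrep_view() {
  awk -F '\t' '
    # String comparisons on purpose: 18446744073709551615 is 2^64 - 1,
    # which does not survive conversion to a floating-point number.
    $1 == "wsrep_cluster_size"       && $2 == "0"                    { bad = bad " " $1 }
    $1 == "wsrep_local_index"        && $2 == "18446744073709551615" { bad = bad " " $1 }
    $1 == "wsrep_cluster_conf_id"    && $2 == "18446744073709551615" { bad = bad " " $1 }
    $1 == "wsrep_protocol_version"   && $2 == "-1"                   { bad = bad " " $1 }
    $1 == "wsrep_cluster_state_uuid" && $2 == ""                     { bad = bad " " $1 }
    END {
      if (bad != "") { print "suspect:" bad; exit 1 }
      print "ok"
    }
  '
}
```

Assuming working client credentials on the node, it could be fed with something like: mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep%'" | check_wsrep_view. A node that reports Synced and Primary, as node 2 does here, would still be flagged.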
            

            Recheck the content of table dataloss on all 3 nodes:

            node1>select * from dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  2 |     1 |
            |  5 |     2 |
            |  8 |     3 |
            +----+-------+
            3 rows in set (0.001 sec)
             
            node2> select * from dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  2 |     1 |
            |  5 |     2 |
            |  8 |     3 |
            +----+-------+
            3 rows in set (0.001 sec)
             
            node3>select * from dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  2 |     1 |
            |  5 |     2 |
            |  8 |     3 |
            +----+-------+
            3 rows in set (0.000 sec)
             
            
            

            Now insert a row on node1, verify it has been added:

            node1>insert into dataloss (value) values (4);
            Query OK, 1 row affected (0.002 sec)
             
            node1>select * from dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  2 |     1 |
            |  5 |     2 |
            |  8 |     3 |
            | 11 |     4 |
            +----+-------+
            4 rows in set (0.000 sec)
            

            If you check on node2, that row is not there; it is lost:

            node2> select * from dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  2 |     1 |
            |  5 |     2 |
            |  8 |     3 |
            +----+-------+
            3 rows in set (0.000 sec)
            

            On node 3, the row is there:

            node3>select * from dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  2 |     1 |
            |  5 |     2 |
            |  8 |     3 |
            | 11 |     4 |
            +----+-------+
            4 rows in set (0.000 sec)
            
            

            Any other row inserted in this situation never reaches node 2 - it's data loss.
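
To see exactly which rows a node is missing without comparing the tables by eye, the primary keys from two nodes can be diffed. This is a minimal sketch under stated assumptions: missing_ids is a hypothetical helper name, and each input file is assumed to hold one id per line, for example dumped with the mysql client in batch mode.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): prints the ids that appear in
# the first file but not in the second, one per line, in numeric order.
missing_ids() {
  ref=$(mktemp) && other=$(mktemp) || return 1
  # comm needs both inputs in the same lexical sort order.
  sort "$1" > "$ref"
  sort "$2" > "$other"
  # -2 drops lines unique to the second file, -3 drops lines common to both.
  comm -23 "$ref" "$other" | sort -n
  rm -f "$ref" "$other"
}
```

With the ids from node1 in one file and those from node2 in another (say, from mysql -N -B -e "SELECT id FROM dataloss" run against each node with the right database selected), missing_ids node1.ids node2.ids would print 11 for the state shown above.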

            Then if you restart node2 once more, the wsrep status clears up and looks good:

            Redirecting to /bin/systemctl stop mariadb.service
            [root@docker2 ~]# service mariadb start
            Redirecting to /bin/systemctl start mariadb.service
            [root@docker2 ~]# mysql -A
            Welcome to the MariaDB monitor.  Commands end with ; or \g.
            Your MariaDB connection id is 20
            Server version: 10.4.13-MariaDB-log MariaDB Server
             
            Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
             
            Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
             
            node2> show global status like 'wsrep_local_index';
            +-------------------+-------+
            | Variable_name     | Value |
            +-------------------+-------+
            | wsrep_local_index | 2     |
            +-------------------+-------+
            1 row in set (0.001 sec)
            
            

            Now, if I insert a new row on node1, it is correctly propagated to all nodes, but the previously inserted row is still missing on node2:

            node1>insert into dataloss (value) values (5);
            Query OK, 1 row affected (0.003 sec)
            node1>select * from dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  2 |     1 |
            |  5 |     2 |
            |  8 |     3 |
            | 11 |     4 |
            | 16 |     5 |
            +----+-------+
            5 rows in set (0.000 sec)
             
            node2> select * from dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  2 |     1 |
            |  5 |     2 |
            |  8 |     3 |
            | 16 |     5 |
            +----+-------+
            4 rows in set (0.000 sec)
             
            node3>select * from dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  2 |     1 |
            |  5 |     2 |
            |  8 |     3 |
            | 11 |     4 |
            | 16 |     5 |
            +----+-------+
            5 rows in set (0.000 sec)
            

            So, please re-test the above scenario to verify that there is actual data loss and that it's not only a problem of bad variable display.

            Thanks
            Rick

            rpizzi Rick Pizzi (Inactive) added a comment - So, please re-test the above scenario to verify that there is actual data loss and it's not only a
problem of bad variable display. Thanks, Rick
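            The odd values in the status output above (wsrep_protocol_version = -1, wsrep_cluster_size = 0, and wsrep_local_index / wsrep_cluster_conf_id = 18446744073709551615, which is 2^64-1, i.e. -1 read as an unsigned 64-bit integer) can be checked for mechanically. The following is only a minimal sketch: the function name check_wsrep_status is hypothetical, and the list of values treated as suspicious is taken from this report alone, not from any Galera documentation.

```shell
# check_wsrep_status: hypothetical helper, not part of MariaDB or Galera.
# Reads SHOW GLOBAL STATUS rows on stdin, either tab-separated (mysql -N -B)
# or as a boxed table, and prints one line per value that matches the
# symptoms listed in this report. Returns 1 if anything was flagged.
check_wsrep_status() {
    awk '
        {
            # Turn table borders into spaces so both formats split the same way.
            gsub(/[|]/, " ")
            name = $1
            value = $2 ""          # force string comparison
            hit = 0
            # 18446744073709551615 is 2^64-1 (that is, -1 as an unsigned value).
            if ((name == "wsrep_local_index" || name == "wsrep_cluster_conf_id") &&
                value == "18446744073709551615") hit = 1
            if (name == "wsrep_cluster_size" && value == "0") hit = 1
            if (name == "wsrep_protocol_version" && value == "-1") hit = 1
            if (hit) { print "suspicious: " name "=" value; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

            Assuming a local client login works, it could be fed with: mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep%'" | check_wsrep_status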

            rpizzi Rick Pizzi (Inactive) added a comment -

            stepan.patryshev Please check the above.
            mihaQ MikaH added a comment -

            Tested with the rolling-upgrade method on a three-node cluster where all nodes ran 10.3.23 (on CentOS 7.6). Node2 was upgraded:

            node1> MariaDB [test]> create table dataloss (id int not null auto_increment primary key, value int); 
            MariaDB [test]> insert into dataloss (value) values (1), (2), (3);
            Query OK, 3 rows affected (0.006 sec)
            Records: 3  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> select * from dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            +----+-------+
            3 rows in set (0.001 sec)
            

            Status on node1:

            MariaDB [test]> show global status like 'wsrep%cluster_size%';
            +--------------------+-------+
            | Variable_name      | Value |
            +--------------------+-------+
            | wsrep_cluster_size | 3     |
            +--------------------+-------+
            1 row in set (0.002 sec)
            MariaDB [test]> show global status like 'wsrep%size%';
            +-----------------------+-------+
            | Variable_name         | Value |
            +-----------------------+-------+
            | wsrep_cert_index_size | 3     |
            | wsrep_cluster_size    | 3     |
            +-----------------------+-------+
            2 rows in set (0.002 sec)
            

            Status on node2 before upgrade:

            MariaDB [(none)]> select * from test.dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            +----+-------+
            4 rows in set (0.001 sec)
            

            Perform node2 upgrade:

            # Copy configs to safe place:
            mkdir /root/configs/
            /bin/cp -p /etc/my.cnf.d/*cnf /root/configs/.
            # Stop and remove old rpm's:
            systemctl stop mariadb && rpm -qai|grep -e Maria -e galera |grep Name | awk '{print "yum remove " $3 " -y"}'|bash
            # Then install new rpm's and Selinux-policyfiles:
            yum localinstall rpmsfor10.4.13/*rpm -y && semodule -v -i selinux/*.pp
            # Copy configs back:
            /bin/cp -p /root/configs/*cnf /etc/my.cnf.d/.
            # Add needed link, start MariaDB and run mysql_upgrade:
            ln -s /usr/lib64/galera-4 /usr/lib64/galera && systemctl start mariadb && mysql_upgrade -uroot -p --skip-write-binlog
            

            Status after node2 upgrade:

            [root@galera2 ~]# mysql -uroot
            Welcome to the MariaDB monitor.  Commands end with ; or \g.
            Your MariaDB connection id is 852
            Server version: 10.4.13-MariaDB-log MariaDB Server
             
            Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
             
            Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
             
            MariaDB [(none)]> show global status like 'wsrep%cluster_size%';
            +--------------------+-------+
            | Variable_name      | Value |
            +--------------------+-------+
            | wsrep_cluster_size | 3     |
            +--------------------+-------+
            1 row in set (0.002 sec)
             
            MariaDB [(none)]> show global status like 'wsrep%size%';
            +-----------------------+-------+
            | Variable_name         | Value |
            +-----------------------+-------+
            | wsrep_cert_index_size | 3     |
            | wsrep_cluster_size    | 3     |
            +-----------------------+-------+
            2 rows in set (0.002 sec)
             
            MariaDB [(none)]>
            

            Inserting data on node1:

            MariaDB [test]> insert into dataloss (value) values (4);
            Query OK, 1 row affected (0.004 sec)
             
            MariaDB [test]> select * from dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            | 12 |     4 |
            +----+-------+
            4 rows in set (0.000 sec)
            

            Status on node2 after data inserted on node1:

            MariaDB [(none)]> select * from test.dataloss;
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            | 12 |     4 |
            +----+-------+
            4 rows in set (0.000 sec)
            MariaDB [(none)]>
            

            No data loss with this method.
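            Rather than comparing the SELECT output of each node by eye, the rows can be dumped to one file per node and diffed. This is a minimal sketch under stated assumptions: rows_missing_on and the file names node1.tsv / node2.tsv are hypothetical, and each file is assumed to hold one row per line.

```shell
# rows_missing_on REFERENCE CANDIDATE  (hypothetical helper)
# Prints every line that is present in REFERENCE but absent from CANDIDATE,
# e.g. rows that node1 has and node2 never received.
rows_missing_on() {
    awk '
        # First file on the command line is CANDIDATE: remember its rows.
        FILENAME == ARGV[1] { seen[$0] = 1; next }
        # Second file is REFERENCE: print rows CANDIDATE does not have.
        !($0 in seen)
    ' "$2" "$1"
}
```

            Each dump could be produced on its node with something like mysql -N -B -e 'SELECT * FROM test.dataloss ORDER BY id' > node1.tsv, after which rows_missing_on node1.tsv node2.tsv lists what node2 lacks. Empty output means the two dumps agree in that direction.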

            rpizzi Rick Pizzi (Inactive) added a comment - - edited

            If node2 came up with the correct cluster index, it may have performed an SST.
            Please post logs...

            massimo.disaro Massimo made changes -
            Comment [ [~mihaQ] You are doing the wrong test. The insert and the data loss happen while node2 is down; in your test you write after the node has already joined the cluster. ]
            mihaQ MikaH made changes -
            Attachment node1_bootsrapped_10.3.23.log.rtf [ 52193 ]
            Attachment node2_upgraded.log.rtf [ 52194 ]
            mihaQ MikaH made changes -
            Attachment node2_upgraded_10.4.13.log [ 52195 ]
            Attachment node1_bootsrapped_10.3.23.log [ 52196 ]
            mihaQ MikaH added a comment - Here are the logs: node2_upgraded_10.4.13.log node1_bootsrapped_10.3.23.log

            rpizzi Rick Pizzi (Inactive) added a comment -

            Your log is mangled. I suggest you follow my steps exactly; you should get the same results. We did this in multiple labs with the same result.

            stepan.patryshev Stepan Patryshev (Inactive) added a comment - - edited

            rpizzi Thank you for the detailed steps. I have retested with these steps and the wsrep version 25.3.28(r3875) you mentioned, but unfortunately I still have not seen any data loss or a server crash.

            stepan.patryshev Stepan Patryshev (Inactive) added a comment -

            rpizzi I have gone through your steps with standard installed packages on separate VMs but still have not managed to reproduce it. I do not know what the key difference is. Can you please share exactly how you upgrade the server, just in case?

            rpizzi Rick Pizzi (Inactive) added a comment -

            The steps are outlined above https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156703&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-156703 and are more than detailed.

            Can you please post the output of your session when running the above commands here?
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Attachment 200709_patgal_output.zip [ 52743 ]

            stepan.patryshev Stepan Patryshev (Inactive) added a comment -

            rpizzi Here is my session output. There are separate sessions for the MariaDB client and for the console itself.

            rpizzi Rick Pizzi (Inactive) added a comment -

            stepan.patryshev From these files we can't infer whether the correct sequence of steps was executed.

            Can you provide evidence that you get different results when reproducing the steps I outlined above?
            As I already mentioned, two people on my team, in two separate environments, can reproduce it just fine, 100% of the time.

            Please try once again and provide a single output with all the steps done in sequence, like I did above.

            Thanks

            Rick
            massimo.disaro Massimo added a comment -

            Please do not use the test schema, and also add the steps, conf and error log of all the nodes. Looking at the log, it isn't clear what you have done.


            rpizzi I have gone through the steps again without any failures. Please find attached all logs and cnf files.

            Steps:

            1. Install 3 nodes with MariaDB 10.3.23 on CentOS Linux release 7.8.2003 (Core), wsrep version 25.3.29(r3902).

            2. On Node1 create a table and insert data in it.

            [root@patgal1 ~]# mysql -pr -e'CREATE DATABASE d;create table d.dataloss (id int not null auto_increment primary key, value int);insert into d.dataloss (value) values (1), (2), (3);'
            [root@patgal1 ~]# mysql -pr -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  1 |     1 |
            |  4 |     2 |
            |  7 |     3 |
            +----+-------+
             
             
            [root@patgal2 ~]# mysql -pr -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  1 |     1 |
            |  4 |     2 |
            |  7 |     3 |
            +----+-------+
             
             
            [root@patgal3 ~]# mysql -pr -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  1 |     1 |
            |  4 |     2 |
            |  7 |     3 |
            +----+-------+
            

            Situation after above:

            Node1:

            [root@patgal1 ~]# mysql -pr -e'show global status like "wsrep%";'
            +-------------------------------+-------------------------------------------------------+
            | Variable_name                 | Value                                                 |
            +-------------------------------+-------------------------------------------------------+
            | wsrep_applier_thread_count    | 1                                                     |
            | wsrep_apply_oooe              | 0.000000                                              |
            | wsrep_apply_oool              | 0.000000                                              |
            | wsrep_apply_window            | 1.000000                                              |
            | wsrep_causal_reads            | 0                                                     |
            | wsrep_cert_deps_distance      | 1.000000                                              |
            | wsrep_cert_index_size         | 5                                                     |
            | wsrep_cert_interval           | 0.000000                                              |
            | wsrep_cluster_conf_id         | 3                                                     |
            | wsrep_cluster_size            | 3                                                     |
            | wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
            | wsrep_cluster_status          | Primary                                               |
            | wsrep_cluster_weight          | 3                                                     |
            | wsrep_commit_oooe             | 0.000000                                              |
            | wsrep_commit_oool             | 0.000000                                              |
            | wsrep_commit_window           | 1.000000                                              |
            | wsrep_connected               | ON                                                    |
            | wsrep_desync_count            | 0                                                     |
            | wsrep_evs_delayed             |                                                       |
            | wsrep_evs_evict_list          |                                                       |
            | wsrep_evs_repl_latency        | 0/0/0/0/0                                             |
            | wsrep_evs_state               | OPERATIONAL                                           |
            | wsrep_flow_control_paused     | 0.000000                                              |
            | wsrep_flow_control_paused_ns  | 0                                                     |
            | wsrep_flow_control_recv       | 0                                                     |
            | wsrep_flow_control_sent       | 0                                                     |
            | wsrep_gcomm_uuid              | f1120258-c51e-11ea-8b48-cb8ed6394b53                  |
            | wsrep_incoming_addresses      | 172.20.3.101:3306,172.20.3.102:3306,172.20.3.103:3306 |
            | wsrep_last_committed          | 24                                                    |
            | wsrep_local_bf_aborts         | 0                                                     |
            | wsrep_local_cached_downto     | 22                                                    |
            | wsrep_local_cert_failures     | 0                                                     |
            | wsrep_local_commits           | 1                                                     |
            | wsrep_local_index             | 0                                                     |
            | wsrep_local_recv_queue        | 0                                                     |
            | wsrep_local_recv_queue_avg    | 0.000000                                              |
            | wsrep_local_recv_queue_max    | 1                                                     |
            | wsrep_local_recv_queue_min    | 0                                                     |
            | wsrep_local_replays           | 0                                                     |
            | wsrep_local_send_queue        | 0                                                     |
            | wsrep_local_send_queue_avg    | 0.000000                                              |
            | wsrep_local_send_queue_max    | 1                                                     |
            | wsrep_local_send_queue_min    | 0                                                     |
            | wsrep_local_state             | 4                                                     |
            | wsrep_local_state_comment     | Synced                                                |
            | wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
            | wsrep_open_connections        | 0                                                     |
            | wsrep_open_transactions       | 0                                                     |
            | wsrep_protocol_version        | 9                                                     |
            | wsrep_provider_name           | Galera                                                |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>                     |
            | wsrep_provider_version        | 25.3.29(r3902)                                        |
            | wsrep_ready                   | ON                                                    |
            | wsrep_received                | 4                                                     |
            | wsrep_received_bytes          | 626                                                   |
            | wsrep_repl_data_bytes         | 969                                                   |
            | wsrep_repl_keys               | 8                                                     |
            | wsrep_repl_keys_bytes         | 136                                                   |
            | wsrep_repl_other_bytes        | 0                                                     |
            | wsrep_replicated              | 3                                                     |
            | wsrep_replicated_bytes        | 1312                                                  |
            | wsrep_rollbacker_thread_count | 1                                                     |
            | wsrep_thread_count            | 2                                                     |
            +-------------------------------+-------------------------------------------------------+
            

            Node2:

            [root@patgal2 ~]# mysql -pr -e'show global status like "wsrep%";'
            +-------------------------------+-------------------------------------------------------+
            | Variable_name                 | Value                                                 |
            +-------------------------------+-------------------------------------------------------+
            | wsrep_applier_thread_count    | 1                                                     |
            | wsrep_apply_oooe              | 0.000000                                              |
            | wsrep_apply_oool              | 0.000000                                              |
            | wsrep_apply_window            | 1.000000                                              |
            | wsrep_causal_reads            | 0                                                     |
            | wsrep_cert_deps_distance      | 1.000000                                              |
            | wsrep_cert_index_size         | 5                                                     |
            | wsrep_cert_interval           | 0.000000                                              |
            | wsrep_cluster_conf_id         | 3                                                     |
            | wsrep_cluster_size            | 3                                                     |
            | wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
            | wsrep_cluster_status          | Primary                                               |
            | wsrep_cluster_weight          | 3                                                     |
            | wsrep_commit_oooe             | 0.000000                                              |
            | wsrep_commit_oool             | 0.000000                                              |
            | wsrep_commit_window           | 1.000000                                              |
            | wsrep_connected               | ON                                                    |
            | wsrep_desync_count            | 0                                                     |
            | wsrep_evs_delayed             |                                                       |
            | wsrep_evs_evict_list          |                                                       |
            | wsrep_evs_repl_latency        | 0/0/0/0/0                                             |
            | wsrep_evs_state               | OPERATIONAL                                           |
            | wsrep_flow_control_paused     | 0.000000                                              |
            | wsrep_flow_control_paused_ns  | 0                                                     |
            | wsrep_flow_control_recv       | 0                                                     |
            | wsrep_flow_control_sent       | 0                                                     |
            | wsrep_gcomm_uuid              | f8c46db5-c51e-11ea-8095-6ffbd7cfa539                  |
            | wsrep_incoming_addresses      | 172.20.3.101:3306,172.20.3.102:3306,172.20.3.103:3306 |
            | wsrep_last_committed          | 24                                                    |
            | wsrep_local_bf_aborts         | 0                                                     |
            | wsrep_local_cached_downto     | 22                                                    |
            | wsrep_local_cert_failures     | 0                                                     |
            | wsrep_local_commits           | 0                                                     |
            | wsrep_local_index             | 1                                                     |
            | wsrep_local_recv_queue        | 0                                                     |
            | wsrep_local_recv_queue_avg    | 0.000000                                              |
            | wsrep_local_recv_queue_max    | 1                                                     |
            | wsrep_local_recv_queue_min    | 0                                                     |
            | wsrep_local_replays           | 0                                                     |
            | wsrep_local_send_queue        | 0                                                     |
            | wsrep_local_send_queue_avg    | 0.000000                                              |
            | wsrep_local_send_queue_max    | 1                                                     |
            | wsrep_local_send_queue_min    | 0                                                     |
            | wsrep_local_state             | 4                                                     |
            | wsrep_local_state_comment     | Synced                                                |
            | wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
            | wsrep_open_connections        | 0                                                     |
            | wsrep_open_transactions       | 0                                                     |
            | wsrep_protocol_version        | 9                                                     |
            | wsrep_provider_name           | Galera                                                |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>                     |
            | wsrep_provider_version        | 25.3.29(r3902)                                        |
            | wsrep_ready                   | ON                                                    |
            | wsrep_received                | 6                                                     |
            | wsrep_received_bytes          | 1803                                                  |
            | wsrep_repl_data_bytes         | 0                                                     |
            | wsrep_repl_keys               | 0                                                     |
            | wsrep_repl_keys_bytes         | 0                                                     |
            | wsrep_repl_other_bytes        | 0                                                     |
            | wsrep_replicated              | 0                                                     |
            | wsrep_replicated_bytes        | 0                                                     |
            | wsrep_rollbacker_thread_count | 1                                                     |
            | wsrep_thread_count            | 2                                                     |
            +-------------------------------+-------------------------------------------------------+
            

            3. On Node2 set wsrep_on=OFF, shut down and upgrade to 10.4.13, wsrep 26.4.4(r4599).

            4. Join upgraded Node2 to the cluster:

            [root@patgal2 ~]# mysql -pr -e'show global status like "wsrep%";'
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name                 | Value                                                                                                                                          |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                                                                                                           |
            | wsrep_protocol_version        | 9                                                                                                                                              |
            | wsrep_last_committed          | 24                                                                                                                                             |
            | wsrep_replicated              | 0                                                                                                                                              |
            | wsrep_replicated_bytes        | 0                                                                                                                                              |
            | wsrep_repl_keys               | 0                                                                                                                                              |
            | wsrep_repl_keys_bytes         | 0                                                                                                                                              |
            | wsrep_repl_data_bytes         | 0                                                                                                                                              |
            | wsrep_repl_other_bytes        | 0                                                                                                                                              |
            | wsrep_received                | 2                                                                                                                                              |
            | wsrep_received_bytes          | 280                                                                                                                                            |
            | wsrep_local_commits           | 0                                                                                                                                              |
            | wsrep_local_cert_failures     | 0                                                                                                                                              |
            | wsrep_local_replays           | 0                                                                                                                                              |
            | wsrep_local_send_queue        | 0                                                                                                                                              |
            | wsrep_local_send_queue_max    | 1                                                                                                                                              |
            | wsrep_local_send_queue_min    | 0                                                                                                                                              |
            | wsrep_local_send_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_recv_queue        | 0                                                                                                                                              |
            | wsrep_local_recv_queue_max    | 1                                                                                                                                              |
            | wsrep_local_recv_queue_min    | 0                                                                                                                                              |
            | wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_cached_downto     | -1                                                                                                                                             |
            | wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
            | wsrep_flow_control_paused     | 0                                                                                                                                              |
            | wsrep_flow_control_sent       | 0                                                                                                                                              |
            | wsrep_flow_control_recv       | 0                                                                                                                                              |
            | wsrep_cert_deps_distance      | 0                                                                                                                                              |
            | wsrep_apply_oooe              | 0                                                                                                                                              |
            | wsrep_apply_oool              | 0                                                                                                                                              |
            | wsrep_apply_window            | 0                                                                                                                                              |
            | wsrep_commit_oooe             | 0                                                                                                                                              |
            | wsrep_commit_oool             | 0                                                                                                                                              |
            | wsrep_commit_window           | 0                                                                                                                                              |
            | wsrep_local_state             | 4                                                                                                                                              |
            | wsrep_local_state_comment     | Synced                                                                                                                                         |
            | wsrep_cert_index_size         | 0                                                                                                                                              |
            | wsrep_causal_reads            | 0                                                                                                                                              |
            | wsrep_cert_interval           | 0                                                                                                                                              |
            | wsrep_open_transactions       | 0                                                                                                                                              |
            | wsrep_open_connections        | 0                                                                                                                                              |
            | wsrep_incoming_addresses      | AUTO,172.20.3.101:3306,172.20.3.103:3306                                                                                                       |
            | wsrep_cluster_weight          | 3                                                                                                                                              |
            | wsrep_desync_count            | 0                                                                                                                                              |
            | wsrep_evs_delayed             |                                                                                                                                                |
            | wsrep_evs_evict_list          |                                                                                                                                                |
            | wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |
            | wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
            | wsrep_gcomm_uuid              | 332a2e12-c525-11ea-be26-4ed9b6694f67                                                                                                           |
            | wsrep_applier_thread_count    | 1                                                                                                                                              |
            | wsrep_cluster_capabilities    |                                                                                                                                                |
            | wsrep_cluster_conf_id         | 10                                                                                                                                             |
            | wsrep_cluster_size            | 3                                                                                                                                              |
            | wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                                                                                                           |
            | wsrep_cluster_status          | Primary                                                                                                                                        |
            | wsrep_connected               | ON                                                                                                                                             |
            | wsrep_local_bf_aborts         | 0                                                                                                                                              |
            | wsrep_local_index             | 0                                                                                                                                              |
            | wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name           | Galera                                                                                                                                         |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
            | wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
            | wsrep_ready                   | ON                                                                                                                                             |
            | wsrep_rollbacker_thread_count | 1                                                                                                                                              |
            | wsrep_thread_count            | 2                                                                                                                                              |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            

            wsrep_cluster_size and wsrep_local_index on Node2:

            wsrep_cluster_size 3
            wsrep_local_index 0

            5. Recheck the content of table dataloss on 3 nodes:

            [root@patgal1 ~]# mysql -pr -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  1 |     1 |
            |  4 |     2 |
            |  7 |     3 |
            +----+-------+
             
            [root@patgal2 ~]# mysql -pr -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  1 |     1 |
            |  4 |     2 |
            |  7 |     3 |
            +----+-------+
             
            [root@patgal3 ~]# mysql -pr -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  1 |     1 |
            |  4 |     2 |
            |  7 |     3 |
            +----+-------+
            

            6. Insert a row on Node1, verify it has been added and replicated to Node2 and Node3:

            [root@patgal1 ~]# mysql -pr -e'insert into d.dataloss (value) values (4);'
            [root@patgal1 ~]# mysql -pr -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  1 |     1 |
            |  4 |     2 |
            |  7 |     3 |
            | 11 |     4 |
            +----+-------+
             
            [root@patgal2 ~]# mysql -pr -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  1 |     1 |
            |  4 |     2 |
            |  7 |     3 |
            | 11 |     4 |
            +----+-------+
             
            [root@patgal3 ~]# mysql -pr -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  1 |     1 |
            |  4 |     2 |
            |  7 |     3 |
            | 11 |     4 |
            +----+-------+
            

            As you can see, there are no related errors or data loss here.
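The per-node comparison in steps 5 and 6 can also be scripted instead of being checked by eye. The following is a minimal sketch, not part of the original test run: the function name `compare_node_dumps` is hypothetical, and it assumes each node's query output has already been saved to a file.

```shell
# Sketch: report whether per-node query dumps are identical.
# Each argument is a file holding one node's output of the same query.
# The function name and file layout are assumptions for illustration;
# nothing here talks to a server.
compare_node_dumps() {
    ref=$1
    shift
    status=0
    for f in "$@"; do
        # cmp -s is silent and only sets the exit status.
        if cmp -s "$ref" "$f"; then
            echo "MATCH: $f"
        else
            echo "DIFFER: $f"
            status=1
        fi
    done
    return "$status"
}
```

For example, after running `mysql -pr -e'select * from d.dataloss;' > nodeN.txt` on each node and collecting the files in one place, `compare_node_dumps node1.txt node2.txt node3.txt` returns non-zero if any node differs from Node1.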

            stepan.patryshev Stepan Patryshev (Inactive) added a comment - edited

            You aren't reproducing the issue.

            Can you please describe step 3 in detail?
            When you say:

             3. On Node2 set wsrep_on=OFF, shut down and upgrade to 10.4.13, wsrep 26.4.4(r4599).
            

            We would like to see the exact steps used to do this, as this is probably where you are doing things
            differently. Please paste the relevant part of the history file.

            Thanks
            Rick

            rpizzi Rick Pizzi (Inactive) added a comment

            OK, by looking at the output of the patgal2 session (both yesterday and the other day) we see this:

            [root@patgal2 ~]# systemctl stop mariadb
            [root@patgal2 ~]# systemctl start mariadb
            [root@patgal2 ~]# 
            [root@patgal2 ~]# systemctl stop mariadb
            

            Basically, after upgrading node 2 to 10.4 you start the server with wsrep ON, run mysql_upgrade, then shut down, set wsrep to OFF, and start again. This is not what we specified in the ticket.

            Please repeat the EXACT steps we posted. In other words: after upgrading the packages you need to start with WSREP OFF, not ON.

            Thanks
            Rick

            rpizzi Rick Pizzi (Inactive) added a comment

            Re-reading the entire ticket, I see that there was some confusion about this WSREP_ON = OFF thing: Massimo (the original bug submitter) said to start with it off, run the upgrade, stop, and start with it on, while in my test I don't touch that setting at all.

            The bottom line is: the FIRST time you start MariaDB on node2 with WSREP enabled, you get that weird cluster index and cluster_size=0, and it is at that moment that any data inserted on other nodes does not reach node2.

            If you start node2 twice with WSREP enabled, the problem does not appear, because the 2nd restart (which you always seem to do, see above) "clears" the weird situation.

            So, once again, to properly test this, DO NOT touch the WSREP_ON variable. Leave it on, but after upgrading the packages start node2 only once, not twice. You will see the weird cluster index and size values; in that situation any row inserted on other nodes is lost (does not reach node2).
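The check described above (look at the wsrep status right after the first start of the upgraded node) could be automated roughly as follows. This is only a sketch: the function name `check_wsrep_status` is hypothetical, and the two conditions it flags (a cluster size of 0 and a negative protocol version) are assumptions taken from the values reported in this ticket, not an official health check.

```shell
# Sketch: flag the suspicious wsrep state described in this ticket.
# Reads tab-separated "name<TAB>value" lines on stdin, e.g. from:
#   mysql -N -B -e 'show global status like "wsrep%"'
# The thresholds below are assumptions based on values seen in this
# ticket (cluster_size=0, protocol_version=-1).
check_wsrep_status() {
    awk -F'\t' '
        $1 == "wsrep_cluster_size"        { size = $2; seen_size = 1 }
        $1 == "wsrep_protocol_version"    { proto = $2; seen_proto = 1 }
        $1 == "wsrep_local_state_comment" { state = $2 }
        END {
            bad = 0
            if (!seen_size || size + 0 <= 0) {
                print "SUSPECT: wsrep_cluster_size=" size
                bad = 1
            }
            if (seen_proto && proto + 0 < 0) {
                print "SUSPECT: wsrep_protocol_version=" proto
                bad = 1
            }
            if (!bad) print "OK: cluster_size=" size " state=" state
            exit bad
        }'
}
```

Run once on node2 immediately after its first start on the new version, for example `mysql -N -B -pr -e 'show global status like "wsrep%"' | check_wsrep_status`. A non-zero exit status means the node reported one of the values discussed here.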

            rpizzi Rick Pizzi (Inactive) added a comment

            @rpizzi You are wrong here. As you can see in "20200713_patgal2_output.log", line 165 shows "wsrep_on=OFF" before the upgraded server is run. The only difference is that I set it even before the upgrade.
            In "20200713_patgal2.err", the first run of 10.4.13 is on line 494: "2020-07-13 19:13:38 0 [Note] InnoDB: 10.4.13 started", and the first attempt to load the WSREP provider on 10.4.13 is logged later, on line 515: "2020-07-13 19:19:39 0 [Note] WSREP: Loading provider".
            Here is the history fragment:

              262  systemctl start mariadb
              263  mysql -pr -e'select * from d.dataloss;'
              264  mysql -pr -e'show global status like "wsrep%";'
              265  systemctl stop mariadb
              266  vi /etc/my.cnf.d/server2.cnf
              267  cat /etc/yum.repos.d/mariadb.repo
              268  curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=mariadb-10.4
              269  cat /etc/yum.repos.d/mariadb.repo
              270  yum list installed | grep galera
              271  yum list installed | grep MariaDB
              272  sudo yum remove MariaDB-server galera MariaDB-backup MariaDB-client MariaDB-common
              273  yum list installed | grep galera
              274  yum list installed | grep MariaDB
              275  yum install MariaDB-server galera MariaDB-backup MariaDB-client MariaDB-common
              276  yum list installed | grep MariaDB
              277  yum list installed | grep galera
              278  systemctl start mariadb
              279  mysql_upgrade -s
              280  mysql_upgrade -s -pr
              281  systemctl stop mariadb
              282  vi /etc/my.cnf.d/server.cnf
              283  vi /etc/my.cnf.d/server2.cnf
              284  systemctl start mariadb
              285  vi /etc/my.cnf.d/server2.cnf
              286  systemctl start mariadb
              287  mysql -pr -e'show global status like "wsrep%";'
              288  mysql -pr -e'select * from d.dataloss;'
            

            Anyway, I will try to follow your steps more closely.

            stepan.patryshev Stepan Patryshev (Inactive) added a comment - edited

            To verify the bug DO NOT start node2 more than once after upgrading. That's it.

            rpizzi Rick Pizzi (Inactive) added a comment
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Attachment 20200714_MDEV-22723_patgal_no_errors.zip [ 52793 ]
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Attachment 20200714_MDEV-22723_patgal_no_errors.zip [ 52794 ]

            @rpizzi It did not help. I did not change WSREP_ON at all and started the upgraded server only once, and the test passed again without any failures or data loss. Please share the exact steps you use to install and update the packages. Please find attached all logs and cnf files.

            Steps:

            1. Install 3 nodes with MariaDB 10.3.23 on CentOS Linux release 7.8.2003 (Core), wsrep version 25.3.29(r3902).

            2. On Node1 create a table and insert data in it.

            [root@patgal1 ~]# mysql -e'create database d;'
            [root@patgal1 ~]# mysql -e'create table d.dataloss (id int not null auto_increment primary key, value int);'
            [root@patgal1 ~]# mysql -e'insert into d.dataloss (value) values (1), (2), (3);'
             
            [root@patgal1 ~]# mysql -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            +----+-------+
            

            2.1. Check that data are propagated successfully to other nodes:

            [root@patgal2 ~]# mysql -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            +----+-------+
             
            [root@patgal3 ~]# mysql -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            +----+-------+
            

            2.2. Cluster status after the steps above:

            Node1:

            [root@patgal1 ~]# mysql -e'show global status like "wsrep%";'
            +-------------------------------+-------------------------------------------------------+
            | Variable_name                 | Value                                                 |
            +-------------------------------+-------------------------------------------------------+
            | wsrep_applier_thread_count    | 1                                                     |
            | wsrep_apply_oooe              | 0.000000                                              |
            | wsrep_apply_oool              | 0.000000                                              |
            | wsrep_apply_window            | 1.000000                                              |
            | wsrep_causal_reads            | 0                                                     |
            | wsrep_cert_deps_distance      | 1.000000                                              |
            | wsrep_cert_index_size         | 5                                                     |
            | wsrep_cert_interval           | 0.000000                                              |
            | wsrep_cluster_conf_id         | 3                                                     |
            | wsrep_cluster_size            | 3                                                     |
            | wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
            | wsrep_cluster_status          | Primary                                               |
            | wsrep_cluster_weight          | 3                                                     |
            | wsrep_commit_oooe             | 0.000000                                              |
            | wsrep_commit_oool             | 0.000000                                              |
            | wsrep_commit_window           | 1.000000                                              |
            | wsrep_connected               | ON                                                    |
            | wsrep_desync_count            | 0                                                     |
            | wsrep_evs_delayed             |                                                       |
            | wsrep_evs_evict_list          |                                                       |
            | wsrep_evs_repl_latency        | 0/0/0/0/0                                             |
            | wsrep_evs_state               | OPERATIONAL                                           |
            | wsrep_flow_control_paused     | 0.000000                                              |
            | wsrep_flow_control_paused_ns  | 0                                                     |
            | wsrep_flow_control_recv       | 0                                                     |
            | wsrep_flow_control_sent       | 0                                                     |
            | wsrep_gcomm_uuid              | fed13746-c5b4-11ea-a5fe-a6a8e8ca175a                  |
            | wsrep_incoming_addresses      | 172.20.3.102:3306,172.20.3.103:3306,172.20.3.101:3306 |
            | wsrep_last_committed          | 6                                                     |
            | wsrep_local_bf_aborts         | 0                                                     |
            | wsrep_local_cached_downto     | 4                                                     |
            | wsrep_local_cert_failures     | 0                                                     |
            | wsrep_local_commits           | 1                                                     |
            | wsrep_local_index             | 2                                                     |
            | wsrep_local_recv_queue        | 0                                                     |
            | wsrep_local_recv_queue_avg    | 0.000000                                              |
            | wsrep_local_recv_queue_max    | 1                                                     |
            | wsrep_local_recv_queue_min    | 0                                                     |
            | wsrep_local_replays           | 0                                                     |
            | wsrep_local_send_queue        | 0                                                     |
            | wsrep_local_send_queue_avg    | 0.000000                                              |
            | wsrep_local_send_queue_max    | 1                                                     |
            | wsrep_local_send_queue_min    | 0                                                     |
            | wsrep_local_state             | 4                                                     |
            | wsrep_local_state_comment     | Synced                                                |
            | wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
            | wsrep_open_connections        | 0                                                     |
            | wsrep_open_transactions       | 0                                                     |
            | wsrep_protocol_version        | 9                                                     |
            | wsrep_provider_name           | Galera                                                |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>                     |
            | wsrep_provider_version        | 25.3.29(r3902)                                        |
            | wsrep_ready                   | ON                                                    |
            | wsrep_received                | 10                                                    |
            | wsrep_received_bytes          | 782                                                   |
            | wsrep_repl_data_bytes         | 969                                                   |
            | wsrep_repl_keys               | 8                                                     |
            | wsrep_repl_keys_bytes         | 136                                                   |
            | wsrep_repl_other_bytes        | 0                                                     |
            | wsrep_replicated              | 3                                                     |
            | wsrep_replicated_bytes        | 1312                                                  |
            | wsrep_rollbacker_thread_count | 1                                                     |
            | wsrep_thread_count            | 2                                                     |
            +-------------------------------+-------------------------------------------------------+
            

            Node2:

            [root@patgal2 ~]# mysql -e'show global status like "wsrep%";'
            +-------------------------------+-------------------------------------------------------+
            | Variable_name                 | Value                                                 |
            +-------------------------------+-------------------------------------------------------+
            | wsrep_applier_thread_count    | 1                                                     |
            | wsrep_apply_oooe              | 0.000000                                              |
            | wsrep_apply_oool              | 0.000000                                              |
            | wsrep_apply_window            | 1.000000                                              |
            | wsrep_causal_reads            | 0                                                     |
            | wsrep_cert_deps_distance      | 1.000000                                              |
            | wsrep_cert_index_size         | 5                                                     |
            | wsrep_cert_interval           | 0.000000                                              |
            | wsrep_cluster_conf_id         | 3                                                     |
            | wsrep_cluster_size            | 3                                                     |
            | wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
            | wsrep_cluster_status          | Primary                                               |
            | wsrep_cluster_weight          | 3                                                     |
            | wsrep_commit_oooe             | 0.000000                                              |
            | wsrep_commit_oool             | 0.000000                                              |
            | wsrep_commit_window           | 1.000000                                              |
            | wsrep_connected               | ON                                                    |
            | wsrep_desync_count            | 0                                                     |
            | wsrep_evs_delayed             |                                                       |
            | wsrep_evs_evict_list          |                                                       |
            | wsrep_evs_repl_latency        | 0/0/0/0/0                                             |
            | wsrep_evs_state               | OPERATIONAL                                           |
            | wsrep_flow_control_paused     | 0.000000                                              |
            | wsrep_flow_control_paused_ns  | 0                                                     |
            | wsrep_flow_control_recv       | 0                                                     |
            | wsrep_flow_control_sent       | 0                                                     |
            | wsrep_gcomm_uuid              | 11a7b1fd-c5b5-11ea-9a59-5e4e35dabad1                  |
            | wsrep_incoming_addresses      | 172.20.3.102:3306,172.20.3.103:3306,172.20.3.101:3306 |
            | wsrep_last_committed          | 6                                                     |
            | wsrep_local_bf_aborts         | 0                                                     |
            | wsrep_local_cached_downto     | 4                                                     |
            | wsrep_local_cert_failures     | 0                                                     |
            | wsrep_local_commits           | 0                                                     |
            | wsrep_local_index             | 0                                                     |
            | wsrep_local_recv_queue        | 0                                                     |
            | wsrep_local_recv_queue_avg    | 0.142857                                              |
            | wsrep_local_recv_queue_max    | 2                                                     |
            | wsrep_local_recv_queue_min    | 0                                                     |
            | wsrep_local_replays           | 0                                                     |
            | wsrep_local_send_queue        | 0                                                     |
            | wsrep_local_send_queue_avg    | 0.000000                                              |
            | wsrep_local_send_queue_max    | 1                                                     |
            | wsrep_local_send_queue_min    | 0                                                     |
            | wsrep_local_state             | 4                                                     |
            | wsrep_local_state_comment     | Synced                                                |
            | wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                  |
            | wsrep_open_connections        | 0                                                     |
            | wsrep_open_transactions       | 0                                                     |
            | wsrep_protocol_version        | 9                                                     |
            | wsrep_provider_name           | Galera                                                |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>                     |
            | wsrep_provider_version        | 25.3.29(r3902)                                        |
            | wsrep_ready                   | ON                                                    |
            | wsrep_received                | 7                                                     |
            | wsrep_received_bytes          | 1811                                                  |
            | wsrep_repl_data_bytes         | 0                                                     |
            | wsrep_repl_keys               | 0                                                     |
            | wsrep_repl_keys_bytes         | 0                                                     |
            | wsrep_repl_other_bytes        | 0                                                     |
            | wsrep_replicated              | 0                                                     |
            | wsrep_replicated_bytes        | 0                                                     |
            | wsrep_rollbacker_thread_count | 1                                                     |
            | wsrep_thread_count            | 2                                                     |
            +-------------------------------+-------------------------------------------------------+
            

            3. Shut down Node2 and upgrade it to 10.4.13, wsrep 26.4.4(r4599).

            3.1. systemctl stop mariadb
            3.2. curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=mariadb-10.4
            3.3. yum remove MariaDB galera
            3.4. yum install MariaDB galera
            3.5. rm /etc/my.cnf.d/server.cnf
            3.6. Update "wsrep_provider" value to "/usr/lib64/galera-4/libgalera_smm.so" in "/etc/my.cnf.d/server2.cnf".
            3.7. systemctl start mariadb

            3.8. mysql_upgrade -s

            The --upgrade-system-tables option was used, user tables won't be touched.
            Phase 1/7: Checking and upgrading mysql database
            Processing databases
            mysql
            mysql.column_stats                                 OK
            mysql.columns_priv                                 OK
            mysql.db                                           OK
            mysql.event                                        OK
            mysql.func                                         OK
            mysql.gtid_slave_pos                               OK
            mysql.help_category                                OK
            mysql.help_keyword                                 OK
            mysql.help_relation                                OK
            mysql.help_topic                                   OK
            mysql.host                                         OK
            mysql.index_stats                                  OK
            mysql.innodb_index_stats                           OK
            mysql.innodb_table_stats                           OK
            mysql.plugin                                       OK
            mysql.proc                                         OK
            mysql.procs_priv                                   OK
            mysql.proxies_priv                                 OK
            mysql.roles_mapping                                OK
            mysql.servers                                      OK
            mysql.table_stats                                  OK
            mysql.tables_priv                                  OK
            mysql.time_zone                                    OK
            mysql.time_zone_leap_second                        OK
            mysql.time_zone_name                               OK
            mysql.time_zone_transition                         OK
            mysql.time_zone_transition_type                    OK
            mysql.transaction_registry                         OK
            mysql.user                                         OK
            mysql.wsrep_cluster                                OK
            mysql.wsrep_cluster_members                        OK
            mysql.wsrep_streaming_log                          OK
            Phase 2/7: Installing used storage engines... Skipped
            Phase 3/7: Fixing views... Skipped
            Phase 4/7: Running 'mysql_fix_privilege_tables'
            Phase 5/7: Fixing table and database names ... Skipped
            Phase 6/7: Checking and upgrading tables... Skipped
            Phase 7/7: Running 'FLUSH PRIVILEGES'
            OK
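
The upgrade steps 3.1 to 3.8 can be sketched as a small script. This is only an illustration under stated assumptions, not the exact commands that were run: the full package list is taken from the shell history quoted earlier in this thread (steps 3.3 and 3.4 abbreviate it), the `sed` expression assumes the option is written as `wsrep_provider=...` at the start of a line in server2.cnf, and the `run` and `upgrade_node` helpers are hypothetical names. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of steps 3.1-3.8 on one node. Prints the commands by default;
# run with DRY_RUN=0 (as root) to execute them.
PROVIDER=/usr/lib64/galera-4/libgalera_smm.so
CNF=/etc/my.cnf.d/server2.cnf
PKGS="MariaDB-server galera MariaDB-backup MariaDB-client MariaDB-common"

# Print the command line, or execute it through a shell when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$1"
    else
        bash -c "$1"
    fi
}

# Assumption: server2.cnf contains a line starting with 'wsrep_provider='.
upgrade_node() {
    run "systemctl stop mariadb" &&
    run "curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=mariadb-10.4" &&
    run "yum remove $PKGS" &&
    run "yum install $PKGS" &&
    run "rm /etc/my.cnf.d/server.cnf" &&
    run "sed -i 's|^wsrep_provider=.*|wsrep_provider=$PROVIDER|' $CNF" &&
    run "systemctl start mariadb" &&
    run "mysql_upgrade -s"
}

# Usage: call upgrade_node to review the plan, then DRY_RUN=0 upgrade_node to apply it.
```

Printing the plan first makes it easier to compare the exact command sequence between reporters, which is what this thread is trying to pin down.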
            

            4. Check the wsrep status on Node2 after the upgrade:

            [root@patgal2 ~]# mysql -e'show global status like "wsrep%";'
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name                 | Value                                                                                                                                          |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid        | 499f4d1e-b249-11ea-abeb-764a6a38b248                                                                                                           |
            | wsrep_protocol_version        | 9                                                                                                                                              |
            | wsrep_last_committed          | 6                                                                                                                                              |
            | wsrep_replicated              | 0                                                                                                                                              |
            | wsrep_replicated_bytes        | 0                                                                                                                                              |
            | wsrep_repl_keys               | 0                                                                                                                                              |
            | wsrep_repl_keys_bytes         | 0                                                                                                                                              |
            | wsrep_repl_data_bytes         | 0                                                                                                                                              |
            | wsrep_repl_other_bytes        | 0                                                                                                                                              |
            | wsrep_received                | 2                                                                                                                                              |
            | wsrep_received_bytes          | 280                                                                                                                                            |
            | wsrep_local_commits           | 0                                                                                                                                              |
            | wsrep_local_cert_failures     | 0                                                                                                                                              |
            | wsrep_local_replays           | 0                                                                                                                                              |
            | wsrep_local_send_queue        | 0                                                                                                                                              |
            | wsrep_local_send_queue_max    | 1                                                                                                                                              |
            | wsrep_local_send_queue_min    | 0                                                                                                                                              |
            | wsrep_local_send_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_recv_queue        | 0                                                                                                                                              |
            | wsrep_local_recv_queue_max    | 1                                                                                                                                              |
            | wsrep_local_recv_queue_min    | 0                                                                                                                                              |
            | wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_cached_downto     | -1                                                                                                                                             |
            | wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
            | wsrep_flow_control_paused     | 0                                                                                                                                              |
            | wsrep_flow_control_sent       | 0                                                                                                                                              |
            | wsrep_flow_control_recv       | 0                                                                                                                                              |
            | wsrep_cert_deps_distance      | 0                                                                                                                                              |
            | wsrep_apply_oooe              | 0                                                                                                                                              |
            | wsrep_apply_oool              | 0                                                                                                                                              |
            | wsrep_apply_window            | 0                                                                                                                                              |
            | wsrep_commit_oooe             | 0                                                                                                                                              |
            | wsrep_commit_oool             | 0                                                                                                                                              |
            | wsrep_commit_window           | 0                                                                                                                                              |
            | wsrep_local_state             | 4                                                                                                                                              |
            | wsrep_local_state_comment     | Synced                                                                                                                                         |
            | wsrep_cert_index_size         | 0                                                                                                                                              |
            | wsrep_causal_reads            | 0                                                                                                                                              |
            | wsrep_cert_interval           | 0                                                                                                                                              |
            | wsrep_open_transactions       | 0                                                                                                                                              |
            | wsrep_open_connections        | 0                                                                                                                                              |
            | wsrep_incoming_addresses      | 172.20.3.103:3306,AUTO,172.20.3.101:3306                                                                                                       |
            | wsrep_cluster_weight          | 3                                                                                                                                              |
            | wsrep_desync_count            | 0                                                                                                                                              |
            | wsrep_evs_delayed             |                                                                                                                                                |
            | wsrep_evs_evict_list          |                                                                                                                                                |
            | wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |
            | wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
            | wsrep_gcomm_uuid              | 4a75dc41-c5ba-11ea-a6f4-4b9ef7fb8a13                                                                                                           |
            | wsrep_applier_thread_count    | 1                                                                                                                                              |
            | wsrep_cluster_capabilities    |                                                                                                                                                |
            | wsrep_cluster_conf_id         | 6                                                                                                                                              |
            | wsrep_cluster_size            | 3                                                                                                                                              |
            | wsrep_cluster_state_uuid      | 499f4d1e-b249-11ea-abeb-764a6a38b248                                                                                                           |
            | wsrep_cluster_status          | Primary                                                                                                                                        |
            | wsrep_connected               | ON                                                                                                                                             |
            | wsrep_local_bf_aborts         | 0                                                                                                                                              |
            | wsrep_local_index             | 1                                                                                                                                              |
            | wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name           | Galera                                                                                                                                         |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
            | wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
            | wsrep_ready                   | ON                                                                                                                                             |
            | wsrep_rollbacker_thread_count | 1                                                                                                                                              |
            | wsrep_thread_count            | 2                                                                                                                                              |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            

            wsrep_cluster_size and wsrep_local_index on Node2:

            wsrep_cluster_size 3
            wsrep_local_index 1

            5. Recheck the content of table dataloss on 3 nodes:

            [root@patgal1 ~]# mysql -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            +----+-------+
             
            [root@patgal2 ~]# mysql -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            +----+-------+
             
            [root@patgal3 ~]# mysql -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            +----+-------+
            

            6. Insert a row on Node1, verify it has been added and replicated to Node2 and Node3:

            [root@patgal1 ~]# mysql -e'insert into d.dataloss (value) values (4);'
             
            [root@patgal1 ~]# mysql -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            | 12 |     4 |
            +----+-------+
             
            [root@patgal2 ~]# mysql -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            | 12 |     4 |
            +----+-------+
             
            [root@patgal3 ~]# mysql -e'select * from d.dataloss;'
            +----+-------+
            | id | value |
            +----+-------+
            |  3 |     1 |
            |  6 |     2 |
            |  9 |     3 |
            | 12 |     4 |
            +----+-------+
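The per-node comparison in steps 5 and 6 can be scripted. The following is a minimal sketch, assuming the output of `mysql -e'select * from d.dataloss;'` from each node has already been saved to a file. The helper name `same_table_dump` and the file names are hypothetical and not part of MariaDB.

```shell
# same_table_dump FILE...   (hypothetical helper, not part of MariaDB)
# Succeeds only if every file has the same content as the first one.
#
# Possible use, assuming one saved dump per node:
#   same_table_dump patgal1.out patgal2.out patgal3.out
same_table_dump() {
    first=$1
    shift
    for f in "$@"; do
        # cmp -s prints nothing and returns non-zero if the files differ
        if ! cmp -s "$first" "$f"; then
            echo "MISMATCH: $f differs from $first"
            return 1
        fi
    done
    echo "OK: all dumps match"
}
```

A byte-wise comparison is enough here because every node runs the same query against a table with a primary key. For larger tables, comparing checksums of the dumps would be a reasonable alternative.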
            

            And here is the shell history fragment for Node2:

              211  date
              212  ps -ef | grep mysqld
              213  systemctl start mariadb
              214  mysql -e'select * from d.dataloss;'
              215  mysql -e'show global status like "wsrep%";'
              216  systemctl stop mariadb
              217  cat /etc/yum.repos.d/mariadb.repo
              218  curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash -s -- --mariadb-server-version=mariadb-10.4
              219  cat /etc/yum.repos.d/mariadb.repo
              220  yum list installed | grep galera
              221  yum list installed | grep MariaDB
              222  yum remove MariaDB galera
              223  yum list installed | grep galera
              224  yum list installed | grep MariaDB
              225  yum install MariaDB galera
              226  yum list installed | grep MariaDB
              227  yum list installed | grep galera
              228  rm /etc/my.cnf.d/server.cnf
              229  vi /etc/my.cnf.d/server2.cnf
              230  cat /etc/my.cnf.d/server2.cnf
              231  ls -al /usr/lib64/galera-4/libgalera_smm.so
              232  systemctl start mariadb
              233  mysql_upgrade -s
              234  mysql -e'show global status like "wsrep%";'
              235  mysql -e'select * from d.dataloss;'
            

            stepan.patryshev Stepan Patryshev (Inactive) added the comment above - edited
            massimo.disaro Massimo added a comment -

            From what I understand of your steps, you perform the INSERT while all the nodes are up, no matter which version they run. No IST is performed by the node you upgraded, because nothing is written to the cluster while node2 is down. You need to see node2 request and perform an IST because it does not yet have all the data.
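The comment above argues that an IST is only requested when writes reach the cluster while node2 is down. A simplified sketch of that decision follows, using the `wsrep_last_committed` and `wsrep_local_cached_downto` values shown in the status outputs in this thread. The function name `state_transfer_kind` is hypothetical, and the logic is an assumption-laden simplification: it assumes matching state UUIDs and a non-empty donor cache, and it is not Galera's actual implementation.

```shell
# state_transfer_kind JOINER_SEQNO DONOR_LAST_COMMITTED DONOR_CACHED_DOWNTO
# Hypothetical helper: a simplified model, not Galera's real logic.
# Assumes both nodes share the same state UUID and the donor cache is non-empty.
state_transfer_kind() {
    joiner=$1 donor=$2 cached=$3
    if [ "$joiner" -ge "$donor" ]; then
        echo none   # nothing was written while the node was down
    elif [ "$cached" -le $((joiner + 1)) ]; then
        echo IST    # donor cache still holds every missing write set
    else
        echo SST    # gap not covered by the cache: full state transfer
    fi
}
```

With the values reported before the upgrade (`wsrep_last_committed` 6, `wsrep_local_cached_downto` 4), `state_transfer_kind 6 6 4` prints `none`, which matches a test where no write happens while node2 is stopped. If one write had been committed on Node1 in the meantime, `state_transfer_kind 6 7 4` would print `IST`.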

            rpizzi Rick Pizzi (Inactive) added a comment -

            It doesn't happen because in the test you ran, you do not get the node with cluster_size=0 and a weird index id.
            But you originally got that: https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156489&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-156489
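Since the failing state is identified in this thread by `wsrep_cluster_size` being 0, the check can be automated right after the upgraded node starts. This is a minimal sketch that reads `show global status like "wsrep%"` output on standard input, in either table or tab-separated form. The helper names `wsrep_value` and `check_membership` are hypothetical.

```shell
# wsrep_value NAME   (hypothetical helper)
# Prints the value of one status variable from status output on stdin.
wsrep_value() {
    awk -v name="$1" '
        { gsub(/\|/, " ") }                      # drop table borders, if any
        $1 == name && !seen { print $2; seen = 1 }
    '
}

# check_membership   (hypothetical helper)
# Reports cluster size and local index; fails when the size is not above 0.
#
# Possible use on the upgraded node:
#   mysql -e'show global status like "wsrep%";' | check_membership
check_membership() {
    status=$(cat)
    size=$(printf '%s\n' "$status" | wsrep_value wsrep_cluster_size)
    index=$(printf '%s\n' "$status" | wsrep_value wsrep_local_index)
    echo "wsrep_cluster_size=$size wsrep_local_index=$index"
    # cluster_size=0 is the state this thread associates with data loss
    [ "${size:-0}" -gt 0 ]
}
```

A non-zero exit status here would be a signal to stop and inspect the node before sending any writes to the cluster.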

            rpizzi Rick Pizzi (Inactive) added a comment -

            stepan.patryshev is there a reason why you are not using the conf file we supplied for this test, and are using a different one that you built yourself? This is not a good way of testing bugs, if you ask me. Please try with the files we have supplied.

            Thank you!

            stepan.patryshev Stepan Patryshev (Inactive) added a comment -

            massimo.disaro Why should IST take place if, according to the steps from the description and especially the more detailed ones by @rpizzi (see https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156703&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-156703 ), the INSERT is performed while the upgraded node 2 is running with WSREP_ON=ON?
            Please point out what exactly I should try differently, if you have a specific idea.

            stepan.patryshev Stepan Patryshev (Inactive) added a comment -

            rpizzi OK, that is what I was going to try next: using your config files. When I used the "./mtr --suite=galera_3nodes --start-and-exit" simulation in my first tests, I tried to take as many of the related settings as possible from the attached configs. But when I moved to the cluster of three VMs with installed packages, I decided to first try only the configs that I had managed to adjust to get the cluster running.

            stepan.patryshev Stepan Patryshev (Inactive) added a comment - edited

            @rpizzi I have gone through the steps again without any data loss or failures, using the original configs: Node1 and Node2. I changed only the IP addresses. But I see there are some newer config files attached here.
            The steps were exactly the same as described in my previous test.
            PFA all logs and cnf files.

            rpizzi Rick Pizzi (Inactive) added a comment -

            I'm stumped, especially because you were able to get cluster size 0 in your first attempt, and now you don't get it anymore.
            How that is possible is beyond me.

            rpizzi Rick Pizzi (Inactive) added a comment -

            What OS are you running on the VMs?

            stepan.patryshev Stepan Patryshev (Inactive) added a comment -

            rpizzi CentOS Linux release 7.8.2003 (Core).

            rpizzi Rick Pizzi (Inactive) added a comment -

            Maybe that's the difference. Both the customer and my lab are on CentOS Linux release 7.5.1804 (Core).
            Can you please retry on that OS version?

            Thanks
            Rick

            rpizzi Rick Pizzi (Inactive) added a comment -

            I think Massimo used 7.6, but the customer has 7.5, so please test on that. Thanks
            Yurchenko Alexey made changes -
            Assignee Seppo Jaakola [ seppo ] Alexey [ yurchenko ]

            stepan.patryshev Stepan Patryshev (Inactive) added a comment - edited

            @rpizzi I have gone through the steps again without any data loss or failures on CentOS 7.5.1804.
            The steps were exactly the same as described here, with only these small modifications:
            3.3. yum remove MariaDB-server MariaDB-client MariaDB-backup galera
            3.4. yum install MariaDB-common MariaDB-compat MariaDB-server MariaDB-backup MariaDB-client galera
            PFA all logs and cnf files.
            Yurchenko Alexey made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]

            rpizzi Rick Pizzi (Inactive) added a comment -

            This is really odd.
            Do you think you can retry with mtr?
            And see if you still get the cluster_size=0 you got at the beginning?
            Because that's the situation where data loss happens.

            See your comment below:

            https://jira.mariadb.org/browse/MDEV-22723?focusedCommentId=156489&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-156489
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Attachment 20200723_MDEV-22723_data_loss.zip [ 52936 ]

            stepan.patryshev Stepan Patryshev (Inactive) added a comment - edited

            @rpizzi It's really strange, but I have managed to reproduce the data loss (though not a crash) with just my scenario using MTR described here. I used Galera 25.3.28(r3875).
            PFA all logs and cnf files. Please ignore the errors in mysqld.2.err around 22:17; I just forgot to shut down a node and tried to start it again.

            Here are the detailed steps by which I reproduced the data loss.
            Release builds 10.3.23 + Galera 25.3.28(r3875) and 10.4.13 + Galera 26.4.4(r4599). PFA all logs and cnf files.

            Steps:

            1. ./mtr --suite=galera_3nodes --start-and-exit
            2. Restart all nodes one by one with separate config files from here.

            The cluster status on Node1 is:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"show global status like 'wsrep%';"
             
            +-------------------------------+-------------------------------------------------+
            | Variable_name                 | Value                                           |
            +-------------------------------+-------------------------------------------------+
            | wsrep_applier_thread_count    | 32                                              |
            | wsrep_apply_oooe              | 0.000000                                        |
            | wsrep_apply_oool              | 0.000000                                        |
            | wsrep_apply_window            | 0.000000                                        |
            | wsrep_causal_reads            | 0                                               |
            | wsrep_cert_deps_distance      | 0.000000                                        |
            | wsrep_cert_index_size         | 0                                               |
            | wsrep_cert_interval           | 0.000000                                        |
            | wsrep_cluster_conf_id         | 8                                               |
            | wsrep_cluster_size            | 3                                               |
            | wsrep_cluster_state_uuid      | 335ea557-cd0b-11ea-bce5-1b40dbec53a7            |
            | wsrep_cluster_status          | Primary                                         |
            | wsrep_cluster_weight          | 3                                               |
            | wsrep_commit_oooe             | 0.000000                                        |
            | wsrep_commit_oool             | 0.000000                                        |
            | wsrep_commit_window           | 0.000000                                        |
            | wsrep_connected               | ON                                              |
            | wsrep_desync_count            | 0                                               |
            | wsrep_evs_delayed             |                                                 |
            | wsrep_evs_evict_list          |                                                 |
            | wsrep_evs_repl_latency        | 0/0/0/0/0                                       |
            | wsrep_evs_state               | OPERATIONAL                                     |
            | wsrep_flow_control_paused     | 0.000000                                        |
            | wsrep_flow_control_paused_ns  | 0                                               |
            | wsrep_flow_control_recv       | 0                                               |
            | wsrep_flow_control_sent       | 0                                               |
            | wsrep_gcomm_uuid              | 0f038d23-cd0d-11ea-acd2-b7ff4121c102            |
            | wsrep_incoming_addresses      | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002 |
            | wsrep_last_committed          | 0                                               |
            | wsrep_local_bf_aborts         | 0                                               |
            | wsrep_local_cached_downto     | 18446744073709551615                            |
            | wsrep_local_cert_failures     | 0                                               |
            | wsrep_local_commits           | 0                                               |
            | wsrep_local_index             | 0                                               |
            | wsrep_local_recv_queue        | 0                                               |
            | wsrep_local_recv_queue_avg    | 0.000000                                        |
            | wsrep_local_recv_queue_max    | 1                                               |
            | wsrep_local_recv_queue_min    | 0                                               |
            | wsrep_local_replays           | 0                                               |
            | wsrep_local_send_queue        | 0                                               |
            | wsrep_local_send_queue_avg    | 0.000000                                        |
            | wsrep_local_send_queue_max    | 1                                               |
            | wsrep_local_send_queue_min    | 0                                               |
            | wsrep_local_state             | 4                                               |
            | wsrep_local_state_comment     | Synced                                          |
            | wsrep_local_state_uuid        | 335ea557-cd0b-11ea-bce5-1b40dbec53a7            |
            | wsrep_open_connections        | 0                                               |
            | wsrep_open_transactions       | 0                                               |
            | wsrep_protocol_version        | 9                                               |
            | wsrep_provider_name           | Galera                                          |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>               |
            | wsrep_provider_version        | 25.3.28(r3875)                                  |
            | wsrep_ready                   | ON                                              |
            | wsrep_received                | 2                                               |
            | wsrep_received_bytes          | 270                                             |
            | wsrep_repl_data_bytes         | 0                                               |
            | wsrep_repl_keys               | 0                                               |
            | wsrep_repl_keys_bytes         | 0                                               |
            | wsrep_repl_other_bytes        | 0                                               |
            | wsrep_replicated              | 0                                               |
            | wsrep_replicated_bytes        | 0                                               |
            | wsrep_rollbacker_thread_count | 1                                               |
            | wsrep_thread_count            | 33                                              |
            +-------------------------------+-------------------------------------------------+
            

            3. On Node1, create a database and a table:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"create database d; create table d.evento4 (Id int primary key auto_increment, IdDispositivo int, kkkk varchar(255));"
            

            4. On Node1, insert 3 rows:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4(IdDispositivo, kkkk) values(123, 'aaaa'); insert into d.evento4(IdDispositivo, kkkk) values(222, 'eeeeaa'); insert into d.evento4(IdDispositivo, kkkk) values(34523452, 'e4r4r4 ');"
            

            The data has been propagated to the whole cluster:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"select * from d.evento4;"
             
            +----+---------------+---------+
            | Id | IdDispositivo | kkkk    |
            +----+---------------+---------+
            |  1 |           123 | aaaa    |
            |  4 |           222 | eeeeaa  |
            |  7 |      34523452 | e4r4r4  |
            +----+---------------+---------+
             
            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"
             
            +----+---------------+---------+
            | Id | IdDispositivo | kkkk    |
            +----+---------------+---------+
            |  1 |           123 | aaaa    |
            |  4 |           222 | eeeeaa  |
            |  7 |      34523452 | e4r4r4  |
            +----+---------------+---------+
             
            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;"
             
            +----+---------------+---------+
            | Id | IdDispositivo | kkkk    |
            +----+---------------+---------+
            |  1 |           123 | aaaa    |
            |  4 |           222 | eeeeaa  |
            |  7 |      34523452 | e4r4r4  |
            +----+---------------+---------+
            

            5. Stop Node 2.
            6. To check that IST works, insert 1 row on Node1 while Node2 is off:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4(IdDispositivo, kkkk) values(888, 'While Node 2 is OFF');"
            

            The new row is added on Node1 and Node3:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"select * from d.evento4;"
            +----+---------------+---------------------+
            | Id | IdDispositivo | kkkk                |
            +----+---------------+---------------------+
            |  1 |           123 | aaaa                |
            |  4 |           222 | eeeeaa              |
            |  7 |      34523452 | e4r4r4              |
            | 11 |           888 | While Node 2 is OFF |
            +----+---------------+---------------------+
             
            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;"
            +----+---------------+---------------------+
            | Id | IdDispositivo | kkkk                |
            +----+---------------+---------------------+
            |  1 |           123 | aaaa                |
            |  4 |           222 | eeeeaa              |
            |  7 |      34523452 | e4r4r4              |
            | 11 |           888 | While Node 2 is OFF |
            +----+---------------+---------------------+
            

            7. Start Node2.

            The new row is added on Node2 successfully:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"
             
            +----+---------------+---------------------+
            | Id | IdDispositivo | kkkk                |
            +----+---------------+---------------------+
            |  1 |           123 | aaaa                |
            |  4 |           222 | eeeeaa              |
            |  7 |      34523452 | e4r4r4              |
            | 11 |           888 | While Node 2 is OFF |
            +----+---------------+---------------------+
            

            8. Check the cluster status on Node2:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"show global status like 'wsrep%'"
             
            +-------------------------------+-------------------------------------------------+
            | Variable_name                 | Value                                           |
            +-------------------------------+-------------------------------------------------+
            | wsrep_applier_thread_count    | 32                                              |
            | wsrep_apply_oooe              | 0.000000                                        |
            | wsrep_apply_oool              | 0.000000                                        |
            | wsrep_apply_window            | 1.000000                                        |
            | wsrep_causal_reads            | 0                                               |
            | wsrep_cert_deps_distance      | 0.000000                                        |
            | wsrep_cert_index_size         | 0                                               |
            | wsrep_cert_interval           | 0.000000                                        |
            | wsrep_cluster_conf_id         | 10                                              |
            | wsrep_cluster_size            | 3                                               |
            | wsrep_cluster_state_uuid      | 335ea557-cd0b-11ea-bce5-1b40dbec53a7            |
            | wsrep_cluster_status          | Primary                                         |
            | wsrep_cluster_weight          | 3                                               |
            | wsrep_commit_oooe             | 0.000000                                        |
            | wsrep_commit_oool             | 0.000000                                        |
            | wsrep_commit_window           | 1.000000                                        |
            | wsrep_connected               | ON                                              |
            | wsrep_desync_count            | 0                                               |
            | wsrep_evs_delayed             |                                                 |
            | wsrep_evs_evict_list          |                                                 |
            | wsrep_evs_repl_latency        | 0/0/0/0/0                                       |
            | wsrep_evs_state               | OPERATIONAL                                     |
            | wsrep_flow_control_paused     | 0.000000                                        |
            | wsrep_flow_control_paused_ns  | 0                                               |
            | wsrep_flow_control_recv       | 0                                               |
            | wsrep_flow_control_sent       | 0                                               |
            | wsrep_gcomm_uuid              | 96685da8-cd17-11ea-be6f-4399d680ab4c            |
            | wsrep_incoming_addresses      | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002 |
            | wsrep_last_committed          | 6                                               |
            | wsrep_local_bf_aborts         | 0                                               |
            | wsrep_local_cached_downto     | 18446744073709551615                            |
            | wsrep_local_cert_failures     | 0                                               |
            | wsrep_local_commits           | 0                                               |
            | wsrep_local_index             | 1                                               |
            | wsrep_local_recv_queue        | 0                                               |
            | wsrep_local_recv_queue_avg    | 0.000000                                        |
            | wsrep_local_recv_queue_max    | 1                                               |
            | wsrep_local_recv_queue_min    | 0                                               |
            | wsrep_local_replays           | 0                                               |
            | wsrep_local_send_queue        | 0                                               |
            | wsrep_local_send_queue_avg    | 0.000000                                        |
            | wsrep_local_send_queue_max    | 1                                               |
            | wsrep_local_send_queue_min    | 0                                               |
            | wsrep_local_state             | 4                                               |
            | wsrep_local_state_comment     | Synced                                          |
            | wsrep_local_state_uuid        | 335ea557-cd0b-11ea-bce5-1b40dbec53a7            |
            | wsrep_open_connections        | 0                                               |
            | wsrep_open_transactions       | 0                                               |
            | wsrep_protocol_version        | 9                                               |
            | wsrep_provider_name           | Galera                                          |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>               |
            | wsrep_provider_version        | 25.3.28(r3875)                                  |
            | wsrep_ready                   | ON                                              |
            | wsrep_received                | 3                                               |
            | wsrep_received_bytes          | 278                                             |
            | wsrep_repl_data_bytes         | 0                                               |
            | wsrep_repl_keys               | 0                                               |
            | wsrep_repl_keys_bytes         | 0                                               |
            | wsrep_repl_other_bytes        | 0                                               |
            | wsrep_replicated              | 0                                               |
            | wsrep_replicated_bytes        | 0                                               |
            | wsrep_rollbacker_thread_count | 1                                               |
            | wsrep_thread_count            | 33                                              |
            +-------------------------------+-------------------------------------------------+
            

            Note that wsrep_local_index = 1.

            9. Stop Node 2.

            10. Set wsrep-on=OFF and run Node2 on the 10.4.13 binaries with a new config containing paths to the 10.4.13 resources (cnf files here).

            /home/stepan/mariadb/10.4.13/sql/mysqld --defaults-file=/home/stepan/mariadb/10.3.23/mysql-test/var/mysqld_new.2.cnf &
            

            11. Perform mysql_upgrade -s.
            12. Stop Node 2.
            13. Insert 1 new row on Node1:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4(IdDispositivo, kkkk) values(777777, 'While Node 2 was upgrading');"
             
            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"select * from d.evento4;"
            +----+---------------+----------------------------+
            | Id | IdDispositivo | kkkk                       |
            +----+---------------+----------------------------+
            |  1 |           123 | aaaa                       |
            |  4 |           222 | eeeeaa                     |
            |  7 |      34523452 | e4r4r4                     |
            | 11 |           888 | While Node 2 is OFF        |
            | 13 |        777777 | While Node 2 was upgrading |
            +----+---------------+----------------------------+
            

            14. Set wsrep-on=ON and start Node2.
            15. Check that the new row has also been added on Node2:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;" 
            +----+---------------+----------------------------+
            | Id | IdDispositivo | kkkk                       |
            +----+---------------+----------------------------+
            |  1 |           123 | aaaa                       |
            |  4 |           222 | eeeeaa                     |
            |  7 |      34523452 | e4r4r4                     |
            | 11 |           888 | While Node 2 is OFF        |
            | 13 |        777777 | While Node 2 was upgrading |
            +----+---------------+----------------------------+
            

            16. Check the wsrep variables on Node2:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"show global status like 'wsrep%'"
             
             
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name                 | Value                                                                                                                                          |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid        | 335ea557-cd0b-11ea-bce5-1b40dbec53a7                                                                                                           |
            | wsrep_protocol_version        | -1                                                                                                                                             |
            | wsrep_last_committed          | 7                                                                                                                                              |
            | wsrep_replicated              | 0                                                                                                                                              |
            | wsrep_replicated_bytes        | 0                                                                                                                                              |
            | wsrep_repl_keys               | 0                                                                                                                                              |
            | wsrep_repl_keys_bytes         | 0                                                                                                                                              |
            | wsrep_repl_data_bytes         | 0                                                                                                                                              |
            | wsrep_repl_other_bytes        | 0                                                                                                                                              |
            | wsrep_received                | 3                                                                                                                                              |
            | wsrep_received_bytes          | 288                                                                                                                                            |
            | wsrep_local_commits           | 0                                                                                                                                              |
            | wsrep_local_cert_failures     | 0                                                                                                                                              |
            | wsrep_local_replays           | 0                                                                                                                                              |
            | wsrep_local_send_queue        | 0                                                                                                                                              |
            | wsrep_local_send_queue_max    | 2                                                                                                                                              |
            | wsrep_local_send_queue_min    | 0                                                                                                                                              |
            | wsrep_local_send_queue_avg    | 0.333333                                                                                                                                       |
            | wsrep_local_recv_queue        | 0                                                                                                                                              |
            | wsrep_local_recv_queue_max    | 1                                                                                                                                              |
            | wsrep_local_recv_queue_min    | 0                                                                                                                                              |
            | wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_cached_downto     | 7                                                                                                                                              |
            | wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
            | wsrep_flow_control_paused     | 0                                                                                                                                              |
            | wsrep_flow_control_sent       | 0                                                                                                                                              |
            | wsrep_flow_control_recv       | 0                                                                                                                                              |
            | wsrep_cert_deps_distance      | 0                                                                                                                                              |
            | wsrep_apply_oooe              | 0                                                                                                                                              |
            | wsrep_apply_oool              | 0                                                                                                                                              |
            | wsrep_apply_window            | 1                                                                                                                                              |
            | wsrep_commit_oooe             | 0                                                                                                                                              |
            | wsrep_commit_oool             | 0                                                                                                                                              |
            | wsrep_commit_window           | 1                                                                                                                                              |
            | wsrep_local_state             | 4                                                                                                                                              |
            | wsrep_local_state_comment     | Synced                                                                                                                                         |
            | wsrep_cert_index_size         | 0                                                                                                                                              |
            | wsrep_causal_reads            | 0                                                                                                                                              |
            | wsrep_cert_interval           | 0                                                                                                                                              |
            | wsrep_open_transactions       | 0                                                                                                                                              |
            | wsrep_open_connections        | 0                                                                                                                                              |
            | wsrep_incoming_addresses      | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002                                                                                                |
            | wsrep_cluster_weight          | 3                                                                                                                                              |
            | wsrep_desync_count            | 0                                                                                                                                              |
            | wsrep_evs_delayed             |                                                                                                                                                |
            | wsrep_evs_evict_list          |                                                                                                                                                |
            | wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |
            | wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
            | wsrep_gcomm_uuid              | 11fd46cc-cd1b-11ea-8f5d-7efdb4c94287                                                                                                           |
            | wsrep_applier_thread_count    | 32                                                                                                                                             |
            | wsrep_cluster_capabilities    |                                                                                                                                                |
            | wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |
            | wsrep_cluster_size            | 0                                                                                                                                              |
            | wsrep_cluster_state_uuid      |                                                                                                                                                |
            | wsrep_cluster_status          | Primary                                                                                                                                        |
            | wsrep_connected               | ON                                                                                                                                             |
            | wsrep_local_bf_aborts         | 0                                                                                                                                              |
            | wsrep_local_index             | 18446744073709551615                                                                                                                           |
            | wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name           | Galera                                                                                                                                         |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
            | wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
            | wsrep_ready                   | ON                                                                                                                                             |
            | wsrep_rollbacker_thread_count | 1                                                                                                                                              |
            | wsrep_thread_count            | 33                                                                                                                                             |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            

            Note the following values, which are inconsistent with each other:

            wsrep_cluster_status Primary
            wsrep_local_state_comment Synced
            wsrep_local_index 18446744073709551615
            wsrep_cluster_size 0
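
            The inconsistent values listed above can be detected mechanically. The following is a minimal sketch, not part of the original test: the function name check_wsrep_status is hypothetical, and it assumes the status is fed in as tab-separated name/value pairs, as the mysql client prints them in batch mode. It only flags the two values called out above (18446744073709551615 is 2^64-1, i.e. an unset unsigned 64-bit value).

```shell
#!/bin/sh
# Hypothetical helper: reads "name<TAB>value" lines on stdin, e.g. from
#   mysql -N -B -e "show global status like 'wsrep%'"
# Prints one line per suspicious value and returns 1 if any was found.
check_wsrep_status() {
    awk -F '\t' '
        # 2^64-1 means the node never received a valid index in the cluster view.
        $1 == "wsrep_local_index" && $2 == "18446744073709551615" {
            print "BAD " $1 "=" $2; bad = 1
        }
        # A node that claims to be Primary/Synced must see at least itself.
        $1 == "wsrep_cluster_size" && $2 == "0" {
            print "BAD " $1 "=" $2; bad = 1
        }
        END { exit bad }
    '
}
```

            It could be used by piping the status of Node2 into it, for example: mysql -N -B -u root -S "$SOCK" -e "show global status like 'wsrep%'" | check_wsrep_status, where $SOCK is a placeholder for the node's socket path.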

            17. Insert 1 row on Node1 again:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (3,'non tireplic');"
            

            The new row has been replicated to Node3:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;"
            +----+---------------+----------------------------+
            | Id | IdDispositivo | kkkk                       |
            +----+---------------+----------------------------+
            |  1 |           123 | aaaa                       |
            |  4 |           222 | eeeeaa                     |
            |  7 |      34523452 | e4r4r4                     |
            | 11 |           888 | While Node 2 is OFF        |
            | 13 |        777777 | While Node 2 was upgrading |
            | 16 |             3 | non tireplic               |
            +----+---------------+----------------------------+
            

            But it has NOT been replicated to Node2:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"
            +----+---------------+----------------------------+
            | Id | IdDispositivo | kkkk                       |
            +----+---------------+----------------------------+
            |  1 |           123 | aaaa                       |
            |  4 |           222 | eeeeaa                     |
            |  7 |      34523452 | e4r4r4                     |
            | 11 |           888 | While Node 2 is OFF        |
            | 13 |        777777 | While Node 2 was upgrading |
            +----+---------------+----------------------------+
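
            The divergence between Node3 and Node2 shown above can also be checked without reading the tables by eye. This is a rough sketch under stated assumptions: compare_node_rows is a hypothetical helper, and each argument is a file holding the output of the same SELECT saved from one node.

```shell
#!/bin/sh
# Hypothetical helper: compares the rows saved from two nodes.
# $1 and $2 are files with the output of the same query, one per node.
# Prints rows present on only one node; returns 1 if the nodes diverge.
compare_node_rows() {
    tmp_a=$(mktemp) || return 2
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 2; }
    sort "$1" > "$tmp_a"
    sort "$2" > "$tmp_b"
    # comm -3 suppresses lines common to both files, leaving the differences.
    only=$(comm -3 "$tmp_a" "$tmp_b")
    rm -f "$tmp_a" "$tmp_b"
    if [ -z "$only" ]; then
        return 0
    fi
    printf '%s\n' "$only"
    return 1
}
```

            With the outputs of the two queries above saved to files, compare_node_rows node3.txt node2.txt would print the 'non tireplic' row and return a non-zero status (the file names are placeholders).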
            

            18. Insert one more row on Node1 to repeat the test:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (666,'Lost data');"
            

            And again the new row has been replicated to Node3:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.3.sock -e"select * from d.evento4;"
            +----+---------------+----------------------------+
            | Id | IdDispositivo | kkkk                       |
            +----+---------------+----------------------------+
            |  1 |           123 | aaaa                       |
            |  4 |           222 | eeeeaa                     |
            |  7 |      34523452 | e4r4r4                     |
            | 11 |           888 | While Node 2 is OFF        |
            | 13 |        777777 | While Node 2 was upgrading |
            | 16 |             3 | non tireplic               |
            | 19 |           666 | Lost data                  |
            +----+---------------+----------------------------+
            

            But it has NOT been replicated to the Node2:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"
            +----+---------------+----------------------------+
            | Id | IdDispositivo | kkkk                       |
            +----+---------------+----------------------------+
            |  1 |           123 | aaaa                       |
            |  4 |           222 | eeeeaa                     |
            |  7 |      34523452 | e4r4r4                     |
            | 11 |           888 | While Node 2 is OFF        |
            | 13 |        777777 | While Node 2 was upgrading |
            +----+---------------+----------------------------+
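
The divergence shown in steps 17 and 18 can also be detected mechanically rather than by eye. The following is only a minimal sketch, assuming the output of each `select * from d.evento4;` has been saved to a file in the boxed table format shown above. The function name `missing_ids` and the file names are hypothetical and not part of the original report.

```shell
# missing_ids REFERENCE_FILE CANDIDATE_FILE
# Prints every Id found in REFERENCE_FILE but not in CANDIDATE_FILE.
# Both files are assumed to hold the boxed table output of
#   mysql -e "select * from d.evento4;"
missing_ids() {
    # The candidate file ($2) is read first to build the set of known Ids,
    # then the reference file ($1) is scanned for Ids outside that set.
    awk -F'|' '
        # Data rows start with "|"; border lines start with "+".
        /^[ \t]*\|/ {
            id = $2
            gsub(/[ \t]/, "", id)
            if (id !~ /^[0-9]+$/) next            # skip the header row
            if (FILENAME == ARGV[1]) seen[id] = 1 # candidate node
            else if (!(id in seen)) print id      # reference node only
        }
    ' "$2" "$1"
}
```

For example, with the Node3 output saved as `node3.txt` and the Node2 output saved as `node2.txt`, `missing_ids node3.txt node2.txt` would list the Ids that Node2 lacks (16 and 19 for the tables above). When the client output is redirected to a file, `mysql` switches to tab-separated batch output unless `--table` (`-t`) is passed, so that option would be needed to keep the boxed format this sketch expects.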
            

            19. Restart the Node2.
            Check the wsrep variables on the Node2:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"show global status like 'wsrep%';" 
             
             
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | Variable_name                 | Value                                                                                                                                          |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            | wsrep_local_state_uuid        | 335ea557-cd0b-11ea-bce5-1b40dbec53a7                                                                                                           |
            | wsrep_protocol_version        | 9                                                                                                                                              |
            | wsrep_last_committed          | 9                                                                                                                                              |
            | wsrep_replicated              | 0                                                                                                                                              |
            | wsrep_replicated_bytes        | 0                                                                                                                                              |
            | wsrep_repl_keys               | 0                                                                                                                                              |
            | wsrep_repl_keys_bytes         | 0                                                                                                                                              |
            | wsrep_repl_data_bytes         | 0                                                                                                                                              |
            | wsrep_repl_other_bytes        | 0                                                                                                                                              |
            | wsrep_received                | 2                                                                                                                                              |
            | wsrep_received_bytes          | 280                                                                                                                                            |
            | wsrep_local_commits           | 0                                                                                                                                              |
            | wsrep_local_cert_failures     | 0                                                                                                                                              |
            | wsrep_local_replays           | 0                                                                                                                                              |
            | wsrep_local_send_queue        | 0                                                                                                                                              |
            | wsrep_local_send_queue_max    | 1                                                                                                                                              |
            | wsrep_local_send_queue_min    | 0                                                                                                                                              |
            | wsrep_local_send_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_recv_queue        | 0                                                                                                                                              |
            | wsrep_local_recv_queue_max    | 1                                                                                                                                              |
            | wsrep_local_recv_queue_min    | 0                                                                                                                                              |
            | wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
            | wsrep_local_cached_downto     | 7                                                                                                                                              |
            | wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
            | wsrep_flow_control_paused     | 0                                                                                                                                              |
            | wsrep_flow_control_sent       | 0                                                                                                                                              |
            | wsrep_flow_control_recv       | 0                                                                                                                                              |
            | wsrep_cert_deps_distance      | 0                                                                                                                                              |
            | wsrep_apply_oooe              | 0                                                                                                                                              |
            | wsrep_apply_oool              | 0                                                                                                                                              |
            | wsrep_apply_window            | 0                                                                                                                                              |
            | wsrep_commit_oooe             | 0                                                                                                                                              |
            | wsrep_commit_oool             | 0                                                                                                                                              |
            | wsrep_commit_window           | 0                                                                                                                                              |
            | wsrep_local_state             | 4                                                                                                                                              |
            | wsrep_local_state_comment     | Synced                                                                                                                                         |
            | wsrep_cert_index_size         | 0                                                                                                                                              |
            | wsrep_causal_reads            | 0                                                                                                                                              |
            | wsrep_cert_interval           | 0                                                                                                                                              |
            | wsrep_open_transactions       | 0                                                                                                                                              |
            | wsrep_open_connections        | 0                                                                                                                                              |
            | wsrep_incoming_addresses      | 127.0.0.1:16000,127.0.0.1:16001,127.0.0.1:16002                                                                                                |
            | wsrep_cluster_weight          | 3                                                                                                                                              |
            | wsrep_desync_count            | 0                                                                                                                                              |
            | wsrep_evs_delayed             |                                                                                                                                                |
            | wsrep_evs_evict_list          |                                                                                                                                                |
            | wsrep_evs_repl_latency        | 0/0/0/0/0                                                                                                                                      |
            | wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
            | wsrep_gcomm_uuid              | 39969b6c-cd1f-11ea-abde-7b7ed790f75c                                                                                                           |
            | wsrep_applier_thread_count    | 32                                                                                                                                             |
            | wsrep_cluster_capabilities    |                                                                                                                                                |
            | wsrep_cluster_conf_id         | 14                                                                                                                                             |
            | wsrep_cluster_size            | 3                                                                                                                                              |
            | wsrep_cluster_state_uuid      | 335ea557-cd0b-11ea-bce5-1b40dbec53a7                                                                                                           |
            | wsrep_cluster_status          | Primary                                                                                                                                        |
            | wsrep_connected               | ON                                                                                                                                             |
            | wsrep_local_bf_aborts         | 0                                                                                                                                              |
            | wsrep_local_index             | 1                                                                                                                                              |
            | wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
            | wsrep_provider_name           | Galera                                                                                                                                         |
            | wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
            | wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
            | wsrep_ready                   | ON                                                                                                                                             |
            | wsrep_rollbacker_thread_count | 1                                                                                                                                              |
            | wsrep_thread_count            | 33                                                                                                                                             |
            +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
            

            Everything looks OK now:

            wsrep_cluster_status Primary
            wsrep_local_state_comment Synced
            wsrep_local_index 1
            wsrep_cluster_size 3
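
Two of the four values listed above are not enough on their own to tell a healthy node from the broken one: in step 16 `wsrep_cluster_status` was also Primary and `wsrep_local_state_comment` was also Synced, while `wsrep_cluster_size` was 0, `wsrep_local_index` was 18446744073709551615, `wsrep_protocol_version` was -1 and `wsrep_cluster_state_uuid` was empty. Below is a minimal sketch of a check for those four symptoms, assuming the boxed `show global status like 'wsrep%'` output is piped on stdin. `wsrep_check` is a hypothetical helper name, and the list of symptoms is taken only from this report, not from the Galera documentation.

```shell
# wsrep_check: reads boxed `show global status like 'wsrep%'` output on stdin.
# Prints one line per symptom seen in step 16 and returns 1 if any is present,
# 0 otherwise. The symptom list is an assumption based on this report only.
wsrep_check() {
    awk -F'|' '
        # Collect "| name | value |" rows into a lookup table.
        /^[ \t]*\|/ {
            name = $2; value = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", value)
            status[name] = value
        }
        END {
            bad = 0
            if (status["wsrep_cluster_size"] + 0 == 0) {
                print "wsrep_cluster_size is 0 or missing"; bad = 1
            }
            if (status["wsrep_protocol_version"] == "-1") {
                print "wsrep_protocol_version is -1"; bad = 1
            }
            if (status["wsrep_local_index"] == "18446744073709551615") {
                print "wsrep_local_index is 18446744073709551615"; bad = 1
            }
            if (status["wsrep_cluster_state_uuid"] == "") {
                print "wsrep_cluster_state_uuid is empty"; bad = 1
            }
            exit bad
        }
    '
}
```

A possible invocation, reusing the client command from the steps above with `--table` added so that the boxed format survives the pipe: `mysql --table -u root -S .../mysqld.2.sock -e"show global status like 'wsrep%';" | wsrep_check`.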

            20. Insert the new row on the Node1:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (555,'After Node restart');"
            

            And the new row has been successfully replicated to the Node2, although rows 16 and 19 from steps 17 and 18 are still missing there:

            /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;"                   
            +----+---------------+----------------------------+
            | Id | IdDispositivo | kkkk                       |
            +----+---------------+----------------------------+
            |  1 |           123 | aaaa                       |
            |  4 |           222 | eeeeaa                     |
            |  7 |      34523452 | e4r4r4                     |
            | 11 |           888 | While Node 2 is OFF        |
            | 13 |        777777 | While Node 2 was upgrading |
            | 22 |           555 | After Node restart         |
            +----+---------------+----------------------------+
            

| wsrep_evs_delayed | | | wsrep_evs_evict_list | | | wsrep_evs_repl_latency | 0/0/0/0/0 | | wsrep_evs_state | OPERATIONAL | | wsrep_gcomm_uuid | 39969b6c-cd1f-11ea-abde-7b7ed790f75c | | wsrep_applier_thread_count | 32 | | wsrep_cluster_capabilities | | | wsrep_cluster_conf_id | 14 | | wsrep_cluster_size | 3 | | wsrep_cluster_state_uuid | 335ea557-cd0b-11ea-bce5-1b40dbec53a7 | | wsrep_cluster_status | Primary | | wsrep_connected | ON | | wsrep_local_bf_aborts | 0 | | wsrep_local_index | 1 | | wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: | | wsrep_provider_name | Galera | | wsrep_provider_vendor | Codership Oy <info@codership.com> | | wsrep_provider_version | 26.4.4(r4599) | | wsrep_ready | ON | | wsrep_rollbacker_thread_count | 1 | | wsrep_thread_count | 33 | +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+ All seems ok: wsrep_cluster_status Primary wsrep_local_state_comment Synced wsrep_local_index 1 wsrep_cluster_size 3 20. Insert the new row on the Node1: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.1.sock -e"insert into d.evento4 (IdDispositivo,kkkk) values (555,'After Node restart');" And the new row has been successfully replicated to the Node3: /home/stepan/mariadb/10.3.23/client/mysql -u root -S/home/stepan/mariadb/10.3.23/mysql-test/var/tmp/mysqld.2.sock -e"select * from d.evento4;" +----+---------------+----------------------------+ | Id | IdDispositivo | kkkk | +----+---------------+----------------------------+ | 1 | 123 | aaaa | | 4 | 222 | eeeeaa | | 7 | 34523452 | e4r4r4 | | 11 | 888 | While Node 2 is OFF | | 13 | 777777 | While Node 2 was upgrading | | 22 | 555 | After Node restart | +----+---------------+----------------------------+
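The comparison done by eye in steps 17 and 18 can be sketched as a small set difference. This is only an illustration: the `missing_rows` helper is hypothetical, and the rows are typed in by hand from the step 18 output rather than fetched from the servers.

```python
def missing_rows(reference, candidate):
    """Return rows present in `reference` but absent from `candidate`, sorted by Id."""
    seen = set(candidate)
    return sorted(row for row in set(reference) if row not in seen)


# Rows as (Id, IdDispositivo, kkkk), copied from the step 18 output.
node3 = [
    (1, 123, "aaaa"),
    (4, 222, "eeeeaa"),
    (7, 34523452, "e4r4r4"),
    (11, 888, "While Node 2 is OFF"),
    (13, 777777, "While Node 2 was upgrading"),
    (16, 3, "non tireplic"),
    (19, 666, "Lost data"),
]
node2 = node3[:5]  # Node2 stops at Id 13 in the step 18 output

for row in missing_rows(node3, node2):
    print(row)
```

With these inputs the two rows inserted after the upgrade (Id 16 and Id 19) are reported as missing on Node2.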
            Yurchenko Alexey added a comment -

            OK, I think I know what the problem is, or at least where it was fixed.

            Massimo's node 2 log has the following:

            wsrep loader: [INFO] wsrep_load(): Galera 26.4.4(r4599) by Codership Oy <info@codership.com> loaded successfully.
            ...
            2020-05-25 22:25:17 19 [Warning] WSREP: trx protocol version: 4 does not match certification protocol version: -1
            

            As you may guess, the last line spells bad news: the node cannot apply writesets. It is caused by a bug that was fixed in commit 02ad0e11 on April 1, well after release 4.4 was tagged; the fix was merged into the MariaDB Galera fork in commit ae24803 on April 9.

            Stepan's log has:

            wsrep loader: [INFO] wsrep_load(): Galera 26.4.4(rae24803) by Codership Oy <info@codership.com> loaded successfully.
            

            That's why Stepan can't reproduce the bug: he's using a different Galera binary.

            In any case, this bug (and many others) is fixed in the 4.5 release tag. All MariaDB 10.4 users should switch to it; it will solve a lot of trouble.
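The two log lines quoted in this comment can be picked out of an error log mechanically. The sketch below is a hedged illustration, not a supported tool: the regular expressions are derived only from the lines quoted in this ticket, and `scan_log` is a hypothetical helper name.

```python
import re

# Patterns written against the exact lines quoted in this ticket;
# other Galera builds may word these messages differently.
LOADER_RE = re.compile(r"wsrep_load\(\): Galera (\d+(?:\.\d+)*)\((r[0-9a-f]+)\)")
MISMATCH_RE = re.compile(
    r"trx protocol version: (-?\d+) does not match "
    r"certification protocol version: (-?\d+)"
)


def scan_log(lines):
    """Return ((version, revision) or None, [(trx_version, cert_version), ...])."""
    build = None
    mismatches = []
    for line in lines:
        loader = LOADER_RE.search(line)
        if loader:
            build = (loader.group(1), loader.group(2))
        mismatch = MISMATCH_RE.search(line)
        if mismatch:
            mismatches.append((int(mismatch.group(1)), int(mismatch.group(2))))
    return build, mismatches


sample = [
    "wsrep loader: [INFO] wsrep_load(): Galera 26.4.4(r4599) by Codership Oy "
    "<info@codership.com> loaded successfully.",
    "2020-05-25 22:25:17 19 [Warning] WSREP: trx protocol version: 4 does not "
    "match certification protocol version: -1",
]
build, mismatches = scan_log(sample)
print(build, mismatches)
```

Comparing the revision part of the result (`r4599` versus `rae24803`) is what distinguishes the two binaries discussed here.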


            stepan.patryshev Stepan Patryshev (Inactive) added a comment - edited

            Yurchenko, I hope you are right, but I used Galera 26.4.4(r4599) on 20.07.2020 and there was no data loss.
            Yurchenko Alexey added a comment -

            julien.fritsch
            Yes, it is fixed in later Galera releases.

            stepan.patryshev
            On 20.07.2020 there was a mistake in reproducing the case: in Massimo's case node 2 was missing 2 events and had to perform a state transfer. In your case it seems there were no updates to the cluster during the node 2 upgrade: it was shut down at seqno 7 and, when it was brought back, the cluster was still at seqno 7. So there was no state transfer, and that is a different code path.

            And yes, I found out why in Massimo's case some transactions were lost:

            [Warning] WSREP: trx protocol version: 4 does not match certification protocol version: -1
            

            is only a warning because, during the upgrade of the last node and the resulting protocol bump, we can expect to receive a writeset with an old protocol version; in that case it is simply supposed to fail certification, on all nodes. The problem (fixed in the commit I mentioned above) was that the protocol version was not updated in total order (in fact it was not updated at all). As a result, all transactions that failed certification on node 2 (and thus were skipped) passed certification on node 1 and were committed. In the end both nodes believed that they had successfully processed all events and were on the same page regarding the last seqno. That's why the missing events went unnoticed.

            However, when node 2 was restarted, it rejoined the cluster without a state transfer, the bug was not triggered, and it could continue to apply transactions.
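The silent divergence described above can be illustrated with a toy model. This is not Galera code: the `Node` class, its fields, and the seqno values are all made up for illustration. It only mimics the stated behavior, namely that a writeset whose protocol version does not match the node's certification protocol version is skipped while the node still counts the event as processed.

```python
class Node:
    """Toy stand-in for a cluster node; not Galera's data model."""

    def __init__(self, name, cert_version):
        self.name = name
        self.cert_version = cert_version  # certification protocol version
        self.last_seqno = 0
        self.rows = []

    def process(self, seqno, trx_version, row):
        # The event counts as processed whether or not it is applied.
        self.last_seqno = seqno
        if trx_version == self.cert_version:
            self.rows.append(row)  # passed certification, committed
        # Otherwise certification fails and the writeset is skipped.


node1 = Node("node1", cert_version=4)
node2 = Node("node2", cert_version=-1)  # version never updated (the bug)

# Seqnos 1 and 2 are arbitrary illustration values.
for seqno, row in enumerate(["non tireplic", "Lost data"], start=1):
    for node in (node1, node2):
        node.process(seqno, trx_version=4, row=row)

print(node1.last_seqno, node2.last_seqno)
print(node1.rows, node2.rows)
```

Both toy nodes end at the same seqno while only one of them holds the rows, which is why comparing seqnos alone would not reveal the loss.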


            stepan.patryshev Stepan Patryshev (Inactive) added a comment -

            Yurchenko, thank you for the clarifications. However, I want to note that rpizzi reproduced it without updating data during the Node2 upgrade: steps are here.

            stepan.patryshev Stepan Patryshev (Inactive) added a comment -

            I have verified that with Galera 26.4.5(rb3764ab) and 25.3.30(r827e681) there was no data loss or crash. The steps were the same as those that reproduced the bug on 23.07.2020 with 25.3.28(r3875) and 26.4.4(r4599).

            However, the strange wsrep values were still present right after the upgraded node joined the cluster for the first time:

            wsrep_local_index 18446744073709551615
            wsrep_cluster_size 0
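The odd values reported in this thread can be flagged with a few comparisons. In the sketch below, `suspicious_status` is a hypothetical helper, and treating these particular values as "unset" is an assumption drawn only from this ticket. The one certain fact is arithmetic: 18446744073709551615 equals 2**64 - 1, which is what -1 looks like when printed as an unsigned 64-bit integer.

```python
UINT64_MAX = 2**64 - 1  # 18446744073709551615


def suspicious_status(status):
    """Return names of wsrep status variables whose values look unset.

    `status` maps variable names to the string values shown by
    SHOW GLOBAL STATUS LIKE 'wsrep%'. Missing keys are not flagged.
    """
    flagged = []
    if int(status.get("wsrep_local_index", 0)) == UINT64_MAX:
        flagged.append("wsrep_local_index")
    if int(status.get("wsrep_cluster_size", 1)) == 0:
        flagged.append("wsrep_cluster_size")
    if int(status.get("wsrep_protocol_version", 0)) == -1:
        flagged.append("wsrep_protocol_version")
    return flagged


after_first_join = {
    "wsrep_local_index": "18446744073709551615",
    "wsrep_cluster_size": "0",
}
print(suspicious_status(after_first_join))
```

A check like this only highlights values worth a second look; as noted above, these values appeared even with Galera builds where no data was lost.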

            stepan.patryshev Stepan Patryshev (Inactive) added a comment -

            Closing as fixed.
            stepan.patryshev Stepan Patryshev (Inactive) made changes -
            Fix Version/s 10.3.25 [ 24506 ]
            Fix Version/s 10.4.15 [ 24507 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.4.16 [ 25020 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.4.15 [ 24507 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.3.26 [ 25021 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.3.25 [ 24506 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 109176 ] MariaDB v4 [ 157862 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 183937

            People

              Yurchenko Alexey
              massimo.disaro Massimo
              Votes: 3
              Watchers: 7
