MariaDB Server / MDEV-22723

Data loss when performing rolling upgrade from 10.3.23-MariaDB to 10.4.13-MariaDB

    Details

    • Type: Bug
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 10.3.23, 10.4.13
    • Fix Version/s: 10.3, 10.4
    • Component/s: Galera
    • Labels:
      None
    • Environment:
      OS: CentOS Linux release 7.6.1810 (Core)

      Description

      We created a full Galera cluster of three nodes (mdb1, mdb2, mdb3), all running 10.3.23.
      We gracefully shut down mdb3 to check how writes on 10.3.23 affect the 10.4.13 node, and to force an IST. We also re-tested with all three servers up, with the same result.

      Create a schema and a table on mdb1; both propagate to the other nodes.

      • stop mdb2, then yum remove the MariaDB and Galera RPMs
      • install MariaDB 10.4 from the new repo and update my.cnf with the correct wsrep_provider
      • set wsrep_on=OFF in my.cnf
      • start mdb2
      • run mysql_upgrade -s
      • stop mdb2
      • set wsrep_on=ON in my.cnf
      • start mdb2
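
The two wsrep_on steps in the list above are easy to get wrong when editing my.cnf by hand. The following is a minimal sketch, not part of the original report, of how the existing setting could be flipped in the text of a config file. The function name `set_wsrep_on` and the sample config (including the placeholder provider path) are hypothetical.

```python
import re

# Matches an existing, uncommented "wsrep_on = <value>" (or "wsrep-on") line.
_WSREP_ON = re.compile(r"^([ \t]*wsrep[_-]on[ \t]*=[ \t]*)\S+",
                       re.MULTILINE | re.IGNORECASE)


def set_wsrep_on(cnf_text: str, enabled: bool) -> str:
    """Return cnf_text with the existing wsrep_on setting switched ON or OFF."""
    value = "ON" if enabled else "OFF"
    new_text, count = _WSREP_ON.subn(lambda m: m.group(1) + value, cnf_text)
    if count == 0:
        # Do not guess which section the option belongs in.
        raise ValueError("no wsrep_on line found; add it by hand")
    return new_text


# Hypothetical config fragment; the provider path is a placeholder.
sample = "[galera]\nwsrep_on=ON\nwsrep_provider=/path/to/libgalera_smm.so\n"
print(set_wsrep_on(sample, False))
```

The sketch only rewrites a line that is already present, so a commented-out or missing option is reported instead of being silently appended to the wrong section.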

      At this point the Galera status variables on mdb2 are:

      MariaDB mdb2 [pippo]> show global status like 'wsrep%';
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      | Variable_name                 | Value                                                                                                                                          |
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      | wsrep_local_state_uuid        | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0                                                                                                           |
      | wsrep_protocol_version        | -1                                                                                                                                             |
      | wsrep_last_committed          | 65                                                                                                                                             |
      | wsrep_replicated              | 0                                                                                                                                              |
      | wsrep_replicated_bytes        | 0                                                                                                                                              |
      | wsrep_repl_keys               | 0                                                                                                                                              |
      | wsrep_repl_keys_bytes         | 0                                                                                                                                              |
      | wsrep_repl_data_bytes         | 0                                                                                                                                              |
      | wsrep_repl_other_bytes        | 0                                                                                                                                              |
      | wsrep_received                | 3                                                                                                                                              |
      | wsrep_received_bytes          | 208                                                                                                                                            |
      | wsrep_local_commits           | 0                                                                                                                                              |
      | wsrep_local_cert_failures     | 0                                                                                                                                              |
      | wsrep_local_replays           | 0                                                                                                                                              |
      | wsrep_local_send_queue        | 0                                                                                                                                              |
      | wsrep_local_send_queue_max    | 1                                                                                                                                              |
      | wsrep_local_send_queue_min    | 0                                                                                                                                              |
      | wsrep_local_send_queue_avg    | 0                                                                                                                                              |
      | wsrep_local_recv_queue        | 0                                                                                                                                              |
      | wsrep_local_recv_queue_max    | 1                                                                                                                                              |
      | wsrep_local_recv_queue_min    | 0                                                                                                                                              |
      | wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
      | wsrep_local_cached_downto     | 64                                                                                                                                             |
      | wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
      | wsrep_flow_control_paused     | 0                                                                                                                                              |
      | wsrep_flow_control_sent       | 0                                                                                                                                              |
      | wsrep_flow_control_recv       | 0                                                                                                                                              |
      | wsrep_cert_deps_distance      | 0                                                                                                                                              |
      | wsrep_apply_oooe              | 0.5                                                                                                                                            |
      | wsrep_apply_oool              | 0                                                                                                                                              |
      | wsrep_apply_window            | 1.5                                                                                                                                            |
      | wsrep_commit_oooe             | 0                                                                                                                                              |
      | wsrep_commit_oool             | 0                                                                                                                                              |
      | wsrep_commit_window           | 1                                                                                                                                              |
      | wsrep_local_state             | 4                                                                                                                                              |
      | wsrep_local_state_comment     | Synced                                                                                                                                         |
      | wsrep_cert_index_size         | 0                                                                                                                                              |
      | wsrep_causal_reads            | 0                                                                                                                                              |
      | wsrep_cert_interval           | 0                                                                                                                                              |
      | wsrep_open_transactions       | 0                                                                                                                                              |
      | wsrep_open_connections        | 0                                                                                                                                              |
      | wsrep_incoming_addresses      | AUTO,10.0.1.13:3306                                                                                                                            |
      | wsrep_cluster_weight          | 2                                                                                                                                              |
      | wsrep_desync_count            | 0                                                                                                                                              |
      | wsrep_evs_delayed             |                                                                                                                                                |
      | wsrep_evs_evict_list          |                                                                                                                                                |
      | wsrep_evs_repl_latency        | 0.000325151/0.00176008/0.00607075/0.00193032/7                                                                                                 |
      | wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
      | wsrep_gcomm_uuid              | 7ff14eaf-9ed6-11ea-b98f-8fc2b85537f4                                                                                                           |
      | wsrep_applier_thread_count    | 32                                                                                                                                             |
      | wsrep_cluster_capabilities    |                                                                                                                                                |
      | wsrep_cluster_conf_id         | 18446744073709551615                                                                                                                           |
      | wsrep_cluster_size            | 0                                                                                                                                              |
      | wsrep_cluster_state_uuid      |                                                                                                                                                |
      | wsrep_cluster_status          | Primary                                                                                                                                        |
      | wsrep_connected               | ON                                                                                                                                             |
      | wsrep_local_bf_aborts         | 0                                                                                                                                              |
      | wsrep_local_index             | 18446744073709551615                                                                                                                           |
      | wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
      | wsrep_provider_name           | Galera                                                                                                                                         |
      | wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
      | wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
      | wsrep_ready                   | ON                                                                                                                                             |
      | wsrep_rollbacker_thread_count | 1                                                                                                                                              |
      | wsrep_thread_count            | 33                                                                                                                                             |
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      65 rows in set (0.001 sec)
      

      NOTE THAT:

      wsrep_cluster_status          | Primary
      wsrep_local_state_comment     | Synced
      wsrep_local_index             | 18446744073709551615
      wsrep_cluster_size            | 0
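
The combination highlighted above (Primary and Synced, yet no cluster size and no local index) can be detected mechanically. The sketch below is an illustration under stated assumptions, not part of the original report: it parses the ASCII table printed by the client and flags the suspicious values. The value 18446744073709551615 equals 2**64 - 1, which is -1 stored in an unsigned 64-bit field; treating it as an "unset" marker is an assumption based on the output shown here. The function names are hypothetical.

```python
UNSET_U64 = 2**64 - 1  # 18446744073709551615, i.e. -1 as an unsigned 64-bit value


def parse_status(table_text):
    """Parse the ASCII table printed by SHOW GLOBAL STATUS into a dict."""
    status = {}
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts and the row-count footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Variable_name":
            status[cells[0]] = cells[1]
    return status


def membership_problems(status):
    """List the values that contradict a node claiming to be Primary and Synced."""
    problems = []
    if int(status.get("wsrep_cluster_size") or 0) < 1:
        problems.append("wsrep_cluster_size is 0")
    if int(status.get("wsrep_local_index") or UNSET_U64) == UNSET_U64:
        problems.append("wsrep_local_index is unset")
    if not status.get("wsrep_cluster_state_uuid"):
        problems.append("wsrep_cluster_state_uuid is empty")
    return problems


# Values copied from the mdb2 output above (before the restart).
before_restart = """
| wsrep_cluster_size        | 0                    |
| wsrep_cluster_state_uuid  |                      |
| wsrep_local_index         | 18446744073709551615 |
| wsrep_local_state_comment | Synced               |
"""
print(membership_problems(parse_status(before_restart)))
```

A check like this could be run after the upgraded node reports Synced, before any writes are allowed to rely on it.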
      

      Looking at the error log, the server is ready for connections after an IST.

      At this point a write made on the 'master' mdb1 is not replicated. On mdb2:

      MariaDB mdb2 [pippo]> select * from evento4;
      +----+---------------+--------+
      | Id | IdDispositivo | kkkk   |
      +----+---------------+--------+
      |  1 |           123 | aaaa   |
      |  3 |           222 | eeeeaa |
      |  4 |      34523452 | e4r4r4 |
      +----+---------------+--------+
      

      WHILE ON THE MASTER:

      MariaDB mdb1 [pippo]> select * from evento4;
      +----+---------------+--------+
      | Id | IdDispositivo | kkkk   |
      +----+---------------+--------+
      |  1 |           123 | aaaa   |
      |  3 |           222 | eeeeaa |
      |  4 |      34523452 | e4r4r4 |
      +----+---------------+--------+
      3 rows in set (0.001 sec)
       
      MariaDB mdb1 [pippo]> insert into evento4 (IdDispositivo,kkkk) values (3,'non tireplic');
      Query OK, 1 row affected (0.015 sec)
       
      MariaDB mdb1 [pippo]> select * from evento4;
      +----+---------------+--------------+
      | Id | IdDispositivo | kkkk         |
      +----+---------------+--------------+
      |  1 |           123 | aaaa         |
      |  3 |           222 | eeeeaa       |
      |  4 |      34523452 | e4r4r4       |
      |  6 |             3 | non tireplic |
      +----+---------------+--------------+
      4 rows in set (0.001 sec)
      

      The INSERT not being replicated could well be caused by wsrep_cluster_size=0 and wsrep_local_index=18446744073709551615.

      At this point we restart mdb2 to fix the status:

      [root@mdb2 my.cnf.d]# systemctl restart  mariadb
      [root@mdb2 my.cnf.d]# mysql
       
      MariaDB md2 [(none)]> show global status like 'wsrep%';
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      | Variable_name                 | Value                                                                                                                                          |
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      | wsrep_local_state_uuid        | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0                                                                                                           |
      | wsrep_protocol_version        | 9                                                                                                                                              |
      | wsrep_last_committed          | 66                                                                                                                                             |
      | wsrep_replicated              | 0                                                                                                                                              |
      | wsrep_replicated_bytes        | 0                                                                                                                                              |
      | wsrep_repl_keys               | 0                                                                                                                                              |
      | wsrep_repl_keys_bytes         | 0                                                                                                                                              |
      | wsrep_repl_data_bytes         | 0                                                                                                                                              |
      | wsrep_repl_other_bytes        | 0                                                                                                                                              |
      | wsrep_received                | 2                                                                                                                                              |
      | wsrep_received_bytes          | 200                                                                                                                                            |
      | wsrep_local_commits           | 0                                                                                                                                              |
      | wsrep_local_cert_failures     | 0                                                                                                                                              |
      | wsrep_local_replays           | 0                                                                                                                                              |
      | wsrep_local_send_queue        | 0                                                                                                                                              |
      | wsrep_local_send_queue_max    | 1                                                                                                                                              |
      | wsrep_local_send_queue_min    | 0                                                                                                                                              |
      | wsrep_local_send_queue_avg    | 0                                                                                                                                              |
      | wsrep_local_recv_queue        | 0                                                                                                                                              |
      | wsrep_local_recv_queue_max    | 1                                                                                                                                              |
      | wsrep_local_recv_queue_min    | 0                                                                                                                                              |
      | wsrep_local_recv_queue_avg    | 0                                                                                                                                              |
      | wsrep_local_cached_downto     | 64                                                                                                                                             |
      | wsrep_flow_control_paused_ns  | 0                                                                                                                                              |
      | wsrep_flow_control_paused     | 0                                                                                                                                              |
      | wsrep_flow_control_sent       | 0                                                                                                                                              |
      | wsrep_flow_control_recv       | 0                                                                                                                                              |
      | wsrep_cert_deps_distance      | 0                                                                                                                                              |
      | wsrep_apply_oooe              | 0                                                                                                                                              |
      | wsrep_apply_oool              | 0                                                                                                                                              |
      | wsrep_apply_window            | 0                                                                                                                                              |
      | wsrep_commit_oooe             | 0                                                                                                                                              |
      | wsrep_commit_oool             | 0                                                                                                                                              |
      | wsrep_commit_window           | 0                                                                                                                                              |
      | wsrep_local_state             | 4                                                                                                                                              |
      | wsrep_local_state_comment     | Synced                                                                                                                                         |
      | wsrep_cert_index_size         | 0                                                                                                                                              |
      | wsrep_causal_reads            | 0                                                                                                                                              |
      | wsrep_cert_interval           | 0                                                                                                                                              |
      | wsrep_open_transactions       | 0                                                                                                                                              |
      | wsrep_open_connections        | 0                                                                                                                                              |
      | wsrep_incoming_addresses      | 10.0.1.13:3306,AUTO                                                                                                                            |
      | wsrep_cluster_weight          | 2                                                                                                                                              |
      | wsrep_desync_count            | 0                                                                                                                                              |
      | wsrep_evs_delayed             |                                                                                                                                                |
      | wsrep_evs_evict_list          |                                                                                                                                                |
      | wsrep_evs_repl_latency        | 0.000853237/0.001923/0.00333681/0.0010427/3                                                                                                    |
      | wsrep_evs_state               | OPERATIONAL                                                                                                                                    |
      | wsrep_gcomm_uuid              | ab80ace4-9ed6-11ea-8cdf-eab063bfbbb6                                                                                                           |
      | wsrep_applier_thread_count    | 32                                                                                                                                             |
      | wsrep_cluster_capabilities    |                                                                                                                                                |
      | wsrep_cluster_conf_id         | 6                                                                                                                                              |
      | wsrep_cluster_size            | 2                                                                                                                                              |
      | wsrep_cluster_state_uuid      | 86a3014e-9e9d-11ea-8f7d-829b023fcaf0                                                                                                           |
      | wsrep_cluster_status          | Primary                                                                                                                                        |
      | wsrep_connected               | ON                                                                                                                                             |
      | wsrep_local_bf_aborts         | 0                                                                                                                                              |
      | wsrep_local_index             | 1                                                                                                                                              |
      | wsrep_provider_capabilities   | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
      | wsrep_provider_name           | Galera                                                                                                                                         |
      | wsrep_provider_vendor         | Codership Oy <info@codership.com>                                                                                                              |
      | wsrep_provider_version        | 26.4.4(r4599)                                                                                                                                  |
      | wsrep_ready                   | ON                                                                                                                                             |
      | wsrep_rollbacker_thread_count | 1                                                                                                                                              |
      | wsrep_thread_count            | 33                                                                                                                                             |
      +-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
      65 rows in set (0.002 sec)
      

      NOTE that the status is now OK:

      wsrep_cluster_status          | Primary
      wsrep_local_state_comment     | Synced
      wsrep_local_index             | 1
      wsrep_cluster_size            | 2
      

      However, when we check the data on mdb2, where the new row should now be present:

      MariaDB mdb2 [pippo]> select * from evento4;
      +----+---------------+--------+
      | Id | IdDispositivo | kkkk   |
      +----+---------------+--------+
      |  1 |           123 | aaaa   |
      |  3 |           222 | eeeeaa |
      |  4 |      34523452 | e4r4r4 |
      +----+---------------+--------+
      3 rows in set (0.001 sec)
      

      The row is not there.
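
The comparison between the two nodes can be expressed as a diff on the Id column. This is a small illustrative sketch, not part of the original report: the rows are copied from the SELECT output above, the function name `missing_rows` is hypothetical, and fetching the rows from each node is left to whatever client library is in use.

```python
def missing_rows(source_rows, replica_rows):
    """Return rows whose Id (first column) exists on the source but not on the replica."""
    replica_ids = {row[0] for row in replica_rows}
    return [row for row in source_rows if row[0] not in replica_ids]


# Rows of pippo.evento4 as shown above: (Id, IdDispositivo, kkkk)
mdb1_rows = [
    (1, 123, "aaaa"),
    (3, 222, "eeeeaa"),
    (4, 34523452, "e4r4r4"),
    (6, 3, "non tireplic"),
]
mdb2_rows = [
    (1, 123, "aaaa"),
    (3, 222, "eeeeaa"),
    (4, 34523452, "e4r4r4"),
]

print(missing_rows(mdb1_rows, mdb2_rows))
```

With the data from this report, the only row reported as missing on mdb2 is the one inserted on mdb1 while mdb2 showed wsrep_cluster_size=0.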

      Writes made after this point are all replicated. So the data loss covers writes made between the completion of the first IST and the next restart, when the node gets the cluster status back.

        Attachments

        1. 200612_mysqld.1.err
          62 kB
        2. 200612_mysqld.2.err
          121 kB
        3. 200612_mysqld.3.err
          70 kB
        4. 200709_patgal_output.zip
          15 kB
        5. 20200713_MDEV-22723_patgal_no_errors.zip
          35 kB
        6. 20200714_MDEV-22723_mdb_no_errors.zip
          32 kB
        7. 20200714_MDEV-22723_patgal_no_errors.zip
          28 kB
        8. 20200720_MDEV-22723_CentOS_7.5_no_errors.zip
          24 kB
        9. 20200723_MDEV-22723_data_loss.zip
          43 kB
        10. error_log_mdb1
          23 kB
        11. error_log_mdb2.after_upgrade
          87 kB
        12. mysqld_new.2.cnf
          2 kB
        13. mysqld_old.1.cnf
          2 kB
        14. mysqld_old.2.cnf
          2 kB
        15. mysqld_old.3.cnf
          2 kB
        16. node1_bootsrapped_10.3.23.log
          91 kB
        17. node1_bootsrapped_10.3.23.log.rtf
          93 kB
        18. node2_upgraded_10.4.13.log
          14 kB
        19. node2_upgraded.log.rtf
          14 kB
        20. server.cnf_mdb1
          2 kB
        21. server.cnf_mdb2
          2 kB

              People

              Assignee:
              Yurchenko Alexey
              Reporter:
              massimo.disaro Massimo
              Votes:
              3
              Watchers:
              7
