MariaDB Server / MDEV-32122

Node killing could cause applier to fail on a write rows event, leading to inconsistency

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Cannot Reproduce
    • Affects Version: 10.6
    • Fix Version: N/A
    • Component: Galera
    • Labels: None

    Description

      3-node cluster with RQG load where node_2 is killed. The node_1 applier fails with a deadlock error and drops from the cluster; node_3 then leaves on its own as it is the only remaining member.

      2023-09-07 11:50:07 2 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table test.table20_int_autoinc; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log FIRST, end_log_pos 398, Internal MariaDB error code: 1213
      

      Attachments

        1. n1.cnf
          1 kB
        2. n2.cnf
          1 kB
        3. n3.cnf
          1 kB
        4. node1.err
          311 kB
        5. node2.err
          477 kB
        6. node3.err
          830 kB
        7. transactions.zz
          1.0 kB
        8. transactions-bkp_1.yy
          4 kB
        9. transactions-bkp.yy
          5 kB

        Issue Links

          Activity

            node1 (OLTP load was active):

            node1:root@localhost> show status like '%wsrep%st%';
            +------------------------------+--------------------------------------+
            | Variable_name                | Value                                |
            +------------------------------+--------------------------------------+
            | wsrep_local_state_uuid       | 00000000-0000-0000-0000-000000000000 |
            | wsrep_last_committed         | -1                                   |
            | wsrep_flow_control_requested | false                                |
            | wsrep_cert_deps_distance     | 70.7591                              |
            | wsrep_local_state            | 5                                    |
            | wsrep_local_state_comment    | Inconsistent                         |
            | wsrep_cluster_capabilities   |                                      |
            | wsrep_cluster_conf_id        | 18446744073709551615                 |
            | wsrep_cluster_size           | 0                                    |
            | wsrep_cluster_state_uuid     | 65ef358e-4d5b-11ee-a7ce-76be38a797b3 |
            | wsrep_cluster_status         | Disconnected                         |
            +------------------------------+--------------------------------------+
            11 rows in set (0.001 sec)
             
            node1:root@localhost>
            

            Node3:

            node3:root@localhost> show status like '%wsrep%st%';
            +------------------------------+--------------------------------------+
            | Variable_name                | Value                                |
            +------------------------------+--------------------------------------+
            | wsrep_local_state_uuid       | 65ef358e-4d5b-11ee-a7ce-76be38a797b3 |
            | wsrep_last_committed         | 23055                                |
            | wsrep_flow_control_requested | false                                |
            | wsrep_cert_deps_distance     | 46.5084                              |
            | wsrep_local_state            | 0                                    |
            | wsrep_local_state_comment    | Initialized                          |
            | wsrep_cluster_weight         | 0                                    |
            | wsrep_evs_evict_list         |                                      |
            | wsrep_evs_state              | OPERATIONAL                          |
            | wsrep_gmcast_segment         | 0                                    |
            | wsrep_cluster_capabilities   |                                      |
            | wsrep_cluster_conf_id        | 18446744073709551615                 |
            | wsrep_cluster_size           | 1                                    |
            | wsrep_cluster_state_uuid     | 65ef358e-4d5b-11ee-a7ce-76be38a797b3 |
            | wsrep_cluster_status         | non-Primary                          |
            +------------------------------+--------------------------------------+
            15 rows in set (0.001 sec)
             
            node3:root@localhost>
            

            janlindstrom Jan Lindström added a comment

            Test case to reproduce the issue
            1) Start 3 node cluster
            2) Run RQG load on Node1 and Node2 (RQG grammar and cnf files attached)

            perl gendata.pl --dsn=dbi:mysql:host=127.0.0.1:port=12107:user=root:database=test --spec=conf/transactions/transactions.zz
            perl gentest.pl --dsn=dbi:mysql:host=127.0.0.1:port=12107:user=root:database=test --grammar=conf/transactions/transactions-bkp.yy --threads=32 --duration=30000 --queries=1000000000 &
            perl gentest.pl --dsn=dbi:mysql:host=127.0.0.1:port=12943:user=root:database=test --grammar=conf/transactions/transactions-bkp.yy --threads=32 --duration=30000 --queries=1000000000 &
            

            3) Forcefully kill node2 and analyse the wsrep state of the remaining cluster nodes.
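            The kill step can be scripted roughly as follows. This is a minimal sketch: the port numbers are assumptions taken from the gentest.pl commands above (12107 for node1, 12943 for node2), and it assumes each mariadbd carries its port on the command line; adjust to the actual layout.

            ```shell
            #!/bin/sh
            # Sketch of step 3: crash node2 with SIGKILL (no clean shutdown)
            # and inspect the wsrep state of a surviving node.
            # Port numbers are assumptions taken from the gentest.pl commands above.
            NODE1_PORT=12107
            NODE2_PORT=12943

            # The [m] bracket trick stops pgrep -f from matching this script itself.
            NODE2_PID=$(pgrep -f "[m]ariadbd.*port=${NODE2_PORT}" || true)
            if [ -n "$NODE2_PID" ]; then
                # SIGKILL simulates a hard crash: no InnoDB or wsrep cleanup runs.
                kill -9 $NODE2_PID
                echo "killed node2 (pid $NODE2_PID)"
            else
                echo "node2 not running"
            fi

            # Check how node1 now sees the cluster; a healthy outcome is
            # wsrep_cluster_status = Primary with wsrep_cluster_size = 2.
            if command -v mariadb >/dev/null 2>&1; then
                mariadb -h127.0.0.1 -P"$NODE1_PORT" -uroot -e \
                    "SHOW STATUS LIKE 'wsrep_cluster_status';
                     SHOW STATUS LIKE 'wsrep_local_state_comment';"
            fi
            ```

            If pgrep is not available, the PID can also be taken from the node's pid-file instead.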

            ramesh Ramesh Sivaraman added a comment

            Relevant info from node_1, where inconsistency voting starts:

            2023-09-07 11:50:07 0 [Note] WSREP: Member 0(galapq) initiates vote on 65ef358e-4d5b-11ee-a7ce-76be38a797b3:9523,ded413bc1b661ec5:  Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213;
            2023-09-07 11:50:07 0 [Note] WSREP: Votes over 65ef358e-4d5b-11ee-a7ce-76be38a797b3:9523:
               ded413bc1b661ec5:   1/3
            Waiting for more votes.
            2023-09-07 11:50:07 0 [Note] WSREP: Member 1(galapq) responds to vote on 65ef358e-4d5b-11ee-a7ce-76be38a797b3:9523,0000000000000000: Success
            2023-09-07 11:50:07 0 [Note] WSREP: Votes over 65ef358e-4d5b-11ee-a7ce-76be38a797b3:9523:
               0000000000000000:   1/3
               ded413bc1b661ec5:   1/3
            Waiting for more votes.
            2023-09-07 11:50:07 0 [Note] WSREP: Member 2(galapq) responds to vote on 65ef358e-4d5b-11ee-a7ce-76be38a797b3:9523,0000000000000000: Success
            2023-09-07 11:50:07 0 [Note] WSREP: Votes over 65ef358e-4d5b-11ee-a7ce-76be38a797b3:9523:
               0000000000000000:   2/3
               ded413bc1b661ec5:   1/3
            Winner: 0000000000000000
            2023-09-07 11:50:07 2 [ERROR] WSREP: Inconsistency detected: Inconsistent by consensus on 65ef358e-4d5b-11ee-a7ce-76be38a797b3:9523
            	 at /test/galera_4x_opt/galera/src/replicator_smm.cpp:process_apply_error():1357
            
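            Once a node has lost such a vote and marked itself Inconsistent, it cannot simply rejoin with its current state; it needs a full state transfer. A minimal recovery sketch, assuming a default datadir of /var/lib/mysql (an assumption; point it at the node's actual datadir):

            ```shell
            #!/bin/sh
            # Sketch: force the inconsistent node to rejoin via SST on next start.
            # DATADIR is an assumption; adjust to the node's actual datadir.
            DATADIR="${DATADIR:-/var/lib/mysql}"

            # Deleting grastate.dat discards the local wsrep position, so the node
            # requests a full state snapshot transfer (SST) from a donor on restart.
            rm -f "$DATADIR/grastate.dat"
            echo "grastate.dat removed; restart mariadbd on this node to trigger SST"
            ```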

            janlindstrom Jan Lindström added a comment

            Based on testing, this problem no longer reproduces with 10.6 commit 388296a1.

            janlindstrom Jan Lindström added a comment

            Could not reproduce the issue on the latest 10.6 build.

            ramesh Ramesh Sivaraman added a comment

            People

              Assignee: janlindstrom Jan Lindström
              Reporter: janlindstrom Jan Lindström
              Votes: 1
              Watchers: 5

              Dates

                Created:
                Updated:
                Resolved:
