  1. MariaDB Server
  2. MDEV-30413

run sequence nextval got [Note] WSREP: MDL BF-BF conflict and [ERROR] Aborting

Details

    Description

      Hi,

      Our Galera cluster consists of 2 DB nodes and 1 witness node. Most DB traffic is directed to one DB node.

      In one of our Galera clusters, one node went down (wsrep_ready=OFF). We restarted the DB node, but IST hit the same error. We had to remove the data directory and let Galera trigger an SST to bring the node back.

      The DB version is 10.6.10 and the Galera version is 26.4.12.

      We had another, similar incident: MDEV-30303.

      2023-01-06 22:04:55 2 [Note] WSREP: MDL BF-BF conflict
      schema:  tswtrn1
      request: (2     seqno 31040753  wsrep (high priority, exec, executing) cmd 0 161        select nextval(`SEQUENCE_LPCO_ID`)<87>*<B8>c^S^A)
      granted: (6     seqno 31040752  wsrep (high priority, exec, preparing) cmd 0 161        (null))
      2023-01-06 22:04:55 2 [ERROR] Aborting
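
When comparing several incidents of this kind, it can help to pull the thread id, seqno and priority of both parties out of the `request:` / `granted:` lines. The following is a minimal sketch in Python, assuming the line layout seen in the excerpt above; the names `parse_lock_line`, `LockParty` and `is_bf_bf` are hypothetical helpers, not part of MariaDB or Galera, and the trailing binary bytes after the query text are left out of the sample.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of the "request:" / "granted:" lines, inferred only from
# the log excerpt in this report; other versions may print them differently.
_LOCK_LINE = re.compile(
    r"^\s*(?P<role>request|granted):\s+\("
    r"(?P<thd>\d+)\s+seqno\s+(?P<seqno>-?\d+)\s+"
    r"wsrep\s+\((?P<state>[^)]*)\)\s+"
    r"cmd\s+\d+\s+\d+\s+"
    r"(?P<query>.*)\)\s*$"
)


class LockParty(NamedTuple):
    role: str            # "request" or "granted"
    thread_id: int
    seqno: int
    high_priority: bool  # True when the first state field is "high priority"
    query: str


def parse_lock_line(line: str) -> Optional[LockParty]:
    """Return the parsed party, or None if the line has another shape."""
    m = _LOCK_LINE.match(line)
    if m is None:
        return None
    state = [part.strip() for part in m.group("state").split(",")]
    return LockParty(
        role=m.group("role"),
        thread_id=int(m.group("thd")),
        seqno=int(m.group("seqno")),
        high_priority=(state[0] == "high priority"),
        query=m.group("query"),
    )


def is_bf_bf(a: LockParty, b: LockParty) -> bool:
    """Both parties are high-priority threads (the BF-BF case in the log)."""
    return a.high_priority and b.high_priority


if __name__ == "__main__":
    sample = [
        "request: (2     seqno 31040753  wsrep (high priority, exec, executing)"
        " cmd 0 161        select nextval(`SEQUENCE_LPCO_ID`))",
        "granted: (6     seqno 31040752  wsrep (high priority, exec, preparing)"
        " cmd 0 161        (null))",
    ]
    parties = [parse_lock_line(line) for line in sample]
    for party in parties:
        print(party)
    print("BF-BF:", is_bf_bf(parties[0], parties[1]))
```

In this excerpt both parties are high-priority threads with adjacent seqnos (31040752 holds the lock, 31040753 requests it), which is the situation the server reports as a BF-BF conflict before aborting.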
      

      Attachments

        1. node1.err
          1.47 MB
          Ramesh Sivaraman

          Activity

            janlindstrom Jan Lindström added a comment - frelist I would need more information about the customer workload, because I could not easily reproduce the issue. First, can you provide the full, unedited error logs from all nodes, the node configuration, and the output of show create sequence `SEQUENCE_LPCO_ID`; Keep in mind that SELECT NEXT VALUE is essentially a write to the sequence table, and that could cause an MDL conflict. However, at the moment it is not clear what the conflicting SQL statement was.

            ramesh Ramesh Sivaraman added a comment - janlindstrom Reproduced the BF conflict issue using RQG. Please find the error log attached: node1.err

            2023-03-15  9:42:02 15 [Note] WSREP: wsrep_abort_thd, by: 23175262897920, victim: 23175850178304
            2023-03-15  9:42:02 15 [Note] WSREP: abort transaction: BF: select /* QNO 4 CON_ID 37 */ nextval(s1)<CA>v^Qd^Sf victim: select /* QNO 13 CON_ID 38 */ nextval(s1) victim conf: certifying
            2023-03-15  9:42:02 15 [Note] WSREP: wsrep_thd_set_wsrep_aborter setting wsrep_aborter 15
            2023-03-15  9:42:02 15 [Note] WSREP: wsrep_bf_abort BF aborter before
                thd: 15 thd_ptr: 0x151398000f88 client_mode: high priority client_state: exec trx_state: executing
                next_trx_id: 2366 trx_id: 3200 seqno: 1497
                is_streaming: 0 fragments: 0
                sql_errno: 0 message: 
                command: 161 query: select /* QNO 4 CON_ID 37 */ nextval(s1)<CA>v^Qd^Sf
            2023-03-15  9:42:02 15 [Note] WSREP: wsrep_bf_abort victim before
                thd: 38 thd_ptr: 0x151278000d48 client_mode: local client_state: exec trx_state: certifying
                next_trx_id: 2365 trx_id: 2365 seqno: -1
                is_streaming: 0 fragments: 0
                sql_errno: 0 message: 
                command: 0 query: select /* QNO 13 CON_ID 38 */ nextval(s1)
            2023-03-15  9:42:02 17 [Note] WSREP: wsrep_before_commit: 1, 1498
            2023-03-15  9:42:02 38 [Note] WSREP: MDL conflict 
            schema:  test
            request: (6     seqno 1484      wsrep (high priority, exec, executing) cmd 0 161        select /* QNO 563 CON_ID 29 */ nextval(s1)<CA>v^Qd^Sf)
            granted: (15    seqno 1497      wsrep (high priority, exec, executing) cmd 0 161        select /* QNO 4 CON_ID 37 */ nextval(s1)<CA>v^Qd^Sf)
            2023-03-15  9:42:02 38 [Note] WSREP: MDL ticket: type: MDL_EXCLUSIVE space: TABLE db: test name: s1 (Waiting for table metadata lock)
            2023-03-15  9:42:02 38 [Note] WSREP: MDL BF-BF conflict
            schema:  test
            request: (6     seqno 1484      wsrep (high priority, exec, executing) cmd 0 161        select /* QNO 563 CON_ID 29 */ nextval(s1)<CA>v^Qd^Sf)
            granted: (15    seqno 1497      wsrep (high priority, exec, executing) cmd 0 161        select /* QNO 4 CON_ID 37 */ nextval(s1)<CA>v^Qd^Sf)
            2023-03-15  9:42:02 38 [Note] WSREP: MDL ticket: type: MDL_EXCLUSIVE space: TABLE db: test name: s1 (Waiting for table metadata lock)
            2023-03-15  9:42:02 38 [ERROR] Aborting
            

            node1:root@localhost> show status like 'wsrep_ready';
            +---------------+-------+
            | Variable_name | Value |
            +---------------+-------+
            | wsrep_ready   | OFF   |
            +---------------+-------+
            1 row in set (0.001 sec)
             
            node1:root@localhost> 
            node1:root@localhost> select nextval(s1);
            ERROR 1047 (08S01): WSREP has not yet prepared node for application use
            node1:root@localhost>
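
A monitoring script could detect the state shown above by reading the `wsrep_ready` row from the client's tabular output. This is only a sketch in Python that parses text in the format of the session above; `parse_status_table` and `node_is_ready` are hypothetical helper names, and a real check would more likely query the server through a database connector.

```python
from typing import Dict


def parse_status_table(text: str) -> Dict[str, str]:
    """Turn the client's boxed 'Variable_name | Value' table into a dict."""
    rows: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, prompts and the row-count footer
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2 and cells[0] != "Variable_name":
            rows[cells[0]] = cells[1]
    return rows


def node_is_ready(status: Dict[str, str]) -> bool:
    """Treat a missing or non-ON wsrep_ready value as 'not ready'."""
    return status.get("wsrep_ready", "").upper() == "ON"


if __name__ == "__main__":
    output = """\
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wsrep_ready   | OFF   |
+---------------+-------+
1 row in set (0.001 sec)
"""
    status = parse_status_table(output)
    print(status, "ready" if node_is_ready(status) else "not ready")
```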
            

            janlindstrom Jan Lindström added a comment - https://github.com/MariaDB/server/pull/2580

            sysprg Julius Goryavsky added a comment - According to the test results, the fix works as intended, so it has been merged with the head revision: https://github.com/MariaDB/server/commit/169def14f64492466a305114b0ca13b2b5775164

            People

              sysprg Julius Goryavsky
              frelist William Wong