[MDEV-30413] run sequence nextval got [Note] WSREP: MDL BF-BF conflict and [ERROR] Aborting - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.6.10
Fix Version/s: 11.1.0, 10.6.13, 10.8.8, 10.9.6, 10.10.4, 10.11.3
Component/s: Galera
Labels:
None

Description

Hi,

Our Galera cluster architecture is 2 DB nodes + 1 witness node. Most DB traffic is directed to one DB node.

One node in one of our Galera clusters went down (wsrep_ready=OFF). We restarted the DB node, but IST hit the same error. We had to remove the data directory and let Galera trigger an SST in order to bring the DB node back.

The DB version is 10.6.10 and the Galera version is 26.4.12.

We had another similar incident: MDEV-30303

2023-01-06 22:04:55 2 [Note] WSREP: MDL BF-BF conflict

schema:  tswtrn1

request: (2     seqno 31040753  wsrep (high priority, exec, executing) cmd 0 161        select nextval(`SEQUENCE_LPCO_ID`)<87>*<B8>c^S^A)

granted: (6     seqno 31040752  wsrep (high priority, exec, preparing) cmd 0 161        (null))

2023-01-06 22:04:55 2 [ERROR] Aborting

Attachments

node1.err
1.47 MB
2023-03-15 09:35

Issue Links

relates to

MDEV-36123 WSREP: MDL BF-BF conflict

Open

MDEV-30303 run optimize table got [Note] WSREP: MDL BF-BF conflict and [ERROR] Aborting

Closed

Activity

Jan Lindström added a comment - 2023-03-14 08:31

frelist I would need more information about the customer workload because I could not easily reproduce the issue. First, can you provide the full unedited error logs from all nodes, the node configuration, and the output of show create sequence `SEQUENCE_LPCO_ID`? Keep in mind that SELECT NEXT VALUE is basically a write to the sequence table, which could cause an MDL conflict. However, at the moment it is not clear what the conflicting SQL statement was.

Ramesh Sivaraman added a comment - 2023-03-15 09:36

janlindstrom Reproduced the BF conflict issue using RQG. Please find the error log attached: node1.err

2023-03-15  9:42:02 15 [Note] WSREP: wsrep_abort_thd, by: 23175262897920, victim: 23175850178304

2023-03-15  9:42:02 15 [Note] WSREP: abort transaction: BF: select /* QNO 4 CON_ID 37 */ nextval(s1)<CA>v^Qd^Sf victim: select /* QNO 13 CON_ID 38 */ nextval(s1) victim conf: certifying

2023-03-15  9:42:02 15 [Note] WSREP: wsrep_thd_set_wsrep_aborter setting wsrep_aborter 15

2023-03-15  9:42:02 15 [Note] WSREP: wsrep_bf_abort BF aborter before

    thd: 15 thd_ptr: 0x151398000f88 client_mode: high priority client_state: exec trx_state: executing

    next_trx_id: 2366 trx_id: 3200 seqno: 1497

    is_streaming: 0 fragments: 0

    sql_errno: 0 message:

    command: 161 query: select /* QNO 4 CON_ID 37 */ nextval(s1)<CA>v^Qd^Sf

2023-03-15  9:42:02 15 [Note] WSREP: wsrep_bf_abort victim before

    thd: 38 thd_ptr: 0x151278000d48 client_mode: local client_state: exec trx_state: certifying

    next_trx_id: 2365 trx_id: 2365 seqno: -1

    is_streaming: 0 fragments: 0

    sql_errno: 0 message:

    command: 0 query: select /* QNO 13 CON_ID 38 */ nextval(s1)

2023-03-15  9:42:02 17 [Note] WSREP: wsrep_before_commit: 1, 1498

2023-03-15  9:42:02 38 [Note] WSREP: MDL conflict

schema:  test

request: (6     seqno 1484      wsrep (high priority, exec, executing) cmd 0 161        select /* QNO 563 CON_ID 29 */ nextval(s1)<CA>v^Qd^Sf)

granted: (15    seqno 1497      wsrep (high priority, exec, executing) cmd 0 161        select /* QNO 4 CON_ID 37 */ nextval(s1)<CA>v^Qd^Sf)

2023-03-15  9:42:02 38 [Note] WSREP: MDL ticket: type: MDL_EXCLUSIVE space: TABLE db: test name: s1 (Waiting for table metadata lock)

2023-03-15  9:42:02 38 [Note] WSREP: MDL BF-BF conflict

schema:  test

request: (6     seqno 1484      wsrep (high priority, exec, executing) cmd 0 161        select /* QNO 563 CON_ID 29 */ nextval(s1)<CA>v^Qd^Sf)

granted: (15    seqno 1497      wsrep (high priority, exec, executing) cmd 0 161        select /* QNO 4 CON_ID 37 */ nextval(s1)<CA>v^Qd^Sf)

2023-03-15  9:42:02 38 [Note] WSREP: MDL ticket: type: MDL_EXCLUSIVE space: TABLE db: test name: s1 (Waiting for table metadata lock)

2023-03-15  9:42:02 38 [ERROR] Aborting

node1:root@localhost> show status like 'wsrep_ready';

+---------------+-------+

| Variable_name | Value |

+---------------+-------+

| wsrep_ready   | OFF   |

+---------------+-------+

1 row in set (0.001 sec)

node1:root@localhost>

node1:root@localhost> select nextval(s1);

ERROR 1047 (08S01): WSREP has not yet prepared node for application use

node1:root@localhost>
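
When comparing reports like the ones above, it can help to pull the thread id, seqno, and transaction state out of the `request:` and `granted:` lines. The following is a minimal sketch, assuming the line layout seen in these excerpts; the function names and dictionary keys are made up for illustration and are not part of MariaDB or Galera.

```python
import re

# Matches the "request:" / "granted:" lines that follow a
# "WSREP: MDL BF-BF conflict" note in the excerpts above.
# The group names are labels chosen for this sketch, not official field names.
LOCK_LINE = re.compile(
    r"^(?P<role>request|granted):\s*\(\s*(?P<thread_id>\d+)\s+"
    r"seqno\s+(?P<seqno>-?\d+)\s+wsrep\s+"
    r"\((?P<mode>[^,]+),\s*(?P<client_state>[^,]+),\s*(?P<trx_state>[^)]+)\)"
)


def parse_lock_line(line):
    """Return a dict of fields from one request/granted line, or None."""
    match = LOCK_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["thread_id"] = int(fields["thread_id"])
    fields["seqno"] = int(fields["seqno"])
    return fields


def summarize_conflict(lines):
    """Pair up the request and granted lines of one conflict report."""
    parsed = {}
    for line in lines:
        fields = parse_lock_line(line)
        if fields is not None:
            parsed[fields["role"]] = fields
    if "request" not in parsed or "granted" not in parsed:
        return None
    return {
        "request": parsed["request"],
        "granted": parsed["granted"],
        # True when the waiting thread has the lower (earlier) seqno.
        "requester_is_earlier": parsed["request"]["seqno"] < parsed["granted"]["seqno"],
    }


# Lines shaped like the reproduced report (query text shortened).
sample = [
    "request: (6     seqno 1484      wsrep (high priority, exec, executing) cmd 0 161        select nextval(s1))",
    "granted: (15    seqno 1497      wsrep (high priority, exec, executing) cmd 0 161        select nextval(s1))",
]
summary = summarize_conflict(sample)
print(summary["request"]["seqno"], summary["granted"]["seqno"], summary["requester_is_earlier"])
```

For the reproduced report, this shows the waiting thread at seqno 1484 and the lock holder at seqno 1497, both in high priority mode.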

Jan Lindström added a comment - 2023-03-28 08:32

https://github.com/MariaDB/server/pull/2580

Julius Goryavsky added a comment - 2023-03-30 14:18

According to the test results, the fix works as it should, so it has been merged into the head revision: https://github.com/MariaDB/server/commit/169def14f64492466a305114b0ca13b2b5775164
