[MDEV-30413] run sequence nextval got [Note] WSREP: MDL BF-BF conflict and [ERROR] Aborting
Created: 2023-01-15  Updated: 2023-05-25  Resolved: 2023-03-30

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.6.10
Fix Version/s: 11.1.0, 10.11.3, 10.6.13, 10.8.8, 10.9.6, 10.10.4

Type: Bug
Priority: Critical
Reporter: William Wong
Assignee: Julius Goryavsky
Resolution: Fixed
Votes: 1
Labels: None

Attachments: node1.err
Issue Links: relates to MDEV-30303 run optimize table got [Note] WSREP: ... (Closed)

Description

Hi,

Our Galera cluster architecture is 2 DB nodes + 1 witness node. Most DB traffic is directed to one DB node.

One of our Galera clusters had a node go down (wsrep_ready=OFF). We restarted the DB node, but IST hit the same error. We had to remove the data directory and let Galera trigger an SST to bring the DB node back.

The DB version is 10.6.10 and the Galera version is 26.4.12.

We had a similar incident, reported as MDEV-30303.

2023-01-06 22:04:55 2 [Note] WSREP: MDL BF-BF conflict
schema:  tswtrn1
request: (2     seqno 31040753  wsrep (high priority, exec, executing) cmd 0 161        select nextval(`SEQUENCE_LPCO_ID`)<87>*<B8>c^S^A)
granted: (6     seqno 31040752  wsrep (high priority, exec, preparing) cmd 0 161        (null))
2023-01-06 22:04:55 2 [ERROR] Aborting
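
For triage, the request/granted pair in an entry like the one above can be pulled apart mechanically. The following is a minimal sketch, assuming the line layout seen in this report; the field names (`thread_id`, `mode`, `client_state`, `trx_state`, `cmd`) and the helper functions are hypothetical labels chosen for illustration and are not taken from the server source.

```python
import re

# Layout inferred from the log excerpt in this report (an assumption):
# role: (<thread> seqno <n> wsrep (<mode>, <client state>, <trx state>) cmd <a> <b> <query>)
HOLDER_RE = re.compile(
    r"^(?P<role>request|granted):\s*\("
    r"(?P<thread_id>\d+)\s+seqno\s+(?P<seqno>-?\d+)\s+"
    r"wsrep\s+\((?P<mode>[^,]+),\s*(?P<client_state>[^,]+),\s*(?P<trx_state>[^)]+)\)\s+"
    r"cmd\s+(?P<cmd>\d+\s+\d+)\s+"
    r"(?P<query>.*)\)\s*$"
)


def parse_holder(line):
    """Return the fields of one request/granted line, or None if it does not match."""
    m = HOLDER_RE.match(line.strip())
    if m is None:
        return None
    fields = {key: value.strip() for key, value in m.groupdict().items()}
    fields["thread_id"] = int(fields["thread_id"])
    fields["seqno"] = int(fields["seqno"])
    return fields


def is_bf_bf(request, granted):
    """True when both sides of the conflict are reported as high priority."""
    return request["mode"] == "high priority" and granted["mode"] == "high priority"


# The entry from the description, copied as logged (trailing bytes included).
entry = [
    "schema:  tswtrn1",
    "request: (2     seqno 31040753  wsrep (high priority, exec, executing) cmd 0 161        select nextval(`SEQUENCE_LPCO_ID`)<87>*<B8>c^S^A)",
    "granted: (6     seqno 31040752  wsrep (high priority, exec, preparing) cmd 0 161        (null))",
]

parsed = {f["role"]: f for f in map(parse_holder, entry) if f is not None}
print(parsed["request"]["seqno"], parsed["granted"]["seqno"])
print(is_bf_bf(parsed["request"], parsed["granted"]))
```

The query text is captured as logged, so any non-printable bytes that follow the statement stay in the `query` field rather than being guessed away.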



Comments
Comment by Jan Lindström [ 2023-03-14 ]

frelist I would need more information about the customer workload, because I could not easily reproduce the issue. First, can you provide the full, unedited error logs from all nodes, the node configuration, and the output of show create sequence `SEQUENCE_LPCO_ID`; Keep in mind that SELECT NEXT VALUE is essentially a write to the sequence table, which could cause an MDL conflict. However, at the moment it is not clear what the conflicting SQL statement was.

Comment by Ramesh Sivaraman [ 2023-03-15 ]

janlindstrom Reproduced the BF conflict issue using RQG. Please find the error log attached: node1.err

2023-03-15  9:42:02 15 [Note] WSREP: wsrep_abort_thd, by: 23175262897920, victim: 23175850178304
2023-03-15  9:42:02 15 [Note] WSREP: abort transaction: BF: select /* QNO 4 CON_ID 37 */ nextval(s1)<CA>v^Qd^Sf victim: select /* QNO 13 CON_ID 38 */ nextval(s1) victim conf: certifying
2023-03-15  9:42:02 15 [Note] WSREP: wsrep_thd_set_wsrep_aborter setting wsrep_aborter 15
2023-03-15  9:42:02 15 [Note] WSREP: wsrep_bf_abort BF aborter before
    thd: 15 thd_ptr: 0x151398000f88 client_mode: high priority client_state: exec trx_state: executing
    next_trx_id: 2366 trx_id: 3200 seqno: 1497
    is_streaming: 0 fragments: 0
    sql_errno: 0 message: 
    command: 161 query: select /* QNO 4 CON_ID 37 */ nextval(s1)<CA>v^Qd^Sf
2023-03-15  9:42:02 15 [Note] WSREP: wsrep_bf_abort victim before
    thd: 38 thd_ptr: 0x151278000d48 client_mode: local client_state: exec trx_state: certifying
    next_trx_id: 2365 trx_id: 2365 seqno: -1
    is_streaming: 0 fragments: 0
    sql_errno: 0 message: 
    command: 0 query: select /* QNO 13 CON_ID 38 */ nextval(s1)
2023-03-15  9:42:02 17 [Note] WSREP: wsrep_before_commit: 1, 1498
2023-03-15  9:42:02 38 [Note] WSREP: MDL conflict 
schema:  test
request: (6     seqno 1484      wsrep (high priority, exec, executing) cmd 0 161        select /* QNO 563 CON_ID 29 */ nextval(s1)<CA>v^Qd^Sf)
granted: (15    seqno 1497      wsrep (high priority, exec, executing) cmd 0 161        select /* QNO 4 CON_ID 37 */ nextval(s1)<CA>v^Qd^Sf)
2023-03-15  9:42:02 38 [Note] WSREP: MDL ticket: type: MDL_EXCLUSIVE space: TABLE db: test name: s1 (Waiting for table metadata lock)
2023-03-15  9:42:02 38 [Note] WSREP: MDL BF-BF conflict
schema:  test
request: (6     seqno 1484      wsrep (high priority, exec, executing) cmd 0 161        select /* QNO 563 CON_ID 29 */ nextval(s1)<CA>v^Qd^Sf)
granted: (15    seqno 1497      wsrep (high priority, exec, executing) cmd 0 161        select /* QNO 4 CON_ID 37 */ nextval(s1)<CA>v^Qd^Sf)
2023-03-15  9:42:02 38 [Note] WSREP: MDL ticket: type: MDL_EXCLUSIVE space: TABLE db: test name: s1 (Waiting for table metadata lock)
2023-03-15  9:42:02 38 [ERROR] Aborting
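
When going through a long error log such as node1.err, the fatal pattern in the excerpt above (an "MDL BF-BF conflict" note followed by "[ERROR] Aborting" from the same thread) can be located with a small scan. This is a sketch only: it assumes each log line starts with date, time, thread id and level, as in the excerpt, and `find_fatal_conflicts` is a hypothetical helper name.

```python
import re

# Assumed line prefix, based on the excerpt: "<date> <time> <thread id> [<level>] <message>"
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{1,2}:\d{2}:\d{2})\s+"
    r"(?P<thread>\d+)\s+\[(?P<level>\w+)\]\s*(?P<msg>.*)$"
)


def find_fatal_conflicts(lines):
    """Return (thread, conflict_time, abort_time) for each BF-BF conflict
    that is later followed by "[ERROR] Aborting" from the same thread."""
    pending = {}  # thread id -> timestamp of its most recent BF-BF conflict note
    fatal = []
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # continuation lines (schema:, request:, granted:, ...)
        thread = int(m["thread"])
        stamp = f'{m["date"]} {m["time"]}'
        if "MDL BF-BF conflict" in m["msg"]:
            pending[thread] = stamp
        elif m["level"] == "ERROR" and m["msg"].strip() == "Aborting" and thread in pending:
            fatal.append((thread, pending.pop(thread), stamp))
    return fatal


sample = [
    "2023-03-15  9:42:02 17 [Note] WSREP: wsrep_before_commit: 1, 1498",
    "2023-03-15  9:42:02 38 [Note] WSREP: MDL conflict ",
    "schema:  test",
    "2023-03-15  9:42:02 38 [Note] WSREP: MDL BF-BF conflict",
    "schema:  test",
    "2023-03-15  9:42:02 38 [Note] WSREP: MDL ticket: type: MDL_EXCLUSIVE space: TABLE db: test name: s1 (Waiting for table metadata lock)",
    "2023-03-15  9:42:02 38 [ERROR] Aborting",
]

for thread, conflict_time, abort_time in find_fatal_conflicts(sample):
    print(f"thread {thread}: BF-BF conflict at {conflict_time}, abort at {abort_time}")
```

Matching on the thread id keeps unrelated "Aborting" errors from other threads out of the result; a plain "MDL conflict" note without "BF-BF" is deliberately ignored.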

node1:root@localhost> show status like 'wsrep_ready';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wsrep_ready   | OFF   |
+---------------+-------+
1 row in set (0.001 sec)
 
node1:root@localhost> 
node1:root@localhost> select nextval(s1);
ERROR 1047 (08S01): WSREP has not yet prepared node for application use
node1:root@localhost>

Comment by Jan Lindström [ 2023-03-28 ]

https://github.com/MariaDB/server/pull/2580

Comment by Julius Goryavsky [ 2023-03-30 ]

According to the test results, the fix works as it should, so it has been merged with the head revision: https://github.com/MariaDB/server/commit/169def14f64492466a305114b0ca13b2b5775164
