[MDEV-17809] Query execution was interrupted and BF lock wait long for trx Created: 2018-11-23  Updated: 2023-06-06  Resolved: 2023-06-06

Status: Closed
Project: MariaDB Server
Component/s: Galera, wsrep
Affects Version/s: 10.3.11
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Chow King Tak Assignee: Julius Goryavsky
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

Redhat el6, cluster with 2 nodes and 1 arbitrator


Attachments: File my.cnf     File mysqld.err     Zip Archive mysqld.zip    

 Description   

I am using 10.3.11. I have a table:
CREATE TABLE `tmp_app_to_db_status_report` (
`site` char(2) NOT NULL,
`project` varchar(7) NOT NULL,
`app_server` varchar(15) NOT NULL,
`instance` varchar(20) NOT NULL,
`status` varchar(7) NOT NULL,
`check_time` datetime NOT NULL,
PRIMARY KEY (`site`,`project`,`app_server`,`instance`,`status`,`check_time`),
KEY `idx_tmp_app_to_db_status_report` (`site`,`project`,`app_server`,`instance`,`check_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

The table contains data of Node1 and Node2 and it have the column 'site' to indicate the data coming from which node. Therefore Node1 only inserts, deletes, updates data of Node1 and Node2 only inserts, deletes, updates data of Node2.

I have a stored procedure which loops for more than 10,000 times and read the data from the table 'tmp_app_to_db_status_report' for processing. It will takes several minutes to finish.

On Node1, I start the stored procedure to read from 'tmp_app_to_db_status_report' and process Node1's data (i.e. where site='Node1'). During the execution of the stored procedure , on Node2, I delete/insert data of Node2 only (where site='Node2') from/to 'tmp_app_to_db_status_report'. The execution of the stored procedure on Node1 sometimes is terminated with error "ERROR 1317 (70100): Query execution was interrupted" and the massive "BF lock wait long for trx" messages are logged in the mysqld.err of Node1.

I have attachment the mysqld.err for reference.

Please advise and thanks in advanced.



 Comments   
Comment by Chow King Tak [ 2018-11-27 ]

I have created a simple test case to produce the similar massive BF message.

1. create table test1:

CREATE TABLE test1 (
     site varchar(5) not null,
     id int not null,
     val varchar(10) not null,
     primary key (site, id)
 );

2. Insert data to test1:

insert into test1 values ('A',1,'aaaa'),('A',2,'bbbb'), ('A',3, 'ccccc'),('A',4, 'ddddd'),('A',5, 'eeee'),('A',6,'fffff'),('A',7,'gggggg'),('A',8,'hhhhhh'),('A',9,'iiiiiii'),('A',10,'jjjjj');

3. Create SP loop_test (this SP loops for reading rows in test1 and waits for several sec. before next read):

DELIMITER $$
DROP PROCEDURE IF EXISTS loop_test $$
CREATE PROCEDURE loop_test (IN sleep_sec int, local_site varchar(5))
BEGIN
    DECLARE n INT;
    DECLARE i INT;
 
    SET n = (SELECT COUNT(*) FROM test1 WHERE site=local_site);
    SET i = 0;
 
    WHILE (i<n) DO
        SELECT * FROM test1 WHERE site=local_site
        LIMIT i,1;
 
        SET i = i + 1;
 
        SELECT sleep(sleep_sec);
 
    END WHILE;
END $$
DELIMITER ;

4. On Node A, logon the MariaDB and run :

START TRANSACTION;
CALL loop_test(5,'A');

(The SP runs with sleeping for 5 sec. in order to allow you to execute delete statement on Node B)

5. Allow SP on Node A running several reads, then on Node B, logon the MariaDB and run:

DELETE FROM test1;
COMMIT;

6. Massive BF message will be logged on Node A's mysqld.err.

I have further attached the updated my.cnf and mysqld.err for reference.

Please advise and thanks,

KT Chow

Comment by Jan Lindström [ 2023-06-06 ]

10.3 is EOL soon.

Generated at Thu Feb 08 08:39:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.