[MDEV-34167] We fail to terminate transaction early with ER_LOCK_TABLE_FULL when lock memory is growing - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.6.8, 10.6
Fix Version/s: 10.6.19, 10.11.9, 11.1.6, 11.2.5, 11.4.3
Component/s: Locking, Storage Engine - InnoDB
Labels:
- regression-10.6

Description

This issue was found while analysing MDEV-28800. When lock memory started occupying large amount of buffer pool we fail to terminate the transaction with ER_LOCK_TABLE_FULL. It eventually leads to a server crash with Innodb exiting with Fatal Error.

ib::fatal() << "Over 95 percent of the buffer pool is"

            " occupied by lock heaps"

This is a regression looks to have been introduced in 10.6 by

commit b6a2472489accf0ae9ac3655ffe9b2997ab267ba

Author: Daniel Black <daniel@mariadb.org>

Date:   Tue Feb 22 17:42:59 2022 +1100

    MDEV-27891: SIGSEGV in InnoDB buffer pool resize

   bool running_out() const

     return !recv_recovery_is_on() &&

-      UNIV_UNLIKELY(UT_LIST_GET_LEN(free) + UT_LIST_GET_LEN(LRU) <

-                    std::min(curr_size, old_size) / 4);

+      UT_LIST_GET_LEN(free) + UT_LIST_GET_LEN(LRU) <

+        n_chunks_new / 4 * chunks->size;

+  }

The integer division (n_chunks_new / 4) becomes zero whenever the total number of chunks are < 4 making the check completely ineffective for such cases. Also the check is inaccurate for larger chunks and needs to be corrected.

I will put forward a patch soon.

Attachments

Issue Links

is caused by

MDEV-27891 Delayed SIGSEGV in buf_pool_t::resize on InnoDB buffer pool resize after or during DROP TABLE

Closed

Activity

Ascending order - Click to sort in descending order

Debarun Banerjee created issue - 2024-05-15 08:46

Debarun Banerjee added a comment - 2024-05-16 08:03

The issue can be reproduced by the same test mentioned in MDEV-28800

Start server with --innodb_buffer_pool_size=20M

CREATE TABLE t1 (d DOUBLE) ENGINE=InnoDB;

INSERT INTO t1 VALUES (0x0061),(0x0041),(0x00E0),(0x00C0),(0x1EA3),(0x1EA2),(0x00E3),(0x00C3),(0x00E1),(0x00C1),(0x1EA1),(0x1EA0);

INSERT INTO t1 SELECT t1.* FROM t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6;

Also, the mtr test in MDEV-34166 would be the easiest to test it once we have the bug merged to 10.6.

I have uploaded the patch for review.

Debarun Banerjee added a comment - 2024-05-16 08:03 The issue can be reproduced by the same test mentioned in MDEV-28800 Start server with --innodb_buffer_pool_size=20M CREATE TABLE t1 (d DOUBLE) ENGINE=InnoDB; INSERT INTO t1 VALUES (0x0061),(0x0041),(0x00E0),(0x00C0),(0x1EA3),(0x1EA2),(0x00E3),(0x00C3),(0x00E1),(0x00C1),(0x1EA1),(0x1EA0); INSERT INTO t1 SELECT t1.* FROM t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6; Also, the mtr test in MDEV-34166 would be the easiest to test it once we have the bug merged to 10.6. I have uploaded the patch for review.

Debarun Banerjee made changes - 2024-05-16 08:03

Field	Original Value	New Value
Fix Version/s		10.6 [ 24028 ]
Labels		regression-10.6

Debarun Banerjee made changes - 2024-05-16 08:04

Status

Open [ 1 ]

Confirmed [ 10101 ]

Debarun Banerjee made changes - 2024-05-16 08:04

Assignee	Debarun Banerjee [ JIRAUSER54513 ]	Marko Mäkelä [ marko ]
Status	Confirmed [ 10101 ]	In Review [ 10002 ]

Marko Mäkelä made changes - 2024-05-20 11:50

Link

This issue is caused by ~~MDEV-27891~~ [ ~~MDEV-27891~~ ]

Marko Mäkelä added a comment - 2024-05-20 12:15

Thank you. I wonder if we should revise sel_set_rec_lock() as well. All other code paths that result in ER_LOCK_TABLE_FULL are guarded solely by buf_pool.running_out(), but in that function we are additionally checking the number of explicit locks held by the current transaction.

Marko Mäkelä added a comment - 2024-05-20 12:15 Thank you. I wonder if we should revise sel_set_rec_lock() as well. All other code paths that result in ER_LOCK_TABLE_FULL are guarded solely by buf_pool.running_out() , but in that function we are additionally checking the number of explicit locks held by the current transaction.

Marko Mäkelä made changes - 2024-05-20 12:15

Assignee	Marko Mäkelä [ marko ]	Debarun Banerjee [ JIRAUSER54513 ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Debarun Banerjee added a comment - 2024-05-21 05:36

marko Interesting observation but I think it is not void of impact and I would be sceptic making such changes part of this patch in 10.6. I checked the commit that introduced "UT_LIST_GET_LEN(trx->lock.trx_locks) > 10000" in sel_set_rec_lock() but no specific comment there. Intuitively, we are trying to avoid throwing error for transactions that haven't allocated a lot of locks in locking select path. So, it could cause a change in behaviour and perhaps something we could consider in 11.* versions later.

Debarun Banerjee added a comment - 2024-05-21 05:36 marko Interesting observation but I think it is not void of impact and I would be sceptic making such changes part of this patch in 10.6. I checked the commit that introduced "UT_LIST_GET_LEN(trx->lock.trx_locks) > 10000" in sel_set_rec_lock() but no specific comment there. Intuitively, we are trying to avoid throwing error for transactions that haven't allocated a lot of locks in locking select path. So, it could cause a change in behaviour and perhaps something we could consider in 11.* versions later.

Debarun Banerjee made changes - 2024-05-21 05:37

Fix Version/s		10.6.19 [ 29833 ]
Fix Version/s	10.6 [ 24028 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

JiraAutomate made changes - 2024-05-21 05:37

Fix Version/s		10.11.9 [ 29834 ]
Fix Version/s		11.1.6 [ 29835 ]
Fix Version/s		11.2.5 [ 29836 ]
Fix Version/s		11.4.3 [ 29837 ]

Marko Mäkelä added a comment - 2024-05-21 07:20

debarun, thank you. I see that the condition was originally added (MySQL link) almost 20 years ago in MySQL 4.1.8.

Marko Mäkelä added a comment - 2024-05-21 07:20 debarun , thank you. I see that the condition was originally added ( MySQL link ) almost 20 years ago in MySQL 4.1.8.

Roel Van de Paar added a comment - 2024-05-31 05:20

CREATE TABLE t (c INT, d INT, KEY d(d)) ENGINE=InnoDB;

INSERT INTO t VALUES (0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8);

INSERT INTO t SELECT a.* FROM t a,t b,t c,t d,t e,t f,t g;

Before the fix:

11.5.0 e4afa610539ae01164485554e2de839bea9de816 (Optimized)
11.5.0-opt>INSERT INTO t SELECT a.* FROM t a,t b,t c,t d,t e,t f,t g;
ERROR 1206 (HY000): The total number of locks exceeds the lock table size

11.5.0 e4afa610539ae01164485554e2de839bea9de816 (Optimized)
2024-05-29 16:15:11 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=126M. Starting the InnoDB Monitor to print diagnostics.

=====================================
2024-05-29 16:15:11 0x14d3585f06c0 INNODB MONITOR OUTPUT

After the fix (10.6 used as 11.5 does not contain the fix yet):

10.6.19 5ba542e9eec0ef5d69883c89755211f7d0e5d0e1 (Optimized)
10.6.19-opt>INSERT INTO t SELECT a.* FROM t a,t b,t c,t d,t e,t f,t g;
Query OK, 4782969 rows affected (32.084 sec)
Records: 4782969 Duplicates: 0 Warnings: 0

Thank you debarun for fixing this!

Roel Van de Paar added a comment - 2024-05-31 05:20 CREATE TABLE t (c INT , d INT , KEY d(d)) ENGINE=InnoDB; INSERT INTO t VALUES (0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8); INSERT INTO t SELECT a.* FROM t a,t b,t c,t d,t e,t f,t g; Before the fix: 11.5.0 e4afa610539ae01164485554e2de839bea9de816 (Optimized) 11.5.0-opt>INSERT INTO t SELECT a.* FROM t a,t b,t c,t d,t e,t f,t g; ERROR 1206 (HY000): The total number of locks exceeds the lock table size 11.5.0 e4afa610539ae01164485554e2de839bea9de816 (Optimized) 2024-05-29 16:15:11 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=126M. Starting the InnoDB Monitor to print diagnostics. ===================================== 2024-05-29 16:15:11 0x14d3585f06c0 INNODB MONITOR OUTPUT After the fix (10.6 used as 11.5 does not contain the fix yet): 10.6.19 5ba542e9eec0ef5d69883c89755211f7d0e5d0e1 (Optimized) 10.6.19-opt>INSERT INTO t SELECT a.* FROM t a,t b,t c,t d,t e,t f,t g; Query OK, 4782969 rows affected (32.084 sec) Records: 4782969 Duplicates: 0 Warnings: 0 Thank you debarun for fixing this!

Roel Van de Paar made changes - 2024-05-31 05:21

Priority

Major [ 3 ]

Critical [ 2 ]

Jira Automation (IT) made changes - 2024-09-16 07:51

Zendesk active tickets

206179

People

Assignee:: Debarun Banerjee

Reporter:: Debarun Banerjee

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2024-05-15 08:46

Updated:: 2024-11-14 06:01

Resolved:: 2024-05-21 05:37

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server