[MDEV-28803] ERROR 1206 (HY000): The total number of locks exceeds the lock table size Created: 2022-06-11  Updated: 2024-01-29

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Roel Van de Paar Assignee: Marko Mäkelä
Resolution: Unresolved Votes: 0
Labels: not-10.3, not-10.4, not-10.5, regression-10.6

Issue Links:
Relates
relates to MDEV-25342 autosize innodb_buffer_pool_chunk_size Closed
relates to MDEV-28805 SET GLOBAL innodb_buffer_pool_size=12... Confirmed
relates to MDEV-28800 SIGABRT due to running out of memory ... Stalled
relates to MDEV-28804 Increased lock objects in 10.6+ leadi... Stalled
relates to MDEV-33324 insert ... select from joins hangs or... Open

 Description   

SET GLOBAL innodb_buffer_pool_size=12*1024*1024;
CREATE TABLE t1 (d DOUBLE);
INSERT INTO t1 VALUES (0x0061),(0x0041),(0x00E0),(0x00C0),(0x1EA3),(0x1EA2),(0x00E3),(0x00C3),(0x00E1),(0x00C1),(0x1EA1),(0x1EA0);
INSERT INTO t1 SELECT t1.* FROM t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6;

Will lead to:

10.10.0 081a284712bb661349e2e3802077b12211cede3e (Optimized)

10.10.0-opt>INSERT INTO t1 SELECT t1.* FROM t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6;
ERROR 1206 (HY000): The total number of locks exceeds the lock table size

In 10.6 to 10.10 only.
Note that in this ticket, unlike MDEV-28800, no initial innodb-buffer-pool-size is set, though the buffer pool is resized in the same was as MDEV-28800.



 Comments   
Comment by Marko Mäkelä [ 2022-06-13 ]

I think that this and MDEV-28800 are siblings of each other. There shouldn’t be any difference between 10.8, 10.9, 10.10 in InnoDB that would explain why we sometimes notice that the record locks are consuming too much memory, and sometimes won’t.

I can confirm that on my system, a non-debug build of 10.8 62419b1733042c30414a4feed89c79aebb5621af will only allow the file t1.ibd to grow to 28MiB, which is the same result that I got on a 10.10 32edabd1f2fa0cf9b2cf41f326d399ef0348fa30 debug build. On a non-debug build of the same 10.10 revision, the file grew to 29 MiB before the message was output.

In MDEV-28800, it is implied that locks are not consuming so much memory in 10.7. There have been no changes to the InnoDB locking subsystem since version 10.6. Hence, the reason for this regression must be somewhere outside InnoDB. But, I do not see any such regression:

10.7 fe75e5e5b1c5856fdfc9bd97265ba6ebe272f549

mysqltest: At line 8: query 'INSERT INTO t1 SELECT t1.* FROM t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6' failed: <Unknown> (2013): Lost connection to server during query
2022-06-13  9:51:14 0 [Note] InnoDB: Completed to resize buffer pool from 8388608 to 16777216.
2022-06-13  9:51:14 0 [Note] InnoDB: Completed resizing buffer pool at 220613  9:51:14.
2022-06-13  9:51:19 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=15M. Starting the InnoDB Monitor to print diagnostics.
2022-06-13  9:51:19 4 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync) log: 0; buffer pool: 0. 231 OS file reads, 2707 OS file writes, 242 OS fsyncs.
2022-06-13  9:51:27 4 [ERROR] [FATAL] InnoDB: Over 95 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks, or review if innodb_buffer_pool_size=15M could be bigger.
220613  9:51:27 [ERROR] mysqld got signal 6 ;

The test case that I used was as follows:

--source include/have_innodb.inc
SET GLOBAL innodb_buffer_pool_size=12*1024*1024;
 
CREATE TABLE t1 (d DOUBLE) ENGINE=InnoDB;
 
INSERT INTO t1 VALUES (0x0061),(0x0041),(0x00E0),(0x00C0),(0x1EA3),(0x1EA2),(0x00E3),(0x00C3),(0x00E1),(0x00C1),(0x1EA1),(0x1EA0);
 
INSERT INTO t1 SELECT t1.* FROM t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6;

cd mysql-test
./mtr innodb.name_of_test

Please double-check the affected versions.

The excessive record locking would be fixed by MDEV-24813. It is not fixable in the InnoDB subsystem itself.

Comment by Roel Van de Paar [ 2022-06-13 ]

Double-checked affected versions. Issue only appears on optimized builds. Earlier testing had included debug builds on which the issue does not show. Found that only optimized builds and only 10.8-10.10 are affected.

Comment by Marko Mäkelä [ 2022-06-13 ]

I do not think that anything has been changed in InnoDB transactional locks since version 10.6. In MDEV-28800, I posted a failure that I reproduced on 10.6.

I think that there is a design issue that the transactional lock objects will be allocated from the buffer pool. If an excessive number of record locks is being created, InnoDB will report an error or crash due to running out of buffer pool. That an error is being reported is a nice thing, compared to a crash. What exactly do you expect to be fixed in the scope of this ticket?

Comment by Roel Van de Paar [ 2022-06-13 ]

Your comments made me think further and helped me to understand why this currently seems to show on 10.8-10.10 only: MDEV-25342 - i.e. the lower default-possible value for innodb_buffer_pool_size (due to a reduction of innodb_buffer_pool_chunk_size) w/ thanks for the pointer danblack. So, with the first query in the testcase, the actual buffer pool in 10.8+ becomes 12582912, and before 10.8 it becomes 134217728 (>10x as much).

I tested the same on 10.6 (and 10.7) using --innodb_buffer_pool_chunk_size=2097152 and in that case, the error can be produced there also:

10.6.9 05d049bdbe6814aee8f011fbd0d915f9d82a30ee (Optimized)

10.6.9-opt>INSERT INTO t1 SELECT t1.* FROM t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6;
ERROR 1206 (HY000): The total number of locks exceeds the lock table size

However, on 10.5 (and 10.3, 10.4), the query succeeds (using the same setup):

10.5.17 2840d7750db11a8d2ab3f212a05f5afefaef6d4d (Optimized)

10.5.17-opt>INSERT INTO t1 SELECT t1.* FROM t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6;
Query OK, 2985984 rows affected (42.419 sec)
Records: 2985984  Duplicates: 0  Warnings: 0

So the action item for this ticket is: why does 10.6-10.10 fail with the error whereas 10.3-10.5 will process the query fine, when using the same innodb_buffer_pool_chunk_size and innodb_buffer_pool_size?

Comment by Roel Van de Paar [ 2022-06-13 ]

Found one more odd difference. When starting 10.8 to 10.10 (all debug, and using CLI) with --innodb_buffer_pool_chunk_size=2097152, only 10.10 will produce the ERROR 1206 error. 10.8 and 10.9 will continue processing the query.

Comment by Marko Mäkelä [ 2022-06-14 ]

The buffer pool resizing is inaccurate and somewhat nondeterministic by design. I do not think that it can be fixed easily, other than by reimplementing the buffer pool resizing in a different way: Allocate a reasonable maximum amount of contiguous 64-bit virtual address space for the buffer pool, but only map the requested amount of memory for it. I have discussed it with danblack in the past.

I think that what is needed to address this bug report is to ensure that the crashes (MDEV-28800) are avoided, and that the error message is refined to mention innodb_buffer_pool_size, because the explicit record locks will be allocated from the InnoDB buffer pool. It is a reasonable design, only the enforcement of the maximum allocation size is sloppy and the error message is not helpful for a user.

Comment by Roel Van de Paar [ 2022-06-14 ]

Understood, thank you.

Generated at Thu Feb 08 10:03:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.