Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 11.8.0, 11.8.1, 10.11.12, 10.11.13, 10.11.14, 11.4.6, 11.4.7, 11.4.8, 11.8.2, 11.8.3, 11.8.4
Description
The new buffer pool implementation in MariaDB 10.11.12/13 and 11.4.6/7 exhibits inconsistent behavior across different operating systems, with the most critical issue being engine crashes during resizing operations under specific memory conditions.
The MariaDB Q2 minor releases introduced a significant redesign of the buffer pool architecture, replacing the chunk-based system with a contiguous memory model. Three new parameters were added: innodb_buffer_pool_size_max, innodb_buffer_pool_size_auto_min, and innodb_log_checkpoint_now. Testing has revealed two issues with this implementation: engine crashes during buffer pool resizing, and a lack of documentation for the new parameters.
Memory Allocation Behavior Differences
This section provides background that the following sections refer to.
During initialization and during resize operations, the engine uses fundamentally different approaches to memory allocation.
- During initialization, the engine reserves a contiguous virtual memory address range without immediately committing physical memory.
- During resize, the engine attempts to commit additional memory from the previously reserved range. When the new value is larger than the previous value, this immediately attempts to allocate (new value - previous value) bytes of physical memory.
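As a rough illustration of the resize rule described above, the following Python sketch computes how much physical memory a resize would try to commit. The function name commit_delta_bytes is hypothetical and models only the arithmetic stated in this report; it is not MariaDB code.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def commit_delta_bytes(previous_size: int, new_size: int) -> int:
    """Bytes of physical memory a resize tries to commit.

    Models only the rule stated in the report: growing the pool commits
    (new value - previous value); shrinking commits nothing.
    """
    return max(new_size - previous_size, 0)


# Growing from the 6M minimum back to 10G commits almost the full 10G.
grow = commit_delta_bytes(6 * MIB, 10 * GIB)
# Shrinking never commits additional memory.
shrink = commit_delta_bytes(10 * GIB, 6 * MIB)
print(grow, shrink)
```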
Critical Issue: Engine Crashes During Buffer Pool Resizing
Scenario for Crash
- MariaDB 11.4.7 installed
- Buffer pool size (innodb_buffer_pool_size) = 10GB
- Maximum buffer pool size (innodb_buffer_pool_size_max) = 15GB
- Total system memory = 15GB
First, we reduce the buffer pool to the minimum size:
SET GLOBAL innodb_buffer_pool_size=6M;
Then we increase it back to 10GB:
SET GLOBAL innodb_buffer_pool_size=10G;
This works, and the engine commits 10GB of physical memory. The buffer pool was first reduced to the minimum so that the subsequent resize forces the engine to physically allocate the full 10GB, because physical memory is committed during resize rather than at initialization.
Meanwhile, another process starts and consumes 4GB of memory.
Now only 1GB of free system memory remains.
We try to increase the buffer pool size further:
SET GLOBAL innodb_buffer_pool_size=12G;
This is technically allowed since the maximum is 15GB, but the additional 2GB exceeds the 1GB that is free. Depending on the operating system and kernel version, the statement either fails with an error or the engine crashes.
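The arithmetic of this scenario can be checked with a short Python sketch. It assumes that the sizes quoted here are binary gigabytes and that no other memory consumers exist; all variable names are illustrative.

```python
GIB = 1024 ** 3

total_memory = 15 * GIB     # total system memory in the scenario
pool_committed = 10 * GIB   # buffer pool after the resize to 10G
other_process = 4 * GIB     # memory taken by the unrelated process
requested_pool = 12 * GIB   # target of the final SET GLOBAL

# Memory still free before the final resize.
free_memory = total_memory - pool_committed - other_process
# Additional physical memory the resize has to commit.
extra_needed = requested_pool - pool_committed
# How far the request exceeds what is free (0 if it fits).
shortfall = max(extra_needed - free_memory, 0)

print(f"free: {free_memory // GIB} GiB")
print(f"needed: {extra_needed // GIB} GiB")
print(f"shortfall: {shortfall // GIB} GiB")
```

The request stays below innodb_buffer_pool_size_max, yet it needs more physical memory than the system has free.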
Failure Example:
MariaDB [(none)]> show global variables like "innodb_buffer_pool_size%";
+----------------------------------+-------------+
| Variable_name                    | Value       |
+----------------------------------+-------------+
| innodb_buffer_pool_size          | 134217728   |
| innodb_buffer_pool_size_auto_min | 20501757952 |
| innodb_buffer_pool_size_max      | 20501757952 |
+----------------------------------+-------------+
3 rows in set (0.001 sec)

MariaDB [(none)]> set global innodb_buffer_pool_size=20501757952;
ERROR 5 (HY000): Out of memory (Needed 3187671040 bytes)
MariaDB [(none)]> set global innodb_buffer_pool_size=8501757952;
Query OK, 0 rows affected, 1 warning (2.748 sec)

$ hostnamectl
Static hostname: xxxxxx.us-west-2.compute.internal
Icon name: computer-vm
Chassis: vm
Machine ID: xxxxxx
Boot ID: xxxxxx
Virtualization: amazon
Operating System: Amazon Linux 2
CPE OS Name: cpe:2.3:o:amazon:amazon_linux:2
Kernel: Linux 4.14.355-276.618.amzn2.x86_64
Architecture: x86-64
Crash Example:
MariaDB [(none)]> show global variables like "innodb_buffer_pool_size%";
+----------------------------------+-------------+
| Variable_name                    | Value       |
+----------------------------------+-------------+
| innodb_buffer_pool_size          | 134217728   |
| innodb_buffer_pool_size_auto_min | 33646706688 |
| innodb_buffer_pool_size_max      | 33646706688 |
+----------------------------------+-------------+
3 rows in set (0.001 sec)

MariaDB [(none)]> set global innodb_buffer_pool_size=28672000000;
ERROR 2026 (HY000): TLS/SSL error: unexpected eof while reading
MariaDB [(none)]> exit
Bye
[1]- Killed mariadbd --defaults-file=/home/linuxbrew/.linuxbrew/etc/my.cnf --innodb_buffer_pool_size_max=33653706688

$ hostnamectl
Static hostname: xxxxxx.us-west-2.compute.internal
Icon name: computer-vm
Chassis: vm
Machine ID: xxxxxx
Boot ID: xxxxxx
Virtualization: xen
Operating System: Amazon Linux 2
CPE OS Name: cpe:2.3:o:amazon:amazon_linux:2
Kernel: Linux 5.10.235-227.919.amzn2.x86_64
Architecture: x86-64
Memory Information just before crash
$ cat /proc/meminfo
MemTotal: 32859952 kB
MemFree: 21227528 kB
MemAvailable: 21520972 kB
Buffers: 0 kB
Cached: 632672 kB
SwapCached: 0 kB
Active: 329668 kB
Inactive: 10993976 kB
Active(anon): 280 kB
Inactive(anon): 10691148 kB
Active(file): 329388 kB
Inactive(file): 302828 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 172 kB
Writeback: 0 kB
AnonPages: 10691072 kB
Mapped: 176412 kB
Shmem: 456 kB
KReclaimable: 48392 kB
Slab: 101312 kB
SReclaimable: 48392 kB
SUnreclaim: 52920 kB
KernelStack: 4016 kB
PageTables: 81632 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 16429976 kB
Committed_AS: 45714748 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 14928 kB
VmallocChunk: 0 kB
Percpu: 60480 kB
HardwareCorrupted: 0 kB
AnonHugePages: 10483712 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 231424 kB
DirectMap2M: 22837248 kB
DirectMap1G: 11534336 kB

$ sudo cat /proc/10457/smaps_rollup
55dedcb1b000-7ffd9b3f2000 ---p 00000000 00:00 0 [rollup]
Rss: 105984 kB
Pss: 104254 kB
Pss_Anon: 80332 kB
Pss_File: 23922 kB
Pss_Shmem: 0 kB
Shared_Clean: 3456 kB
Shared_Dirty: 0 kB
Private_Clean: 22196 kB
Private_Dirty: 80332 kB
Referenced: 105984 kB
Anonymous: 80332 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
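To relate the memory snapshot to the failing statement, the following Python sketch parses a /proc/meminfo excerpt and compares MemAvailable with the extra memory the resize would have to commit. The helper parse_meminfo_kb is hypothetical, and treating MemAvailable as the usable headroom is a simplifying assumption.

```python
def parse_meminfo_kb(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}.

    Values are in kB (1024 bytes) for fields that carry a kB unit
    and plain counts otherwise.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


# Excerpt of the values captured just before the crash.
sample = """\
MemTotal: 32859952 kB
MemFree: 21227528 kB
MemAvailable: 21520972 kB
SwapTotal: 0 kB
"""

info = parse_meminfo_kb(sample)
available_bytes = info["MemAvailable"] * 1024

current_pool = 134217728      # innodb_buffer_pool_size before the resize
requested_pool = 28672000000  # value passed to SET GLOBAL
extra_needed = requested_pool - current_pool

# True means the resize asks for more than the kernel reports as available.
print(extra_needed > available_bytes)
```

With no swap configured (SwapTotal is 0), a request that exceeds the available memory leaves the kernel nothing to fall back on.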
Crash Analysis
The kernel OOM killer killed the engine:
[159031.223998] containerd invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-999
[159031.235188] CPU: 1 PID: 3872 Comm: containerd Not tainted 5.10.235-227.919.amzn2.x86_64 #1
[159031.244367] Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
[159031.447288] Call Trace:
[159031.450605] dump_stack+0x57/0x70
[159031.454473] dump_header+0x4c/0x20e
[159031.458462] oom_kill_process.cold+0xb/0x10
[159031.462755] out_of_memory+0xed/0x2d0
[159031.467939] __alloc_pages_slowpath.constprop.0+0x93d/0xa00
[159031.474058] __alloc_pages_nodemask+0x2de/0x310
[159031.478904] pagecache_get_page+0x17e/0x310
[159031.483305] filemap_fault+0x4e4/0x6b0
[159031.487205] __xfs_filemap_fault.constprop.0+0x45/0x150
[159031.492764] __do_fault+0x3a/0x150
[159031.496521] do_fault+0x9c/0x240
[159031.500133] __handle_mm_fault+0x499/0x640
[159031.504550] handle_mm_fault+0xbe/0x2a0
[159031.508708] do_user_addr_fault+0x1b3/0x3f0
[159031.512949] exc_page_fault+0x68/0x130
[159031.516885] ? asm_exc_page_fault+0x8/0x30
[159031.521201] asm_exc_page_fault+0x1e/0x30
[159031.525416] RIP: 0033:0x55bb3619a4fc
[159031.530823] Code: Unable to access opcode bytes at RIP 0x55bb3619a4d2.
[159031.537431] RSP: 002b:00007fcbcb4b4d80 EFLAGS: 00010202
[159031.544665] RAX: ffffffffffffff92 RBX: 0000000000000000 RCX: 000055bb3620e8e3
[159031.554288] RDX: 000055bb377b6c28 RSI: 0000000000000080 RDI: 000055bb38cde2a0
[159031.563911] RBP: 00007fcbcb4b4db0 R08: 0000000000000000 R09: 0000000000000000
[159031.573130] R10: 00007fcbcb4b4d60 R11: 0000000000000206 R12: 00007fcbcb4b4d60
[159031.582649] R13: 00007fff4322224f R14: 000000c000006a80 R15: 00007fff43222340
[159031.592084] Mem-Info:
[159031.597127] active_anon:70 inactive_anon:8118236 isolated_anon:0
active_file:53 inactive_file:0 isolated_file:32
unevictable:0 dirty:5 writeback:0
slab_reclaimable:5661 slab_unreclaimable:10990
mapped:64 shmem:114 pagetables:17304 bounce:0
free:48448 free_pcp:271 free_cma:0
[159031.633853] Node 0 active_anon:280kB inactive_anon:32472944kB active_file:212kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):128kB mapped:256kB dirty:20kB writeback:0kB shmem:456kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 10483712kB writeback_tmp:0kB kernel_stack:4032kB all_unreclaimable? yes
[159031.668854] Node 0 DMA free:11808kB min:32kB low:44kB high:56kB reserved_highatomic:0KB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[159031.700790] lowmem_reserve[]: 0 3692 32055 32055
[159031.707453] Node 0 DMA32 free:121060kB min:7780kB low:11560kB high:15340kB reserved_highatomic:0KB active_anon:4kB inactive_anon:3665640kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3915776kB managed:3799540kB mlocked:0kB pagetables:7228kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[159031.743425] lowmem_reserve[]: 0 0 28363 28363
[159031.747789] Node 0 Normal free:61332kB min:61816kB low:90860kB high:119904kB reserved_highatomic:0KB active_anon:276kB inactive_anon:28807484kB active_file:236kB inactive_file:0kB unevictable:0kB writepending:0kB present:29622272kB managed:29044508kB mlocked:0kB pagetables:62032kB bounce:0kB free_pcp:1084kB local_pcp:0kB free_cma:0kB
[159031.775760] lowmem_reserve[]: 0 0 0 0
[159031.779809] Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 2*4096kB (M) = 11808kB
[159031.793777] Node 0 DMA32: 199*4kB (UE) 104*8kB (UME) 24*16kB (UE) 10*32kB (UME) 6*64kB (UME) 2*128kB (UE) 1*256kB (U) 3*512kB (UME) 2*1024kB (UE) 2*2048kB (ME) 27*4096kB (M) = 121500kB
[159031.810138] Node 0 Normal: 3815*4kB (UME) 1963*8kB (UE) 624*16kB (UME) 402*32kB (UME) 89*64kB (UME) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 59508kB
[159031.824059] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[159031.833793] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[159031.843530] 140 total pagecache pages
[159031.848726] 0 pages in swap cache
[159031.854087] Swap cache stats: add 0, delete 0, find 0/0
[159031.860197] Free swap = 0kB
[159031.864319] Total swap = 0kB
[159031.868364] 8388509 pages RAM
[159031.872546] 0 pages HighMem/MovableOnly
[159031.877302] 173521 pages reserved
[159031.881714] 0 pages hwpoisoned
[159031.885759] Tasks state (memory values in pages):
[159031.891346] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[159031.900920] [ 2212] 0 2212 40478 85 376832 0 0 systemd-journal
[159031.910766] [ 2230] 0 2230 11142 326 118784 0 -1000 systemd-udevd
[159031.923298] [ 3294] 0 3294 14414 111 135168 0 -1000 auditd
[159031.932867] [ 3334] 0 3334 6600 95 98304 0 0 systemd-logind
[159031.942809] [ 3335] 81 3335 14042 137 159744 0 -900 dbus-daemon
[159031.952567] [ 3336] 32 3336 16815 135 167936 0 0 rpcbind
[159031.961565] [ 3339] 0 3339 1068 26 53248 0 0 acpid
[159031.971894] [ 3342] 0 3342 25464 82 102400 0 0 irqbalance
[159031.982834] [ 3364] 999 3364 29525 111 135168 0 0 chronyd
[159031.992895] [ 3385] 998 3385 24087 216 221184 0 0 rngd
[159032.003079] [ 3404] 0 3404 52999 119 180224 0 0 gssproxy
[159032.013589] [ 3740] 0 3740 24672 516 208896 0 0 dhclient
[159032.024176] [ 3787] 0 3787 24672 509 212992 0 0 dhclient
[159032.034701] [ 3832] 0 3832 307848 256 98304 0 0 amazon-ecs-volu
[159032.045594] [ 3839] 0 3839 589160 4608 389120 0 -999 containerd
[159032.055335] [ 3946] 0 3946 22058 262 200704 0 0 master
[159032.064793] [ 3948] 89 3948 22096 257 208896 0 0 qmgr
[159032.073513] [ 3993] 0 3993 27192 257 249856 0 -1000 sshd
[159032.082389] [ 3996] 0 3996 310145 2284 143360 0 0 amazon-ssm-agen
[159032.091763] [ 3999] 0 3999 82618 487 405504 0 0 rsyslogd
[159032.101107] [ 4032] 0 4032 33250 155 102400 0 0 crond
[159032.111248] [ 4034] 0 4034 29203 27 77824 0 0 agetty
[159032.119910] [ 4036] 0 4036 29291 27 61440 0 0 agetty
[159032.127796] [ 4038] 0 4038 620252 10511 569344 0 -500 dockerd
[159032.135742] [ 4126] 0 4126 1057 17 49152 0 0 bpfilter_umh
[159032.144106] [ 4274] 0 4274 312997 3312 163840 0 0 ssm-agent-worke
[159032.152506] [ 4530] 0 4530 484713 1025 249856 0 0 amazon-ecs-init
[159032.161217] [ 30465] 89 30465 22079 253 204800 0 0 pickup
[159032.169391] [ 9203] 0 9203 37136 330 335872 0 0 sshd
[159032.177423] [ 9205] 1000 9205 37136 330 323584 0 0 sshd
[159032.186774] [ 9206] 1000 9206 30532 114 81920 0 0 bash
[159032.194777] [ 10457] 1000 10457 8502153 5465048 44118016 0 0 mariadbd
[159032.202960] [ 11306] 0 11306 30565 114 86016 0 0 log4j-cve-2021-
[159032.211981] [ 25269] 1000 25269 2902682 2624250 21147648 0 0 python3
[159032.220621] [ 25693] 1000 25693 6358 879 90112 0 0 mariadb
[159032.229078] [ 25826] 0 25826 28662 16 65536 0 0 sleep
[159032.237481] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice,task=mariadbd,pid=10457,uid=1000
[159032.251593] Out of memory: Killed process 10457 (mariadbd) total-vm:34008612kB, anon-rss:21860192kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:43084kB oom_score_adj:0
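The task table in the OOM report lists memory in pages, so the rss column has to be converted before it can be compared with the kB figures elsewhere in the log. The Python sketch below does that conversion for the two largest processes, assuming 4 KiB base pages (the usual x86-64 page size, not stated in the log itself).

```python
PAGE_SIZE_KB = 4  # assumption: 4 KiB base pages on this x86-64 host

# rss column (in pages) for the two largest tasks in the OOM report.
rss_pages = {"mariadbd": 5465048, "python3": 2624250}
rss_kb = {name: pages * PAGE_SIZE_KB for name, pages in rss_pages.items()}

# "8388509 pages RAM" from the same report.
total_ram_kb = 8388509 * PAGE_SIZE_KB

# Fraction of RAM held by these two processes at kill time.
share = sum(rss_kb.values()) / total_ram_kb

for name, kb in rss_kb.items():
    print(f"{name}: {kb} kB")
print(f"combined share of RAM: {share:.1%}")
```

Under this assumption the mariadbd figure equals the anon-rss value in the final "Out of memory" line, which suggests the engine had populated roughly 21 GB of the requested region before it was killed.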
strace output showing the system call in progress when the process was killed
[pid 16956] mmap(0x7f4402800000, 28536995840, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS|MAP_POPULATE, -1, 0) = ?
[pid 16956] +++ killed by SIGKILL +++
[pid 16861] +++ killed by SIGKILL +++
[pid 16854] +++ killed by SIGKILL +++
[pid 16849] +++ killed by SIGKILL +++
[pid 16839] +++ killed by SIGKILL +++
[pid 16838] +++ killed by SIGKILL +++
[pid 16840] +++ killed by SIGKILL +++
+++ killed by SIGKILL +++
Lack of documentation
Users upgrading to these versions have no way to understand the purpose, behavior, or proper configuration of the new parameters. This documentation gap is particularly problematic for innodb_buffer_pool_size_max, which fundamentally changes how buffer pool resizing works in ways that contradict user expectations.
Documentation was requested in https://jira.mariadb.org/browse/MDEV-37176, but there have been few updates since then, and this information was also missing from the release notes.
Most users familiar with MariaDB would expect "innodb_buffer_pool_size" to be fully dynamic, allowing both increases and decreases within constraints. However, the undocumented reality is that "innodb_buffer_pool_size_max" acts as a hard ceiling that defaults to the initial buffer pool size and cannot be changed without a restart. This creates a confusing situation where users can reduce the buffer pool size but cannot increase it beyond its initial value without a restart.
Another confusing aspect is that when users attempt to increase the buffer pool size beyond "innodb_buffer_pool_size_max", the system issues only a warning rather than an error. Without documentation, users have no way of knowing that the request is capped at the maximum or, in the default case where the current size already equals the maximum, that it does not increase the buffer pool size at all.
MariaDB [(none)]> set global innodb_buffer_pool_size=99999999999999999;
Query OK, 0 rows affected, 1 warning (0.696 sec)

MariaDB [(none)]> show warnings;
+---------+------+------------------------------------------------------------------------+
| Level   | Code | Message                                                                |
+---------+------+------------------------------------------------------------------------+
| Warning | 1292 | Truncated incorrect innodb_buffer_pool_size value: '99999999999999999' |
+---------+------+------------------------------------------------------------------------+

MariaDB [(none)]> show global variables like "innodb_buffer_pool_size%";
+----------------------------------+-----------+
| Variable_name                    | Value     |
+----------------------------------+-----------+
| innodb_buffer_pool_size          | 452984832 |
| innodb_buffer_pool_size_auto_min | 452984832 |
| innodb_buffer_pool_size_max      | 452984832 |
+----------------------------------+-----------+
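The capping behavior seen in this session can be modeled with a short Python sketch. It reflects only what is observed in this report (a request above innodb_buffer_pool_size_max is truncated to the maximum and flagged with a warning); the function name is hypothetical, and lower bounds or any size rounding the server may apply are deliberately ignored.

```python
def effective_pool_size(requested: int, size_max: int) -> tuple:
    """Return (effective size, was_truncated) under the observed capping.

    Assumption: the only adjustment modeled is the upper cap at size_max.
    """
    effective = min(requested, size_max)
    return effective, effective != requested


# Default case from the session: the current size already equals the maximum,
# so the oversized request changes nothing and only raises a warning.
size, truncated = effective_pool_size(99999999999999999, 452984832)
print(size, truncated)

# A request at or below the maximum passes through unchanged.
print(effective_pool_size(134217728, 452984832))
```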
Issue Links
- relates to MDEV-37176 Provide description for now parameters (Open)