Hello Marko and Vlad,
Yes, the history list is much better managed now. We graph it, and it generally peaks at a few hundred thousand before the background thread cleans it up after a long operation has run. Some of our even longer operations push it into the millions, but those are promptly cleaned up now as well. In contrast, the old ibdata1 file on our master was growing at about a terabyte a month on 10.11.6, before the fix and externalized undo logs.
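In case it's useful, this is roughly how we sample it (a sketch; trx_rseg_history_len is the counter name I believe the INNODB_METRICS table uses, so adjust if your version differs):

    -- Sample the purge backlog; the same number appears as
    -- "History list length" in SHOW ENGINE INNODB STATUS.
    SELECT name, `count`
      FROM information_schema.INNODB_METRICS
     WHERE name = 'trx_rseg_history_len';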
Yes, your assumption about the graphs is correct. It's the Zabbix available memory item, vm.memory.size[available], which matches MemAvailable in /proc/meminfo. The system runs bare metal with basically nothing other than mariadb, so the number is highly reflective of the mariadb process.
I can start graphing other variables, but I don't have historical data for them.
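For what it's worth, these are the counters I'd probably start with (names as exposed by SHOW GLOBAL STATUS on the versions I've used; availability of each may vary by version):

    -- Candidate items to poll for the buffer pool and purge backlog.
    SHOW GLOBAL STATUS
    WHERE variable_name IN ('Innodb_buffer_pool_bytes_data',
                            'Innodb_buffer_pool_pages_free',
                            'Innodb_buffer_pool_pages_total',
                            'Innodb_history_list_length');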
I think the problem I had over the weekend may actually have been caused by MDEV-24670. The process didn't crash, it just got much slower, and the kswapd1 process was completely pegging one core, even though the system has no swap space whatsoever. In the past, it would have OOMed quickly, restarted, and caused much less headache overall. Here, it hobbled along slowly, with no sign of memory being freed, and I had to restart the mariadb process to resolve it.
I can sometimes get it to release pages of memory. To avoid this problem in the future, I lowered innodb_buffer_pool_size by a few gigabytes at runtime and saw a corresponding increase in available memory.
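For the record, that was just the normal online resize (the target size below is illustrative, not our actual value):

    -- Shrink the buffer pool at runtime; on the versions I've used the
    -- value is rounded to a multiple of innodb_buffer_pool_chunk_size.
    SET GLOBAL innodb_buffer_pool_size = 120 * 1024 * 1024 * 1024;

    -- Then watch the shrink progress:
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_resize_status';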
I think Vladislav's instinct is correct. I considered it good behaviour that the process was returning memory; it made it easy to keep track of how much RAM was actually in use at any given time. We didn't notice any decrease in performance with that strategy, and it would never run out of RAM. In fact, as you can see from the graphs, it would average about half of the RAM in the system and never get low.
Back in the day, MySQL 5.5 could be configured with about 90% of the system's RAM as innodb_buffer_pool_size and run stably for literally years with something like 20M of free memory. 5.6 made this a little less stable, and 5.7 took it away completely. The various versions of MariaDB behave a little like MySQL 5.6 to me in this regard: there seem to be small leaks, or RAM that is never returned, creeping up over time, which makes it much harder to find optimal settings. The exception was the version where memory was clearly being returned aggressively.
Perhaps you could confirm whether this madvise change went away in 10.11.7 or 10.11.8?
Maybe this isn't a bug, but an intentional change, and I need to keep track of innodb memory variables more closely to tune the system.
Thanks.
Reviewing the changelogs for 10.11.7 and 10.11.8, all I can see is one relevant entry, for 10.11.7: MDEV-24670.