[MDEV-32946] Memory leak after upgrading from 10.4 to 10.6 Created: 2023-12-04  Updated: 2024-01-08

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.6.15, 10.6.16
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Marcin Gryszkalis Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

MariaDB 10.6.15 and .16 on FreeBSD 12.4


Attachments: PNG File mg-20231130-004.png     PNG File screenshot-1.png    

 Description   

After upgrading from 10.4.29 to 10.6.15 (and then 10.6.16) I'm observing huge memory leak probably related to FTS.

As calculated by mytuner.pl max memory usage should be around 20GB (and it was like that on 10.4) - with the leak memory can grow up to 90GB within few hours. Attached example graph of memory usage. Database is mostly innodb.

Observations I made:

  • leak seems to be SELECT related, because on replicas memory doesn't leak (replicase are used only for backups etc. - SELECTs are generally not executed on them).
  • jumps are sometimes pretty high, i.e. it can leak 10GB in 1 minute
  • I extracted SELECTs from general log (10 minutes set, ~120k queries) and rerun them on replica - it caused immediate grow of allocated memory.
  • These queries were originally executed as Prepared Statements but in this test they were run as-is (as saved in the log)
  • I started checking memory size after each query - it seems that SELECT..MATCH..AGAINST is causing trouble.
  • repeating same query didn't cause additional leak
  • switching the table with FTS index from Innodb to MyISAM limited the problem - i.e. after 24h I can observe 10GB leak (with Innodb it was over 100GB in 8 hours)

Table used is like that:

CREATE TABLE `kt_index` (
       `id_kt` varchar(32) NOT NULL,
       `id_kl` varchar(32) NOT NULL,
       `f1` varchar(255) NOT NULL,
       `f2` varchar(255) DEFAULT NULL,
       `f3` varchar(255) DEFAULT NULL,
       `f4` varchar(255) DEFAULT NULL,
       `k1` tinyint(1) DEFAULT NULL,
       `k2` tinyint(1) DEFAULT NULL,
       `k3` tinyint(1) DEFAULT NULL,
       `k4` tinyint(1) DEFAULT NULL,
       `k5` int(11) NOT NULL DEFAULT 1,
       `k6` int(11) NOT NULL DEFAULT 1,
       PRIMARY KEY (`id_kt`),
       KEY `kt_indeks_search` (`id_kl`,`k1`,`k2`),
       FULLTEXT KEY `xname` (`f1`,`f2`,`f3`,`f4`)
     ) ENGINE=Innodb DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci;

Queries that seems to cause leak are 99% like that:

SELECT kti.id_kl id, 'xyz' label, kti.f1 value, 'K' group_name, kti.f3, kti.f4 FROM kt_index kti LEFT JOIN kl ON kl.id_kl = kti.id_kl WHERE MATCH (kti.f1,kti.f2,kti.f3,kti.f4) AGAINST (? IN BOOLEAN MODE) GROUP BY kti.id_kl ORDER BY kti.f1 ASC LIMIT 20;



 Comments   
Comment by Sergei Golubchik [ 2023-12-04 ]

What does MariaDB think of the memory usage? (MEMORY_USED in SHOW GLOBAL STATUS, MEMORY_USED in INFORMATION_SCHEMA.PROCESSLIST)

Do you use standard glibc memory allocator or do you preload, say, jemalloc or tcmalloc?

Do you have transparent huge pages enabled?

Comment by Marcin Gryszkalis [ 2023-12-04 ]

show global status like '%memory_used%';
+---------------------+-------------+
| Variable_name       | Value       |
+---------------------+-------------+
| Memory_used         | 19569814152 |
| Memory_used_initial | 19533639480 |
+---------------------+-------------+

select memory_used, max_memory_used from INFORMATION_SCHEMA.PROCESSLIST order by max_memory_used;
+-------------+-----------------+
| memory_used | max_memory_used |
+-------------+-----------------+
|       81736 |           89896 |
|       81736 |           89896 |
|       83768 |          105520 |
|       83768 |          105520 |
|       83768 |          105520 |
|       83768 |          105520 |
|       83768 |          105520 |
|       81736 |          178728 |
|       82104 |          257112 |
|       81736 |          380400 |
|       81736 |          380400 |
|       81736 |          532184 |
|       81736 |          532184 |
|       81736 |          532184 |
|       81736 |          532184 |
|       81736 |          532184 |
|       81736 |          532184 |
|       81736 |          532184 |
|       81736 |          532184 |
|       81736 |          617960 |
|       81736 |          617984 |
|       81736 |          617992 |
|       81736 |          752200 |
|       81736 |          752200 |
|       81736 |          752200 |
|       81736 |          752200 |
|       81736 |          752200 |
|       81496 |         2317352 |
|       81736 |         3100288 |
|       81736 |         3100288 |
|       81736 |         3100288 |
|       81736 |         3100288 |
|       81736 |         3100288 |
|       81736 |         3100288 |
|       81736 |         3100288 |
|       81736 |         3100288 |
|       81736 |         3100288 |
|       81736 |         3100288 |
|       81736 |         3406320 |
|       81944 |         3406528 |
|       81736 |         3413656 |
|       81736 |         3413656 |
|       81736 |         3413656 |
|       81736 |         3413656 |
|       82048 |         3413968 |
|       81736 |         3660336 |
|       81736 |         3660336 |
|       81736 |         3660336 |
|       81736 |         3660336 |
|       81736 |         3660336 |
|       81736 |         3660336 |
|       81840 |         3660440 |
|       81944 |         3660544 |
|       81944 |         3660544 |
|       86144 |         3664744 |
|       98328 |         3677112 |
|      129560 |         3705984 |
|      106312 |         5015928 |
|       82048 |         5016136 |
|       82048 |         5016240 |
|       82048 |         5016240 |
|      196424 |         5063016 |
|      290632 |         9902376 |
|      569160 |        15454736 |
|       94336 |        33657304 |
|      253976 |        36410528 |
|       81736 |        66099104 |
|     1654912 |       122458264 |
|      102424 |       122478608 |
|       81736 |       134795232 |
+-------------+-----------------+

select sum(memory_used), sum(max_memory_used) from INFORMATION_SCHEMA.PROCESSLIST;
+------------------+----------------------+
| sum(memory_used) | sum(max_memory_used) |
+------------------+----------------------+
|          8416896 |            679353208 |
+------------------+----------------------+

I don't preload anything - although FreeBSD uses jemalloc as a standard libc allocator (not glibc obviously).

Regarding THP - in FreeBSD these are called "superpages" and they're enabled by default since FreeBSD 7 (~2009):

$ sysctl vm.pmap.pg_ps_enabled
vm.pmap.pg_ps_enabled: 1

although it's not enabled in MariaDB config:

show global variables where Variable_name like 'large%';
+---------------------+-------+
| Variable_name       | Value |
+---------------------+-------+
| large_files_support | ON    |
| large_page_size     | 0     |
| large_pages         | OFF   |
+---------------------+-------+

Comment by Sergei Golubchik [ 2023-12-04 ]

Sorry, I missed that it's FreeBSD (you did write it in the Environment).

There are many reports around about transparent huge pages on Linux, mostly when combined with jemalloc, but not only.

I didn't find anything for FreeBSD's transparent superpages, and given that FreeBSD uses jemalloc by default, I wouldn't try to say that the above applies to FreeBSD directly. Still, you might want to try to disable transparent superpages and see if it'll make any difference.

Meanwhile, we should try to repeat this memory growth and see what we can do about it

Comment by Marcin Gryszkalis [ 2023-12-06 ]

After switching FTS index/table to MyISAM the memory levels raised a bit since restart but then it's just fluctuating (I don't like having mixed engines though).

(it's VSZ for 2 instances of database, they are not related but use the same schemas).

I will try to isolate and replicate the problem on test FreeBSD instance where I can play with low-level configuration (like superpages) anyway.

Comment by Alice Sherepa [ 2023-12-08 ]

Could you please also add the structure of `kl` table and your .cnf file(s)?

Comment by Marcin Gryszkalis [ 2023-12-11 ]

relevant config entries (skipped paths, some logs and audit etc)

[mysqld]
binlog_format=mixed
server_id = 1
slow_query_log=1
long_query_time=10
userstat = 1
ft_min_word_len=3
 
key_buffer_size         = 2048M
max_allowed_packet      = 32M
thread_stack            = 256K
 
query_cache_size        = 0
query_cache_type        = 0
 
max_heap_table_size     = 1024M
tmp_table_size          = 512M
table_definition_cache  = 4096
table_open_cache        = 1024
max_connections = 1024
 
innodb_buffer_pool_size = 16G
innodb_log_file_size = 256M
innodb_log_buffer_size = 32M
innodb_file_per_table = true
innodb_open_files = 1024
innodb_thread_concurrency = 16
innodb_fast_shutdown = 1
transaction-isolation = REPEATABLE-READ
optimizer_switch=rowid_filter=off
plugin-load="SQL_ERROR_LOG=sql_errlog.so;server_audit=server_audit.so"

regarding tables - kl is large table (~130 colums, many indices, 15 FKs), kt is smaller. I can provide you with full anonymized definition if it's useful (later today).

Generated at Thu Feb 08 10:35:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.