[MDEV-16446] create temporary table engine=memory does not release memory and tmp space Created: 2018-06-08  Updated: 2020-08-25  Resolved: 2018-07-12

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.2.15
Fix Version/s: 10.2.15

Type: Bug Priority: Major
Reporter: Rick Pizzi Assignee: Axel Schwenke
Resolution: Not a Bug Votes: 1
Labels: None

Attachments: File create_temporary.lua     File create_temporary_fixed.lua     File create_temporary_proper.lua     File rickf.csv    

 Description   

A workload that uses a lot of in-memory temporary tables makes the mysqld process grow until it fills all the available system memory and eventually the process is killed by the OOM killer.

While verifying this in my lab, I also found that the CREATE TEMPORARY TABLE statement is also using the ibtmp1 tablespace, and makes it grow enormously (it filled my disk during the test run).

According to the documentation, the memory allocated by CREATE TEMPORARY TABLE .. ENGINE=MEMORY should be released when the session is terminated; however, this doesn't seem to be the case. The enormous growth of the temporary InnoDB tablespace is also unexpected - the only way to reclaim this space is to restart MariaDB.

How to reproduce:

sysbench --db-driver=mysql  --threads=48 --time=900  --mysql-host=localhost  --mysql-user=root ./create_temporary.lua run

I am attaching the LUA file to this ticket.

While this runs, you can monitor mysqld footprint and ibtmp1 growth with:

while true; do ps -eAo pid,rss,args | grep mysqld | grep -v grep | grep -v safe; ls -lh /var/lib/mysql/ibtmp1; sleep 1; done

Please note:

show global variables like '%heap%';
+---------------------+----------+
| Variable_name       | Value    |
+---------------------+----------+
| max_heap_table_size | 16777216 |
+---------------------+----------+
1 row in set (0.02 sec)



 Comments   
Comment by Elena Stepanova [ 2018-06-25 ]

axel,

Could you please check it quickly and, if it reproduces, preserve the setup and pass it over to serg (for further assignment). The important part is the setup: whoever ends up fixing it will most likely need to be able to reproduce it again, and I don't have a shareable machine which can meaningfully run 48 threads.

Comment by Axel Schwenke [ 2018-07-11 ]

I cannot reproduce the claims made in this ticket. Instead I find several flaws in the test method:

  1. The CREATE TABLE statement in the LUA script is broken. The ENGINE clause is not part of the SQL statement but appears as LUA code and is (mysteriously) silently ignored by the LUA interpreter. If I move the "ENGINE=MEMORY" clause before the closing "]]" bracket (where it belongs), then the tables are created as MEMORY tables and ibtmp1 is not touched.
  2. sysbench never closes a connection while it runs (unless you do that explicitly in the event() function). The test script you have provided will open 48 connections when sysbench starts and then create temporary tables. The connections will only be closed when sysbench stops - with the command line given by you, after 15 minutes. Temporary tables created during that time period will of course pile up and eventually fill all available RAM.
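
For illustration, a minimal sketch of the corrected statement as it should appear in the script (the table and column names are illustrative; the attached create_temporary_fixed.lua may differ in detail):

```lua
-- sysbench 1.0 Lua API; 'con' is assumed to be the connection created
-- in thread_init(). The ENGINE clause must be inside the [[ ]] string
-- literal; otherwise it is parsed as Lua code and never reaches the server.
function event()
   con:query([[
      CREATE TEMPORARY TABLE t (
         id  INT NOT NULL,
         pad VARCHAR(64)
      ) ENGINE=MEMORY
   ]])
end
```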

I have fixed the CREATE TABLE statement and shortened the execution time of the test to 2 minutes. I used

mysqladmin debug

to write memory statistics to the error log. I did this 4 times:

  1. immediately before sysbench starts
  2. 100 seconds into the sysbench run
  3. immediately after sysbench stops
  4. about 30 seconds later, when mysqld cpu usage went from 100% to 0

The reason for 4. is that cleaning up 48 session contexts (including dropping temporary tables) is not immediate, but takes some time. Watching closely, one can see the memory usage falling steadily between 3. and 4.

Here are the 4 samples (edited for readability)

Status information:
 
Running threads: 6  Cached threads: 0  Stack size: 299008
 
Table status:
Opened tables:         17
Open tables:           11
Open files:            25
 
Memory status:
Global memory allocated by server:       275373576
Memory allocated by threads:             248728

Status information:
 
Running threads: 54  Cached threads: 0  Stack size: 299008
 
Table status:
Opened tables:      49142
Open tables:           11
Open files:    18446744073709551560
 
Memory status:
Global memory allocated by server:       50146279624
Memory allocated by threads:             110622216

Status information:
 
Running threads: 54  Cached threads: 0  Stack size: 299008
 
Table status:
Opened tables:      50605
Open tables:           11
Open files:    18446744073709551556
 
Memory status:
Global memory allocated by server:       52511732248
Memory allocated by threads:             113265232

Status information:
 
Running threads: 6  Cached threads: 48  Stack size: 299008
 
Table status:
Opened tables:      50605
Open tables:           11
Open files:    18446744073709551556
 
Memory status:
Global memory allocated by server:       277310472
Memory allocated by threads:             248728

Observations:

  1. my test machine can create ~50K temporary tables in the 2 minute run time
  2. this makes the memory footprint of the mysqld process grow from the initial ~275M to ~52G
  3. immediately after sysbench stops, the memory is not yet freed, nor are the worker threads released
  4. after context cleanup, both the number of threads and the allocated memory are back to normal

I tested this with MariaDB 10.2.15 (as given in the ticket) and also with the most recent MariaDB 10.3.8; the results are the same.

Comment by Axel Schwenke [ 2018-07-11 ]

Attached a fixed version of the LUA script, create_temporary_fixed.lua.

Comment by Axel Schwenke [ 2018-07-11 ]

Finally: I attached create_temporary_proper.lua to demonstrate how to connect and disconnect in the inner sysbench loop. You can run this script forever and it will not grow the server memory footprint at all.
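
The pattern that script demonstrates can be sketched as follows (a minimal sketch using the sysbench 1.0 SQL API; the attached create_temporary_proper.lua may differ in detail):

```lua
-- Connect and disconnect inside every event, so each temporary table
-- lives only for one iteration and is dropped when its session closes.
function event()
   local drv = sysbench.sql.driver()
   local con = drv:connect()
   con:query("CREATE TEMPORARY TABLE t (id INT) ENGINE=MEMORY")
   -- ... use the table ...
   con:disconnect()  -- session ends, temporary table is dropped
end
```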

Comment by Rick Pizzi [ 2018-07-12 ]

Sorry for having crafted a broken LUA script, and thanks for the analysis.
However, we DO have this problem at a customer of ours.

Also, it's not clear to me: are you saying we can bring down the server by creating a number of temporary tables? It looks like we should provision a limit for that, or any user could perform a DoS attack.

Why is creating a temporary table on disk using memory?
EDIT: I guess these were created in memory regardless of the wrong LUA code,
since the table content was small enough to be below the configured limit.

Still this means a loop of "create temporary table" in a session can bring down the server.

Anyway I will try to reproduce properly our problem and reopen the ticket if I succeed...

Comment by Axel Schwenke [ 2018-07-12 ]

Rick, it is true that there is no limit on the total amount of memory that could be spent on MEMORY tables - be it temporary or resident tables. The max_heap_table_size limit is per table. If a user is allowed to create an arbitrary number of such tables, he can fill any amount of memory.

I'm not sure what the exact problem of your customer is. It could be some application flaw that creates (too) many temporary tables. Or it could be a broken connection pool (PHP's pconnect() was notorious for that) that doesn't clean up connections that go back into the pool.

The only safe way is IMHO to put a DROP TABLE statement for each CREATE TEMPORARY TABLE in the application code and make sure it is executed even in case of errors (using exception handling or whatever the app language offers).
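
In sysbench-style Lua, that pattern could look like this (a hedged sketch: 'con' and the table name are assumptions, and pcall() stands in for whatever exception handling the application language offers):

```lua
con:query("CREATE TEMPORARY TABLE tmp_report (id INT) ENGINE=MEMORY")
local ok, err = pcall(function()
   con:query("INSERT INTO tmp_report VALUES (1)")
   -- ... further work with tmp_report ...
end)
-- cleanup runs whether or not the work above failed
con:query("DROP TEMPORARY TABLE IF EXISTS tmp_report")
if not ok then error(err) end
```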

Comment by Rick Pizzi [ 2018-07-12 ]

I think there must be some kind of memory leak.
The footprint grows over time, until OOM killer comes and kills MariaDB.
This happens on only one out of 3 identical slaves, and the only difference is the workload.

I'll be back if I can reproduce it in my lab.

Rick

Comment by Axel Schwenke [ 2018-07-12 ]

Why is creating a temporary table on disk using memory?

A temporary InnoDB table (the default when neither default_storage_engine nor default_tmp_storage_engine is set) will be created in an InnoDB tablespace and will use some memory in InnoDB buffers. The InnoDB buffer pool will not grow beyond the configured size, but the data dictionary may. It's however relatively small and unlikely to fill all RAM. But if the buffer pool size was made too big from the beginning (but never used under normal circumstances) then it might as well bring down the machine.

Again: there is not enough information for a detailed analysis.

Comment by Axel Schwenke [ 2018-07-12 ]

I think there must be some kind of memory leak.

"mysqladmin debug" could be helpful to find where the memory is spent. Output from "ps" is not helpful at all.

If an application creates an unlimited number of temporary tables and never cleans them up, then that is the leak.

Comment by Axel Schwenke [ 2018-07-12 ]

Oh, one more thing. Replicating temporary tables is often a bad idea. From the application side it's usually unnecessary to replicate such tables. Or to put it the other way: it's only necessary when the temporary table is later used to fill rows into a persistent table, i.e.

CREATE TEMPORARY TABLE foo ...
INSERT INTO foo ...
INSERT INTO bar ... SELECT FROM foo

But in most cases the temporary table is only used for aggregation and then none of the temporary table operations needs to be executed on the slave.

In the past I handled that in the application code, using a separate database for such temporary tables and a replication filter. A unique prefix on the table name would also do.

And of course: using row-based logging, temporary tables won't be replicated at all.
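
A slave-side replication filter for this could be sketched as below, assuming the application keeps its temporary tables in a dedicated database named scratch (the name is illustrative):

```ini
# my.cnf on the slave: skip statements whose default database is 'scratch'
[mysqld]
replicate-ignore-db = scratch
```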

Comment by Rick Pizzi [ 2018-07-12 ]

Not replicated. They are running this on the slave directly; the flow should be (I have to recheck, as it was some time ago when I raised this bug):

  • connect
  • create temp table
  • load data into temp table from csv
  • some selects join blah
  • disconnect (table gets dropped)

Comment by Rick Pizzi [ 2018-07-12 ]

OK, I have checked the customer again and:

1) this happens also on galera nodes
2) only happens on nodes where there is large use of temporary tables engine=MEMORY

The mysqladmin debug output reports obviously incorrect values:

Running threads: 42 Cached threads: 101 Stack size: 299008

Key caches:
default
Buffer_size: 134217728
Block_size: 1024
Division_limit: 100
Age_threshold: 300
Partitions: 0
blocks used: 47
not flushed: 0
w_requests: 0
writes: 0
r_requests: 294045
reads: 48

handler status:
read_key: 10553305170
read_next: 132197590438
read_rnd 366586468
read_first: 204492
write: 270634788
delete 17123393
update: 20862780

Table status:
Opened tables: 93155
Open tables: 926
Open files: 64
Open streams: 0

Alarm status:
Active alarms: 0
Max used alarms: 0
Next alarm time: 0

Memory status:
Non-mmapped space allocated from system: 192692224
Number of free chunks: 1057547
Number of fastbin blocks: 446
Number of mmapped regions: 75
Space in mmapped regions: -520257536
Maximum total allocated space: 0
Space available in freed fastbin blocks: 20800
Total allocated space: -2107657728
Total free space: -1994617344
Top-most, releasable space: 10526544
Estimated memory (with thread stack): -284807168
Global memory allocated by server: 1475980592
Memory allocated by threads: 2573072

Comment by Rick Pizzi [ 2018-07-12 ]

More details:

MariaDB [(none)]> select sum(memory_used) from information_schema.processlist;
+------------------+
| sum(memory_used) |
+------------------+
| 3475864          |
+------------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> select @@innodb_buffer_pool_size;
+---------------------------+
| @@innodb_buffer_pool_size |
+---------------------------+
|               75161927680 |
+---------------------------+
1 row in set (0.00 sec)
 
 
# free -m
              total        used        free      shared  buff/cache   available
Mem:         122881      100946         857         166       21077       20591
Swap:             0           0           0

Comment by Axel Schwenke [ 2018-07-12 ]

reopened on request of reporter

Comment by Rick Pizzi [ 2018-07-12 ]

Thanks. I am still trying to reproduce in lab but so far haven't found a way.

Comment by Rick Pizzi [ 2018-07-12 ]

Also, I think there is a bug around the counters in the memory stats inside the server (mysqladmin debug command); perhaps some counters should be 8 bytes instead of 4.

Comment by Rick Pizzi [ 2018-07-12 ]

I have rechecked the workload of the customer - and found that, in fact, they are leaking one temp table at every iteration of their application. This explains the memory leak!
This bug can be closed then; I will file a separate one for the memory counters.

Thanks for your support and apologies for the wrong LUA code posted above.

Comment by Axel Schwenke [ 2018-07-12 ]

The leak is in the application, not MariaDB.

Generated at Thu Feb 08 08:28:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.