[MDEV-26712] row events never reset thd->mem_root Created: 2021-09-28  Updated: 2021-10-27  Resolved: 2021-10-13

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.2.41, 10.3.32, 10.4.22, 10.5.13, 10.6.5

Type: Bug Priority: Blocker
Reporter: Sergei Golubchik Assignee: Andrei Elkin
Resolution: Fixed Votes: 2
Labels: Memory_leak, not-10.7

Issue Links:
Blocks
Relates
relates to MDEV-20455 memory leak in 10.4 series Closed

 Description   

row event execution starts from Rows_log_event::do_apply_event which calls open_and_lock_tables() which calls lock_tables(), which does

if (!(ptr=start=(TABLE**) thd->alloc(sizeof(TABLE*)*count)))

thus allocating an array of points of the thd's memroot. This is just few bytes, 8 in the test below. But this memory is never freed, not by row events at least. So, by creating a transaction with a huge number of row events we can consume any amount of memory.

source include/have_innodb.inc;
source include/have_binlog_format_row.inc;
source include/master-slave.inc;
create table t1 (a int) engine=innodb;
insert t1 values (1);
 
sync_slave_with_master;
stop slave;
set global max_session_mem_used=10*1024*1024;
start slave;
 
connection master;
let $a=1000000;
start transaction;
disable_query_log;
while ($a) {
  update t1 set a=a+1;
  dec $a;
}
enable_query_log;
commit;
drop table t1;
sync_slave_with_master;
set global max_session_mem_used=default;
source include/rpl_end.inc;

To fix that, rbr events must do

free_root(thd->mem_root, MYF(MY_KEEP_PREALLOC));

somewhere. Where it's safe, because it'll destroy anything allocated on the thd memroot.
This patch appears to be working fine, it frees thd memroot after closing all opened tables:

--- a/sql/rpl_rli.cc
+++ b/sql/rpl_rli.cc
@@ -2326,6 +2326,7 @@ void rpl_group_info::cleanup_context(THD *thd, bool error)
   */
   reset_row_stmt_start_timestamp();
   unset_long_find_row_note_printed();
+  free_root(thd->mem_root, MYF(MY_KEEP_PREALLOC));
 
   DBUG_EXECUTE_IF("inject_sleep_gtid_100_x_x", {
       if (current_gtid.domain_id == 100)



 Comments   
Comment by Sergei Golubchik [ 2021-09-28 ]
  • Note, when running the test you might see it apparently stuck after rpl_end line. This is mysqltest noticing the replication failure is trying to consume the output of SHOW BINLOG EVENTS. I used to press ^C at that point.
  • This is obviously not a test case to be pushed into mysql-test/. Just sayin'
  • If you're curious, you can run from another terminal mysql --socket=mysql-test/var/tmp/mysqld.2.sock -uroot and then select id,command,state,memory_used from information_schema.processlist; to watch it grow (or not, if you've applied the fix}}.
Comment by Andrei Elkin [ 2021-09-29 ]

To complete the description, the THD::mem_root of the slave applier gets finally released after commit. More precisely at the end of the following Gtid_log_event.

Comment by Roel Van de Paar [ 2021-09-29 ]

Confirmed. Memory starts increasing on the slave after the commit on the master:

10.5.13 c430aa72abbdccb1ece7f0d65b49a6b48e7c5ba7 (Debug)

MariaDB [(none)]> select id,command,state,memory_used from information_schema.processlist;
+----+-----------+------------------------------------------------------+-------------+
| id | command   | state                                                | memory_used |
+----+-----------+------------------------------------------------------+-------------+
| 13 | Query     | Filling schema table                                 |      132368 |
| 12 | Slave_SQL | Reading event from the relay log                     |     4390792 |
| 11 | Slave_IO  | Queueing master event to the relay log               |       46392 |
| 10 | Sleep     |                                                      |       74920 |
|  9 | Query     | Waiting for the slave SQL thread to advance position |       74920 |
|  6 | Sleep     |                                                      |       74920 |
|  5 | Sleep     |                                                      |       73848 |
+----+-----------+------------------------------------------------------+-------------+
7 rows in set (0.002 sec)
 
MariaDB [(none)]> select id,command,state,memory_used from information_schema.processlist;
+----+-----------+------------------------------------------------------+-------------+
| id | command   | state                                                | memory_used |
+----+-----------+------------------------------------------------------+-------------+
| 13 | Query     | Filling schema table                                 |      132368 |
| 12 | Slave_SQL | Reading event from the relay log                     |     4783096 |
| 11 | Slave_IO  | Queueing master event to the relay log               |       46392 |
| 10 | Sleep     |                                                      |       74920 |
|  9 | Query     | Waiting for the slave SQL thread to advance position |       74920 |
|  6 | Sleep     |                                                      |       74920 |
|  5 | Sleep     |                                                      |       73848 |
+----+-----------+------------------------------------------------------+-------------+
7 rows in set (0.002 sec)

And slave error log mysql-test/var/log/mysqld.2.err shows:

10.5.13 c430aa72abbdccb1ece7f0d65b49a6b48e7c5ba7 (Debug)

2021-09-29 20:23:24 12 [ERROR] Slave SQL: Error executing row event: 'The MariaDB server is running with the --max-thread-mem-used=10485760 option so it cannot execute this statement', Gtid 0-1-3, Internal MariaDB error code: 1290

Comment by Roel Van de Paar [ 2021-09-29 ]

Bug also present in 10.4

10.4.22 03a10706ecfa5cf0a6252d436e341483f5c0827c (Debug)

| id | command   | state                                                | memory_used |
...
| 17 | Slave_SQL | Closing tables                                       |     7183648 |
...
| 17 | Slave_SQL | Reading event from the relay log                     |     8180632 |

Comment by Roel Van de Paar [ 2021-09-29 ]

Bug now confirmed in 10.2 to 10.6. (To execute on 10.2 use ./mtr --suite=main test after placing test into mysql-test/t/test.test).

10.2.41 (Debug)

| 17 | Slave_SQL | after apply log event                                |     4205160 |
...
| 17 | Slave_SQL | Reading event from the relay log                     |     9239112 |

10.3.32 b112c9dfaacbcb7c3548414c6f402114663223dc (Debug)

| 17 | Slave_SQL | Reading event from the relay log                     |     4386400 |
...
| 17 | Slave_SQL | Reading event from the relay log                     |     6249616 |

10.4-10.5: ref comments above.

10.6.5 1e9c922fa726b22f4522f2a4de0fcb6595404086 (Debug)

| 12 | Slave_SQL | Reading event from the relay log                     |     2724176 |
...
| 12 | Slave_SQL | Executing                                            |     7186640 |

10.7 However is not affected (tested twice):

10.7.0 d552e092c9f3e20da078d1b62b976f629f73d3a4 (Debug)

MariaDB [(none)]> select id,command,state,memory_used from information_schema.processlist;
+----+-----------+------------------------------------------------------+-------------+
| id | command   | state                                                | memory_used |
+----+-----------+------------------------------------------------------+-------------+
| 13 | Query     | Filling schema table                                 |       88304 |
| 12 | Slave_SQL | Executing                                            |       45768 |
| 11 | Slave_IO  | Queueing master event to the relay log               |       45768 |
| 10 | Sleep     |                                                      |       45768 |
|  9 | Query     | Waiting for the slave SQL thread to advance position |       45768 |
|  6 | Sleep     |                                                      |       45768 |
|  5 | Sleep     |                                                      |       45768 |
+----+-----------+------------------------------------------------------+-------------+
7 rows in set (0.002 sec)
 
MariaDB [(none)]> select id,command,state,memory_used from information_schema.processlist;
+----+-----------+------------------------------------------------------+-------------+
| id | command   | state                                                | memory_used |
+----+-----------+------------------------------------------------------+-------------+
| 13 | Query     | Filling schema table                                 |       88304 |
| 12 | Slave_SQL | Reading event from the relay log                     |       45768 |
| 11 | Slave_IO  | Queueing master event to the relay log               |       45768 |
| 10 | Sleep     |                                                      |       45768 |
|  9 | Query     | Waiting for the slave SQL thread to advance position |       45768 |
|  6 | Sleep     |                                                      |       45768 |
|  5 | Sleep     |                                                      |       45768 |
+----+-----------+------------------------------------------------------+-------------+
7 rows in set (0.003 sec)

Additionally, the 10.7 test succeeds

10.7.0 d552e092c9f3e20da078d1b62b976f629f73d3a4 (Debug)

...
commit;
drop table t1;
connection slave;
set global max_session_mem_used=default;
include/rpl_end.inc
main.test 'innodb,row'                   [ pass ]  396922
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 396.922 of 401 seconds executing testcases
 
Completed: All 1 tests were successful.

Comment by Andrei Elkin [ 2021-09-29 ]

Sergei, as you started on it anyway, could you please review as well?
The patch ddf10294831 is in bb-10.2-andrei (will have merge conflicts with 10.4 and 10.5,
resolved yet locally). Thanks.

Comment by Sergei Golubchik [ 2021-10-01 ]

ddf10294831 looks ok to me

Comment by Andrei Elkin [ 2021-10-13 ]

For merging 10.2->10.3 and 10.4->10.5 conflicts are resolved in

bb-10.3-MDEV-26712-andrei,
bb-10.5-MDEV-26712-andrei.

Generated at Thu Feb 08 09:47:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.