[MDEV-29988] Major performance regression with 10.6.11 Created: 2022-11-09  Updated: 2023-05-08  Resolved: 2023-01-02

Status: Closed
Project: MariaDB Server
Component/s: Character Sets, Prepared Statements, Stored routines
Affects Version/s: 10.3.37, 10.4.27, 10.5.18, 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2
Fix Version/s: 10.11.2, 10.3.38, 10.4.28, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Blocker
Reporter: Nico Jehle Assignee: Sergei Golubchik
Resolution: Fixed Votes: 4
Labels: regression
Environment:

CentOS 7


Attachments: PNG File MDEV-29988.2022-11-09_11-59.png     PNG File MDEV-29988.2022-11-09_13-30.png     Text File MDEV-29988.engine_innodb_status.txt     File dns01_10.6.11.tar.gz     File dns04_10.6.10.tar.gz    
Issue Links:
Blocks
is blocked by MCOL-5361 prepare/execute : accomodate latest c... Closed
Duplicate
is duplicated by MDEV-30096 Memory leak 10.3.37 and 10.6.11 Closed
is duplicated by MDEV-30114 prepared statements spend a lot of ti... Closed
is duplicated by MDEV-30116 Executing prepared statements may get... Closed
is duplicated by MDEV-30274 Massive trigger performance drop afte... Closed
PartOf
Problem/Incident
is caused by MDEV-16128 Server crash in Item_func::print_op o... Closed
Relates
relates to MDEV-30848 Memory leak in MariaDB 10.6 Closed
relates to MDEV-30889 Memory leak issues with MariaDB 10.6.... Stalled
relates to MDEV-30355 set names effects performance signifi... Closed

 Description   

After upgrading part of our systems to 10.6.11 the MariaDB server uses much more CPU compared to 10.6.10.

We were upgrading to 10.6.11 for the fix to MDEV-29843, but started to roll back due to load issues with that version.

Server config is the same as on MDEV-29843;
~8 MariaDB instances per server, issue is worse on systems with a higher number of queries, where it's using 100% cpu, with notocable repsonsiveness of the system



 Comments   
Comment by Nico Jehle [ 2022-11-09 ]

Load increase after the upgrade:

engine innodb status from one host:
MDEV-29988.engine_innodb_status.txt

Comment by Nico Jehle [ 2022-11-09 ]

In one location, we tested with 2 hosts, both getting abour the same amount of requests.

One host was still running 10.6.10 (dns04), usual load dns04_10.6.10.tar.gz

The other host was running 10.6.11 (dns01), high load dns01_10.6.11.tar.gz

Comment by Nico Jehle [ 2022-11-09 ]

Load graph for the 2 server from the previous comment:
green: 10.6.11 (dns01)
orange: 10.6.10 (dns04)

Comment by Sergei Golubchik [ 2022-11-28 ]

With the following patch:

--- a/mysql-test/main/mysql_client_test.test
+++ b/mysql-test/main/mysql_client_test.test
@@ -26,7 +26,7 @@ call mtr.add_suppression(" IP address .* could not be resolved");
 # var/log/mysql_client_test.trace
 
 --exec echo "$MYSQL_CLIENT_TEST" > $MYSQLTEST_VARDIR/log/mysql_client_test.out.log 2>&1
---exec $MYSQL_CLIENT_TEST --getopt-ll-test=25600M >> $MYSQLTEST_VARDIR/log/mysql_client_test.out.log 2>&1
+--exec $MYSQL_CLIENT_TEST test_mdev_16128 --getopt-ll-test=25600M >> $MYSQLTEST_VARDIR/log/mysql_client_test.out.log 2>&1
 
 # End of 4.1 tests
 echo ok;
--- a/tests/mysql_client_test.c
+++ b/tests/mysql_client_test.c
@@ -21053,6 +21053,7 @@ static void test_mdev_16128()
   rc= mysql_stmt_prepare(stmt, query, strlen(query));
   check_execute(stmt, rc);
 
+for (uint ii=0; ii < 10000; ii++) {
   memset(&bind, 0, sizeof(bind));
   bind.buffer_type= MYSQL_TYPE_STRING;
   bind.buffer_length= strlen(bind_arg_1);
@@ -21109,6 +21110,7 @@ static void test_mdev_16128()
     executed in PS-mode and bound with the value 'd' returns empty result set.
    */
   DIE_UNLESS(rc == MYSQL_NO_DATA);
+}
 
   mysql_stmt_close(stmt);

one can verify in the debugger that a new Item_string is allocated on the statement arena for every mysql_stmt_execute.

This means that memory is growing with every mysql_stmt_execute call and statement cleanup time is proportional to the number of mysql_stmt_execute calls.

Comment by Elena Stepanova [ 2022-11-28 ]

In terms of regular MTR (but not for the regression suite!):

CREATE TABLE t (a VARCHAR(10)) CHARACTER SET utf8;
INSERT INTO t VALUES ('');
 
PREPARE stmt FROM "SELECT 1 FROM t WHERE a = ?";
--let $run= 100000
while ($run) {
  EXECUTE stmt USING 'x';
  --dec $run
}
 
select now(6);
DEALLOCATE PREPARE stmt;
select now(6);
 
DROP TABLE t;

10.3.36 release bintar

total duration: 3.553 sec
deallocate duration: 0.000071 sec
max mem usage ~100M

10.3.37 release bintar

total duration: 34.254 sec
deallocate duration: 0.003371 sec
max mem usage ~115M

The measurements are very rough as the test was run on a machine not tuned for performance experiments, they just give a general idea.

The difference scales a lot with the increase in the number of repetitions.

Comment by Michael Widenius [ 2022-11-29 ]

We could have a test for this in MTR where we do a similar loop as the above and then check that memory used for the user has not gone over 500K
Normally it should be < 100K and stay there.

Comment by Elena Stepanova [ 2022-11-29 ]

Not this test, at least. Somebody would need to build a test which triggers the memory consumption more quickly.
This test takes ages to rise to a reliably detectable memory overuse. On the same machine, when I raised the number of repetitions from 100K to 1M, the test timed out on 10.3.37 with the regular MTR 15-min timeout, and by that time the memory use was still below 200M.

Comment by F K [ 2022-11-30 ]

Would this bug cause the innodb_buffer to fill faster and be larger than needed?

Comment by Sergei Golubchik [ 2022-11-30 ]

No, this bug cannot cause that

Comment by Cheong Chung Onn [ 2022-12-16 ]

We upgraded our customer production database this week, and we experienced major performance regression since we upgraded from 10.3 to 10.6.11. We have to tuned our a quite a few of our queries to achieve satisfactory performance.

Comment by Miroslav Lachman [ 2022-12-28 ]

Just "we too" comment.
We see very bad behaviors of 10.6 after upgrade from 10.3, namely tested versions 10.6.10 and 10.6.11. Both leaking memory like crazy. MariaDB is configured to use no more than 9GB of memory for maximum number of possible connections but it eats all (30GB RAM + 13GB swap) on one connection with project import and then crashes - killed by the OS because it is Out Of Memory. Details mentioned in MDEV-30096

Comment by Marko Mäkelä [ 2023-01-03 ]

For the record, my attempt to merge the fixes to 10.4 would cause incorrect evaluation of many WHERE conditions. I think that complex changes have to be tested in all relevant major branches before being pushed to any main branch.

Generated at Thu Feb 08 10:12:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.