[MDEV-4240] mariadb 5.3.12 using more memory than MySQL 5.1 for an inefficient query Created: 2013-03-04  Updated: 2013-04-03  Resolved: 2013-04-03

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.3.12
Fix Version/s: 5.3.13

Type: Bug Priority: Minor
Reporter: Peter (Stig) Edwards Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: optimizer
Environment:

RedHat EL6 using mariadb-5.3.12-Linux-x86_64.tar.gz



 Description   

Hello and thank you for mariadb,

I noticed mariadb 5.3.12 using a lot more memory than MySQL 5.1.48 for a very inefficient query (that should never get near production):

CREATE TABLE `table_a` (
  `field_c` varchar(8) DEFAULT NULL,
  `field_d` varchar(11) DEFAULT NULL,
  UNIQUE KEY `field_c` (`field_c`),
  UNIQUE KEY `field_d` (`field_d`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `table_b` (
  `field_c` char(9) NOT NULL,
  `field_d` char(12) DEFAULT NULL,
  PRIMARY KEY (`field_c`),
  KEY `field_d` (`field_d`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

table_a has 4,834,708 rows,
table_b has 6,522,728 rows, I will upload a tar.gz with mysqldumps of the tables.

The query:

SELECT
  table_a.field_d
FROM
  table_a
  INNER JOIN table_b
    ON table_a.field_c = SUBSTR(table_b.field_c, 1, 8) OR table_a.field_d = SUBSTR(table_b.field_d, 1, 11);

MySQL 5.1.48 takes a while but does return 4,099,991 rows with:
VmPeak: 9891240 kB
VmHWM: 7855352 kB
Staying under a 10 Gigabyte virtual memory ulimit.

+----+-------------+---------+------+-----------------+------+---------+------+---------+----------+------------------------------------------------+
| id | select_type | table   | type | possible_keys   | key  | key_len | ref  | rows    | filtered | Extra                                          |
+----+-------------+---------+------+-----------------+------+---------+------+---------+----------+------------------------------------------------+
|  1 | SIMPLE      | table_b | ALL  | NULL            | NULL | NULL    | NULL | 6522728 |   100.00 |                                                |
|  1 | SIMPLE      | table_a | ALL  | field_c,field_d | NULL | NULL    | NULL | 4817421 |   100.00 | Range checked for each record (index map: 0x3) |
+----+-------------+---------+------+-----------------+------+---------+------+---------+----------+------------------------------------------------+
select `test`.`table_a`.`field_d` AS `field_d` from `test`.`table_a` join `test`.`table_b` where ((`test`.`table_a`.`field_c` = substr(`test`.`table_b`.`field_c`,1,8)) or (`test`.`table_a`.`field_d` = substr(`test`.`table_b`.`field_d`,1,11)))

mariadb 5.3.12 hits a 20 Gigabyte virtual memory ulimit, RSS is around 16G.

+----+-------------+---------+------+-----------------+------+---------+------+---------+----------+------------------------------------------------+
| id | select_type | table   | type | possible_keys   | key  | key_len | ref  | rows    | filtered | Extra                                          |
+----+-------------+---------+------+-----------------+------+---------+------+---------+----------+------------------------------------------------+
|  1 | SIMPLE      | table_b | ALL  | NULL            | NULL | NULL    | NULL | 6522728 |   100.00 |                                                |
|  1 | SIMPLE      | table_a | ALL  | field_c,field_d | NULL | NULL    | NULL | 4798671 |   100.00 | Range checked for each record (index map: 0x3) |
+----+-------------+---------+------+-----------------+------+---------+------+---------+----------+------------------------------------------------+
select `test`.`table_a`.`field_d` AS `field_d` from `test`.`table_a` join `test`.`table_b` where ((`test`.`table_a`.`field_c` = substr(`test`.`table_b`.`field_c`,1,8)) or (`test`.`table_a`.`field_d` = substr(`test`.`table_b`.`field_d`,1,11)))

I was using the following when reproducing:

[mysqld]
key_buffer = 500M
innodb_buffer_pool_size = 2000M

Thank you.



 Comments   
Comment by Peter (Stig) Edwards [ 2013-03-04 ]

I sent MDEV4240_resend.tar.gz to ftp.askmonty.org/private, it should be 68679677 bytes (MDEV4240.tar.gz transfer was interrupted).

Comment by Peter (Stig) Edwards [ 2013-03-04 ]

I just tried 5.3 revision 3632 and see the same behaviour as 5.3.12.

Comment by Peter (Stig) Edwards [ 2013-03-04 ]

I just tried mysql-5.1.68-linux-x86_64-glibc23 and see the same behaviour as MySQL 5.1.48 (data returned staying under 10 Gigabyte virtual memory limit).

Comment by Elena Stepanova [ 2013-03-05 ]

>> MySQL 5.1.48 takes a while

"a while" is one way to put it, it took almost 6 hours in my VM, but it did work at the end, even with 6GB memory + 2 GB swap:
4099991 rows in set (5 hours 46 min 41.22 sec)

The problem with 5.3 hitting out-of-memory was also reproducible.

Peter,

Did you happen to try to scale down the test case a bit, so it would still show the extra memory usage, but would take less time?
I don't want to try in vain if you already did and didn't get any luck.

Comment by Peter (Stig) Edwards [ 2013-03-06 ]

I didn't try and scale down the number of rows past the test data I sent, it took a couple of minutes on the physical host I was testing on. Tomorrow I will post the other my.cnf [mysqld] settings that may have helped with my shorter runtime.

Comment by Peter (Stig) Edwards [ 2013-03-06 ]

Sorry for the delay, these are some of the settings that might be helping with my runtime (the host is a Dell PowerEdge R620 with 64G of RAM, 32x Xeon @ 2.7GHz, RAID-1 on 2.5" 10K disks with 64MB buffer, under very low load).

[mysqld]
key_buffer = 500M
sort_buffer_size = 64M
read_buffer_size = 256K
innodb_buffer_pool_size = 2000M

With MySQL 5.1.68

> pager cat > /dev/null
PAGER set to 'cat > /dev/null'

> SELECT table_a.field_d FROM table_a INNER JOIN table_b ON table_a.field_c = SUBSTR(table_b.field_c, 1, 8) OR table_a.field_d = SUBSTR(table_b.field_d, 1, 11);
4099991 rows in set (1 min 43.29 sec)

Comment by Elena Stepanova [ 2013-03-08 ]

It's still very slow on my machine, even with the provided parameters, apparently the reason is different.
Anyway, I've reduced the amount of data to below 1M rows in each table; the difference in memory usage is still obvious, but now both versions have enough memory to finish the query. Here are my results (I assume on your machine the execution time will be almost instantaneous), but memory usage should probably be somewhat similar?

MariaDB [test]> select count from table_a;
----------

count

----------

448568

----------
1 row in set (0.62 sec)

MariaDB [test]> select count from table_b;
----------

count

----------

767757

----------
1 row in set (0.00 sec)

> SELECT table_a.field_d FROM table_a INNER JOIN table_b ON table_a.field_c = SUBSTR(table_b.field_c, 1, 8) OR table_a.field_d = SUBSTR(table_b.field_d, 1, 11);

...

5.1
350328 rows in set (14 min 43.48 sec)
...
(end of execution – peak and going back to idle)
3423 elenst 20 0 5320m 1.7g 7232 S 100 17.9 15:33.10 mysqld
3423 elenst 20 0 4616m 1.0g 7236 S 60 10.6 15:34.91 mysqld
3423 elenst 20 0 4616m 1.0g 7236 S 0 10.6 15:34.92 mysqld

5.3
350328 rows in set (1 min 14.52 sec)

...
(end of execution – peak and going back to idle)
3543 elenst 20 0 6733m 3.0g 7588 S 100 31.1 2:02.10 mysqld
3543 elenst 20 0 4094m 479m 7596 S 34 4.8 2:03.13 mysqld
3543 elenst 20 0 4094m 479m 7596 S 0 4.8 2:03.14 mysqld

I'm not totally sure that the increased memory usage as such is a bug, it seems fair that a much faster execution time comes at the cost of more resources. What is worrisome however is that I haven't found a way to revert 5.3 to 5.1 behavior – even if I set optimizer_switch to the value of 5.1, the memory consumption is still the same:

SET optimizer_switch = REPLACE( @@optimizer_switch, '=on', '=off' );
set optimizer_switch = 'index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,table_elimination=on';

350328 rows in set (1 min 15.42 sec)

4277 elenst 20 0 6749m 3.0g 7056 S 100 31.1 1:16.09 mysqld
4277 elenst 20 0 4094m 445m 7068 S 22 4.4 1:16.74 mysqld
4277 elenst 20 0 4094m 445m 7068 S 0 4.4 1:16.74 mysqld

The dumps with smaller data are uploaded at ftp, side by side with the original ones, file name MDEV4240_shorter.tar.gz.

I'll pass it to Sergey to decide whether there is a misbehavior here somewhere.

Comment by Peter (Stig) Edwards [ 2013-03-20 ]

Found a host with some more free RAM to test a 30G virtual memory ulimit.
mariadb 5.3.12 was able to run the original query:
VmPeak: 28149180 kB
VmHWM: 24454632 kB
4099991 rows in set (2 min 30.08 sec)

Comment by Sergei Petrunia [ 2013-03-29 ]

There is no way how this query could meaningfully use so much memory. Memory is just wasted somewhere.

The query uses "range checked for each record". This optimization does the following:

for each record of table_b {
check if we could use range access for table a;
if (yes)

{ create and use quick select }

}

Apparently, each loop iteration gets some memory to be allocated on query's MEM_ROOT. Let's catch who does it:

Breakpoint 1, alloc_root (mem_root=0xa337478, length=1816) at my_alloc.c:182
$34966 = 60
(gdb) p mem_root->block_num
$34968 = 60

    1. ^^ this is a "fat" mem_root, of 60 blocks (typical usage is < 10 blocks).

The culprit is:

#0 alloc_root (mem_root=0xa337478, length=1816) at my_alloc.c:182
#1 0x000000000056560c in Sql_alloc::operator new (size=1816, mem_root=0xa337478) at sql_list.h:40
#2 0x0000000000a3a212 in innobase_create_handler (hton=0x1a645b8, table=0x7fff1000cf80, mem_root=0xa337478) at handler/ha_innodb.cc:387
#3 0x00000000007f77aa in get_new_handler (share=0x7fff1000cf80, alloc=0xa337478, db_type=0x1a645b8) at handler.cc:248
#4 0x00000000007fb25e in handler::clone (this=0x7fff1000f450, name=0x7fff1000d388 "./j60/table_a", mem_root=0xa337478) at handler.cc:2111
#5 0x0000000000a3f815 in ha_innobase::clone (this=0x7fff1000f450, name=0x7fff1000d388 "./j60/table_a", mem_root=0xa337478) at handler/ha_innodb.cc:4371
#6 0x00000000007c4b60 in QUICK_RANGE_SELECT::init_ror_merged_scan (this=0x7fff10042500, reuse_handler=false) at opt_range.cc:2000
#7 0x00000000007c579f in QUICK_ROR_UNION_SELECT::reset (this=0x7fff10042410) at opt_range.cc:2271
#8 0x0000000000727d2c in join_init_read_record (tab=0x7fff1001ed70) at sql_select.cc:16717
#9 0x0000000000727ba7 in join_init_quick_read_record (tab=0x7fff1001ed70) at sql_select.cc:16686
#10 0x0000000000725d2a in sub_select (join=0x7fff1001d470, join_tab=0x7fff1001ed70, end_of_records=false) at sql_select.cc:15875
#11 0x000000000072654b in evaluate_join_record (join=0x7fff1001d470, join_tab=0x7fff1001ea50, error=0) at sql_select.cc:16076
#12 0x0000000000725f16 in sub_select (join=0x7fff1001d470, join_tab=0x7fff1001ea50, end_of_records=false) at sql_select.cc:15918
#13 0x0000000000725557 in do_select (join=0x7fff1001d470, fields=0xa336a60, table=0x0, procedure=0x0) at sql_select.cc:15539
#14 0x000000000070613f in JOIN::exec (this=0x7fff1001d470) at sql_select.cc:2769
#15 0x00000000007069c9 in mysql_select (thd=0xa333eb8, rref_pointer_array=0xa336ba8, tables=0x7fff10009808, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147764736, result=0x7fff1000b288, unit=0xa336448, select_lex=0xa336950) at sql_select.cc:2990
#16 0x00000000006fd595 in handle_select (thd=0xa333eb8, lex=0xa3363a8, result=0x7fff1000b288, setup_tables_done_option=0) at sql_select.cc:288

Comment by Sergei Petrunia [ 2013-03-29 ]

After I make the above allocation to be done on quick select's mem_root and run
the query:

(while true; do sleep 2; ps hv --pid=29160 ; done )

PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
29160 pts/9 tl+ 4:07 0 11906 3858337 555472 3.3 /data0/psergey/dev2/5.3/sql/mysqld --defaults-file=~/my1.cnf
29160 pts/9 Sl+ 4:20 0 11906 3858337 555472 3.3 /data0/psergey/dev2/5.3/sql/mysqld --defaults-file=~/my1.cnf
...
29160 pts/9 tl+ 5:19 0 11906 3858337 555472 3.3 /data0/psergey/dev2/5.3/sql/mysqld --defaults-file=~/my1.cnf
29160 pts/9 tl+ 5:20 0 11906 3858337 555472 3.3 /data0/psergey/dev2/5.3/sql/mysqld --defaults-file=~/my1.cnf
29160 pts/9 tl+ 5:20 0 11906 3858337 555472 3.3 /data0/psergey/dev2/5.3/sql/mysqld --defaults-file=~/my1.cnf
...
29160 pts/9 Sl+ 13:32 0 11906 3924161 1211444 7.3 /data0/psergey/dev2/5.3/sql/mysqld --defaults-file=~/my1.cnf
29160 pts/9 Sl+ 13:32 0 11906 3924161 1211444 7.3 /data0/psergey/dev2/5.3/sql/mysqld --defaults-file=~/my1.cnf
...
29160 pts/9 Sl+ 13:32 0 11906 3924161 1211444 7.3 /data0/psergey/dev2/5.3/sql/mysqld --defaults-file=~/my1.cnf
29160 pts/9 Sl+ 13:32 0 11906 3924161 1211444 7.3 /data0/psergey/dev2/5.3/sql/mysqld --defaults-file=~/my1.cnf

The query finishes in 8 min 6.95 sec, on debug build.

RSS grows slowly as the query runs, reaching the top of about 1.2G. I am not sure why it is growing, but at least the patch cuts the growth rate.

Comment by Sergei Petrunia [ 2013-03-29 ]

Not directly related to the problem, but I was wondering whether the provided query plan is an efficient one.

Increasing join_buffer_size will make mysql do a cross-join in memory:

set join_buffer_size=120*1000*1000;
SELECT table_a.field_d FROM table_a INNER JOIN table_b ON table_a.field_c = SUBSTR(table_b.field_c, 1, 8) OR table_a.field_d = SUBSTR(table_b.field_d, 1, 11);

MariaDB [j60]> explain SELECT table_a.field_d FROM table_a INNER JOIN table_b ON table_a.field_c = SUBSTR(table_b.field_c, 1, 8) OR table_a.field_d = SUBSTR(table_b.field_d, 1, 11);
--------------------------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

--------------------------------------------------------------------------------------------------------------+

1 SIMPLE table_a ALL field_c,field_d NULL NULL NULL 4815162  
1 SIMPLE table_b ALL NULL NULL NULL NULL 6522728 Using where; Using join buffer (flat, BNL join)

--------------------------------------------------------------------------------------------------------------+

  1. Didn't finish in 900 seconds.

I re-wrote it as UNION, hoping two ref accesses will be faster. nope:

MariaDB [j60]> explain SELECT table_a.field_d FROM table_a INNER JOIN table_b ON table_a.field_c = SUBSTR(table_b.field_c, 1, 8) union SELECT table_a.field_d FROM table_a INNER JOIN table_b ON table_a.field_d = SUBSTR(table_b.field_d, 1, 11);
---------------------------------------------------------------------------------------------+

id select_type table type possible_keys key key_len ref rows Extra

---------------------------------------------------------------------------------------------+

1 PRIMARY table_b index NULL PRIMARY 9 NULL 6522728 Using index
1 PRIMARY table_a ref field_c field_c 11 func 1 Using index condition
2 UNION table_b index NULL field_d 13 NULL 6522728 Using index
2 UNION table_a ref field_d field_d 14 func 1 Using where; Using index
NULL UNION RESULT <union1,2> ALL NULL NULL NULL NULL NULL  

---------------------------------------------------------------------------------------------+
4083525 rows in set (21 min 43.40 sec)

Comment by Sergei Petrunia [ 2013-03-29 ]

All of the above execution times are from a debug build. Release one should be faster.

Comment by Sergei Petrunia [ 2013-03-29 ]

The patch:

=== modified file 'sql/opt_range.cc'
— sql/opt_range.cc 2013-02-28 20:48:47 +0000
+++ sql/opt_range.cc 2013-03-29 11:05:57 +0000
@@ -2266,11 +2266,21 @@ int QUICK_ROR_UNION_SELECT::reset()
if (!scans_inited)
{
List_iterator_fast<QUICK_SELECT_I> it(quick_selects);
+
+ /* Provide a MEM-root to children. */
+ MEM_ROOT *save_mem_root= thd->mem_root;
+ thd->mem_root= &alloc;
+
while ((quick= it++))
{
if (quick->init_ror_merged_scan(FALSE))
+

{ + thd->mem_root= save_mem_root; DBUG_RETURN(1); + }

}
+ thd->mem_root= save_mem_root;
+
scans_inited= TRUE;
}
queue_remove_all(&queue);

This patch needs to be improved - there are other kinds of quick selects that may call handler::clone(). They all should make sure that thd->mem_root is pointing to quick select's mem_root. If thd->mem_root is a statement-execution-mem_root, then that mem_root will accumulate a lot of data.

Comment by Sergei Petrunia [ 2013-04-01 ]

Committed another variant of the patch.

Comment by Sergei Petrunia [ 2013-04-02 ]

Sanja, please reassign back when review is done.

Comment by Oleksandr Byelkin [ 2013-04-03 ]

OK to push

Comment by Sergei Petrunia [ 2013-04-03 ]

Fix pushed into 5.3

Generated at Thu Feb 08 06:54:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.