[MDEV-4626] Setting join_buffer_size causes join buffer not to be used Created: 2013-06-08  Updated: 2018-01-16

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.2, 5.5.31
Fix Version/s: 10.1

Type: Bug Priority: Major
Reporter: Sergei Petrunia Assignee: Igor Babaev
Resolution: Unresolved Votes: 4
Labels: None

Issue Links:
Relates
relates to MDEV-14960 [ERROR] mysqld got signal 11 with joi... Closed
Sprint: 10.1.30

 Description   

I was experimenting with join buffering.

I made a big table:

create table ten (a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
create table one_k (a int);
insert into one_k select A.a + 1000 *B.a from ten A, ten B;
 
create table t1 (a int);
insert into t1 select A.a*1000 + B.a  from one_k A, one_k B;
alter table t1 add b int, add c int, add d int;

And ran a query that uses join buffer:

MariaDB [test]> explain select count(*) from t1 A, t1 B where (A.b + B.a) mod 5 =0;
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra                                           |
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------------------------------------------+
|    1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000533 |                                                 |
|    1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL | 1000533 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+------+---------+------+---------+-------------------------------------------------+
2 rows in set (0.01 sec)

Then I run the corresponding SELECT. In debugger, I saw execution reach JOIN_CACHE::alloc_buffer(), where it allocated a buffer of ~130Kbytes in size. Everything is ok so far.

Then, I decided to try with a bigger join buffer. I did

MariaDB [test]> set join_buffer_size=10*1000*1000;
Query OK, 0 rows affected (0.00 sec)

and run the query again. EXPLAIN remained the same. As for SELECT, I notice that JOIN_CACHE::alloc_buffer() doesn't call my_malloc() anymore.
It reaches this line:

  if (curr_min_buff_space_sz > join_buff_space_limit ||
      (curr_buff_space_sz > join_buff_space_limit &&
       (!optimize_buff_size || 
        join->shrink_join_buffers(join_tab, curr_buff_space_sz,
                                  join_buff_space_limit))))
    goto fail;

and it takes the "if" and executes "goto fail".

Whatever it is, it must not work like this. Increasing @@join_buffer_size should not switch one from using join buffering to not using it. It is a very nasty performance surprise (especially since EXPLAIN gives one no clue that join buffer won't be used anymore).

I was using all default settings.



 Comments   
Comment by Sergei Petrunia [ 2013-06-08 ]

Relevant variables:

+-------------------------+----------------------+
| Variable_name           | Value                |
+-------------------------+----------------------+
| join_buffer_size        | 10000000             |
| join_buffer_space_limit | 2097152              |
| join_cache_level        | 2                    |
 
@@optimizer_switch:
index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off

Comment by Sergei Petrunia [ 2013-06-08 ]

If I set @@join_buffer_space_limit to be 11M, then I see JOIN_CACHE::alloc_buffer() to allocate 10M.

I'm wondering what should happen when @@join_buffer_space_limit < @@join_buffer_size.

I think,

  • not using join buffer at all in this case is wrong.
  • the server should either disallow situation where holds, or maybe print warnings when it holds, etc.
Comment by Elena Stepanova [ 2014-10-13 ]

I see the following relatively acceptable solutions:

  • do not allow join_buffer_space_limit to be set smaller than join_buffer_size, and join_buffer_size to be set greater than join_buffer_space_limitOK, although probably too restrictive in case when optimizer_switch=optimize_join_buffer_size=on;
  • throw a warning and automatically adjust (set to the same value) the other variable if one is changed – rather scary if, for example, a bad value was chosen my mistake;
  • just throw a warning – doesn't really solve anything except for doing some conscience clearing;
  • do one of the above, but also take into account optimizer_switch.optimize_join_buffer_sizeseems too complicated to me.

psergey,

What of this should be done, or who should make this decision, who should actually do it, and in which version?

Comment by Chris Calender (Inactive) [ 2016-04-20 ]

We now have a customer experiencing repeated crashing when join_buffer_size = join_buffer_space_limit.

These were the related settings they had set:

join-buffer-space-limit = 4M
join-cache-level = 6
join-buffer-size = 4M

I had them make one change at a time:

1. removing "join-cache-level = 6" did not stop the crashing.
2. setting "join-buffer-size = 2M" did stop the crashing.

Using 10.1.13.

Here is a full stack trace from the error log:

160420 16:56:07 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.1.13-MariaDB-enterprise
key_buffer_size=67108864
read_buffer_size=131072
max_used_connections=10
max_threads=1002
thread_count=9
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2266392 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7f1760292008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f17573b4140 thread_stack 0x48400
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7f181281602b]
/usr/sbin/mysqld(handle_fatal_signal+0x475)[0x7f1812376145]
/lib64/libpthread.so.0(+0xf790)[0x7f181197c790]
/usr/sbin/mysqld(_ZN4JOIN19shrink_join_buffersEP13st_join_tableyy+0x43)[0x7f1812233753]
/usr/sbin/mysqld(_ZN10JOIN_CACHE12alloc_bufferEv+0x1ba)[0x7f18122f005a]
/usr/sbin/mysqld(_ZN10JOIN_CACHE4initEb+0x60)[0x7f18122f0860]
/usr/sbin/mysqld(_Z33check_join_cache_usage_for_tablesP4JOINyj+0x7dc)[0x7f181224685c]
/usr/sbin/mysqld(_ZN4JOIN14optimize_innerEv+0x1583)[0x7f1812255763]
/usr/sbin/mysqld(_ZN4JOIN8optimizeEv+0x20)[0x7f1812257150]
/usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x9d)[0x7f18122572ad]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x25d)[0x7f181225ac4d]
/usr/sbin/mysqld(+0x4511b2)[0x7f18121fd1b2]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x5e40)[0x7f18122094c0]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x28d)[0x7f181220c9fd]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x243b)[0x7f181220f65b]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x16b)[0x7f181220fc0b]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x17f)[0x7f18122cb59f]
/usr/sbin/mysqld(handle_one_connection+0x47)[0x7f18122cb6f7]
/lib64/libpthread.so.0(+0x7a51)[0x7f1811974a51]
/lib64/libc.so.6(clone+0x6d)[0x7f180fe4993d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f1756820020): is an invalid pointer
Connection ID (thread ID): 77
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
160420 16:56:07 mysqld_safe Number of processes running now: 0
160420 16:56:07 mysqld_safe mysqld restarted

Comment by Igor Babaev [ 2017-12-13 ]

Chris,
Your problem is not related to MDEV-4626.
Please open a new mdev and move your complains there.

Comment by Igor Babaev [ 2017-12-13 ]

Chris,
Could you please provide any information for the crashing query?
Does EXPLAIN work for it or also crash?

Comment by Igor Babaev [ 2017-12-19 ]

The original reported problem can be easily resolved with this setting
set optimizer_switch='optimize_join_buffer_size=on';

Comment by Igor Babaev [ 2017-12-19 ]

I tried to reproduce Chris's problem (in 10.0 and in 10.1) on Sergey's database:

MariaDB [test]> create table ten (a int);
Query OK, 0 rows affected (0.02 sec)
 
MariaDB [test]> insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
Query OK, 10 rows affected (0.01 sec)                                                                                  
Records: 10  Duplicates: 0  Warnings: 0
 
MariaDB [test]> create table one_k (a int);
Query OK, 0 rows affected (0.02 sec)
 
MariaDB [test]> insert into one_k select A.a + 1000 *B.a from ten A, ten B;
Query OK, 100 rows affected (8.86 sec)
Records: 100  Duplicates: 0  Warnings: 0
 
MariaDB [test]> 
MariaDB [test]> create table t1 (a int);
Query OK, 0 rows affected (0.02 sec)
 
MariaDB [test]> insert into t1 select A.a*1000 + B.a  from one_k A, one_k B;
Query OK, 10000 rows affected (5.52 sec)
Records: 10000  Duplicates: 0  Warnings: 0
 
MariaDB [test]> alter table t1 add b int, add c int, add d int;
Query OK, 0 rows affected (0.48 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
MariaDB [test]> select count(*) from t1;
+----------+
| count(*) |
+----------+
|    10000 |
+----------+
1 row in set (0.08 sec)

I tried to do it with the following commands

set join_buffer_size=4096;
set join_buffer_space_limit=4096;
set optimizer_switch='optimize_join_buffer_size=on';
select count(*) from t1 A, t1 B, t1 C where (A.b + B.a + C.a) mod 5 =0;

and I failed.

Generated at Thu Feb 08 06:57:51 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.