Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-4626

Setting join_buffer_size causes join buffer not to be used

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.0.2, 5.5.31
    • 10.1(EOL)
    • None
    • None
    • 10.1.30

    Description

      I was experimenting with join buffering.

      I made a big table:

      create table ten (a int);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      create table one_k (a int);
      insert into one_k select A.a + 1000 *B.a from ten A, ten B;
       
      create table t1 (a int);
      insert into t1 select A.a*1000 + B.a  from one_k A, one_k B;
      alter table t1 add b int, add c int, add d int;
      

      And ran a query that uses join buffer:

      MariaDB [test]> explain select count(*) from t1 A, t1 B where (A.b + B.a) mod 5 =0;
      +------+-------------+-------+------+---------------+------+---------+------+---------+-------------------------------------------------+
      | id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra                                           |
      +------+-------------+-------+------+---------------+------+---------+------+---------+-------------------------------------------------+
      |    1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 1000533 |                                                 |
      |    1 | SIMPLE      | B     | ALL  | NULL          | NULL | NULL    | NULL | 1000533 | Using where; Using join buffer (flat, BNL join) |
      +------+-------------+-------+------+---------------+------+---------+------+---------+-------------------------------------------------+
      2 rows in set (0.01 sec)
      

      Then I run the corresponding SELECT. In debugger, I saw execution reach JOIN_CACHE::alloc_buffer(), where it allocated a buffer of ~130Kbytes in size. Everything is ok so far.

      Then, I decided to try with a bigger join buffer. I did

      MariaDB [test]> set join_buffer_size=10*1000*1000;
      Query OK, 0 rows affected (0.00 sec)
      

      and run the query again. EXPLAIN remained the same. As for SELECT, I notice that JOIN_CACHE::alloc_buffer() doesn't call my_malloc() anymore.
      It reaches this line:

        if (curr_min_buff_space_sz > join_buff_space_limit ||
            (curr_buff_space_sz > join_buff_space_limit &&
             (!optimize_buff_size || 
              join->shrink_join_buffers(join_tab, curr_buff_space_sz,
                                        join_buff_space_limit))))
          goto fail;
      

      and it takes the "if" and executes "goto fail".

      Whatever it is, it must not work like this. Increasing @@join_buffer_size should not switch one from using join buffering to not using it. It is a very nasty performance surprise (especially since EXPLAIN gives one no clue that join buffer won't be used anymore).

      I was using all default settings.

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia added a comment - - edited

            Relevant variables:

            +-------------------------+----------------------+
            | Variable_name           | Value                |
            +-------------------------+----------------------+
            | join_buffer_size        | 10000000             |
            | join_buffer_space_limit | 2097152              |
            | join_cache_level        | 2                    |
             
            @@optimizer_switch:
            index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off

            psergei Sergei Petrunia added a comment - - edited Relevant variables: +-------------------------+----------------------+ | Variable_name | Value | +-------------------------+----------------------+ | join_buffer_size | 10000000 | | join_buffer_space_limit | 2097152 | | join_cache_level | 2 |   @@optimizer_switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off
            psergei Sergei Petrunia added a comment - - edited

            If I set @@join_buffer_space_limit to be 11M, then I see JOIN_CACHE::alloc_buffer() to allocate 10M.

            I'm wondering what should happen when @@join_buffer_space_limit < @@join_buffer_size.

            I think,

            • not using join buffer at all in this case is wrong.
            • the server should either disallow situation where holds, or maybe print warnings when it holds, etc.
            psergei Sergei Petrunia added a comment - - edited If I set @@join_buffer_space_limit to be 11M, then I see JOIN_CACHE::alloc_buffer() to allocate 10M. I'm wondering what should happen when @@join_buffer_space_limit < @@join_buffer_size. I think, not using join buffer at all in this case is wrong. the server should either disallow situation where holds, or maybe print warnings when it holds, etc.

            I see the following relatively acceptable solutions:

            • do not allow join_buffer_space_limit to be set smaller than join_buffer_size, and join_buffer_size to be set greater than join_buffer_space_limit – OK, although probably too restrictive in case when optimizer_switch=optimize_join_buffer_size=on;
            • throw a warning and automatically adjust (set to the same value) the other variable if one is changed – rather scary if, for example, a bad value was chosen my mistake;
            • just throw a warning – doesn't really solve anything except for doing some conscience clearing;
            • do one of the above, but also take into account optimizer_switch.optimize_join_buffer_size – seems too complicated to me.

            psergey,

            What of this should be done, or who should make this decision, who should actually do it, and in which version?

            elenst Elena Stepanova added a comment - I see the following relatively acceptable solutions: do not allow join_buffer_space_limit to be set smaller than join_buffer_size , and join_buffer_size to be set greater than join_buffer_space_limit – OK, although probably too restrictive in case when optimizer_switch=optimize_join_buffer_size=on ; throw a warning and automatically adjust (set to the same value) the other variable if one is changed – rather scary if, for example, a bad value was chosen my mistake ; just throw a warning – doesn't really solve anything except for doing some conscience clearing ; do one of the above, but also take into account optimizer_switch.optimize_join_buffer_size – seems too complicated to me . psergey , What of this should be done, or who should make this decision, who should actually do it, and in which version?
            ccalender Chris Calender (Inactive) added a comment - - edited

            We now have a customer experiencing repeated crashing when join_buffer_size = join_buffer_space_limit.

            These were the related settings they had set:

            join-buffer-space-limit = 4M
            join-cache-level = 6
            join-buffer-size = 4M

            I had them make one change at a time:

            1. removing "join-cache-level = 6" did not stop the crashing.
            2. setting "join-buffer-size = 2M" did stop the crashing.

            Using 10.1.13.

            Here is a full stack trace from the error log:

            160420 16:56:07 [ERROR] mysqld got signal 11 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.
             
            To report this bug, see https://mariadb.com/kb/en/reporting-bugs
             
            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.
             
            Server version: 10.1.13-MariaDB-enterprise
            key_buffer_size=67108864
            read_buffer_size=131072
            max_used_connections=10
            max_threads=1002
            thread_count=9
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2266392 K bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.
             
            Thread pointer: 0x0x7f1760292008
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x7f17573b4140 thread_stack 0x48400
            /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7f181281602b]
            /usr/sbin/mysqld(handle_fatal_signal+0x475)[0x7f1812376145]
            /lib64/libpthread.so.0(+0xf790)[0x7f181197c790]
            /usr/sbin/mysqld(_ZN4JOIN19shrink_join_buffersEP13st_join_tableyy+0x43)[0x7f1812233753]
            /usr/sbin/mysqld(_ZN10JOIN_CACHE12alloc_bufferEv+0x1ba)[0x7f18122f005a]
            /usr/sbin/mysqld(_ZN10JOIN_CACHE4initEb+0x60)[0x7f18122f0860]
            /usr/sbin/mysqld(_Z33check_join_cache_usage_for_tablesP4JOINyj+0x7dc)[0x7f181224685c]
            /usr/sbin/mysqld(_ZN4JOIN14optimize_innerEv+0x1583)[0x7f1812255763]
            /usr/sbin/mysqld(_ZN4JOIN8optimizeEv+0x20)[0x7f1812257150]
            /usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x9d)[0x7f18122572ad]
            /usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x25d)[0x7f181225ac4d]
            /usr/sbin/mysqld(+0x4511b2)[0x7f18121fd1b2]
            /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x5e40)[0x7f18122094c0]
            /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x28d)[0x7f181220c9fd]
            /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x243b)[0x7f181220f65b]
            /usr/sbin/mysqld(_Z10do_commandP3THD+0x16b)[0x7f181220fc0b]
            /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x17f)[0x7f18122cb59f]
            /usr/sbin/mysqld(handle_one_connection+0x47)[0x7f18122cb6f7]
            /lib64/libpthread.so.0(+0x7a51)[0x7f1811974a51]
            /lib64/libc.so.6(clone+0x6d)[0x7f180fe4993d]
             
            Trying to get some variables.
            Some pointers may be invalid and cause the dump to abort.
            Query (0x7f1756820020): is an invalid pointer
            Connection ID (thread ID): 77
            Status: NOT_KILLED
             
            Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on
             
            The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
            information that should help you find out what is causing the crash.
            160420 16:56:07 mysqld_safe Number of processes running now: 0
            160420 16:56:07 mysqld_safe mysqld restarted
            

            ccalender Chris Calender (Inactive) added a comment - - edited We now have a customer experiencing repeated crashing when join_buffer_size = join_buffer_space_limit. These were the related settings they had set: join-buffer-space-limit = 4M join-cache-level = 6 join-buffer-size = 4M I had them make one change at a time: 1. removing "join-cache-level = 6" did not stop the crashing. 2. setting "join-buffer-size = 2M" did stop the crashing. Using 10.1.13. Here is a full stack trace from the error log: 160420 16:56:07 [ERROR] mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.   To report this bug, see https://mariadb.com/kb/en/reporting-bugs   We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.   Server version: 10.1.13-MariaDB-enterprise key_buffer_size=67108864 read_buffer_size=131072 max_used_connections=10 max_threads=1002 thread_count=9 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2266392 K bytes of memory Hope that's ok; if not, decrease some variables in the equation.   Thread pointer: 0x0x7f1760292008 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x7f17573b4140 thread_stack 0x48400 /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7f181281602b] /usr/sbin/mysqld(handle_fatal_signal+0x475)[0x7f1812376145] /lib64/libpthread.so.0(+0xf790)[0x7f181197c790] /usr/sbin/mysqld(_ZN4JOIN19shrink_join_buffersEP13st_join_tableyy+0x43)[0x7f1812233753] /usr/sbin/mysqld(_ZN10JOIN_CACHE12alloc_bufferEv+0x1ba)[0x7f18122f005a] /usr/sbin/mysqld(_ZN10JOIN_CACHE4initEb+0x60)[0x7f18122f0860] /usr/sbin/mysqld(_Z33check_join_cache_usage_for_tablesP4JOINyj+0x7dc)[0x7f181224685c] /usr/sbin/mysqld(_ZN4JOIN14optimize_innerEv+0x1583)[0x7f1812255763] /usr/sbin/mysqld(_ZN4JOIN8optimizeEv+0x20)[0x7f1812257150] /usr/sbin/mysqld(_Z12mysql_selectP3THDPPP4ItemP10TABLE_LISTjR4ListIS1_ES2_jP8st_orderSB_S2_SB_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x9d)[0x7f18122572ad] /usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x25d)[0x7f181225ac4d] /usr/sbin/mysqld(+0x4511b2)[0x7f18121fd1b2] /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x5e40)[0x7f18122094c0] /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x28d)[0x7f181220c9fd] /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x243b)[0x7f181220f65b] /usr/sbin/mysqld(_Z10do_commandP3THD+0x16b)[0x7f181220fc0b] /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x17f)[0x7f18122cb59f] /usr/sbin/mysqld(handle_one_connection+0x47)[0x7f18122cb6f7] /lib64/libpthread.so.0(+0x7a51)[0x7f1811974a51] /lib64/libc.so.6(clone+0x6d)[0x7f180fe4993d]   Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (0x7f1756820020): is an invalid pointer Connection ID (thread ID): 77 Status: NOT_KILLED   Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=on,mrr_cost_based=on,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on   The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. 160420 16:56:07 mysqld_safe Number of processes running now: 0 160420 16:56:07 mysqld_safe mysqld restarted
            igor Igor Babaev (Inactive) added a comment - - edited

            Chris,
            Your problem is not related to MDEV-4626.
            Please open a new mdev and move your complains there.

            igor Igor Babaev (Inactive) added a comment - - edited Chris, Your problem is not related to MDEV-4626 . Please open a new mdev and move your complains there.
            igor Igor Babaev (Inactive) added a comment - - edited

            Chris,
            Could you please provide any information for the crashing query?
            Does EXPLAIN work for it or also crash?

            igor Igor Babaev (Inactive) added a comment - - edited Chris, Could you please provide any information for the crashing query? Does EXPLAIN work for it or also crash?

            The original reported problem can be easily resolved with this setting
            set optimizer_switch='optimize_join_buffer_size=on';

            igor Igor Babaev (Inactive) added a comment - The original reported problem can be easily resolved with this setting set optimizer_switch='optimize_join_buffer_size=on';
            igor Igor Babaev (Inactive) added a comment - - edited

            I tried to reproduce Chris's problem (in 10.0 and in 10.1) on Sergey's database:

            MariaDB [test]> create table ten (a int);
            Query OK, 0 rows affected (0.02 sec)
             
            MariaDB [test]> insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
            Query OK, 10 rows affected (0.01 sec)                                                                                  
            Records: 10  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> create table one_k (a int);
            Query OK, 0 rows affected (0.02 sec)
             
            MariaDB [test]> insert into one_k select A.a + 1000 *B.a from ten A, ten B;
            Query OK, 100 rows affected (8.86 sec)
            Records: 100  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> 
            MariaDB [test]> create table t1 (a int);
            Query OK, 0 rows affected (0.02 sec)
             
            MariaDB [test]> insert into t1 select A.a*1000 + B.a  from one_k A, one_k B;
            Query OK, 10000 rows affected (5.52 sec)
            Records: 10000  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> alter table t1 add b int, add c int, add d int;
            Query OK, 0 rows affected (0.48 sec)
            Records: 0  Duplicates: 0  Warnings: 0
             
            MariaDB [test]> select count(*) from t1;
            +----------+
            | count(*) |
            +----------+
            |    10000 |
            +----------+
            1 row in set (0.08 sec)
            

            I tried to do it with the following commands

            set join_buffer_size=4096;
            set join_buffer_space_limit=4096;
            set optimizer_switch='optimize_join_buffer_size=on';
            select count(*) from t1 A, t1 B, t1 C where (A.b + B.a + C.a) mod 5 =0;
            

            and I failed.

            igor Igor Babaev (Inactive) added a comment - - edited I tried to reproduce Chris's problem (in 10.0 and in 10.1) on Sergey's database: MariaDB [test]> create table ten (a int); Query OK, 0 rows affected (0.02 sec)   MariaDB [test]> insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9); Query OK, 10 rows affected (0.01 sec) Records: 10 Duplicates: 0 Warnings: 0   MariaDB [test]> create table one_k (a int); Query OK, 0 rows affected (0.02 sec)   MariaDB [test]> insert into one_k select A.a + 1000 *B.a from ten A, ten B; Query OK, 100 rows affected (8.86 sec) Records: 100 Duplicates: 0 Warnings: 0   MariaDB [test]> MariaDB [test]> create table t1 (a int); Query OK, 0 rows affected (0.02 sec)   MariaDB [test]> insert into t1 select A.a*1000 + B.a from one_k A, one_k B; Query OK, 10000 rows affected (5.52 sec) Records: 10000 Duplicates: 0 Warnings: 0   MariaDB [test]> alter table t1 add b int, add c int, add d int; Query OK, 0 rows affected (0.48 sec) Records: 0 Duplicates: 0 Warnings: 0   MariaDB [test]> select count(*) from t1; +----------+ | count(*) | +----------+ | 10000 | +----------+ 1 row in set (0.08 sec) I tried to do it with the following commands set join_buffer_size=4096; set join_buffer_space_limit=4096; set optimizer_switch='optimize_join_buffer_size=on'; select count(*) from t1 A, t1 B, t1 C where (A.b + B.a + C.a) mod 5 =0; and I failed.

            People

              Unassigned Unassigned
              psergei Sergei Petrunia
              Votes:
              4 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.