[MDEV-31348] Assertion `last_key_entry >= end_pos' failed in virtual bool JOIN_CACHE_HASHED::put_record() Created: 2023-05-26  Updated: 2023-08-11  Resolved: 2023-08-10

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: None
Fix Version/s: 10.8.8, 10.4.31, 10.5.22, 10.6.15, 10.9.8, 10.10.6, 10.11.5

Type: Bug Priority: Critical
Reporter: Ramesh Sivaraman Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: regression

Issue Links:
Blocks
is blocked by MDEV-30938 Wrong result with small join_buffer_s... Stalled
Relates
relates to MDEV-31893 Valgrind reports issues in main.join_... Closed

 Description   

SET JOIN_buffer_size=1;
SET SESSION JOIN_cache_level=4;
SET SESSION optimizer_switch='optimize_JOIN_buffer_size=OFF';
SELECT * FROM information_schema.statistics JOIN information_schema.COLUMNS USING (table_name,column_name);

Leads to

10.4.30 6966d7fe4b7ccfb2b7d16ca4d7d5ab08234fa9ec (Optimized)

Core was generated by `/test/MD220523-mariadb-10.4.30-linux-x86_64-opt/bin/mariadbd --no-defaults --co'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
[Current thread is 1 (Thread 0x14b110fa4700 (LWP 654282))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000014b12b9d8859 in __GI_abort () at abort.c:79
#2  0x000014b12ba4326e in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x14b12bb6d298 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x000014b12ba4b2fc in malloc_printerr (str=str@entry=0x14b12bb6b4c1 "free(): invalid pointer") at malloc.c:5347
#4  0x000014b12ba4cb2c in _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:4173
#5  0x0000557ee2ca7982 in JOIN_CACHE::free (this=0x14b0c803c948) at /test/10.4_opt/sql/sql_join_cache.h:677
#6  st_join_table::cleanup (this=this@entry=0x14b0c8030198) at /test/10.4_opt/sql/sql_select.cc:13685
#7  0x0000557ee2cc8418 in JOIN::cleanup (this=0x14b0c8015490, full=full@entry=true) at /test/10.4_opt/sql/sql_select.cc:14121
#8  0x0000557ee2cc8be7 in JOIN::join_free (this=this@entry=0x14b0c8015490) at /test/10.4_opt/sql/sql_select.cc:14022
#9  0x0000557ee2cca9fb in do_select (procedure=<optimized out>, join=0x14b0c8015490) at /test/10.4_opt/sql/sql_select.cc:20458
#10 JOIN::exec_inner (this=0x14b0c8015490) at /test/10.4_opt/sql/sql_select.cc:4605
#11 0x0000557ee2ccada6 in JOIN::exec (this=this@entry=0x14b0c8015490) at /test/10.4_opt/sql/sql_select.cc:4387
#12 0x0000557ee2cc8e26 in mysql_select (thd=0x14b0c8000c48, tables=0x14b0c8010650, wild_num=1, fields=@0x14b0c8010148: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x14b0c80105b0, last = 0x14b0c802d4e0, elements = 36}, <No data fields>}, conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=<optimized out>, result=0x14b0c8015468, unit=0x14b0c8004828, select_lex=0x14b0c800ffe8) at /test/10.4_opt/sql/sql_select.cc:4826
#13 0x0000557ee2cc985a in handle_select (thd=thd@entry=0x14b0c8000c48, lex=lex@entry=0x14b0c8004768, result=result@entry=0x14b0c8015468, setup_tables_done_option=setup_tables_done_option@entry=0) at /test/10.4_opt/sql/sql_select.cc:442
#14 0x0000557ee2c55a2e in execute_sqlcom_select (thd=0x14b0c8000c48, all_tables=0x14b0c8010650) at /test/10.4_opt/sql/sql_parse.cc:6473
#15 0x0000557ee2c63f9d in mysql_execute_command (thd=0x14b0c8000c48) at /test/10.4_opt/sql/sql_parse.cc:3976
#16 0x0000557ee2c664e2 in mysql_parse (thd=0x14b0c8000c48, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /test/10.4_opt/sql/sql_parse.cc:8008
#17 0x0000557ee2c68f95 in dispatch_command (command=COM_QUERY, thd=0x14b0c8000c48, packet=<optimized out>, packet_length=<optimized out>, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /test/10.4_opt/sql/sql_class.h:1231
#18 0x0000557ee2c6a7ef in do_command (thd=0x14b0c8000c48) at /test/10.4_opt/sql/sql_parse.cc:1378
#19 0x0000557ee2d5816e in do_handle_one_connection (connect=<optimized out>) at /test/10.4_opt/sql/sql_connect.cc:1420
#20 0x0000557ee2d581fd in handle_one_connection (arg=<optimized out>) at /test/10.4_opt/sql/sql_connect.cc:1324
#21 0x000014b12bee9609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#22 0x000014b12bad5133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

10.4.30 6966d7fe4b7ccfb2b7d16ca4d7d5ab08234fa9ec (Debug)

mariadbd: /test/10.4_dbg/sql/sql_join_cache.cc:3101: virtual bool JOIN_CACHE_HASHED::put_record(): Assertion `last_key_entry >= end_pos' failed.

10.4.30 6966d7fe4b7ccfb2b7d16ca4d7d5ab08234fa9ec (Debug)

Core was generated by `/test/MD220523-mariadb-10.4.30-linux-x86_64-dbg/bin/mariadbd --no-defaults --co'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
[Current thread is 1 (Thread 0x148122624700 (LWP 731819))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00001481490f1859 in __GI_abort () at abort.c:79
#2  0x00001481490f1729 in __assert_fail_base (fmt=0x148149287588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5638af86bf4a "last_key_entry >= end_pos", file=0x5638af86b908 "/test/10.4_dbg/sql/sql_join_cache.cc", line=3101, function=<optimized out>) at assert.c:92
#3  0x0000148149102fd6 in __GI___assert_fail (assertion=assertion@entry=0x5638af86bf4a "last_key_entry >= end_pos", file=file@entry=0x5638af86b908 "/test/10.4_dbg/sql/sql_join_cache.cc", line=line@entry=3101, function=function@entry=0x5638af86bd70 "virtual bool JOIN_CACHE_HASHED::put_record()") at assert.c:101
#4  0x00005638aed99844 in JOIN_CACHE_HASHED::put_record (this=0x1480e0045b70) at /test/10.4_dbg/sql/sql_join_cache.cc:3101
#5  0x00005638aec77e9c in sub_select_cache (join=0x1480e0017aa0, join_tab=0x1480e00351c8, end_of_records=<optimized out>) at /test/10.4_dbg/sql/sql_select.cc:20666
#6  0x00005638aec62cd0 in evaluate_join_record (join=join@entry=0x1480e0017aa0, join_tab=join_tab@entry=0x1480e0034e20, error=error@entry=0) at /test/10.4_dbg/sql/sql_select.cc:21115
#7  0x00005638aec77a2b in sub_select (join=0x1480e0017aa0, join_tab=0x1480e0034e20, end_of_records=false) at /test/10.4_dbg/sql/sql_select.cc:20888
#8  0x00005638aeca9c5a in do_select (procedure=<optimized out>, join=0x1480e0017aa0) at /test/10.4_dbg/sql/sql_select.cc:20411
#9  JOIN::exec_inner (this=this@entry=0x1480e0017aa0) at /test/10.4_dbg/sql/sql_select.cc:4605
#10 0x00005638aecaa0f8 in JOIN::exec (this=this@entry=0x1480e0017aa0) at /test/10.4_dbg/sql/sql_select.cc:4387
#11 0x00005638aeca8811 in mysql_select (thd=thd@entry=0x1480e0000d28, tables=0x1480e0012c60, wild_num=1, fields=@0x1480e0012758: {<base_list> = {<Sql_alloc> = {<No data fields>}, first = 0x1480e0012bc0, last = 0x1480e0032510, elements = 36}, <No data fields>}, conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=<optimized out>, result=0x1480e0017a78, unit=0x1480e0004ac8, select_lex=0x1480e00125f8) at /test/10.4_dbg/sql/sql_select.cc:4826
#12 0x00005638aeca8ab6 in handle_select (thd=thd@entry=0x1480e0000d28, lex=lex@entry=0x1480e0004a08, result=result@entry=0x1480e0017a78, setup_tables_done_option=setup_tables_done_option@entry=0) at /test/10.4_dbg/sql/sql_select.cc:442
#13 0x00005638aec24f2d in execute_sqlcom_select (thd=thd@entry=0x1480e0000d28, all_tables=0x1480e0012c60) at /test/10.4_dbg/sql/sql_parse.cc:6473
#14 0x00005638aec2edd7 in mysql_execute_command (thd=thd@entry=0x1480e0000d28) at /test/10.4_dbg/sql/sql_parse.cc:3976
#15 0x00005638aec39247 in mysql_parse (thd=thd@entry=0x1480e0000d28, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x1481226233b0, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.4_dbg/sql/sql_parse.cc:8008
#16 0x00005638aec3bd62 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x1480e0000d28, packet=packet@entry=0x1480e0019509 "", packet_length=packet_length@entry=106, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /test/10.4_dbg/sql/sql_class.h:1231
#17 0x00005638aec3e58d in do_command (thd=0x1480e0000d28) at /test/10.4_dbg/sql/sql_parse.cc:1378
#18 0x00005638aed5b3d7 in do_handle_one_connection (connect=<optimized out>) at /test/10.4_dbg/sql/sql_connect.cc:1420
#19 0x00005638aed5b493 in handle_one_connection (arg=<optimized out>) at /test/10.4_dbg/sql/sql_connect.cc:1324
#20 0x0000148149602609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#21 0x00001481491ee133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The crash is happening after the following commit in 10.4:

commit e74390d94f81d0bc6d0b2b8999f7ee44fd2092e9 (HEAD)
Author: Monty <monty@mariadb.org>
Date:   Thu May 4 13:06:39 2023 +0300
 
    Cleanup of sql_join_cache code (no logic changes)
    
    - Remove virtual from get_min_join_buffer_size() and
      get_max_join_buffer_size().
    - Avoid some calls to get_min_buffer_size()
    - Simply cache usage in get_..._join_buffer_size()
    - Simplify get_max_join_buffer_size() when using optimize_buff_size
    - Reindented some long comments
 



 Comments   
Comment by Igor Babaev [ 2023-07-24 ]

After the patch commit d657f18ea787dbf31460c77cb8fc56c01730b895

Author: Monty <monty@mariadb.org>
Date:   Sat May 27 16:31:22 2023 +0300
 
    MDEV-31226 Server crash or assertion failure with row size close to join_buffer_size
    
    The problem was that JOIN_CACHE::alloc_buffer() did not check if the
    given join_buffer_value is less than the query require.
    
    Added a check for this and disabled join cache if it cannot be used.

we don't have any assert failure / crash. Rather we have

MariaDB [test]> SET SESSION JOIN_cache_level=4;
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [test]> 
MariaDB [test]> SET SESSION optimizer_switch='optimize_JOIN_buffer_size=OFF';
Query OK, 0 rows affected (0.000 sec)
 
MariaDB [test]> 
MariaDB [test]> SELECT * FROM information_schema.statistics JOIN information_schema.COLUMNS USING (table_name,column_name);
ERROR 1037 (HY001): Could not create a join buffer. Please check and adjust the value of the variables 'JOIN_BUFFER_SIZE (128)' and 'JOIN_BUFFER_SPACE_LIMIT (2097152)'

Returning such message is a temporary solution. The proper solution will appear in the fix of bug MDEV-30938.

Comment by Michael Widenius [ 2023-08-10 ]

Fixed as part of MDEV-31893

The bug was an unassigned variable that caused wrong buffer sizes to be allocated.
This happens only if SET optimizer_switch='optimize_join_buffer_size=off' is used

Comment by Michael Widenius [ 2023-08-10 ]

Pushed test case to 10.4 together with fix for MDEV-31893.
After MDEV-31893 this code did not crash anymore.
It may be that another of my earlier commits has fixed this issue, but there was not test case so this commit covers that.

Generated at Thu Feb 08 10:23:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.