[MDEV-31226] Server crash or assertion failure with row size close to join_buffer_size Created: 2023-05-09  Updated: 2023-06-07  Resolved: 2023-05-27

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.4
Fix Version/s: 10.4.31, 10.5.22, 10.6.15, 10.9.8, 10.10.6, 10.11.5

Type: Bug Priority: Blocker
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: regression


 Description   

Note: The affected version is set to 10.4 because the guilty commit has not yet been merged into higher versions.

The issue is scalable – easily scalable down to the minimal value of join_buffer_size and correspondingly smaller row size, but I have intentionally kept it bigger to make it more realistic. I'm not sure if it can be scaled further up, I hit the internal row size limit. Hopefully it will become clear after the analysis.

CREATE TABLE t (f VARCHAR(16384)) ENGINE=MyISAM CHARACTER SET utf8;
INSERT INTO t VALUES (REPEAT('a',16384)),(REPEAT('b',16384));
 
SET OPTIMIZER_SWITCH = 'optimize_join_buffer_size=off';
SET JOIN_BUFFER_SIZE = 16384;
 
SELECT * FROM t t1, t t2;
 
# Cleanup
DROP TABLE t;

10.4 84b9fc25 non-debug

#6  0x00007feb6d445472 in __GI_abort () at ./stdlib/abort.c:79
#7  0x00007feb6d49e2d0 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7feb6d5b8459 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#8  0x00007feb6d4b364a in malloc_printerr (str=str@entry=0x7feb6d5bab98 "munmap_chunk(): invalid pointer") at ./malloc/malloc.c:5660
#9  0x00007feb6d4b380c in munmap_chunk (p=<optimized out>) at ./malloc/malloc.c:3054
#10 0x00007feb6d4b7d78 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3375
#11 0x000055d5ed8af130 in my_free (ptr=<optimized out>) at /data/src/10.4/mysys/my_malloc.c:222
#12 0x000055d5ed89eb50 in end_io_cache (info=info@entry=0x7feb1c18df70) at /data/src/10.4/mysys/mf_iocache.c:2017
#13 0x000055d5ed8755e5 in mi_extra (info=0x7feb1c18dd48, function=HA_EXTRA_NO_CACHE, extra_arg=0x0) at /data/src/10.4/storage/myisam/mi_extra.c:155
#14 0x000055d5ed559e67 in end_read_record (info=info@entry=0x7feb1c013b00) at /data/src/10.4/sql/records.cc:325
#15 0x000055d5ed290ccf in st_join_table::cleanup (this=this@entry=0x7feb1c013a38) at /data/src/10.4/sql/sql_select.cc:13727
#16 0x000055d5ed2afa98 in JOIN::cleanup (this=this@entry=0x7feb1c011f70, full=full@entry=true) at /data/src/10.4/sql/sql_select.cc:14104
#17 0x000055d5ed2afd8e in JOIN::join_free (this=this@entry=0x7feb1c011f70) at /data/src/10.4/sql/sql_select.cc:14005
#18 0x000055d5ed2b1d26 in do_select (procedure=<optimized out>, join=0x7feb1c011f70) at /data/src/10.4/sql/sql_select.cc:20438
#19 JOIN::exec_inner (this=this@entry=0x7feb1c011f70) at /data/src/10.4/sql/sql_select.cc:4605
#20 0x000055d5ed2b2034 in JOIN::exec (this=this@entry=0x7feb1c011f70) at /data/src/10.4/sql/sql_select.cc:4387
#21 0x000055d5ed2b00b4 in mysql_select (thd=thd@entry=0x7feb1c000c58, tables=0x7feb1c0107b0, wild_num=1, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7feb1c011f48, unit=0x7feb1c0049d0, select_lex=0x7feb1c010190) at /data/src/10.4/sql/sql_select.cc:4826
#22 0x000055d5ed2b0aa3 in handle_select (thd=thd@entry=0x7feb1c000c58, lex=lex@entry=0x7feb1c004910, result=result@entry=0x7feb1c011f48, setup_tables_done_option=setup_tables_done_option@entry=0) at /data/src/10.4/sql/sql_select.cc:442
#23 0x000055d5ed2453cc in execute_sqlcom_select (thd=thd@entry=0x7feb1c000c58, all_tables=0x7feb1c0107b0) at /data/src/10.4/sql/sql_parse.cc:6463
#24 0x000055d5ed25188b in mysql_execute_command (thd=thd@entry=0x7feb1c000c58) at /data/src/10.4/sql/sql_parse.cc:3966
#25 0x000055d5ed253100 in mysql_parse (thd=thd@entry=0x7feb1c000c58, rawbuf=<optimized out>, length=24, parser_state=parser_state@entry=0x7feb626a5540, is_com_multi=is_com_multi@entry=false, is_next_command=<optimized out>) at /data/src/10.4/sql/sql_parse.cc:7998
#26 0x000055d5ed254dc5 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7feb1c000c58, packet=packet@entry=0x7feb1c007d59 "\200", packet_length=packet_length@entry=24, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /data/src/10.4/sql/sql_parse.cc:1857
#27 0x000055d5ed2568c8 in do_command (thd=0x7feb1c000c58) at /data/src/10.4/sql/sql_parse.cc:1378
#28 0x000055d5ed33b994 in do_handle_one_connection (connect=connect@entry=0x55d5f0d571f8) at /data/src/10.4/sql/sql_connect.cc:1420
#29 0x000055d5ed33bae4 in handle_one_connection (arg=arg@entry=0x55d5f0d571f8) at /data/src/10.4/sql/sql_connect.cc:1324
#30 0x000055d5ed659430 in pfs_spawn_thread (arg=0x55d5f0cb5208) at /data/src/10.4/storage/perfschema/pfs.cc:1869
#31 0x00007feb6d4a7fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#32 0x00007feb6d5285bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

10.4 84b9fc25

mysqld: /data/src/10.4/sql/sql_join_cache.cc:1431: uint JOIN_CACHE::write_record_data(uchar*, bool*): Assertion `cp + len <= buff + buff_size' failed.
230509 18:35:22 [ERROR] mysqld got signal 6 ;
 
#9  0x00007f8b1cc53df2 in __GI___assert_fail (assertion=0x5597f8fdd7c0 "cp + len <= buff + buff_size", file=0x5597f8fdd3c0 "/data/src/10.4/sql/sql_join_cache.cc", line=1431, function=0x5597f8fdd5e0 "uint JOIN_CACHE::write_record_data(uchar*, bool*)") at ./assert/assert.c:101
#10 0x00005597f72982dc in JOIN_CACHE::write_record_data (this=0x62b000067130, link=0x0, is_full=0x7f8b14f29e30) at /data/src/10.4/sql/sql_join_cache.cc:1431
#11 0x00005597f729968c in JOIN_CACHE::put_record (this=0x62b000067130) at /data/src/10.4/sql/sql_join_cache.cc:1585
#12 0x00005597f6f4562e in sub_select_cache (join=0x62b000064170, join_tab=0x62b000066090, end_of_records=false) at /data/src/10.4/sql/sql_select.cc:20646
#13 0x00005597f6f47963 in evaluate_join_record (join=0x62b000064170, join_tab=0x62b000065cf0, error=0) at /data/src/10.4/sql/sql_select.cc:21085
#14 0x00005597f6f4629f in sub_select (join=0x62b000064170, join_tab=0x62b000065cf0, end_of_records=false) at /data/src/10.4/sql/sql_select.cc:20858
#15 0x00005597f6f44236 in do_select (join=0x62b000064170, procedure=0x0) at /data/src/10.4/sql/sql_select.cc:20391
#16 0x00005597f6ed32fd in JOIN::exec_inner (this=0x62b000064170) at /data/src/10.4/sql/sql_select.cc:4605
#17 0x00005597f6ed0904 in JOIN::exec (this=0x62b000064170) at /data/src/10.4/sql/sql_select.cc:4387
#18 0x00005597f6ed4996 in mysql_select (thd=0x62b00005b208, tables=0x62b000062928, wild_num=1, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x62b000064140, unit=0x62b00005f140, select_lex=0x62b0000622c8) at /data/src/10.4/sql/sql_select.cc:4826
#19 0x00005597f6ea559b in handle_select (thd=0x62b00005b208, lex=0x62b00005f080, result=0x62b000064140, setup_tables_done_option=0) at /data/src/10.4/sql/sql_select.cc:442
#20 0x00005597f6e15129 in execute_sqlcom_select (thd=0x62b00005b208, all_tables=0x62b000062928) at /data/src/10.4/sql/sql_parse.cc:6463
#21 0x00005597f6e0263e in mysql_execute_command (thd=0x62b00005b208) at /data/src/10.4/sql/sql_parse.cc:3966
#22 0x00005597f6e1e301 in mysql_parse (thd=0x62b00005b208, rawbuf=0x62b000062228 "SELECT * FROM t t1, t t2", length=24, parser_state=0x7f8b14f2c860, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:7998
#23 0x00005597f6df47ad in dispatch_command (command=COM_QUERY, thd=0x62b00005b208, packet=0x62900023f209 "", packet_length=24, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1857
#24 0x00005597f6df1326 in do_command (thd=0x62b00005b208) at /data/src/10.4/sql/sql_parse.cc:1378
#25 0x00005597f71ef5d4 in do_handle_one_connection (connect=0x6080000009a8) at /data/src/10.4/sql/sql_connect.cc:1420
#26 0x00005597f71eeeeb in handle_one_connection (arg=0x6080000009a8) at /data/src/10.4/sql/sql_connect.cc:1324
#27 0x00005597f7e59e00 in pfs_spawn_thread (arg=0x615000003508) at /data/src/10.4/storage/perfschema/pfs.cc:1869
#28 0x00007f8b1cca7fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#29 0x00007f8b1cd285bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The failure started happening after this commit in 10.4 (didn't make it to 10.4.29 release):

commit e74390d94f81d0bc6d0b2b8999f7ee44fd2092e9
Author: Monty
Date:   Thu May 4 13:06:39 2023 +0300
 
    Cleanup of sql_join_cache code (no logic changes)



 Comments   
Comment by Elena Stepanova [ 2023-05-19 ]

A variation of the assertion failure (for searching purposes)

10.4 956d6c4a

mysqld: /data/src/10.4/sql/sql_join_cache.cc:1475: uint JOIN_CACHE::write_record_data(uchar*, bool*): Assertion `cp + copy->length <= buff + buff_size' failed.
230519 20:56:02 [ERROR] mysqld got signal 6 ;
 
#9  0x00007fd8e5a53df2 in __GI___assert_fail (assertion=0x556071b817e8 "cp + copy->length <= buff + buff_size", file=0x556071b816e0 "/data/src/10.4/sql/sql_join_cache.cc", line=1475, function=0x556071b81770 "uint JOIN_CACHE::write_record_data(uchar*, bool*)") at ./assert/assert.c:101
#10 0x0000556070e8408c in JOIN_CACHE::write_record_data (this=0x7fd8c406c6f8, link=0x0, is_full=0x7fd8e01d50c7) at /data/src/10.4/sql/sql_join_cache.cc:1475
#11 0x0000556070e84511 in JOIN_CACHE::put_record (this=0x7fd8c406c6f8) at /data/src/10.4/sql/sql_join_cache.cc:1585
#12 0x0000556070d1bd1c in sub_select_cache (join=0x7fd8c4019d58, join_tab=0x7fd8c406a478, end_of_records=false) at /data/src/10.4/sql/sql_select.cc:20665
#13 0x0000556070d1c9ec in evaluate_join_record (join=0x7fd8c4019d58, join_tab=0x7fd8c406a0d0, error=0) at /data/src/10.4/sql/sql_select.cc:21114
#14 0x0000556070d1c1f5 in sub_select (join=0x7fd8c4019d58, join_tab=0x7fd8c406a0d0, end_of_records=false) at /data/src/10.4/sql/sql_select.cc:20887
#15 0x0000556070d1b4df in do_select (join=0x7fd8c4019d58, procedure=0x0) at /data/src/10.4/sql/sql_select.cc:20410
#16 0x0000556070ceeafc in JOIN::exec_inner (this=0x7fd8c4019d58) at /data/src/10.4/sql/sql_select.cc:4605
#17 0x0000556070cedbd0 in JOIN::exec (this=0x7fd8c4019d58) at /data/src/10.4/sql/sql_select.cc:4387
#18 0x0000556070cef3cf in mysql_select (thd=0x7fd8c4000da0, tables=0x7fd8c40152a8, wild_num=0, fields=..., conds=0x7fd8c4018558, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2684619520, result=0x7fd8c4019d30, unit=0x7fd8c4004cd8, select_lex=0x7fd8c4014af0) at /data/src/10.4/sql/sql_select.cc:4826
#19 0x0000556070cde207 in handle_select (thd=0x7fd8c4000da0, lex=0x7fd8c4004c18, result=0x7fd8c4019d30, setup_tables_done_option=0) at /data/src/10.4/sql/sql_select.cc:442
#20 0x0000556070ca0f7e in execute_sqlcom_select (thd=0x7fd8c4000da0, all_tables=0x7fd8c40152a8) at /data/src/10.4/sql/sql_parse.cc:6473
#21 0x0000556070c97480 in mysql_execute_command (thd=0x7fd8c4000da0) at /data/src/10.4/sql/sql_parse.cc:3976
#22 0x0000556070ca50f3 in mysql_parse (thd=0x7fd8c4000da0, rawbuf=0x7fd8c40147e8 "SELECT tbl1.lock_index AS fld1, tbl2.Grantor AS fld2 FROM information_schema.INNODB_LOCKS AS tbl1 JOIN mysql.proxies_priv AS tbl2 ON (tbl1.lock_trx_id >= tbl1.lock_mode OR '' = tbl1.lock_rec OR tbl2.T"..., length=343, parser_state=0x7fd8e01d63c0, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:8008
#23 0x0000556070c90ac4 in dispatch_command (command=COM_QUERY, thd=0x7fd8c4000da0, packet=0x7fd8c400ac41 "", packet_length=343, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1857
#24 0x0000556070c8f32b in do_command (thd=0x7fd8c4000da0) at /data/src/10.4/sql/sql_parse.cc:1378
#25 0x0000556070e30120 in do_handle_one_connection (connect=0x55607339f470) at /data/src/10.4/sql/sql_connect.cc:1420
#26 0x0000556070e2fe88 in handle_one_connection (arg=0x55607339f470) at /data/src/10.4/sql/sql_connect.cc:1324
#27 0x0000556071381e48 in pfs_spawn_thread (arg=0x5560732f0be0) at /data/src/10.4/storage/perfschema/pfs.cc:1869
#28 0x00007fd8e5aa7fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#29 0x00007fd8e5b285bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

Comment by Michael Widenius [ 2023-05-27 ]

The problem was that JOIN_CACHE::alloc_buffer() did not check if the
given join_buffer_value is usable.

Added a check for this and disabled join cache if it cannot be used.

Generated at Thu Feb 08 10:22:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.