[MDEV-31935] ASAN errors in JOIN_CACHE::alloc_buffer when join_buffer_size cannot be allocated Created: 2023-08-16  Updated: 2023-09-28  Resolved: 2023-09-28

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.4, 10.5, 10.6, 10.10, 10.11, 11.0, 11.1
Fix Version/s: N/A

Type: Bug Priority: Blocker
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Not a Bug Votes: 0
Labels: regression


 Description   

The difference from the "usual" join-buffer-size-related issues is that here the join buffer is too big rather than too small, although not necessarily unrealistically big.
I have it set to 128G in the test case, because the machine I'm filing it from has 96G memory, and it fails with 128G but doesn't fail with 64G.
On another machine with 16G memory it already fails with 32G.

If yours has more than 128G, in order to reproduce the failure, increase the value in the test case.

Note that there is no error in the error log about not being able to allocate the buffer (neither now nor before the change which introduced the failure).

SET optimizer_switch = 'optimize_join_buffer_size=off';
SET join_buffer_size = 128*1024*1024*1024;
 
create table t (a int);
insert into t values (1),(2);
 
select t1.* from t as t1 join t as t2;
 
# Cleanup
drop table t;

10.4 900c4d69

==1732811==ERROR: AddressSanitizer: allocator is out of memory trying to allocate 0x2000000008 bytes
    #0 0x7f677b6b89cf in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x55560f667882 in my_malloc /data/src/10.4/mysys/my_malloc.c:101
    #2 0x55560df537b5 in JOIN_CACHE::alloc_buffer() /data/src/10.4/sql/sql_join_cache.cc:964
    #3 0x55560df53f95 in JOIN_CACHE::init(bool) /data/src/10.4/sql/sql_join_cache.cc:1096
    #4 0x55560df61ffc in JOIN_CACHE_BNL::init(bool) /data/src/10.4/sql/sql_join_cache.cc:3757
    #5 0x55560db7316f in JOIN::init_join_caches() /data/src/10.4/sql/sql_select.cc:1878
    #6 0x55560db8017c in JOIN::optimize_stage2() /data/src/10.4/sql/sql_select.cc:3111
    #7 0x55560db7895c in JOIN::optimize_inner() /data/src/10.4/sql/sql_select.cc:2394
    #8 0x55560db715fa in JOIN::optimize() /data/src/10.4/sql/sql_select.cc:1711
    #9 0x55560db924ce in mysql_select(THD*, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) /data/src/10.4/sql/sql_select.cc:4812
    #10 0x55560db632fe in handle_select(THD*, LEX*, select_result*, unsigned long) /data/src/10.4/sql/sql_select.cc:442
    #11 0x55560dad2826 in execute_sqlcom_select /data/src/10.4/sql/sql_parse.cc:6473
    #12 0x55560dabfd3b in mysql_execute_command(THD*) /data/src/10.4/sql/sql_parse.cc:3976
    #13 0x55560dadba76 in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) /data/src/10.4/sql/sql_parse.cc:8010
    #14 0x55560dab1d41 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) /data/src/10.4/sql/sql_parse.cc:1857
    #15 0x55560daae8b0 in do_command(THD*) /data/src/10.4/sql/sql_parse.cc:1378
    #16 0x55560deade0f in do_handle_one_connection(CONNECT*) /data/src/10.4/sql/sql_connect.cc:1420
    #17 0x55560dead726 in handle_one_connection /data/src/10.4/sql/sql_connect.cc:1324
    #18 0x55560eb1de1f in pfs_spawn_thread /data/src/10.4/storage/perfschema/pfs.cc:1869
    #19 0x7f677afc8fd3 in start_thread nptl/pthread_create.c:442
 
==1732811==HINT: if you don't care about these errors you may set allocator_may_return_null=1
SUMMARY: AddressSanitizer: out-of-memory ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69 in __interceptor_malloc
Thread T5 created by T0 here:
    #0 0x7f677b649726 in __interceptor_pthread_create ../../../../src/libsanitizer/asan/asan_interceptors.cpp:207
    #1 0x55560eb1e20c in spawn_thread_v1 /data/src/10.4/storage/perfschema/pfs.cc:1919
    #2 0x55560d7b9f89 in inline_mysql_thread_create /data/src/10.4/include/mysql/psi/mysql_thread.h:1275
    #3 0x55560d7d1690 in create_thread_to_handle_connection(CONNECT*) /data/src/10.4/sql/mysqld.cc:6287
    #4 0x55560d7d1ddb in create_new_thread(CONNECT*) /data/src/10.4/sql/mysqld.cc:6357
    #5 0x55560d7d22a9 in handle_accepted_socket(st_mysql_socket, st_mysql_socket) /data/src/10.4/sql/mysqld.cc:6455
    #6 0x55560d7d3155 in handle_connections_sockets() /data/src/10.4/sql/mysqld.cc:6613
    #7 0x55560d7d0df3 in mysqld_main(int, char**) /data/src/10.4/sql/mysqld.cc:5945
    #8 0x55560d7b80b8 in main /data/src/10.4/sql/main.cc:25
    #9 0x7f677af67189 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
 
==1732811==ABORTING

The failure started happening after this commit in 10.4.31

commit 08a4732860348bcdc16b4ad8ecfc2b4b2e644ae5
Author: Monty
Date:   Wed May 3 21:27:30 2023 +0300
 
    MDEV-28217 Incorrect Join Execution When Controlling Join Buffer Size



 Comments   
Comment by Michael Widenius [ 2023-09-28 ]

This is not a bug. It is a side effect of using asan. Without asan, things work 'perfectly'.

This is what happens in the normal case (no asan); I have verified this with --debug:

  • When allocating the join buffer, the server starts by trying to allocate the requested amount (137438953472 bytes in this case).
  • While my_malloc(join_buffer_size) returns 0 (not enough memory), it decreases the size by 25 % and tries again.
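
The retry behaviour described in the list above can be sketched roughly as follows. This is only an illustration, not the actual JOIN_CACHE::alloc_buffer() code: the function name alloc_with_fallback, the min_size floor, and the injectable allocator parameter are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>

// Hypothetical allocator type so the fallback logic can be exercised
// without actually exhausting memory.
using Allocator = std::function<void *(std::size_t)>;

// Hypothetical helper (not MariaDB code): try to allocate `requested` bytes;
// while the allocator returns NULL, shrink the request by 25 % and retry.
// Gives up once the size would drop below `min_size` (an assumed floor).
// On success the size actually obtained is stored in *granted.
void *alloc_with_fallback(std::size_t requested, std::size_t min_size,
                          std::size_t *granted,
                          const Allocator &alloc =
                              [](std::size_t n) { return std::malloc(n); })
{
  std::size_t size = requested;
  while (size > 0 && size >= min_size)
  {
    if (void *buf = alloc(size))
    {
      *granted = size;
      return buf;
    }
    std::size_t step = size / 4;  // 25 % of the current request
    if (step == 0)
      break;                      // cannot shrink any further
    size -= step;
  }
  *granted = 0;
  return nullptr;
}
```

With the default allocator, a call such as alloc_with_fallback(join_buffer_size, some_minimum, &granted) would fall back to a smaller buffer as long as malloc() reports failure by returning NULL, which is exactly the behaviour that asan replaces with an abort.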

When using asan, asan generates an error if malloc cannot allocate the requested memory. It wrongly assumes that the code will not be able to handle this case (which it can).
