[MDEV-30659] Server crash on EXPLAIN SELECT/SELECT on table with engine Aria for LooseScan Strategy Created: 2023-02-15  Updated: 2023-05-27  Resolved: 2023-05-27

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 11.0
Fix Version/s: 11.0.2

Type: Bug Priority: Critical
Reporter: Lena Startseva Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None


 Description   

Case to reproduce:

set default_storage_engine=Aria;
 
create table t1 (old_c1 integer,
                 old_c2 integer,
                 c1 integer,
                 c2 integer,
                 c3 integer);
 
insert into t1(c1,c2,c3)
        values (1,1,1), (1,2,2), (1,3,3),
               (2,1,4), (2,2,5), (2,3,6),
               (2,4,7), (2,5,8);
 
create index t1_c2 on t1 (c2,c1);
 
explain
select * from t1
         where t1.c2 in (select a.c2 from t1 a)
            and c2 >= 3 order by c2;
 
drop table t1;

Stacktrace:

Thread pointer: 0x7f3690000dc8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f36a0bf6c38 thread_stack 0x49000
mysys/stacktrace.c:212(my_print_stacktrace)[0x55a5f3e9c026]
sql/signal_handler.cc:238(handle_fatal_signal)[0x55a5f3573c7f]
libc_sigaction.c:0(__restore_rt)[0x7f36a625e520]
nptl/pthread_kill.c:44(__pthread_kill_implementation)[0x7f36a62b2a7c]
posix/raise.c:27(__GI_raise)[0x7f36a625e476]
stdlib/abort.c:81(__GI_abort)[0x7f36a62447f3]
intl/loadmsgcat.c:1177(_nl_load_domain)[0x7f36a624471b]
:0(__GI___assert_fail)[0x7f36a6255e96]
sql/sql_select.cc:30497(get_range_limit_read_cost(POSITION const*, TABLE const*, unsigned int, unsigned long long, unsigned long long, double*, double*))[0x55a5f324696d]
sql/sql_select.cc:30856(test_if_cheaper_ordering(st_join_table const*, st_order*, TABLE*, Bitmap<64u>, int, unsigned long long, int*, int*, unsigned long long*, unsigned int*, unsigned int*))[0x55a5f3247b86]
sql/sql_select.cc:25766(test_if_skip_sort_order(st_join_table*, st_order*, unsigned long long, bool, Bitmap<64u> const*))[0x55a5f3238c7b]
sql/sql_select.cc:3306(JOIN::optimize_stage2())[0x55a5f31f7926]
sql/sql_select.cc:2595(JOIN::optimize_inner())[0x55a5f31f5070]
sql/sql_select.cc:1897(JOIN::optimize())[0x55a5f31f2743]
sql/sql_select.cc:5121(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x55a5f31fe61c]
sql/sql_select.cc:29564(mysql_explain_union(THD*, st_select_lex_unit*, select_result*))[0x55a5f3243d82]
sql/sql_parse.cc:6203(execute_sqlcom_select(THD*, TABLE_LIST*))[0x55a5f3191148]
sql/sql_parse.cc:3949(mysql_execute_command(THD*, bool))[0x55a5f31885c8]
sql/sql_parse.cc:8000(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x55a5f319643e]
sql/sql_parse.cc:1896(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x55a5f31822ec]
sql/sql_parse.cc:1407(do_command(THD*, bool))[0x55a5f3180c40]
sql/sql_connect.cc:1416(do_handle_one_connection(CONNECT*, bool))[0x55a5f33702f6]
sql/sql_connect.cc:1320(handle_one_connection)[0x55a5f337005f]
perfschema/pfs.cc:2203(pfs_spawn_thread)[0x55a5f38e388d]
nptl/pthread_create.c:442(start_thread)[0x7f36a62b0b43]
x86_64/clone3.S:83(__clone3)[0x7f36a6342a00]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f36900154c0): explain
select * from t1
where t1.c2 in (select a.c2 from t1 a)
and c2 >= 3 order by c2

If we add

analyze table t1 persistent for all;

after index creation problem is gone.

There is a second case with truncate table causing the server crash (trigger is important in this case):

set default_storage_engine=Aria;
 
create table t1 (old_c1 integer,
                 old_c2 integer,
                 c1 integer,
                 c2 integer,
                 c3 integer);
 
delimiter /;
create trigger trg_t1 before update on t1 for each row
begin
  set new.old_c1=old.c1;
  set new.old_c2=old.c2;
end;
/
delimiter ;/
 
insert into t1(c1,c2,c3)
        values (1,1,1), (1,2,2), (1,3,3),
               (2,1,4), (2,2,5), (2,3,6),
               (2,4,7), (2,5,8);
 
create index t1_c2 on t1 (c2,c1);
 
analyze table t1 persistent for all;
 
create table tmp as select * from t1;
truncate table t1;
insert into t1 select * from tmp;
 
 
explain
select * from t1
         where t1.c2 in (select a.c2 from t1 a)
            and c2 >= 3 order by c2;
 
drop table tmp;
drop table t1;



 Comments   
Comment by Michael Widenius [ 2023-02-16 ]

First fix done
Waiting for review by Sergei Petrunia

Comment by Sergei Petrunia [ 2023-02-20 ]

One interesting thing that the testcase shows:

The cost of full index scan when it is considered by LooseScan strategy is lower than the cost of the range access.

This is because the quick select adds

records * WHERE_COST_THD(thd)

while LooseScan's full-index-scan cost doesn't add it...

Comment by Sergei Petrunia [ 2023-02-20 ]

Another thing to check is that Loose_scan_opt::check_ref_access_part1 does this:

          /*
            Now find out how many different keys we will get (for now we
            ignore the fact that we have "keypart_i=const" restriction for
            some key components, that may make us think think that loose
            scan will produce more distinct records than it actually will)
          */
          ulong rpc;
          if ((rpc= s->table->key_info[key].rec_per_key[max_loose_keypart]))
            records= records / rpc;

That is, it computes how many #records will be left semi-join duplicates are removed.

At the same time, Monty's patch does this in Loose_scan_opt::save_to_position():

      set_if_smaller(*records_out, best_loose_scan_records);

which means best_access_path will get the estimated #records after the duplicates are removed...
This seems incorrect.
It's fine to adjust loose_scan_pos->records_out based on computations in best_access_path(). But not the other way around...

Comment by Sergei Petrunia [ 2023-02-21 ]

Pushed my variant of the patch into bb-11.0-mdev30659-v2. It's a "subset" of Monty's patch.

Comment by Michael Widenius [ 2023-05-27 ]

Fixed by commit by Sergei Petrunia, using some code from me.

Generated at Thu Feb 08 10:17:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.