[MDEV-6442] Assertion `join->best_read < double(...)' failed with optimizer_use_condition_selectivity >=3, InnoDB, multi-part key, no info in stat tables
Created: 2014-07-13  Updated: 2014-10-06  Resolved: 2014-10-06

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.0.12
Fix Version/s: 10.0.15

Type: Bug
Priority: Major
Reporter: Elena Stepanova
Assignee: Sergei Petrunia
Resolution: Fixed
Votes: 0
Labels: eits, optimizer

Issue Links:
Relates
relates to MDEV-6738 use_stat_table + histograms crashing ... Closed

 Description   

The problem appeared in the 10.0 tree with the following revision:

revno: 4169
revision-id: psergey@askmonty.org-20140425150454-dsk6kba2vn13gw50
parent: psergey@askmonty.org-20140428174939-32ycvsxmajmfdjno
committer: Sergey Petrunya <psergey@askmonty.org>
branch nick: 10.0-cp
timestamp: Fri 2014-04-25 19:04:54 +0400
message:
  MDEV-6003: EITS: ref access, keypart2=const vs keypart2=expr - inconsistent filtered% value
  - Fix table_cond_selectivity() to work correctly for ref access 
    and "keypart2=const" case.

10.0/sql/sql_select.cc:6905: bool greedy_search(JOIN*, table_map, uint, uint, uint): Assertion `join->best_read < double(1.79769313486231570815e+308L)' failed.
140713 17:00:35 [ERROR] mysqld got signal 6 ;

#6  0x00007f6f1babf621 in *__GI___assert_fail (assertion=0xf39368 "join->best_read < double(1.79769313486231570815e+308L)", file=<optimized out>, line=6905, function=0xf3c100 "bool greedy_search(JOIN*, table_map, uint, uint, uint)") at assert.c:81
#7  0x00000000006be4de in greedy_search (join=0x7f6f084f77f0, remaining_tables=15, search_depth=62, prune_level=1, use_cond_selectivity=3) at 10.0/sql/sql_select.cc:6905
#8  0x00000000006bda68 in choose_plan (join=0x7f6f084f77f0, join_tables=15) at 10.0/sql/sql_select.cc:6474
#9  0x00000000006b7531 in make_join_statistics (join=0x7f6f084f77f0, tables_list=..., conds=0x7f6f0854c6c8, keyuse_array=0x7f6f084f7af8) at 10.0/sql/sql_select.cc:4018
#10 0x00000000006ae249 in JOIN::optimize_inner (this=0x7f6f084f77f0) at 10.0/sql/sql_select.cc:1338
#11 0x00000000006ad1e0 in JOIN::optimize (this=0x7f6f084f77f0) at 10.0/sql/sql_select.cc:1023
#12 0x00000000006b4d6f in mysql_select (thd=0x7f6f15b5f070, rref_pointer_array=0x7f6f15b636e8, tables=0x7f6f08422350, wild_num=1, fields=..., conds=0x7f6f084f7518, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7f6f084f77d0, unit=0x7f6f15b62d88, select_lex=0x7f6f15b63470) at 10.0/sql/sql_select.cc:3289
#13 0x00000000006ab3ef in handle_select (thd=0x7f6f15b5f070, lex=0x7f6f15b62cc0, result=0x7f6f084f77d0, setup_tables_done_option=0) at 10.0/sql/sql_select.cc:372
#14 0x00000000006801a5 in execute_sqlcom_select (thd=0x7f6f15b5f070, all_tables=0x7f6f08422350) at 10.0/sql/sql_parse.cc:5263
#15 0x000000000067859c in mysql_execute_command (thd=0x7f6f15b5f070) at 10.0/sql/sql_parse.cc:2554
#16 0x000000000068292f in mysql_parse (thd=0x7f6f15b5f070, rawbuf=0x7f6f08422088 "SELECT * FROM t1, t2 WHERE ( 't', 'o' ) IN ( \nSELECT t1_2.b, t1_1.a FROM t1 AS t1_1 STRAIGHT_JOIN t1 AS t1_2 ON ( t1_2.a = t1_1.b ) \n)", length=134, parser_state=0x7f6f1da3d610) at 10.0/sql/sql_parse.cc:6409
#17 0x000000000067583d in dispatch_command (command=COM_QUERY, thd=0x7f6f15b5f070, packet=0x7f6f0fffa071 "SELECT * FROM t1, t2 WHERE ( 't', 'o' ) IN ( \nSELECT t1_2.b, t1_1.a FROM t1 AS t1_1 STRAIGHT_JOIN t1 AS t1_2 ON ( t1_2.a = t1_1.b ) \n)", packet_length=134) at 10.0/sql/sql_parse.cc:1309
#18 0x0000000000674be2 in do_command (thd=0x7f6f15b5f070) at 10.0/sql/sql_parse.cc:1006
#19 0x0000000000790ab5 in do_handle_one_connection (thd_arg=0x7f6f15b5f070) at 10.0/sql/sql_connect.cc:1379
#20 0x0000000000790808 in handle_one_connection (arg=0x7f6f15b5f070) at 10.0/sql/sql_connect.cc:1293
#21 0x0000000000cc22f6 in pfs_spawn_thread (arg=0x7f6f0fe87df0) at 10.0/storage/perfschema/pfs.cc:1860
#22 0x00007f6f1d676b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#23 0x00007f6f1bb6ea7d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#24 0x0000000000000000 in ?? ()

Stack trace from:

revision-id: knielsen@knielsen-hq.org-20140711100647-nf3rdaf5ep26pgty
revno: 4290
branch-nick: 10.0

--source include/have_innodb.inc
 
SET use_stat_tables = PREFERABLY;
SET optimizer_use_condition_selectivity = 3;
 
CREATE TABLE t1 ( a VARCHAR(3), b VARCHAR(8), KEY (a,b) ) ENGINE=InnoDB;
INSERT INTO t1 VALUES ('USA','Chinese'),('USA','English');
 
CREATE TABLE t2 (i INT) ENGINE=InnoDB;
 
SELECT * FROM t1, t2 WHERE ( 't', 'o' ) IN ( 
  SELECT t1_2.b, t1_1.a FROM t1 AS t1_1 STRAIGHT_JOIN t1 AS t1_2 ON ( t1_2.a = t1_1.b ) 
);

EXPLAIN also crashes.



 Comments   
Comment by Sergei Petrunia [ 2014-07-30 ]

The cause is that table_cond_selectivity() produces this:

(gdb) p sel
  $32 = -nan(0x8000000000000)

for idx=2, table->alias= t1_2

Actually, the problem starts here:

(gdb) p s->table->cond_selectivity
  $38 = 0

and also we have

(gdb) next
(gdb) p table->field[fldno]->cond_selectivity
  $39 = 0
(gdb) p fldno
  $40 = 1
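
To illustrate why a NaN selectivity ends in this particular assertion, here is a minimal sketch. It assumes the NaN comes from a zero-by-zero division, which the gdb output above suggests but does not prove. adjust_selectivity() and cost_is_below_max() are hypothetical stand-ins, not functions from sql_select.cc.

```cpp
#include <cfloat>
#include <cmath>

// Hypothetical stand-in for one step of a selectivity computation:
// divides the running selectivity by a per-field selectivity.
// This is NOT the real table_cond_selectivity().
double adjust_selectivity(double sel, double field_cond_selectivity) {
    // On IEEE 754 platforms 0.0 / 0.0 evaluates to NaN.
    return sel / field_cond_selectivity;
}

// Same shape as the failing assertion: best_read < DBL_MAX.
// Every ordered comparison with NaN is false, so a NaN cost fails it.
bool cost_is_below_max(double best_read) {
    return best_read < DBL_MAX;
}
```

With both inputs at 0, as in the dump, adjust_selectivity(0.0, 0.0) is NaN and cost_is_below_max() returns false for it, even though NaN is not "greater than" DBL_MAX either.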

Comment by Sergei Petrunia [ 2014-07-30 ]

The reason we get that is as follows: the range optimizer finds the range t1_2.b='t'.
It then calls Column_statistics::get_avg_frequency(), which returns 0.

Comment by Sergei Petrunia [ 2014-07-30 ]

We have a Column_statistics object that has all zeros:

(gdb) p *column_stats
  $95 = {static Scale_factor_nulls_ratio = 100000, static Scale_factor_avg_length = 100000, static Scale_factor_avg_frequency = 100000, column_stat_nulls = 0, min_value = 0x0, max_value = 0x0, nulls_ratio = 0, avg_length = 0, avg_frequency = 0, histogram = {type = SINGLE_PREC_HB, size = 0 '\000', values = 0x0}}
(gdb) p column_stats
  $96 = (Column_statistics *) 0x7fffc986e608

And it is set to be like that here:

(gdb) wher
  #0  alloc_statistics_for_table_share (thd=0x7fffd1ec7070, table_share=0x7fffc985c688, is_safe=false) at /home/psergey/dev2/10.0/sql/sql_statistics.cc:2041
  #1  0x00000000005fdc70 in open_and_process_table (thd=0x7fffd1ec7070, lex=0x7fffd1ecacc0, tables=0x7fffc9847358, counter=0x7ffff7f373bc, flags=0, prelocking_strategy=0x7ffff7f373f0, has_prelocking_list=false, ot_ctx=0x7ffff7f37250, new_frm_mem=0x7ffff7f37290) at /home/psergey/dev2/10.0/sql/sql_base.cc:4047
  #2  0x00000000005fe98c in open_tables (thd=0x7fffd1ec7070, start=0x7ffff7f37370, counter=0x7ffff7f373bc, flags=0, prelocking_strategy=0x7ffff7f373f0) at /home/psergey/dev2/10.0/sql/sql_base.cc:4462
  #3  0x00000000005ff8a3 in open_and_lock_tables (thd=0x7fffd1ec7070, tables=0x7fffc9847358, derived=true, flags=0, prelocking_strategy=0x7ffff7f373f0) at /home/psergey/dev2/10.0/sql/sql_base.cc:5079
  #4  0x00000000005f3913 in open_and_lock_tables (thd=0x7fffd1ec7070, tables=0x7fffc9847358, derived=true, flags=0) at /home/psergey/dev2/10.0/sql/sql_base.h:485
  #5  0x0000000000664b31 in execute_sqlcom_select (thd=0x7fffd1ec7070, all_tables=0x7fffc9847358) at /home/psergey/dev2/10.0/sql/sql_parse.cc:5208

Comment by Sergei Petrunia [ 2014-07-30 ]

The bug is caused by two factors:
1. optimizer code crashing when Column_statistics::get_avg_frequency() returns 0.
2. Column_statistics::get_avg_frequency() returning 0.

I would say that #1 is not really a bug, because avg_frequency() >= 1 by definition.

The problem is #2. Perhaps get_avg_frequency()=0 means "No data is available"? It's not documented anywhere.

Comment by Sergei Petrunia [ 2014-07-30 ]

Discussed with Igor. Indeed, get_avg_frequency()=0 means "No data is available". The optimizer code should give special treatment to the case where get_avg_frequency()=0. This is not only about avoiding division by zero; it is also about not producing badly wrong estimates.

Comment by Elena Stepanova [ 2014-08-01 ]

I have a somewhat different test case, where I do run ANALYZE, but then add a key and after that run SELECT without re-running ANALYZE. That is, the flow is like this:

  • start server with use_stat_tables=PREFERABLY and optimizer_use_condition_selectivity=2
  • create table
  • populate table
  • analyze table
  • add a new index to the table
  • run SELECT (which presumably uses the newly created index)

After that I get the same assertion failure.

Can I consider it a variation of the same problem, or does it need to be analyzed separately?

Comment by Sergei Petrunia [ 2014-08-01 ]

elenst, please file a separate issue about this. The problem above does not use index statistics, so yours looks like a similar but separate issue.

Comment by Elena Stepanova [ 2014-08-01 ]

Filed as MDEV-6519
