[MDEV-4556] Server crashes in SEL_ARG::rb_insert with index_merge+index_merge_sort_union, FORCE INDEX Created: 2013-05-21  Updated: 2014-02-19  Resolved: 2014-02-19

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.0.2, 5.5.30, 5.3.12
Fix Version/s: 10.0.4, 5.5.32, 5.3.13

Type: Bug Priority: Blocker
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: optimizer

Attachments: File mdev4556-wrong_result.test     File mdev4556.diff    
Issue Links:
Relates

 Description   

#2  0x000000000082eeae in handle_fatal_signal (sig=11) at signal_handler.cc:273
#3  <signal handler called>
#4  0x00000000007ec0b6 in SEL_ARG::rb_insert (this=0x7fcafc03a408, leaf=0x7fcafc03a498) at opt_range.cc:9557
#5  0x00000000007ebb3b in SEL_ARG::insert (this=0x7fcafc03a408, key=0x7fcafc03a498) at opt_range.cc:9400
#6  0x00000000007eadf6 in key_or (param=0x7fcb16e60b70, key1=0x7fcafc03a408, key2=0x7fcafc033d78) at opt_range.cc:8971
#7  0x00000000007e119a in merge_same_index_scans (param=0x7fcb16e60b70, imerge=0x7fcafc038678, imerge_trp=0x7fcafc03a3b8, read_time=10.919500474043771) at opt_range.cc:4810
#8  0x00000000007e0a33 in get_best_disjunct_quick (param=0x7fcb16e60b70, imerge=0x7fcafc038678, read_time=10.919500474043771) at opt_range.cc:4644
#9  0x00000000007ddeed in SQL_SELECT::test_quick_select (this=0x7fcafc032038, thd=0x2fd9480, keys_to_use=..., prev_tables=0, limit=18446744073709551615, force_quick_range=false, ordered_output=false) at opt_range.cc:3198
#10 0x0000000000714933 in get_quick_record_count (thd=0x2fd9480, select=0x7fcafc032038, table=0x7fcafc026940, keys=0x7fcafc030c88, limit=18446744073709551615) at sql_select.cc:3039
#11 0x00000000007166bd in make_join_statistics (join=0x7fcafc02ead8, tables_list=..., conds=0x7fcafc02cf78, keyuse_array=0x7fcafc02ede0) at sql_select.cc:3583
#12 0x000000000070dabd in JOIN::optimize (this=0x7fcafc02ead8) at sql_select.cc:1158
#13 0x000000000071460f in mysql_select (thd=0x2fd9480, rref_pointer_array=0x2fdc1e8, tables=0x7fcafc02c888, wild_num=1, fields=..., conds=0x7fcafc02cf78, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147764736, result=0x7fcafc023418, unit=0x2fdba18, select_lex=0x2fdbf40) at sql_select.cc:2976
#14 0x000000000070b04f in handle_select (thd=0x2fd9480, lex=0x2fdb978, result=0x7fcafc023418, setup_tables_done_option=0) at sql_select.cc:288
#15 0x0000000000696507 in execute_sqlcom_select (thd=0x2fd9480, all_tables=0x7fcafc02c888) at sql_parse.cc:5172
#16 0x000000000068d2c6 in mysql_execute_command (thd=0x2fd9480) at sql_parse.cc:2305
#17 0x0000000000698f81 in mysql_parse (thd=0x2fd9480, rawbuf=0x7fcafc022ea8 "SELECT * FROM state FORCE INDEX ( PRIMARY, population_rate, area_rate, code ) \nWHERE pk = 1 OR population_rate = 1 OR ( area_rate IN ( 1,2 ) OR area_rate IS NULL ) AND ( population_rate = 25 OR area_r"..., length=253, found_semicolon=0x7fcb16e647d8) at sql_parse.cc:6173
#18 0x000000000068aa6a in dispatch_command (command=COM_QUERY, thd=0x2fd9480, packet=0x30641b1 "SELECT * FROM state FORCE INDEX ( PRIMARY, population_rate, area_rate, code ) \nWHERE pk = 1 OR population_rate = 1 OR ( area_rate IN ( 1,2 ) OR area_rate IS NULL ) AND ( population_rate = 25 OR area_r"..., packet_length=254) at sql_parse.cc:1243
#19 0x0000000000689d08 in do_command (thd=0x2fd9480) at sql_parse.cc:923
#20 0x0000000000686763 in handle_one_connection (arg=0x2fd9480) at sql_connect.cc:1231
#21 0x00007fcb18b7ce9a in start_thread (arg=0x7fcb16e65700) at pthread_create.c:308
#22 0x00007fcb180a5cbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

Stack trace from:

revision-id: sergii@pisem.net-20130521074210-lcv22xh6audhc5a9
revno: 3659
branch-nick: 5.3

Also reproducible on current 5.5, 10.0. Not reproducible on MySQL 5.6.

Minimal optimizer_switch: index_merge=on,index_merge_sort_union=on
Full optimizer_switch (default):

index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on

Test case:

SET optimizer_switch = 'index_merge=on,index_merge_sort_union=on';
 
CREATE TABLE state (
  pk int,
  code char(2),
  population_rate int,
  area_rate int,
  primary key (pk),
  index (code),
  key (population_rate),
  key (area_rate)
);
 
INSERT INTO state VALUES  (1,'WI',20, 23), (2, 'WA', 13, 18);
 
SELECT * FROM state FORCE INDEX ( PRIMARY, population_rate, area_rate, code ) 
WHERE pk = 1 OR population_rate = 1 OR ( area_rate IN ( 1,2 ) OR area_rate IS NULL ) AND ( population_rate = 25 OR area_rate BETWEEN 2 AND 25 OR code BETWEEN 'MA' AND 'TX'  )
;

EXPLAIN also crashes.



 Comments   
Comment by Patryk Pomykalski [ 2013-06-21 ]

Attached a possible fix. Not sure if it's the proper way, I got the idea after looking at a similar bug in mysql: http://bazaar.launchpad.net/~mysql/mysql-server/5.6/revision/4764

Comment by Sergei Petrunia [ 2013-07-11 ]

Analysis:

The crash happens in SEL_ARG::rb_insert. The reason for the crash is that
rb_inser was invoked:

$node1->rb_insert($node2)

when $node1 is not the root node of the RB tree.

The rb_insert is called from SEL_ARG::insert(), which has the same restriction:
one may call $selarg->insert(...) only when $selarg is the root node of the
RB-tree that it belongs to.

Comment by Sergei Petrunia [ 2013-07-11 ]

Tracking it further, one can see that 10th call to key_or() made by the query receives non-root SEL_ARG as argument. Before that, the same SEL_ARG is passed as an argument to key_or() call #4.

key_or() call #4 combines two SEL_ARG tree, and the passed tree ceases to be the root node (because of RB-tree rebalancing).

Breakpoint 3, key_or (param=0x7fff38543a20, key1=0x7fff2c017030, key2=0x7fff2c0170a0) at /home/psergey/dev2/5.5/sql/opt_range.cc:8854
Breakpoint 3, key_or (param=0x7fff38543a20, key1=0x7fff2c017030, key2=0x7fff2c017108) at /home/psergey/dev2/5.5/sql/opt_range.cc:8854
Breakpoint 3, key_or (param=0x7fff38543a20, key1=0x7fff2c016838, key2=0x7fff2c017178) at /home/psergey/dev2/5.5/sql/opt_range.cc:8854
Breakpoint 3, key_or (param=0x7fff38543a20, key1=0x7fff2c017030, key2=0x7fff2c01ad28) at /home/psergey/dev2/5.5/sql/opt_range.cc:8854
Breakpoint 3, key_or (param=0x7fff38543a20, key1=0x0, key2=0x0) at /home/psergey/dev2/5.5/sql/opt_range.cc:8854
Breakpoint 3, key_or (param=0x7fff38543a20, key1=0x7fff2c01ce48, key2=0x7fff2c017030) at /home/psergey/dev2/5.5/sql/opt_range.cc:8854
Breakpoint 3, key_or (param=0x7fff38543a20, key1=0x0, key2=0x0) at /home/psergey/dev2/5.5/sql/opt_range.cc:8854
Breakpoint 3, key_or (param=0x7fff38543a20, key1=0x0, key2=0x0) at /home/psergey/dev2/5.5/sql/opt_range.cc:8854
Breakpoint 3, key_or (param=0x7fff38543a20, key1=0x0, key2=0x0) at /home/psergey/dev2/5.5/sql/opt_range.cc:8854
Breakpoint 3, key_or (param=0x7fff38543a20, key1=0x7fff2c01ad28, key2=0x7fff2c017030) at /home/psergey/dev2/5.5/sql/opt_range.cc:8854

Comment by Sergei Petrunia [ 2013-07-11 ]

So, the problem is that merge_same_index_scans() does the following: given a SEL_ARG* pointer $selarg, it
1. calls tree_or($selarg, ...). This causes $selarg to be embedded in the RB-tree.
2. re-uses the $selarg in other part of SEL_TREE graph.

The idea in Patryk's patch: call key->clone() before passing it as an argument to key_or() seems to be correct.

As for implementation, I don't get what these lines are for:

+ next_arg->next=0; // Fix last link
+ tmp.next->prev=0; // Fix first link

Comment by Sergei Petrunia [ 2013-07-11 ]

Another question is, why do we need to call key->clone() in merge_same_index_scans() manually. There is SEL_ARG::use_count, which could be used to provide a kind of copy-on-write protection for SEL_ARG trees.

Comment by Sergei Petrunia [ 2013-07-11 ]

Oracle's patch is for a similar, but different problem. mysql-5.5 (which doesn't have Oracle's fix) doesn't crash on the testcase from this bug. It seems, we don't have access to Oracle's testcase.

Comment by Patryk Pomykalski [ 2013-07-11 ]

I don't get it too...

Comment by Sergei Petrunia [ 2013-07-11 ]

Alternative fix:

=== modified file 'sql/opt_range.cc'
— sql/opt_range.cc 2013-06-05 20:53:35 +0000
+++ sql/opt_range.cc 2013-07-11 10:24:07 +0000
@@ -4916,6 +4916,8 @@ TABLE_READ_PLAN *merge_same_index_scans(
bzero((*changed_tree)->keys,
sizeof((*changed_tree)>keys[0])*param>keys);
(*changed_tree)->keys_map.clear_all();
+ key->incr_refs();
+ (*tree)>keys[key_idx]>incr_refs();
if (((*changed_tree)->keys[key_idx]=
key_or(param, key, (*tree)->keys[key_idx])))
(*changed_tree)->keys_map.set_bit(key_idx);

Comment by Sergei Petrunia [ 2013-07-11 ]

Made a commit with the alternative diff. Will need a review discussion for it

Comment by Sergei Petrunia [ 2013-07-11 ]

Pushed the "Alternative fix". Oracle's bugfix will be dealt with outside of the scope of this bug.

Patryk, thanks for the ideas and pointers!

Comment by Elena Stepanova [ 2014-02-01 ]

Although the 'Fix version(s)' field says that it was fixed in 5.3, it seems the patch only made it to 5.5, while 5.3 is still crashing. Maybe it makes sense to backport it.

Comment by Elena Stepanova [ 2014-02-01 ]

Also, I have an apparently related test case which causes wrong results on 5.3. It used to fail on 5.5 too, but the failure was gone after this patch. The full test case is big-gish, so I will attach it as mdev4556-wrong_result.test. The problematic query is:

SELECT COUNT(*) FROM t1 
FORCE KEY (g,d,PRIMARY) 
WHERE 
      g = 255 
   OR d BETWEEN 'Alabama' AND 'New Jersey' 
   OR ( g < 227 OR pk NOT IN ( 255, 1 ) ) AND pk NOT BETWEEN 3 AND 5;

On the provided data it returns

COUNT(*)
1711

It's supposed to be 2000.
COUNT is not important, it can be SELECT * as well, the result is just more obvious this way.

Comment by Sergei Petrunia [ 2014-02-19 ]

Backported the fix to 5.3

Generated at Thu Feb 08 06:57:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.