[MDEV-10324] Server crash in get_sel_arg_for_keypart or Assertion `n < size()' failed in Mem_root_array Created: 2016-07-02  Updated: 2016-08-27  Resolved: 2016-07-05

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.1.15
Fix Version/s: 10.1.16

Type: Bug Priority: Blocker
Reporter: Elena Stepanova Assignee: Sergei Petrunia
Resolution: Fixed Votes: 2
Labels: regression

Issue Links:
Duplicate
duplicates MDEV-10326 Server crash under query Closed
duplicates MDEV-10333 MariaDb 10.1 crashes Closed
duplicates MDEV-10337 MariaDB 10.1.15 instantly crashes whe... Closed
duplicates MDEV-10691 crash in get_sel_arg_for_keypart Closed
Sprint: 1.0.2

 Description   

The problem was apparently introduced by this change:

commit 016790403a4bb6182094870870ce1a1c3e2756dc
Author: Sergei Petrunia <psergey@askmonty.org>
Date:   Tue May 31 17:59:04 2016 +0300
 
    MDEV-9764: MariaDB does not limit memory used for range optimization
...

Stack trace from 10.1 commit ccdd63388a789f407d502848d12c618593a5ded1 debug build

mysqld: /data/src/10.1/sql/mem_root_array.h:72: Element_type& Mem_root_array<Element_type, has_trivial_destructor>::at(size_t) [with Element_type = SEL_ARG*; bool has_trivial_destructor = true; size_t = long unsigned int]: Assertion `n < size()' failed.
160702 14:30:03 [ERROR] mysqld got signal 6 ;
 
#6  0x00007f742b4e8266 in __assert_fail_base (fmt=0x7f742b621238 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f742e6e5dac "n < size()", file=file@entry=0x7f742e6e5d88 "/data/src/10.1/sql/mem_root_array.h", line=line@entry=72, function=function@entry=0x7f742e6e5ec0 <Mem_root_array<SEL_ARG*, true>::at(unsigned long)::__PRETTY_FUNCTION__> "Element_type& Mem_root_array<Element_type, has_trivial_destructor>::at(size_t) [with Element_type = SEL_ARG*; bool has_trivial_destructor = true; size_t = long unsigned int]") at assert.c:92
#7  0x00007f742b4e8312 in __GI___assert_fail (assertion=0x7f742e6e5dac "n < size()", file=0x7f742e6e5d88 "/data/src/10.1/sql/mem_root_array.h", line=72, function=0x7f742e6e5ec0 <Mem_root_array<SEL_ARG*, true>::at(unsigned long)::__PRETTY_FUNCTION__> "Element_type& Mem_root_array<Element_type, has_trivial_destructor>::at(size_t) [with Element_type = SEL_ARG*; bool has_trivial_destructor = true; size_t = long unsigned int]") at assert.c:101
#8  0x00007f742e063e27 in Mem_root_array<SEL_ARG*, true>::at (this=0x7f7422454108, n=1) at /data/src/10.1/sql/mem_root_array.h:72
#9  0x00007f742e063805 in Mem_root_array<SEL_ARG*, true>::operator[] (this=0x7f7422454108, n=1) at /data/src/10.1/sql/mem_root_array.h:82
#10 0x00007f742e05d1d8 in get_index_range_tree (index=1, range_tree=0x7f7422454100, param=0x7f742ec457d0, param_idx=0x7f742ec43e00) at /data/src/10.1/sql/opt_range.cc:13057
#11 0x00007f742e05ba28 in get_best_group_min_max (param=0x7f742ec457d0, tree=0x7f7422454100, read_time=1.428462998102467) at /data/src/10.1/sql/opt_range.cc:12404
#12 0x00007f742e0453fc in SQL_SELECT::test_quick_select (this=0x7f74224e65e0, thd=0x7f74247fa070, keys_to_use=..., prev_tables=4611686018427387904, limit=1, force_quick_range=false, ordered_output=false, remove_false_parts_of_where=false) at /data/src/10.1/sql/opt_range.cc:2572
#13 0x00007f742dd44cf4 in make_join_select (join=0x7f74224e4280, select=0x7f74224e6408, cond=0x7f74224e36d8) at /data/src/10.1/sql/sql_select.cc:9917
#14 0x00007f742dd2cd38 in JOIN::optimize_inner (this=0x7f74224e4280) at /data/src/10.1/sql/sql_select.cc:1572
#15 0x00007f742dd2aeaa in JOIN::optimize (this=0x7f74224e4280) at /data/src/10.1/sql/sql_select.cc:1036
#16 0x00007f742dce1aad in st_select_lex::optimize_unflattened_subqueries (this=0x7f74247fe188, const_only=true) at /data/src/10.1/sql/sql_lex.cc:3763
#17 0x00007f742de78a17 in JOIN::optimize_constant_subqueries (this=0x7f74224e3bd0) at /data/src/10.1/sql/opt_subselect.cc:5085
#18 0x00007f742dd2b2f1 in JOIN::optimize_inner (this=0x7f74224e3bd0) at /data/src/10.1/sql/sql_select.cc:1146
#19 0x00007f742dd2aeaa in JOIN::optimize (this=0x7f74224e3bd0) at /data/src/10.1/sql/sql_select.cc:1036
#20 0x00007f742dd3351e in mysql_select (thd=0x7f74247fa070, rref_pointer_array=0x7f74247fe400, tables=0x0, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7f74224e3bb0, unit=0x7f74247fda88, select_lex=0x7f74247fe188) at /data/src/10.1/sql/sql_select.cc:3437
#21 0x00007f742dd29080 in handle_select (thd=0x7f74247fa070, lex=0x7f74247fd9c0, result=0x7f74224e3bb0, setup_tables_done_option=0) at /data/src/10.1/sql/sql_select.cc:384
#22 0x00007f742dcf941c in execute_sqlcom_select (thd=0x7f74247fa070, all_tables=0x7f74224e2f48) at /data/src/10.1/sql/sql_parse.cc:5894
#23 0x00007f742dcef329 in mysql_execute_command (thd=0x7f74247fa070) at /data/src/10.1/sql/sql_parse.cc:2960
#24 0x00007f742dcfcb54 in mysql_parse (thd=0x7f74247fa070, rawbuf=0x7f74224e2088 "SELECT 1 IN ( SELECT COUNT( DISTINCT f2 ) FROM t1 WHERE f1 <= 4 )", length=65, parser_state=0x7f742ec475e0) at /data/src/10.1/sql/sql_parse.cc:7314
#25 0x00007f742dceb57e in dispatch_command (command=COM_QUERY, thd=0x7f74247fa070, packet=0x7f742613e071 "SELECT 1 IN ( SELECT COUNT( DISTINCT f2 ) FROM t1 WHERE f1 <= 4 )", packet_length=65) at /data/src/10.1/sql/sql_parse.cc:1486
#26 0x00007f742dcea2b5 in do_command (thd=0x7f74247fa070) at /data/src/10.1/sql/sql_parse.cc:1107
#27 0x00007f742de1fddd in do_handle_one_connection (thd_arg=0x7f74247fa070) at /data/src/10.1/sql/sql_connect.cc:1350
#28 0x00007f742de1fb41 in handle_one_connection (arg=0x7f74247fa070) at /data/src/10.1/sql/sql_connect.cc:1262
#29 0x00007f742e101336 in pfs_spawn_thread (arg=0x7f742ac27ef0) at /data/src/10.1/storage/perfschema/pfs.cc:1860
#30 0x00007f742d3ea0a4 in start_thread (arg=0x7f742ec48b00) at pthread_create.c:309

Stack trace from 10.1.15 release build

#2  <signal handler called>
#3  get_sel_arg_for_keypart (cur_range=0x7f5f54cb6440, keypart_tree=0x4, field=0x7f5f49041058) at /home/buildbot/buildbot/build/sql/opt_range.cc:12843
#4  get_constant_key_infix (first_non_infix_part=<synthetic pointer>, key_infix_len=<synthetic pointer>, key_infix=0x7f5f54cb7910 "", last_part=0x7f5f490413a0, min_max_arg_part=0x0, first_non_group_part=0x7f5f49041380, index_range_tree=0x4, index_info=<optimized out>, thd=<optimized out>) at /home/buildbot/buildbot/build/sql/opt_range.cc:12935
#5  get_best_group_min_max (read_time=1.428462998102467, tree=0x7f5f4933c098, param=0x7f5f54cb65c0) at /home/buildbot/buildbot/build/sql/opt_range.cc:12409
#6  SQL_SELECT::test_quick_select (this=this@entry=0x7f5f49339478, thd=thd@entry=0x7f5f4dbfa008, keys_to_use=..., prev_tables=4611686018427387905, prev_tables@entry=4611686018427387904, limit=<optimized out>, force_quick_range=force_quick_range@entry=false, ordered_output=ordered_output@entry=false, remove_false_parts_of_where=remove_false_parts_of_where@entry=false) at /home/buildbot/buildbot/build/sql/opt_range.cc:2572
#7  0x00007f5f53edd952 in make_join_select (join=join@entry=0x7f5f49337218, select=0x7f5f49339320, cond=0x7f5f49336670) at /home/buildbot/buildbot/build/sql/sql_select.cc:9917
#8  0x00007f5f53fe5554 in JOIN::optimize_inner (this=this@entry=0x7f5f49337218) at /home/buildbot/buildbot/build/sql/sql_select.cc:1572
#9  0x00007f5f53fe7510 in JOIN::optimize (this=this@entry=0x7f5f49337218) at /home/buildbot/buildbot/build/sql/sql_select.cc:1036
#10 0x00007f5f53f8be38 in st_select_lex::optimize_unflattened_subqueries (this=0x7f5f4dbfdfe0, const_only=const_only@entry=true) at /home/buildbot/buildbot/build/sql/sql_lex.cc:3763
#11 0x00007f5f5408f44a in JOIN::optimize_constant_subqueries (this=this@entry=0x7f5f49336b68) at /home/buildbot/buildbot/build/sql/opt_subselect.cc:5085
#12 0x00007f5f53fe4798 in JOIN::optimize_inner (this=this@entry=0x7f5f49336b68) at /home/buildbot/buildbot/build/sql/sql_select.cc:1146
#13 0x00007f5f53fe7510 in JOIN::optimize (this=this@entry=0x7f5f49336b68) at /home/buildbot/buildbot/build/sql/sql_select.cc:1036
#14 0x00007f5f53fe766d in mysql_select (thd=thd@entry=0x7f5f4dbfa008, rref_pointer_array=rref_pointer_array@entry=0x7f5f4dbfe258, tables=<optimized out>, wild_num=<optimized out>, fields=..., conds=<optimized out>, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=result@entry=0x7f5f49336b48, unit=unit@entry=0x7f5f4dbfd8e0, select_lex=select_lex@entry=0x7f5f4dbfdfe0) at /home/buildbot/buildbot/build/sql/sql_select.cc:3437
#15 0x00007f5f53feafed in handle_select (thd=thd@entry=0x7f5f4dbfa008, lex=lex@entry=0x7f5f4dbfd818, result=result@entry=0x7f5f49336b48, setup_tables_done_option=setup_tables_done_option@entry=0) at /home/buildbot/buildbot/build/sql/sql_select.cc:384
#16 0x00007f5f53f8d2c2 in execute_sqlcom_select (thd=thd@entry=0x7f5f4dbfa008, all_tables=0x7f5f49335ee0) at /home/buildbot/buildbot/build/sql/sql_parse.cc:5894
#17 0x00007f5f53f9955e in mysql_execute_command (thd=thd@entry=0x7f5f4dbfa008) at /home/buildbot/buildbot/build/sql/sql_parse.cc:2960
#18 0x00007f5f53f9caad in mysql_parse (thd=0x7f5f4dbfa008, rawbuf=<optimized out>, length=<optimized out>, parser_state=0x7f5f54cba600) at /home/buildbot/buildbot/build/sql/sql_parse.cc:7314
#19 0x00007f5f53f9f6fb in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f5f4dbfa008, packet=0x7f5f4cfcf009 "SELECT 1 IN ( SELECT COUNT( DISTINCT f2 ) FROM t1 WHERE f1 <= 4 )", packet_length=1228099616, packet_length@entry=65) at /home/buildbot/buildbot/build/sql/sql_parse.cc:1486
#20 0x00007f5f53f9fcab in do_command (thd=0x7f5f4dbfa008) at /home/buildbot/buildbot/build/sql/sql_parse.cc:1107
#21 0x00007f5f5405bacf in do_handle_one_connection (thd_arg=thd_arg@entry=0x7f5f4dbfa008) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1350
#22 0x00007f5f5405bc27 in handle_one_connection (arg=arg@entry=0x7f5f4dbfa008) at /home/buildbot/buildbot/build/sql/sql_connect.cc:1262
#23 0x00007f5f5449769d in pfs_spawn_thread (arg=0x7f5f5182ca08) at /home/buildbot/buildbot/build/storage/perfschema/pfs.cc:1860
#24 0x00007f5f52ec70a4 in start_thread (arg=0x7f5f54cbbb00) at pthread_create.c:309
#25 0x00007f5f525f087d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Test case

CREATE TABLE t1 (f1 INT NOT NULL, f2 VARCHAR(3) NOT NULL, KEY(f1), KEY(f2, f1));
INSERT INTO t1 VALUES (0,'foo'),(1,'bar');
SELECT 1 IN ( SELECT COUNT( DISTINCT f2 ) FROM t1 WHERE f1 <= 4 );

No crash on MySQL 5.7 as far as I can tell.



 Comments   
Comment by Davis Mosenkovs [ 2016-07-03 ]

Hi, I have the same issue - in my case this was affecting MediaWiki installation (I have done mysqldump, removed the datafiles, reinstalled MariaDB server and imported the mysqldump, and this didn't help; MariaDB 10.1.14 is working properly).

If there is a possibility that this will affect other applications as well may be it is worth considering to withdraw release 10.1.15? At least I received the update Friday evening, so I assume many enterprises will do the update during the business hours next week.

Comment by Sergei Golubchik [ 2016-07-03 ]

DavisNT, yes, we have had exactly the same thought. 10.1.15 is now withdrawn and we've shifted the release schedule to have 10.1.16 out sooner.

Sorry for the trouble.

Comment by Elena Stepanova [ 2016-07-03 ]

MediaWiki-based test case:

CREATE TABLE `job` (
  `job_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `job_cmd` varbinary(60) NOT NULL DEFAULT '',
  `job_namespace` int(11) NOT NULL,
  `job_title` varbinary(255) NOT NULL,
  `job_params` blob NOT NULL,
  `job_timestamp` varbinary(14) DEFAULT NULL,
  `job_random` int(10) unsigned NOT NULL DEFAULT '0',
  `job_token` varbinary(32) NOT NULL DEFAULT '',
  `job_token_timestamp` varbinary(14) DEFAULT NULL,
  `job_sha1` varbinary(32) NOT NULL DEFAULT '',
  `job_attempts` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`job_id`),
  KEY `job_cmd` (`job_cmd`,`job_namespace`,`job_title`,`job_params`(128)),
  KEY `job_timestamp` (`job_timestamp`),
  KEY `job_sha1` (`job_sha1`),
  KEY `job_cmd_token` (`job_cmd`,`job_token`,`job_random`),
  KEY `job_cmd_token_id` (`job_cmd`,`job_token`,`job_id`)
);
 
INSERT INTO `job` VALUES 
    (NULL, 'foo', 1, 'foo', 'foo', 'foo', 1, 'foo', 'foo', 'foo', 1),
    (NULL, 'bar', 2, 'bar', 'bar', 'bar', 2, 'bar', 'bar', 'bar', 2);
    
 
SELECT DISTINCT job_cmd  FROM `job` WHERE job_cmd IN ('foobar','null');

Comment by Sergei Petrunia [ 2016-07-04 ]

Investigation:

The following happens:

  • range optimizer is invoked for the first time (from get_quick_record_count)
    with keys_to_use=3.
  • range optimizer is invoked for the second time here:

  #0  SQL_SELECT::test_quick_select (this=0x7fff9c009718, thd=0x555557a60070, keys_to_use=..., prev_tables=4611686018427387904, limit=1, force_quick_range=false, ordered_output=false, remove_false_parts_of_where=false) at /home/psergey/dev-git/10.1-dbg6/sql/opt_range.cc:2434
  #1  0x0000555555aa6fbd in make_join_select (join=0x7fff9c0073b8, select=0x7fff9c009540, cond=0x7fff9c006880) at /home/psergey/dev-git/10.1-dbg6/sql/sql_select.cc:9917
  #2  0x0000555555a8fe98 in JOIN::optimize_inner (this=0x7fff9c0073b8) at /home/psergey/dev-git/10.1-dbg6/sql/sql_select.cc:1572
  #3  0x0000555555a8e068 in JOIN::optimize (this=0x7fff9c0073b8) at /home/psergey/dev-git/10.1-dbg6/sql/sql_select.cc:1036

This is the "We plan to scan all rows. Check again if we should use an index."
call.

This time, keys_to_use.map=1

However, group-min-max optimizer ignores that and tries to consider a plan with
key_no=1. This causes assertion failure, because Mem_root_array checks against
attempts to access data beyond the end of the array.

Comment by Sergei Petrunia [ 2016-07-04 ]

The example doesn't crash MySQL 5.7 for multiple reasons:

  • 5.7 doesn't invoke the range optimizer for the second time. They have the
    call there, but the logic to invoke it is changed (it's complex), and the
    condition for invoking the range optimizer is not met
  • 5.7 also has tab->keys()=3, tab->keys=3. So, even if I use gdb to force the
    optimizer to be invoked, it will operate on the full set of keys, not on the
    subset.
Comment by Sergei Petrunia [ 2016-07-04 ]
  • but forcing tab->keys= {1}

    doesn't cause them to crash, either...

Found it:
there is this change:

-  for (uint cur_index= 0 ; cur_index_info != cur_index_info_end ;
 -       cur_index_info++, cur_index++)
 +  // We go through allowed indexes
 +  for (uint cur_param_idx= 0; cur_param_idx < param->keys ; ++cur_param_idx)

made by this patch: https://github.com/mysql/mysql-server/commit/96fcfcbd7b5120e8f64fd45985001eca8d36fbfb

This is what I've missed when backporting the patch mentioned in the bug report.

Comment by Sergei Petrunia [ 2016-07-05 ]

The mentioned patch against 5.7 causes a different query plan to be chosen for this query:

EXPLAIN SELECT a FROM t1 IGNORE INDEX FOR GROUP BY (PRIMARY, ab) GROUP BY a;

THe change is:

-1	SIMPLE	t1	NULL	range	PRIMARY,ab	ab	4	NULL	13	100.00	Using index for group-by
+1	SIMPLE	t1	NULL	range	PRIMARY,ab	ab	4	NULL	13	100.00	Using index for group-by; Using temporary; Using filesort

One can argue that the previous plan was incorrect: if the query says "IGNORE INDEX FOR GROUP BY", why does the plan have "Using index for group-by" ?

However, the new query plan doesn't make any sense. It has:

Using index for group-by; Using temporary; Using filesort

If we do "Using index for group-by", it does grouping, and provides result in GROUP BY order.
What is the purpose of doing "Using temporary" or "Using filesort", then?

But this is what current mysql-5.7 has. I'm looking at tip cset:

commit e0e0ae2ea27c9bb76577664845507ef224d362e4
Author: Lars Tangvald <lars.tangvald@oracle.com>
Date:   Tue May 24 09:22:25 2016 +0200

Comment by Sergei Petrunia [ 2016-07-05 ]

Running ANALYZE and debugging shows that the execution follows the EXPLAIN. It uses both QUICK_GROUP_MIN_MAX_SELECT, then writes rows into a temp. table and then invokes filesort to read the result.

We arrive at the wrong query plan through the following steps:

  • range optimizer is invoked by get_quick_record_count(). It produces a QUICK_GROUP_MIN_MAX_SELECT. Note that this ignores the "IGNORE INDEX FOR GROUP BY" hint
  • We reach test_if_skip_sort_order. It invokes test_quick_select (... keys_to_use= {1}

    ...), which fails to produce a quick select that produces the required ordering. test_quick_select executes "goto use_filesort" and returns false/

  • The code in sql_select.cc sees that, and sets:

           need_tmp=1; simple_order=simple_group=0;	// Force tmp table without sort
    

  • This way, we get both a GROUP_MIN_MAX_SELECT and a temp. table with filesort.
Comment by Sergei Petrunia [ 2016-07-05 ]

A way to fix the above without rewriting a lot of code: make get_best_group_min_max() take into account IGNORE INDEX FOR GROUP BY, and not produce "Using index for group-by" query plans when they are not allowed.

Generated at Thu Feb 08 07:41:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.