[MDEV-11162] Assertion `num_records == m_idx_array.size()' failed in Filesort_buffer::alloc_sort_buffer(uint, uint) Created: 2016-10-27  Updated: 2016-12-07  Resolved: 2016-12-07

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.0, 10.1
Fix Version/s: 10.0.29

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Varun Gupta (Inactive)
Resolution: Fixed Votes: 0
Labels: None


 Description   

--source include/have_innodb.inc
 
CREATE TABLE t1 (i INT) ENGINE=InnoDB;
SELECT ( SELECT DISTINCT GROUP_CONCAT(SLEEP(0)) FROM t1 GROUP BY i );
SELECT i FROM t1 ORDER BY i LIMIT 1;

10.1ea0ae42d83

mysqld: /data/src/10.1/sql/filesort_utils.cc:110: uchar** Filesort_buffer::alloc_sort_buffer(uint, uint): Assertion `num_records == m_idx_array.size()' failed.
161027 18:14:48 [ERROR] mysqld got signal 6 ;
 
#6  0x00007f2cdb1c8266 in __assert_fail_base (fmt=0x7f2cdb301238 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7f2cde3e0b48 "num_records == m_idx_array.size()", file=file@entry=0x7f2cde3e0ae0 "/data/src/10.1/sql/filesort_utils.cc", line=line@entry=110, function=function@entry=0x7f2cde3e0ba0 <Filesort_buffer::alloc_sort_buffer(unsigned int, unsigned int)::__PRETTY_FUNCTION__> "uchar** Filesort_buffer::alloc_sort_buffer(uint, uint)") at assert.c:92
#7  0x00007f2cdb1c8312 in __GI___assert_fail (assertion=0x7f2cde3e0b48 "num_records == m_idx_array.size()", file=0x7f2cde3e0ae0 "/data/src/10.1/sql/filesort_utils.cc", line=110, function=0x7f2cde3e0ba0 <Filesort_buffer::alloc_sort_buffer(unsigned int, unsigned int)::__PRETTY_FUNCTION__> "uchar** Filesort_buffer::alloc_sort_buffer(uint, uint)") at assert.c:101
#8  0x00007f2cdddb1a6c in Filesort_buffer::alloc_sort_buffer (this=0x7f2cde8d1e40, num_records=2, record_length=10) at /data/src/10.1/sql/filesort_utils.cc:110
#9  0x00007f2cddc071c0 in Filesort_info::alloc_sort_buffer (this=0x7f2cde8d1e40, num_records=2, record_length=10) at /data/src/10.1/sql/table.h:354
#10 0x00007f2cddc05002 in check_if_pq_applicable (param=0x7f2cde8d1eb0, filesort_info=0x7f2cde8d1e40, table=0x7f2cc5881c70, num_rows=1129, memory_available=262144) at /data/src/10.1/sql/filesort.cc:1346
#11 0x00007f2cddc01ff9 in filesort (thd=0x7f2cd1bb2070, table=0x7f2cc5881c70, sortorder=0x7f2cc58fccc0, s_length=1, select=0x7f2cc58fc008, max_rows=1, sort_positions=false, examined_rows=0x7f2cde8d22b0, found_rows=0x7f2cde8d22b8, tracker=0x7f2cc58fcd00) at /data/src/10.1/sql/filesort.cc:224
#12 0x00007f2cdda4207a in create_sort_index (thd=0x7f2cd1bb2070, join=0x7f2cc58faac0, order=0x7f2cc58fa988, filesort_limit=1, select_limit=1, is_order_by=true) at /data/src/10.1/sql/sql_select.cc:21486
#13 0x00007f2cdda14beb in JOIN::exec_inner (this=0x7f2cc58faac0) at /data/src/10.1/sql/sql_select.cc:3185
#14 0x00007f2cdda120ad in JOIN::exec (this=0x7f2cc58faac0) at /data/src/10.1/sql/sql_select.cc:2505
#15 0x00007f2cdda1561b in mysql_select (thd=0x7f2cd1bb2070, rref_pointer_array=0x7f2cd1bb6430, tables=0x7f2cc58fa288, wild_num=0, fields=..., conds=0x0, og_num=1, order=0x7f2cc58fa988, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7f2cc58faaa0, unit=0x7f2cd1bb5ab8, select_lex=0x7f2cd1bb61b8) at /data/src/10.1/sql/sql_select.cc:3438
#16 0x00007f2cdda0b10c in handle_select (thd=0x7f2cd1bb2070, lex=0x7f2cd1bb59f0, result=0x7f2cc58faaa0, setup_tables_done_option=0) at /data/src/10.1/sql/sql_select.cc:384
#17 0x00007f2cdd9db309 in execute_sqlcom_select (thd=0x7f2cd1bb2070, all_tables=0x7f2cc58fa288) at /data/src/10.1/sql/sql_parse.cc:5895
#18 0x00007f2cdd9d1216 in mysql_execute_command (thd=0x7f2cd1bb2070) at /data/src/10.1/sql/sql_parse.cc:2961
#19 0x00007f2cdd9dea62 in mysql_parse (thd=0x7f2cd1bb2070, rawbuf=0x7f2cc58fa088 "SELECT i FROM t1 ORDER BY i LIMIT 1", length=35, parser_state=0x7f2cde8d35e0) at /data/src/10.1/sql/sql_parse.cc:7318
#20 0x00007f2cdd9cd46b in dispatch_command (command=COM_QUERY, thd=0x7f2cd1bb2070, packet=0x7f2cd3a9b071 "SELECT i FROM t1 ORDER BY i LIMIT 1", packet_length=35) at /data/src/10.1/sql/sql_parse.cc:1487
#21 0x00007f2cdd9cc1a2 in do_command (thd=0x7f2cd1bb2070) at /data/src/10.1/sql/sql_parse.cc:1108
#22 0x00007f2cddb0207d in do_handle_one_connection (thd_arg=0x7f2cd1bb2070) at /data/src/10.1/sql/sql_connect.cc:1350
#23 0x00007f2cddb01de1 in handle_one_connection (arg=0x7f2cd1bb2070) at /data/src/10.1/sql/sql_connect.cc:1262
#24 0x00007f2cddde4aba in pfs_spawn_thread (arg=0x7f2cd3ab8bf0) at /data/src/10.1/storage/perfschema/pfs.cc:1860
#25 0x00007f2cdd0ca0a4 in start_thread (arg=0x7f2cde8d4b00) at pthread_create.c:309
#26 0x00007f2cdb28287d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Reproducible on 10.0, 10.1.
Not reproducible on 5.5 and current 10.2 (0c15d1a6ff).



 Comments   
Comment by Sergei Petrunia [ 2016-12-05 ]

Notes from our discussion with varun:

  • The assert happens during the second query
  • It looks like this:

  #4  0x0000555555e04e1e in Filesort_buffer::alloc_sort_buffer (this=0x7ffff7f16e90, num_records=2, record_length=10) at /home/psergey/dev-git/10.1/sql/filesort_utils.cc:110
(gdb) print num_records
  $6 = 2
(gdb) p m_idx_array.size()
  $8 = 1129

and the number 1129 doesn't make any sense. It looks like the code is using some garbage value left over from the previous query.

Comment by Sergei Petrunia [ 2016-12-05 ]

Indeed, the value 1129 gets assigned by the previous query here in filesort():

  // Assign the copy back!
  table->sort= table_sort;

Hardware watchpoint 5: table->sort->filesort_buffer.m_idx_array.m_size
  Old value = 0
  New value = 1129

[

Comment by Sergei Petrunia [ 2016-12-05 ]

It's not only the buffer length. The buffer itself is left, too:
If I stop at the

  Filesort_info table_sort= table->sort;

and add a watchpoints to see where memory is allocated, I get:

(gdb) watch -l  table_sort.filesort_buffer.m_idx_array.m_array
  Hardware watchpoint 10: -location table_sort.filesort_buffer.m_idx_array.m_array
(gdb) watch -l  table->sort.filesort_buffer.m_idx_array.m_array
  Hardware watchpoint 11: -location table->sort.filesort_buffer.m_idx_array.m_array

then I see the buffer to be allocated

  Hardware watchpoint 10: -location table_sort.filesort_buffer.m_idx_array.m_array
  Old value = (unsigned char **) 0x0
  New value = (unsigned char **) 0x7fffc790f070
  0x0000555555e04da9 in Filesort_buffer::alloc_sort_buffer (this=0x7ffff7f16500, num_records=1129, record_length=10) at /home/psergey/dev-git/10.1/sql/filesort_utils.cc:103
(gdb) wher
  #0  0x0000555555e04da9 in Filesort_buffer::alloc_sort_buffer (this=0x7ffff7f16500, num_records=1129, record_length=10) at /home/psergey/dev-git/10.1/sql/filesort_utils.cc:103
  #1  0x0000555555c5a3c6 in Filesort_info::alloc_sort_buffer (this=0x7ffff7f16500, num_records=1129, record_length=10) at /home/psergey/dev-git/10.1/sql/table.h:354
  #2  0x0000555555c55472 in filesort (thd=0x7fffe337d070, table=0x7fffc7890e70, sortorder=0x7fffc7826a78, s_length=1, select=0x7fffc7825a88, max_rows=18446744073709551615, sort_positions=false, examined_rows=0x7ffff7f16970, found_rows=0x7ffff7f16978, tracker=0x7fffc7826ab8) at /home/psergey/dev-git/10.1/sql/filesort.cc:273
  #3  0x0000555555a94df4 in create_sort_index (thd=0x7fffe337d070, join=0x7fffc7824390, order=0x7fffc7823990, filesort_limit=18446744073709551615, select_limit=18446744073709551615, is_order_by=false) at /home/psergey/dev-git/10.1/sql/sql_select.cc:21486
...

but it's never freed (checked by looking into .trace file. my_malloc with this address is there, my_free is not)

Comment by Sergei Petrunia [ 2016-12-05 ]

Looking at where Filesort_buffer::free_sort_buffer() calls are made:

- various error conditions in filesort() itself.
- in Filesort_info::free_sort_buffer() call
  - in filesort_free_buffers(... ,full=true) call
     - called from JOIN::cleanup(full=true)
 
- irrelevant places in sql_show.cc, subquery materialization code, etc.

Need to check JOIN::cleanup() calls.

Comment by Sergei Petrunia [ 2016-12-05 ]

Debugging when/how JOIN::cleanup is run for the query:

SELECT ( SELECT DISTINCT GROUP_CONCAT(SLEEP(0)) FROM t1 GROUP BY i );

(gdb) wher
  #0  JOIN::cleanup (this=0x7fffc7026ff8, full=false) at /home/psergey/dev-git/10.1/sql/sql_select.cc:11924
  #1  0x0000555555a7e1f4 in JOIN::join_free (this=0x7fffc7026ff8) at /home/psergey/dev-git/10.1/sql/sql_select.cc:11862
  #2  0x0000555555a66b75 in JOIN::exec_inner (this=0x7fffc7024390) at /home/psergey/dev-git/10.1/sql/sql_select.cc:2979
  #3  0x0000555555a64e27 in JOIN::exec (this=0x7fffc7024390) at /home/psergey/dev-git/10.1/sql/sql_select.cc:2505
  #4  0x0000555555d09306 in subselect_single_select_engine::exec (this=0x7fffc7023b18) at /home/psergey/dev-git/10.1/sql/item_subselect.cc:3788
  #5  0x0000555555cff72e in Item_subselect::exec (this=0x7fffc70239e0) at /home/psergey/dev-git/10.1/sql/item_subselect.cc:707
  #6  0x0000555555d011d3 in Item_singlerow_subselect::val_str (this=0x7fffc70239e0, str=0x7ffff7f16f90) at /home/psergey/dev-git/10.1/sql/item_subselect.cc:1318
  #7  0x0000555555c7f363 in Item::send (this=0x7fffc70239e0, protocol=0x7fffe337d600, buffer=0x7ffff7f16f90) at /home/psergey/dev-git/10.1/sql/item.cc:6460
  #8  0x000055555596ce46 in Protocol::send_result_set_row (this=0x7fffe337d600, row_items=0x7fffe33812d0) at /home/psergey/dev-git/10.1/sql/protocol.cc:914
  #9  0x00005555559e4bf0 in select_send::send_data (this=0x7fffc7023ce0, items=...) at /home/psergey/dev-git/10.1/sql/sql_class.cc:2829
  #10 0x0000555555a653aa in JOIN::exec_inner (this=0x7fffc7023d00) at /home/psergey/dev-git/10.1/sql/sql_select.cc:2584
  #11 0x0000555555a64e27 in JOIN::exec (this=0x7fffc7023d00) at /home/psergey/dev-git/10.1/sql/sql_select.cc:2505
  #12 0x0000555555a68395 in mysql_select (thd=0x7fffe337d070, rref_pointer_array=0x7fffe3381430, tables=0x0, wild_num=0, fields=..., conds=0x0, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2147748608, result=0x7fffc7023ce0, unit=0x7fffe3380ab8, select_lex=0x7fffe33811b8) at /home/psergey/dev-git/10.1/sql/sql_select.cc:3438
  #13 0x0000555555a5de86 in handle_select (thd=0x7fffe337d070, lex=0x7fffe33809f0, result=0x7fffc7023ce0, setup_tables_done_option=0) at /home/psergey/dev-git/10.1/sql/sql_select.cc:384
  #14 0x0000555555a2e05d in execute_sqlcom_select (thd=0x7fffe337d070, all_tables=0x7fffc7023290) at /home/psergey/dev-git/10.1/sql/sql_parse.cc:5896
  #15 0x0000555555a23f7f in mysql_execute_command (thd=0x7fffe337d070) at /home/psergey/dev-git/10.1/sql/sql_parse.cc:2968
  #16 0x0000555555a317b6 in mysql_parse (thd=0x7fffe337d070, rawbuf=0x7fffc7022088 "SELECT ( SELECT DISTINCT GROUP_CONCAT(SLEEP(0)) FROM t1 GROUP BY i )", length=68, parser_state=0x7ffff7f18630) at /home/psergey/dev-git/10.1/sql/sql_parse.cc:7319
  #17 0x0000555555a20189 in dispatch_command (command=COM_QUERY, thd=0x7fffe337d070, packet=0x7fffd1fd6071 "", packet_length=68) at /home/psergey/dev-git/10.1/sql/sql_parse.cc:1487
  #18 0x0000555555a1eec0 in do_command (thd=0x7fffe337d070) at /home/psergey/dev-git/10.1/sql/sql_parse.cc:1108
  #19 0x0000555555b54fa5 in do_handle_one_connection (thd_arg=0x7fffe337d070) at /home/psergey/dev-git/10.1/sql/sql_connect.cc:1350
  #20 0x0000555555b54d09 in handle_one_connection (arg=0x7fffe337d070) at /home/psergey/dev-git/10.1/sql/sql_connect.cc:1262
  #21 0x00007ffff64d16aa in start_thread (arg=0x7ffff7f19b00) at pthread_create.c:333
  #22 0x00007ffff5befeed in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Look at frames #0, #1, #2, #3, #10.
There are THREE join objects:

  • 0x7fffc7026ff8
  • 0x7fffc7024390
  • 0x7fffc7023d00

(gdb) c
  Continuing.
  
  Breakpoint 12, JOIN::cleanup (this=0x7fffc7026ff8, full=false) at /home/psergey/dev-git/10.1/sql/sql_select.cc:11924
(gdb) c
  Continuing.
  
  Breakpoint 12, JOIN::cleanup (this=0x7fffc7023d00, full=false) at /home/psergey/dev-git/10.1/sql/sql_select.cc:11924
(gdb) c
  Continuing.
  
  Breakpoint 12, JOIN::cleanup (this=0x7fffc7024390, full=false) at /home/psergey/dev-git/10.1/sql/sql_select.cc:11924
(gdb) c
  Continuing.
  
  Breakpoint 12, JOIN::cleanup (this=0x7fffc7023d00, full=true) at /home/psergey/dev-git/10.1/sql/sql_select.cc:11924
(gdb) c
  Continuing.
  
  Breakpoint 12, JOIN::cleanup (this=0x7fffc7026ff8, full=true) at /home/psergey/dev-git/10.1/sql/sql_select.cc:11924

  • 0x7fffc7023d00 is a (degenerate) JOIN from the top-level select. It has JOIN::cleanup(full=true) call
  • 0x7fffc7026ff8 is a JOIN object that represents the pre-GROUP-BY select operation in the subquery. It has had a JOIN::cleanup(full=true) call.
  • 0x7fffc7024390 is a JOIN object that represents a post-group-by row read in the subquery. it DOES NOT have its JOIN::cleanup(full=true)
Comment by Sergei Petrunia [ 2016-12-05 ]

Indeed, if I stop at the cleanup call for 0x7fffc7024390

  Breakpoint 12, JOIN::cleanup (this=0x7fffc7024390, full=false) at /home/psergey/dev-git/10.1/sql/sql_select.cc:11924

I can follow it into the filesort_free_buffers(first_tab->table, full) call and see that the buffer that should have been freed is here:

(gdb) p table->sort
  $109 = {filesort_buffer = {m_idx_array = {m_array = 0x7fffc750f070, m_size = 1129},...

Comment by Sergei Petrunia [ 2016-12-05 ]

The relationship between the two subquery's JOIN objects is that one of them points to another using tmp_join member:

(gdb) p this
  $157 = (JOIN * const) 0x7fffc8824390
(gdb) p tmp_join
  $158 = (JOIN *) 0x7fffc8826ff8
(gdb) p tmp_join->tmp_join
  $159 = (JOIN *) 0x0

(gdb) wher
  #0  JOIN::cleanup (this=0x7fffc8826ff8, full=true) at /home/psergey/dev-git/10.1/sql/sql_select.cc:11921
  #1  0x0000555555a67eb3 in JOIN::destroy (this=0x7fffc8826ff8) at /home/psergey/dev-git/10.1/sql/sql_select.cc:3282
  #2  0x0000555555a67e7d in JOIN::destroy (this=0x7fffc8824390) at /home/psergey/dev-git/10.1/sql/sql_select.cc:3277
  #3  0x0000555555afcbda in st_select_lex::cleanup (this=0x7fffc8822188) at /home/psergey/dev-git/10.1/sql/sql_union.cc:1189

I can see that

  • The object with address *390 object has JOIN::destroy() called for it
  • It calls JOIN::destroy() for its {{tmp_join} object, the ...ff8 object
  • In JOIN::destroy(), the there is a cleanup(1) call, so ..ff8 object cleans up itself.
  • However, the *390 object does not call cleanup(1) for itself. It calls JOIN::destroy for the tmp_join from this line

 
    DBUG_RETURN(tmp_join->destroy());

but then it immediately exits, without having called cleanup(1) for itself.

Comment by Sergei Petrunia [ 2016-12-05 ]

This one seems to work (it's important to call cleanup() first, before destroy())

diff --git a/sql/sql_select.cc b/sql/sql_select.cc
index 239e5b6..9d26b8d 100644
--- a/sql/sql_select.cc
+++ b/sql/sql_select.cc
@@ -3274,7 +3274,10 @@ JOIN::destroy()
     */
     tmp_table_param.cleanup();
     tmp_join->tmp_table_param.copy_field= 0;
-    DBUG_RETURN(tmp_join->destroy());
+    int err;
+    cleanup(1);
+    err= tmp_join->destroy();
+    DBUG_RETURN(err);
   }
   cond_equal= 0;
   having_equal= 0;

Generated at Thu Feb 08 07:47:47 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.