[MDEV-6663] Assertion `0' failed in Protocol::end_statement() on concurrent UPDATE and ALTER TABLE .. PERSISTENT Created: 2014-08-29  Updated: 2022-10-21  Resolved: 2022-10-21

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Update, Optimizer, Partitioning
Affects Version/s: 10.0.13
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Vicențiu Ciorbaru
Resolution: Won't Fix Votes: 0
Labels: eits, partitioning

Issue Links:
Relates
relates to MDEV-15601 [draft] protocol.cc:587: void Protoco... Closed
Sprint: 10.0.20, 10.0.26, 10.0.30

 Description   

10.0/sql/protocol.cc:521: void Protocol::end_statement(): Assertion `0' failed.
140829 19:00:32 [ERROR] mysqld got signal 6 ;

#6  0x00007f3ebc1d76f1 in *__GI___assert_fail (assertion=0xf0d710 "0", file=<optimized out>, line=521, function=0xf0ea40 "void Protocol::end_statement()") at assert.c:81
#7  0x00000000005d0ef3 in Protocol::end_statement (this=0x7f3ea38505f8) at 10.0/sql/protocol.cc:521
#8  0x00000000006772a6 in dispatch_command (command=COM_QUERY, thd=0x7f3ea3850070, packet=0x7f3ea3856071 "UPDATE `t1` SET `col_int` = 2 WHERE `col_varchar_1_key` = 'g' OR `col_int_key` >= 3", packet_length=83) at 10.0/sql/sql_parse.cc:1728
#9  0x000000000067521d in do_command (thd=0x7f3ea3850070) at 10.0/sql/sql_parse.cc:1004
#10 0x0000000000791512 in do_handle_one_connection (thd_arg=0x7f3ea3850070) at 10.0/sql/sql_connect.cc:1379
#11 0x0000000000791265 in handle_one_connection (arg=0x7f3ea3850070) at 10.0/sql/sql_connect.cc:1293
#12 0x00007f3ebdd90b50 in start_thread (arg=<optimized out>) at pthread_create.c:304
#13 0x00007f3ebc28820d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

Stack trace from:

revision-id: jplindst@mariadb.org-20140825103533-p9jodvh91r9tueo1
revno: 4362
branch-nick: 10.0

RQG grammar (assertion.yy):

thread1:
	UPDATE `t1` SET `col_int` = 2 WHERE `col_varchar_1_key` = 'g' OR `col_int_key` >= 3;
 
thread2:
	ANALYZE TABLE `t1` PERSISTENT FOR ALL;

RQG data template (assertion.zz):

$tables = {
	names => [ 't1' ],
	rows => [ 5000 ] ,
	engines => [ 'InnoDB' ] ,
	pk => [ undef ] , 
	partitions => [ "LIST COLUMNS(col_int) (
		PARTITION p0 VALUES IN (1,3,5,7,9,NULL),
		PARTITION p1 VALUES IN (2,4,6,8,0))" ]
};
$fields = { types => [ 'int', 'varchar(1)' ] };
 
$data = {
	numbers => [ 'digit' , 'digit' , 'digit' , 'digit' , 'null' ],
	strings => [ 'letter' ]
};

RQG command line:

perl ./runall-new.pl  --gendata=assertion.zz --threads=2  --duration=300 --queries=100M --grammar=assertion.yy --basedir=<basedir> --vardir=<vardir>



 Comments   
Comment by Sergei Petrunia [ 2015-02-17 ]

cvicentiu, please check this (after MDEV-6687). It has a similar end result: an attempt to put the client-server protocol into an invalid state. I suspect the source is similar: something raised an error condition but didn't print it to the network (or vice versa: printed the error to the network but didn't let the SQL layer know).

We need to get rid of the assertion first. The second part of the bug is finding out why ANALYZE TABLE causes errors in concurrent UPDATEs.

Comment by Vicențiu Ciorbaru [ 2015-06-12 ]

Finally tracked down where the bug occurs:

sql_update.cc:585

      /* If quick select is used, initialize it before retrieving rows. */
      if (select && select->quick && select->quick->reset())
      {
        close_cached_file(&tempfile);
        goto err;
      }

Calling quick->reset() returns HA_ERR_LOCK_DEADLOCK.
We don't report this error, which is why we get the assertion failure.

I've tried a small hack:

      if (select && select->quick && select->quick->reset())
      {
        close_cached_file(&tempfile);
        my_error(ER_UNKNOWN_ERROR, MYF(0));
        goto err;
      }

With this hack in place, we no longer get assertion errors.

Comment by Vicențiu Ciorbaru [ 2015-06-12 ]

Correction: cur_quick->get_next() in sql/opt_range.cc:11303 returns the HA_ERR_LOCK_DEADLOCK.

Comment by Vicențiu Ciorbaru [ 2015-06-12 ]

Backtrace when encountering the HA_ERR_LOCK_DEADLOCK error.

(gdb) bt
#0  convert_error_code_to_mysql (error=DB_DEADLOCK, flags=1, thd=0x7f07847ef070) at /home/vicentiu/Workspace/MariaDB/server/storage/xtradb/handler/ha_innodb.cc:1736
#1  0x0000000000b9cdc2 in ha_innobase::index_read (this=0x7f077b8ee888, buf=0x7f077b8291a0 "\377\217\217\003", key_ptr=0x7f077b8234b0 "", key_len=4, find_flag=HA_READ_KEY_EXACT) at /home/vicentiu/Workspace/MariaDB/server/storage/xtradb/handler/ha_innodb.cc:8597
#2  0x0000000000862d56 in handler::index_read_map (this=0x7f077b8ee888, buf=0x7f077b8291a0 "\377\217\217\003", key=0x7f077b8234b0 "", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at /home/vicentiu/Workspace/MariaDB/server/sql/handler.h:2920
#3  0x000000000085a0ec in handler::ha_index_read_map (this=0x7f077b8ee888, buf=0x7f077b8291a0 "\377\217\217\003", key=0x7f077b8234b0 "", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at /home/vicentiu/Workspace/MariaDB/server/sql/handler.cc:2600
#4  0x0000000000860388 in handler::read_range_first (this=0x7f077b8ee888, start_key=0x7f077b8eddd0, end_key=0x7f077b8ed9c0, eq_range_arg=true, sorted=true) at /home/vicentiu/Workspace/MariaDB/server/sql/handler.cc:5378
#5  0x0000000000e54876 in ha_partition::handle_ordered_index_scan (this=0x7f077b8ed888, buf=0x7f077b8291a0 "\377\217\217\003", reverse_order=false) at /home/vicentiu/Workspace/MariaDB/server/sql/ha_partition.cc:6153
#6  0x0000000000e53339 in ha_partition::common_index_read (this=0x7f077b8ed888, buf=0x7f077b8291a0 "\377\217\217\003", have_start_key=true) at /home/vicentiu/Workspace/MariaDB/server/sql/ha_partition.cc:5497
#7  0x0000000000e53b9e in ha_partition::read_range_first (this=0x7f077b8ed888, start_key=0x7f077b8ed970, end_key=0x7f077b8ed990, eq_range_arg=true, sorted=true) at /home/vicentiu/Workspace/MariaDB/server/sql/ha_partition.cc:5780
#8  0x00000000007bab73 in handler::multi_range_read_next (this=0x7f077b8ed888, range_info=0x7f07a89411c8) at /home/vicentiu/Workspace/MariaDB/server/sql/multi_range_read.cc:296
#9  0x000000000098729b in QUICK_RANGE_SELECT::get_next (this=0x7f077b85a280) at /home/vicentiu/Workspace/MariaDB/server/sql/opt_range.cc:11775
#10 0x0000000000985e15 in read_keys_and_merge_scans (thd=0x7f07847ef070, head=0x7f077b8ccc70, quick_selects=..., pk_quick_select=0x0, read_record=0x7f077b40eca8, intersection=false, filtered_scans=0x0, unique_ptr=0x7f077b40ec40)
    at /home/vicentiu/Workspace/MariaDB/server/sql/opt_range.cc:11303
#11 0x000000000098607f in QUICK_INDEX_MERGE_SELECT::read_keys_and_merge (this=0x7f077b40ec00) at /home/vicentiu/Workspace/MariaDB/server/sql/opt_range.cc:11374
#12 0x000000000096ff83 in QUICK_INDEX_SORT_SELECT::reset (this=0x7f077b40ec00) at /home/vicentiu/Workspace/MariaDB/server/sql/opt_range.cc:1956
#13 0x0000000000724a17 in mysql_update (thd=0x7f07847ef070, table_list=0x7f077b8221e8, fields=..., values=..., conds=0x7f077b823058, order_num=0, order=0x0, limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false, found_return=0x7f07a8941c90, 
    updated_return=0x7f07a8941d20) at /home/vicentiu/Workspace/MariaDB/server/sql/sql_update.cc:585
#14 0x000000000065058b in mysql_execute_command (thd=0x7f07847ef070) at /home/vicentiu/Workspace/MariaDB/server/sql/sql_parse.cc:3312
#15 0x000000000065945a in mysql_parse (thd=0x7f07847ef070, rawbuf=0x7f077b822088 "UPDATE `t1` SET `col_int` = 2 WHERE `col_varchar_1_key` = 'g' OR `col_int_key` >= 3", length=83, parser_state=0x7f07a8942680) at /home/vicentiu/Workspace/MariaDB/server/sql/sql_parse.cc:6529
#16 0x000000000064b820 in dispatch_command (command=COM_QUERY, thd=0x7f07847ef070, packet=0x7f07847f5071 "UPDATE `t1` SET `col_int` = 2 WHERE `col_varchar_1_key` = 'g' OR `col_int_key` >= 3", packet_length=83)
    at /home/vicentiu/Workspace/MariaDB/server/sql/sql_parse.cc:1308
#17 0x000000000064aa7e in do_command (thd=0x7f07847ef070) at /home/vicentiu/Workspace/MariaDB/server/sql/sql_parse.cc:999
#18 0x000000000077b017 in do_handle_one_connection (thd_arg=0x7f07847ef070) at /home/vicentiu/Workspace/MariaDB/server/sql/sql_connect.cc:1375
#19 0x000000000077ad6e in handle_one_connection (arg=0x7f07847ef070) at /home/vicentiu/Workspace/MariaDB/server/sql/sql_connect.cc:1289
#20 0x00007f07a7da90a5 in start_thread (arg=0x7f07a8943700) at pthread_create.c:309
#21 0x00007f07a6c7bcfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Comment by Daniel Black [ 2015-10-01 ]

Can we get rid of the assertion as step 1? It looks like it's still there in 10.1 at least.

Comment by Vicențiu Ciorbaru [ 2017-02-27 ]

Analyzing this further with the current 10.0.29 release.

I am able to pinpoint the place where the race condition leads to XtraDB reporting DB_DEADLOCK. The use case features a partitioned table with 2 partitions. XtraDB detects a deadlock while the UPDATE runs its QUICK_RANGE_SELECT, as seen in the backtrace above. The other concurrent query is an ANALYZE statement.

The other thread (running the ANALYZE statement) is executing the while loop in the following function:

static
int collect_statistics_for_index(THD *thd, TABLE *table, uint index)
{
  int rc= 0;
  KEY *key_info= &table->key_info[index];
  ha_rows rows= 0;
 
  DBUG_ENTER("collect_statistics_for_index");
 
  /* No statistics for FULLTEXT indexes. */
  if (key_info->flags & HA_FULLTEXT)
    DBUG_RETURN(rc);
 
  Index_prefix_calc index_prefix_calc(table, key_info);
 
  DEBUG_SYNC(table->in_use, "statistics_collection_start1");
  DEBUG_SYNC(table->in_use, "statistics_collection_start2");
 
  if (index_prefix_calc.is_single_comp_pk)
  {
    index_prefix_calc.get_avg_frequency();
    DBUG_RETURN(rc);
  }
 
  table->key_read= 1;
  table->file->extra(HA_EXTRA_KEYREAD);
 
  table->file->ha_index_init(index, TRUE);
  rc= table->file->ha_index_first(table->record[0]);
  while (rc != HA_ERR_END_OF_FILE)
  {
    if (thd->killed)
      break;
 
    if (rc)
      break;
    rows++;
    index_prefix_calc.add();
    rc= table->file->ha_index_next(table->record[0]);
  }
  table->key_read= 0;
  table->file->ha_index_end();
 
  rc= (rc == HA_ERR_END_OF_FILE && !thd->killed) ? 0 : 1;
 
  if (!rc)
    index_prefix_calc.get_avg_frequency();
 
  DBUG_RETURN(rc);
}

This code is executed after any engine-specific handler::analyze (XtraDB's in this case).

ANALYZE TABLE attempts to acquire a READ lock, while UPDATE naturally acquires a WRITE lock. Changing ANALYZE TABLE to acquire a WRITE lock fixes the problem.

My theory as to why the deadlock happens is the following:

1. In the while loop, ANALYZE gets a READ lock on Index P1 (from partition 1).
2. UPDATE gets a WRITE lock on Index P0 (from partition 0).
3. UPDATE needs a WRITE lock on Index P1 (from partition 1), blocked by ANALYZE.
4. ANALYZE needs a READ lock on Index P0 (from partition 0), blocked by UPDATE => deadlock.

Due to my limited understanding of XtraDB internals I am unable to verify this; however, there doesn't seem to be any other sequence of locks that would cause a deadlock.

The bottom line is that we need to handle the deadlock error within the quick select logic so that the initial protocol assertion is not fired.

The deadlock is expected for a partitioned table under the current lock system. To avoid it, either force ANALYZE to take a write lock, or make index locking atomic within the partition storage engine.

psergey thoughts?

Comment by Elena Stepanova [ 2022-10-21 ]

Not reproducible on 10.2+ (still reproducible on 10.1, but it's long EOL-ed).

Generated at Thu Feb 08 07:13:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.