[MDEV-32532] Assertion failure in ddl_log_increment_phase_no_lock upon partition operations with spider plugin loaded Created: 2023-10-20  Updated: 2023-11-30  Resolved: 2023-10-24

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Partitioning, Server, Storage Engine - Spider
Affects Version/s: 10.10
Fix Version/s: 10.10.7, 10.11.6, 11.0.4, 11.1.3

Type: Bug Priority: Blocker
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: regression

Issue Links:
Problem/Incident
is caused by MDEV-22979 "mysqld --bootstrap" / mysql_install_... Closed
Relates
relates to MDEV-32559 Move alter table statements in spider... Closed

 Description   

--source include/have_partition.inc
 
--let $restart_parameters= --plugin-load-add=ha_spider
--source include/restart_mysqld.inc
 
CREATE TABLE t (a INT) PARTITION BY HASH (a) PARTITIONS 2;
ALTER TABLE t REBUILD PARTITION p0;

10.10 d5e15424

mariadbd: /data/src/10.10/sql/ddl_log.cc:879: bool ddl_log_increment_phase_no_lock(uint): Assertion `0' failed.
231020 21:18:28 [ERROR] mysqld got signal 6 ;
 
#9  0x00007f5efb253df2 in __GI___assert_fail (assertion=0x558eb0416c20 "0", file=0x558eb0416160 "/data/src/10.10/sql/ddl_log.cc", line=879, function=0x558eb0417160 "bool ddl_log_increment_phase_no_lock(uint)") at ./assert/assert.c:101
#10 0x0000558eae6123fa in ddl_log_increment_phase_no_lock (entry_pos=1) at /data/src/10.10/sql/ddl_log.cc:879
#11 0x0000558eae61d1ab in ddl_log_increment_phase (entry_pos=1) at /data/src/10.10/sql/ddl_log.cc:2594
#12 0x0000558eaf2f5477 in ha_partition::rename_partitions (this=0x61d0004c36b8, path=0x7f5ee7b869d0 "./test/t") at /data/src/10.10/sql/ha_partition.cc:1183
#13 0x0000558eaebb9c13 in handler::ha_rename_partitions (this=0x61d0004c36b8, path=0x7f5ee7b869d0 "./test/t") at /data/src/10.10/sql/handler.cc:5608
#14 0x0000558eae311174 in mysql_rename_partitions (lpt=0x7f5ee7b86e60) at /data/src/10.10/sql/sql_partition.cc:6204
#15 0x0000558eae31cf03 in fast_alter_partition_table (thd=0x62c000210218, table=0x619000232d98, alter_info=0x7f5ee7b89f70, alter_ctx=0x7f5ee7b890f0, create_info=0x7f5ee7b8a120, table_list=0x629000316358) at /data/src/10.10/sql/sql_partition.cc:7863
#16 0x0000558eae5b7143 in mysql_alter_table (thd=0x62c000210218, new_db=0x62c000214f70, new_name=0x62c0002153c0, create_info=0x7f5ee7b8a120, table_list=0x629000316358, recreate_info=0x7f5ee7b89ed0, alter_info=0x7f5ee7b89f70, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/src/10.10/sql/sql_table.cc:10596
#17 0x0000558eae77f19d in Sql_cmd_alter_table::execute (this=0x629000316aa8, thd=0x62c000210218) at /data/src/10.10/sql/sql_alter.cc:685
#18 0x0000558eae2ca4fd in mysql_execute_command (thd=0x62c000210218, is_called_from_prepared_stmt=false) at /data/src/10.10/sql/sql_parse.cc:6026
#19 0x0000558eae2d7898 in mysql_parse (thd=0x62c000210218, rawbuf=0x629000316238 "ALTER TABLE t REBUILD PARTITION p0", length=34, parser_state=0x7f5ee7b8b9f0) at /data/src/10.10/sql/sql_parse.cc:8054
#20 0x0000558eae2ad4f9 in dispatch_command (command=COM_QUERY, thd=0x62c000210218, packet=0x62900030c219 "ALTER TABLE t REBUILD PARTITION p0", packet_length=34, blocking=true) at /data/src/10.10/sql/sql_parse.cc:1894
#21 0x0000558eae2aa24a in do_command (thd=0x62c000210218, blocking=true) at /data/src/10.10/sql/sql_parse.cc:1407
#22 0x0000558eae7603ea in do_handle_one_connection (connect=0x608000007138, put_in_cache=true) at /data/src/10.10/sql/sql_connect.cc:1416
#23 0x0000558eae75fdab in handle_one_connection (arg=0x608000007138) at /data/src/10.10/sql/sql_connect.cc:1318
#24 0x0000558eaf358c14 in pfs_spawn_thread (arg=0x617000036818) at /data/src/10.10/storage/perfschema/pfs.cc:2201
#25 0x00007f5efb2a7fd4 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#26 0x00007f5efb3285bc in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

The failure started happening after this commit in 10.10:

commit a60cf9c7ae4ae012443a6a91db4ee265fa6e4403
Author: Yuchen Pei
Date:   Thu Oct 5 16:19:10 2023 +1100
 
    MDEV-22979 MDEV-27233 MDEV-28218 Fixing spider init bugs



 Comments   
Comment by Elena Stepanova [ 2023-10-21 ]

It is also reproducible on bb-10.10-all-builders c9e5d725bb8c0d8eb28caf6bc766e946fc0cf8d7

Comment by Yuchen Pei [ 2023-10-22 ]

The server restart seems to be needed for reproduction.

If I do without restart then it passes:

--disable_query_log
--disable_result_log
--source ../../t/test_init.inc
--enable_result_log
--enable_query_log
 
CREATE TABLE t (a INT) PARTITION BY HASH (a) PARTITIONS 2;
ALTER TABLE t REBUILD PARTITION p0;
 
--disable_query_log
--disable_result_log
--source ../../t/test_deinit.inc
--enable_result_log
--enable_query_log

Comment by Yuchen Pei [ 2023-10-23 ]

A mid-low level explanation of what happened:

A low level function for getting a free entry which will be later
written to the log:

static bool ddl_log_get_free_entry(DDL_LOG_MEMORY_ENTRY **active_entry)
{
DDL_LOG_MEMORY_ENTRY *used_entry;
DDL_LOG_MEMORY_ENTRY *first_used= global_ddl_log.first_used;
DBUG_ENTER("ddl_log_get_free_entry");
 
if (global_ddl_log.first_free == NULL) // (A)
{
//  [... 9 lines elided]
  global_ddl_log.num_entries++; // (B)
  used_entry->entry_pos= global_ddl_log.num_entries; // (C)
}
else
{
  used_entry= global_ddl_log.first_free; // (D)
  global_ddl_log.first_free= used_entry->next_log_entry; // (E)
}
//  [... 12 lines elided]
}

A higher level function that uses ddl_log_get_free_entry() in many
occasions:

uint fast_alter_partition_table(THD *thd, TABLE *table,
                              Alter_info *alter_info,
                              Alter_table_ctx *alter_ctx,
                              HA_CREATE_INFO *create_info,
                              TABLE_LIST *table_list)
{
//  [... 411 lines elided]
  if (write_log_drop_shadow_frm(lpt) || // (1)
//  [... 3 lines elided]
      write_log_add_change_partition(lpt) || // (2)
//  [... 7 lines elided]
      write_log_final_change_partition(lpt) || // (3)
//  [... 10 lines elided]
      mysql_rename_partitions(lpt) || // (4)
//  [... 24 lines elided]
}

in (1): first_free (!= NULL (A)) is assigned to
part_info->execute_entry (D). it has entry_pos 1

in (2): first_free becomes NULL (E)

in (3):

  • part_elem->log_entry is allocated because first_free is NULL (A),
    with its entry_pos to be the new size of the ddl_log, which is
    also 1 (B) (C)
  • part_elem->log_entry is written to the log file at 1st block
  • part_info->execute_entry is written to the log file at 1st block
    (overwrite!)

in (4):

  • expected the (part_elem->log_entry->entry_pos)th block to be a log
    entry, but found an execute entry instead. Assertion failure:

static bool ddl_log_increment_phase_no_lock(uint entry_pos)
{
  uchar *file_entry_buf= global_ddl_log.file_entry_buf;
  DBUG_ENTER("ddl_log_increment_phase_no_lock");
 
  mysql_mutex_assert_owner(&LOCK_gdl);
  if (!read_ddl_log_file_entry(entry_pos))
  {
    ddl_log_entry_code  code=   ((ddl_log_entry_code)
                                 file_entry_buf[DDL_LOG_ENTRY_TYPE_POS]);
    ddl_log_action_code action= ((ddl_log_action_code)
                                 file_entry_buf[DDL_LOG_ACTION_TYPE_POS]);
 
    if (code == DDL_LOG_ENTRY_CODE && action < (uint) DDL_LOG_LAST_ACTION)
    {
//  [... 14 lines elided]
    }
    else
    {
      /*
        Trying to deativate an execute entry or already deactive entry.
        This should not happen
      ,*/
      DBUG_ASSERT(0);
//  [... 8 lines elided]
}

FYI monty

Comment by Michael Widenius [ 2023-10-23 ]

The reason for the bug is that spider plugin does some ALTER TABLE commands before recovery is done.
I don't think that is a good idea. It would be better to first fix old tables before creating new tables.
For example, as part of 'ha_signal_ddl_recovery_done().

Anyway, the problem is that 'first_free' was not reset (as this was never assumed to be change before recovery). I have now fixed that for 10.10.

10.6 does not have this problem

Comment by Michael Widenius [ 2023-10-24 ]

Pushed to 10.10

Generated at Thu Feb 08 10:32:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.