MariaDB Server: MDEV-36122

Assertion failure ctx0->old_table->get_ref_count() == 1 in ha_innobase::commit_inplace_alter_table()

Details

    Description

      mleich reported the following assertion failure while testing MDEV-35000:

      10.6-MDEV-35000 5598e0230404b89861e1577ea3549a1372a7b0a2

      #12 0x000078cc5023b507 in __assert_fail (assertion=0x63698d8c7830 "ctx0->old_table->get_ref_count() == 1", file=0x63698d8c3b38 "/data/Server/10.6-MDEV-35000/storage/innobase/handler/handler0alter.cc", line=0x2e37, 
          function=0x63698d8c7560 "virtual bool ha_innobase::commit_inplace_alter_table(TABLE*, Alter_inplace_info*, bool)") at ./assert/assert.c:103
      #13 0x000063698d4721cd in ha_innobase::commit_inplace_alter_table (this=<optimized out>, altered_table=0x78cc30bfb230, ha_alter_info=0x78cc30bfb170, commit=<optimized out>) at /data/Server/10.6-MDEV-35000/storage/innobase/handler/handler0alter.cc:11831
      #14 0x000063698d1f2151 in handler::ha_commit_inplace_alter_table (this=0x78cbb00eaa48, altered_table=altered_table@entry=0x78cc30bfb230, ha_alter_info=ha_alter_info@entry=0x78cc30bfb170, commit=commit@entry=0x1) at /data/Server/10.6-MDEV-35000/sql/handler.cc:5392
      #15 0x000063698d03f6d6 in mysql_inplace_alter_table (thd=thd@entry=0x78cbb0000d58, table_list=0x78cbb0013690, table=table@entry=0x78cbb00fdac8, altered_table=altered_table@entry=0x78cc30bfb230, ha_alter_info=ha_alter_info@entry=0x78cc30bfb170, target_mdl_request=target_mdl_request@entry=0x78cc30bfbaf0, 
          ddl_log_state=0x78cc30bfb130, trigger_param=0x78cc30bfb660, alter_ctx=0x78cc30bfc6b0) at /data/Server/10.6-MDEV-35000/sql/sql_table.cc:7805
      #16 0x000063698d04fbb9 in mysql_alter_table (thd=thd@entry=0x78cbb0000d58, new_db=new_db@entry=0x78cbb00057f8, new_name=<optimized out>, create_info=create_info@entry=0x78cc30bfd550, table_list=<optimized out>, table_list@entry=0x78cbb0013690, recreate_info=recreate_info@entry=0x78cc30bfd3a0, 
          alter_info=<optimized out>, order_num=<optimized out>, order=<optimized out>, ignore=<optimized out>, if_exists=<optimized out>) at /data/Server/10.6-MDEV-35000/sql/sql_table.cc:10857
      #17 0x000063698d0bd364 in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x78cbb0000d58) at /data/Server/10.6-MDEV-35000/sql/sql_alter.cc:675
      #18 0x000063698cfa46f2 in mysql_execute_command (thd=thd@entry=0x78cbb0000d58, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=0x0) at /data/Server/10.6-MDEV-35000/sql/sql_parse.cc:6167
      #19 0x000063698cfa521b in mysql_parse (thd=thd@entry=0x78cbb0000d58, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x78cc30bfe410) at /data/Server/10.6-MDEV-35000/sql/sql_parse.cc:8209
      #20 0x000063698cfa6887 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x78cbb0000d58, packet=packet@entry=0x78cbb000af59 " ALTER TABLE t1 DROP KEY `Marvão_idx3`, ALGORITHM = NOCOPY, LOCK = DEFAULT  /* E_R Thread7 QNO 2160 CON_ID 93 */ ", packet_length=packet_length@entry=0x72, 
          blocking=blocking@entry=0x1) at /data/Server/10.6-MDEV-35000/sql/sql_parse.cc:1908
      

      This thread is holding MDL_EXCLUSIVE on the table name. There is an unexpected table reference on the table that is held by another thread:

      10.6-MDEV-35000 5598e0230404b89861e1577ea3549a1372a7b0a2

      Thread 18 (Thread 0x78cc22a006c0 (LWP 897946)):
      #0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
      #1  0x000063698d5cd67c in srw_mutex_impl<false>::wake (this=this@entry=0x63698e1445c0 <dict_sys+64>) at /data/Server/10.6-MDEV-35000/storage/innobase/sync/srw_lock.cc:255
      #2  0x000063698d4ac82c in srw_mutex_impl<false>::wr_unlock (this=this@entry=0x63698e1445c0 <dict_sys+64>) at /data/Server/10.6-MDEV-35000/storage/innobase/include/srw_lock.h:162
      #3  0x000063698d5cd89d in ssux_lock_impl<false>::wr_unlock (this=this@entry=0x63698e1445c0 <dict_sys+64>) at /data/Server/10.6-MDEV-35000/storage/innobase/include/srw_lock.h:331
      #4  0x000063698d5ccd1f in srw_lock_debug::wr_unlock (this=0x63698e1445c0 <dict_sys+64>) at /data/Server/10.6-MDEV-35000/storage/innobase/sync/srw_lock.cc:669
      #5  0x000063698d5d14fb in dict_sys_t::unlock (this=<optimized out>) at /data/Server/10.6-MDEV-35000/storage/innobase/include/dict0dict.h:1480
      #6  trx_purge_table_open (table_id=0x14, mdl_context=mdl_context@entry=0x63698f324530, mdl=mdl@entry=0x78cc229ffab8) at /data/Server/10.6-MDEV-35000/storage/innobase/trx/trx0purge.cc:1152
      #7  0x000063698d5d3c43 in trx_purge_attach_undo_recs (thd=thd@entry=0x63698f3243d8, n_work_items=n_work_items@entry=0x78cc229ffba8) at /data/Server/10.6-MDEV-35000/storage/innobase/trx/trx0purge.cc:1263
      #8  0x000063698d5d4178 in trx_purge (n_tasks=<optimized out>, n_tasks@entry=0x4, history_size=0x3) at /data/Server/10.6-MDEV-35000/storage/innobase/trx/trx0purge.cc:1381
      #9  0x000063698d5c4b13 in purge_coordinator_state::do_purge (this=this@entry=0x63698ea44420 <purge_state>) at /data/Server/10.6-MDEV-35000/storage/innobase/srv/srv0srv.cc:1632
      #10 0x000063698d5c45d8 in purge_coordinator_callback () at /data/Server/10.6-MDEV-35000/storage/innobase/srv/srv0srv.cc:1716
      

      This thread is executing the following code while not yet holding a metadata lock (MDL) on the table name:

      trx_purge_table_open()

          table= dict_load_table_on_id(table_id, DICT_ERR_IGNORE_FK_NOKEY);
          if (table)
            table->acquire();
          dict_sys.unlock();
      

      This code was last refactored in MDEV-32050, but as far as I can tell, this failure may have been introduced as early as MDEV-16678. It seems to me that we must refactor the logic so that no table handle is held during the MDL acquisition; the table may then need to be looked up again after a successful MDL acquisition. A hazard pointer technique (similar to buf_pool.lru_hp) could allow us to avoid redundant table lookups when the table is not being evicted or dropped during the MDL acquisition.
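      The "look up again after MDL" idea can be sketched with a small stand-alone model. Everything here (Table, Dictionary, MdlHooks, open_table_for_purge) is a hypothetical stand-in and not InnoDB code; the sketch only shows the ordering: copy the name under the dictionary latch, acquire the MDL with no reference held, then repeat the lookup and take the reference. It does not attempt the hazard pointer optimization.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for illustration; these are not the InnoDB types.
struct Table {
  uint64_t id;
  std::string name;
  int ref_count;
};

struct Dictionary {
  std::mutex latch;  // plays the role of dict_sys.lock()/unlock()
  std::unordered_map<uint64_t, std::unique_ptr<Table>> tables;

  void add(uint64_t id, std::string name) {
    std::lock_guard<std::mutex> g(latch);
    tables[id].reset(new Table{id, std::move(name), 0});
  }
  // Caller must hold latch.
  Table *find(uint64_t id) {
    auto it = tables.find(id);
    return it == tables.end() ? nullptr : it->second.get();
  }
};

// Hypothetical MDL hooks; acquire() may block for a long time.
struct MdlHooks {
  std::function<void(const std::string &)> acquire;
  std::function<void(const std::string &)> release;
};

Table *open_table_for_purge(Dictionary &dict, uint64_t id,
                            const MdlHooks &mdl) {
  std::string name;
  {
    std::lock_guard<std::mutex> g(dict.latch);
    Table *t = dict.find(id);
    if (!t) return nullptr;
    name = t->name;  // copy the name only; do NOT take a reference yet
  }

  mdl.acquire(name);  // no table reference is held while waiting here

  Table *t;
  {
    std::lock_guard<std::mutex> g(dict.latch);
    t = dict.find(id);  // second lookup: the table may be gone or renamed
    if (t && t->name == name)
      ++t->ref_count;  // taken only while we hold the MDL on this name
    else
      t = nullptr;
  }
  if (!t) mdl.release(name);  // nothing to purge; give the MDL back
  assert(!t || t->ref_count > 0);
  return t;
}
```

      In this model, a DDL thread that holds an exclusive MDL never observes a reference taken by a caller that is still waiting for its own MDL, at the cost of a second dictionary lookup per table.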

      Attachments

        1. TBR-1569-MDEV-36122.cfg
          47 kB
          Matthias Leich
        2. TBR-1569-MDEV-36122.yy
          0.7 kB
          Matthias Leich
        3. TBR-1569-MDEV-36122-1.cfg
          47 kB
          Matthias Leich
        4. TBR-1569-MDEV-36122-1.yy
          0.7 kB
          Matthias Leich

          Activity

            marko Marko Mäkelä added a comment - I figured out a possible workaround for the lack of MDL related to ALTER IGNORE TABLE or ALTER TABLE…ALGORITHM=COPY: In any DDL operation that expects other threads not to hold any table references, wait for the current purge batch to end in case unexpected references exist. I will try to implement this.
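            A minimal sketch of that workaround, under stated assumptions: PurgeBatchGate, Table and ddl_ensure_sole_reference are hypothetical names, not InnoDB symbols, and the model assumes that purge drops all of its table references before it signals the end of a batch. The DDL thread waits only when it sees more than its own reference.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical model of the purge coordinator; not the InnoDB purge_sys.
class PurgeBatchGate {
  std::mutex m;
  std::condition_variable cv;
  bool batch_running = false;

public:
  void batch_start() {
    std::lock_guard<std::mutex> g(m);
    batch_running = true;
  }
  // Assumption: called only after purge released all table references.
  void batch_end() {
    {
      std::lock_guard<std::mutex> g(m);
      assert(batch_running);
      batch_running = false;
    }
    cv.notify_all();
  }
  // Blocks until no purge batch is running.
  void wait_for_batch_end() {
    std::unique_lock<std::mutex> l(m);
    cv.wait(l, [this] { return !batch_running; });
  }
};

struct Table {
  std::atomic<int> ref_count{1};  // 1 = the DDL thread's own reference
};

// Called by a DDL thread that expects to be the only reference holder.
// Returns true if, after possibly waiting, only its own reference remains.
bool ddl_ensure_sole_reference(Table &table, PurgeBatchGate &purge) {
  if (table.ref_count.load() > 1)
    purge.wait_for_batch_end();  // unexpected reference: let the batch finish
  return table.ref_count.load() == 1;
}
```

            The common case (no extra reference) costs one atomic load; the wait is taken only when the race has actually occurred.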

            marko Marko Mäkelä added a comment - It seems that fixing the bug in ha_innobase::rename_table() in 10.6 in the same way as in 10.11 would require a fix of MDEV-35000 in 10.6. There may be no issue with ha_innobase::rename_table() in 10.6, but I cannot be certain of it.

            marko Marko Mäkelä added a comment - Based on initial test results from mleich, the workaround for the missing MDL needs to be revised. In my initial revision, it is controlled by purge_sys.m_active, which can actually be cleared while the purge subsystem is still holding references to tables.
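            The flaw described here can be illustrated with a toy model. PurgeRefTracker and its members are hypothetical and do not correspond to purge_sys; the source does not say how the revised fix works, so the reference-count predicate below is only one possible alternative, shown to contrast it with an "active" flag that may be cleared too early.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical model; names do not correspond to InnoDB members.
class PurgeRefTracker {
  std::mutex m;
  std::condition_variable cv;
  bool active = false;       // an "is purge active" style flag
  unsigned tables_held = 0;  // table references currently held by purge

public:
  void set_active(bool a) {
    std::lock_guard<std::mutex> g(m);
    active = a;
  }
  void table_acquired() {
    std::lock_guard<std::mutex> g(m);
    ++tables_held;
  }
  void table_released() {
    {
      std::lock_guard<std::mutex> g(m);
      assert(tables_held > 0);
      --tables_held;
    }
    cv.notify_all();
  }
  // Unsafe predicate: the flag can be cleared while references remain.
  bool idle_by_flag() {
    std::lock_guard<std::mutex> g(m);
    return !active;
  }
  // Predicate tied directly to what DDL cares about.
  bool idle_by_refs() {
    std::lock_guard<std::mutex> g(m);
    return tables_held == 0;
  }
  // Blocks until purge holds no table references at all.
  void wait_until_no_refs() {
    std::unique_lock<std::mutex> l(m);
    cv.wait(l, [this] { return tables_held == 0; });
  }
};
```

            A waiter that checks only idle_by_flag() can proceed while tables_held is still nonzero, which is the kind of early wake-up the comment above describes.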

            marko Marko Mäkelä added a comment - While it has not come up in the stress tests so far, I believe that ha_innobase::truncate() also needs to be adjusted. The scenario would be that an ALTER IGNORE TABLE on a nonempty table is followed by a TRUNCATE with an unlikely timing. I have revised my fixes accordingly.

            debarun Debarun Banerjee added a comment - Both 10.6 and 10.11 versions look fine to me. Added some notes in https://github.com/MariaDB/server/pull/3856.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä