Details

    Description

      The InnoDB table file is 144 GB; the #sql-alter-2964-a.ibd file is 56 GB in size, which might very well be its final size; about 700 million rows had been deleted from the table. There are currently 174 GB free on the data device (with both files in place).

      I received the following crash:

      220318  9:01:01 [ERROR] mysqld got exception 0xc0000005 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.6.7-MariaDB
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=28
      max_threads=65537
      thread_count=12
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 142742881 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x270ab8fc038
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      server.dll!row_log_table_apply_op()[row0log.cc:2475]
      server.dll!row_log_table_apply_ops()[row0log.cc:3066]
      server.dll!row_log_table_apply()[row0log.cc:3174]
      server.dll!ha_innobase::inplace_alter_table()[handler0alter.cc:8485]
      server.dll!mysql_inplace_alter_table()[sql_table.cc:7409]
      server.dll!mysql_alter_table()[sql_table.cc:10266]
      server.dll!mysql_recreate_table()[sql_table.cc:11319]
      server.dll!admin_recreate_table()[sql_admin.cc:67]
      server.dll!mysql_admin_table()[sql_admin.cc:1130]
      server.dll!Sql_cmd_optimize_table::execute()[sql_admin.cc:1515]
      server.dll!mysql_execute_command()[sql_parse.cc:6008]
      server.dll!mysql_parse()[sql_parse.cc:8033]
      server.dll!dispatch_command()[sql_parse.cc:1898]
      server.dll!do_command()[sql_parse.cc:1404]
      server.dll!threadpool_process_request()[threadpool_common.cc:403]
      server.dll!tp_callback()[threadpool_common.cc:203]
      KERNEL32.DLL!QueryPerformanceCounter()
      ntdll.dll!TpReleaseWork()
      ntdll.dll!RtlInitializeResource()
      KERNEL32.DLL!BaseThreadInitThunk()
      ntdll.dll!RtlUserThreadStart()
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x270a8fd3480): OPTIMIZE TABLE node_log
      Connection ID (thread ID): 10
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
       
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file at C:\ProgramData\MariaDB 10.6\
      Minidump written to C:\ProgramData\MariaDB 10.6\mysqld.dmp
      

      Attachments

        Issue Links

          Activity

            alice Alice Sherepa added a comment -

            probably the same bug as MDEV-26198

            alice Alice Sherepa added a comment - probably the same bug as MDEV-26198

            The latest trace for MDEV-26198 involves a table in ROW_FORMAT=REDUNDANT on which columns had been previously instantly added.
            What does SHOW TABLE STATUS display for this table?

            marko Marko Mäkelä added a comment - The latest trace for MDEV-26198 involves a table in ROW_FORMAT=REDUNDANT on which columns had been previously instantly added. What does SHOW TABLE STATUS display for this table?
            axel.doerfler Axel Dörfler added a comment - - edited

            The server has garbled its boot disk to a point where it could not be repaired. It has now been reset from a backup. While the server uses a RAID-1 and ECC memory, and did not log anything suspicious, there is a good chance that this is a hardware error. And if that's true, the crash could well be caused by that, too.

            I have now resurrected the server's predecessor, and a SHOW TABLE STATUS shows this:

            Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment Create_time Update_time Check_time Collation Checksum Create_options Comment
            node_log InnoDB 10 Compact 449973282 186 84136689664 0 60923265024 36799774720 1084647359 2017-07-31 13:24:01 \N \N utf8_bin \N    

            But this was before I deleted more than half the table (its oldest entries).

            axel.doerfler Axel Dörfler added a comment - - edited The server has garbled its boot disk to a point where it could not be repaired. It has now been reset from a backup. While the server uses a RAID-1 and ECC memory, and did not log anything suspicious, there is a good chance that this is a hardware error. And if that's true, the crash could well be caused by that, too. I have now resurrected the server's predecessor, and a SHOW TABLE STATUS shows this: Name Engine Version Row_format Rows Avg_row_length Data_length Max_data_length Index_length Data_free Auto_increment Create_time Update_time Check_time Collation Checksum Create_options Comment node_log InnoDB 10 Compact 449973282 186 84136689664 0 60923265024 36799774720 1084647359 2017-07-31 13:24:01 \N \N utf8_bin \N     But this was before I deleted more than half the table (its oldest entries).

            I will try the same procedure on the old server, and see how it goes there...

            axel.doerfler Axel Dörfler added a comment - I will try the same procedure on the old server, and see how it goes there...

            MDEV-18361 may be specific to ROW_FORMAT=REDUNDANT, and I was unable to reproduce it.

            Both in 10.6.7 and 10.11.4, the crash occurs here in row_log_table_apply_op():

                    switch (*mrec++) {
                    default:
                            ut_ad(0);
                            *error = DB_CORRUPTION;
                            return(NULL);
                    case ROW_T_INSERT:
                            extra_size = *mrec++;
            

            I now realize that we do not check for mrec > mrec_end until some steps later. In another case we do have a proper check for end-of-buffer:

                    case ROW_T_DELETE:
                            /* 1 (extra_size) + at least 1 (payload) */
                            if (mrec + 2 >= mrec_end) {
                                    return(NULL);
                            }
            

            Also in ROW_T_UPDATE we are potentially reading after the end of the buffer. In row_log_apply_op() (which is used during online CREATE INDEX) we have proper overflow checks in place.

            marko Marko Mäkelä added a comment - MDEV-18361 may be specific to ROW_FORMAT=REDUNDANT , and I was unable to reproduce it. Both in 10.6.7 and 10.11.4, the crash occurs here in row_log_table_apply_op() : switch (*mrec++) { default : ut_ad(0); *error = DB_CORRUPTION; return (NULL); case ROW_T_INSERT: extra_size = *mrec++; I now realize that we do not check for mrec > mrec_end until some steps later. In another case we do have a proper check for end-of-buffer: case ROW_T_DELETE: /* 1 (extra_size) + at least 1 (payload) */ if (mrec + 2 >= mrec_end) { return (NULL); } Also in ROW_T_UPDATE we are potentially reading after the end of the buffer. In row_log_apply_op() (which is used during online CREATE INDEX ) we have proper overflow checks in place.

            I failed to notice that there actually is a check right before the switch statement:

            	/* 3 = 1 (op type) + 1 (extra_size) + at least 1 byte payload */
            	if (mrec + 3 >= mrec_end) {
            		return(NULL);
            	}
            

            So, the check after case ROW_T_DELETE: is actually redundant, and the reason for this crash remains a mystery until we have a reproducible test case.

            marko Marko Mäkelä added a comment - I failed to notice that there actually is a check right before the switch statement: /* 3 = 1 (op type) + 1 (extra_size) + at least 1 byte payload */ if (mrec + 3 >= mrec_end) { return (NULL); } So, the check after case ROW_T_DELETE: is actually redundant, and the reason for this crash remains a mystery until we have a reproducible test case.

            The check that I quoted had been removed as part of MDEV-515 in MariaDB Server 10.6.0, related to the introduction of a ROW_T_EMPTY record, which was later removed in MDEV-15250.

            I think that we just need to put back the check right before the switch(*mrec++) statement, like this:

            diff --git a/storage/innobase/row/row0log.cc b/storage/innobase/row/row0log.cc
            index 6842a093b5c..e1ce8161a02 100644
            --- a/storage/innobase/row/row0log.cc
            +++ b/storage/innobase/row/row0log.cc
            @@ -2167,6 +2167,11 @@ row_log_table_apply_op(
             
             	*error = DB_SUCCESS;
             
            +	/* 3 = 1 (op type) + 1 (extra_size) + at least 1 byte payload */
            +	if (mrec + 3 >= mrec_end) {
            +		return(NULL);
            +	}
            +
             	const bool is_instant = log->is_instant(dup->index);
             	const mrec_t* const mrec_start = mrec;
             
            @@ -2214,11 +2219,6 @@ row_log_table_apply_op(
             		break;
             
             	case ROW_T_DELETE:
            -		/* 1 (extra_size) + at least 1 (payload) */
            -		if (mrec + 2 >= mrec_end) {
            -			return(NULL);
            -		}
            -
             		extra_size = *mrec++;
             		ut_ad(mrec < mrec_end);
             
            

            It would be great to have a test for this in our regression test suite.

            marko Marko Mäkelä added a comment - The check that I quoted had been removed as part of MDEV-515 in MariaDB Server 10.6.0, related to the introduction of a ROW_T_EMPTY record, which was later removed in MDEV-15250 . I think that we just need to put back the check right before the switch(*mrec++) statement, like this: diff --git a/storage/innobase/row/row0log.cc b/storage/innobase/row/row0log.cc index 6842a093b5c..e1ce8161a02 100644 --- a/storage/innobase/row/row0log.cc +++ b/storage/innobase/row/row0log.cc @@ -2167,6 +2167,11 @@ row_log_table_apply_op( *error = DB_SUCCESS; + /* 3 = 1 (op type) + 1 (extra_size) + at least 1 byte payload */ + if (mrec + 3 >= mrec_end) { + return(NULL); + } + const bool is_instant = log->is_instant(dup->index); const mrec_t* const mrec_start = mrec; @@ -2214,11 +2219,6 @@ row_log_table_apply_op( break; case ROW_T_DELETE: - /* 1 (extra_size) + at least 1 (payload) */ - if (mrec + 2 >= mrec_end) { - return(NULL); - } - extra_size = *mrec++; ut_ad(mrec < mrec_end); It would be great to have a test for this in our regression test suite.

            People

              thiru Thirunarayanan Balathandayuthapani
              axel.doerfler Axel Dörfler
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.