Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26450

Corruption due to innodb_undo_log_truncate

Details

    Description

      origin/10.6 b0d38448a176dc990554c7dd5b184c3ebe64be8b 2021-08-19T16:35:44+05:30
       
      Workflow:
      1. Bootstrap + start DB server with
           --mysqld=--innodb_undo_tablespaces=3 --mysqld=--innodb_undo_log_truncate=ON
           in command line.
      2. One sessions creates some initial data and disconnects.
      3. Several sessions run for a few minutes concurrent DML on the tables created in 2.
           and disconnect.
      4. Dump data, shutdown of DB server
      5. Restart of DB server
      6. Run SHOW VARIABLES
       
      Most frequent seen failure:
      mysqld: /data/Server/10.6Z/storage/innobase/trx/trx0purge.cc:99: bool TrxUndoRsegsIterator::set_next(): Assertion `purge_sys.purge_queue.empty() || purge_sys.purge_queue.top() != m_rsegs' failed.
       
      Less frequent seen
      # 2021-08-20T12:44:32 [2934208] | 2021-08-20 12:44:26 5 [ERROR] InnoDB: PAGE_MAX_TRX_ID out of bounds: 8205, 4
      # 2021-08-20T12:44:32 [2934208] | 2021-08-20 12:44:26 5 [ERROR] InnoDB: Apparent corruption in space 8 page 280 of index col_longtext_utf8_key of table test.table100_innodb_int_autoinc
      ...
       mysqld: /data/Server/10.6Z/storage/innobase/trx/trx0sys.cc:62: static void ReadViewBase::check_trx_id_sanity(trx_id_t, const table_name_t&): Assertion `0' failed.
      ---------------
      # 2021-08-20T12:34:35 [2762582] | 2021-08-20 12:33:47 16 [Note] InnoDB: trying to read page [page id: space=3, page number=102] in nonexisting or being-dropped tablespace
      ...  many repetitions of the line above
      # 2021-08-20T12:34:35 [2762582] | 2021-08-20 12:33:47 16 [ERROR] [FATAL] InnoDB: Unable to read page [page id: space=3, page number=102] into the buffer pool after 100. The most probable cause of this error may be that the table has been corrupted. See https://mariadb.com/kb/en/library/innodb-recovery-modes/
      # 2021-08-20T12:34:35 [2762582] | 210820 12:33:47 [ERROR] mysqld got signal 6 ;
      ---------------
      Workflow like above including
      7. CHECK TABLE ...
      # 2021-08-20T16:24:28 [936058] | [rr 990542 12500]2021-08-20 16:23:51 5 [Warning] InnoDB: A transaction id in a record of table `test`.`table100_innodb_int_autoinc` is newer than the system-wide maximum.
      mysqld: /data/Server/10.6Z/storage/innobase/trx/trx0sys.cc:62: static void ReadViewBase::check_trx_id_sanity(trx_id_t, const table_name_t&): Assertion `0' failed.
      -----------------
      [ERROR] [FATAL] InnoDB: Trying to read 4096 bytes at 2097152 outside the bounds of the file: .//undo001
       
       
      RQG
      ====
      git clone https://github.com/mleich1/rqg --branch experimental RQG
       
      perl rqg.pl \
      --grammar=conf/engines/many_indexes.yy \
      --gendata=conf/engines/many_indexes.zz \
      --reporters=RestartConsistencyRR \
      --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--net_read_timeout=30 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--lock-wait-timeout=86400 \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --no-mask \
      --queries=10000000 \
      --seed=random \
      --reporters=Backtrace \
      --reporters=ErrorLog \
      --reporters=Deadlock1 \
      --validators=None \
      --mysqld=--log_output=none \
      --mysqld=--log_bin_trust_function_creators=1 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --engine=InnoDB \
      --restart_timeout=240 \
      --mysqld=--plugin-load-add=file_key_management.so \
      --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      --duration=150 \
      --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      --mysqld=--innodb_stats_persistent=on \
      --mysqld=--innodb_adaptive_hash_index=on \
      --mysqld=--log-bin \
      --mysqld=--sync-binlog=1 \
      --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
      --mysqld=--loose-max-statement-time=30 \
      --threads=9 \
      --mysqld=--innodb_use_native_aio=1 \
      --mysqld=--innodb_undo_tablespaces=3 \
      --mysqld=--innodb_undo_log_truncate=ON \
      --mysqld=--innodb_page_size=16K \
      --mysqld=--innodb-buffer-pool-size=8M \
      --no_mask \
      --workdir=<local settings>
      --vardir=<local settings>
      --mtr-build-thread=<local settings>
      --basedir1=<local settings>
      --script_debug=_nix_
       
      Observations:
      1. innodb-buffer-pool-size=8M seems to be extreme small for innodb_page_size=16K.
          Raising innodb-buffer-pool-size to 32M does not let the failures disappear.
      2. In case the server finally shut down was running under 'rr' than the problems
           did never show up.
      3. After setting  innodb_undo_log_truncate=off the failures disappeared.
      

      Attachments

        Issue Links

          Activity

            After I implemented a new member function mtr_t::commit_shrink(fil_space_t&) that performs the steps like I envisioned above, 95Ă—100 rounds of a variant of the innodb.undo_truncate test passed without problems, and no undo pages were discarded by buf_flush_discard_page().

            Even though I did not repeat any crash in 10.2, I am now convinced that undo tablespace truncation may have caused corruption ever since MDEV-13564 was introduced in MariaDB 10.2.19 and 10.3.10. Before that, a separate log file for undo tablespace truncation was being used.

            marko Marko Mäkelä added a comment - After I implemented a new member function mtr_t::commit_shrink(fil_space_t&) that performs the steps like I envisioned above, 95×100 rounds of a variant of the innodb.undo_truncate test passed without problems, and no undo pages were discarded by buf_flush_discard_page() . Even though I did not repeat any crash in 10.2, I am now convinced that undo tablespace truncation may have caused corruption ever since MDEV-13564 was introduced in MariaDB 10.2.19 and 10.3.10. Before that, a separate log file for undo tablespace truncation was being used.

            origin/bb-10.2-MDEV-26450 565c38c994a3584dff82eec6e2deccbc93e57555 2021-09-21T18:02:04+03:00
            performed well in RQG testing (Usage of foreign keys disabled).
             
            origin/bb-10.5-MDEV-26450 2755e86a53cd89729d0cb54424fe83074dccd8ab 2021-09-22T16:40:47+03:00
            did not look bad but I hit twice the same assert.
            sdp:/data/Results/1632322073/Marko/dev/shm/vardir/1632322073/164/1/rr
             
            _RR_TRACE_DIR="." rr replay --mark-stdio mysqld-1     # Fate till SIGKILL  
            _RR_TRACE_DIR="." rr replay --mark-stdio
            # Restart with success but crash during checks
            # 2021-09-22T11:29:28 [74467] Verifying table: `test`.`t6`
            # 2021-09-22T11:29:28 [74467] INFO: Reporter 'CrashRecovery1': Executing CHECK TABLE `test`.`t6` EXTENDED.
            # 2021-09-22T11:29:28 [74467] INFO: Reporter 'CrashRecovery1': Executing ANALYZE TABLE `test`.`t6`.
            # 2021-09-22T11:29:28 [74467] INFO: Reporter 'CrashRecovery1': Executing OPTIMIZE TABLE `test`.`t6`.
            # 2021-09-22T11:30:21 [74467] ERROR: OPTIMIZE TABLE `test`.`t6` harvested 2013: Lost connection to MySQL server during query. Will return STATUS_RECOVERY_FAILURE later.
            # 2021-09-22T11:30:21 [74467] | [rr 163758 39652]mysqld: /data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc:2889: void row_sel_field_store_in_mysql_format_func(byte*, const mysql_row_templ_t*, const dict_index_t*, ulint, const byte*, ulint): Assertion `templ->mysql_col_len == len' failed.
             
            (rr) bt
            #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            #1  0x00007f021bb89859 in __GI_abort () at abort.c:79
            #2  0x00007f021bb89729 in __assert_fail_base (fmt=0x7f021bd1f588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x56143ee6d3e0 "templ->mysql_col_len == len",     file=0x56143ee6ac60 "/data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc", line=2889, function=<optimized out>) at assert.c:92
            #3  0x00007f021bb9af36 in __GI___assert_fail (assertion=0x56143ee6d3e0 "templ->mysql_col_len == len", file=0x56143ee6ac60 "/data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc", line=2889,     function=0x56143ee6cec0 "void row_sel_field_store_in_mysql_format_func(byte*, const mysql_row_templ_t*, const dict_index_t*, ulint, const byte*, ulint)") at assert.c:101
            #4  0x000056143da0b56c in row_sel_field_store_in_mysql_format_func (dest=0x61a0001674ce "", templ=0x6190000ca910, index=0x617000051e08, field_no=0, data=0x7f020c5200ad "\200", len=4)    at /data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc:2889
            #5  0x000056143da0c8c7 in row_sel_store_mysql_field (mysql_rec=0x61a0001674b8 "\375\n8888888888", prebuilt=0x62100009b188, rec=0x7f020c5200ad "\200", index=0x617000051e08, offsets=0x7f01e9f94b50, field_no=0, templ=0x6190000ca910)    at /data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc:3039
            #6  0x000056143da0d5b7 in row_sel_store_mysql_rec (mysql_rec=0x61a0001674b8 "\375\n8888888888", prebuilt=0x62100009b188, rec=0x7f020c5200ad "\200", vrow=0x0, rec_clust=false, index=0x617000051e08, offsets=0x7f01e9f94b50)    at /data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc:3175
            #7  0x000056143da1c8bd in row_search_mvcc (buf=0x61a0001674b8 "\375\n8888888888", mode=PAGE_CUR_G, prebuilt=0x62100009b188, match_mode=0, direction=0) at /data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc:5553
            #8  0x000056143d61c195 in ha_innobase::index_read (this=0x61d0001cdeb8, buf=0x61a0001674b8 "\375\n8888888888", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY) at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/ha_innodb.cc:8867
            #9  0x000056143d61eb12 in ha_innobase::index_first (this=0x61d0001cdeb8, buf=0x61a0001674b8 "\375\n8888888888") at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/ha_innodb.cc:9228
            #10 0x000056143d61ed50 in ha_innobase::rnd_next (this=0x61d0001cdeb8, buf=0x61a0001674b8 "\375\n8888888888") at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/ha_innodb.cc:9321
            #11 0x000056143ccfa998 in handler::ha_rnd_next (this=0x61d0001cdeb8, buf=0x61a0001674b8 "\375\n8888888888") at /data/Server/bb-10.5-MDEV-26450/sql/handler.cc:3080
            #12 0x000056143d10b870 in rr_sequential (info=0x7f01e9f95890) at /data/Server/bb-10.5-MDEV-26450/sql/records.cc:519
            #13 0x000056143c3ee738 in READ_RECORD::read_record (this=0x7f01e9f95890) at /data/Server/bb-10.5-MDEV-26450/sql/records.h:80
            #14 0x000056143c7fcc35 in copy_data_between_tables (thd=0x62b00009a218, from=0x6190000cc698, to=0x6190002ff898, create=..., ignore=false, order_num=0, order=0x0, copied=0x7f01e9f964f0, deleted=0x7f01e9f96510, keys_onoff=Alter_info::LEAVE_AS_IS,     alter_ctx=0x7f01e9f978b0) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:11463
            #15 0x000056143c7f8491 in mysql_alter_table (thd=0x62b00009a218, new_db=0x56143e514640 <null_clex_str>, new_name=0x56143e514640 <null_clex_str>, create_info=0x7f01e9f98b90, table_list=0x62b0000a1368, alter_info=0x7f01e9f98a80, order_num=0,     order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:10873
            #16 0x000056143c7fead3 in mysql_recreate_table (thd=0x62b00009a218, table_list=0x62b0000a1368, table_copy=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:11696
            #17 0x000056143c980167 in admin_recreate_table (thd=0x62b00009a218, table_list=0x62b0000a1368) at /data/Server/bb-10.5-MDEV-26450/sql/sql_admin.cc:61
            #18 0x000056143c9874ed in mysql_admin_table (thd=0x62b00009a218, tables=0x62b0000a1368, check_opt=0x62b00009f480, operator_name=0x56143e6de6e0 "optimize", lock_type=TL_WRITE, org_open_for_modify=true, repair_table_use_frm=false,     extra_open_options=0, prepare_func=0x0, operator_func=(int (handler::*)(class handler * const, class THD *, HA_CHECK_OPT *)) 0x56143cd05670 <handler::ha_optimize(THD*, st_ha_check_opt*)>, view_operator_func=0x0, is_cmd_replicated=true)    at /data/Server/bb-10.5-MDEV-26450/sql/sql_admin.cc:1139
            #19 0x000056143c989a9b in Sql_cmd_optimize_table::execute (this=0x62b0000a1a48, thd=0x62b00009a218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_admin.cc:1502
            #20 0x000056143c58c227 in mysql_execute_command (thd=0x62b00009a218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:6056
            #21 0x000056143c598764 in mysql_parse (thd=0x62b00009a218, rawbuf=0x62b0000a1238 "OPTIMIZE TABLE `test`.`t6`", length=26, parser_state=0x7f01e9f9acb0, is_com_multi=false, is_next_command=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:8100
            #22 0x000056143c570816 in dispatch_command (command=COM_QUERY, thd=0x62b00009a218, packet=0x6290000dc219 "", packet_length=26, is_com_multi=false, is_next_command=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:1891
            #23 0x000056143c56d8af in do_command (thd=0x62b00009a218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:1370
            #24 0x000056143c952e91 in do_handle_one_connection (connect=0x608000002b38, put_in_cache=true) at /data/Server/bb-10.5-MDEV-26450/sql/sql_connect.cc:1418
            #25 0x000056143c9526af in handle_one_connection (arg=0x608000002b38) at /data/Server/bb-10.5-MDEV-26450/sql/sql_connect.cc:1312
            #26 0x00007f021c0b3609 in start_thread (arg=<optimized out>) at pthread_create.c:477
            #27 0x00007f021bc86293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
            (rr)
            I do not memorize to have hit this assert within this year.
            

            mleich Matthias Leich added a comment - origin/bb-10.2-MDEV-26450 565c38c994a3584dff82eec6e2deccbc93e57555 2021-09-21T18:02:04+03:00 performed well in RQG testing (Usage of foreign keys disabled).   origin/bb-10.5-MDEV-26450 2755e86a53cd89729d0cb54424fe83074dccd8ab 2021-09-22T16:40:47+03:00 did not look bad but I hit twice the same assert. sdp:/data/Results/1632322073/Marko/dev/shm/vardir/1632322073/164/1/rr   _RR_TRACE_DIR="." rr replay --mark-stdio mysqld-1 # Fate till SIGKILL _RR_TRACE_DIR="." rr replay --mark-stdio # Restart with success but crash during checks # 2021-09-22T11:29:28 [74467] Verifying table: `test`.`t6` # 2021-09-22T11:29:28 [74467] INFO: Reporter 'CrashRecovery1': Executing CHECK TABLE `test`.`t6` EXTENDED. # 2021-09-22T11:29:28 [74467] INFO: Reporter 'CrashRecovery1': Executing ANALYZE TABLE `test`.`t6`. # 2021-09-22T11:29:28 [74467] INFO: Reporter 'CrashRecovery1': Executing OPTIMIZE TABLE `test`.`t6`. # 2021-09-22T11:30:21 [74467] ERROR: OPTIMIZE TABLE `test`.`t6` harvested 2013: Lost connection to MySQL server during query. Will return STATUS_RECOVERY_FAILURE later. # 2021-09-22T11:30:21 [74467] | [rr 163758 39652]mysqld: /data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc:2889: void row_sel_field_store_in_mysql_format_func(byte*, const mysql_row_templ_t*, const dict_index_t*, ulint, const byte*, ulint): Assertion `templ->mysql_col_len == len' failed.   (rr) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f021bb89859 in __GI_abort () at abort.c:79 #2 0x00007f021bb89729 in __assert_fail_base (fmt=0x7f021bd1f588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x56143ee6d3e0 "templ->mysql_col_len == len", file=0x56143ee6ac60 "/data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc", line=2889, function=<optimized out>) at assert.c:92 #3 0x00007f021bb9af36 in __GI___assert_fail (assertion=0x56143ee6d3e0 "templ->mysql_col_len == len", file=0x56143ee6ac60 "/data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc", line=2889, function=0x56143ee6cec0 "void row_sel_field_store_in_mysql_format_func(byte*, const mysql_row_templ_t*, const dict_index_t*, ulint, const byte*, ulint)") at assert.c:101 #4 0x000056143da0b56c in row_sel_field_store_in_mysql_format_func (dest=0x61a0001674ce "", templ=0x6190000ca910, index=0x617000051e08, field_no=0, data=0x7f020c5200ad "\200", len=4) at /data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc:2889 #5 0x000056143da0c8c7 in row_sel_store_mysql_field (mysql_rec=0x61a0001674b8 "\375\n8888888888", prebuilt=0x62100009b188, rec=0x7f020c5200ad "\200", index=0x617000051e08, offsets=0x7f01e9f94b50, field_no=0, templ=0x6190000ca910) at /data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc:3039 #6 0x000056143da0d5b7 in row_sel_store_mysql_rec (mysql_rec=0x61a0001674b8 "\375\n8888888888", prebuilt=0x62100009b188, rec=0x7f020c5200ad "\200", vrow=0x0, rec_clust=false, index=0x617000051e08, offsets=0x7f01e9f94b50) at /data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc:3175 #7 0x000056143da1c8bd in row_search_mvcc (buf=0x61a0001674b8 "\375\n8888888888", mode=PAGE_CUR_G, prebuilt=0x62100009b188, match_mode=0, direction=0) at /data/Server/bb-10.5-MDEV-26450/storage/innobase/row/row0sel.cc:5553 #8 0x000056143d61c195 in ha_innobase::index_read (this=0x61d0001cdeb8, buf=0x61a0001674b8 "\375\n8888888888", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY) at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/ha_innodb.cc:8867 #9 0x000056143d61eb12 in ha_innobase::index_first (this=0x61d0001cdeb8, buf=0x61a0001674b8 "\375\n8888888888") at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/ha_innodb.cc:9228 #10 0x000056143d61ed50 in ha_innobase::rnd_next (this=0x61d0001cdeb8, buf=0x61a0001674b8 "\375\n8888888888") at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/ha_innodb.cc:9321 #11 0x000056143ccfa998 in handler::ha_rnd_next (this=0x61d0001cdeb8, buf=0x61a0001674b8 "\375\n8888888888") at /data/Server/bb-10.5-MDEV-26450/sql/handler.cc:3080 #12 0x000056143d10b870 in rr_sequential (info=0x7f01e9f95890) at /data/Server/bb-10.5-MDEV-26450/sql/records.cc:519 #13 0x000056143c3ee738 in READ_RECORD::read_record (this=0x7f01e9f95890) at /data/Server/bb-10.5-MDEV-26450/sql/records.h:80 #14 0x000056143c7fcc35 in copy_data_between_tables (thd=0x62b00009a218, from=0x6190000cc698, to=0x6190002ff898, create=..., ignore=false, order_num=0, order=0x0, copied=0x7f01e9f964f0, deleted=0x7f01e9f96510, keys_onoff=Alter_info::LEAVE_AS_IS, alter_ctx=0x7f01e9f978b0) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:11463 #15 0x000056143c7f8491 in mysql_alter_table (thd=0x62b00009a218, new_db=0x56143e514640 <null_clex_str>, new_name=0x56143e514640 <null_clex_str>, create_info=0x7f01e9f98b90, table_list=0x62b0000a1368, alter_info=0x7f01e9f98a80, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:10873 #16 0x000056143c7fead3 in mysql_recreate_table (thd=0x62b00009a218, table_list=0x62b0000a1368, table_copy=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:11696 #17 0x000056143c980167 in admin_recreate_table (thd=0x62b00009a218, table_list=0x62b0000a1368) at /data/Server/bb-10.5-MDEV-26450/sql/sql_admin.cc:61 #18 0x000056143c9874ed in mysql_admin_table (thd=0x62b00009a218, tables=0x62b0000a1368, check_opt=0x62b00009f480, operator_name=0x56143e6de6e0 "optimize", lock_type=TL_WRITE, org_open_for_modify=true, repair_table_use_frm=false, extra_open_options=0, prepare_func=0x0, operator_func=(int (handler::*)(class handler * const, class THD *, HA_CHECK_OPT *)) 0x56143cd05670 <handler::ha_optimize(THD*, st_ha_check_opt*)>, view_operator_func=0x0, is_cmd_replicated=true) at /data/Server/bb-10.5-MDEV-26450/sql/sql_admin.cc:1139 #19 0x000056143c989a9b in Sql_cmd_optimize_table::execute (this=0x62b0000a1a48, thd=0x62b00009a218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_admin.cc:1502 #20 0x000056143c58c227 in mysql_execute_command (thd=0x62b00009a218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:6056 #21 0x000056143c598764 in mysql_parse (thd=0x62b00009a218, rawbuf=0x62b0000a1238 "OPTIMIZE TABLE `test`.`t6`", length=26, parser_state=0x7f01e9f9acb0, is_com_multi=false, is_next_command=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:8100 #22 0x000056143c570816 in dispatch_command (command=COM_QUERY, thd=0x62b00009a218, packet=0x6290000dc219 "", packet_length=26, is_com_multi=false, is_next_command=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:1891 #23 0x000056143c56d8af in do_command (thd=0x62b00009a218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:1370 #24 0x000056143c952e91 in do_handle_one_connection (connect=0x608000002b38, put_in_cache=true) at /data/Server/bb-10.5-MDEV-26450/sql/sql_connect.cc:1418 #25 0x000056143c9526af in handle_one_connection (arg=0x608000002b38) at /data/Server/bb-10.5-MDEV-26450/sql/sql_connect.cc:1312 #26 0x00007f021c0b3609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #27 0x00007f021bc86293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 (rr) I do not memorize to have hit this assert within this year.

            mleich, thank you. I believe that the prominent FOREIGN KEY problem MDEV-26554 is limited to the 10.6 series only.

            I have now updated the bb-10.6-MDEV-26450 branch with the bb-10.5-MDEV-26450 that you tested. Those branches will also include a fix of MDEV-26445.

            The 10.5 failure feels a lot like a mismatch between t6.frm and the InnoDB data dictionary. I checked when the server was killed before the restarted server failed:

            ssh sdp
            rr replay /data/Results/1632322073/Marko/dev/shm/vardir/1632322073/164/1/rr/mysqld-1
            

            continue
            watch -l log_sys.flushed_to_disk_lsn._M_i
            reverse-continue
            backtrace
            watch -l log_sys.lsn._M_i
            reverse-continue
            backtrace
            …
            

            The last durable log write was not for a DDL operation, but something else:

            …
            UPDATE t3 SET col1 = 1 ORDER BY col1 DESC LIMIT 2  /* E_R Thread2 QNO 10359 CON_ID 127 */;
            COMMIT  /* E_R Thread2 QNO 10360 CON_ID 127 */;
            

            At the time of the last durable InnoDB log write, there is no DDL ongoing. But, after that, we find the smoking gun:

            break ha_innobase::commit_inplace_alter_table
            continue
            backtrace
            

            bb-10.5-MDEV-26450 2755e86a53cd89729d0cb54424fe83074dccd8ab

            #0  ha_innobase::commit_inplace_alter_table (this=0x61d0008a48b8, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, commit=true)
                at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/handler0alter.cc:10907
            #1  0x000055c34481ba04 in handler::ha_commit_inplace_alter_table (this=0x61d0008a48b8, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, commit=true)
                at /data/Server/bb-10.5-MDEV-26450/sql/handler.cc:4874
            #2  0x000055c3442fa924 in mysql_inplace_alter_table (thd=0x62b0000fc218, table_list=0x62b000103418, table=0x619000518198, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, 
                target_mdl_request=0x7f6d1e37f390, alter_ctx=0x7f6d1e3803f0) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:8164
            #3  0x000055c34430c3dd in mysql_alter_table (thd=0x62b0000fc218, new_db=0x62b000100a78, new_name=0x62b000100e98, create_info=0x7f6d1e381760, table_list=0x62b000103418, alter_info=0x7f6d1e381650, order_num=0, 
                order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:10730
            #4  0x000055c3444804e5 in Sql_cmd_alter_table::execute (this=0x62b000103c70, thd=0x62b0000fc218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_alter.cc:539
            #5  0x000055c3440a1227 in mysql_execute_command (thd=0x62b0000fc218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:6056
            #6  0x000055c3440ad764 in mysql_parse (thd=0x62b0000fc218, 
                rawbuf=0x62b000103238 "ALTER TABLE t6 MODIFY COLUMN col1 INT AFTER col_string, LOCK = EXCLUSIVE, ALGORITHM = NOCOPY  /* E_R Thread2 QNO 10361 CON_ID 127 */", length=132, parser_state=0x7f6d1e382cb0, 
                is_com_multi=false, is_next_command=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:8100
            

            This looks simply like the lack of MDEV-17567 Atomic DDL. That this DDL operation was not durably committed inside InnoDB before the server was killed could be enough information for us. We actually do get this far right before the server is killed:

            (rr) backtrace
            #0  0x000055c34520dda8 in commit_try_norebuild (ha_alter_info=0x7f6d1e37f2b0, ctx=0x62b000105a28, altered_table=0x7f6d1e37f990, old_table=0x619000518198, trx=0x7f6d436dbdc8, table_name=0x61b0000c854d "t6")
                at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/handler0alter.cc:10346
            #1  0x000055c3451e7664 in ha_innobase::commit_inplace_alter_table (this=0x61d0008a48b8, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, commit=true)
                at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/handler0alter.cc:11109
            #2  0x000055c34481ba04 in handler::ha_commit_inplace_alter_table (this=0x61d0008a48b8, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, commit=true)
                at /data/Server/bb-10.5-MDEV-26450/sql/handler.cc:4874
            #3  0x000055c3442fa924 in mysql_inplace_alter_table (thd=0x62b0000fc218, table_list=0x62b000103418, table=0x619000518198, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, 
                target_mdl_request=0x7f6d1e37f390, alter_ctx=0x7f6d1e3803f0) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:8164
            #4  0x000055c34430c3dd in mysql_alter_table (thd=0x62b0000fc218, new_db=0x62b000100a78, new_name=0x62b000100e98, create_info=0x7f6d1e381760, table_list=0x62b000103418, alter_info=0x7f6d1e381650, order_num=0, 
                order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:10730
            #5  0x000055c3444804e5 in Sql_cmd_alter_table::execute (this=0x62b000103c70, thd=0x62b0000fc218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_alter.cc:539
            #6  0x000055c3440a1227 in mysql_execute_command (thd=0x62b0000fc218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:6056
            #7  0x000055c3440ad764 in mysql_parse (thd=0x62b0000fc218, 
                rawbuf=0x62b000103238 "ALTER TABLE t6 MODIFY COLUMN col1 INT AFTER col_string, LOCK = EXCLUSIVE, ALGORITHM = NOCOPY  /* E_R Thread2 QNO 10361 CON_ID 127 */", length=132, parser_state=0x7f6d1e382cb0, 
                is_com_multi=false, is_next_command=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:8100
            …
            (rr) finish
            Run till exit from #0  0x000055c34520dda8 in commit_try_norebuild (ha_alter_info=0x7f6d1e37f2b0, ctx=0x62b000105a28, altered_table=0x7f6d1e37f990, old_table=0x619000518198, trx=0x7f6d436dbdc8, 
                table_name=0x61b0000c854d "t6") at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/handler0alter.cc:10346
            0x000055c3451e7664 in ha_innobase::commit_inplace_alter_table (this=0x61d0008a48b8, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, commit=true)
                at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/handler0alter.cc:11109
            11109				fail = commit_try_norebuild(
            Value returned is $28 = false
            (rr) 
            Run till exit from #0  0x000055c3451e7664 in ha_innobase::commit_inplace_alter_table (this=0x61d0008a48b8, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, commit=true)
                at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/handler0alter.cc:11109
             
            Thread 3 received signal SIGKILL, Killed.
            [Switching to Thread 79076.104468]
            

            So, due to the broken DDL design that was only fixed in 10.6, we got an inconsistency between t6.frm and InnoDB. That this dictionary inconsistency (columns being swapped) was not flagged by CHECK TABLE should be worth a new bug report.

            marko Marko Mäkelä added a comment - mleich , thank you. I believe that the prominent FOREIGN KEY problem MDEV-26554 is limited to the 10.6 series only. I have now updated the bb-10.6- MDEV-26450 branch with the bb-10.5- MDEV-26450 that you tested. Those branches will also include a fix of MDEV-26445 . The 10.5 failure feels a lot like a mismatch between t6.frm and the InnoDB data dictionary. I checked when the server was killed before the restarted server failed: ssh sdp rr replay /data/Results/1632322073/Marko/dev/shm/vardir/1632322073/164/1/rr/mysqld-1 continue watch -l log_sys.flushed_to_disk_lsn._M_i reverse-continue backtrace watch -l log_sys.lsn._M_i reverse-continue backtrace … The last durable log write was not for a DDL operation, but something else: … UPDATE t3 SET col1 = 1 ORDER BY col1 DESC LIMIT 2 /* E_R Thread2 QNO 10359 CON_ID 127 */ ; COMMIT /* E_R Thread2 QNO 10360 CON_ID 127 */ ; At the time of the last durable InnoDB log write, there is no DDL ongoing. But, after that, we find the smoking gun: break ha_innobase::commit_inplace_alter_table continue backtrace bb-10.5-MDEV-26450 2755e86a53cd89729d0cb54424fe83074dccd8ab #0 ha_innobase::commit_inplace_alter_table (this=0x61d0008a48b8, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, commit=true) at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/handler0alter.cc:10907 #1 0x000055c34481ba04 in handler::ha_commit_inplace_alter_table (this=0x61d0008a48b8, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, commit=true) at /data/Server/bb-10.5-MDEV-26450/sql/handler.cc:4874 #2 0x000055c3442fa924 in mysql_inplace_alter_table (thd=0x62b0000fc218, table_list=0x62b000103418, table=0x619000518198, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, target_mdl_request=0x7f6d1e37f390, alter_ctx=0x7f6d1e3803f0) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:8164 #3 0x000055c34430c3dd in mysql_alter_table (thd=0x62b0000fc218, new_db=0x62b000100a78, new_name=0x62b000100e98, create_info=0x7f6d1e381760, table_list=0x62b000103418, alter_info=0x7f6d1e381650, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:10730 #4 0x000055c3444804e5 in Sql_cmd_alter_table::execute (this=0x62b000103c70, thd=0x62b0000fc218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_alter.cc:539 #5 0x000055c3440a1227 in mysql_execute_command (thd=0x62b0000fc218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:6056 #6 0x000055c3440ad764 in mysql_parse (thd=0x62b0000fc218, rawbuf=0x62b000103238 "ALTER TABLE t6 MODIFY COLUMN col1 INT AFTER col_string, LOCK = EXCLUSIVE, ALGORITHM = NOCOPY /* E_R Thread2 QNO 10361 CON_ID 127 */", length=132, parser_state=0x7f6d1e382cb0, is_com_multi=false, is_next_command=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:8100 This looks simply like the lack of MDEV-17567 Atomic DDL. That this DDL operation was not durably committed inside InnoDB before the server was killed could be enough information for us. We actually do get this far right before the server is killed: (rr) backtrace #0 0x000055c34520dda8 in commit_try_norebuild (ha_alter_info=0x7f6d1e37f2b0, ctx=0x62b000105a28, altered_table=0x7f6d1e37f990, old_table=0x619000518198, trx=0x7f6d436dbdc8, table_name=0x61b0000c854d "t6") at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/handler0alter.cc:10346 #1 0x000055c3451e7664 in ha_innobase::commit_inplace_alter_table (this=0x61d0008a48b8, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, commit=true) at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/handler0alter.cc:11109 #2 0x000055c34481ba04 in handler::ha_commit_inplace_alter_table (this=0x61d0008a48b8, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, commit=true) at /data/Server/bb-10.5-MDEV-26450/sql/handler.cc:4874 #3 0x000055c3442fa924 in mysql_inplace_alter_table (thd=0x62b0000fc218, table_list=0x62b000103418, table=0x619000518198, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, target_mdl_request=0x7f6d1e37f390, alter_ctx=0x7f6d1e3803f0) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:8164 #4 0x000055c34430c3dd in mysql_alter_table (thd=0x62b0000fc218, new_db=0x62b000100a78, new_name=0x62b000100e98, create_info=0x7f6d1e381760, table_list=0x62b000103418, alter_info=0x7f6d1e381650, order_num=0, order=0x0, ignore=false, if_exists=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_table.cc:10730 #5 0x000055c3444804e5 in Sql_cmd_alter_table::execute (this=0x62b000103c70, thd=0x62b0000fc218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_alter.cc:539 #6 0x000055c3440a1227 in mysql_execute_command (thd=0x62b0000fc218) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:6056 #7 0x000055c3440ad764 in mysql_parse (thd=0x62b0000fc218, rawbuf=0x62b000103238 "ALTER TABLE t6 MODIFY COLUMN col1 INT AFTER col_string, LOCK = EXCLUSIVE, ALGORITHM = NOCOPY /* E_R Thread2 QNO 10361 CON_ID 127 */", length=132, parser_state=0x7f6d1e382cb0, is_com_multi=false, is_next_command=false) at /data/Server/bb-10.5-MDEV-26450/sql/sql_parse.cc:8100 … (rr) finish Run till exit from #0 0x000055c34520dda8 in commit_try_norebuild (ha_alter_info=0x7f6d1e37f2b0, ctx=0x62b000105a28, altered_table=0x7f6d1e37f990, old_table=0x619000518198, trx=0x7f6d436dbdc8, table_name=0x61b0000c854d "t6") at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/handler0alter.cc:10346 0x000055c3451e7664 in ha_innobase::commit_inplace_alter_table (this=0x61d0008a48b8, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, commit=true) at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/handler0alter.cc:11109 11109 fail = commit_try_norebuild( Value returned is $28 = false (rr) Run till exit from #0 0x000055c3451e7664 in ha_innobase::commit_inplace_alter_table (this=0x61d0008a48b8, altered_table=0x7f6d1e37f990, ha_alter_info=0x7f6d1e37f2b0, commit=true) at /data/Server/bb-10.5-MDEV-26450/storage/innobase/handler/handler0alter.cc:11109   Thread 3 received signal SIGKILL, Killed. [Switching to Thread 79076.104468] So, due to the broken DDL design that was only fixed in 10.6, we got an inconsistency between t6.frm and InnoDB. That this dictionary inconsistency (columns being swapped) was not flagged by CHECK TABLE should be worth a new bug report.

            For the 10.5 recovery problem, I filed MDEV-26668 CHECK TABLE fails to report schema mismatch

            marko Marko Mäkelä added a comment - For the 10.5 recovery problem, I filed MDEV-26668 CHECK TABLE fails to report schema mismatch

            origin/bb-10.6-MDEV-26450 456479690b9ee38b0b8a912bf4b6217952c5903b 2021-09-23T11:21:51+03:00
            which should contain fixes for MDEV-26450 and MDEV-26445 behaved well in RQG testing.
            

            mleich Matthias Leich added a comment - origin/bb-10.6-MDEV-26450 456479690b9ee38b0b8a912bf4b6217952c5903b 2021-09-23T11:21:51+03:00 which should contain fixes for MDEV-26450 and MDEV-26445 behaved well in RQG testing.

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.