Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34572

[ERROR] InnoDB: Record in index of table was not found on update

Details

    Description

      Hi,

      We got now this error since last week very often and an fresh restart of the server with an SST from server1 doesn't work.
      The server2 still keeps crashing and it's just only on server2, server1 isn't crashing.

      2024-07-11 14:11:40 2 [ERROR] InnoDB: Record in index `IX_altref` of table `xxx`.`media_contents_varchar` was not found on update: TUPLE (info_bits=0, 2 fields): {NULL,[8]       &(0x0000000000151A26)} at: COMPACT RECORD(info
      _bits=0, 2 fields): {NULL,[8]       $(0x0000000000151A24)}
      2024-07-11 14:11:40 0x7f1d2c68d700  InnoDB: Assertion failure in file ./storage/innobase/row/row0ins.cc line 219
      InnoDB: Failing assertion: !cursor->index->is_committed()
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mariadbd startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      InnoDB: about forcing recovery.
      240711 14:11:40 [ERROR] mysqld got signal 6 ;
      Sorry, we probably made a mistake, and this is a bug.
       
      Your assistance in bug reporting will enable us to fix this for the next release.
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.5.23-MariaDB-0+deb11u1 source revision: 6cfd2ba397b0ca689d8ff1bdb9fc4a4dc516a5eb
      key_buffer_size=268435456
      read_buffer_size=131072
      max_used_connections=8
      max_threads=1252
      thread_count=10
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3018523 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7f1d0c000c58
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f1d2c68cd98 thread_stack 0x49000
      /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x56219b91a14e]
      /usr/sbin/mariadbd(handle_fatal_signal+0x475)[0x56219b40f3b5]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x13140)[0x7f1d3848d140]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7f1d37fc5ce1]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7f1d37faf537]
      /usr/sbin/mariadbd(+0x64329a)[0x56219b0e029a]
      /usr/sbin/mariadbd(+0x6323d9)[0x56219b0cf3d9]
      /usr/sbin/mariadbd(+0xcf381c)[0x56219b79081c]
      /usr/sbin/mariadbd(+0xd25382)[0x56219b7c2382]
      /usr/sbin/mariadbd(+0xd25a7f)[0x56219b7c2a7f]
      /usr/sbin/mariadbd(+0xd03dcb)[0x56219b7a0dcb]
      /usr/sbin/mariadbd(+0xc5cc8e)[0x56219b6f9c8e]
      /usr/sbin/mariadbd(_ZN7handler13ha_update_rowEPKhS1_+0x2a2)[0x56219b41d732]
      /usr/sbin/mariadbd(_ZN21Update_rows_log_event11do_exec_rowEP14rpl_group_info+0x2b4)[0x56219b533ed4]
      /usr/sbin/mariadbd(_ZN14Rows_log_event14do_apply_eventEP14rpl_group_info+0x33f)[0x56219b5279cf]
      /usr/sbin/mariadbd(_Z18wsrep_apply_eventsP3THDP14Relay_log_infoPKvm+0x1e9)[0x56219b6cfa19]
      /usr/sbin/mariadbd(+0xc19b30)[0x56219b6b6b30]
      /usr/sbin/mariadbd(_ZN21Wsrep_applier_service15apply_write_setERKN5wsrep7ws_metaERKNS0_12const_bufferERNS0_14mutable_bufferE+0xa6)[0x56219b6b7966]
      /usr/sbin/mariadbd(+0xef47c7)[0x56219b9917c7]
      /usr/sbin/mariadbd(+0xf05c2e)[0x56219b9a2c2e]
      /usr/lib/galera/libgalera_smm.so(+0x576b1)[0x7f1d2f8996b1]
      /usr/lib/galera/libgalera_smm.so(+0x6591f)[0x7f1d2f8a791f]
      /usr/lib/galera/libgalera_smm.so(+0x6a724)[0x7f1d2f8ac724]
      /usr/lib/galera/libgalera_smm.so(+0x959df)[0x7f1d2f8d79df]
      /usr/lib/galera/libgalera_smm.so(+0x964a2)[0x7f1d2f8d84a2]
      /usr/lib/galera/libgalera_smm.so(+0x6ab78)[0x7f1d2f8acb78]
      /usr/lib/galera/libgalera_smm.so(+0x459cd)[0x7f1d2f8879cd]
      /usr/sbin/mariadbd(_ZN5wsrep18wsrep_provider_v2611run_applierEPNS_21high_priority_serviceE+0xe)[0x56219b9a329e]
      /usr/sbin/mariadbd(+0xc34e0d)[0x56219b6d1e0d]
      /usr/sbin/mariadbd(_Z15start_wsrep_THDPv+0x26b)[0x56219b6c1d0b]
      /usr/sbin/mariadbd(+0xbb4ceb)[0x56219b651ceb]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x7ea7)[0x7f1d38481ea7]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f1d38088a2f]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7f1d27bc637b): UPDATE media_contents_varchar SET media_id=12820, tag_id=0, tag_internal_id=14, media_content_value='14-550337', media_content_dtupdate='2024-07-11 14:11:40' WHERE media_content_id=1382950
       
      Connection ID (thread ID): 2
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
       
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /data/server2
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units
      Max cpu time              unlimited            unlimited            seconds
      Max file size             unlimited            unlimited            bytes
      Max data size             unlimited            unlimited            bytes
      Max stack size            8388608              unlimited            bytes
      Max core file size        0                    unlimited            bytes
      Max resident set          unlimited            unlimited            bytes
      Max processes             128112               128112               processes
      Max open files            1048576              1048576              files
      Max locked memory         65536                65536                bytes
      Max address space         unlimited            unlimited            bytes
      Max file locks            unlimited            unlimited            locks
      Max pending signals       128112               128112               signals
      Max msgqueue size         819200               819200               bytes
      Max nice priority         0                    0
      Max realtime priority     0                    0
      Max realtime timeout      unlimited            unlimited            us
      Core pattern: core
       
      Kernel version: Linux version 5.10.0-26-amd64 (debian-kernel@lists.debian.org) (gcc-10 (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP Debian 5.10.197-1 (2023-09-29)
      

      CREATE TABLE `media_contents_varchar` (
        `media_content_id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
        `media_id` BIGINT(20) NOT NULL,
        `tag_id` BIGINT(20) UNSIGNED NOT NULL,
        `tag_internal_id` BIGINT(20) NOT NULL,
        `media_content_value` VARCHAR(255) NOT NULL COLLATE 'utf8_general_ci',
        `media_content_dtcreate` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
        `media_content_dtupdate` DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00',
        `media_content_altref` VARCHAR(255) AS (if(`tag_internal_id` = 14,`media_content_value`,NULL)) virtual,
        PRIMARY KEY (`media_content_id`) USING BTREE,
        INDEX `ix_media_tag` (`media_id`, `tag_internal_id`) USING BTREE,
        INDEX `ix_media_id` (`media_id`) USING BTREE,
        INDEX `IX_altref` (`media_content_altref`(10)) USING BTREE
      )
      COLLATE='utf8_general_ci'
      ENGINE=InnoDB
      AUTO_INCREMENT=1512055
      ;
      

      Attachments

        Issue Links

          Activity

            So, the bold results of the day are

            1. I've been able to recreate the environment close to the reported one. See Dockerfile and packages_modified.txt.
            I believe this wasn't really required, but I was struggling with matching the debug symbols with core file. Tuns out, I had to take maraidb packages from snapshot.debian.org, and not from https://archive.mariadb.org/mariadb-10.5.23/

            2. Upon analyzing the core file backtrace, the important thread is of course the wsrep_applier one:

            (gdb) t 5
            [Switching to thread 5 (Thread 0x7fb87c312700 (LWP 630341))]
            #0  0x00007fb884c45087 in kill () at ../sysdeps/unix/syscall-template.S:120
            120	../sysdeps/unix/syscall-template.S: No such file or directory.
            (gdb) bt
            #0  0x00007fb884c45087 in kill () at ../sysdeps/unix/syscall-template.S:120
            #1  0x0000562c92ce635e in handle_fatal_signal (sig=6) at ./sql/signal_handler.cc:369
            #2  <signal handler called>
            #3  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            #4  0x00007fb884c2e537 in __GI_abort () at abort.c:79
            #5  0x0000562c929b729a in ut_dbg_assertion_failed (expr=expr@entry=0x562c93450f58 "!cursor->index->is_committed()", file=file@entry=0x562c93450ab0 "./storage/innobase/row/row0ins.cc", 
                line=line@entry=219) at ./storage/innobase/ut/ut0dbg.cc:60
            #6  0x0000562c929a63d9 in row_ins_sec_index_entry_by_modify (mtr=0x7fb87c30f550, thr=0x7fb3d0319e38, entry=0x7fb3de911320, heap=0x7fb3de92b5e8, offsets_heap=<optimized out>, 
                offsets=0x7fb87c30e5f0, cursor=0x7fb87c30e660, mode=2, flags=0) at ./storage/innobase/row/row0ins.cc:219
            #7  row_ins_sec_index_entry_low (flags=0, mode=2, index=0x7fb85e304b20, offsets_heap=<optimized out>, heap=0x7fb3de92b5e8, entry=0x7fb3de911320, trx_id=0, thr=0x7fb3d0319e38)
                at ./storage/innobase/row/row0ins.cc:3147
            #8  0x0000562c9306781c in row_ins_sec_index_entry (index=index@entry=0x7fb85e304b20, entry=entry@entry=0x7fb3de911320, thr=thr@entry=0x7fb3d0319e38, check_foreign=<optimized out>)
                at ./storage/innobase/row/row0ins.cc:3358
            #9  0x0000562c93099382 in row_upd_sec_index_entry (node=0x7fb3d0319a28, thr=0x7fb3d0319e38) at ./storage/innobase/row/row0upd.cc:2237
            #10 0x0000562c93099a7f in row_upd_sec_step (thr=0x7fb3d0319e38, node=0x7fb3d0319a28) at ./storage/innobase/row/row0upd.cc:2264
            #11 row_upd (thr=0x7fb3d0319e38, node=0x7fb3d0319a28) at ./storage/innobase/row/row0upd.cc:3028
            #12 row_upd_step (thr=thr@entry=0x7fb3d0319e38) at ./storage/innobase/row/row0upd.cc:3143
            #13 0x0000562c93077dcb in row_update_for_mysql (prebuilt=0x7fb3d0319190) at ./storage/innobase/row/row0mysql.cc:1801
            #14 0x0000562c92fd0c8e in ha_innobase::update_row (this=0x7fb85f472e60, old_row=0x7fb3d0497410 "\377I\214\031", new_row=0x7fb3d0496de0 "\376I\214\031")
                at ./storage/innobase/handler/ha_innodb.cc:8477
            #15 0x0000562c92cf4732 in handler::ha_update_row (this=0x7fb85f472e60, old_data=0x7fb3d0497410 "\377I\214\031", new_data=0x7fb3d0496de0 "\376I\214\031") at ./sql/handler.cc:7333
            #16 0x0000562c92e0aed4 in Update_rows_log_event::do_exec_row (this=0x7fb3dd124908, rgi=0x7fb85c01cd10) at ./sql/log_event_server.cc:8452
            #17 0x0000562c92dfe9cf in Rows_log_event::do_apply_event (this=0x7fb3dd124908, rgi=0x7fb85c01cd10) at ./sql/log_event_server.cc:5717
            #18 0x0000562c92fa6a19 in Log_event::apply_event (rgi=<optimized out>, this=0x7fb3dd124908) at ./sql/log_event.h:1488
            #19 wsrep_apply_events (thd=thd@entry=0x7fb85c000c58, rli=<optimized out>, events_buf=<optimized out>, buf_len=0) at ./sql/wsrep_applier.cc:208
            #20 0x0000562c92f8db30 in apply_events (thd=thd@entry=0x7fb85c000c58, rli=<optimized out>, data=..., err=...) at ./sql/wsrep_high_priority_service.cc:128
            #21 0x0000562c92f8e966 in Wsrep_applier_service::apply_write_set (this=0x7fb87c311cc0, ws_meta=..., data=..., err=...) at ./sql/wsrep_high_priority_service.cc:595
            #22 0x0000562c932687c7 in apply_write_set (server_state=..., high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at ./wsrep-lib/src/server_state.cpp:332
            #23 0x0000562c93279c2e in wsrep::high_priority_service::apply (data=..., ws_meta=..., ws_handle=..., this=0x7fb87c311cc0) at ./wsrep-lib/include/wsrep/high_priority_service.hpp:47
            #24 (anonymous namespace)::apply_cb (ctx=0x7fb87c311cc0, wsh=<optimized out>, flags=<optimized out>, buf=<optimized out>, meta=<optimized out>, exit_loop=0x7fb87c31117f)
                at ./wsrep-lib/src/wsrep_provider_v26.cpp:507
            #25 0x00007fb87c5186b1 in galera::Monitor<galera::ReplicatorSMM::CommitOrder>::update_last_left (lock=..., this=0x1) at ./galera/src/monitor.hpp:467
            #26 galera::Monitor<galera::ReplicatorSMM::CommitOrder>::post_leave (this=0x1, obj_seqno=65, lock=...) at ./galera/src/monitor.hpp:517
            [snip]
            

            An uncommitted index may happen in three cases:
            1. Upon innodb statistics manipulation
            2. Upon creating a new table
            3. During inplace alter table adding index
            4. During a table cold load

            It's not 2, we know the table is long-existant
            It's not innodb statistics, as cursor->index->table->name shows, so, not 1

            To confirm it's not 3, dcz01, I'll need a feedback from you:

            Did you alter table "b*_p*_t*/m*_c*_v*" (name is obstructed) just around the failure on the node?
            Note that Alter table, especially adding index, may take days, so it's important to know was it in progress close to the time of failure.

            nikitamalyavin Nikita Malyavin added a comment - So, the bold results of the day are 1. I've been able to recreate the environment close to the reported one. See Dockerfile and packages_modified.txt . I believe this wasn't really required, but I was struggling with matching the debug symbols with core file. Tuns out, I had to take maraidb packages from snapshot.debian.org, and not from https://archive.mariadb.org/mariadb-10.5.23/ 2. Upon analyzing the core file backtrace, the important thread is of course the wsrep_applier one: (gdb) t 5 [Switching to thread 5 (Thread 0x7fb87c312700 (LWP 630341))] #0 0x00007fb884c45087 in kill () at ../sysdeps/unix/syscall-template.S:120 120 ../sysdeps/unix/syscall-template.S: No such file or directory. (gdb) bt #0 0x00007fb884c45087 in kill () at ../sysdeps/unix/syscall-template.S:120 #1 0x0000562c92ce635e in handle_fatal_signal (sig=6) at ./sql/signal_handler.cc:369 #2 <signal handler called> #3 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #4 0x00007fb884c2e537 in __GI_abort () at abort.c:79 #5 0x0000562c929b729a in ut_dbg_assertion_failed (expr=expr@entry=0x562c93450f58 "!cursor->index->is_committed()", file=file@entry=0x562c93450ab0 "./storage/innobase/row/row0ins.cc", line=line@entry=219) at ./storage/innobase/ut/ut0dbg.cc:60 #6 0x0000562c929a63d9 in row_ins_sec_index_entry_by_modify (mtr=0x7fb87c30f550, thr=0x7fb3d0319e38, entry=0x7fb3de911320, heap=0x7fb3de92b5e8, offsets_heap=<optimized out>, offsets=0x7fb87c30e5f0, cursor=0x7fb87c30e660, mode=2, flags=0) at ./storage/innobase/row/row0ins.cc:219 #7 row_ins_sec_index_entry_low (flags=0, mode=2, index=0x7fb85e304b20, offsets_heap=<optimized out>, heap=0x7fb3de92b5e8, entry=0x7fb3de911320, trx_id=0, thr=0x7fb3d0319e38) at ./storage/innobase/row/row0ins.cc:3147 #8 0x0000562c9306781c in row_ins_sec_index_entry (index=index@entry=0x7fb85e304b20, entry=entry@entry=0x7fb3de911320, thr=thr@entry=0x7fb3d0319e38, check_foreign=<optimized out>) at ./storage/innobase/row/row0ins.cc:3358 #9 0x0000562c93099382 in row_upd_sec_index_entry (node=0x7fb3d0319a28, thr=0x7fb3d0319e38) at ./storage/innobase/row/row0upd.cc:2237 #10 0x0000562c93099a7f in row_upd_sec_step (thr=0x7fb3d0319e38, node=0x7fb3d0319a28) at ./storage/innobase/row/row0upd.cc:2264 #11 row_upd (thr=0x7fb3d0319e38, node=0x7fb3d0319a28) at ./storage/innobase/row/row0upd.cc:3028 #12 row_upd_step (thr=thr@entry=0x7fb3d0319e38) at ./storage/innobase/row/row0upd.cc:3143 #13 0x0000562c93077dcb in row_update_for_mysql (prebuilt=0x7fb3d0319190) at ./storage/innobase/row/row0mysql.cc:1801 #14 0x0000562c92fd0c8e in ha_innobase::update_row (this=0x7fb85f472e60, old_row=0x7fb3d0497410 "\377I\214\031", new_row=0x7fb3d0496de0 "\376I\214\031") at ./storage/innobase/handler/ha_innodb.cc:8477 #15 0x0000562c92cf4732 in handler::ha_update_row (this=0x7fb85f472e60, old_data=0x7fb3d0497410 "\377I\214\031", new_data=0x7fb3d0496de0 "\376I\214\031") at ./sql/handler.cc:7333 #16 0x0000562c92e0aed4 in Update_rows_log_event::do_exec_row (this=0x7fb3dd124908, rgi=0x7fb85c01cd10) at ./sql/log_event_server.cc:8452 #17 0x0000562c92dfe9cf in Rows_log_event::do_apply_event (this=0x7fb3dd124908, rgi=0x7fb85c01cd10) at ./sql/log_event_server.cc:5717 #18 0x0000562c92fa6a19 in Log_event::apply_event (rgi=<optimized out>, this=0x7fb3dd124908) at ./sql/log_event.h:1488 #19 wsrep_apply_events (thd=thd@entry=0x7fb85c000c58, rli=<optimized out>, events_buf=<optimized out>, buf_len=0) at ./sql/wsrep_applier.cc:208 #20 0x0000562c92f8db30 in apply_events (thd=thd@entry=0x7fb85c000c58, rli=<optimized out>, data=..., err=...) at ./sql/wsrep_high_priority_service.cc:128 #21 0x0000562c92f8e966 in Wsrep_applier_service::apply_write_set (this=0x7fb87c311cc0, ws_meta=..., data=..., err=...) at ./sql/wsrep_high_priority_service.cc:595 #22 0x0000562c932687c7 in apply_write_set (server_state=..., high_priority_service=..., ws_handle=..., ws_meta=..., data=...) at ./wsrep-lib/src/server_state.cpp:332 #23 0x0000562c93279c2e in wsrep::high_priority_service::apply (data=..., ws_meta=..., ws_handle=..., this=0x7fb87c311cc0) at ./wsrep-lib/include/wsrep/high_priority_service.hpp:47 #24 (anonymous namespace)::apply_cb (ctx=0x7fb87c311cc0, wsh=<optimized out>, flags=<optimized out>, buf=<optimized out>, meta=<optimized out>, exit_loop=0x7fb87c31117f) at ./wsrep-lib/src/wsrep_provider_v26.cpp:507 #25 0x00007fb87c5186b1 in galera::Monitor<galera::ReplicatorSMM::CommitOrder>::update_last_left (lock=..., this=0x1) at ./galera/src/monitor.hpp:467 #26 galera::Monitor<galera::ReplicatorSMM::CommitOrder>::post_leave (this=0x1, obj_seqno=65, lock=...) at ./galera/src/monitor.hpp:517 [snip] An uncommitted index may happen in three cases: 1. Upon innodb statistics manipulation 2. Upon creating a new table 3. During inplace alter table adding index 4. During a table cold load It's not 2, we know the table is long-existant It's not innodb statistics, as cursor->index->table->name shows, so, not 1 To confirm it's not 3, dcz01 , I'll need a feedback from you: Did you alter table "b*_p*_t*/m*_c*_v*" (name is obstructed) just around the failure on the node? Note that Alter table, especially adding index, may take days, so it's important to know was it in progress close to the time of failure.
            dcz01 Daniel Czadek added a comment - - edited

            Hi,
            I asked our developers and they confirmed that this table has never been optimized automatically nor manually.
            This new extra column "media_content_altref" was created live and the systems hasn't been bootet since then (table cold load).

            Directly from developers:
            We had just performed an alter table shortly before, adding the index to the virtual column. The error occurs when records affecting this index are entered (insert, update). The associated ALTER TABLE was very quick, and at least according to SHOW PROCESSLIST, no active background activity was visible. As long as there is no index on the virtual column, everything runs smoothly.

            Regards

            dcz01 Daniel Czadek added a comment - - edited Hi, I asked our developers and they confirmed that this table has never been optimized automatically nor manually. This new extra column "media_content_altref" was created live and the systems hasn't been bootet since then (table cold load). Directly from developers: We had just performed an alter table shortly before, adding the index to the virtual column. The error occurs when records affecting this index are entered (insert, update). The associated ALTER TABLE was very quick, and at least according to SHOW PROCESSLIST, no active background activity was visible. As long as there is no index on the virtual column, everything runs smoothly. Regards

            dcz01 So, did I understand correctly, that the crash has happened just after ALTER TABLE adding that index?

            Or is does it happen after the server reload? What comes first?

            Also you mentioned you've got the second time crash – I guess no alter table was around it, especially, related to that virtual column index, is it?

            nikitamalyavin Nikita Malyavin added a comment - dcz01 So, did I understand correctly, that the crash has happened just after ALTER TABLE adding that index? Or is does it happen after the server reload? What comes first? Also you mentioned you've got the second time crash – I guess no alter table was around it, especially, related to that virtual column index, is it?

            I've missed the bit that the assertion was !cursor->index->is_committed(), i.e. the index is expected to be uncommitted.

            This corrupts my chain of thought significantly.

            The log reports
            InnoDB: Record in index `IX_altref` of table `xxx`.`media_contents_varchar` was not found on update: TUPLE (info_bits=0, 2 fields):

            {NULL, (0x0000000000151A26)}

            However, the assertion seems to happen on a different record.

            (gdb) f 9
            #9  0x0000562c93099382 in row_upd_sec_index_entry (node=0x7fb3d0319a28, thr=0x7fb3d0319e38) at ./storage/innobase/row/row0upd.cc:2237
            2237	./storage/innobase/row/row0upd.cc: No such file or directory.
            (gdb) x/8xb entry->fields[1]->data
            0x7fb85cdcf976:	0x00	0x00	0x00	0x00	0x00	0x19	0x8c	0x49
            (gdb) x/8xb entry->fields[0]->data
            0x7fb3d049710c:	0x32	0x34	0x2d	0x35	0x31	0x35	0x32	0x38
            (gdb) p (char*)entry->fields[0]->data
            $51 = 0x7fb3d049710c "24-515287"
            

            The primary key is unchanged, which is easier to tell by the sql-layer records:

            (gdb) f 16
            (gdb) p m_table->key_info[m_table->s->primary_key]->key_part[0].field->ptr - m_table->record[0]
            $52 = 1
            (gdb) x/8xb m_table->record[0]+1
            0x7fb3d0496de1:	0x49	0x8c	0x19	0x00	0x00	0x00	0x00	0x00
            (gdb) x/8xb m_table->record[1]+1
            0x7fb3d0497411:	0x49	0x8c	0x19	0x00	0x00	0x00	0x00	0x00
            

            `media_content_altref` is indeed changed.

            (gdb) p m_table->field[7]->field_name.str
            $54 = 0x7fb3de714264 "media_content_altref"
            (gdb) p m_table->field[7]->ptr - m_table->record[0]
            $56 = 810
            (gdb) x/8xb m_table->record[0]+810
            0x7fb3d049710a:	0x09	0x00	0x32	0x34	0x2d	0x35	0x31	0x35
            (gdb) x/8xb m_table->record[1]+810
            0x7fb3d049773a:	0x00	0x00	0x00	0x00	0x00	0x00	0x00	0x00
            

            We can see it's actually changed from NULL:

            (gdb) p m_table->field[7]->null_bit
            $59 = 1 '\001'
            (gdb) p m_table->field[7]->null_ptr - m_table->record[0]
            $60 = 0
            (gdb) p m_table->record[0][0]&1
            $62 = 0
            (gdb) p m_table->record[1][0]&1
            $63 = 1
            

            So the story is:
            1. Some supposedly unrelated entry wasn't found in the index to update.
            2. The entry (NULL, 0x198c49) began updating.
            3. It was found in the index, and engine decides to update it inplace (row_ins_sec_index_entry_by_modify)
            4. It found not delete-marked, which is expected to happen only to uncommitted indexes, however the index is committed.

            nikitamalyavin Nikita Malyavin added a comment - I've missed the bit that the assertion was !cursor->index->is_committed() , i.e. the index is expected to be uncommitted. This corrupts my chain of thought significantly. The log reports InnoDB: Record in index `IX_altref` of table `xxx`.`media_contents_varchar` was not found on update: TUPLE (info_bits=0, 2 fields): {NULL, (0x0000000000151A26)} However, the assertion seems to happen on a different record. (gdb) f 9 #9 0x0000562c93099382 in row_upd_sec_index_entry (node=0x7fb3d0319a28, thr=0x7fb3d0319e38) at ./storage/innobase/row/row0upd.cc:2237 2237 ./storage/innobase/row/row0upd.cc: No such file or directory. (gdb) x/8xb entry->fields[1]->data 0x7fb85cdcf976: 0x00 0x00 0x00 0x00 0x00 0x19 0x8c 0x49 (gdb) x/8xb entry->fields[0]->data 0x7fb3d049710c: 0x32 0x34 0x2d 0x35 0x31 0x35 0x32 0x38 (gdb) p (char*)entry->fields[0]->data $51 = 0x7fb3d049710c "24-515287" The primary key is unchanged, which is easier to tell by the sql-layer records: (gdb) f 16 (gdb) p m_table->key_info[m_table->s->primary_key]->key_part[0].field->ptr - m_table->record[0] $52 = 1 (gdb) x/8xb m_table->record[0]+1 0x7fb3d0496de1: 0x49 0x8c 0x19 0x00 0x00 0x00 0x00 0x00 (gdb) x/8xb m_table->record[1]+1 0x7fb3d0497411: 0x49 0x8c 0x19 0x00 0x00 0x00 0x00 0x00 `media_content_altref` is indeed changed. (gdb) p m_table->field[7]->field_name.str $54 = 0x7fb3de714264 "media_content_altref" (gdb) p m_table->field[7]->ptr - m_table->record[0] $56 = 810 (gdb) x/8xb m_table->record[0]+810 0x7fb3d049710a: 0x09 0x00 0x32 0x34 0x2d 0x35 0x31 0x35 (gdb) x/8xb m_table->record[1]+810 0x7fb3d049773a: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 We can see it's actually changed from NULL: (gdb) p m_table->field[7]->null_bit $59 = 1 '\001' (gdb) p m_table->field[7]->null_ptr - m_table->record[0] $60 = 0 (gdb) p m_table->record[0][0]&1 $62 = 0 (gdb) p m_table->record[1][0]&1 $63 = 1 So the story is: 1. Some supposedly unrelated entry wasn't found in the index to update. 2. The entry (NULL, 0x198c49) began updating. 3. It was found in the index, and engine decides to update it inplace (row_ins_sec_index_entry_by_modify) 4. It found not delete-marked, which is expected to happen only to uncommitted indexes, however the index is committed.

            The only suggestion for now that I have is that "record is not deleted when expected" means data corruption, and I believe the server shouldn't crash in this case.

            marko I would propose changing assertion to debug-only and make handling in release builds:

            --- a/storage/innobase/row/row0ins.cc
            +++ b/storage/innobase/row/row0ins.cc
            @@ -215,10 +215,12 @@
             		upgrade at least until this INSERT or UPDATE
             		returns. After that point, set_committed(true)
             		would be invoked in commit_inplace_alter_table(). */
            +#ifdef UNIV_DEBUG
             		ut_a(update->n_fields == 0);
             		ut_a(!cursor->index->is_committed());
             		ut_ad(!dict_index_is_online_ddl(cursor->index));
            -		return(DB_SUCCESS);
            +#endif
            +		return(DB_INDEX_CORRUPT);
             	}
             
             	if (mode == BTR_MODIFY_LEAF) {
            

            What do you think?

            nikitamalyavin Nikita Malyavin added a comment - The only suggestion for now that I have is that "record is not deleted when expected" means data corruption, and I believe the server shouldn't crash in this case. marko I would propose changing assertion to debug-only and make handling in release builds: --- a/storage/innobase/row/row0ins.cc +++ b/storage/innobase/row/row0ins.cc @@ -215,10 +215,12 @@ upgrade at least until this INSERT or UPDATE returns. After that point, set_committed(true) would be invoked in commit_inplace_alter_table(). */ +#ifdef UNIV_DEBUG ut_a(update->n_fields == 0); ut_a(!cursor->index->is_committed()); ut_ad(!dict_index_is_online_ddl(cursor->index)); - return(DB_SUCCESS); +#endif + return(DB_INDEX_CORRUPT); } if (mode == BTR_MODIFY_LEAF) { What do you think?

            People

              nikitamalyavin Nikita Malyavin
              dcz01 Daniel Czadek
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.