Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18519

0x7f0118195700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.3.7/storage/innobase/btr/btr0cur.cc line 4308

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.3.7, 10.3.12, 10.4(EOL)
    • 10.6.9, 10.7.5, 10.8.4, 10.9.2

    Description

      Following error output is generated during execution of an "normal" update query and leads to a crash of mariadb master (master-master replication).
      Occurrence: 3 times on two different database tables, not reproducible so far.

      2019-01-29 13:24:41 434826568 [ERROR] InnoDB: Page old data size 15175 new data size 15391, page old max ins size 1063 new max ins size 847
      2019-01-29 13:24:41 434826568 [ERROR] InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      2019-01-29 13:24:41 0x7f0118195700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.3.7/storage/innobase/btr/btr0cur.cc line 4308
      InnoDB: Failing assertion: rec
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: https://mariadb.com/kb/en/library/xtradbinnodb-recovery-modes/
      InnoDB: about forcing recovery.
      190129 13:24:41 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.3.7-MariaDB-1:10.3.7+maria~xenial-log
      key_buffer_size=8589934592
      read_buffer_size=2097152
      max_used_connections=200
      max_threads=242
      thread_count=138
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 24749309 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7efdec0009a8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f0118194cf8 thread_stack 0x30000
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55f49174003e]
      /usr/sbin/mysqld(handle_fatal_signal+0x347)[0x55f4911ec607]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f045d869390]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f045ce38428]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f045ce3a02a]
      /usr/sbin/mysqld(+0x49ee74)[0x55f490f39e74]
      /usr/sbin/mysqld(+0xb0a1aa)[0x55f4915a51aa]
      /usr/sbin/mysqld(+0xa8c949)[0x55f491527949]
      /usr/sbin/mysqld(+0xa91a1d)[0x55f49152ca1d]
      /usr/sbin/mysqld(+0xa92c4f)[0x55f49152dc4f]
      /usr/sbin/mysqld(+0xa5fed4)[0x55f4914faed4]
      /usr/sbin/mysqld(+0x99f3ff)[0x55f49143a3ff]
      /usr/sbin/mysqld(_ZN7handler13ha_update_rowEPKhS1_+0x122)[0x55f4911f7072]
      /usr/sbin/mysqld(_Z12mysql_updateP3THDP10TABLE_LISTR4ListI4ItemES6_PS4_jP8st_ordery15enum_duplicatesbPySB_+0x16b1)[0x55f4910b05d1]
      /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2478)[0x55f491014058]
      /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x21a)[0x55f49101a77a]
      /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1c23)[0x55f49101d0c3]
      /usr/sbin/mysqld(_Z10do_commandP3THD+0x13c)[0x55f49101db7c]
      /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x252)[0x55f4910ebc92]
      /usr/sbin/mysqld(handle_one_connection+0x3d)[0x55f4910ebe0d]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f045d85f6ba]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f045cf0a41d]
      

      "Optimize table" does not help because the second crash occurred after an "optimize table run" on that database table.

      Attachments

        Issue Links

          Activity

            onock Christian Kueppers created issue -

            Do you have an error log and a stack trace from 10.3.12?
            There are dbgsym packages for Xenial in the repo, if you can install them and enable the coredump, it might produce a better stack trace.

            elenst Elena Stepanova added a comment - Do you have an error log and a stack trace from 10.3.12? There are dbgsym packages for Xenial in the repo, if you can install them and enable the coredump, it might produce a better stack trace.
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            Labels crash crash need_feedback

            No, i haven't got the error log from 10.3.12. Sadly i've lost them because of log rotation.
            @dbgsym package: Do you mean mariadb-server-10.3-dbgsym and mariadb-server-core-10.3-dbgsym?
            I can try to install them, enable coredump for mariadb service and wait for the next bug hit/crash. Does it create any impact on the performance or the application?

            onock Christian Kueppers added a comment - No, i haven't got the error log from 10.3.12. Sadly i've lost them because of log rotation. @dbgsym package: Do you mean mariadb-server-10.3-dbgsym and mariadb-server-core-10.3-dbgsym? I can try to install them, enable coredump for mariadb service and wait for the next bug hit/crash. Does it create any impact on the performance or the application?
            elenst Elena Stepanova made changes -
            Labels crash need_feedback crash

            Enabling coredump will slow down server restart in case of a crash, as the server will take time to write the coredump.
            We are not aware of any noticeable impact from installing the debug symbol packages.

            elenst Elena Stepanova added a comment - Enabling coredump will slow down server restart in case of a crash, as the server will take time to write the coredump. We are not aware of any noticeable impact from installing the debug symbol packages.
            elenst Elena Stepanova made changes -
            Labels crash crash need_feedback

            We've enabled the coredump on all of our mariadb servers. Until now there was no further crash. I'll report back with further details if there is a new crash.

            onock Christian Kueppers added a comment - We've enabled the coredump on all of our mariadb servers. Until now there was no further crash. I'll report back with further details if there is a new crash.
            elenst Elena Stepanova made changes -
            Labels crash need_feedback crash
            elenst Elena Stepanova made changes -
            Labels crash crash need_feedback

            Until now there was no further occurrence. I would say that the ticket can be closed once.

            onock Christian Kueppers added a comment - Until now there was no further occurrence. I would say that the ticket can be closed once.
            elenst Elena Stepanova made changes -
            Component/s Storage Engine - InnoDB [ 10129 ]
            Fix Version/s N/A [ 14700 ]
            Resolution Incomplete [ 4 ]
            Status Open [ 1 ] Closed [ 6 ]
            GeoffMontee Geoff Montee (Inactive) made changes -
            onock Christian Kueppers made changes -
            Affects Version/s 10.3.17 [ 23411 ]
            Environment 1. environnement: Ubuntu 16.04.5, kernel: 4.4.0-141-generic x86_64, MariaDB version: 10.3.12
            2. environnement: Ubuntu 16.04.4, kernel: 4.4.0-127-generic x86_64, MariaDB version: 10.3.7
            1. environnement: Ubuntu 16.04.5, kernel: 4.4.0-141-generic x86_64, MariaDB version: 10.3.12
            2. environnement: Ubuntu 16.04.4, kernel: 4.4.0-127-generic x86_64, MariaDB version: 10.3.7
            3. environnement: Ubuntu 18.04.3, kernel: 4.15.0-46-generic, MariaDB version: 10.3.17
            onock Christian Kueppers added a comment - - edited

            I've noticed a new crash with same error message in version 10.3.17 on added 3. environnement. There is a 11G unpacked coredump available. I've uploaded detailed error log and coredump to ftp.askmonty.org/private (MDEV18519.tar.gz). We've manually dropped the table with corrupted index and import a backup. Please reopen this bug ticket.

            onock Christian Kueppers added a comment - - edited I've noticed a new crash with same error message in version 10.3.17 on added 3. environnement. There is a 11G unpacked coredump available. I've uploaded detailed error log and coredump to ftp.askmonty.org/private (MDEV18519.tar.gz). We've manually dropped the table with corrupted index and import a backup. Please reopen this bug ticket.
            elenst Elena Stepanova made changes -
            Resolution Incomplete [ 4 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            elenst Elena Stepanova made changes -
            Labels crash need_feedback crash
            elenst Elena Stepanova made changes -
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s N/A [ 14700 ]
            Assignee Marko Mäkelä [ marko ]

            onock, this ticket has not been closed.
            I am afraid that a core dump alone is not sufficient for debugging the problem. Starting with MariaDB Server 10.3, core dumps do not include a copy of the InnoDB buffer pool by default (which is why they should be much smaller now). For this kind of corruption, having a copy of the corrupted page in the core dump is essential.

            Furthermore, the core dump can only be analyzed if we have the exact same copy of the server executable and all shared libraries that were linked to it. Otherwise, the stack traces of the active threads will not be resolved correctly.

            Could you post the stack trace of the crashing thread, based on the core dump? You should be able to obtain it with the following:

            gdb /usr/sbin/mysqld core
            thread 1
            backtrace
            quit
            

            This should identify the problematic table, and then perhaps, based on query logs, you could check if anything special was executed on the table.

            You might want to try to rebuild the table (ALTER TABLE t FORCE) to see if it cures the corruption. If the table is corrupted seriously enough, the server may crash during the rebuild attempt.

            marko Marko Mäkelä added a comment - onock , this ticket has not been closed. I am afraid that a core dump alone is not sufficient for debugging the problem. Starting with MariaDB Server 10.3, core dumps do not include a copy of the InnoDB buffer pool by default (which is why they should be much smaller now). For this kind of corruption, having a copy of the corrupted page in the core dump is essential. Furthermore, the core dump can only be analyzed if we have the exact same copy of the server executable and all shared libraries that were linked to it. Otherwise, the stack traces of the active threads will not be resolved correctly. Could you post the stack trace of the crashing thread, based on the core dump? You should be able to obtain it with the following: gdb /usr/sbin/mysqld core thread 1 backtrace quit This should identify the problematic table, and then perhaps, based on query logs, you could check if anything special was executed on the table. You might want to try to rebuild the table ( ALTER TABLE t FORCE ) to see if it cures the corruption. If the table is corrupted seriously enough, the server may crash during the rebuild attempt.

            Thanks, here is the desired backtrace

            (gdb) backtrace
            #0  0x00007f6eb3914e97 in raise () from /lib/x86_64-linux-gnu/libc.so.6
            #1  0x00007f6eb39168db in abort () from /lib/x86_64-linux-gnu/libc.so.6
            #2  0x000055bacad33776 in ut_dbg_assertion_failed (expr=expr@entry=0x55bacb7001bb "rec", file=file@entry=0x55bacb718b28 "/home/buildbot/buildbot/build/mariadb-10.3.17/storage/innobase/btr/btr0cur.cc", line=line@entry=4467)
                at /home/buildbot/buildbot/build/mariadb-10.3.17/storage/innobase/ut/ut0dbg.cc:60
            #3  0x000055bacb3a774f in btr_cur_optimistic_update (flags=flags@entry=2, cursor=cursor@entry=0x7f6ae4398b50, offsets=offsets@entry=0x7f6c9c246868, heap=heap@entry=0x7f6c9c246920, update=<optimized out>, cmpl_info=0,
                thr=0x7f6ae44ffae8, trx_id=1319505779, mtr=0x7f6c9c247280) at /home/buildbot/buildbot/build/mariadb-10.3.17/storage/innobase/btr/btr0cur.cc:4467
            #4  0x000055bacb329c01 in row_upd_clust_rec (flags=flags@entry=0, node=node@entry=0x7f6ae44fc0a0, index=index@entry=0x55bad3f74500, offsets=offsets@entry=0x7f6ae4503000, offsets_heap=offsets_heap@entry=0x7f6c9c246920,
                thr=thr@entry=0x7f6ae44ffae8, mtr=0x7f6c9c247280) at /home/buildbot/buildbot/build/mariadb-10.3.17/storage/innobase/row/row0upd.cc:2871
            #5  0x000055bacb32f22d in row_upd_clust_step (node=node@entry=0x7f6ae44fc0a0, thr=thr@entry=0x7f6ae44ffae8) at /home/buildbot/buildbot/build/mariadb-10.3.17/storage/innobase/row/row0upd.cc:3237
            #6  0x000055bacb330214 in row_upd (thr=0x7f6ae44ffae8, node=0x7f6ae44fc0a0) at /home/buildbot/buildbot/build/mariadb-10.3.17/storage/innobase/row/row0upd.cc:3299
            #7  row_upd_step (thr=thr@entry=0x7f6ae44ffae8) at /home/buildbot/buildbot/build/mariadb-10.3.17/storage/innobase/row/row0upd.cc:3443
            #8  0x000055bacb301004 in row_update_for_mysql (prebuilt=0x7f6ae44f8190) at /home/buildbot/buildbot/build/mariadb-10.3.17/storage/innobase/row/row0mysql.cc:1887
            #9  0x000055bacb24341f in ha_innobase::update_row (this=0x7f6ae44c6640, old_row=0x7f6ae44c7f40 "\274\003\366\003\242\001", new_row=0x7f6ae44c6d80 "\274\003\366\003\242\001")
                at /home/buildbot/buildbot/build/mariadb-10.3.17/storage/innobase/handler/ha_innodb.cc:8860
            #10 0x000055bacaffa2f2 in handler::ha_update_row (this=0x7f6ae44c6640, old_data=0x7f6ae44c7f40 "\274\003\366\003\242\001", new_data=0x7f6ae44c6d80 "\274\003\366\003\242\001")
                at /home/buildbot/buildbot/build/mariadb-10.3.17/sql/handler.cc:6478
            #11 0x000055bacaeacab4 in mysql_update (thd=thd@entry=0x7f6ae4000c08, table_list=0x7f6ae400f978, fields=..., values=..., conds=0x7f6ae4010788, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=false,
                found_return=0x7f6c9c247e90, updated_return=0x7f6c9c247f50) at /home/buildbot/buildbot/build/mariadb-10.3.17/sql/sql_update.cc:955
            #12 0x000055bacae0c170 in mysql_execute_command (thd=thd@entry=0x7f6ae4000c08) at /home/buildbot/buildbot/build/mariadb-10.3.17/sql/sql_parse.cc:4296
            #13 0x000055bacae1261a in mysql_parse (thd=thd@entry=0x7f6ae4000c08, rawbuf=<optimized out>, length=293, parser_state=parser_state@entry=0x7f6c9c249550, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false)
                at /home/buildbot/buildbot/build/mariadb-10.3.17/sql/sql_parse.cc:7829
            #14 0x000055bacae152a3 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f6ae4000c08,
                packet=packet@entry=0x7f6ae40b93b9 "/* sql query comment cleared */\n\nUPDATE\n `dateien`\nSET\n `Aenderung` = '2019-08-13 15"..., packet_length=packet_length@entry=293, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/buildbot/buildbot/build/mariadb-10.3.17/sql/sql_parse.cc:1856
            #15 0x000055bacae15b1c in do_command (thd=0x7f6ae4000c08) at /home/buildbot/buildbot/build/mariadb-10.3.17/sql/sql_parse.cc:1402
            #16 0x000055bacaee9552 in do_handle_one_connection (connect=connect@entry=0x55baddf317f8) at /home/buildbot/buildbot/build/mariadb-10.3.17/sql/sql_connect.cc:1403
            #17 0x000055bacaee960d in handle_one_connection (arg=arg@entry=0x55baddf317f8) at /home/buildbot/buildbot/build/mariadb-10.3.17/sql/sql_connect.cc:1308
            #18 0x000055bacb183191 in pfs_spawn_thread (arg=0x55badde6ef08) at /home/buildbot/buildbot/build/mariadb-10.3.17/storage/perfschema/pfs.cc:1862
            #19 0x00007f6eb43f56db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
            #20 0x00007f6eb39f788f in clone () from /lib/x86_64-linux-gnu/libc.so.6
            

            We haven't got enabled the general query log. The query extracted from error log was

            UPDATE  `dateien` SET  `Aenderung` = '2019-08-13 15:37:41',  `Titel` = 'Expos   zu Mitnahme-2',  `creator:user.Nr` = '19' WHERE  `Id` = '107011'
            

            That don't look special for me.

            Database and table were found via error log entry and sql comment. Last time we took a backup from synced slave and import this. I'll try your command next time.

            onock Christian Kueppers added a comment - Thanks, here is the desired backtrace (gdb) backtrace # 0 0x00007f6eb3914e97 in raise () from /lib/x86_64-linux-gnu/libc.so. 6 # 1 0x00007f6eb39168db in abort () from /lib/x86_64-linux-gnu/libc.so. 6 # 2 0x000055bacad33776 in ut_dbg_assertion_failed (expr=expr @entry = 0x55bacb7001bb "rec" , file=file @entry = 0x55bacb718b28 "/home/buildbot/buildbot/build/mariadb-10.3.17/storage/innobase/btr/btr0cur.cc" , line=line @entry = 4467 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /storage/innobase/ut/ut0dbg.cc: 60 # 3 0x000055bacb3a774f in btr_cur_optimistic_update (flags=flags @entry = 2 , cursor=cursor @entry = 0x7f6ae4398b50 , offsets=offsets @entry = 0x7f6c9c246868 , heap=heap @entry = 0x7f6c9c246920 , update=<optimized out>, cmpl_info= 0 , thr= 0x7f6ae44ffae8 , trx_id= 1319505779 , mtr= 0x7f6c9c247280 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /storage/innobase/btr/btr0cur.cc: 4467 # 4 0x000055bacb329c01 in row_upd_clust_rec (flags=flags @entry = 0 , node=node @entry = 0x7f6ae44fc0a0 , index=index @entry = 0x55bad3f74500 , offsets=offsets @entry = 0x7f6ae4503000 , offsets_heap=offsets_heap @entry = 0x7f6c9c246920 , thr=thr @entry = 0x7f6ae44ffae8 , mtr= 0x7f6c9c247280 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /storage/innobase/row/row0upd.cc: 2871 # 5 0x000055bacb32f22d in row_upd_clust_step (node=node @entry = 0x7f6ae44fc0a0 , thr=thr @entry = 0x7f6ae44ffae8 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /storage/innobase/row/row0upd.cc: 3237 # 6 0x000055bacb330214 in row_upd (thr= 0x7f6ae44ffae8 , node= 0x7f6ae44fc0a0 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /storage/innobase/row/row0upd.cc: 3299 # 7 row_upd_step (thr=thr @entry = 0x7f6ae44ffae8 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /storage/innobase/row/row0upd.cc: 3443 # 8 0x000055bacb301004 in row_update_for_mysql (prebuilt= 0x7f6ae44f8190 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /storage/innobase/row/row0mysql.cc: 1887 # 9 0x000055bacb24341f in ha_innobase::update_row ( this = 0x7f6ae44c6640 , old_row= 0x7f6ae44c7f40 "\274\003\366\003\242\001" , new_row= 0x7f6ae44c6d80 "\274\003\366\003\242\001" ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /storage/innobase/handler/ha_innodb.cc: 8860 # 10 0x000055bacaffa2f2 in handler::ha_update_row ( this = 0x7f6ae44c6640 , old_data= 0x7f6ae44c7f40 "\274\003\366\003\242\001" , new_data= 0x7f6ae44c6d80 "\274\003\366\003\242\001" ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /sql/handler.cc: 6478 # 11 0x000055bacaeacab4 in mysql_update (thd=thd @entry = 0x7f6ae4000c08 , table_list= 0x7f6ae400f978 , fields=..., values=..., conds= 0x7f6ae4010788 , order_num=<optimized out>, order=<optimized out>, limit= 18446744073709551615 , ignore= false , found_return= 0x7f6c9c247e90 , updated_return= 0x7f6c9c247f50 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /sql/sql_update.cc: 955 # 12 0x000055bacae0c170 in mysql_execute_command (thd=thd @entry = 0x7f6ae4000c08 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /sql/sql_parse.cc: 4296 # 13 0x000055bacae1261a in mysql_parse (thd=thd @entry = 0x7f6ae4000c08 , rawbuf=<optimized out>, length= 293 , parser_state=parser_state @entry = 0x7f6c9c249550 , is_com_multi=is_com_multi @entry = false , is_next_command=is_next_command @entry = false ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /sql/sql_parse.cc: 7829 # 14 0x000055bacae152a3 in dispatch_command (command=command @entry =COM_QUERY, thd=thd @entry = 0x7f6ae4000c08 , packet=packet @entry = 0x7f6ae40b93b9 "/* sql query comment cleared */\n\nUPDATE\n `dateien`\nSET\n `Aenderung` = '2019-08-13 15" ..., packet_length=packet_length @entry = 293 , is_com_multi=is_com_multi @entry = false , is_next_command=is_next_command @entry = false ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /sql/sql_parse.cc: 1856 # 15 0x000055bacae15b1c in do_command (thd= 0x7f6ae4000c08 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /sql/sql_parse.cc: 1402 # 16 0x000055bacaee9552 in do_handle_one_connection (connect=connect @entry = 0x55baddf317f8 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /sql/sql_connect.cc: 1403 # 17 0x000055bacaee960d in handle_one_connection (arg=arg @entry = 0x55baddf317f8 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /sql/sql_connect.cc: 1308 # 18 0x000055bacb183191 in pfs_spawn_thread (arg= 0x55badde6ef08 ) at /home/buildbot/buildbot/build/mariadb- 10.3 . 17 /storage/perfschema/pfs.cc: 1862 # 19 0x00007f6eb43f56db in start_thread () from /lib/x86_64-linux-gnu/libpthread.so. 0 # 20 0x00007f6eb39f788f in clone () from /lib/x86_64-linux-gnu/libc.so. 6 We haven't got enabled the general query log. The query extracted from error log was UPDATE `dateien` SET `Aenderung` = '2019-08-13 15:37:41' , `Titel` = 'Expos zu Mitnahme-2' , `creator:user.Nr` = '19' WHERE `Id` = '107011' That don't look special for me. Database and table were found via error log entry and sql comment. Last time we took a backup from synced slave and import this. I'll try your command next time.

            Vielen Dank, onock!
            Would it be possible to get the page dump? Note that starting with MariaDB Server 10.3, the buffer pool is excluded from core dumps by default, so the page dump will not be present in any core dump. I think that it should be easiest to attach the debugger to the running server:

            gdb -p $(pgrep mysqld)
            continue
            

            and then reissue the UPDATE statement that triggered the crash. If the table is corrupted, it should crash every time on this operation. I see that it happens when trying to update a clustered index record without involving page split or merge:

            	rec = btr_cur_insert_if_possible(
            		cursor, new_entry, offsets, heap, 0/*n_ext*/, mtr);
            	ut_a(rec); /* <- We calculated above the insert would fit */
            

            Once you get the crash inside gdb, then you have to issue the command up until you are in the stack frame of btr_cur_optimistic_update().
            I would like you to execute the following in gdb:

            dump binary memory /tmp/page.bin block->frame block->frame+srv_page_size
            p *index
            p *index->fields@index->n_fields
            

            If the local variable block is not available to the debugger, try replacing block->frame with cursor.btr_cur.page_cur.block->frame. Likewise, if index is not available, try cursor->index instead.

            Please attach the /tmp/page.bin, or upload it to our FTP server if it contains confidential data. Also, I would like to see the output of SHOW CREATE TABLE dateien.

            Did you invoke any ALTER TABLE on this table while running on MariaDB 10.3? I’d like to know if this might be related to instant ADD COLUMN, which is MDEV-11369. The output that I requested should answer that question.

            Also, it could be useful to start a debug version of the server, to see if other assertions fail when attempting the UPDATE statement.

            marko Marko Mäkelä added a comment - Vielen Dank, onock ! Would it be possible to get the page dump? Note that starting with MariaDB Server 10.3, the buffer pool is excluded from core dumps by default, so the page dump will not be present in any core dump. I think that it should be easiest to attach the debugger to the running server: gdb -p $(pgrep mysqld) continue and then reissue the UPDATE statement that triggered the crash. If the table is corrupted, it should crash every time on this operation. I see that it happens when trying to update a clustered index record without involving page split or merge: rec = btr_cur_insert_if_possible( cursor, new_entry, offsets, heap, 0 /*n_ext*/ , mtr); ut_a(rec); /* <- We calculated above the insert would fit */ Once you get the crash inside gdb, then you have to issue the command up until you are in the stack frame of btr_cur_optimistic_update() . I would like you to execute the following in gdb: dump binary memory /tmp/page.bin block->frame block->frame+srv_page_size p *index p *index->fields@index->n_fields If the local variable block is not available to the debugger, try replacing block->frame with cursor.btr_cur.page_cur.block->frame . Likewise, if index is not available, try cursor->index instead. Please attach the /tmp/page.bin , or upload it to our FTP server if it contains confidential data. Also, I would like to see the output of SHOW CREATE TABLE dateien . Did you invoke any ALTER TABLE on this table while running on MariaDB 10.3? I’d like to know if this might be related to instant ADD COLUMN , which is MDEV-11369 . The output that I requested should answer that question. Also, it could be useful to start a debug version of the server, to see if other assertions fail when attempting the UPDATE statement.

            The described server environment is a production system with several hundreds of databases. The crashed table was reimportet from backup taken from slave, like i said before. From that moment until now this table is in use without any further crash. We do not currently have a test environment with the same behavior.
            I want to help you by providing this page dump, but i don't know the consequence of the described procedure for our production system. Furthermore i think reissuing could lead to no crash.
            If i'm right, is there another possible way to help you or can we prepare something for the next possible crash?

            Following ALTER TABLE commands were issued five days before:

            ALTER TABLE `dateien` ADD COLUMN `Portal277` ENUM('0', '1') NOT NULL AFTER `Portal276`;
            ALTER TABLE `dateien` ADD COLUMN `Portal276` ENUM('0', '1') NOT NULL AFTER `Portal275`;
            ALTER TABLE `dateien` ADD COLUMN `Portal275` ENUM('0', '1') NOT NULL AFTER `Portal274`;
            

            I've placed the SHOW CREATE TABLE on ftp named "MDEV18519_2.tar.gz".

            onock Christian Kueppers added a comment - The described server environment is a production system with several hundreds of databases. The crashed table was reimportet from backup taken from slave, like i said before. From that moment until now this table is in use without any further crash. We do not currently have a test environment with the same behavior. I want to help you by providing this page dump, but i don't know the consequence of the described procedure for our production system. Furthermore i think reissuing could lead to no crash. If i'm right, is there another possible way to help you or can we prepare something for the next possible crash? Following ALTER TABLE commands were issued five days before: ALTER TABLE `dateien` ADD COLUMN `Portal277` ENUM( '0' , '1' ) NOT NULL AFTER `Portal276`; ALTER TABLE `dateien` ADD COLUMN `Portal276` ENUM( '0' , '1' ) NOT NULL AFTER `Portal275`; ALTER TABLE `dateien` ADD COLUMN `Portal275` ENUM( '0' , '1' ) NOT NULL AFTER `Portal274`; I've placed the SHOW CREATE TABLE on ftp named "MDEV18519_2.tar.gz".
            marko Marko Mäkelä made changes -

            onock, thanks. Unfortunately it looks like this could indeed be caused by the instant ADD COLUMN operation. There could be something particular in the table definition that causes page_get_max_insert_size_after_reorganize() to produce a wrong result. As a possible workaround, I would suggest to add ,FORCE to such statements, to ensure that the tables will be rebuilt.

            If the problem is related to instant ADD COLUMN, then it should go away by causing the table to be rebuilt.

            We have an open issue MDEV-20090 reported against 10.4. It could affect 10.3 as well, even though the current test case depends on instant DROP COLUMN (MDEV-15562).

            marko Marko Mäkelä added a comment - onock , thanks. Unfortunately it looks like this could indeed be caused by the instant ADD COLUMN operation. There could be something particular in the table definition that causes page_get_max_insert_size_after_reorganize() to produce a wrong result. As a possible workaround, I would suggest to add ,FORCE to such statements, to ensure that the tables will be rebuilt. If the problem is related to instant ADD COLUMN , then it should go away by causing the table to be rebuilt. We have an open issue MDEV-20090 reported against 10.4. It could affect 10.3 as well, even though the current test case depends on instant DROP COLUMN ( MDEV-15562 ).
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            This looks pretty much like MDEV-19783 and MDEV-19743.

            marko Marko Mäkelä added a comment - This looks pretty much like MDEV-19783 and MDEV-19743 .

            onock, I checked ftp://ftp.mariadb.com/uploads but cannot see any file "MDEV18519_2.tar.gz". I would really need that SHOW CREATE TABLE output in order to proceed with the analysis.

            Another user in MDEV-19783 claimed to have the .ibd file of a similar incident. Having that file could simplify things a lot, because I suspect that the bug depends not only on the table schema, but also on the exact sequence of INSERT, UPDATE, DELETE and purge operations.

            marko Marko Mäkelä added a comment - onock , I checked ftp://ftp.mariadb.com/uploads but cannot see any file "MDEV18519_2.tar.gz". I would really need that SHOW CREATE TABLE output in order to proceed with the analysis. Another user in MDEV-19783 claimed to have the .ibd file of a similar incident. Having that file could simplify things a lot, because I suspect that the bug depends not only on the table schema, but also on the exact sequence of INSERT, UPDATE, DELETE and purge operations.

            I finally got the files. In the error log snippet, I see this mismatch by 100 bytes:

            2019-08-13 15:37:01 129 [ERROR] InnoDB: Page old data size 11431 new data size 11531, page old max ins size 4808 new max ins size 4708
            

            I will try to figure it out based on the SHOW CREATE TABLE output.

            onock, because I do not have the same mysqld and *.so files as you, I cannot view the stack traces from the core dump that you uploaded, but thankfully you posted a stack trace with some interesting pointer values. I loaded the core dump with a mismatching 10.3 server:

            (gdb) p *(btr_cur_t*)0x7f6ae4398b50
            $1 = {index = 0x55bad3f74500, page_cur = {index = 0x0, 
                rec = 0x7f6c0fdea420 <error: Cannot access memory at address 0x7f6c0fdea420>, offsets = 0x0, block = 0x7f6c0cf17d30}, purge_node = 0x0, left_block = 0x0, 
              thr = 0x7f6ae44fbfe8, flag = BTR_CUR_BINARY, tree_height = 2, up_match = 1, 
              up_bytes = 0, low_match = 0, low_bytes = 2, n_fields = 0, n_bytes = 0, 
              fold = 0, path_arr = 0x0, rtr_info = 0x0}
            (gdb) p *((btr_cur_t*)0x7f6ae4398b50)->index
            $2 = {id = 36404, heap = 0x55bad3f743b0, name = {
                m_name = 0x55bad3f74760 "PRIMARY"}, table = 0x55bad3f6f020, page = 3, 
              merge_threshold = 50, type = 3, trx_id_offset = 4, n_user_defined_cols = 1, 
              nulls_equal = 0, n_uniq = 1, n_def = 299, n_fields = 299, n_nullable = 22, 
              n_core_fields = 299, n_core_null_bytes = 3, cached = 1, to_be_dropped = 0, 
              online_status = 0, uncommitted = 0, is_dummy = false, magic_n = 0, 
            …
            

            I will not post index->fields@index->n_fields here, because it would reveal some of the SHOW CREATE TABLE information. That information seems coherent with regard to index->trx_id_offset and index->n_uniq, so I am confident that I extracted the above data correctly from the core dump, despite using data type information from a different executable.

            The interesting thing here is that we have n_fields == n_core_fields, that is, the table is not in the "instant ALTER TABLE" format. This error might not be caused by instant ALTER after all.

            marko Marko Mäkelä added a comment - I finally got the files. In the error log snippet, I see this mismatch by 100 bytes: 2019-08-13 15:37:01 129 [ERROR] InnoDB: Page old data size 11431 new data size 11531, page old max ins size 4808 new max ins size 4708 I will try to figure it out based on the SHOW CREATE TABLE output. onock , because I do not have the same mysqld and *.so files as you, I cannot view the stack traces from the core dump that you uploaded, but thankfully you posted a stack trace with some interesting pointer values. I loaded the core dump with a mismatching 10.3 server: (gdb) p *(btr_cur_t*)0x7f6ae4398b50 $1 = {index = 0x55bad3f74500, page_cur = {index = 0x0, rec = 0x7f6c0fdea420 <error: Cannot access memory at address 0x7f6c0fdea420>, offsets = 0x0, block = 0x7f6c0cf17d30}, purge_node = 0x0, left_block = 0x0, thr = 0x7f6ae44fbfe8, flag = BTR_CUR_BINARY, tree_height = 2, up_match = 1, up_bytes = 0, low_match = 0, low_bytes = 2, n_fields = 0, n_bytes = 0, fold = 0, path_arr = 0x0, rtr_info = 0x0} (gdb) p *((btr_cur_t*)0x7f6ae4398b50)->index $2 = {id = 36404, heap = 0x55bad3f743b0, name = { m_name = 0x55bad3f74760 "PRIMARY"}, table = 0x55bad3f6f020, page = 3, merge_threshold = 50, type = 3, trx_id_offset = 4, n_user_defined_cols = 1, nulls_equal = 0, n_uniq = 1, n_def = 299, n_fields = 299, n_nullable = 22, n_core_fields = 299, n_core_null_bytes = 3, cached = 1, to_be_dropped = 0, online_status = 0, uncommitted = 0, is_dummy = false, magic_n = 0, … I will not post index->fields@index->n_fields here, because it would reveal some of the SHOW CREATE TABLE information. That information seems coherent with regard to index->trx_id_offset and index->n_uniq , so I am confident that I extracted the above data correctly from the core dump, despite using data type information from a different executable. The interesting thing here is that we have n_fields == n_core_fields , that is, the table is not in the "instant ALTER TABLE " format. This error might not be caused by instant ALTER after all.

            It is also possible that the InnoDB data dictionary cache is wrong, and the table actually still was in the "instant ALTER" format. This would obviously cause such mismatch. I cannot check that, because the buffer pool was not included in the core dump, and because the .ibd file is no longer available.

            I have a vague memory that I fixed something like that in the past (InnoDB would "forget" about a past "instant ALTER"), but I cannot find that right now.

            marko Marko Mäkelä added a comment - It is also possible that the InnoDB data dictionary cache is wrong, and the table actually still was in the "instant ALTER" format. This would obviously cause such mismatch. I cannot check that, because the buffer pool was not included in the core dump, and because the .ibd file is no longer available. I have a vague memory that I fixed something like that in the past (InnoDB would "forget" about a past "instant ALTER"), but I cannot find that right now.
            marko Marko Mäkelä made changes -

            My analysis of the data uploaded to MDEV-19783 confirms that InnoDB can wrongly reset the "instant ALTER TABLE was used" status on the clustered index root page.

            marko Marko Mäkelä added a comment - My analysis of the data uploaded to MDEV-19783 confirms that InnoDB can wrongly reset the "instant ALTER TABLE was used" status on the clustered index root page.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.4 [ 22408 ]
            Affects Version/s 10.4 [ 22408 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            Thanks for your work and provided information. I may have overlooked it, but can I try to help at the moment, or do I have to wait for a fix?

            onock Christian Kueppers added a comment - Thanks for your work and provided information. I may have overlooked it, but can I try to help at the moment, or do I have to wait for a fix?

            We have tried, but have been unable to reproduce this bug. Unfortunately I cannot fix it without a repeatable test case.

            marko Marko Mäkelä added a comment - We have tried, but have been unable to reproduce this bug. Unfortunately I cannot fix it without a repeatable test case.
            julien.fritsch Julien Fritsch made changes -
            Labels crash crash need_feedback
            elenst Elena Stepanova made changes -
            Labels crash need_feedback crash
            elenst Elena Stepanova made changes -
            Labels crash crash need_feedback
            GeoffMontee Geoff Montee (Inactive) made changes -
            Labels crash need_feedback crash

            I believe that I may have found an explanation why mleich failed to repeat this corruption with recent 10.3 or 10.4.
            This error could be fully explained by MDEV-19916, which was fixed in MariaDB Server 10.3.17 and 10.4.7.

            onock, you reported 10.3.17 as an affected version. Could it be that you had executed instant ADD COLUMN before upgrading to that version? In that case, the corruption could have been introduced in an earlier version and only noticed after the upgrade.

            marko Marko Mäkelä added a comment - I believe that I may have found an explanation why mleich failed to repeat this corruption with recent 10.3 or 10.4. This error could be fully explained by MDEV-19916 , which was fixed in MariaDB Server 10.3.17 and 10.4.7. onock , you reported 10.3.17 as an affected version. Could it be that you had executed instant ADD COLUMN before upgrading to that version? In that case, the corruption could have been introduced in an earlier version and only noticed after the upgrade.
            marko Marko Mäkelä made changes -

            The dpkg log shows that the update from version 10.3.13 to 10.3.17 was done on 10.08.2019. ADD COLUMN queries were executed at the end of January and at the end of July 2019. So you can be right. Is there a way to proactively fix this (maybe via e.g. optimize table query) so that it doesn't happen again in the future? Because installing a newer version alone does not seem to help.

            onock Christian Kueppers added a comment - The dpkg log shows that the update from version 10.3.13 to 10.3.17 was done on 10.08.2019. ADD COLUMN queries were executed at the end of January and at the end of July 2019. So you can be right. Is there a way to proactively fix this (maybe via e.g. optimize table query) so that it doesn't happen again in the future? Because installing a newer version alone does not seem to help.
            marko Marko Mäkelä added a comment - - edited

            onock, you can have tables rebuilt by

            ALTER TABLE table_with_instant_add_before_10_3_17 FORCE;
            

            If the corruption was caused by the bug that was fixed in MDEV-19916, then it should only be necessary to do this for tables whose clustered index root page claims that the table is in the 'no instant ADD was done' format. If the root page carries the 'instant ADD' identifier, or if you know that no instant ADD COLUMN was performed, then there is no use to execute that step.

            You should be able to identify the format of tables with the following:

            for i in */*.ibd
            do
              echo -n "$i:"
              od -An -t x1 -j 0xc018 -N 2 "$i"
            done
            

            The 'old format' type code is 45 bf and the 'instant ADD' identifier is 00 12.
            If you use something else than the default innodb_page_size=16k, then you will have to adjust the offset accordingly, e.g., 0x6018 for innodb_page_size=8k (8192 * 3 + 24 = 0x2000 * 3 + 0x18).
            (Edit: originally I forgot to multiply by the clustered index root page number 3.)

            But, beware:

            • The server could crash while converting the table.
            • The columns to which data was instantly added could contain garbage values.
            • The NULL flag bitmaps and column length information could end up being interpreted wrong, so also 'old' columns could appear corrupted. If the table contained only NOT NULL and fixed-length columns before the instant ADD, this form of corruption should not be possible.
            marko Marko Mäkelä added a comment - - edited onock , you can have tables rebuilt by ALTER TABLE table_with_instant_add_before_10_3_17 FORCE ; If the corruption was caused by the bug that was fixed in MDEV-19916 , then it should only be necessary to do this for tables whose clustered index root page claims that the table is in the 'no instant ADD was done' format. If the root page carries the 'instant ADD' identifier, or if you know that no instant ADD COLUMN was performed, then there is no use to execute that step. You should be able to identify the format of tables with the following: for i in */*.ibd do echo -n "$i:" od -An -t x1 -j 0xc018 -N 2 "$i" done The 'old format' type code is 45 bf and the 'instant ADD' identifier is 00 12 . If you use something else than the default innodb_page_size=16k , then you will have to adjust the offset accordingly, e.g., 0x6018 for innodb_page_size=8k (8192 * 3 + 24 = 0x2000 * 3 + 0x18). ( Edit: originally I forgot to multiply by the clustered index root page number 3. ) But, beware: The server could crash while converting the table. The columns to which data was instantly added could contain garbage values. The NULL flag bitmaps and column length information could end up being interpreted wrong, so also 'old' columns could appear corrupted. If the table contained only NOT NULL and fixed-length columns before the instant ADD , this form of corruption should not be possible.
            marko Marko Mäkelä made changes -
            Affects Version/s 10.3.7 [ 23005 ]

            I removed 10.3.17 from the affected versions, because according to onock the corruption could have been introduced before the upgrade.

            marko Marko Mäkelä added a comment - I removed 10.3.17 from the affected versions, because according to onock the corruption could have been introduced before the upgrade.
            marko Marko Mäkelä made changes -
            Affects Version/s 10.3.7 [ 23005 ]
            Affects Version/s 10.3.17 [ 23411 ]

            elenst reasons that this bug is about the assertion failure, which is reporting corruption of the table, which in the reported case was likely introduced in an older 10.3 or 10.4 version of MariaDB Server before the MDEV-19916 fix (10.3.17, 10.4.7).

            This assertion could also fail due to something else causing the corruption.

            By this reasoning, I will not close this bug as a duplicate of MDEV-19916, but I will lower the priority.

            Instead of reporting the corruption by an assertion failure, we should ideally return an error to the client and abort the operation. It might not be wise to initiate rollback of the current transaction, because that might end up corrupting the known-corrupted index even further. This is conceptually related to
            MDEV-13542 Crashing on a corrupted page is unhelpful.

            marko Marko Mäkelä added a comment - elenst reasons that this bug is about the assertion failure, which is reporting corruption of the table, which in the reported case was likely introduced in an older 10.3 or 10.4 version of MariaDB Server before the MDEV-19916 fix (10.3.17, 10.4.7). This assertion could also fail due to something else causing the corruption. By this reasoning, I will not close this bug as a duplicate of MDEV-19916 , but I will lower the priority. Instead of reporting the corruption by an assertion failure, we should ideally return an error to the client and abort the operation. It might not be wise to initiate rollback of the current transaction, because that might end up corrupting the known-corrupted index even further. This is conceptually related to MDEV-13542 Crashing on a corrupted page is unhelpful.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 92391 ] MariaDB v4 [ 143576 ]
            marko Marko Mäkelä made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]

            This was missed in MDEV-13542.

            marko Marko Mäkelä added a comment - This was missed in MDEV-13542 .
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            r.kubik Robert Kubik added a comment -

            Hello,
            we have similar issue as well

            2022-06-03 09:51:07 0x7f95d5e9c700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.6.4/storage/innobase/rem/rem0rec.cc line 877
            

            Can I assume, that it's the same issue and just wait for the fix?

            r.kubik Robert Kubik added a comment - Hello, we have similar issue as well 2022-06-03 09:51:07 0x7f95d5e9c700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.6.4/storage/innobase/rem/rem0rec.cc line 877 Can I assume, that it's the same issue and just wait for the fix?

            r.kubik, that assertion is in a different file. It is MDEV-21098.

            marko Marko Mäkelä added a comment - r.kubik , that assertion is in a different file. It is MDEV-21098 .

            In addition to the function btr_page_reorganize_low(), I fixed a few more functions so that they will return an error code instead of crashing the server.

            MDEV-21098, MDEV-22388, MDEV-28457 are the last currently known cases of crashing due to corruption.

            marko Marko Mäkelä added a comment - In addition to the function btr_page_reorganize_low() , I fixed a few more functions so that they will return an error code instead of crashing the server. MDEV-21098 , MDEV-22388 , MDEV-28457 are the last currently known cases of crashing due to corruption.
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2022-06-08 07:08:00.0 2022-06-08 07:08:00.482
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.9 [ 27507 ]
            Fix Version/s 10.7.5 [ 27505 ]
            Fix Version/s 10.8.4 [ 27503 ]
            Fix Version/s 10.9.2 [ 27115 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 109468 110645

            People

              marko Marko Mäkelä
              onock Christian Kueppers
              Votes:
              1 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.