Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29153

2022-07-22 18:12:56 0x7f3319fff700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/page/page0cur.cc line 11

Details

    Description

      After server soft reboot, MySQL Server 10.4.24 cant start
      I tried to start with all innodb_force_recovery modes from 1 to 6, one by one. But nothing recovered. In mode 6 mysqld hangs, and mysql -p hangs after password.
      Please help to start mysql with some data losses, because data isnt very critical in a part, but critical in overall. And I have no backups because database is very large 1,4Tb

      22-07-22 18:12:56 0 [Note] InnoDB: Using Linux native AIO
      22-07-22 18:12:56 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      22-07-22 18:12:56 0 [Note] InnoDB: Uses event mutexes
      22-07-22 18:12:56 0 [Note] InnoDB: Compressed tables use zlib 1.2.8
      22-07-22 18:12:56 0 [Note] InnoDB: Number of pools: 1
      22-07-22 18:12:56 0 [Note] InnoDB: Using SSE2 crc32 instructions
      22-07-22 18:12:56 0 [Note] InnoDB: Initializing buffer pool, total size = 512M, instances = 1, chunk size = 128M
      22-07-22 18:12:56 0 [Note] InnoDB: Completed initialization of buffer pool
      22-07-22 18:12:56 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority()
      22-07-22 18:12:56 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=8117163271706
      22-07-22 18:12:56 0 [Note] InnoDB: Starting final batch to recover 11069 pages from redo log.
      22-07-22 18:12:56 0x7f3319fff700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.4.25/storage/innobase/page/page0cur.cc line 1154
      noDB: We intentionally generate a memory trap.
      noDB: Submit a detailed bug report to https://jira.mariadb.org/
      noDB: If you get repeated assertion failures or crashes, even
      noDB: immediately after the mysqld startup, there may be
      noDB: corruption in the InnoDB tablespace. Please refer to
      noDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      noDB: about forcing recovery.
      0722 18:12:56 [ERROR] mysqld got signal 6 ;
      is could be because you hit a bug. It is also possible that this binary
      one of the libraries it was linked against is corrupt, improperly built,
      misconfigured. This error can also be caused by malfunctioning hardware.

      report this bug, see https://mariadb.com/kb/en/reporting-bugs

      will try our best to scrape up some info that will hopefully help
      agnose the problem, but since we have already crashed,
      mething is definitely wrong and this may fail.

      rver version: 10.4.25-MariaDB-1:10.4.25+maria~stretch
      y_buffer_size=536870912
      ad_buffer_size=131072
      x_used_connections=0
      x_threads=122
      read_count=0
      is possible that mysqld could use up to
      y_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6789305 K bytes of memory
      pe that's ok; if not, decrease some variables in the equation.

      read pointer: 0x0
      tempting backtrace. You can use the following information to find out
      ere mysqld died. If you see no messages after this, something went
      rribly wrong...
      ack_bottom = 0x0 thread_stack 0x80000
      dr2line: 'mysqld': Нет такого файла
      sqld(my_print_stacktrace+0x2e)[0x56082302ec5e]
      sqld(handle_fatal_signal+0x3af)[0x560822a731af]
      ib/x86_64-linux-gnu/libpthread.so.0(+0x110e0)[0x7f3380eef0e0]
      ib/x86_64-linux-gnu/libc.so.6(gsignal+0xcf)[0x7f33802e7fff]
      ib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f33802e942a]
      dr2line: 'mysqld': Нет такого файла
      sqld(+0x59541d)[0x56082278f41d]
      sqld(+0xbd1fa3)[0x560822dcbfa3]
      sqld(+0xbaf48c)[0x560822da948c]
      sqld(+0xbb044f)[0x560822daa44f]
      sqld(+0x591220)[0x56082278b220]
      sqld(+0xce2848)[0x560822edc848]
      sqld(+0xd4ccc1)[0x560822f46cc1]
      sqld(+0xc5e158)[0x560822e58158]
      ib/x86_64-linux-gnu/libpthread.so.0(+0x74a4)[0x7f3380ee54a4]
      ib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f338039dd0f]
      e manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      formation that should help you find out what is causing the crash.
      iting a core file...
      rking directory at /var/lib/mysql
      source Limits:
      tal signal 11 while backtracing

      Attachments

        Issue Links

          Activity

            Kirill Gromov, thank you for the full stack trace.

            MariaDB Server 10.6 replaced InnoDB’s earlier homebrew synchronization primitives with plain mutex, condition variable, and a homebrew shared/update/exclusive lock (MDEV-24142).

            The sync_array is something that I removed in MDEV-21452. InnoDB used to have some extra bookkeeping of all mutex or rw-lock waits, so that it could implement a watchdog against hangs (the infamous long semaphore wait messages). After MDEV-21452, there only is a watchdog for one specific mutex or rw-lock, dict_sys.mutex which was later merged to dict_sys.latch.

            The crash could theoretically occur due to some bug in the sync_array. I am not keen to debug that, especially since we know that the database is corrupted in the first place.

            The stack trace also tells me that this crash occurs while constructing an earlier version of a record. With the following commands, you should know the name of the table, which I suppose you may already know:

            frame 13
            print prebuilt.table.name
            print prebuilt.trx.mysql_thd.query_string
            

            I previously suggested that the DB_TRX_ID (identifying the last transaction that modified the record) and DB_ROLL_PTR (pointing to an undo log record of the transaction identified by DB_TRX_ID if it was not committed and purged yet) in the index page may be inconsistent with the undo log page, because one of those pages is missing some updates, for example, if crash recovery was unable to apply redo log because of some corruption of the log.

            Your best bet to extract some data from the database (or this table) would be to execute the following before trying to SELECT any data:

            SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
            

            If that fails, you could try to start up the database with innodb_force_recovery=5 so that all undo logs will be ignored. The end result is the same: all transactions will be executed at the READ UNCOMMITTED isolation level.

            Once you have rescued all data (and verified its consistency), you should discard the database and create a new one from the SQL dump.

            marko Marko Mäkelä added a comment - Kirill Gromov , thank you for the full stack trace. MariaDB Server 10.6 replaced InnoDB’s earlier homebrew synchronization primitives with plain mutex, condition variable, and a homebrew shared/update/exclusive lock ( MDEV-24142 ). The sync_array is something that I removed in MDEV-21452 . InnoDB used to have some extra bookkeeping of all mutex or rw-lock waits, so that it could implement a watchdog against hangs (the infamous long semaphore wait messages). After MDEV-21452 , there only is a watchdog for one specific mutex or rw-lock, dict_sys.mutex which was later merged to dict_sys.latch . The crash could theoretically occur due to some bug in the sync_array . I am not keen to debug that, especially since we know that the database is corrupted in the first place. The stack trace also tells me that this crash occurs while constructing an earlier version of a record. With the following commands, you should know the name of the table, which I suppose you may already know: frame 13 print prebuilt.table.name print prebuilt.trx.mysql_thd.query_string I previously suggested that the DB_TRX_ID (identifying the last transaction that modified the record) and DB_ROLL_PTR (pointing to an undo log record of the transaction identified by DB_TRX_ID if it was not committed and purged yet) in the index page may be inconsistent with the undo log page, because one of those pages is missing some updates, for example, if crash recovery was unable to apply redo log because of some corruption of the log. Your best bet to extract some data from the database (or this table) would be to execute the following before trying to SELECT any data: SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED ; If that fails, you could try to start up the database with innodb_force_recovery=5 so that all undo logs will be ignored. The end result is the same: all transactions will be executed at the READ UNCOMMITTED isolation level. Once you have rescued all data (and verified its consistency), you should discard the database and create a new one from the SQL dump.
            Kirill Gromov Kirill Gromov added a comment -

            @marko Marko Mäkelä, thank you for your time.

            I agree ofcourse that debugging an old version is wasting of time.

            So I tried all of you wrote to do. Here the results:

            (gdb) frame 13
            #13 row_search_mvcc (buf=buf@entry=0x7f546459fef0 "\b\376\223\361)\253\235D\277ldQ{[J\022\070\222\344uw%\"\276\021\354\242\a", mode=PAGE_CUR_G, mode@entry=PAGE_CUR_UNSUPP, prebuilt=0x7f54645a2df8, match_mode=match_mode@entry=0, direction=direction@entry=1) at ./storage/innobase/row/row0sel.cc:5265
            5265    ./storage/innobase/row/row0sel.cc: Нет такого файла или каталога.
            (gdb) print prebuilt.table.name
            $1 = {m_name = 0x7f546459ab50 "mydb/host", static part_suffix = "#P#"}
            (gdb) print prebuilt.trx.mysql_thd.query_string
            $2 = {string = {str = 0x7f5464011ef0 "SELECT /*!40001 SQL_NO_CACHE */ `id`, `ctx`, `hostname`, `fqdns`, `asset`, `threshold_c`, `threshold_a`, `alert`, `persistence`, `nat`, `rrd_profile`, `descr`, `lat`, `lon`, `icon`, `country`, `extern"..., length = 273}, cs = 0x557f894fb220 <my_charset_utf8mb4_general_ci>}
            

            And I can't do SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; because mysqld didn't start in any mode except in innodb_force_recovery=6

            So I have a last question

            Why mysqldump crashes on mydb.host with simple query (look at $2 in text before) and if I do a simple query from mysql

            SELECT * FROM mydb.host
            

            all works without crash and return all of 218 rows of a table?

            Maybe something wrong with used CharSet?

            Anyway, please tell me how can I dump my tables without using of mysqldump or maybe I can run it with some magic params to avoid crashing.

            Kirill Gromov Kirill Gromov added a comment - @marko Marko Mäkelä, thank you for your time. I agree ofcourse that debugging an old version is wasting of time. So I tried all of you wrote to do. Here the results: (gdb) frame 13 #13 row_search_mvcc (buf=buf@entry=0x7f546459fef0 "\b\376\223\361)\253\235D\277ldQ{[J\022\070\222\344uw%\"\276\021\354\242\a", mode=PAGE_CUR_G, mode@entry=PAGE_CUR_UNSUPP, prebuilt=0x7f54645a2df8, match_mode=match_mode@entry=0, direction=direction@entry=1) at ./storage/innobase/row/row0sel.cc:5265 5265 ./storage/innobase/row/row0sel.cc: Нет такого файла или каталога. (gdb) print prebuilt.table.name $1 = {m_name = 0x7f546459ab50 "mydb/host", static part_suffix = "#P#"} (gdb) print prebuilt.trx.mysql_thd.query_string $2 = {string = {str = 0x7f5464011ef0 "SELECT /*!40001 SQL_NO_CACHE */ `id`, `ctx`, `hostname`, `fqdns`, `asset`, `threshold_c`, `threshold_a`, `alert`, `persistence`, `nat`, `rrd_profile`, `descr`, `lat`, `lon`, `icon`, `country`, `extern"..., length = 273}, cs = 0x557f894fb220 <my_charset_utf8mb4_general_ci>} And I can't do SET GLOBAL TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; because mysqld didn't start in any mode except in innodb_force_recovery=6 So I have a last question Why mysqldump crashes on mydb.host with simple query (look at $2 in text before) and if I do a simple query from mysql SELECT * FROM mydb.host all works without crash and return all of 218 rows of a table? Maybe something wrong with used CharSet? Anyway, please tell me how can I dump my tables without using of mysqldump or maybe I can run it with some magic params to avoid crashing.

            Kirill Gromov, innodb_force_recovery=6 is otherwise the same as innodb_force_recovery=5, but it will additionally ignore the ib_logfile*, which means that whatever changes had not been written back to data pages (only to the log) will be ignored. You could have most pages corresponding to some logical time before the server had been killed, but some pages could be older or newer (and thus you could get broken next-page links or such).

            So, if the database starts up with innodb_force_recovery=5, I would suggest using that. Already that setting will hard-wire the isolation level to READ UNCOMMITTED.

            The GDB output does not show the entire query. It might be that the query plan involves using a secondary index. In InnoDB secondary index pages, if a record is delete-marked, or the PAGE_MAX_TRX_ID is ‘too new’ for the read view, then the corresponding record in the clustered index (primary key + data) will be looked up. The clustered index page contains only the latest version of the record, and the DB_ROLL_PTR forms a singly-linked list into undo pages. The secondary index page will contain all (secondary_key,primary_key) values for every not-yet-purged version of the primary_key record. So, the secondary index access may involve finding out whether the secondary index record actually belongs to this read view.

            The intended purpose of innodb_force_recovery is to allow the database to start up, so that some data can be rescued with mysqldump or similar tools.

            A query like SELECT * FROM mydb.host should always be executed as a full table scan: search the path from the clustered index root page to the leftmost leaf page, and then follow the FIL_PAGE_NEXT links until the end of the leaf page level. A similar query plan should also be used if you add LIMIT or OFFSET to the query.

            If SELECT * FROM mydb.host crashed after dumping some record (let me assume PRIMARY KEY(id) and id=12345), you could try to dump the rest by adding a condition like WHERE id>12345 or possibly a larger value. This could allow InnoDB to navigate to the next uncorrupted leaf page.

            Based on what you wrote, it could be that the secondary index MVCC ‘disobeys’ the innodb_force_recovery=5 setting and tries to access the undo logs even though we did not discover them during startup. This could be the reason of the crash. I tried the following test case on 10.6, but it did not crash that easily:

            --source include/have_innodb.inc
            create table t1(a int primary key, b int unique)engine=innodb;
            insert into t1 values(1,2);
            # prevent the purge of history, so that the secondary index will contain 2 record versions after the UPDATE below
            --let $restart_parameters=--innodb-force-recovery=2
            --source include/restart_mysqld.inc
            update t1 set b=3;
            --let $restart_parameters=--innodb-force-recovery=5
            --source include/restart_mysqld.inc
            select b from t1 force index(b);
            

            An InnoDB code fix against that crash would be to refuse to use any secondary index when innodb_force_recovery=5 or 6 has been specified and the undo logs have not been initialized. With such a fix, the force index(b) in my above test should be refused.

            marko Marko Mäkelä added a comment - Kirill Gromov , innodb_force_recovery=6 is otherwise the same as innodb_force_recovery=5 , but it will additionally ignore the ib_logfile* , which means that whatever changes had not been written back to data pages (only to the log) will be ignored. You could have most pages corresponding to some logical time before the server had been killed, but some pages could be older or newer (and thus you could get broken next-page links or such). So, if the database starts up with innodb_force_recovery=5 , I would suggest using that. Already that setting will hard-wire the isolation level to READ UNCOMMITTED . The GDB output does not show the entire query. It might be that the query plan involves using a secondary index. In InnoDB secondary index pages, if a record is delete-marked, or the PAGE_MAX_TRX_ID is ‘too new’ for the read view, then the corresponding record in the clustered index (primary key + data) will be looked up. The clustered index page contains only the latest version of the record, and the DB_ROLL_PTR forms a singly-linked list into undo pages. The secondary index page will contain all (secondary_key,primary_key) values for every not-yet-purged version of the primary_key record. So, the secondary index access may involve finding out whether the secondary index record actually belongs to this read view. The intended purpose of innodb_force_recovery is to allow the database to start up, so that some data can be rescued with mysqldump or similar tools. A query like SELECT * FROM mydb.host should always be executed as a full table scan: search the path from the clustered index root page to the leftmost leaf page, and then follow the FIL_PAGE_NEXT links until the end of the leaf page level. A similar query plan should also be used if you add LIMIT or OFFSET to the query. If SELECT * FROM mydb.host crashed after dumping some record (let me assume PRIMARY KEY(id) and id=12345 ), you could try to dump the rest by adding a condition like WHERE id>12345 or possibly a larger value. This could allow InnoDB to navigate to the next uncorrupted leaf page. Based on what you wrote, it could be that the secondary index MVCC ‘disobeys’ the innodb_force_recovery=5 setting and tries to access the undo logs even though we did not discover them during startup. This could be the reason of the crash. I tried the following test case on 10.6, but it did not crash that easily: --source include/have_innodb.inc create table t1(a int primary key , b int unique )engine=innodb; insert into t1 values (1,2); # prevent the purge of history, so that the secondary index will contain 2 record versions after the UPDATE below --let $restart_parameters=--innodb-force-recovery=2 --source include/restart_mysqld.inc update t1 set b=3; --let $restart_parameters=--innodb-force-recovery=5 --source include/restart_mysqld.inc select b from t1 force index (b); An InnoDB code fix against that crash would be to refuse to use any secondary index when innodb_force_recovery=5 or 6 has been specified and the undo logs have not been initialized. With such a fix, the force index(b) in my above test should be refused.
            Kirill Gromov Kirill Gromov added a comment -

            Sorry, @marko Marko but I'm a little confused by your answer.

            I wrote that mysqld didn`t start in mode 5 but only in mode 6
            And another thing that I can't understand is: mysqldump crashes on near row 152, but when I do SELECT - everything is OK, and I got all of 218 rows

            Now I started mysqldump with key --ignore-table=mydb.host and it still work, and dumping all my databases.

            It seems like my adventure going to the finish

            Also I have successfully done SELECT into OUTFILE FROM mydb.host and only thing that worry me is how to recreate the structure of table host in new database. Please point me to the page of documentation, where I can learn about it

            After that I thing this topic can be closed!

            Kirill Gromov Kirill Gromov added a comment - Sorry, @marko Marko but I'm a little confused by your answer. I wrote that mysqld didn`t start in mode 5 but only in mode 6 And another thing that I can't understand is: mysqldump crashes on near row 152, but when I do SELECT - everything is OK, and I got all of 218 rows Now I started mysqldump with key --ignore-table=mydb.host and it still work, and dumping all my databases. It seems like my adventure going to the finish Also I have successfully done SELECT into OUTFILE FROM mydb.host and only thing that worry me is how to recreate the structure of table host in new database. Please point me to the page of documentation, where I can learn about it After that I thing this topic can be closed!

            It should be possible to view the schema of a .frm file. One way would be to copy the file to the data directory of a working server and then execute SHOW CREATE TABLE for that name.

            The sync_array that seems to be the immediate reason of this crash was removed in MDEV-21452. MariaDB Server 10.6 or later should not crash on many forms of corruption, thanks to MDEV-13542. It would be interesting to know if a recent MariaDB 10.6 would crash on your corrupted data directory, but I understand that you may already have deleted and rebuilt it.

            marko Marko Mäkelä added a comment - It should be possible to view the schema of a .frm file. One way would be to copy the file to the data directory of a working server and then execute SHOW CREATE TABLE for that name. The sync_array that seems to be the immediate reason of this crash was removed in MDEV-21452 . MariaDB Server 10.6 or later should not crash on many forms of corruption, thanks to MDEV-13542 . It would be interesting to know if a recent MariaDB 10.6 would crash on your corrupted data directory, but I understand that you may already have deleted and rebuilt it.

            People

              marko Marko Mäkelä
              Kirill Gromov Kirill Gromov
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.