MariaDB Server / MDEV-28695

InnoDB: Database page corruption on disk or a failed read => mysqld got signal 11 - std::unique_lock<std::mutex>::unlock()

Details

    • Bug
    • Status: Closed
    • Critical
    • Resolution: Incomplete
    • 10.5.15
    • N/A
    • None

    Description

      I'm essentially facing two problems.
      1)

      Every few days I get the following (db/table/column names manually obfuscated below):

      2022-05-29  5:54:28 226 [ERROR] InnoDB: Database page corruption on disk or a failed read of file './xxxxdb/bad_table.ibd' page [page id: space=386040, page number=1446430]. You may have to recover from a backup.
      2022-05-29  5:54:28 226 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
       len 16384; hex ...
      InnoDB: End of page dump
      2022-05-29  5:54:28 226 [Note] InnoDB: Uncompressed page, stored checksum in field1 642196308, calculated checksums for field1: crc32 3586801619, innodb 1305855102,  page type 10 == BLOB.none 3735928559, stored checksum in field2 642196308, calculated checksums for field2: crc32 3586801619, innodb 1806045908, none 3735928559,  page LSN 3289 3616289816, low 4 bytes of LSN at page end 3616289816, page number (if stored to page already) 1446430, space id (if create with >= MySQL-4.1.1 and stored already) 386040
      InnoDB: Page may be a BLOB page
      2022-05-29  5:54:28 226 [Note] InnoDB:  You can use CHECK TABLE to scan your table for corruption. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
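The checksum note above packs several values into one line. As a rough aid for reading it, the sketch below pulls out the numbers and compares them. It assumes that the two numbers after "page LSN" are the high and low 32-bit halves of the LSN, and it uses a deliberately simplified comparison (stored value against each calculated candidate); the real validation inside InnoDB is more involved. The helper names `ints` and `summarize` are made up for this example.

```python
import re

# Diagnostic text copied from the [Note] line above (values unchanged).
NOTE = (
    "Uncompressed page, stored checksum in field1 642196308, "
    "calculated checksums for field1: crc32 3586801619, innodb 1305855102,  "
    "page type 10 == BLOB.none 3735928559, "
    "stored checksum in field2 642196308, "
    "calculated checksums for field2: crc32 3586801619, innodb 1806045908, "
    "none 3735928559,  page LSN 3289 3616289816, "
    "low 4 bytes of LSN at page end 3616289816, "
    "page number (if stored to page already) 1446430"
)


def ints(pattern, text):
    """Return the integer capture groups of the first match of pattern."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return [int(group) for group in match.groups()]


def summarize(note):
    """Extract the field1 checksums and LSN values and compare them."""
    (stored1,) = ints(r"stored checksum in field1 (\d+)", note)
    crc32_1, innodb_1 = ints(r"for field1: crc32 (\d+), innodb (\d+)", note)
    (none_magic,) = ints(r"none (\d+)", note)
    # Assumption: "page LSN <a> <b>" prints the high and low 32-bit halves.
    lsn_high, lsn_low = ints(r"page LSN (\d+) (\d+)", note)
    (lsn_tail,) = ints(r"LSN at page end (\d+)", note)
    return {
        "stored_matches_any": stored1 in (crc32_1, innodb_1, none_magic),
        "none_magic_hex": hex(none_magic),
        "page_lsn": (lsn_high << 32) | lsn_low,
        "lsn_tail_consistent": lsn_low == lsn_tail,
    }


summary = summarize(NOTE)
print(summary)
```

For this page the stored checksum equals none of the calculated candidates, while the low LSN bytes in the header and at the page end agree. That is all this comparison can say; it does not tell whether the damage came from disk, memory or a bug.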
      

      I don't know whether this is i) a bug, ii) failing memory or iii) a failing disk. It happens with a variety of tables and lasts for about 5 seconds, with this message logged about 20 times per second, before...
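Since the message repeats many times and involves several tables, one way to get an overview is to tally the corruption reports per file and page. The following is a minimal sketch, assuming the error log lines have the same shape as the one quoted above; the sample list simply repeats that line for illustration, and the function names are hypothetical. The page size of 16384 is taken from the page dump note.

```python
import re
from collections import Counter

PAGE_SIZE = 16384  # from "Page dump in ascii and hex (16384 bytes)"

CORRUPT_RE = re.compile(
    r"\[ERROR\] InnoDB: Database page corruption on disk or a failed read "
    r"of file '(?P<file>[^']+)' page \[page id: space=(?P<space>\d+), "
    r"page number=(?P<page>\d+)\]"
)


def tally(lines):
    """Count corruption reports per (file, page number)."""
    counts = Counter()
    for line in lines:
        match = CORRUPT_RE.search(line)
        if match:
            counts[(match["file"], int(match["page"]))] += 1
    return counts


def byte_offset(page_no, page_size=PAGE_SIZE):
    """Byte offset of a page inside its .ibd file."""
    return page_no * page_size


# Illustrative input: the reported line twice, plus an unrelated note.
reported = (
    "2022-05-29  5:54:28 226 [ERROR] InnoDB: Database page corruption on disk "
    "or a failed read of file './xxxxdb/bad_table.ibd' page "
    "[page id: space=386040, page number=1446430]. "
    "You may have to recover from a backup."
)
sample = [reported, "2022-05-29  5:54:28 226 [Note] InnoDB: End of page dump", reported]

counts = tally(sample)
for (path, page), hits in counts.most_common():
    print(path, page, hits, byte_offset(page))
```

Fed with the real error log, the same tally would show whether the reports keep pointing at the same pages or move around between pages and tables.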

      Sometimes (I think the pattern probably has more to do with my application querying MariaDB than with MariaDB itself) it then goes...

      2022-05-27  0:03:55 353 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mysqld server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      2022-05-27  0:03:55 353 [ERROR] mariadbd: Index for table 'bad_table' is corrupt; try to repair it
      

      but other times it then hits the second, more serious, problem...

      2)

      220529  5:54:33 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail.
       
      Server version: 10.5.15-MariaDB-0+deb11u1-log
      key_buffer_size=402653184
      read_buffer_size=2097152
      max_used_connections=2
      max_threads=153
      thread_count=2
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1963808 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7f9a5c000c58
      Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong...
      stack_bottom = 0x7f9bdc054d78 thread_stack 0x30000
      ??:0(my_print_stacktrace)[0x55d6fdc5154e]
      ??:0(handle_fatal_signal)[0x55d6fd750f65]
      sigaction.c:0(__restore_rt)[0x7f9be0732140]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdb455b4]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdb51a96]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdaed4ea]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdaed90c]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdaf469d]
      ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55d6fda37b42]
      ??:0(handler::ha_index_read_map(unsigned char*, unsigned char const*, unsigned long, ha_rkey_function))[0x55d6fd756bd8]
      ??:0(cp_buffer_from_ref(THD*, TABLE*, st_table_ref*))[0x55d6fd5a8254]
      ??:0(sub_select(JOIN*, st_join_table*, bool))[0x55d6fd59480e]
      ??:0(Item_bool_func2::remove_eq_conds(THD*, Item::cond_result*, bool))[0x55d6fd580fec]
      ??:0(sub_select(JOIN*, st_join_table*, bool))[0x55d6fd5948a3]
      ??:0(JOIN::exec_inner())[0x55d6fd5bee38]
      ??:0(JOIN::exec())[0x55d6fd5bf295]
      ??:0(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x55d6fd5bd116]
      ??:0(mysql_multi_update(THD*, TABLE_LIST*, List<Item>*, List<Item>*, Item*, unsigned long long, enum_duplicates, bool, st_select_lex_unit*, st_select_lex*, multi_update**))[0x55d6fd6109ed]
      ??:0(mysql_execute_command(THD*))[0x55d6fd55b0b4]
      ??:0(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55d6fd55c5db]
      ??:0(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x55d6fd55ea5d]
      ??:0(do_command(THD*))[0x55d6fd5602de]
      ??:0(do_handle_one_connection(CONNECT*, bool))[0x55d6fd651fb2]
      ??:0(handle_one_connection)[0x55d6fd65222d]
      ??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x55d6fd98e11b]
      nptl/pthread_create.c:478(start_thread)[0x7f9be0726ea7]
      x86_64/clone.S:97(__GI___clone)[0x7f9be033ddef]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7f9a5c010470): update bad_table p, table2 n set p.n_col=n.n_col where p.colID=n.colID and n.col1=p.col1 and p.col='xx'
       
      Connection ID (thread ID): 226
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
       
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /mnt/disk3/mysqldata
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units
      Max cpu time              unlimited            unlimited            seconds
      Max file size             unlimited            unlimited            bytes
      Max data size             unlimited            unlimited            bytes
      Max stack size            8388608              unlimited            bytes
      Max core file size        0                    unlimited            bytes
      Max resident set          unlimited            unlimited            bytes
      Max processes             62978                62978                processes
      Max open files            32768                32768                files
      Max locked memory         65536                65536                bytes
      Max address space         unlimited            unlimited            bytes
      Max file locks            unlimited            unlimited            locks
      Max pending signals       62978                62978                signals
      Max msgqueue size         819200               819200               bytes
      Max nice priority         0                    0
      Max realtime priority     0                    0
      Max realtime timeout      unlimited            unlimited            us
      Core pattern: core
       
      2022-05-29  5:54:38 0 [Note] Using unique option prefix 'myisam-recover' is error-prone and can break in the future. Please use the full name 'myisam-recover-options' instead.
      2022-05-29  5:54:38 0 [Note] CONNECT: Version 1.07.0002 March 22, 2021
      2022-05-29  5:54:38 0 [Warning] The parameter innodb_file_format is deprecated and has no effect. It may be removed in future releases. See https://mariadb.com/kb/en/library/xtradbinnodb-file-format/
      2022-05-29  5:54:38 0 [Note] InnoDB: !!! innodb_force_recovery is set to 1 !!!
      2022-05-29  5:54:38 0 [Note] InnoDB: Uses event mutexes
      2022-05-29  5:54:38 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2022-05-29  5:54:38 0 [Note] InnoDB: Number of pools: 1
      2022-05-29  5:54:38 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2022-05-29  5:54:39 0 [Note] InnoDB: Using Linux native AIO
      2022-05-29  5:54:39 0 [Note] InnoDB: Initializing buffer pool, total size = 4294967296, chunk size = 134217728
      2022-05-29  5:54:39 0 [Note] InnoDB: Completed initialization of buffer pool
      2022-05-29  5:54:39 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=14858023269921,14858023269921
      2022-05-29  5:54:40 0 [Note] InnoDB: Starting final batch to recover 17480 pages from redo log.
      2022-05-29  5:54:41 0 [Note] InnoDB: 128 rollback segments are active.
      2022-05-29  5:54:41 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
      2022-05-29  5:54:41 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2022-05-29  5:54:41 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2022-05-29  5:54:41 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      2022-05-29  5:54:41 0 [Note] InnoDB: 10.5.15 started; log sequence number 14858195549968; transaction id 5082853300
      2022-05-29  5:54:41 0 [Note] InnoDB: Loading buffer pool(s) from /mnt/disk3/mysqldata/ib_buffer_pool
      2022-05-29  5:54:41 0 [Note] Plugin 'FEEDBACK' is disabled.
      2022-05-29  5:54:41 0 [ERROR] mariadbd: Plugin 'CONNECT' already installed
      2022-05-29  5:54:41 0 [Note] Server socket created on IP: '0.0.0.0'.
      2022-05-29  5:54:41 0 [Note] Reading of all Master_info entries succeeded
      2022-05-29  5:54:41 0 [Note] Added new Master_info '' to hash table
      2022-05-29  5:54:41 0 [Note] /usr/sbin/mariadbd: ready for connections.
      Version: '10.5.15-MariaDB-0+deb11u1-log'  socket: '/run/mysqld/mysqld.sock'  port: 3306  Debian 11
      2022-05-29  5:54:41 5 [Warning] ./sqlite3/xxxx.frm is inconsistent: engine typecode 44, engine name CONNECT (46)
      2022-05-29  5:54:41 0 [Note] InnoDB: Buffer pool(s) load completed at 220529  5:54:41
      

      Once MariaDB is running again, my application starts querying again and the process repeats. I usually stop my application and mysqlcheck/repair/restore the table before restarting it.
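One detail in the crash report above is worth checking: it says "Writing a core file...", but the resource limits table lists a soft "Max core file size" of 0. On Linux a core size limit of 0 normally means no core file is produced, so there may be nothing to analyse afterwards. The sketch below parses a few rows copied from that table and flags this case; `parse_limits` is a hypothetical helper, and how the limit would be raised depends on how the server is started, which is not shown in the report.

```python
import re

# Header and three rows copied from the "Resource Limits" table above.
LIMITS = """\
Limit                     Soft Limit           Hard Limit           Units
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max open files            32768                32768                files
"""


def parse_limits(text):
    """Map each limit name to its (soft, hard) values as strings."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Columns are separated by runs of two or more spaces.
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) >= 3:
            limits[parts[0]] = (parts[1], parts[2])
    return limits


limits = parse_limits(LIMITS)
core_soft, core_hard = limits["Max core file size"]
core_dumps_disabled = core_soft == "0"
print("core soft limit:", core_soft, "hard limit:", core_hard)
print("core dumps effectively disabled:", core_dumps_disabled)
```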

      The segfault is caused by a variety of different queries accessing the corrupt table(s) in different ways, but each time it's the std::unique_lock<std::mutex>::unlock() that seems to be responsible for the segfault. It's particularly sad when the resulting CHECK TABLE/OPTIMIZE TABLE cause the segfault too :-/ e.g.

      Thread pointer: 0x7fa500003648
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7fa678090d78 thread_stack 0x30000
      ??:0(my_print_stacktrace)[0x55d8e417954e]
      ??:0(handle_fatal_signal)[0x55d8e3c78f65]
      sigaction.c:0(__restore_rt)[0x7fa67c481140]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55d8e3ffba71]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x55d8e3ffd13a]
      ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x55d8e3f77f15]
      ??:0(mysql_alter_table(THD*, st_mysql_const_lex_string const*, st_mysql_const_lex_string const*, HA_CREATE_INFO*, TABLE_LIST*, Alter_info*, unsigned int, st_order*, bool, bool))[0x55d8e3b247cb]
      ??:0(mysql_recreate_table(THD*, TABLE_LIST*, bool))[0x55d8e3b25017]
      ??:0(MDL_ticket::~MDL_ticket())[0x55d8e3b84f3d]
      ??:0(MDL_ticket::~MDL_ticket())[0x55d8e3b86dcc]
      ??:0(Sql_cmd_optimize_table::execute(THD*))[0x55d8e3b87f0d]
      ??:0(mysql_execute_command(THD*))[0x55d8e3a80356]
      ??:0(mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool))[0x55d8e3a845db]
      ??:0(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool))[0x55d8e3a86a5d]
      ??:0(do_command(THD*))[0x55d8e3a882de]
      ??:0(do_handle_one_connection(CONNECT*, bool))[0x55d8e3b79fb2]
      ??:0(handle_one_connection)[0x55d8e3b7a22d]
      ??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x55d8e3eb611b]
      nptl/pthread_create.c:478(start_thread)[0x7fa67c475ea7]
      x86_64/clone.S:97(__GI___clone)[0x7fa67c08cdef]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7fa500011cd0): OPTIMIZE TABLE `bad_table`
       
      Connection ID (thread ID): 212
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
      

      I don't suppose there's much that can easily be done to track down the first problem, but it would be great if the segfault could be fixed, so at least MariaDB stays up and carries on handling queries for the non-corrupt tables.
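A caveat on reading the backtraces: most frames are printed as ??:0, meaning no file or line information was available, and in that situation the printed symbol can be just the nearest known symbol rather than the function that actually ran. A rough way to gauge this is to group the frame addresses by symbol name and look at how far apart they are. The sketch below does that for a few frames copied from the first backtrace; the regular expression and the names `parse_frames` and `address_spread` are assumptions made for this example, not part of any MariaDB tool.

```python
import re
from collections import defaultdict

# A few frames copied verbatim from the first backtrace above.
FRAMES = """\
??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdb455b4]
??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdb51a96]
??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdaed4ea]
??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdaed90c]
??:0(std::unique_lock<std::mutex>::unlock())[0x55d6fdaf469d]
??:0(sub_select(JOIN*, st_join_table*, bool))[0x55d6fd59480e]
??:0(sub_select(JOIN*, st_join_table*, bool))[0x55d6fd5948a3]
"""

# location(symbol)[0xaddress]; the symbol itself may contain parentheses.
FRAME_RE = re.compile(
    r"^\s*(?P<loc>[^(\s]+)\((?P<sym>.*)\)\[0x(?P<addr>[0-9a-f]+)\]\s*$"
)


def parse_frames(text):
    """Group frame addresses by the symbol name printed for them."""
    by_symbol = defaultdict(list)
    for line in text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            by_symbol[match["sym"]].append(int(match["addr"], 16))
    return by_symbol


def address_spread(by_symbol):
    """Distance in bytes between the lowest and highest address per symbol."""
    return {sym: max(addrs) - min(addrs) for sym, addrs in by_symbol.items()}


spreads = address_spread(parse_frames(FRAMES))
for symbol, spread in spreads.items():
    print(f"{spread:>8} bytes  {symbol}")
```

Here the two sub_select frames lie 149 bytes apart, which fits a single function, while the five unlock() frames span roughly 400 KB. That would be unusual for one small function, so the unlock() label may not identify the code that really crashed; a trace taken with debug symbols installed would settle it.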

          People

            marko Marko Mäkelä
            adennisa15 A D