  1. MariaDB Server
  2. MDEV-29374

Frequent "Data structure corruption" in InnoDB after OOM

      Description

      My company runs MariaDB 10.6 on around 200 CentOS 7 servers. Last week, after the 10.6.9 update was applied, we started seeing frequent InnoDB startup failures after the OOM killer had killed MariaDB.

      Our systems are fairly light on memory and do hit OOMs occasionally, but that shouldn't cause InnoDB recovery to fail. After updating to 10.6.9, we started seeing multiple failures per day. I rolled back to 10.6.8 yesterday and have not seen any more issues, so I think something in 10.6.9 is causing the problem.

      Each time the issue happens, the error log looks like this:

      2022-08-21 10:41:34 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
      2022-08-21 10:41:34 0 [Note] InnoDB: Number of pools: 1
      2022-08-21 10:41:34 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2022-08-21 10:41:34 0 [Note] InnoDB: Using Linux native AIO
      2022-08-21 10:41:34 0 [Note] InnoDB: Initializing buffer pool, total size = 2550136832, chunk size = 134217728
      2022-08-21 10:41:34 0 [Note] InnoDB: Completed initialization of buffer pool
      2022-08-21 10:41:34 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1383270444633,1383271021489
      2022-08-21 10:41:36 0 [Note] InnoDB: Starting final batch to recover 15643 pages from redo log.
      2022-08-21 10:41:37 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption
      2022-08-21 10:41:37 0 [Note] InnoDB: Starting shutdown...
      2022-08-21 10:41:38 0 [ERROR] Plugin 'InnoDB' init function returned error.
      2022-08-21 10:41:38 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
      2022-08-21 10:41:38 0 [Note] Plugin 'FEEDBACK' is disabled.
      2022-08-21 10:41:38 0 [ERROR] Unknown/unsupported storage engine: InnoDB
      2022-08-21 10:41:38 0 [ERROR] Aborting
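
      With around 200 servers, it may help to detect this failure signature automatically. The following is a minimal sketch, assuming the error log is plain text in the format shown above. The function name `find_recovery_failures` and the `sample` list are hypothetical and only illustrate the idea.

```python
import re

# Matches the "Plugin initialization aborted" line from the log excerpt above.
# Assumption: every such line carries the same timestamp/thread-id prefix.
FAILURE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}) \d+ \[ERROR\] InnoDB: "
    r"Plugin initialization aborted with error (?P<reason>.+)$"
)


def find_recovery_failures(lines):
    """Return a (timestamp, reason) tuple for each aborted InnoDB startup."""
    hits = []
    for line in lines:
        match = FAILURE.match(line.strip())
        if match:
            hits.append((match.group("ts"), match.group("reason")))
    return hits


# Three lines copied from the log excerpt above.
sample = [
    "2022-08-21 10:41:36 0 [Note] InnoDB: Starting final batch to recover 15643 pages from redo log.",
    "2022-08-21 10:41:37 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption",
    "2022-08-21 10:41:38 0 [ERROR] Aborting",
]

print(find_recovery_failures(sample))
```

      In practice the lines would come from the server's error log file rather than a hard-coded list.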
      

      There isn't anything unusual about our config. Here is the one from one of the affected servers:

      default-storage-engine=InnoDB
      innodb_file_per_table=1
      innodb_buffer_pool_size=2400M
      innodb_log_file_size=600M
      innodb_strict_mode = 0
      innodb_use_native_aio = 1
      innodb_write_io_threads = 8
      innodb_read_io_threads = 8
      query_cache_type=0
      open_files_limit=50000
      thread_cache_size=4
      join_buffer_size=1024K
      tmp_table_size=30M
      max_heap_table_size=30M
      sql_mode="NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
      table_definition_cache=14000
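
      One detail worth noting when memory is tight: the log reports a buffer pool of 2550136832 bytes, while the config asks for 2400M. That difference appears consistent with the requested size being rounded up to a whole number of chunks (the log shows chunk size = 134217728). The sketch below only reproduces that arithmetic; the helper name `rounded_pool_size` is hypothetical, and the round-up rule is an assumption inferred from the numbers in the log.

```python
MIB = 1024 * 1024


def rounded_pool_size(requested_bytes, chunk_bytes):
    """Round the requested size up to the next multiple of the chunk size."""
    chunks = (requested_bytes + chunk_bytes - 1) // chunk_bytes  # integer ceiling
    return chunks * chunk_bytes


requested = 2400 * MIB   # innodb_buffer_pool_size=2400M from the config
chunk = 134217728        # chunk size reported in the error log (128 MiB)

actual = rounded_pool_size(requested, chunk)
print(actual, actual // MIB, (actual - requested) // MIB)
```

      Under that assumption, the server allocates 19 chunks, which is 32 MiB more than the configured value.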
      

      Fortunately, it's very easy to recover from this, but it does take manual intervention, i.e. starting the server with innodb_force_recovery=1.

      Any ideas what could be causing this new issue and what we can do to correct it?

              People

              Assignee:
              marko Marko Mäkelä
              Reporter:
              wk_bradp Brad