Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24412

Mariadb 10.5: InnoDB: Upgrade after a crash is not supported

Details

    Description

      After upgrading MariaDB from 10.4.17 to 10.5.8, it did not start, it just crashed.
      Packages for 10.4.17 are erased and 10.5.8 are installed.

      2020-12-15 14:08:31 0 [Warning] You need to use --log-bin to make --binlog-format work.
      2020-12-15 14:08:31 0 [Note] InnoDB: innodb_page_size=65536
      2020-12-15 14:08:31 0 [Note] InnoDB: Using Linux native AIO
      2020-12-15 14:08:31 0 [Note] InnoDB: Uses event mutexes
      2020-12-15 14:08:31 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2020-12-15 14:08:31 0 [Note] InnoDB: Number of pools: 1
      2020-12-15 14:08:31 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2020-12-15 14:08:31 0 [Note] InnoDB: Disabling background log and ibuf IO write threads.
      2020-12-15 14:08:31 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
      2020-12-15 14:08:31 0 [Note] InnoDB: Completed initialization of buffer pool
      2020-12-15 14:08:31 0 [Note] InnoDB: innodb_force_recovery=6 skips redo log apply
      2020-12-15 14:08:31 0 [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=0, page number=47827], should be [page id: space=0, page number=47795]
      201215 14:08:31 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.5.8-MariaDB-log
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=0
      max_threads=153
      thread_count=0
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467860 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      ??:0(my_print_stacktrace)[0x563637dda07e]
      ??:0(handle_fatal_signal)[0x56363786f5e5]
      sigaction.c:0(__restore_rt)[0x7eff5a279b20]
      ??:0(void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag))[0x563637b9ea61]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x563637c896dc]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x563637c96c8a]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x563637c96d04]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x563637cdb2b3]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x563637cdc39b]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x563637cbea63]
      ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x56363754b451]
      ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x563637b2d3b1]
      ??:0(ha_initialize_handlerton(st_plugin_int*))[0x563637872764]
      ??:0(sys_var_pluginvar::sys_var_pluginvar(sys_var_chain*, char const*, st_plugin_int*, st_mysql_sys_var*))[0x56363768c903]
      ??:0(plugin_init(int*, char**, int))[0x56363768dcf3]
      ??:0(unireg_abort)[0x5636375b61ed]
      ??:0(mysqld_main(int, char**))[0x5636375bbd85]
      ??:0(__libc_start_main)[0x7eff580a07b3]
      ??:0(_start)[0x5636375b087e]
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /var/lib/mysql
      Resource Limits:
      Limit Soft Limit Hard Limit Units
      Max cpu time unlimited unlimited seconds
      Max file size unlimited unlimited bytes
      Max data size unlimited unlimited bytes
      Max stack size 8388608 unlimited bytes
      Max core file size unlimited unlimited bytes
      Max resident set unlimited unlimited bytes
      Max processes 126038 126038 processes
      Max open files 16384 16384 files
      Max locked memory 16777216 16777216 bytes
      Max address space unlimited unlimited bytes
      Max file locks unlimited unlimited locks
      Max pending signals 126038 126038 signals
      Max msgqueue size 819200 819200 bytes
      Max nice priority 0 0
      Max realtime priority 0 0
      Max realtime timeout unlimited unlimited us
      Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e

      CONFIG

      [mysqld]
      datadir=/var/lib/mysql
      socket=/var/lib/mysql/mysql.sock
      log-error=/var/log/mariadb/mariadb.log
      pid-file=/var/tmp/mariadb.pid
      #tmpdir=/var/tmp
      log_warnings=1
      slow_query_log = 1
      slow_query_log_file = /var/log/mariadb/slowquery.log
      log_slow_verbosity = query_plan,explain

      bind-address = 0.0.0.0
      skip-name-resolve
      skip-external-locking
      performance_schema=off
      userstat = 1

      transaction-isolation = READ-COMMITTED

      binlog_format=ROW
      innodb_autoinc_lock_mode=2

      innodb_data_file_path=ibdata1:50M;ibdata2:50M:autoextend
      innodb_data_home_dir = /var/lib/mysql
      innodb_doublewrite=0
      innodb_file_per_table=1

      sql_mode=NO_ENGINE_SUBSTITUTION
      innodb-default-row-format=DYNAMIC

      innodb_strict_mode=off

      innodb_flush_neighbors=0
      innodb_page_size=64K
      innodb_log_buffer_size = 2G
      innodb_log_file_size=1G
      innodb_open_files=16384

      Attachments

        1. 10.4.15.txt
          6 kB
        2. 10.4.5-10.5.6-wsrep.txt
          41 kB
        3. 10.4.5-10.5.6-wsrep.txt
          41 kB
        4. 10.5.6.txt
          3 kB
        5. 10.5.6 crash log.txt
          16 kB
        6. 10.5.8 crash.txt
          8 kB
        7. ib_logfile0_000-7FF.bin
          2 kB

        Issue Links

          Activity

            @Xesh See https://jira.mariadb.org/browse/MDEV-24578. Does your case seem similar? Do you get any results from running `journalctl -u mariadb | grep "InnoDB: Assertion failure"`?

            otto Otto Kekäläinen added a comment - @Xesh See https://jira.mariadb.org/browse/MDEV-24578 . Does your case seem similar? Do you get any results from running `journalctl -u mariadb | grep "InnoDB: Assertion failure"`?

            With ib_logfile0_000-7FF.bin I can reproduce this:

            cp ib_logfile0_000-7FF.bin /dev/shm/ib_logfile0
            truncate -s 5368709120 /dev/shm/ib_logfile0
            truncate -s 5368709120 /dev/shm/ib_logfile1
            sql/mariadbd --datadir=$(pwd)/mysql-test/var/install.db/ --innodb-log-group-home-dir /dev/shm
            

            10.5 165564d3c33ae3d677d70644a83afcb744bdbf65

            2022-11-25 9:23:08 0 [ERROR] InnoDB: Invalid log block checksum. block: 0 checkpoint no: 0 expected: 3965168067 found: 0
            2022-11-25 9:23:08 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.4.17, and it appears corrupted.
            

            This attempted to parse the 512 bytes read from ib_logfile0 at byte offset 1,734,884,864.

            10.4 0d586d62e5326053383eb2774b32ff20bbe2f4d5

            2022-11-25 9:35:22 0 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 423245228518 between the checkpoint 423245228518 and the end 423245228032.
            

            This attempted to parse the 512 bytes read from ib_logfile1 at byte offset 6,029,852,160-5,368,709,120=661,143,040.

            I checked the file offsets by debugging tools. For a complete test, we must write a fake 512-byte empty log block at the correct offset of the file ib_logfile1.

            marko Marko Mäkelä added a comment - With ib_logfile0_000-7FF.bin I can reproduce this: cp ib_logfile0_000-7FF.bin /dev/shm/ib_logfile0 truncate -s 5368709120 /dev/shm/ib_logfile0 truncate -s 5368709120 /dev/shm/ib_logfile1 sql/mariadbd --datadir=$(pwd)/mysql-test/var/install.db/ --innodb-log-group-home-dir /dev/shm 10.5 165564d3c33ae3d677d70644a83afcb744bdbf65 2022-11-25 9:23:08 0 [ERROR] InnoDB: Invalid log block checksum. block: 0 checkpoint no: 0 expected: 3965168067 found: 0 2022-11-25 9:23:08 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.4.17, and it appears corrupted. This attempted to parse the 512 bytes read from ib_logfile0 at byte offset 1,734,884,864. 10.4 0d586d62e5326053383eb2774b32ff20bbe2f4d5 2022-11-25 9:35:22 0 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 423245228518 between the checkpoint 423245228518 and the end 423245228032. This attempted to parse the 512 bytes read from ib_logfile1 at byte offset 6,029,852,160-5,368,709,120=661,143,040. I checked the file offsets by debugging tools. For a complete test, we must write a fake 512-byte empty log block at the correct offset of the file ib_logfile1 .

            The 2 checkpoint blocks in ib_logfile0_000-7FF.bin contain the checkpoint LSNs at byte offsets 0x208 and at 0x608: 0x628b61a9dd and 0x628b61a9e6. These are read correctly. The source_offset=6029852646 (0x167683fe6) in recv_log_recover_10_4() just needs to be rounded down to the start of the 512-byte block. It turns out that the code that is supposed to do the rounding was also trimming the offset to 32 bits (passing 0x67683e00 instead of 0x167683e00):

            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index 8db270b2a9b..511ccc5afe6 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -1412,7 +1412,7 @@ static dberr_t recv_log_recover_10_4()
             		return DB_CORRUPTION;
             	}
             
            -	recv_sys.read(source_offset & ~(OS_FILE_LOG_BLOCK_SIZE - 1),
            +	recv_sys.read(source_offset & ~lsn_t(OS_FILE_LOG_BLOCK_SIZE - 1),
             		      {buf, OS_FILE_LOG_BLOCK_SIZE});
             
             	ulint crc = log_block_calc_checksum_crc32(buf);
            

            I believe that this affects all upgrades where innodb_log_files_in_group×innodb_log_file_size exceeds 4 gigabytes and the current log position is located more than 4 gigabytes from the start of ib_logfile0. Multiple log files (before innodb_log_files_in_group was hard-wired to 1 in MariaDB Server 10.5) were treated as if all files had been catenated together.

            marko Marko Mäkelä added a comment - The 2 checkpoint blocks in ib_logfile0_000-7FF.bin contain the checkpoint LSNs at byte offsets 0x208 and at 0x608: 0x628b61a9dd and 0x628b61a9e6. These are read correctly. The source_offset=6029852646 (0x167683fe6) in recv_log_recover_10_4() just needs to be rounded down to the start of the 512-byte block. It turns out that the code that is supposed to do the rounding was also trimming the offset to 32 bits (passing 0x67683e00 instead of 0x167683e00): diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 8db270b2a9b..511ccc5afe6 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -1412,7 +1412,7 @@ static dberr_t recv_log_recover_10_4() return DB_CORRUPTION; } - recv_sys.read(source_offset & ~(OS_FILE_LOG_BLOCK_SIZE - 1), + recv_sys.read(source_offset & ~lsn_t(OS_FILE_LOG_BLOCK_SIZE - 1), {buf, OS_FILE_LOG_BLOCK_SIZE}); ulint crc = log_block_calc_checksum_crc32(buf); I believe that this affects all upgrades where innodb_log_files_in_group × innodb_log_file_size exceeds 4 gigabytes and the current log position is located more than 4 gigabytes from the start of ib_logfile0 . Multiple log files (before innodb_log_files_in_group was hard-wired to 1 in MariaDB Server 10.5) were treated as if all files had been catenated together.

            In MariaDB 10.8, the upgrade check was refactored as part of MDEV-14425. We are reading the file from the correct offset even without this change (which I added as part of the merge of the fix for clarity):

            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index 00fb22e1625..1bd24ac8bc9 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -1649,7 +1649,7 @@ static dberr_t recv_log_recover_10_5(lsn_t lsn_offset)
               if (lsn_offset < (log_sys.is_pmem() ? log_sys.file_size : 4096))
                 memcpy_aligned<512>(buf, &log_sys.buf[lsn_offset & ~511], 512);
               else
            -    recv_sys.read(lsn_offset & ~511, {buf, 512});
            +    recv_sys.read(lsn_offset & ~lsn_t{511}, {buf, 512});
             
               if (!recv_check_log_block(buf))
               {
            

            marko Marko Mäkelä added a comment - In MariaDB 10.8, the upgrade check was refactored as part of MDEV-14425 . We are reading the file from the correct offset even without this change (which I added as part of the merge of the fix for clarity): diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 00fb22e1625..1bd24ac8bc9 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -1649,7 +1649,7 @@ static dberr_t recv_log_recover_10_5(lsn_t lsn_offset) if (lsn_offset < (log_sys.is_pmem() ? log_sys.file_size : 4096)) memcpy_aligned<512>(buf, &log_sys.buf[lsn_offset & ~511], 512); else - recv_sys.read(lsn_offset & ~511, {buf, 512}); + recv_sys.read(lsn_offset & ~lsn_t{511}, {buf, 512}); if (!recv_check_log_block(buf)) {

            We just hit this bug and that caused our DB to fail upgrading to 10.6 when we tried to go to 10.6.11. Upgrading to 10.6.12 worked and we were able to reproduce this very easily by, like Marko mentioned, having innodb_log_files_in_group×innodb_log_file_size exceed 4G and have the log position be over 4G past the start of ib_logfile0. Shouldn't there be a warning on 10.6 versions' changelog page prior to 10.6.12 about this issue? It was quite jarring for us and the issue halted our upgrade process.

            delisson Delisson Silva added a comment - We just hit this bug and that caused our DB to fail upgrading to 10.6 when we tried to go to 10.6.11. Upgrading to 10.6.12 worked and we were able to reproduce this very easily by, like Marko mentioned, having innodb_log_files_in_group×innodb_log_file_size exceed 4G and have the log position be over 4G past the start of ib_logfile0. Shouldn't there be a warning on 10.6 versions' changelog page prior to 10.6.12 about this issue? It was quite jarring for us and the issue halted our upgrade process.

            People

              marko Marko Mäkelä
              shexphobos Xesh
              Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.