Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24412

Mariadb 10.5: InnoDB: Upgrade after a crash is not supported

Details

    Description

      After upgrading MariaDB from 10.4.17 to 10.5.8, it did not start, it just crashed.
      Packages for 10.4.17 are erased and 10.5.8 are installed.

      2020-12-15 14:08:31 0 [Warning] You need to use --log-bin to make --binlog-format work.
      2020-12-15 14:08:31 0 [Note] InnoDB: innodb_page_size=65536
      2020-12-15 14:08:31 0 [Note] InnoDB: Using Linux native AIO
      2020-12-15 14:08:31 0 [Note] InnoDB: Uses event mutexes
      2020-12-15 14:08:31 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2020-12-15 14:08:31 0 [Note] InnoDB: Number of pools: 1
      2020-12-15 14:08:31 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2020-12-15 14:08:31 0 [Note] InnoDB: Disabling background log and ibuf IO write threads.
      2020-12-15 14:08:31 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
      2020-12-15 14:08:31 0 [Note] InnoDB: Completed initialization of buffer pool
      2020-12-15 14:08:31 0 [Note] InnoDB: innodb_force_recovery=6 skips redo log apply
      2020-12-15 14:08:31 0 [ERROR] InnoDB: Space id and page no stored in the page, read in are [page id: space=0, page number=47827], should be [page id: space=0, page number=47795]
      201215 14:08:31 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.5.8-MariaDB-log
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=0
      max_threads=153
      thread_count=0
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467860 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      ??:0(my_print_stacktrace)[0x563637dda07e]
      ??:0(handle_fatal_signal)[0x56363786f5e5]
      sigaction.c:0(__restore_rt)[0x7eff5a279b20]
      ??:0(void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag))[0x563637b9ea61]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x563637c896dc]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x563637c96c8a]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x563637c96d04]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x563637cdb2b3]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x563637cdc39b]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x563637cbea63]
      ??:0(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x56363754b451]
      ??:0(wsrep_notify_status(wsrep::server_state::state, wsrep::view const*))[0x563637b2d3b1]
      ??:0(ha_initialize_handlerton(st_plugin_int*))[0x563637872764]
      ??:0(sys_var_pluginvar::sys_var_pluginvar(sys_var_chain*, char const*, st_plugin_int*, st_mysql_sys_var*))[0x56363768c903]
      ??:0(plugin_init(int*, char**, int))[0x56363768dcf3]
      ??:0(unireg_abort)[0x5636375b61ed]
      ??:0(mysqld_main(int, char**))[0x5636375bbd85]
      ??:0(__libc_start_main)[0x7eff580a07b3]
      ??:0(_start)[0x5636375b087e]
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /var/lib/mysql
      Resource Limits:
      Limit Soft Limit Hard Limit Units
      Max cpu time unlimited unlimited seconds
      Max file size unlimited unlimited bytes
      Max data size unlimited unlimited bytes
      Max stack size 8388608 unlimited bytes
      Max core file size unlimited unlimited bytes
      Max resident set unlimited unlimited bytes
      Max processes 126038 126038 processes
      Max open files 16384 16384 files
      Max locked memory 16777216 16777216 bytes
      Max address space unlimited unlimited bytes
      Max file locks unlimited unlimited locks
      Max pending signals 126038 126038 signals
      Max msgqueue size 819200 819200 bytes
      Max nice priority 0 0
      Max realtime priority 0 0
      Max realtime timeout unlimited unlimited us
      Core pattern: |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e

      CONFIG

      [mysqld]
      datadir=/var/lib/mysql
      socket=/var/lib/mysql/mysql.sock
      log-error=/var/log/mariadb/mariadb.log
      pid-file=/var/tmp/mariadb.pid
      #tmpdir=/var/tmp
      log_warnings=1
      slow_query_log = 1
      slow_query_log_file = /var/log/mariadb/slowquery.log
      log_slow_verbosity = query_plan,explain

      bind-address = 0.0.0.0
      skip-name-resolve
      skip-external-locking
      performance_schema=off
      userstat = 1

      transaction-isolation = READ-COMMITTED

      binlog_format=ROW
      innodb_autoinc_lock_mode=2

      innodb_data_file_path=ibdata1:50M;ibdata2:50M:autoextend
      innodb_data_home_dir = /var/lib/mysql
      innodb_doublewrite=0
      innodb_file_per_table=1

      sql_mode=NO_ENGINE_SUBSTITUTION
      innodb-default-row-format=DYNAMIC

      innodb_strict_mode=off

      innodb_flush_neighbors=0
      innodb_page_size=64K
      innodb_log_buffer_size = 2G
      innodb_log_file_size=1G
      innodb_open_files=16384

      Attachments

        1. 10.4.5-10.5.6-wsrep.txt
          41 kB
        2. 10.4.5-10.5.6-wsrep.txt
          41 kB
        3. 10.4.15.txt
          6 kB
        4. 10.5.6.txt
          3 kB
        5. 10.5.6 crash log.txt
          16 kB
        6. 10.5.8 crash.txt
          8 kB
        7. ib_logfile0_000-7FF.bin
          2 kB

        Issue Links

          Activity

            shexphobos Xesh created issue -
            shexphobos Xesh added a comment - - edited

            I have just downgraded it to 10.5.4 and it is working without any changes. Just stopped 10.5.8, downgraded to 10.5.4, and start MariaDB.
            Also, 10.5.7 is crashing, but 10.5.6 is working.

            shexphobos Xesh added a comment - - edited I have just downgraded it to 10.5.4 and it is working without any changes. Just stopped 10.5.8, downgraded to 10.5.4, and start MariaDB. Also, 10.5.7 is crashing, but 10.5.6 is working.

            What happened before the failed restart attempt? Could the database have been corrupted by the bug in 10.5.7 that was fixed in MDEV-24096?

            marko Marko Mäkelä added a comment - What happened before the failed restart attempt? Could the database have been corrupted by the bug in 10.5.7 that was fixed in MDEV-24096 ?
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            Labels upgrade need_feedback upgrade
            shexphobos Xesh made changes -
            Attachment 10.4.5-10.5.6-wsrep.txt [ 55272 ]
            shexphobos Xesh made changes -
            Attachment 10.4.5-10.5.6-wsrep.txt [ 55273 ]
            shexphobos Xesh added a comment -

            That bug has fixed in 10.5.8, right? I did not make the upgrade to 10.5.7, it was directly from 10.4.17 to 10.5.8.

            I restore the database from the backup with version 10.4.15, and it started without a problem. After that, I've made an upgrade to 10.5.6 and it started normally.
            Version 10.5.6 is crashing when Wsrep=on with innodb_force_recovery=6, without wsrep it is working. After commenting innodb_force_recovery=6 it is working and replicating.
            Is this a new bug?

            10.4.5-10.5.6-wsrep.txt

            shexphobos Xesh added a comment - That bug has fixed in 10.5.8, right? I did not make the upgrade to 10.5.7, it was directly from 10.4.17 to 10.5.8. I restore the database from the backup with version 10.4.15, and it started without a problem. After that, I've made an upgrade to 10.5.6 and it started normally. Version 10.5.6 is crashing when Wsrep=on with innodb_force_recovery=6, without wsrep it is working. After commenting innodb_force_recovery=6 it is working and replicating. Is this a new bug? 10.4.5-10.5.6-wsrep.txt

            I wonder why you are setting innodb_force_recovery at all. Its main purpose is allow data to be rescued in case the database is corrupted. In particular, setting innodb_force_recovery=6 will cause the write-ahead log to be skipped altogether. If the server was previously killed and not shut down cleanly, any kind of corruption will be possible.

            marko Marko Mäkelä added a comment - I wonder why you are setting innodb_force_recovery at all. Its main purpose is allow data to be rescued in case the database is corrupted. In particular, setting innodb_force_recovery=6 will cause the write-ahead log to be skipped altogether. If the server was previously killed and not shut down cleanly, any kind of corruption will be possible.
            shexphobos Xesh added a comment -

            It was left by mistake.

            shexphobos Xesh added a comment - It was left by mistake.

            I think that any database instance where innodb_force_recovery was set to 6 should be treated as seriously corrupted (only valid for extracting data with mysqldump or similar, and also that data dump should be carefully checked for any corruption). The only exception might be that it was known that the previous server runs ended in a clean shutdown. Also, before MDEV-19514, the innodb_force_recovery settings 4 or 5 could corrupt secondary indexes.

            The redo log and the undo log pages are the glue that make atomic any changes that involve multiple index pages or multiple indexes of a table. If those mechanisms are disabled, anything can happen. For example, imagine what would happen if the server was killed and proper log recovery was disabled in the middle of a B-tree page split or merge.

            Another sign of this kind of ‘garbage in, garbage out’ situation would be error messages "log sequence number is in the future".

            I am sorry, but I do not think that this can be addressed by a code fix (other than converting crashes to nicer error messages, as in MDEV-13542).

            marko Marko Mäkelä added a comment - I think that any database instance where innodb_force_recovery was set to 6 should be treated as seriously corrupted (only valid for extracting data with mysqldump or similar, and also that data dump should be carefully checked for any corruption). The only exception might be that it was known that the previous server runs ended in a clean shutdown. Also, before MDEV-19514 , the innodb_force_recovery settings 4 or 5 could corrupt secondary indexes. The redo log and the undo log pages are the glue that make atomic any changes that involve multiple index pages or multiple indexes of a table. If those mechanisms are disabled, anything can happen. For example, imagine what would happen if the server was killed and proper log recovery was disabled in the middle of a B-tree page split or merge. Another sign of this kind of ‘garbage in, garbage out’ situation would be error messages "log sequence number is in the future". I am sorry, but I do not think that this can be addressed by a code fix (other than converting crashes to nicer error messages, as in MDEV-13542 ).
            marko Marko Mäkelä made changes -
            Fix Version/s N/A [ 14700 ]
            Resolution Not a Bug [ 6 ]
            Status Open [ 1 ] Closed [ 6 ]
            marko Marko Mäkelä made changes -
            shexphobos Xesh added a comment -

            1. Before I upgraded from 10.4.17 to 10.5.8, 10.4.17 is shutdown with only innodb_fast_shutdown = 0 option. None of the innodb_force_recovery options is used with this upgrade. During the start of 10.5.8, there was a message that ibtmp1 should be deleted with option innodb_fast_shutdown = 0 from the previous message. So 10.4.17 was stopped with this option.
            2. The only mistake was with trying to find a working version when I tried with clean restored database from 10.4.15 to 10.5.6 and by mistake option innodb_force_recovery is left by mistake.

            Your answer stands for "1." issue, upgrade right?

            shexphobos Xesh added a comment - 1. Before I upgraded from 10.4.17 to 10.5.8, 10.4.17 is shutdown with only innodb_fast_shutdown = 0 option. None of the innodb_force_recovery options is used with this upgrade. During the start of 10.5.8, there was a message that ibtmp1 should be deleted with option innodb_fast_shutdown = 0 from the previous message. So 10.4.17 was stopped with this option. 2. The only mistake was with trying to find a working version when I tried with clean restored database from 10.4.15 to 10.5.6 and by mistake option innodb_force_recovery is left by mistake. Your answer stands for "1." issue, upgrade right?

            shexphobos, can you provide a full sequence of steps that reproduce the problem? The option innodb_force_recovery is dangerous by design, and it is very well known that it may cause permanent, irrecoverable corruption.

            I am guessing that you set innodb_force_recovery=6 because the logic that was implemented in MDEV-12353 (along with a redo log format change) caused the normal server startup to be refused. I am not aware of that being broken. We have upgrade tests that should catch a failure of the 10.5 server to start up after a clean shutdown of the older server version. Did you see a message InnoDB: Upgrade after a crash is not supported at any point?

            marko Marko Mäkelä added a comment - shexphobos , can you provide a full sequence of steps that reproduce the problem? The option innodb_force_recovery is dangerous by design, and it is very well known that it may cause permanent, irrecoverable corruption. I am guessing that you set innodb_force_recovery=6 because the logic that was implemented in MDEV-12353 (along with a redo log format change) caused the normal server startup to be refused. I am not aware of that being broken. We have upgrade tests that should catch a failure of the 10.5 server to start up after a clean shutdown of the older server version. Did you see a message InnoDB: Upgrade after a crash is not supported at any point?
            shexphobos Xesh made changes -
            Attachment 10.4.15.txt [ 55419 ]
            shexphobos Xesh made changes -
            Attachment 10.5.6.txt [ 55420 ]
            shexphobos Xesh added a comment -

            systemctl stop mariadb (10.4.15)
            During shutdown option innodb_fast_shutdown=0 is used, otherwise 10.5.6 would not start.
            dnf update MariaDB* (update from 10.4.15 to 10.5.6
            dnf update galera-4* (update from galera-4-26.4.3-1 to galera-4-26.4.5)
            systemctl start mariadb (10.5..6)
            Config when stopped 10.4.15.txt
            Config when started 10.5.6.txt

            After 10.5.6 is started normally, I uncommented innodb_force_recovery=6 and after the next restart, it crashed.

            Yes, there is a message "[ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with Backup 10.4.15-MariaDB." when you try to start 10.5.6 without the previous shutdown of 10.4.15 without innodb_fast_shutdown=0.

            shexphobos Xesh added a comment - systemctl stop mariadb (10.4.15) During shutdown option innodb_fast_shutdown=0 is used, otherwise 10.5.6 would not start. dnf update MariaDB* (update from 10.4.15 to 10.5.6 dnf update galera-4* (update from galera-4-26.4.3-1 to galera-4-26.4.5) systemctl start mariadb (10.5..6) Config when stopped 10.4.15.txt Config when started 10.5.6.txt After 10.5.6 is started normally, I uncommented innodb_force_recovery=6 and after the next restart, it crashed. Yes, there is a message " [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with Backup 10.4.15-MariaDB." when you try to start 10.5.6 without the previous shutdown of 10.4.15 without innodb_fast_shutdown=0.
            shexphobos Xesh made changes -
            Attachment 10.5.6 crash log.txt [ 55421 ]
            shexphobos Xesh added a comment - 10.5.6 crash log.txt
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ]
            Resolution Not a Bug [ 6 ]
            Status Closed [ 6 ] Stalled [ 10000 ]

            shexphobos, let me reopen this and change the title to something that is closer to the original root cause. I maintain that the consequences of setting innodb_force_recovery=6 are not a bug.

            You say that MariaDB 10.5.6 and 10.5.8 refused to start up with a log file that was apparently created with Mariabackup 10.4. How did you get into that situation? The normal workflow for restoring a backup should be something like this:

            1. Run mariabackup --prepare to apply the log file. This will create a 0-byte ib_logfile0 file.
            2. Run mariabackup --copy-back to restore the applied backup, or perform the equivalent steps manually.
            3. Start the server. This will create a new log file (which will stay that it was created by MariaDB, not backup).

            I think that the following shortcut might work for restoring a backup from 10.3 or later to a newer server:

            1. Run mariabackup --prepare (matching the old server version) to apply the log file.
            2. Run mariabackup --copy-back to restore the applied backup, or perform the equivalent steps manually.
            3. Start the 10.5 server.

            By the way, I think that it is a good idea to run mariabackup --prepare always after mariabackup --backup, not only to save time when restoring the backup, but also to validate the backup (so that in case the backup was corrupted in some way, you will have a chance to get a valid backup). To avoid extra I/O workload on the database server, that command can be run on a separate system.

            Can you clarify what exactly you were doing, and whether my suggested procedures would work. Even if we had no code bug, our documentation might need some clarification.

            marko Marko Mäkelä added a comment - shexphobos , let me reopen this and change the title to something that is closer to the original root cause. I maintain that the consequences of setting innodb_force_recovery=6 are not a bug. You say that MariaDB 10.5.6 and 10.5.8 refused to start up with a log file that was apparently created with Mariabackup 10.4. How did you get into that situation? The normal workflow for restoring a backup should be something like this: Run mariabackup --prepare to apply the log file. This will create a 0-byte ib_logfile0 file. Run mariabackup --copy-back to restore the applied backup, or perform the equivalent steps manually. Start the server. This will create a new log file (which will stay that it was created by MariaDB, not backup). I think that the following shortcut might work for restoring a backup from 10.3 or later to a newer server: Run mariabackup --prepare (matching the old server version) to apply the log file. Run mariabackup --copy-back to restore the applied backup, or perform the equivalent steps manually. Start the 10.5 server. By the way, I think that it is a good idea to run mariabackup --prepare always after mariabackup --backup , not only to save time when restoring the backup, but also to validate the backup (so that in case the backup was corrupted in some way, you will have a chance to get a valid backup). To avoid extra I/O workload on the database server, that command can be run on a separate system. Can you clarify what exactly you were doing, and whether my suggested procedures would work. Even if we had no code bug, our documentation might need some clarification.
            marko Marko Mäkelä made changes -
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s N/A [ 14700 ]
            Affects Version/s 10.5.6 [ 24508 ]
            Summary Mariadb 10.5.8 crash after upgrade Mariadb 10.5: InnoDB: Upgrade after a crash is not supported
            shexphobos Xesh made changes -
            Attachment 10.5.8 crash.txt [ 55426 ]
            shexphobos Xesh added a comment -

            I'll start from the beginning.

            • I've restored the database from backup using "mariabackup --prepare --targer-dir=." with version 10.4.17
            • Version 10.4.17 is used and it started without a problem, I had problems with Inserts and deletes, this version is crashing constantly on any insert or delete, so I downgrade to 10.4.15.
            • Version 10.4.15 did not have these issues.
            • When I tried to migrate from 10.4.x to 10.5.x, version 10.5 could not start because " InnoDB: Upgrade after a crash is not supported", I downgrade it to 10.4.15, restored, started, and stopped with innodb_fast_shutdown=0. Then I upgraded it to 10.5.8, and it did not start normally I left innodb_force_recovery=6 by mistake. I could not start the 10.5.8 and 10.5.7 versions, they keep crashing even with commented innodb_force_recovery=6. Only version 10.5.6 could started without a problem. Command used are dnf update and downgrade.
            • Only good working versions are 10.4.15 and 10.5.6
            • When I tried to update 10.5.6 to 10.5.8 with the same working configuration uploaded as 10.5.6.txt, 10.5.8 crashed and it could not start
            • Again I downgrade it back to 10.5.6 started and stopped with innodb_fast_shutdown=0.
            • I made dnf update MariaDB* from 10.5.6 to 10.5.8, 10.5.6 is working without a problem but 10.5.8 after a simple update from 10.5.6 is not.
            • There is a message "[ERROR] InnoDB: Space id and page no stored in the page, read in are", the first time from 10.4.15 to 10.5.6 innodb_fast_shutdown=0 did help but from 10.5.6 to 10.5.8 it did not. 10.5.8 crash.txt
            shexphobos Xesh added a comment - I'll start from the beginning. I've restored the database from backup using "mariabackup --prepare --targer-dir=." with version 10.4.17 Version 10.4.17 is used and it started without a problem, I had problems with Inserts and deletes, this version is crashing constantly on any insert or delete, so I downgrade to 10.4.15. Version 10.4.15 did not have these issues. When I tried to migrate from 10.4.x to 10.5.x, version 10.5 could not start because " InnoDB: Upgrade after a crash is not supported", I downgrade it to 10.4.15, restored, started, and stopped with innodb_fast_shutdown=0. Then I upgraded it to 10.5.8, and it did not start normally I left innodb_force_recovery=6 by mistake. I could not start the 10.5.8 and 10.5.7 versions, they keep crashing even with commented innodb_force_recovery=6. Only version 10.5.6 could started without a problem. Command used are dnf update and downgrade. Only good working versions are 10.4.15 and 10.5.6 When I tried to update 10.5.6 to 10.5.8 with the same working configuration uploaded as 10.5.6.txt, 10.5.8 crashed and it could not start Again I downgrade it back to 10.5.6 started and stopped with innodb_fast_shutdown=0. I made dnf update MariaDB* from 10.5.6 to 10.5.8, 10.5.6 is working without a problem but 10.5.8 after a simple update from 10.5.6 is not. There is a message " [ERROR] InnoDB: Space id and page no stored in the page, read in are", the first time from 10.4.15 to 10.5.6 innodb_fast_shutdown=0 did help but from 10.5.6 to 10.5.8 it did not. 10.5.8 crash.txt
            marko Marko Mäkelä made changes -

            The lines in 10.5.8 crash.txt are truncated, and I cannot see the full message.

            If I understood correctly, there is no actual problem with the 10.5 server, and the whole chain of events was caused by a crashing bug in 10.4 (which I hope has been filed in a separate ticket).

            Apparently, the problems in 10.5 were caused by accidentally forgetting to remove the innodb_force_recovery=6 setting from the configuration, and there really is not much that we can improve in the code. shexphobos, do you agree?

            Theoretically, we could change the startup so that when innodb_force_recovery=6 is specified, a ‘lecture’ will be written to the error log and startup will be refused. The (disastrous) effect of the setting could still be achieved by manually deleting the log files. This is something that we could do in the next development version.

            I suspect that we have a similar case of incorrectly performed upgrade in MDEV-24578. I am linking these two tickets, so that our documentation team can easily find them.

            marko Marko Mäkelä added a comment - The lines in 10.5.8 crash.txt are truncated, and I cannot see the full message. If I understood correctly, there is no actual problem with the 10.5 server, and the whole chain of events was caused by a crashing bug in 10.4 (which I hope has been filed in a separate ticket). Apparently, the problems in 10.5 were caused by accidentally forgetting to remove the innodb_force_recovery=6 setting from the configuration, and there really is not much that we can improve in the code. shexphobos , do you agree? Theoretically, we could change the startup so that when innodb_force_recovery=6 is specified, a ‘lecture’ will be written to the error log and startup will be refused. The (disastrous) effect of the setting could still be achieved by manually deleting the log files. This is something that we could do in the next development version. I suspect that we have a similar case of incorrectly performed upgrade in MDEV-24578 . I am linking these two tickets, so that our documentation team can easily find them.

            shexphobos The log you posted above, is it from the first crash? Now your submission includes the logs from when you tried to recover and are unable to do so. But the first time it crashed something else maybe happened that corrupted the files and got you into trying innodb_force_recovery? If so, please post the journald log from the first crash.

            otto Otto Kekäläinen added a comment - shexphobos The log you posted above, is it from the first crash? Now your submission includes the logs from when you tried to recover and are unable to do so. But the first time it crashed something else maybe happened that corrupted the files and got you into trying innodb_force_recovery? If so, please post the journald log from the first crash.

            @Xesh See https://jira.mariadb.org/browse/MDEV-24578. Does your case seem similar? Do you get any results from running `journalctl -u mariadb | grep "InnoDB: Assertion failure"`?

            otto Otto Kekäläinen added a comment - @Xesh See https://jira.mariadb.org/browse/MDEV-24578 . Does your case seem similar? Do you get any results from running `journalctl -u mariadb | grep "InnoDB: Assertion failure"`?
            serg Sergei Golubchik made changes -
            Fix Version/s N/A [ 14700 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Incomplete [ 4 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 116884 ] MariaDB v4 [ 158690 ]
            marko Marko Mäkelä made changes -
            Resolution Incomplete [ 4 ]
            Status Closed [ 6 ] Stalled [ 10000 ]

            With ib_logfile0_000-7FF.bin I can reproduce this:

            cp ib_logfile0_000-7FF.bin /dev/shm/ib_logfile0
            truncate -s 5368709120 /dev/shm/ib_logfile0
            truncate -s 5368709120 /dev/shm/ib_logfile1
            sql/mariadbd --datadir=$(pwd)/mysql-test/var/install.db/ --innodb-log-group-home-dir /dev/shm
            

            10.5 165564d3c33ae3d677d70644a83afcb744bdbf65

            2022-11-25 9:23:08 0 [ERROR] InnoDB: Invalid log block checksum. block: 0 checkpoint no: 0 expected: 3965168067 found: 0
            2022-11-25 9:23:08 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.4.17, and it appears corrupted.
            

            This attempted to parse the 512 bytes read from ib_logfile0 at byte offset 1,734,884,864.

            10.4 0d586d62e5326053383eb2774b32ff20bbe2f4d5

            2022-11-25 9:35:22 0 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 423245228518 between the checkpoint 423245228518 and the end 423245228032.
            

            This attempted to parse the 512 bytes read from ib_logfile1 at byte offset 6,029,852,160-5,368,709,120=661,143,040.

            I checked the file offsets by debugging tools. For a complete test, we must write a fake 512-byte empty log block at the correct offset of the file ib_logfile1.

            marko Marko Mäkelä added a comment - With ib_logfile0_000-7FF.bin I can reproduce this: cp ib_logfile0_000-7FF.bin /dev/shm/ib_logfile0 truncate -s 5368709120 /dev/shm/ib_logfile0 truncate -s 5368709120 /dev/shm/ib_logfile1 sql/mariadbd --datadir=$(pwd)/mysql-test/var/install.db/ --innodb-log-group-home-dir /dev/shm 10.5 165564d3c33ae3d677d70644a83afcb744bdbf65 2022-11-25 9:23:08 0 [ERROR] InnoDB: Invalid log block checksum. block: 0 checkpoint no: 0 expected: 3965168067 found: 0 2022-11-25 9:23:08 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.4.17, and it appears corrupted. This attempted to parse the 512 bytes read from ib_logfile0 at byte offset 1,734,884,864. 10.4 0d586d62e5326053383eb2774b32ff20bbe2f4d5 2022-11-25 9:35:22 0 [ERROR] InnoDB: Missing MLOG_CHECKPOINT at 423245228518 between the checkpoint 423245228518 and the end 423245228032. This attempted to parse the 512 bytes read from ib_logfile1 at byte offset 6,029,852,160-5,368,709,120=661,143,040. I checked the file offsets by debugging tools. For a complete test, we must write a fake 512-byte empty log block at the correct offset of the file ib_logfile1 .
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Attachment ib_logfile0_000-7FF.bin [ 67101 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s N/A [ 14700 ]
            Labels need_feedback upgrade upgrade
            marko Marko Mäkelä made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]

            The 2 checkpoint blocks in ib_logfile0_000-7FF.bin contain the checkpoint LSNs at byte offsets 0x208 and at 0x608: 0x628b61a9dd and 0x628b61a9e6. These are read correctly. The source_offset=6029852646 (0x167683fe6) in recv_log_recover_10_4() just needs to be rounded down to the start of the 512-byte block. It turns out that the code that is supposed to do the rounding was also trimming the offset to 32 bits (passing 0x67683e00 instead of 0x167683e00):

            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index 8db270b2a9b..511ccc5afe6 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -1412,7 +1412,7 @@ static dberr_t recv_log_recover_10_4()
             		return DB_CORRUPTION;
             	}
             
            -	recv_sys.read(source_offset & ~(OS_FILE_LOG_BLOCK_SIZE - 1),
            +	recv_sys.read(source_offset & ~lsn_t(OS_FILE_LOG_BLOCK_SIZE - 1),
             		      {buf, OS_FILE_LOG_BLOCK_SIZE});
             
             	ulint crc = log_block_calc_checksum_crc32(buf);
            

            I believe that this affects all upgrades where innodb_log_files_in_group×innodb_log_file_size exceeds 4 gigabytes and the current log position is located more than 4 gigabytes from the start of ib_logfile0. Multiple log files (before innodb_log_files_in_group was hard-wired to 1 in MariaDB Server 10.5) were treated as if all files had been catenated together.

            marko Marko Mäkelä added a comment - The 2 checkpoint blocks in ib_logfile0_000-7FF.bin contain the checkpoint LSNs at byte offsets 0x208 and at 0x608: 0x628b61a9dd and 0x628b61a9e6. These are read correctly. The source_offset=6029852646 (0x167683fe6) in recv_log_recover_10_4() just needs to be rounded down to the start of the 512-byte block. It turns out that the code that is supposed to do the rounding was also trimming the offset to 32 bits (passing 0x67683e00 instead of 0x167683e00): diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 8db270b2a9b..511ccc5afe6 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -1412,7 +1412,7 @@ static dberr_t recv_log_recover_10_4() return DB_CORRUPTION; } - recv_sys.read(source_offset & ~(OS_FILE_LOG_BLOCK_SIZE - 1), + recv_sys.read(source_offset & ~lsn_t(OS_FILE_LOG_BLOCK_SIZE - 1), {buf, OS_FILE_LOG_BLOCK_SIZE}); ulint crc = log_block_calc_checksum_crc32(buf); I believe that this affects all upgrades where innodb_log_files_in_group × innodb_log_file_size exceeds 4 gigabytes and the current log position is located more than 4 gigabytes from the start of ib_logfile0 . Multiple log files (before innodb_log_files_in_group was hard-wired to 1 in MariaDB Server 10.5) were treated as if all files had been catenated together.

            In MariaDB 10.8, the upgrade check was refactored as part of MDEV-14425. We are reading the file from the correct offset even without this change (which I added as part of the merge of the fix for clarity):

            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index 00fb22e1625..1bd24ac8bc9 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -1649,7 +1649,7 @@ static dberr_t recv_log_recover_10_5(lsn_t lsn_offset)
               if (lsn_offset < (log_sys.is_pmem() ? log_sys.file_size : 4096))
                 memcpy_aligned<512>(buf, &log_sys.buf[lsn_offset & ~511], 512);
               else
            -    recv_sys.read(lsn_offset & ~511, {buf, 512});
            +    recv_sys.read(lsn_offset & ~lsn_t{511}, {buf, 512});
             
               if (!recv_check_log_block(buf))
               {
            

            marko Marko Mäkelä added a comment - In MariaDB 10.8, the upgrade check was refactored as part of MDEV-14425 . We are reading the file from the correct offset even without this change (which I added as part of the merge of the fix for clarity): diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 00fb22e1625..1bd24ac8bc9 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -1649,7 +1649,7 @@ static dberr_t recv_log_recover_10_5(lsn_t lsn_offset) if (lsn_offset < (log_sys.is_pmem() ? log_sys.file_size : 4096)) memcpy_aligned<512>(buf, &log_sys.buf[lsn_offset & ~511], 512); else - recv_sys.read(lsn_offset & ~511, {buf, 512}); + recv_sys.read(lsn_offset & ~lsn_t{511}, {buf, 512}); if (!recv_check_log_block(buf)) {
            marko Marko Mäkelä made changes -
            Component/s Storage Engine - InnoDB [ 10129 ]
            Labels upgrade not-10.8 upgrade
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2022-11-28 11:37:15.0 2022-11-28 11:37:15.724
            marko Marko Mäkelä made changes -
            Fix Version/s 10.5.19 [ 28511 ]
            Fix Version/s 10.6.12 [ 28513 ]
            Fix Version/s 10.7.8 [ 28515 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            marko Marko Mäkelä made changes -

            We just hit this bug and that caused our DB to fail upgrading to 10.6 when we tried to go to 10.6.11. Upgrading to 10.6.12 worked and we were able to reproduce this very easily by, like Marko mentioned, having innodb_log_files_in_group×innodb_log_file_size exceed 4G and have the log position be over 4G past the start of ib_logfile0. Shouldn't there be a warning on 10.6 versions' changelog page prior to 10.6.12 about this issue? It was quite jarring for us and the issue halted our upgrade process.

            delisson Delisson Silva added a comment - We just hit this bug and that caused our DB to fail upgrading to 10.6 when we tried to go to 10.6.11. Upgrading to 10.6.12 worked and we were able to reproduce this very easily by, like Marko mentioned, having innodb_log_files_in_group×innodb_log_file_size exceed 4G and have the log position be over 4G past the start of ib_logfile0. Shouldn't there be a warning on 10.6 versions' changelog page prior to 10.6.12 about this issue? It was quite jarring for us and the issue halted our upgrade process.
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 171797

            People

              marko Marko Mäkelä
              shexphobos Xesh
              Votes:
              1 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.