Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28880

Upgrade from MariaDB 10.3.32 to MariaDB 10.6.8 has errors/issues/documentation

Details

    Description

      Hello,
      We upgraded 5 slaves and 1 master from MariaDB 10.3.32 to MariaDB 10.6.8
      Here are error, issue, documentation suggestions we encountered.

      1.
      4 of 5 of the slaves had the following error:

      [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.3.32.
      

      preventing startup after upgrade. so mariadb_upgrade could not be run and server could not be started

      Therefore we downgraded back to 10.3.32 and ran MySQL commands

      SET GLOBAL innodb_fast_shutdown = 0;
      SET GLOBAL innodb_buffer_pool_dump_at_shutdown=OFF;
      

      then shutdown then upgraded to 10.6.8 again with the same problem.

      We had to delete the ib_logfile0 file, upgrade to 10.6.8 and restart to upgrade.

      Here are related logs:

      2022-06-15 15:17:56 0 [Note] InnoDB: FTS optimize thread exiting.
      2022-06-15 15:17:56 1 [Note] InnoDB: to purge 2 transactions
      2022-06-15 15:17:56 0 [Note] InnoDB: Buffer pool(s) load aborted due to user instigated abort at 220615 15:17:56
      2022-06-15 15:17:56 0 [Note] InnoDB: Starting shutdown...
      2022-06-15 15:18:07 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
      2022-06-15 15:18:07 0 [Note] InnoDB: Shutdown completed; log sequence number 30535472345669; transaction id 71002773175
      2022-06-15 15:18:07 0 [Note] /usr/sbin/mysqld: Shutdown complete
       
      2022-06-15 15:24:17 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2022-06-15 15:24:17 0 [Note] InnoDB: Using transactional memory
      2022-06-15 15:24:17 0 [Note] InnoDB: Number of pools: 1
      2022-06-15 15:24:17 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2022-06-15 15:24:17 0 [Note] InnoDB: Using Linux native AIO
      2022-06-15 15:24:17 0 [Note] InnoDB: Initializing buffer pool, total size = 332859965440, chunk size = 134217728
      2022-06-15 15:24:18 0 [Note] InnoDB: Completed initialization of buffer pool
      2022-06-15 15:24:20 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.3.32.
      2022-06-15 15:24:20 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
      2022-06-15 15:24:20 0 [Note] InnoDB: Starting shutdown...
      2022-06-15 15:24:20 0 [ERROR] Plugin 'InnoDB' init function returned error.
      2022-06-15 15:24:20 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
      2022-06-15 15:24:20 0 [Note] Plugin 'FEEDBACK' is disabled.
      2022-06-15 15:24:20 0 [ERROR] Failed to initialize plugins.
      2022-06-15 15:24:20 0 [ERROR] Aborting
      

      Here are related my.cnf settings:

      innodb_log_files_in_group      = 1
      innodb_log_file_size           = 16G
      innodb_flush_log_at_trx_commit = 1
      

      What we did not try: Stop slave replication manually before shutdown and then upgrade.
      Other notes: We had major problems in the past before changing innodb_change_buffering to none. Basically restarting slaves broke some index in the database nearly all the time. We ran scripts to check all tables rebuild all the broken table indexes using OPTIMIZE TABLE. On the other hand one of the servers that did not get this error was one rebuild recently from mariadbbackup. However the backup server running the mariadbbackup had the same error upgrading. Just adding in case some recovery we did broke the database before upgrade, but we don't think that is the issue.
      Other notes: Mariadb database is about 1.5 TB, with peak about 50,000 selects per second, 50 deletes per second, 500 inserts per second, 500 updates per second on master, with averages over half the peak

      2.
      On four out of five of the slaves after upgrade, we got

      2022-06-16 13:57:57 8 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
      

      one server had it twenty times in the last month, one server had it four times, one server had it twice, one server had it once.
      CHECK TABLE TABLENAMEREDACTED shows no error.
      percona checksum showed no data difference.
      There was no other slave replication problems.
      TABLENAMEREDACTED is the same table on all servers, and it is actually relatively not a big table or busy table.

      We performed OPTIMIZE TABLE TABLENAMEREDACTED on the master and slaves to recreate table just in case.
      However, we are still gettings errors on slaves with the same table, even after performing OPTIMIZE TABLE TABLENAMEREDACTED.
      Also, percona checksum still shows no data difference between servers.
      CHECK TABLE TABLENAMEREDACTED still shows no error.
      There are still no other slave replication problems.

      Other notes: all servers are runing with the following setting which enable parallel replication.

      slave_parallel_threads = 8
      

      3.
      The documentation in MariaDB suggests running mariadb_upgrade after updating MariaDB.
      I would suggest putting in the documentation to restart mariadb after running mariadb_upgrade as that seems to potentially remove some problems that may occur because of startup without the fixes mariadb_upgrade does.

      Attachments

        Issue Links

          Activity

            FK F K added a comment -

            Is it "ususally" safe the way we did it?
            Shutdown 10.3 and delete the ib_logfile0 file, upgrade to 10.6.8 and start mariadb to upgrade

            FK F K added a comment - Is it "ususally" safe the way we did it? Shutdown 10.3 and delete the ib_logfile0 file, upgrade to 10.6.8 and start mariadb to upgrade

            No, I can never recommend deleting the ib_logfile*. It is safer to let the server resize the redo log on startup. That is, change one or both the parameters (the minimum innodb_log_files_in_group in MariaDB is 1), start up and shut down the old server, and then upgrade to 10.5 or later.

            I think that with 2×16G log files, you may very possibly have hit MDEV-24412. If you still have a backup of a old log file, we could verify that based on the first 2048 bytes.

            marko Marko Mäkelä added a comment - No, I can never recommend deleting the ib_logfile* . It is safer to let the server resize the redo log on startup. That is, change one or both the parameters (the minimum innodb_log_files_in_group in MariaDB is 1), start up and shut down the old server, and then upgrade to 10.5 or later. I think that with 2×16G log files, you may very possibly have hit MDEV-24412 . If you still have a backup of a old log file, we could verify that based on the first 2048 bytes.
            FK F K added a comment -

            Thank you for reply.
            Just a note, after this happened to be on the safe side we did run
            CHECK TABLE on all tables
            and percona checksum on all tables
            without error or difference in tables.

            FK F K added a comment - Thank you for reply. Just a note, after this happened to be on the safe side we did run CHECK TABLE on all tables and percona checksum on all tables without error or difference in tables.

            FK, only if the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace had been updated on shutdown, it was safe to delete the log files. Sometimes, a server shutdown could hang before that happens, and something could silently kill the shut down server. That is why I advise against manually deleting the files.

            Nothing can reliably catch all inconsistencies caused by lost logs. The worst thing that can happen is not that some committed transactions written since the last log checkpoint could be lost, but that some data pages (which had been written between the latest checkpoint and the time the server was killed and the log files removed) would be newer than FIL_PAGE_FILE_FLUSH_LSN. The result could be broken links between index pages, or incorrect links to undo pages. MDEV-13542 removed many crashes due to this but it did not fix the corruption itself.

            CHECK TABLE (especially after MDEV-24402) is a rather good consistency check, but it is not perfect.

            Because of the widespread unsafe practice of deleting ib_logfile*, I implemented MDEV-27199 in MariaDB Server 10.8.

            The feedback that this ticket is waiting for is a copy of the first 2048 bytes of ib_logfile0 for a cleanly shut down data directory for which an upgrade fails.

            marko Marko Mäkelä added a comment - FK , only if the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace had been updated on shutdown, it was safe to delete the log files. Sometimes, a server shutdown could hang before that happens, and something could silently kill the shut down server. That is why I advise against manually deleting the files. Nothing can reliably catch all inconsistencies caused by lost logs. The worst thing that can happen is not that some committed transactions written since the last log checkpoint could be lost, but that some data pages (which had been written between the latest checkpoint and the time the server was killed and the log files removed) would be newer than FIL_PAGE_FILE_FLUSH_LSN . The result could be broken links between index pages, or incorrect links to undo pages. MDEV-13542 removed many crashes due to this but it did not fix the corruption itself. CHECK TABLE (especially after MDEV-24402 ) is a rather good consistency check, but it is not perfect. Because of the widespread unsafe practice of deleting ib_logfile* , I implemented MDEV-27199 in MariaDB Server 10.8. The feedback that this ticket is waiting for is a copy of the first 2048 bytes of ib_logfile0 for a cleanly shut down data directory for which an upgrade fails.
            FK F K added a comment -

            > The feedback that this ticket is waiting for is a copy of the first 2048 bytes of ib_logfile0 for a cleanly shut down data directory for which an upgrade fails.

            This is no longer available.

            Of note, this ticket has two issues, one it the one you replied too, and one is regarding
            2022-06-16 13:57:57 8 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
            errors when there is no primary key.

            FK F K added a comment - > The feedback that this ticket is waiting for is a copy of the first 2048 bytes of ib_logfile0 for a cleanly shut down data directory for which an upgrade fails. This is no longer available. Of note, this ticket has two issues, one it the one you replied too, and one is regarding 2022-06-16 13:57:57 8 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED' errors when there is no primary key.

            People

              thiru Thirunarayanan Balathandayuthapani
              FK F K
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.