MariaDB Server / MDEV-37949

Implement innodb_log_archive


Details

    • Q1/2026 Server Development, Q1/2026 Server Maintenance, Q2/2026 Server Maintenance

    Description

      The InnoDB write-ahead log file (ib_logfile0) is pre-allocated to innodb_log_file_size and written as a ring buffer. This is good for write performance and space management, but unsuitable for arbitrary point-in-time recovery or for facilitating incremental backup.

      As noted in MDEV-14992, it would be better if the server took care of saving a sufficient amount of InnoDB write-ahead log to support backup. The SQL interface between the server and a backup program could look something like the following:

      -- At the start of the backup, we check if log archiving is already running.
      SELECT @@GLOBAL.innodb_log_archive, variable_value
      FROM information_schema.global_status
      WHERE variable_name = 'INNODB_LSN_ARCHIVED';
      -- If INNODB_LSN_ARCHIVED is BETWEEN 1 AND the LSN of the previous backup, skip copying InnoDB files (incremental backup).
      -- Otherwise, if necessary, request the log to be archived from the current LSN onwards:
      SET GLOBAL innodb_log_archive=ON;
      -- In this case, at the end of the backup, we would disable the log archiving again:
      SET GLOBAL innodb_log_archive=OFF;
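      The decision described in the SQL comments above can be sketched as a small bash function. This is a minimal sketch, assuming that the proposed status variable INNODB_LSN_ARCHIVED (not present in released servers) reports 0 or nothing while archiving is off, and that the backup tool records the LSN of each backup itself. choose_backup_mode is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# choose_backup_mode ARCHIVED_LSN PREV_BACKUP_LSN  (hypothetical helper)
# Prints "incremental" if log archiving started at or before the LSN of the
# previous backup, so all log written since that backup is still available;
# otherwise prints "full". An empty ARCHIVED_LSN evaluates to 0 here.
choose_backup_mode() {
  local archived=$1 prev=$2
  if (( archived >= 1 && archived <= prev )); then
    echo incremental
  else
    echo full
  fi
}

# Obtaining ARCHIVED_LSN with the mariadb client (assumes a server that
# implements this proposal):
#   archived=$(mariadb -B -N -e "SELECT variable_value
#     FROM information_schema.global_status
#     WHERE variable_name = 'INNODB_LSN_ARCHIVED'")
#   mode=$(choose_backup_mode "$archived" "$prev_backup_lsn")
```

      For a first backup, passing 0 as PREV_BACKUP_LSN always yields a full backup.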
      

      Log archiving could also be enabled indefinitely to facilitate arbitrary point-in-time recovery of anything that is covered by the InnoDB write-ahead log. When MDEV-34705 is implemented and enabled, this would include the binlog. However, even for an InnoDB-only deployment, point-in-time recovery of DDL operations would be subject to limitations, because part of the data dictionary is stored in .frm files, whose creation is covered by a separate log (see MDEV-17567).

      The default value of innodb_log_archive is OFF, that is, log archiving is disabled by default.

      When innodb_log_archive=ON, a change to innodb_log_file_size will take effect when the current log file is about to fill up and the next file is created and allocated. The log resizing logic (MDEV-27812), as well as the creation of a redundant log file ib_logfile101, will remain in use when innodb_log_archive=OFF.

      When innodb_log_archive=ON, the server will write the log to files like the following:

      ib_0000000000003000.log
      ib_0000000000400000.log
      ib_00000000007fd000.log
      ib_0000000000bfa000.log
      ib_0000000000ff7000.log
      ib_00000000013f4000.log
      

      The above example uses the minimum innodb_log_file_size=4M (0x400000 bytes). Each file name is the log sequence number (LSN) that corresponds to byte offset 12288 (0x3000) in that file. Each file starts with a 12288-byte header containing 32-bit offsets of checkpoint mini-transactions, each of which ends in a FILE_CHECKPOINT record that points to a checkpoint.
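      The naming scheme can be checked with a little arithmetic: in the listing, consecutive names differ by 0x400000 - 0x3000 = 0x3fd000, the file size minus the header. The bash sketch below regenerates the listing under the assumptions stated in its comments. archive_file_names is a hypothetical helper, not part of the server.

```bash
#!/usr/bin/env bash
# archive_file_names FILE_SIZE COUNT  (hypothetical helper)
# Assumptions, matching the example listing:
#   1. archiving starts at LSN 0x3000 (12288),
#   2. innodb_log_file_size stays constant, and
#   3. every file is completely filled, so each one holds
#      FILE_SIZE - 12288 bytes of log after its 12288-byte header.
archive_file_names() {
  local file_size=$1 count=$2
  local header=12288   # 0x3000-byte header at the start of each file
  local lsn=12288      # LSN at offset 0x3000 of the first file
  local i
  for (( i = 0; i < count; i++ )); do
    printf 'ib_%016x.log\n' "$lsn"
    (( lsn += file_size - header ))
  done
}

archive_file_names $((4 * 1024 * 1024)) 6   # innodb_log_file_size=4M
```

      With these assumptions, the six lines printed are exactly the file names listed above. After innodb_log_file_size is changed, or if a file were not filled completely, the step between names would differ.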

      For innodb_log_archive=ON, we will impose a maximum innodb_log_file_size=4G, to keep the log file sizes manageable and so that any offset within a file fits in the 32-bit offset format of the header.
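      Conversely, a tool that reads the archive might map an LSN to a file and a byte offset, as in the sketch below. lsn_to_file_offset is hypothetical. It assumes that LSNs grow linearly from offset 12288 to the end of each archived file, with no ring-buffer wrap-around as in ib_logfile0. It also shows why the 4G cap makes every such offset fit in 32 bits.

```bash
#!/usr/bin/env bash
# lsn_to_file_offset FILE_SIZE LSN FILE...  (hypothetical helper)
# Prints "<file> <offset>" for the archived file that should contain LSN,
# or returns 1 if no listed file covers it.
lsn_to_file_offset() {
  local file_size=$1 target=$2; shift 2
  local header=12288 best="" best_lsn=0 f name start offset
  # With the 4G (2^32-byte) cap, every offset inside a file is below 2^32.
  (( file_size <= 1 << 32 )) || return 1
  for f in "$@"; do
    name=${f##*/}                    # strip any directory part
    start=$(( 16#${name:3:16} ))     # the 16 hex digits after "ib_"
    if (( start <= target && start >= best_lsn )); then
      best=$f best_lsn=$start
    fi
  done
  [[ -n $best ]] || return 1         # LSN precedes every listed file
  offset=$(( header + target - best_lsn ))
  (( offset < file_size )) || return 1   # LSN lies beyond the end of that file
  echo "$best $offset"
}
```

      For example, with the 4M files listed above, LSN 0x400000 + 100 maps to ib_0000000000400000.log at offset 12388.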

      For efficient recovery from archived log files instead of ib_logfile0, two further start-up parameters will be introduced:

      • innodb_log_recovery_start: the LSN to start recovery from, instead of the LSN of the mini-transaction that points to the latest available checkpoint. At this LSN, recovery expects to find an optional sequence of FILE_MODIFY records followed by a FILE_CHECKPOINT record.
      • innodb_log_recovery_target: the recovery point objective (the end LSN of a backup).

      A backup implementation may write these parameters into a .cnf file or pass them on the command line when invoking mariadbd. The idea of these parameters is to limit the scope of recovery, so that an archived log need not be replayed from the very beginning to the very end.
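      Writing the parameters to an option file could look like the following sketch. write_recovery_cnf, the file path, and the LSN values are hypothetical. innodb_log_recovery_start and innodb_log_recovery_target are the parameters proposed here and are not accepted by released servers. The [mariadbd] option group and --defaults-extra-file are existing MariaDB option-file mechanisms.

```bash
#!/usr/bin/env bash
# write_recovery_cnf CNF START_LSN TARGET_LSN  (hypothetical helper)
# Writes the proposed recovery parameters into an option file that mariadbd
# reads from its [mariadbd] option group.
write_recovery_cnf() {
  local cnf=$1 start_lsn=$2 target_lsn=$3
  cat > "$cnf" <<EOF
[mariadbd]
innodb_log_recovery_start=$start_lsn
innodb_log_recovery_target=$target_lsn
EOF
}
```

      A restore script might then run write_recovery_cnf /backup/recovery.cnf 12288 8376320 and start the server with mariadbd --defaults-extra-file=/backup/recovery.cnf. Note that --defaults-extra-file must be the first option on the command line.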

      We must keep in mind that the creation and modification of some files, such as .frm files that form part of the data dictionary, are not covered by the InnoDB write-ahead log. Therefore, it is important to be able to stop the recovery at a specific LSN.

      Testing considerations

      mariadb-backup will assume innodb_log_archive=OFF. It will not attempt to read any log file other than ib_logfile0.

      We must keep in mind that there are several log I/O implementations:

      • on Linux, if the log resides on /dev/shm or PMEM (unless the server was built with cmake -DWITH_INNODB_PMEM=OFF): memory-mapped reads and writes
      • otherwise:
        • during recovery, the log is parsed either via pread or via memory-mapped reads, depending on the innodb_log_file_mmap setting
        • writes are via pwrite

      All combinations must be covered in testing. I have run the regression test suite as follows:

      # data directory stored outside /dev/shm, or server compiled WITH_INNODB_PMEM=OFF
      mysql-test/mtr --parallel=auto --big-test --force --mysqld=--loose-innodb-log-{archive,recovery-start=12288,file-size=4m,file-mmap=OFF} --skip-test=mariabackup
      mysql-test/mtr --parallel=auto --big-test --force --mysqld=--loose-innodb-log-{archive,recovery-start=12288,file-size=4m,file-mmap=ON} --skip-test=mariabackup
      # data directory in /dev/shm and server built WITH_INNODB_PMEM=ON
      mysql-test/mtr --parallel=auto --big-test --force --mysqld=--loose-innodb-log-{archive,recovery-start=12288,file-size=4m} --skip-test=mariabackup
      
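      The --mysqld=--loose-innodb-log-{...} arguments in these commands rely on bash brace expansion. Each comma-separated item is combined with the common prefix, so mtr receives one --mysqld= option per server parameter. The loose- prefix makes the server print a warning instead of refusing to start when it does not recognize an option. The following line prints the options that the first command passes to mtr:

```bash
#!/usr/bin/env bash
# Brace expansion: prefix + each item in the braces becomes a separate word.
printf '%s\n' --mysqld=--loose-innodb-log-{archive,recovery-start=12288,file-size=4m,file-mmap=OFF}
# --mysqld=--loose-innodb-log-archive
# --mysqld=--loose-innodb-log-recovery-start=12288
# --mysqld=--loose-innodb-log-file-size=4m
# --mysqld=--loose-innodb-log-file-mmap=OFF
```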

      It would be beneficial to extend our stress tests as follows:

      1. Cover both innodb_encrypt_log=OFF and innodb_encrypt_log=ON. Note that this parameter cannot be changed (at server restart) while innodb_log_archive=ON is set.
      2. Kill the server and determine the final LSN. This can be done by attempting a startup with the impossible setting innodb_log_recovery_target=12288 and checking the error message.
      3. Start the server with innodb_log_recovery_target set to the final LSN.
      4. Expect everything to work, except any writes to persistent tables. Some transactions, such as reads at TRANSACTION ISOLATION LEVEL SERIALIZABLE, may be blocked by locks that are held by recovered incomplete transactions.


            People

              Marko Mäkelä
              Thirunarayanan Balathandayuthapani
              Saahil Alam
              Votes: 1
              Watchers: 15

