Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-19813 Aria crash recovery failures
  3. MDEV-18187

Aria engine: Redo phase failed with "error 192 when executing record redo_index_new_page" upon startup on a restored datadir

Details

    Description

      10.4 30da40bb8c

      2019-01-09 16:00:40 0 [Note] mysqld: Aria engine: starting recovery
      recovered pages: 0%
      2019-01-09 16:00:40 0 [ERROR] mysqld: failed to decrypt './test/t7'  rc: -1  dstlen: 0  size: 4294967275
       
      Got error 192 when executing record redo_index_new_page
      2019-01-09 16:00:40 0 [ERROR] mysqld: Aria engine: Redo phase failed
      tables to flush: 9 8 7 6 5 4 3 2 1 0
      2019-01-09 16:00:40 0 [ERROR] mysqld: Aria recovery failed. Please run aria_chk -r on all Aria tables and delete all aria_log.######## files
      2019-01-09 16:00:40 0 [ERROR] Plugin 'Aria' init function returned error.
      2019-01-09 16:00:40 0 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.
      

      The backup dir (stored after prepare but before any server startup was attempted) is here:
      ftp://ftp.askmonty.org/public/mdev18187-backup.tar.gz

      To reproduce the failure, restore it to the datadir and start the server with

      --innodb-encrypt-tables --innodb-encrypt-log --innodb-encryption-threads=4 --aria-encrypt-tables=1 --encrypt-tmp-
      disk-tables=1 --plugin-load-add=file_key_management --file-key-management --file-key-management-filename=`pwd`/mysql-test/std_data/keys.txt
      

      (modify file-key-management-filename value as needed).


      Notes on the origin of the report. Please be aware it's not necessarily reproducible the same way.
      https://travis-ci.org/elenst/travis-tests/jobs/477279304

      The problem initially happened by running the concurrent test with MariaBackupFull scenario:

      elenst-jira-refs 5342873c9, mariadb-toolbox 6ded4b71568

      perl ./runall-new.pl --basedir=/home/travis/server --vardir=/home/travis/logs/vardir --duration=350 --threads=4 --seed=1547032472 --reporters=Backtrace,ErrorLog,Deadlock --skip-gendata --gendata-advanced --views --grammar=conf/mariadb/generic-dml.yy --redefine=conf/mariadb/bulk_insert.yy --mysqld=--log_output=FILE --mysqld=--max-statement-time=20 --mysqld=--lock-wait-timeout=10 --mysqld=--loose-innodb-lock-wait-timeout=5 --mysqld=--loose-debug_assert_on_not_freed_memory=0 --filter=/home/travis/mariadb-toolbox/travis/10.4-combo-filter.ff --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4 --mysqld=--aria-encrypt-tables=1 --mysqld=--encrypt-tmp-disk-tables=1 --mysqld=--file-key-management --mysqld=--file-key-management-filename=/home/travis/mariadb-toolbox/data/keys.txt --mysqld=--plugin-load-add=file_key_management --engine=Aria --scenario=MariaBackupFull
      

      The test performs the following steps:

      • start the server with

        --no-defaults --basedir=/home/travis/server --lc-messages-dir=/home/travis/server/share --character-sets-dir=/home/travis/server/share/charsets --tmpdir=/home/travis/logs/vardir/tmp --core-file --datadir=/home/travis/logs/vardir/data --max-allowed-packet=128Mb --port=19300 --socket=/home/travis/logs/vardir/mysql.sock --pid-file=/home/travis/logs/vardir/mysql.pid --general-log --general-log-file=/home/travis/logs/vardir/mysql.log --log_output=FILE --max-statement-time=20 --lock-wait-timeout=10 --loose-innodb-lock-wait-timeout=5 --loose-debug_assert_on_not_freed_memory=0 --innodb-encrypt-tables --innodb-encrypt-log --innodb-encryption-threads=4 --aria-encrypt-tables=1 --encrypt-tmp-disk-tables=1 --file-key-management --file-key-management-filename=/home/travis/mariadb-toolbox/data/keys.txt --plugin-load-add=file_key_management
        

      • run concurrent DML/DDL flow on the server;
      • after ~100 seconds, take the full backup from the running server (succeeded, the log attached as mbackup_backup.log;
      • after another ~100 seconds, shut down the server gracefully (succeeded, the error log is attached as mysql.err_orig;
      • run mariabackup --prepare (succeded, the log attached as mbackup_prepare.log;
      • cleanup server datadir and restore the backup to it (succeeded, the log attached as mbackup_restore.log;
      • try to start the server with the same options as before.

      I have no information yet whether the problem happens on earlier versions.

      Attachments

        1. mbackup_backup.log
          21 kB
        2. mbackup_prepare.log
          2 kB
        3. mbackup_restore.log
          18 kB
        4. mysql.err
          11 kB
        5. mysql.err_orig
          4 kB

        Issue Links

          Activity

            Miloslavskiy Alexander added a comment -

            I made a fix, please review:
            https://github.com/MariaDB/server/pull/1888

            Miloslavskiy Alexander added a comment - I made a fix, please review: https://github.com/MariaDB/server/pull/1888

            This issue was resolved a long time ago in
            81258f14323e1d1ad0203bae93bc55a30d47c1b3

            The issue was that the encryption code wrote wrong data on page headers, which made it
            impossible for recovery to decrypt the data.
            Even after the fix the data in this bug report will not work as the data is already corrupted.

            monty Michael Widenius added a comment - This issue was resolved a long time ago in 81258f14323e1d1ad0203bae93bc55a30d47c1b3 The issue was that the encryption code wrote wrong data on page headers, which made it impossible for recovery to decrypt the data. Even after the fix the data in this bug report will not work as the data is already corrupted.

            New test case shows another issue

            monty Michael Widenius added a comment - New test case shows another issue

            The fix in https://github.com/MariaDB/server/pull/1888 is slight wrong.
            The issue is that when recovery is about to create a new data or index page, it check if the page already exits.
            If the page does not exists (file is too short) or contains wrong checksum, then the recovery code will recreate the old page.
            The bug was that the code that checked if the page existed didn't take into account encrypted pages.
            Adding a check if page could not be encrypted solved the issue.

            monty Michael Widenius added a comment - The fix in https://github.com/MariaDB/server/pull/1888 is slight wrong. The issue is that when recovery is about to create a new data or index page, it check if the page already exits. If the page does not exists (file is too short) or contains wrong checksum, then the recovery code will recreate the old page. The bug was that the code that checked if the page existed didn't take into account encrypted pages. Adding a check if page could not be encrypted solved the issue.

            Fix pushed to bb-10.4-monty for testing

            monty Michael Widenius added a comment - Fix pushed to bb-10.4-monty for testing

            There is a patch for this issue in 10.4+, so closing it.

            elenst Elena Stepanova added a comment - There is a patch for this issue in 10.4+, so closing it.

            People

              monty Michael Widenius
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.