Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL)
    • Fix Version/s: 10.5, 10.6
    • Component/s: Storage Engine - Aria
    • Labels: None

    Description

      We have a number of bug reports related to Aria recovery problems, each showing a different manifestation of the problem. After Monty's first analysis of some of them, it appears that they have a lot in common. Above all, even when the data directory on which the recovery issue can be reproduced is available, it is not sufficient for fixing the issue; a complete test case that causes the initial corruption is needed. These test cases are concurrent and non-deterministic by nature, and quite often simply re-running the same test hits a different manifestation of the recovery problem. Thus, I think it makes sense to group all these issues together: one fix is likely to resolve several bugs, and developers and testers working on one bug are likely to have to deal with the others as well.

      Actual bug reports are to be made subtasks of this one. They will be handled and closed as normal bug reports. The umbrella report will stay open until there are no open subtasks left.

      Examples of observed recovery issues from the subtasks:

      MDEV-18310

      Got error 121 when executing undo undo_key_delete
      

      MDEV-18203

      Got error 126 when executing undo undo_key_insert
      

      MDEV-20578

      Got error 126 when executing undo undo_key_delete
      

      MDEV-18187

      2019-01-09 16:00:40 0 [ERROR] mysqld: failed to decrypt './test/t7'  rc: -1  dstlen: 0  size: 4294967275
      Got error 192 when executing record redo_index_new_page
      

      MDEV-17912

      2018-12-05 18:38:33 0 [ERROR] mysqld: failed to decrypt './test/oltp46'  rc: -1  dstlen: 0  size: 8172
      Got error 192 when executing record redo_new_row_head
      

      MDEV-19576

      Got error 175 when executing record redo_index
      

      MDEV-19576

      Got error 175 when executing undo undo_row_insert
      

      MDEV-19718

      mysqld: /data/src/10.3/storage/maria/ma_blockrec.c:6358: _ma_apply_redo_insert_row_head_or_tail: Assertion `rownr == 0 && new_page' failed.
      

      MDEV-18461

      mysqld: /data/src/10.4/storage/maria/ma_loghandler.c:3862: translog_init_with_table: Assertion `sure_page <= last_page' failed.
      

      MDEV-18461

      mysqld: /home/travis/src/storage/maria/ma_blockrec.c:2879: write_block_record: Assertion `undo_lsn == ((LSN)1) || head_length == row_pos->length' failed.
      

      MDEV-18461

      Got error 176 when executing record redo_insert_row_head
      

      MDEV-20132

      Assertion `info->new_row.checksum == (*share->calc_checksum)(info, current_record)' failed
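
Because the same test can produce a different failure on each run, it may help to reduce an error log to a set of signatures before filing or matching a subtask. The following is a minimal Python sketch, not part of any MariaDB tooling. The names `classify` and `summarize` are hypothetical, and the two patterns are modeled only on the messages quoted above, so other Aria recovery messages would need additional patterns.

```python
import re
from collections import Counter

# Assumption: these patterns cover only the message shapes quoted in this report.
RECOVERY_ERROR = re.compile(
    r"Got error (?P<code>\d+) when executing (?P<phase>undo|record) (?P<op>\w+)"
)
ASSERTION = re.compile(
    r"(?:(?P<file>[\w.]+):(?P<line>\d+): (?P<func>\w+): )?"
    r"Assertion `(?P<expr>.+)' failed"
)


def classify(line):
    """Return a signature tuple for one log line, or None if nothing matches."""
    m = RECOVERY_ERROR.search(line)
    if m:
        return ("error", int(m["code"]), m["phase"], m["op"])
    m = ASSERTION.search(line)
    if m:
        # file and func are None when the log line carries no source location.
        return ("assert", m["file"], m["func"], m["expr"])
    return None


def summarize(lines):
    """Count how often each signature occurs in an iterable of log lines."""
    signatures = (classify(line) for line in lines)
    return Counter(sig for sig in signatures if sig is not None)
```

Calling `summarize(open(path))` on an error log (where `path` is whatever log file is at hand) yields a counter keyed by signature, which makes it easier to see whether a new failure matches an existing subtask. Line numbers in assertions are left out of the signature on purpose, since they shift between branches.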
      

          Activity

            elenst Elena Stepanova added a comment - MDEV-18203 now has an MTR test case (non-deterministic).

            I believe that every release of MariaDB Server is affected by this. Basically, any test that kills the server may fail due to a low-probability failure of Aria recovery. Here is a recent example:

            10.5 c0470caf5a80e69ad7d855a871c62cf72dc03b05

            main.grant_kill                          w14 [ fail ]
                    Test ended at 2022-09-06 08:42:12
            CURRENT_TEST: main.grant_kill
            Failed to start mysqld.1
            mysqltest failed but provided no output
             - saving '/home/buildbot/amd64-fedora-35/build/mysql-test/var/14/log/main.grant_kill/' to '/home/buildbot/amd64-fedora-35/build/mysql-test/var/log/main.grant_kill/'
            Retrying test main.grant_kill, attempt(2/3)...
            worker[14] > Restart  - not started
            ***Warnings generated in error logs during shutdown after running tests: main.grant_kill
            2022-09-06  8:42:12 0 [ERROR] mariadbd: Aria recovery failed. Please run aria_chk -r on all Aria tables (*.MAI) and delete all aria_log.######## files
            2022-09-06  8:42:12 0 [ERROR] Plugin 'Aria' init function returned error.
            2022-09-06  8:42:12 0 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.
            2022-09-06  8:42:12 0 [ERROR] Could not open mysql.plugin table: "Unknown storage engine 'Aria'". Some plugins may be not loaded
            2022-09-06  8:42:12 0 [ERROR] Failed to initialize plugins.
            2022-09-06  8:42:12 0 [ERROR] Aborting
            main.func_like                           w9 [ pass ]     16
            main.default_storage_engine              w5 [ pass ]    472
            main.grant_kill                          w14 [ retry-pass ]      9
            

            It may make sense to produce rr replay traces of some of the failures (covering both the intentionally killed server and the failed recovery) so that they can be reliably debugged. For InnoDB recovery failures, having only a copy of the data directory that fails to start up is only half of the story. More often than not, the actual problem resides on the ‘write’ side, and recovery only sees some corrupted input files.

            marko Marko Mäkelä added the comment above.
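
The error log quoted in the comment above asks for `aria_chk -r` to be run on all Aria tables (*.MAI) and for all aria_log.######## files to be deleted. As a rough illustration of that manual step, the Python sketch below only builds a plan and changes nothing on disk. The function name `recovery_plan` is hypothetical, and the sketch assumes that the server is stopped and that the Aria log files sit directly in the data directory, which may not hold if the log directory has been configured elsewhere.

```python
import re
from pathlib import Path

# Matches aria_log.######## (eight digits); aria_log_control is deliberately excluded.
ARIA_LOG = re.compile(r"aria_log\.\d{8}")


def recovery_plan(datadir):
    """Return (aria_chk commands, log files to delete) for a stopped server.

    Dry run only: nothing is executed and nothing is removed.
    """
    root = Path(datadir)
    # One 'aria_chk -r' invocation per Aria index file found under the datadir.
    tables = sorted(root.rglob("*.MAI"))
    commands = [["aria_chk", "-r", str(table)] for table in tables]
    # Assumption: the log files live at the top level of the datadir.
    logs = sorted(
        p for p in root.iterdir() if p.is_file() and ARIA_LOG.fullmatch(p.name)
    )
    return commands, logs
```

The returned commands could be passed to `subprocess.run` and the log paths to `Path.unlink`, preferably after the data directory has been copied, since the copy is what a developer would need in order to analyze the failure.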

            People

              Assignee: elenst Elena Stepanova
              Reporter: elenst Elena Stepanova
              Votes: 1
              Watchers: 5
