Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20578

Got error 126 when executing undo undo_key_delete upon Aria crash recovery

Details

    Description

      The failure looks somewhat similar to MDEV-18203, but this one is reproducible on all 10.2+ and does not require encryption (and it's undo_key_delete instead of undo_key_insert), so I'll file it separately.

      10.2 5ec4efb7

      2019-09-13  2:23:05 140564631615296 [Note] mysqld: Aria engine: starting recovery
      recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (18.4 seconds); transactions to roll back: 1
      Got error 126 when executing undo undo_key_delete
      2019-09-13  2:23:23 140564631615296 [ERROR] mysqld: Aria engine: Undo phase failed
      tables to flush: 9 8 7 6 5 4 3 2 1 0
      2019-09-13  2:23:23 140564631615296 [ERROR] mysqld: Aria recovery failed. Please run aria_chk -r on all Aria tables and delete all aria_log.######## files
      2019-09-13  2:23:23 140564631615296 [ERROR] Plugin 'Aria' init function returned error.
      2019-09-13  2:23:23 140564631615296 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.
      2019-09-13  2:23:23 140564631615296 [Note] Plugin 'FEEDBACK' is disabled.
      2019-09-13  2:23:23 140564631615296 [ERROR] Aria engine is not enabled or did not start. The Aria engine must be enabled to continue as mysqld was configured with --with-aria-tmp-tables
      2019-09-13  2:23:23 140564631615296 [ERROR] Aborting
      

      Attention: use the RQG branch as described below, it is tuned for reproducing the failure
      Remember to use the correct basedir, and modify vardir if necessary.
      The test fails almost every time for me, but sometimes it misses the mark, just re-run it if it didn't fail.

      git clone https://github.com/MariaDB/randgen --branch mdev20578 rqg-mdev20578
      cd rqg-mdev20578
      perl runall-new.pl --basedir=<basedir> --grammar=./mdev20578.yy --skip-gendata --gendata-advanced --duration=60 --threads=1 --seed=1 --reporter=CrashRecovery --vardir=/dev/shm/mdev20578
      

      Attachments

        Issue Links

          Activity

            On 10.5 built with S3, in addition to the above (which is still reproducible), the following assertion failure happens after the failed recovery:

            10.5 46553c25

            mysqld: /data/src/10.5/storage/maria/ha_s3.cc:681: int ha_s3_init(void*): Assertion `maria_hton' failed.
            190913  2:48:15 [ERROR] mysqld got signal 6 ;
             
            #7  0x00007f18dcf4ff12 in __GI___assert_fail (assertion=0x55eae119b75b "maria_hton", file=0x55eae119b678 "/data/src/10.5/storage/maria/ha_s3.cc", line=681, function=0x55eae119b850 <ha_s3_init(void*)::__PRETTY_FUNCTION__> "int ha_s3_init(void*)") at assert.c:101
            #8  0x000055eae0c78976 in ha_s3_init (p=0x55eae2fef500) at /data/src/10.5/storage/maria/ha_s3.cc:681
            #9  0x000055eae04e6633 in ha_initialize_handlerton (plugin=0x55eae2f23988) at /data/src/10.5/sql/handler.cc:550
            #10 0x000055eae01e2417 in plugin_initialize (tmp_root=0x7ffd0ca4dda0, plugin=0x55eae2f23988, argc=0x55eae1a75318 <remaining_argc>, argv=0x55eae2ebaae8, options_only=false) at /data/src/10.5/sql/sql_plugin.cc:1437
            #11 0x000055eae01e30bf in plugin_init (argc=0x55eae1a75318 <remaining_argc>, argv=0x55eae2ebaae8, flags=0) at /data/src/10.5/sql/sql_plugin.cc:1719
            #12 0x000055eae0098a28 in init_server_components () at /data/src/10.5/sql/mysqld.cc:5106
            #13 0x000055eae0099bf3 in mysqld_main (argc=14, argv=0x55eae2ebaae8) at /data/src/10.5/sql/mysqld.cc:5629
            #14 0x000055eae008e7d0 in main (argc=14, argv=0x7ffd0ca4e7e8) at /data/src/10.5/sql/main.cc:25
            

            elenst Elena Stepanova added a comment - On 10.5 built with S3, in addition to the above (which is still reproducible), the following assertion failure happens after the failed recovery: 10.5 46553c25 mysqld: /data/src/10.5/storage/maria/ha_s3.cc:681: int ha_s3_init(void*): Assertion `maria_hton' failed. 190913 2:48:15 [ERROR] mysqld got signal 6 ;   #7 0x00007f18dcf4ff12 in __GI___assert_fail (assertion=0x55eae119b75b "maria_hton", file=0x55eae119b678 "/data/src/10.5/storage/maria/ha_s3.cc", line=681, function=0x55eae119b850 <ha_s3_init(void*)::__PRETTY_FUNCTION__> "int ha_s3_init(void*)") at assert.c:101 #8 0x000055eae0c78976 in ha_s3_init (p=0x55eae2fef500) at /data/src/10.5/storage/maria/ha_s3.cc:681 #9 0x000055eae04e6633 in ha_initialize_handlerton (plugin=0x55eae2f23988) at /data/src/10.5/sql/handler.cc:550 #10 0x000055eae01e2417 in plugin_initialize (tmp_root=0x7ffd0ca4dda0, plugin=0x55eae2f23988, argc=0x55eae1a75318 <remaining_argc>, argv=0x55eae2ebaae8, options_only=false) at /data/src/10.5/sql/sql_plugin.cc:1437 #11 0x000055eae01e30bf in plugin_init (argc=0x55eae1a75318 <remaining_argc>, argv=0x55eae2ebaae8, flags=0) at /data/src/10.5/sql/sql_plugin.cc:1719 #12 0x000055eae0098a28 in init_server_components () at /data/src/10.5/sql/mysqld.cc:5106 #13 0x000055eae0099bf3 in mysqld_main (argc=14, argv=0x55eae2ebaae8) at /data/src/10.5/sql/mysqld.cc:5629 #14 0x000055eae008e7d0 in main (argc=14, argv=0x7ffd0ca4e7e8) at /data/src/10.5/sql/main.cc:25

            The issue with S3 is fixed a long time ago (just checked).

            monty Michael Widenius added a comment - The issue with S3 is fixed a long time ago (just checked).

            An example of the datadir stored after the server was killed, before a recovery attempt was made: ftp://ftp.askmonty.org/public/mdev20578_data.tar.gz
            It's a big archive, almost 1Gb.

            elenst Elena Stepanova added a comment - An example of the datadir stored after the server was killed, before a recovery attempt was made: ftp://ftp.askmonty.org/public/mdev20578_data.tar.gz It's a big archive, almost 1Gb.

            The crash happens in this scenario:

            • Table with unique keys and non unique keys
            • Batch insert (LOAD DATA or INSERT ... SELECT) with REPLACE
            • Some insert succeeds followed by duplicate key error

            In the above scenario the table gets corrupted.

            The bug was that we don't generate any undo entry for the
            failed insert as the whole insert can be ignored by undo.
            The code did however not take into account that when bulk
            insert is used, we would write cached keys to the file on
            failure and undo would wrongly ignore these.

            Fixed by moving the writing of the cache keys after we write
            the aborted-insert event to the log.

            monty Michael Widenius added a comment - The crash happens in this scenario: Table with unique keys and non unique keys Batch insert (LOAD DATA or INSERT ... SELECT) with REPLACE Some insert succeeds followed by duplicate key error In the above scenario the table gets corrupted. The bug was that we don't generate any undo entry for the failed insert as the whole insert can be ignored by undo. The code did however not take into account that when bulk insert is used, we would write cached keys to the file on failure and undo would wrongly ignore these. Fixed by moving the writing of the cache keys after we write the aborted-insert event to the log.

            People

              monty Michael Widenius
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.