Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22108

Crash recovery fails with [ERROR] InnoDB: Malformed log record

Details

    Description

      Workflow of RQG test
       
      1. Create DB server and start it
      2. One session starts to run some SQL stream against this server
      3. At some point of time and not related to the state (waiting for result, sending statement, ...) of  the ongoing 2. the server process gets killed (KILL and not TERM).
      4. Make some copy of the data content+logs etc. of the killed server.
      5. Try to restart the killed server.
       
      5. fails with
      2020-03-31 16:17:25 0 [Note] /home/mleich/10.5/bld_debug/sql/mysqld (mysqld 10.5.2-MariaDB-debug-log) starting as process 111337 ...
      ...
      2020-03-31 16:17:25 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=174187931
      2020-03-31 16:17:25 0 [ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore.
      2020-03-31 16:17:25 0 [Note] InnoDB: Dump from the start of the mini-transaction (LSN=173906840) to 100 bytes after the record:
      <some data>
      2020-03-31 16:17:27 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1537] with error Generic error
      ...
      2020-03-31 16:17:27 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
       
      RQG
      git clone https://github.com/mleich1/rqg --branch experimental RQG_mleich1
      origin/experimental d417497b476e263428422b4640cf63b8e9d16afe 2020-03-30T17:17:54+02:00
       
      MariaDB
      origin/10.5 718f18599a9bcf1e7c2d3f18416fca4f7124d00d 2020-03-26T16:05:25+02:00
      build with debug
       
      perl rqg.pl \ 
      --duration=120 \ 
      --queries=10000000 \
      --threads=1 \
      --no_mask \
      --seed=random \
      --gendata=conf/engines/many_indexes.zz \
      --rpl_mode=none \
      --engine=InnoDB \
      --mysqld=--innodb_stats_persistent=off \
      --mysqld=--lock-wait-timeout=86400 \ 
      --mysqld=--log-output=none \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
      --mysqld=--log_bin_trust_function_creators=1 \ 
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--innodb_page_size=32K \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--loose_innodb_use_native_aio=1 \
      --mysqld=--log-bin \
      --mysqld=--net_read_timeout=30 \
      --reporters=CrashRecovery1,Deadlock1,ErrorLog,None,ServerDead \
      --validators=None \
      --grammar=c00000.yy \
      --workdir=<local settings> \
      --vardir=<local settings> \
      --mtr-build-thread=<local settings> \
      --basedir1=<path to MariaDB binaries> \
      --script_debug=_nix
      There might be several replay attempts required.
      

      Attachments

        1. 000196.log
          1.79 MB
        2. c00000.yy
          0.5 kB

        Issue Links

          Activity

            Does the following patch fix it?

            diff --git a/storage/innobase/include/mtr0log.h b/storage/innobase/include/mtr0log.h
            index d07fa069dfd..ee823caf6cb 100644
            --- a/storage/innobase/include/mtr0log.h
            +++ b/storage/innobase/include/mtr0log.h
            @@ -427,7 +427,7 @@ inline byte *mtr_t::log_write(const page_id_t id, const buf_page_t *bpage,
                 if (oend + len > &log_ptr[16])
                 {
                   len+= oend - log_ptr - 15;
            -      if (len >= MIN_3BYTE)
            +      if (len >= MIN_3BYTE - 1)
                     len+= 2;
                   else if (len >= MIN_2BYTE)
                     len++;
            @@ -448,7 +448,7 @@ inline byte *mtr_t::log_write(const page_id_t id, const buf_page_t *bpage,
               else if (len >= 3 && end + len > &log_ptr[16])
               {
                 len+= end - log_ptr - 15;
            -    if (len >= MIN_3BYTE)
            +    if (len >= MIN_3BYTE - 1)
                   len+= 2;
                 else if (len >= MIN_2BYTE)
                   len++;
            

            The record with the unfortunate value len == MIN_3BYTE - 1 was written by btr_page_reorganize_low().

            This bug is only possible with innodb_page_size=32k or innodb_page_size=64k. For smaller page sizes, we can never have that long redo log records, because for the length of a single record we have the constraint that log_phys_t::alloc_size(length) must not exceed innodb_page_size.

            marko Marko Mäkelä added a comment - Does the following patch fix it? diff --git a/storage/innobase/include/mtr0log.h b/storage/innobase/include/mtr0log.h index d07fa069dfd..ee823caf6cb 100644 --- a/storage/innobase/include/mtr0log.h +++ b/storage/innobase/include/mtr0log.h @@ -427,7 +427,7 @@ inline byte *mtr_t::log_write(const page_id_t id, const buf_page_t *bpage, if (oend + len > &log_ptr[16]) { len+= oend - log_ptr - 15; - if (len >= MIN_3BYTE) + if (len >= MIN_3BYTE - 1) len+= 2; else if (len >= MIN_2BYTE) len++; @@ -448,7 +448,7 @@ inline byte *mtr_t::log_write(const page_id_t id, const buf_page_t *bpage, else if (len >= 3 && end + len > &log_ptr[16]) { len+= end - log_ptr - 15; - if (len >= MIN_3BYTE) + if (len >= MIN_3BYTE - 1) len+= 2; else if (len >= MIN_2BYTE) len++; The record with the unfortunate value len == MIN_3BYTE - 1 was written by btr_page_reorganize_low() . This bug is only possible with innodb_page_size=32k or innodb_page_size=64k . For smaller page sizes, we can never have that long redo log records, because for the length of a single record we have the constraint that log_phys_t::alloc_size(length) must not exceed innodb_page_size .

            Yes
             
            Results of testing on origin/10.5 HEAD 718f18599a9bcf1e7c2d3f18416fca4f7124d00d 2020-03-26T16:05:25+02:00 + Patch applied
            1. The test mentioned in the description (threads=1, pagesize=32K, grammar c00000.yy) replays the problem usually quite fast.
                 But now it seems to be (~ 350 RQG runs) no more capable to replay a failing crash recovery showing the following line in server error log 
                "[ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore"
            2. Test battery for covering a broad range of functionality (~ 1800 RQG runs):
                 None of the failing tests showed "[ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore"
            For 1. and 2.
                 Some of the RQG tests failed but in all cases because of other already known issues.
            

            mleich Matthias Leich added a comment - Yes   Results of testing on origin/10.5 HEAD 718f18599a9bcf1e7c2d3f18416fca4f7124d00d 2020-03-26T16:05:25+02:00 + Patch applied 1. The test mentioned in the description (threads=1, pagesize=32K, grammar c00000.yy) replays the problem usually quite fast. But now it seems to be (~ 350 RQG runs) no more capable to replay a failing crash recovery showing the following line in server error log "[ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore" 2. Test battery for covering a broad range of functionality (~ 1800 RQG runs): None of the failing tests showed "[ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore" For 1. and 2. Some of the RQG tests failed but in all cases because of other already known issues.

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.