Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22108

Crash recovery fails with [ERROR] InnoDB: Malformed log record

Details

    Description

      Workflow of RQG test
       
      1. Create DB server and start it
      2. One session starts to run some SQL stream against this server
      3. At some point of time and not related to the state (waiting for result, sending statement, ...) of  the ongoing 2. the server process gets killed (KILL and not TERM).
      4. Make some copy of the data content+logs etc. of the killed server.
      5. Try to restart the killed server.
       
      5. fails with
      2020-03-31 16:17:25 0 [Note] /home/mleich/10.5/bld_debug/sql/mysqld (mysqld 10.5.2-MariaDB-debug-log) starting as process 111337 ...
      ...
      2020-03-31 16:17:25 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=174187931
      2020-03-31 16:17:25 0 [ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore.
      2020-03-31 16:17:25 0 [Note] InnoDB: Dump from the start of the mini-transaction (LSN=173906840) to 100 bytes after the record:
      <some data>
      2020-03-31 16:17:27 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1537] with error Generic error
      ...
      2020-03-31 16:17:27 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
       
      RQG
      git clone https://github.com/mleich1/rqg --branch experimental RQG_mleich1
      origin/experimental d417497b476e263428422b4640cf63b8e9d16afe 2020-03-30T17:17:54+02:00
       
      MariaDB
      origin/10.5 718f18599a9bcf1e7c2d3f18416fca4f7124d00d 2020-03-26T16:05:25+02:00
      build with debug
       
      perl rqg.pl \ 
      --duration=120 \ 
      --queries=10000000 \
      --threads=1 \
      --no_mask \
      --seed=random \
      --gendata=conf/engines/many_indexes.zz \
      --rpl_mode=none \
      --engine=InnoDB \
      --mysqld=--innodb_stats_persistent=off \
      --mysqld=--lock-wait-timeout=86400 \ 
      --mysqld=--log-output=none \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
      --mysqld=--log_bin_trust_function_creators=1 \ 
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--innodb_page_size=32K \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--loose_innodb_use_native_aio=1 \
      --mysqld=--log-bin \
      --mysqld=--net_read_timeout=30 \
      --reporters=CrashRecovery1,Deadlock1,ErrorLog,None,ServerDead \
      --validators=None \
      --grammar=c00000.yy \
      --workdir=<local settings> \
      --vardir=<local settings> \
      --mtr-build-thread=<local settings> \
      --basedir1=<path to MariaDB binaries> \
      --script_debug=_nix
      There might be several replay attempts required.
      

      Attachments

        1. 000196.log
          1.79 MB
        2. c00000.yy
          0.5 kB

        Issue Links

          Activity

            mleich Matthias Leich created issue -
            mleich Matthias Leich made changes -
            Field Original Value New Value
            Description {noformat}
            Workflow of RQG test

            1. Create DB server and start it
            2. One session starts to run some SQL stream against this server
            3. At some point of time and not related to the state (waiting for result, sending statement, ...) of the ongoing 2. the server process gets killed (KILL and not TERM).
            4. Make some copy of the data content+logs etc. of the killed server.
            5. Try to restart the killed server.

            5. fails with
            2020-03-31 16:17:25 0 [Note] /home/mleich/10.5/bld_debug/sql/mysqld (mysqld 10.5.2-MariaDB-debug-log) starting as process 111337 ...
            ...
            2020-03-31 16:17:25 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=174187931
            2020-03-31 16:17:25 0 [ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore.
            2020-03-31 16:17:25 0 [Note] InnoDB: Dump from the start of the mini-transaction (LSN=173906840) to 100 bytes after the record:
            <some data>
            2020-03-31 16:17:27 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1537] with error Generic error
            ...
            2020-03-31 16:17:27 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

            RQG
            git clone https://github.com/mleich1/rqg --branch experimental RQG_mleich1
            origin/experimental d417497b476e263428422b4640cf63b8e9d16afe 2020-03-30T17:17:54+02:00

            MariaDB
            origin/10.5 718f18599a9bcf1e7c2d3f18416fca4f7124d00d 2020-03-26T16:05:25+02:00
            build with debug
            {noformat}
            {noformat}
            Workflow of RQG test

            1. Create DB server and start it
            2. One session starts to run some SQL stream against this server
            3. At some point of time and not related to the state (waiting for result, sending statement, ...) of the ongoing 2. the server process gets killed (KILL and not TERM).
            4. Make some copy of the data content+logs etc. of the killed server.
            5. Try to restart the killed server.

            5. fails with
            2020-03-31 16:17:25 0 [Note] /home/mleich/10.5/bld_debug/sql/mysqld (mysqld 10.5.2-MariaDB-debug-log) starting as process 111337 ...
            ...
            2020-03-31 16:17:25 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=174187931
            2020-03-31 16:17:25 0 [ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore.
            2020-03-31 16:17:25 0 [Note] InnoDB: Dump from the start of the mini-transaction (LSN=173906840) to 100 bytes after the record:
            <some data>
            2020-03-31 16:17:27 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1537] with error Generic error
            ...
            2020-03-31 16:17:27 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

            RQG
            git clone https://github.com/mleich1/rqg --branch experimental RQG_mleich1
            origin/experimental d417497b476e263428422b4640cf63b8e9d16afe 2020-03-30T17:17:54+02:00

            MariaDB
            origin/10.5 718f18599a9bcf1e7c2d3f18416fca4f7124d00d 2020-03-26T16:05:25+02:00
            build with debug

            perl rqg.pl \
            --duration=120 \
            --queries=10000000 \
            --threads=1 \
            --no_mask \
            --seed=random \
            --gendata=conf/engines/many_indexes.zz \
            --rpl_mode=none \
            --engine=InnoDB \
            --mysqld=--innodb_stats_persistent=off \
            --mysqld=--lock-wait-timeout=86400 \
            --mysqld=--log-output=none \
            --mysqld=--slave_net_timeout=60 \
            --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            --mysqld=--loose-table_lock_wait_timeout=50 \
            --mysqld=--net_write_timeout=60 \
            --mysqld=--connect_timeout=60 \
            --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
            --mysqld=--log_bin_trust_function_creators=1 \
            --mysqld=--loose-idle_transaction_timeout=0 \
            --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            --mysqld=--interactive_timeout=28800 \
            --mysqld=--innodb_page_size=32K \
            --mysqld=--wait_timeout=28800 \
            --mysqld=--innodb-lock-wait-timeout=50 \
            --mysqld=--loose-idle_write_transaction_timeout=0 \
            --mysqld=--loose_innodb_use_native_aio=1 \
            --mysqld=--log-bin \
            --mysqld=--net_read_timeout=30 \
            --reporters=CrashRecovery1,Deadlock1,ErrorLog,None,ServerDead \
            --validators=None \
            --grammar=c00000.yy \
            --workdir=<local settings> \
            --vardir=<local settings> \
            --mtr-build-thread=764 \
            --basedir1=<path to MariaDB binaries> \
            --script_debug=_nix
            {noformat}
            mleich Matthias Leich made changes -
            Attachment c00000.yy [ 51047 ]
            mleich Matthias Leich made changes -
            Description {noformat}
            Workflow of RQG test

            1. Create DB server and start it
            2. One session starts to run some SQL stream against this server
            3. At some point of time and not related to the state (waiting for result, sending statement, ...) of the ongoing 2. the server process gets killed (KILL and not TERM).
            4. Make some copy of the data content+logs etc. of the killed server.
            5. Try to restart the killed server.

            5. fails with
            2020-03-31 16:17:25 0 [Note] /home/mleich/10.5/bld_debug/sql/mysqld (mysqld 10.5.2-MariaDB-debug-log) starting as process 111337 ...
            ...
            2020-03-31 16:17:25 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=174187931
            2020-03-31 16:17:25 0 [ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore.
            2020-03-31 16:17:25 0 [Note] InnoDB: Dump from the start of the mini-transaction (LSN=173906840) to 100 bytes after the record:
            <some data>
            2020-03-31 16:17:27 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1537] with error Generic error
            ...
            2020-03-31 16:17:27 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

            RQG
            git clone https://github.com/mleich1/rqg --branch experimental RQG_mleich1
            origin/experimental d417497b476e263428422b4640cf63b8e9d16afe 2020-03-30T17:17:54+02:00

            MariaDB
            origin/10.5 718f18599a9bcf1e7c2d3f18416fca4f7124d00d 2020-03-26T16:05:25+02:00
            build with debug

            perl rqg.pl \
            --duration=120 \
            --queries=10000000 \
            --threads=1 \
            --no_mask \
            --seed=random \
            --gendata=conf/engines/many_indexes.zz \
            --rpl_mode=none \
            --engine=InnoDB \
            --mysqld=--innodb_stats_persistent=off \
            --mysqld=--lock-wait-timeout=86400 \
            --mysqld=--log-output=none \
            --mysqld=--slave_net_timeout=60 \
            --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            --mysqld=--loose-table_lock_wait_timeout=50 \
            --mysqld=--net_write_timeout=60 \
            --mysqld=--connect_timeout=60 \
            --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
            --mysqld=--log_bin_trust_function_creators=1 \
            --mysqld=--loose-idle_transaction_timeout=0 \
            --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            --mysqld=--interactive_timeout=28800 \
            --mysqld=--innodb_page_size=32K \
            --mysqld=--wait_timeout=28800 \
            --mysqld=--innodb-lock-wait-timeout=50 \
            --mysqld=--loose-idle_write_transaction_timeout=0 \
            --mysqld=--loose_innodb_use_native_aio=1 \
            --mysqld=--log-bin \
            --mysqld=--net_read_timeout=30 \
            --reporters=CrashRecovery1,Deadlock1,ErrorLog,None,ServerDead \
            --validators=None \
            --grammar=c00000.yy \
            --workdir=<local settings> \
            --vardir=<local settings> \
            --mtr-build-thread=764 \
            --basedir1=<path to MariaDB binaries> \
            --script_debug=_nix
            {noformat}
            {noformat}
            Workflow of RQG test

            1. Create DB server and start it
            2. One session starts to run some SQL stream against this server
            3. At some point of time and not related to the state (waiting for result, sending statement, ...) of the ongoing 2. the server process gets killed (KILL and not TERM).
            4. Make some copy of the data content+logs etc. of the killed server.
            5. Try to restart the killed server.

            5. fails with
            2020-03-31 16:17:25 0 [Note] /home/mleich/10.5/bld_debug/sql/mysqld (mysqld 10.5.2-MariaDB-debug-log) starting as process 111337 ...
            ...
            2020-03-31 16:17:25 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=174187931
            2020-03-31 16:17:25 0 [ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore.
            2020-03-31 16:17:25 0 [Note] InnoDB: Dump from the start of the mini-transaction (LSN=173906840) to 100 bytes after the record:
            <some data>
            2020-03-31 16:17:27 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1537] with error Generic error
            ...
            2020-03-31 16:17:27 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

            RQG
            git clone https://github.com/mleich1/rqg --branch experimental RQG_mleich1
            origin/experimental d417497b476e263428422b4640cf63b8e9d16afe 2020-03-30T17:17:54+02:00

            MariaDB
            origin/10.5 718f18599a9bcf1e7c2d3f18416fca4f7124d00d 2020-03-26T16:05:25+02:00
            build with debug

            perl rqg.pl \
            --duration=120 \
            --queries=10000000 \
            --threads=1 \
            --no_mask \
            --seed=random \
            --gendata=conf/engines/many_indexes.zz \
            --rpl_mode=none \
            --engine=InnoDB \
            --mysqld=--innodb_stats_persistent=off \
            --mysqld=--lock-wait-timeout=86400 \
            --mysqld=--log-output=none \
            --mysqld=--slave_net_timeout=60 \
            --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            --mysqld=--loose-table_lock_wait_timeout=50 \
            --mysqld=--net_write_timeout=60 \
            --mysqld=--connect_timeout=60 \
            --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
            --mysqld=--log_bin_trust_function_creators=1 \
            --mysqld=--loose-idle_transaction_timeout=0 \
            --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            --mysqld=--interactive_timeout=28800 \
            --mysqld=--innodb_page_size=32K \
            --mysqld=--wait_timeout=28800 \
            --mysqld=--innodb-lock-wait-timeout=50 \
            --mysqld=--loose-idle_write_transaction_timeout=0 \
            --mysqld=--loose_innodb_use_native_aio=1 \
            --mysqld=--log-bin \
            --mysqld=--net_read_timeout=30 \
            --reporters=CrashRecovery1,Deadlock1,ErrorLog,None,ServerDead \
            --validators=None \
            --grammar=c00000.yy \
            --workdir=<local settings> \
            --vardir=<local settings> \
            --mtr-build-thread=<local settings> \
            --basedir1=<path to MariaDB binaries> \
            --script_debug=_nix
            There might be several replay attempts required.
            {noformat}
            mleich Matthias Leich made changes -
            Assignee Matthias Leich [ mleich ] Marko Mäkelä [ marko ]
            Summary Draft: Crash recovery fails with [ERROR] InnoDB: Malformed log record Crash recovery fails with [ERROR] InnoDB: Malformed log record
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            Does the following patch fix it?

            diff --git a/storage/innobase/include/mtr0log.h b/storage/innobase/include/mtr0log.h
            index d07fa069dfd..ee823caf6cb 100644
            --- a/storage/innobase/include/mtr0log.h
            +++ b/storage/innobase/include/mtr0log.h
            @@ -427,7 +427,7 @@ inline byte *mtr_t::log_write(const page_id_t id, const buf_page_t *bpage,
                 if (oend + len > &log_ptr[16])
                 {
                   len+= oend - log_ptr - 15;
            -      if (len >= MIN_3BYTE)
            +      if (len >= MIN_3BYTE - 1)
                     len+= 2;
                   else if (len >= MIN_2BYTE)
                     len++;
            @@ -448,7 +448,7 @@ inline byte *mtr_t::log_write(const page_id_t id, const buf_page_t *bpage,
               else if (len >= 3 && end + len > &log_ptr[16])
               {
                 len+= end - log_ptr - 15;
            -    if (len >= MIN_3BYTE)
            +    if (len >= MIN_3BYTE - 1)
                   len+= 2;
                 else if (len >= MIN_2BYTE)
                   len++;
            

            The record with the unfortunate value len == MIN_3BYTE - 1 was written by btr_page_reorganize_low().

            This bug is only possible with innodb_page_size=32k or innodb_page_size=64k. For smaller page sizes, we can never have that long redo log records, because for the length of a single record we have the constraint that log_phys_t::alloc_size(length) must not exceed innodb_page_size.

            marko Marko Mäkelä added a comment - Does the following patch fix it? diff --git a/storage/innobase/include/mtr0log.h b/storage/innobase/include/mtr0log.h index d07fa069dfd..ee823caf6cb 100644 --- a/storage/innobase/include/mtr0log.h +++ b/storage/innobase/include/mtr0log.h @@ -427,7 +427,7 @@ inline byte *mtr_t::log_write(const page_id_t id, const buf_page_t *bpage, if (oend + len > &log_ptr[16]) { len+= oend - log_ptr - 15; - if (len >= MIN_3BYTE) + if (len >= MIN_3BYTE - 1) len+= 2; else if (len >= MIN_2BYTE) len++; @@ -448,7 +448,7 @@ inline byte *mtr_t::log_write(const page_id_t id, const buf_page_t *bpage, else if (len >= 3 && end + len > &log_ptr[16]) { len+= end - log_ptr - 15; - if (len >= MIN_3BYTE) + if (len >= MIN_3BYTE - 1) len+= 2; else if (len >= MIN_2BYTE) len++; The record with the unfortunate value len == MIN_3BYTE - 1 was written by btr_page_reorganize_low() . This bug is only possible with innodb_page_size=32k or innodb_page_size=64k . For smaller page sizes, we can never have that long redo log records, because for the length of a single record we have the constraint that log_phys_t::alloc_size(length) must not exceed innodb_page_size .
            marko Marko Mäkelä made changes -
            mleich Matthias Leich made changes -
            Comment [ Unfortunately it did not help.
            /home/mleich/RQG_RR/storage/1585841382/000177.tgz ]

            Yes
             
            Results of testing on origin/10.5 HEAD 718f18599a9bcf1e7c2d3f18416fca4f7124d00d 2020-03-26T16:05:25+02:00 + Patch applied
            1. The test mentioned in the description (threads=1, pagesize=32K, grammar c00000.yy) replays the problem usually quite fast.
                 But now it seems to be (~ 350 RQG runs) no more capable to replay a failing crash recovery showing the following line in server error log 
                "[ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore"
            2. Test battery for covering a broad range of functionality (~ 1800 RQG runs):
                 None of the failing tests showed "[ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore"
            For 1. and 2.
                 Some of the RQG tests failed but in all cases because of other already known issues.
            

            mleich Matthias Leich added a comment - Yes   Results of testing on origin/10.5 HEAD 718f18599a9bcf1e7c2d3f18416fca4f7124d00d 2020-03-26T16:05:25+02:00 + Patch applied 1. The test mentioned in the description (threads=1, pagesize=32K, grammar c00000.yy) replays the problem usually quite fast. But now it seems to be (~ 350 RQG runs) no more capable to replay a failing crash recovery showing the following line in server error log "[ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore" 2. Test battery for covering a broad range of functionality (~ 1800 RQG runs): None of the failing tests showed "[ERROR] InnoDB: Malformed log record; set innodb_force_recovery=1 to ignore" For 1. and 2. Some of the RQG tests failed but in all cases because of other already known issues.
            marko Marko Mäkelä made changes -
            Fix Version/s 10.5.3 [ 24263 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Fixed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 106448 ] MariaDB v4 [ 157526 ]

            People

              marko Marko Mäkelä
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.