Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-12700

Allow innodb_read_only startup without prior slow shutdown

Details

    Description

      The parameter innodb_read_only which first appeared in MySQL 5.6 and MariaDB Server 10.0 disallows startup if the redo log is not logically empty or if the change buffer is not empty.

      It is technically possible to apply the needed changes in memory only, without writing them back to the data files. If MariaDB does that, it will be possible to check hot backups in a read-only fashion without any preparatory steps.

      Attachments

        Issue Links

          Activity

            There are several things that slow shutdown would do, compared to simply killing the server:

            • Completely merge the change buffer.
            • Purge all old history of committed transactions. (innodb_read_only does not care whether this happened.)
            • Create a log checkpoint (effectively clear the recovery log).

            Even with MDEV-19514 implemented, we still refuse read-only startup when the change buffer is not empty. The test innodb.innodb-change-buffer-recovery covers this scenario.

            Another stumbling stone would be that the redo log might not be empty.

            We can fix both issues, by simply being prepared to apply buffered changes or redo log records at any time when an affected page is added to the buffer pool. We would not write any redo log or add anything to buf_pool_t::flush_list.

            Note: Obviously, we cannot remove anything from recv_sys.pages in this mode, and neither would we modify any change buffer tree or bitmap pages, not even in the buffer pool. If a page is evicted from the buffer pool and reloaded later, we must apply the buffered changes or log records again.

            We may have to refuse startup if the buffer pool is not sufficiently large for buffering all redo log records that were written since the latest checkpoint. The usable capacity of the buffer pool would depend on the size of log that was written since the previous checkpoint.

            marko Marko Mäkelä added a comment - There are several things that slow shutdown would do, compared to simply killing the server: Completely merge the change buffer. Purge all old history of committed transactions. ( innodb_read_only does not care whether this happened.) Create a log checkpoint (effectively clear the recovery log). Even with MDEV-19514 implemented, we still refuse read-only startup when the change buffer is not empty. The test innodb.innodb-change-buffer-recovery covers this scenario. Another stumbling stone would be that the redo log might not be empty. We can fix both issues, by simply being prepared to apply buffered changes or redo log records at any time when an affected page is added to the buffer pool. We would not write any redo log or add anything to buf_pool_t::flush_list . Note: Obviously, we cannot remove anything from recv_sys.pages in this mode, and neither would we modify any change buffer tree or bitmap pages, not even in the buffer pool. If a page is evicted from the buffer pool and reloaded later, we must apply the buffered changes or log records again. We may have to refuse startup if the buffer pool is not sufficiently large for buffering all redo log records that were written since the latest checkpoint. The usable capacity of the buffer pool would depend on the size of log that was written since the previous checkpoint.

            I think that the redo log related part of this depends on MDEV-14481: Execute InnoDB crash recovery in the background.

            marko Marko Mäkelä added a comment - I think that the redo log related part of this depends on MDEV-14481 : Execute InnoDB crash recovery in the background.

            Note: This limitation can manifest itself in non-obvious ways. Here is a recent example:

            10.6 4b2e7616f85c5297d64ce44f9202d39104c06df1

            sql_sequence.read_only 'innodb'          w4 [ fail ]
                    Test ended at 2022-12-14 05:02:02
             
            CURRENT_TEST: sql_sequence.read_only
            mysqltest: At line 26: query 'create sequence s2 cache 5 engine=innodb' failed with wrong errno ER_UNKNOWN_STORAGE_ENGINE (1286): 'Unknown storage engine 'innodb'', instead of ER_CANT_CREATE_TABLE (1005)...
            

            The server error log shows a more complete picture:

            CURRENT_TEST: sql_sequence.read_only
            2022-12-14  5:00:48 0 [Note] /usr/sbin/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
            221214  5:01:48 [ERROR] mysqld got signal 6 ;
            …
            2022-12-14  5:02:01 0 [Warning] InnoDB: innodb_read_only prevents crash recovery
            2022-12-14  5:02:01 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1463] with error Read only transaction
            

            For an unknown reason (due to some bug), the server process failed to shut down, and it the process was forcibly kill -ABRT’ed by the test harness.

            Subsequently, InnoDB refused to start up, because the option innodb_read_only prevents all writes, including any that might be executed by crash recovery.

            Once the change buffer is removed (MDEV-29694), this limitation could be easier to lift. I do not think that this limitation can be completely lifted. If the innodb_buffer_pool_size is too small to hold copies of all recovered pages (that is, multiple recovery batches would be needed), there is no way to start up the server without writing anything to the file system.

            marko Marko Mäkelä added a comment - Note: This limitation can manifest itself in non-obvious ways. Here is a recent example : 10.6 4b2e7616f85c5297d64ce44f9202d39104c06df1 sql_sequence.read_only 'innodb' w4 [ fail ] Test ended at 2022-12-14 05:02:02   CURRENT_TEST: sql_sequence.read_only mysqltest: At line 26: query 'create sequence s2 cache 5 engine=innodb' failed with wrong errno ER_UNKNOWN_STORAGE_ENGINE (1286): 'Unknown storage engine 'innodb'', instead of ER_CANT_CREATE_TABLE (1005)... The server error log shows a more complete picture: CURRENT_TEST: sql_sequence.read_only 2022-12-14 5:00:48 0 [Note] /usr/sbin/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown 221214 5:01:48 [ERROR] mysqld got signal 6 ; … 2022-12-14 5:02:01 0 [Warning] InnoDB: innodb_read_only prevents crash recovery 2022-12-14 5:02:01 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1463] with error Read only transaction For an unknown reason (due to some bug), the server process failed to shut down, and it the process was forcibly kill -ABRT ’ed by the test harness. Subsequently, InnoDB refused to start up, because the option innodb_read_only prevents all writes, including any that might be executed by crash recovery. Once the change buffer is removed ( MDEV-29694 ), this limitation could be easier to lift. I do not think that this limitation can be completely lifted. If the innodb_buffer_pool_size is too small to hold copies of all recovered pages (that is, multiple recovery batches would be needed), there is no way to start up the server without writing anything to the file system.

            Here is a recent occurrence of sql_sequence.read_only failure:

            10.5 c9eff1a144ba44846373660a30d342d3f0dc91a5

            CURRENT_TEST: sql_sequence.read_only
            mysqltest: At line 26: query 'create sequence s2 cache 5 engine=innodb' failed with wrong errno 1286: 'Unknown storage engine 'innodb'', instead of 1005...
             
            The result from queries just before the failure was:
            …
            #4  0x000055ff3bc813c1 in debug_sync (thd=0x7f4b50000db8, sync_point_name=0x55ff3f54cee9 "innobase_connection_closed", name_len=26) at /home/buildbot/buildbot/build/mariadb-10.5.21/sql/debug_sync.cc:1580
            #5  0x000055ff3e03ce4d in innobase_close_connection (hton=0x55ff45289a28, thd=0x7f4b50000db8) at /home/buildbot/buildbot/build/mariadb-10.5.21/storage/innobase/handler/ha_innodb.cc:4785
            #6  0x000055ff3cbe0581 in ha_close_connection (thd=0x7f4b50000db8) at /home/buildbot/buildbot/build/mariadb-10.5.21/sql/handler.cc:902
            #7  0x000055ff3b8e59cc in THD::free_connection (this=0x7f4b50000db8) at /home/buildbot/buildbot/build/mariadb-10.5.21/sql/sql_class.cc:1615
            #8  0x000055ff3b575937 in unlink_thd (thd=0x7f4b50000db8) at /home/buildbot/buildbot/build/mariadb-10.5.21/sql/mysqld.cc:2625
            

            This stack trace is from a hang on shutdown, caused by the earlier test main.kill_debug that had been executed on the same worker earlier:

            main.kill_debug 'innodb'                 w3 [ pass ]    344
            

            This is being tracked as MDEV-30084.

            marko Marko Mäkelä added a comment - Here is a recent occurrence of sql_sequence.read_only failure : 10.5 c9eff1a144ba44846373660a30d342d3f0dc91a5 CURRENT_TEST: sql_sequence.read_only mysqltest: At line 26: query 'create sequence s2 cache 5 engine=innodb' failed with wrong errno 1286: 'Unknown storage engine 'innodb'', instead of 1005...   The result from queries just before the failure was: … #4 0x000055ff3bc813c1 in debug_sync (thd=0x7f4b50000db8, sync_point_name=0x55ff3f54cee9 "innobase_connection_closed", name_len=26) at /home/buildbot/buildbot/build/mariadb-10.5.21/sql/debug_sync.cc:1580 #5 0x000055ff3e03ce4d in innobase_close_connection (hton=0x55ff45289a28, thd=0x7f4b50000db8) at /home/buildbot/buildbot/build/mariadb-10.5.21/storage/innobase/handler/ha_innodb.cc:4785 #6 0x000055ff3cbe0581 in ha_close_connection (thd=0x7f4b50000db8) at /home/buildbot/buildbot/build/mariadb-10.5.21/sql/handler.cc:902 #7 0x000055ff3b8e59cc in THD::free_connection (this=0x7f4b50000db8) at /home/buildbot/buildbot/build/mariadb-10.5.21/sql/sql_class.cc:1615 #8 0x000055ff3b575937 in unlink_thd (thd=0x7f4b50000db8) at /home/buildbot/buildbot/build/mariadb-10.5.21/sql/mysqld.cc:2625 This stack trace is from a hang on shutdown, caused by the earlier test main.kill_debug that had been executed on the same worker earlier: main.kill_debug 'innodb' w3 [ pass ] 344 This is being tracked as MDEV-30084 .

            The change buffer related part of this was fixed in MDEV-29694, by removing the change buffer.

            The redo log related part will not be done. See MDEV-14481 for details.

            marko Marko Mäkelä added a comment - The change buffer related part of this was fixed in MDEV-29694 , by removing the change buffer. The redo log related part will not be done. See MDEV-14481 for details.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              1 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.