MariaDB Server · MDEV-19586

Replace recv_sys_t::addr_hash with a std::map

Details

    Description

      MDEV-12353 suggests one idea for improving InnoDB crash recovery speed: Read the to-be-recovered pages sorted by page number. Currently, recv_apply_hashed_log_recs() picks a ‘random’ page number from recv_sys->addr_hash and then reads a number of pages around that number.

      The most straightforward way to achieve this would seem to be to replace the hash table with an ordered data structure, such as std::map, which would automatically be iterated in an ordered fashion.

      While we are at it, we should merge the MDEV-12699 mlog_init_t into the same data structure.

      Attachments

        1. Remove_mlog_init.patch
          10 kB
          Marko Mäkelä
        2. Remove_recv_data_t.patch
          7 kB
          Marko Mäkelä

          Activity

            kevg gave some valuable feedback:

            • std::forward_list stores a pointer to the next element plus a pointer to the data. It would be better to store the data inline, together with the pointer to the next element.
            • recv_sys.pages does not need to be a pointer.

            Some more things remain to be done:

            • Remove mlog_init_t and store the information in recv_sys.pages.
            • Remove recv_data_copy_to_buf() and make the records in recv_sys.pages point directly to recv_sys.buf.
            marko Marko Mäkelä added a comment

            It turns out that some cleanup has to be deferred.

            Remove_mlog_init.patch is more feasible after MDEV-19514 has removed the call to mlog_init.ibuf_merge(). Remove_recv_data_t.patch could be feasible after modifying the format of redo log records (MDEV-12353) and possibly also blocks (MDEV-14425).

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä added a comment - edited

            I evaluated the performance with the following simple benchmark:

            --source include/have_innodb.inc
            --source include/have_sequence.inc
            --source include/have_partition.inc
             
            SET GLOBAL innodb_flush_log_at_trx_commit=1;
            CREATE TABLE t1 (a SERIAL, b CHAR(255) NOT NULL)
            ENGINE=InnoDB
            PARTITION BY HASH(a) PARTITIONS 30
            SELECT NULL, 'b' FROM seq_1_to_1000000;
            SELECT now();
            --let $shutdown_timeout= 0
            --source include/shutdown_mysqld.inc
            --source include/start_mysqld.inc
            SELECT now();
            DROP TABLE t1;
            

            ./mtr --mysqld=--skip-innodb-buffer-pool-load-at-startup --mysqld=--innodb-buffer-pool-size=100m --mysqld=--innodb-page-size=4k innodb.recover
            

            To my surprise, recv_sys_t::add() turned out to become a bottleneck: it would spend a lot of time seeking to the end of a singly-linked list. Introducing the shortcut pointer recv_sys_t::recs_t::last fixed this bottleneck, and performance became comparable to that before the change.

            I found the bottleneck by removing the line

            --source include/start_mysqld.inc
            

            (which would lead to the test failing), then backing up the data directory, and timing the recovery as follows:

            sudo perf record su marko -c '../sql/mysqld --skip-innodb-buffer-pool-load-at-startup --innodb-page-size=4k --innodb-buffer-pool-size=100m --lc-messages-dir=/dev/shm/10.5/sql/share/  --datadir=/dev/shm/10.5/mysql-test/var/log/innodb.recover-innodb/mysqld.1/data --shutdown'
            

            The last (invalid) option will cause the server to shut down immediately after InnoDB startup. Finally, I would analyze the data with

            sudo perf report
            

            This could be repeated as many times as needed, always restoring the data directory in between, so that each execution processes exactly the same redo log records.

            A larger recovery test with a bigger buffer pool and data set, and possibly a hard disk, would be needed to observe a significant performance difference.


            People

              marko Marko Mäkelä
              marko Marko Mäkelä