MariaDB Server · MDEV-19586

Replace recv_sys_t::addr_hash with a std::map

Details

    Description

      MDEV-12353 suggests one idea for improving InnoDB crash recovery speed: Read the to-be-recovered pages sorted by page number. Currently, recv_apply_hashed_log_recs() picks a ‘random’ page number from recv_sys->addr_hash and then reads a number of pages around that number.

      The most straightforward way to achieve this would seem to be to replace the hash table with an ordered data structure, such as std::map, which would automatically be iterated in an ordered fashion.

      While we are at it, we should merge the MDEV-12699 mlog_init_t into the same data structure.

      Attachments

        1. Remove_mlog_init.patch
          10 kB
          Marko Mäkelä
        2. Remove_recv_data_t.patch
          7 kB
          Marko Mäkelä

          Activity

            kevg gave some valuable feedback:

            • std::forward_list stores a pointer to the next element plus a pointer to the data. It would be better to store the data inline, together with the pointer to the next element.
            • recv_sys.pages does not need to be a pointer.

            Some more things remain to be done:

            • Remove mlog_init_t and store the information in recv_sys.pages.
            • Remove recv_data_copy_to_buf() and make the records in recv_sys.pages point directly to recv_sys.buf.
            marko Marko Mäkelä added a comment

            It turns out that some cleanup has to be deferred.

            Remove_mlog_init.patch is more feasible after MDEV-19514 has removed the call to mlog_init.ibuf_merge(). Remove_recv_data_t.patch could be feasible after modifying the format of redo log records (MDEV-12353) and possibly also blocks (MDEV-14425).

            marko Marko Mäkelä added a comment
            marko Marko Mäkelä added a comment - edited

            I evaluated the performance with the following simple benchmark:

            --source include/have_innodb.inc
            --source include/have_sequence.inc
            --source include/have_partition.inc
             
            SET GLOBAL innodb_flush_log_at_trx_commit=1;
            CREATE TABLE t1 (a SERIAL, b CHAR(255) NOT NULL)
            ENGINE=InnoDB
            PARTITION BY HASH(a) PARTITIONS 30
            SELECT NULL, 'b' FROM seq_1_to_1000000;
            SELECT now();
            --let $shutdown_timeout= 0
            --source include/shutdown_mysqld.inc
            --source include/start_mysqld.inc
            SELECT now();
            DROP TABLE t1;
            

            ./mtr --mysqld=--skip-innodb-buffer-pool-load-at-startup --mysqld=--innodb-buffer-pool-size=100m --mysqld=--innodb-page-size=4k innodb.recover
            

            To my surprise, recv_sys_t::add() turned out to become a bottleneck: it would spend a lot of time seeking to the end of a singly-linked list. Introducing the shortcut pointer recv_sys_t::recs_t::last fixed this bottleneck, and performance became comparable to that before the change.

            I found the bottleneck by removing the line

            --source include/start_mysqld.inc
            

            (which would lead to the test failing), then backing up the data directory, and timing the recovery as follows:

            sudo perf record su marko -c '../sql/mysqld --skip-innodb-buffer-pool-load-at-startup --innodb-page-size=4k --innodb-buffer-pool-size=100m --lc-messages-dir=/dev/shm/10.5/sql/share/  --datadir=/dev/shm/10.5/mysql-test/var/log/innodb.recover-innodb/mysqld.1/data --shutdown'
            

            The last (invalid) option will cause the server to shut down immediately after InnoDB startup. Finally, I would analyze the data with

            sudo perf report
            

            This could be repeated as many times as needed, always restoring the data directory in between, so that each execution processes exactly the same redo log records.

            A larger recovery test with a bigger buffer pool and data set, and possibly a hard disk, would be needed to observe a significant performance difference.


            People

              marko Marko Mäkelä
              marko Marko Mäkelä