MariaDB Server / MDEV-23399

10.5 performance regression with IO-bound tpcc

Details

    Description

      Triggered by this blog post from Percona.
      The problem could be reproduced with sysbench-tpcc and Percona's settings: a 25 GB buffer pool for a 100 GB data set (1000 warehouses), datadir located on an SSD, and a tpcc workload with 32 benchmark threads on hardware with 16 cores/32 hyperthreads.
      Throughput starts high and then decreases over a varying time period (500..1200 seconds) to reach ~200 tps. The performance schema shows a lot of time spent on buf_pool_mutex. CPU usage of the mariadbd process is rather low, around 300%.
      MySQL 8.0 does not show that problem. MariaDB 10.5.4 performs better than a pre-10.5.5 snapshot.


          Activity

            wlad, please review the squashed commit.

            axel and krunalbauskar, please test the performance. I think that we must deal with MDEV-23855 separately.

            mleich, please run the wide battery of stress tests. In previous tests more than a week ago, some corruption or crashes on crash recovery occurred. I believe that the problem may have been fixed since then.

            marko Marko Mäkelä added a comment

            Hello,

            I can see that this issue has moved to "Stalled". I was wondering whether it would be possible to get an update on the current state of the fix and the target minor version where it will arrive.

            Thanks in advance.

            Bernardo Perez added a comment

            This scenario (write-heavy workload that does not fit in the buffer pool) was addressed by rewriting most of the page cleaner thread and page flushing, by simplifying related data structures and reducing mutex operations. LRU flushing will now only be initiated by user threads, and the page cleaner thread will perform solely checkpoint-related flushing. There is no single-page flushing anymore, and the page cleaner will not wait for log writes or page latches.
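            The division of labor described above can be illustrated with a toy Python sketch (not InnoDB code; the class and method names here are invented for the example). The key point is that the page cleaner works through the flush list in oldest_modification order, which is exactly what allows the checkpoint LSN to advance:

            ```python
            from collections import deque

            class Page:
                def __init__(self, page_id, oldest_modification):
                    self.page_id = page_id
                    # LSN of the first change that has not yet reached the data file
                    self.oldest_modification = oldest_modification

            class BufferPool:
                def __init__(self):
                    self.flush_list = deque()  # dirty pages, oldest modification first

                def add_dirty(self, page):
                    self.flush_list.append(page)
                    # keep the list ordered by oldest_modification, oldest first
                    self.flush_list = deque(sorted(self.flush_list,
                                                   key=lambda p: p.oldest_modification))

                def checkpoint_flush(self, target_lsn):
                    """Page-cleaner work: write out every page whose oldest
                    unflushed change precedes the desired checkpoint LSN."""
                    written = []
                    while (self.flush_list
                           and self.flush_list[0].oldest_modification < target_lsn):
                        written.append(self.flush_list.popleft().page_id)
                    return written

            pool = BufferPool()
            for pid, lsn in [(1, 100), (2, 300), (3, 50)]:
                pool.add_dirty(Page(pid, lsn))

            # advancing the checkpoint to LSN 200 requires writing pages 3 and 1
            print(pool.checkpoint_flush(200))  # [3, 1]
            ```

            In the real server, LRU flushing (making free pages for user threads) is triggered separately by the user threads themselves, so the page cleaner loop above only ever serves the checkpoint.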

            Performance will be improved further in MDEV-23855 for write-heavy cases where all data does fit in the buffer pool. Among other things, that will remove contention on fil_system.mutex between the page cleaner and threads executing write completion callbacks. The work is mostly done.

            marko Marko Mäkelä added a comment
            sayap Yap Sok Ann added a comment -

            ... Furthermore, if the FIL_PAGE_LSN of a page is ahead of log_sys.get_flushed_lsn(), that is, what has been persistently written to the redo log, we would trigger a log flush and then resume the page flushing. This would unnecessarily limit the performance of the page cleaner thread and trigger the infamous messages "InnoDB: page_cleaner: 1000ms intended loop took 4450ms. The settings might not be optimal" that were suppressed in commit d1ab89037a518fcffbc50c24e4bd94e4ec33aed0 unless log_warnings>2.

            Our revised algorithm will make log_sys.get_flushed_lsn() advance at the start of buf_flush_lists(), and then execute a 'best effort' to write out all pages. The flush batches will skip pages that were modified since the log was written, or are currently exclusively locked.
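            The skip rule in the quoted design can be sketched as follows (a simplified Python stand-in; the tuple layout and names are invented for illustration). A page is only eligible for the batch if its latest change is already durable in the redo log and nobody holds it exclusively:

            ```python
            FLUSHED_LSN = 500  # durably written redo log position at batch start

            pages = [
                # (page_id, FIL_PAGE_LSN, exclusively_latched)
                (1, 450, False),  # eligible: its changes are already durable
                (2, 600, False),  # skipped: modified after the log was written
                (3, 400, True),   # skipped: exclusively latched by another thread
            ]

            def flush_batch(pages, flushed_lsn):
                """Best-effort batch: write only pages whose latest change is
                already durable in the redo log and that are not latched."""
                return [pid for pid, page_lsn, latched in pages
                        if page_lsn <= flushed_lsn and not latched]

            print(flush_batch(pages, FLUSHED_LSN))  # [1]
            ```

            Skipped pages are simply picked up by a later batch, once the log has advanced or the latch has been released; no batch ever blocks on a log write or a page latch.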

            This seems like a very nice design, but I have some concern about how it was done previously, and how it is still being done in the latest MySQL/Percona:

            1. Call log_write_up_to() with the newest LSN of the modified page
            2. Write out the modified page

            As the block mutex is not held, does it mean that in between step 1 and step 2, some mtr can always further modify the page with a newer LSN?

            If that's the case, a crash after step 2 would mean that the data files are now ahead of the redo log. What would be the consequences of that?

            Sorry if this is a noob question. I am rather interested in InnoDB page flushing performance, and after trying to understand the code a little (still stuck with PXC 5.6 here), I am really curious what the point of step 1 is if it can't guarantee anything.
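            The race being asked about can be shown as a toy timeline (hypothetical Python, with simplified stand-ins for log_write_up_to() and FIL_PAGE_LSN). If nothing prevents the page from being modified between step 1 and step 2, the page image written in step 2 can carry an LSN ahead of the durably written log:

            ```python
            log_flushed_lsn = 0

            def log_write_up_to(lsn):
                """Stand-in for flushing the redo log up to a given LSN."""
                global log_flushed_lsn
                log_flushed_lsn = max(log_flushed_lsn, lsn)

            page_lsn = 100             # FIL_PAGE_LSN when step 1 is decided
            log_write_up_to(page_lsn)  # step 1: redo log durable up to 100

            page_lsn = 150             # a concurrent mtr modifies the page again

            # step 2 would now write a page whose LSN is ahead of the durable
            # log, violating the write-ahead-log rule:
            assert page_lsn > log_flushed_lsn
            ```

            The answer below explains why this does not happen in practice: the page latch is held for the duration of the write, so no mini-transaction can slip in between the LSN check and the write.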


            sayap, sorry, I did not notice your comment. Generally, https://mariadb.zulipchat.com/ would be a better platform for such discussions.

            In MDEV-25948 we actually backtracked a little and removed the log_flush_task that would potentially reduce the amount of calls to log_flush_up_to(). There were several improvements to page flushing performance in MariaDB 10.5.12 and 10.6.4, and our testing in MDEV-25451 is indicating rather stable throughput.

            The block mutex was removed already in MDEV-15053. I suppose that you mean the page latch? I think that we always hold the page latch when writing out a modified page. Before we write it, we will ensure that the FIL_PAGE_LSN is not ahead of the durable position of the write-ahead log. Page writes are generally optional (MDEV-24626 removed the last exception). Only for log checkpoints, we must advance the MIN(oldest_modification) by page writes.
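            The ordering described above can be sketched like this (a hedged Python illustration, not the actual buf0flu.cc code; the dict fields and return values are invented). The latch is taken first, and the write-ahead-log check happens while it is held, so the checked LSN cannot change under us:

            ```python
            def write_out(page, durable_log_lsn):
                """Sketch: take the page latch, then enforce the WAL rule
                before issuing the data-file write."""
                page["latched"] = True           # latch held for the whole write
                try:
                    if page["fil_page_lsn"] > durable_log_lsn:
                        return "wait-for-log"    # flush the redo log, then retry
                    return "written"
                finally:
                    page["latched"] = False

            page = {"fil_page_lsn": 120, "latched": False}
            print(write_out(page, 100))  # wait-for-log: the log must be flushed first
            print(write_out(page, 200))  # written
            ```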

            marko Marko Mäkelä added a comment

            People

              marko Marko Mäkelä
              axel Axel Schwenke