[MDEV-26609] Avoid deriving ELEMENTS_PER_LATCH from cacheline Created: 2021-09-15  Updated: 2021-09-17  Resolved: 2021-09-17

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: 10.6.5, 10.7.0

Type: Task Priority: Minor
Reporter: Krunal Bauskar Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: ARM

Attachments: PNG File 2numa - read-write workload.png     PNG File element-per-latch.png    
Issue Links:
Relates
relates to MDEV-22871 Contention on the buf_pool.page_hash Closed

 Description   
  • The buffer pool has latches that protect access to pages.
  • There is one latch per N pages
      (see page_hash_table for details).
  • N is calculated from the cache line size.
  • For example, if the cache line size is
      : 64 bytes, then 7 page pointers + 1 latch fit on the same cache line
      : 128 bytes, then 15 page pointers + 1 latch fit on the same cache line
  • ARM generally has wider cache lines, so on ARM 1 latch guards
      15 pages, whereas on x86 1 latch guards 7 pages.
      Naturally, contention is higher on ARM.
  • The patch relaxes this contention by limiting the elements
    per cache line to 7 (+ 1 latch slot).
      For a wider cache line (say 128 bytes), the remaining 8 slots are kept empty.
      This ensures that no 2 latches share a cache line, which avoids
    latch-level contention.
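
The slot arithmetic above can be sketched in C++ as follows. This is only an illustration, not the InnoDB code: it assumes 8-byte pointers and a latch that occupies exactly one pointer-sized slot, and the names SLOT_BYTES, FIXED_ELEMENTS_PER_LATCH, elements_from_cacheline and padding_slots are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 8-byte pointers, and the latch fits in one pointer-sized slot.
constexpr std::size_t SLOT_BYTES = 8;

// Old scheme: one slot holds the latch, every other slot in the cache line
// holds a page pointer, so N grows with the cache line size.
constexpr std::size_t elements_from_cacheline(std::size_t cacheline_bytes) {
  return cacheline_bytes / SLOT_BYTES - 1;
}

// New scheme: N is fixed at 7 regardless of the cache line size.
constexpr std::size_t FIXED_ELEMENTS_PER_LATCH = 7;

// Slots left empty in each cache line under the new scheme, so that the
// next latch starts on a cache line of its own.
inline std::size_t padding_slots(std::size_t cacheline_bytes) {
  const std::size_t total_slots = cacheline_bytes / SLOT_BYTES;
  assert(total_slots >= FIXED_ELEMENTS_PER_LATCH + 1);
  return total_slots - (FIXED_ELEMENTS_PER_LATCH + 1);
}

static_assert(elements_from_cacheline(64) == 7, "64-byte line: 7 pointers + 1 latch");
static_assert(elements_from_cacheline(128) == 15, "128-byte line: 15 pointers + 1 latch");
```

Under these assumptions, padding_slots(64) is 0 and padding_slots(128) is 8, matching the description: a 64-byte cache line is laid out as before, while a 128-byte cache line keeps 8 slots empty.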

----------
The patch has shown a performance improvement in the range of 2-5%.



 Comments   
Comment by Krunal Bauskar [ 2021-09-15 ]

Patch submitted through the PR: https://github.com/MariaDB/server/pull/1910

Comment by Marko Mäkelä [ 2021-09-16 ]

I see that the cache line size is 128 bytes also on POWER, and 256 bytes on s390x. I hope that danblack can assess the performance impact on those architectures.

In 10.6, there is also lock_sys_t::hash_table::ELEMENTS_PER_LATCH that we may want to change in the same way.

This change would not affect AMD64 or IA-32 or other systems where the cache line size is 64 bytes.

Comment by Krunal Bauskar [ 2021-09-17 ]

As suggested, I tried extending the patch to lock_sys::hash_table and continued to observe a performance difference. Improvement for read-write is 5-6%. Check the attached graph.
