MariaDB Server / MDEV-30289

mtr_t::m_memo is causing frequent memory heap operations

Details

    Description

      As part of fixing MDEV-29603, the implementation of mtr_t::m_memo was changed to std::vector. According to performance test analysis by wlad, this caused a performance regression because of more frequent heap allocation:

      Total CPU [unit, %]  Self CPU [unit, %]  Module      Function Name
      9928 (7.55%)         9928 (7.55%)        kernelbase  [External Call] WriteFile
      5897 (4.49%)         5897 (4.49%)        ucrtbase    [External Call] _malloc_base
      8604 (6.55%)         4392 (3.34%)        server      int __cdecl MYSQLparse(class THD *)
      6175 (4.70%)         4384 (3.34%)        server      unsigned short * __cdecl rec_get_offsets_func(unsigned char const *,struct dict_index_t const *,unsigned short *,unsigned __int64,unsigned __int64,struct mem_block_info_t * *)
      11572 (8.80%)        3397 (2.58%)        server      bool __cdecl page_cur_search_with_match(struct dtuple_t const *,enum page_cur_mode_t,unsigned __int64 *,unsigned __int64 *,struct page_cur_t *,struct rtr_info *)
      8014 (6.10%)         3048 (2.32%)        server      struct buf_block_t * __cdecl buf_page_get_low(class page_id_t,unsigned __int64,unsigned __int64,struct buf_block_t *,unsigned __int64,struct mtr_t *,enum dberr_t *)
      2852 (2.17%)         2852 (2.17%)        ucrtbase    [External Call] _free_base
      2030 (1.54%)         2030 (1.54%)        kernelbase  [External Call] ReadFile
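
The allocation pattern behind the _malloc_base and _free_base entries can be illustrated with a counting allocator. This is only a sketch, not the mtr_t code: memo_slot, counting_allocator and allocations_for() are hypothetical names, and the exact count depends on the growth policy of the standard library. What it shows is that every short-lived std::vector that receives at least one element performs at least one heap allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Global counter of allocate() calls (hypothetical instrumentation).
static std::size_t g_allocations = 0;

// Minimal allocator that counts how often the container asks for memory.
template <typename T>
struct counting_allocator {
  using value_type = T;
  counting_allocator() = default;
  template <typename U>
  counting_allocator(const counting_allocator<U>&) {}
  T* allocate(std::size_t n) {
    ++g_allocations;
    return static_cast<T*>(::operator new(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <typename T, typename U>
bool operator==(const counting_allocator<T>&, const counting_allocator<U>&) {
  return true;
}
template <typename T, typename U>
bool operator!=(const counting_allocator<T>&, const counting_allocator<U>&) {
  return false;
}

// Hypothetical stand-in for one memo entry: a latched object and its type.
struct memo_slot {
  void* object;
  unsigned type;
};

// Simulates `rounds` short-lived containers that each receive `slots`
// entries, and returns the total number of heap allocations performed.
std::size_t allocations_for(std::size_t rounds, unsigned slots) {
  g_allocations = 0;
  for (std::size_t r = 0; r < rounds; r++) {
    std::vector<memo_slot, counting_allocator<memo_slot>> memo;
    for (unsigned i = 0; i < slots; i++)
      memo.push_back(memo_slot{nullptr, i});
  }  // each vector releases its buffer here
  return g_allocations;
}
```

Calling allocations_for(1000, 4) reports at least 1000 allocations, typically more, because the vector also reallocates while it grows.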
      

      Ideally, we would use a collection that implements something similar to the "small string optimization" of std::string. That is, we should reserve some memory for a small mtr_t::m_memo as part of mtr_t itself. Heap-based allocation should be needed only if too many objects are being latched in the mini-transaction (say, when very deep index trees are being accessed).

      LLVM includes SmallVector, which seems to be exactly what we need. Boost offers a similar container, boost::container::small_vector.
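
To make the idea concrete, here is a minimal sketch of a vector with inline storage for the first N elements and a heap fallback beyond that. It is not the LLVM, Boost or MariaDB implementation: small_vec and memo_slot are hypothetical names, and the sketch is restricted to trivially copyable element types so that growing can use memcpy().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical stand-in for one memo entry: a latched object and its type.
struct memo_slot {
  void* object;
  unsigned type;
};

// Sketch of a small-buffer vector: the first N elements live inside the
// object itself; malloc() is called only when element N+1 is appended.
template <typename T, std::size_t N>
class small_vec {
  static_assert(N > 0, "need at least one inline slot");
  static_assert(std::is_trivially_copyable<T>::value,
                "sketch supports trivially copyable elements only");
  static_assert(alignof(T) <= alignof(std::max_align_t),
                "malloc() would not align T sufficiently");

public:
  small_vec() = default;
  small_vec(const small_vec&) = delete;
  small_vec& operator=(const small_vec&) = delete;
  ~small_vec() { if (heap_) std::free(data_); }

  void push_back(const T& value) {
    const T copy = value;  // value may refer to an element that grow() moves
    if (size_ == capacity_) grow();
    new (data_ + size_) T(copy);
    ++size_;
  }

  T& operator[](std::size_t i) { assert(i < size_); return data_[i]; }
  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }
  bool on_heap() const { return heap_; }
  void clear() { size_ = 0; }  // keeps the current buffer for reuse

private:
  // Doubles the capacity, moving the elements to a heap buffer.
  void grow() {
    const std::size_t new_capacity = capacity_ * 2;
    void* p = std::malloc(new_capacity * sizeof(T));
    if (!p) throw std::bad_alloc();
    std::memcpy(p, data_, size_ * sizeof(T));
    if (heap_) std::free(data_);
    data_ = static_cast<T*>(p);
    capacity_ = new_capacity;
    heap_ = true;
  }

  alignas(T) unsigned char inline_[N * sizeof(T)];  // preallocated slots
  T* data_ = reinterpret_cast<T*>(inline_);
  std::size_t size_ = 0;
  std::size_t capacity_ = N;
  bool heap_ = false;
};
```

With small_vec<memo_slot, 16>, the first 16 push_back() calls touch no heap memory; the 17th triggers a single malloc() and doubles the capacity to 32. The price is a larger object: the inline buffer is paid for even when it stays empty.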

          Activity

            I ran some simple performance tests. Removing the pointer indirection for mtr_t::m_memo will not affect throughput much. Basically, it will only move the malloc() and free() calls from some member functions of mtr_t to the constructor and destructor. Almost every user of mtr_t will also use mtr_t::m_memo.

            Furthermore, I tested a port of llvm::SmallVector with two preallocated sizes.

            connections   baseline  de-pointer  SmallVector(16)  SmallVector(8)
            10            96093.53    95399.75         96537.66        95452.55
            20           143770.51   143196.66        146576.71       142992.60
            30           147982.33   147249.38        150137.63       147663.97

            As we can see, a fixed allocation of 8 elements does not make a difference to the throughput (transactions per second) in this simple benchmark, but 16 does. This was Sysbench oltp_update_index with 8×100,000 rows, on RAM disk, run for 120 seconds for each number of concurrent connections.
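
For a quick reading of the table, the figures can be converted to relative changes against the baseline. The sketch below only redoes the arithmetic on the numbers quoted above; the names row, results and change_pct are made up for the illustration. By this calculation SmallVector(16) ends up roughly 0.5% to 2% above the baseline, while the de-pointer and SmallVector(8) variants stay within about 1% below it.

```cpp
#include <cassert>
#include <cstdio>

// One line of the benchmark table (transactions per second).
struct row {
  int connections;
  double baseline, de_pointer, sv16, sv8;
};

static const row results[] = {
  {10,  96093.53,  95399.75,  96537.66,  95452.55},
  {20, 143770.51, 143196.66, 146576.71, 142992.60},
  {30, 147982.33, 147249.38, 150137.63, 147663.97},
};

// Relative change in percent with respect to the baseline.
double change_pct(double baseline, double value) {
  return (value - baseline) / baseline * 100.0;
}

// Prints one line per connection count with the three relative changes.
void print_changes() {
  for (const row& r : results)
    std::printf("%d connections: de-pointer %+.2f%%, "
                "SmallVector(16) %+.2f%%, SmallVector(8) %+.2f%%\n",
                r.connections,
                change_pct(r.baseline, r.de_pointer),
                change_pct(r.baseline, r.sv16),
                change_pct(r.baseline, r.sv8));
}
```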

            In the refactoring that is needed for fixing MDEV-29835, we should be able to eliminate garbage entries that are currently being stored in mtr_t::m_memo. This should reduce the required preallocation size further. An index B-tree that is 7 levels deep could easily comprise 1000⁷ = 10²¹ pages. The files in this test were only 2304 pages, and the index tree height would be at most 3.
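
The tree-height estimate can be checked with a few lines of arithmetic. This is a rough sketch that takes the fan-out of 1000 pages per level from the figure above and treats it as uniform; pages_for_height() and min_height() are hypothetical helpers, and the 2304 pages are the total file size, not only index pages.

```cpp
#include <cassert>

// Number of pages reachable with `height` levels if every level
// multiplies the page count by `fanout`.
double pages_for_height(double fanout, unsigned height) {
  double pages = 1;
  for (unsigned i = 0; i < height; i++) pages *= fanout;
  return pages;
}

// Smallest height h for which fanout^h covers at least `pages` pages.
unsigned min_height(double pages, double fanout) {
  assert(fanout > 1);
  unsigned h = 0;
  for (double reach = 1; reach < pages; reach *= fanout) h++;
  return h;
}
```

Under these assumptions pages_for_height(1000, 7) gives 10²¹, and min_height(2304, 1000) stays within the bound of 3 stated above.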

            (Comment added by Marko Mäkelä, marko)

            A substantial part of MDEV-29835 was fixed in MDEV-30400. This avoids redundant calls to buf_page_get_low() as well as many redundant entries in mtr_t::m_memo.

            (Comment added by Marko Mäkelä, marko)

            People

              marko Marko Mäkelä
              marko Marko Mäkelä