MDEV-24789 (MariaDB Server): Performance regression after MDEV-24671

Details

    Description

      The fix of MDEV-24671 introduced a serious performance regression that is observable at 32 concurrent connections.

      My current hypothesis, based on some initial investigation, is that the change in sizeof(trx->lock) increased the number of cache misses.
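To make the cache-miss hypothesis concrete, here is a minimal sketch of how a small change in a structure's size can change the number of cache lines it occupies. The structs `lock_state_before` and `lock_state_after`, the helper `cache_lines_spanned()`, and the 64-byte line size are all illustrative assumptions; they do not reflect the actual layout of `trx_lock_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache line size; 64 bytes is common on x86-64 but not universal.
constexpr std::size_t CACHE_LINE = 64;

// Number of cache lines an object of the given size spans,
// assuming it starts on a cache line boundary.
constexpr std::size_t cache_lines_spanned(std::size_t size)
{
  return (size + CACHE_LINE - 1) / CACHE_LINE;
}

// Hypothetical "before" layout: exactly one cache line.
struct lock_state_before
{
  std::uint64_t fields[8];  // 64 bytes
};

// Hypothetical "after" layout: one extra member spills into a second line.
struct lock_state_after
{
  std::uint64_t fields[8];  // 64 bytes
  std::uint64_t extra;      // 8 more bytes
};

static_assert(cache_lines_spanned(sizeof(lock_state_before)) == 1,
              "fits in one cache line");
static_assert(cache_lines_spanned(sizeof(lock_state_after)) == 2,
              "spills into a second cache line");
```

A check like this only shows the footprint of a single object. Whether a larger footprint causes a measurable slowdown depends on the access pattern, which is why the hypothesis still needed to be validated by benchmarking.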

          Activity

            Marko Mäkelä added a comment - Reducing sizeof(trx_lock_t) did not help (I did it anyway), but some refactoring to reduce the hold time of lock_sys.mutex and lock_sys.wait_mutex seems to have done the trick. Thanks to axel for helping with the analysis and validation!
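The general idea of reducing mutex hold time can be sketched as follows. This is not the actual InnoDB change: `wait_registry`, `describe_waiter()`, `register_slow()` and `register_fast()` are hypothetical names, and a plain `std::mutex` stands in for something like `lock_sys.wait_mutex`. The only point is that work which does not touch shared state is moved out of the critical section.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shared registry guarded by a single mutex.
struct wait_registry
{
  std::mutex mutex;
  std::vector<std::string> waiters;
};

// Stand-in for per-transaction work that needs no shared state.
inline std::string describe_waiter(unsigned trx_id)
{
  return "trx " + std::to_string(trx_id);
}

// Before: the mutex is held while the description is being built.
inline void register_slow(wait_registry &reg, unsigned trx_id)
{
  std::lock_guard<std::mutex> guard(reg.mutex);
  reg.waiters.push_back(describe_waiter(trx_id));
}

// After: only the update of the shared vector is protected.
inline void register_fast(wait_registry &reg, unsigned trx_id)
{
  std::string entry = describe_waiter(trx_id);  // no lock held here
  std::lock_guard<std::mutex> guard(reg.mutex);
  reg.waiters.push_back(std::move(entry));
}
```

Both functions leave the registry in the same state; they differ only in how long other threads may be blocked on the mutex. Under high concurrency, that difference is what a refactoring of this kind aims to exploit.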

            Marko Mäkelä added a comment - Some performance regression is still present. The function lock_wait() that was introduced in MDEV-24671 was holding lock_sys.wait_mutex for an unnecessarily long time.

            Marko Mäkelä added a comment - Even after reducing the lock_sys.wait_mutex hold time in lock_wait() to the minimum, some regression is present, and more work needs to be done.

            Marko Mäkelä added a comment - It looks like MDEV-24671 emphasized a pre-existing bottleneck on log_sys.mutex, which we hope to address in MDEV-14425. The remaining performance regression was only observed on RAM disk, not on real storage.

            That said, the latest change that is being tested should reduce contention on lock_sys.latch and lock_sys.wait_mutex to the absolute minimum.

            People

              Marko Mäkelä