MariaDB Server, MDEV-23635

Add notional delay (using existing ut_delay) while spinning for registering reader in rw-locks

Details

    Description

      • Currently, when a latch is being acquired, the code spins without any delay between attempts until the latch is acquired. See the logic in rw_lock_lock_word_decr.
      • Introducing a short delay into this logic, by calling ut_delay before the next iteration, has been shown to have a positive effect on performance.
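The proposal above can be sketched as a compare-and-swap loop that pauses briefly after each failed attempt. This is a minimal illustration, not the actual InnoDB code: `sketch_rw_lock`, `spin_delay`, `lock_word_decr` and the value of `X_LOCK_DECR` are hypothetical stand-ins for `rw_lock_t`, `ut_delay`, `rw_lock_lock_word_decr` and the real constant, whose definitions may differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical value of the lock word when nobody holds the latch.
constexpr int32_t X_LOCK_DECR = 0x20000000;

// Hypothetical stand-in for rw_lock_t: only the lock word is modelled.
struct sketch_rw_lock
{
  std::atomic<int32_t> lock_word{X_LOCK_DECR};
};

// Hypothetical stand-in for ut_delay(): burn a few loop iterations
// without touching shared memory.
inline void spin_delay(unsigned iterations)
{
  for (volatile unsigned i = 0; i < iterations; i = i + 1) {}
}

// Subtract `amount` from the lock word as long as it is above `threshold`.
// Returns true if the decrement was applied, false if the lock word was
// at or below the threshold (the caller would then have to wait).
inline bool lock_word_decr(sketch_rw_lock &lock, int32_t amount,
                           int32_t threshold, unsigned delay)
{
  assert(amount > 0);
  int32_t copy = lock.lock_word.load(std::memory_order_relaxed);
  while (copy > threshold)
  {
    if (lock.lock_word.compare_exchange_strong(copy, copy - amount,
                                               std::memory_order_acquire,
                                               std::memory_order_relaxed))
      return true;
    // Another thread changed the lock word first. Pause briefly before
    // the next attempt instead of retrying immediately, then re-read.
    spin_delay(delay);
    copy = lock.lock_word.load(std::memory_order_relaxed);
  }
  return false;
}
```

A reader would call something like `lock_word_decr(lock, 1, 0, delay)`. A suitable `delay` is not given in this issue and would have to be found by benchmarking.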

          Activity

            Marko Mäkelä (marko) added a comment:

            krunalbauskar, in MDEV-24167 and MDEV-24142 in 10.6, the old rw_lock_t was replaced with srw_lock and ssux_lock. Have you run benchmarks on that code?

            My idea to make S-latch acquisition a simple fetch_add() had to be abandoned, because it caused writer starvation (MDEV-24271). The compare-and-swap loop does look ugly, but I was not able to come up with anything better.
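To make the two acquisition styles mentioned in this comment concrete, here is a minimal sketch of a shared-latch attempt done with an unconditional `fetch_add()` and with a compare-and-swap loop. The lock word layout (a `WRITER` flag in the top bit, a reader count below it) and all function names are assumptions made for illustration. They are not the srw_lock or ssux_lock implementation, and the sketch does not reproduce the starvation scenario of MDEV-24271.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical layout: top bit = writer flag, remaining bits = reader count.
constexpr uint32_t WRITER = 1U << 31;

// Variant A: register the reader unconditionally, then undo the
// registration if a writer turned out to be present.
inline bool s_lock_try_fetch_add(std::atomic<uint32_t> &word)
{
  uint32_t old = word.fetch_add(1, std::memory_order_acquire);
  if (!(old & WRITER))
    return true;
  word.fetch_sub(1, std::memory_order_relaxed);
  return false;
}

// Variant B: compare-and-swap loop. The lock word is only modified
// while the writer flag is observed to be clear.
inline bool s_lock_try_cas(std::atomic<uint32_t> &word)
{
  uint32_t old = word.load(std::memory_order_relaxed);
  while (!(old & WRITER))
  {
    if (word.compare_exchange_weak(old, old + 1,
                                   std::memory_order_acquire,
                                   std::memory_order_relaxed))
      return true;
    // On failure, `old` holds the current value; the loop re-checks it.
  }
  return false;
}

// Release a shared latch taken by either variant.
inline void s_unlock(std::atomic<uint32_t> &word)
{
  uint32_t old = word.fetch_sub(1, std::memory_order_release);
  assert((old & ~WRITER) != 0);  // there must have been a registered reader
  (void) old;
}
```

In variant A every attempt modifies the lock word, even while a writer holds or waits for the latch. Variant B leaves the word untouched in that case, at the cost of a retry loop.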

            Krunal Bauskar (krunalbauskar) added a comment:

            @marko

            Not yet. The first thing to try is a plain 10.6 benchmark, and then to explore further based on the findings.
            The patch may still help older releases such as 10.4, if we plan to consider optimizing them.

            Marko Mäkelä (marko) added a comment:

            I think that it should be fine to add an ARM-specific optimization to older releases with an ARM-specific #ifdef.
            After the lesson of MDEV-23475 (and MDEV-24272), I would be wary of changing anything on AMD64 in GA releases (10.5 or older). It could only be done after some extensive benchmarking, with different CPU microarchitectures. We know that the latency of the PAUSE instruction (which is a critical part of MY_RELAX_CPU or ut_delay on IA-32 and AMD64) has been drastically changed by Intel in Skylake, and possibly after that as well.
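An architecture-guarded back-off of the kind suggested in this comment might look roughly like the sketch below. The names `cpu_relax` and `reader_spin_backoff` are hypothetical, and the use of the `yield` instruction on AArch64 is an assumption for illustration; the actual MY_RELAX_CPU and ut_delay definitions may use different instructions.

```cpp
#include <cassert>

#if defined(__x86_64__) || defined(__i386__)
# include <immintrin.h>  // _mm_pause()
#endif

// Hypothetical per-architecture CPU relax hint.
inline void cpu_relax()
{
#if defined(__x86_64__) || defined(__i386__)
  _mm_pause();  // PAUSE; its latency differs between microarchitectures
#elif defined(__aarch64__)
  __asm__ __volatile__("yield" ::: "memory");  // assumed hint instruction
#endif
}

// Hypothetical back-off used between failed latch attempts. It is only
// active on ARM, so the behaviour on other architectures is unchanged.
// Returns the number of relax hints that were executed.
inline unsigned reader_spin_backoff(unsigned rounds)
{
#if defined(__aarch64__)
  for (unsigned i = 0; i < rounds; i++)
    cpu_relax();
  return rounds;
#else
  (void) rounds;
  return 0;
#endif
}
```

Keeping the whole back-off behind the ARM guard means that AMD64 builds of GA releases compile to the same spin loop as before.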

            Marko Mäkelä (marko) added a comment:

            Somewhat related to this (and possibly improving the 10.6 implementation), https://rigtorp.se/spinlock/ discusses spinlocks (mutexes) and claims that spinning on a read-modify-write instruction is less efficient than spinning on a read instruction. It is not immediately obvious to me how that could be applied to rw-locks, but maybe you could experiment with that, krunalbauskar?
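For readers who have not seen the technique, spinning on a read before attempting the read-modify-write is often called test-and-test-and-set. Below is a minimal sketch for a plain spinlock, not for an rw-lock. The class name `ttas_spinlock` is hypothetical, and `std::this_thread::yield()` is used as a portable placeholder for a CPU relax hint.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical test-and-test-and-set spinlock.
class ttas_spinlock
{
  std::atomic<bool> locked{false};

public:
  void lock() noexcept
  {
    for (;;)
    {
      // Read-modify-write: attempt to take the lock.
      if (!locked.exchange(true, std::memory_order_acquire))
        return;
      // The lock is held: wait using plain loads only, so that waiting
      // threads do not keep issuing read-modify-write operations.
      while (locked.load(std::memory_order_relaxed))
        std::this_thread::yield();  // placeholder for a CPU relax hint
    }
  }

  bool try_lock() noexcept
  {
    // Check with a load first; only then attempt the exchange.
    return !locked.load(std::memory_order_relaxed) &&
           !locked.exchange(true, std::memory_order_acquire);
  }

  void unlock() noexcept
  {
    locked.store(false, std::memory_order_release);
  }
};
```

How to carry this over to a reader-writer lock word, where readers must modify the word to register themselves, is the open question raised in the comment.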

            Krunal Bauskar (krunalbauskar) added a comment:

            @Marko,

            I went over the article. It suggests checking the variable before attempting to write it. Fortunately, our server, like most new-generation systems, uses compare_exchange_strong, thereby simulating the same behavior but more efficiently at the processor level.

            Just to add a note for a wider audience: compare_exchange can be looked upon as 2 ops:
              • compare (if the desired value is present), then try to exchange: a RARE event while the lock is being acquired
              • compare (if the desired value is absent), return immediately: the common scenario leading to the loop
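The two outcomes described in this comment can be observed at the C++ level with a small wrapper around `compare_exchange_strong`. The struct `cas_result` and the function `cas` are hypothetical helpers for illustration. The sketch only shows the language-level semantics (in C++ terms, the value being compared against is called the expected value); it says nothing about what happens to the cache line in hardware when the comparison fails, which depends on the processor.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper type: outcome of one compare-and-exchange attempt.
struct cas_result
{
  bool exchanged;  // true if the stored value was replaced
  int observed;    // value that was found in the atomic variable
};

// Replace the value of `v` with `desired` only if it equals `expected`.
inline cas_result cas(std::atomic<int> &v, int expected, int desired)
{
  // On success the value is replaced and `expected` is left untouched.
  // On failure the value is left untouched and `expected` is overwritten
  // with the value that was actually found.
  bool ok = v.compare_exchange_strong(expected, desired);
  return {ok, expected};
}
```

For example, with `v` holding 5, `cas(v, 5, 7)` exchanges and leaves 7 in `v`. A following `cas(v, 5, 9)` does not exchange, reports 7 as the observed value, and leaves `v` unchanged.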

            People

              Assignee: Unassigned
              Reporter: Krunal Bauskar (krunalbauskar)