Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14374

UT_DELAY code : Removing hardware barrier for arm64 bit platform

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 10.3.2
    • 10.3.3
    • Server
    • Ubuntu 16.02 on Arm 64 bit platform

    Description

      would like to propose a change in the existing spin loop code in the function ut_delay () .Currently if the threads are not able to get the lock they spin for a while before going to sleep.
      The code is given below

      for (i = 0; i < delay * 50; i++) {
                              UT_RELAX_CPU(); Hardware barrier 
                              UT_COMPILER_BARRIER();Compiler barrier
                  }
      

      With default innodb_spin_wait_delay =6 the code loop for 300 times on a hardware barrier followed by a compiler barrier. I can understand compiler barrier so that function does not get optimized and it gives a small delay.

      # define UT_RELAX_CPU() do { \
           volatile int32      volatile_var; \
           int32 oldval= 0; \
           my_atomic_cas32(&volatile_var, &oldval, 1); \
         } while (0)
      #endif
       
      #if defined (__GNUC__)
      # define UT_COMPILER_BARRIER() __asm__ __volatile__ ("":::"memory").
      

      As Arm does not have Pause instruction Its not a wise idea to loop on a hardware barrier as well for increasing delay .We can get the same delay if we spin for around 200 times instead of default 6. Currently I don’t see any performance bottleneck with the existing code but in future if there is a good improvement in contention code the hardware barrier can cause unnecessary delay ( as barrier are supposed to be slow).

      In MYSQL/Percona we have only compiler barrier in place instead of hardware barrier. I did few testing using sysbench update_index and update_non index I see the performance
      Of default delay=6 with hardware barrier + compiler barrier is almost similar to delay=200 + compiler barrier.

      Note: If we use only compiler barrier with default delay=6 I see huge scalability issue from 64 threads onwards. It could be slowness of our contention code or missing equivalent pause instruction on arm platform.

      As the same performance can be achieved with a configuration change I would like to propose the removal of hardware barrier in a spin loop code.

      Please share your thoughts and request you to test the same .

      I see lot of discussion already on the community in this issue but for POWER PC.

      Attachments

        Issue Links

          Activity

            Sergey,

            Tried both the patch and its does not work well on Arm platform. I see tps drop by 40% for 64,128 and 256 threads in sysbench oltp_update_index.lua. test . I did a test for 20 tables with 100000 records in each tables.

            Can share more details but it the same setup which I sent an email to for tpds drop issue.

            ssethia Sandeep sethia added a comment - Sergey, Tried both the patch and its does not work well on Arm platform. I see tps drop by 40% for 64,128 and 256 threads in sysbench oltp_update_index.lua. test . I did a test for 20 tables with 100000 records in each tables. Can share more details but it the same setup which I sent an email to for tpds drop issue.

            ssethia, thanks for looking into this. I guess it may happen because InnoDB rwlocks need similar change. Now I have access to ARM hardware and can do some testing myself.

            svoj Sergey Vojtovich added a comment - ssethia , thanks for looking into this. I guess it may happen because InnoDB rwlocks need similar change. Now I have access to ARM hardware and can do some testing myself.

            Can you push my changes for the barrier as your patch may require lot of testing.

            ssethia Sandeep sethia added a comment - Can you push my changes for the barrier as your patch may require lot of testing.

            ssethia, two things:
            1. Either you or me should modify this patch as Marko suggested in PR. If you take it, please also mention in PR comment that you contribute this patch under terms of BSD-new license.
            2. Could you confirm it didn't cause performance regression with default innodb-spin-wait-delay (which is 6)?

            svoj Sergey Vojtovich added a comment - ssethia , two things: 1. Either you or me should modify this patch as Marko suggested in PR. If you take it, please also mention in PR comment that you contribute this patch under terms of BSD-new license. 2. Could you confirm it didn't cause performance regression with default innodb-spin-wait-delay (which is 6)?

            Verified that it didn't cause regressions on ARM with default settings.
            Applied Marko's version of the patch.

            svoj Sergey Vojtovich added a comment - Verified that it didn't cause regressions on ARM with default settings. Applied Marko's version of the patch.

            People

              svoj Sergey Vojtovich
              ssethia Sandeep sethia
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.