Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14374

UT_DELAY code : Removing hardware barrier for arm64 bit platform

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.3.2
    • Fix Version/s: 10.3.3
    • Component/s: Server
    • Labels:
    • Environment:
      Ubuntu 16.02 on Arm 64 bit platform

      Description

      would like to propose a change in the existing spin loop code in the function ut_delay () .Currently if the threads are not able to get the lock they spin for a while before going to sleep.
      The code is given below

      for (i = 0; i < delay * 50; i++) {
                              UT_RELAX_CPU(); Hardware barrier 
                              UT_COMPILER_BARRIER();Compiler barrier
                  }
      

      With default innodb_spin_wait_delay =6 the code loop for 300 times on a hardware barrier followed by a compiler barrier. I can understand compiler barrier so that function does not get optimized and it gives a small delay.

      # define UT_RELAX_CPU() do { \
           volatile int32      volatile_var; \
           int32 oldval= 0; \
           my_atomic_cas32(&volatile_var, &oldval, 1); \
         } while (0)
      #endif
       
      #if defined (__GNUC__)
      # define UT_COMPILER_BARRIER() __asm__ __volatile__ ("":::"memory").
      

      As Arm does not have Pause instruction Its not a wise idea to loop on a hardware barrier as well for increasing delay .We can get the same delay if we spin for around 200 times instead of default 6. Currently I don’t see any performance bottleneck with the existing code but in future if there is a good improvement in contention code the hardware barrier can cause unnecessary delay ( as barrier are supposed to be slow).

      In MYSQL/Percona we have only compiler barrier in place instead of hardware barrier. I did few testing using sysbench update_index and update_non index I see the performance
      Of default delay=6 with hardware barrier + compiler barrier is almost similar to delay=200 + compiler barrier.

      Note: If we use only compiler barrier with default delay=6 I see huge scalability issue from 64 threads onwards. It could be slowness of our contention code or missing equivalent pause instruction on arm platform.

      As the same performance can be achieved with a configuration change I would like to propose the removal of hardware barrier in a spin loop code.

      Please share your thoughts and request you to test the same .

      I see lot of discussion already on the community in this issue but for POWER PC.

        Attachments

          Activity

            People

            • Assignee:
              svoj Sergey Vojtovich
              Reporter:
              ssethia Sandeep sethia
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: