MariaDB Server / MDEV-21816

Suboptimal implementation of my_convert() for ARM64

Details

    Description

      my_convert() in strings/ctype.c has this special optimization for i386 and x86_64:

      #if defined(__i386__) || defined(__x86_64__)
        /*
          Special loop for i386, it allows to refer to a
          non-aligned memory block as UINT32, which makes
          it possible to copy four bytes at once. This
          gives about 10% performance improvement comparing
          to byte-by-byte loop.
        */
        for ( ; length >= 4; length-= 4, from+= 4, to+= 4)
        {
          if ((*(uint32*)from) & 0x80808080)
            break;
          *((uint32*) to)= *((const uint32*) from);
        }
      #endif /* __i386__ */
       
      ... /* Unoptimized bytewise processing goes here */
      

      Two things can be improved about that code:

      1. 64-bit architectures like x86_64 could be optimized even further by processing 8 bytes at a time.
      2. Other 64-bit architectures like aarch64 could also benefit from the same optimization, rather than processing the input byte by byte.

      In our tests, this gives an improvement of a few percent in a CPU-bound sysbench OLTP read-only (RO) workload on AArch64, which is not too bad for such a simple optimization.
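
The two improvements listed above could be sketched roughly as follows. This is only an illustration, not the actual patch: the helper name copy_ascii_prefix() is hypothetical, and using memcpy() for the 8-byte loads and stores is an assumption made here so that the code does not depend on the target allowing unaligned pointer dereferences. The helper copies whole 8-byte chunks as long as every byte in the chunk is 7-bit ASCII and returns how many bytes it handled, so the existing bytewise loop can continue from that offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
  Hypothetical helper (not part of strings/ctype.c): copy whole
  8-byte chunks from 'from' to 'to' while every byte in the chunk
  is 7-bit ASCII. Returns the number of bytes copied; the caller
  is expected to process the rest with the regular bytewise loop.
*/
static size_t copy_ascii_prefix(char *to, const char *from, size_t length)
{
  size_t done= 0;
  for ( ; length - done >= 8; done+= 8)
  {
    uint64_t chunk;
    /* memcpy() keeps the load valid for any alignment */
    memcpy(&chunk, from + done, sizeof(chunk));
    /* A set high bit in any byte means a non-ASCII byte: stop here */
    if (chunk & UINT64_C(0x8080808080808080))
      break;
    memcpy(to + done, &chunk, sizeof(chunk));
  }
  return done;
}
```

Compilers typically turn a fixed-size 8-byte memcpy() into a single load or store on both x86_64 and AArch64, but that is worth checking in the generated code for the compilers actually in use.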

          People

            Assignee: Unassigned
            Reporter: Alexey Kopytov (kaamos)