Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22720

Improving performance of my_hash_sort_utf8(mb4)/my_strnncollsp*utf8(mb4)

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Won't Fix
    • 10.5.3
    • N/A
    • Server

    Description

      According to our performance investigations, these functions are hot when we
      use utf8 collation:
      1. my_hash_sort_utf8
      2. my_hash_sort_utf8mb4
      3. my_strnncollsp_utf8
      4. my_strnncollsp_utf8mb4

      Since the commit https://github.com/mysql/mysql-server/commit/67ce24796584
      from 2004 skipping trailing spaces in my_hash_sort_utf8 has been implemented
      via naive chopping spaces one after another from the end of the string:

      while (e > s && e[-1] == ' ')
          e--;
      

      We know that for CHAR(XXX) utf8 fields all unused space is filled with spaces.
      So, using brand new skip_trailing_space for my_hash_sort_utf8 function
      significantly increase performance here. See for a reference:
      https://github.com/mysql/mysql-server/commit/4fd18025f46e

      The performance increase achives by skipping not 1 but 8 spaces for every
      CPU cycle.

      For my_strnncollsp_utf8 function these approach can't be applied as is.
      However, we proved, than aggressive forward space skipping still improves
      performance with no degradation in all other cases. Following this idea,
      we introduce additional function for forward space skipping:

      /**
        Simultaneously skip space for two strings (ASCII spaces only).
        Small special routine function for my_strnncollsp_utf8(mb4) functions
      */
      static inline void skip_space(const uchar **sp, const uchar **tp,
                                    const uchar *const se, const uchar *const te) {
        while (*sp + 8 < se && *tp + 8 < te) {
          uint64_t s, t;
          memcpy(&s, *sp, 8);
          memcpy(&t, *tp, 8);
          if (s != 0x2020202020202020ULL || t != 0x2020202020202020ULL) break;
       
          *sp += 8;
          *tp += 8;
        }
        while (*sp < se && *tp < te && **sp == 0x20 && **tp == 0x20) {
          ++*sp;
          ++*tp;
        }
      }
      

      After that every time when we run into two simultaneous spaces while comparing
      two strings in my_strnncollsp_utf8 function, we invoke skip_space instead of
      slow utf8 processing code:

      while (s < se && t < te) {
        /* aggressive space skipping improves performance */
        if (*s == ' ' && *t == ' ') {
          skip_space(&s, &t, se, te);
          continue;
        }
        ... // utf8 processing
      }
      

      Using this simple approach we achived significant performance improvement in
      the case when two strings are equal (more than 7 times faster for CHAR(120)).

      If we speak about general database performance, we improve single-threaded
      sysbench OLTP RO by up to 2.5%.

      Suggested fix:
      1. For my_hash_sort_utf8(mb4) - use skip_trailing_space as is.
      2. For my_strnncollsp_utf8(mb4) - use aggressive forward space skipping right
      in the main loop which processes utf8 characters one by one.

      Attachments

        Issue Links

          Activity

            bar Alexander Barkov added a comment - - edited

            dmitriy.philimonov, thanks for you feedback.

            I've created a separate issue MDEV-22849 for the my_hash_sort_utf8xxx related improvement and pushed it to 10.5. The coming soon 10.5.4 release will including this change.

            For the my_strnncollsp_utf8 change, is there a chance to get your patch for unittest/gunit?
            We'd like to run various benchmarks with this change.
            As it is useful mostly for the non-default sql_mode=PAD_CHAR_TO_FULL_LENGTH, we need to make sure that this change does not introduce performance degradation for the default sql_mode, when PAD_CHAR_TO_FULL_LENGTH is not enabled.

            Thanks for your contribution!

            bar Alexander Barkov added a comment - - edited dmitriy.philimonov , thanks for you feedback. I've created a separate issue MDEV-22849 for the my_hash_sort_utf8xxx related improvement and pushed it to 10.5. The coming soon 10.5.4 release will including this change. For the my_strnncollsp_utf8 change, is there a chance to get your patch for unittest/gunit? We'd like to run various benchmarks with this change. As it is useful mostly for the non-default sql_mode=PAD_CHAR_TO_FULL_LENGTH, we need to make sure that this change does not introduce performance degradation for the default sql_mode, when PAD_CHAR_TO_FULL_LENGTH is not enabled. Thanks for your contribution!

            Dear Alexander,

            I'm glad that I was allowed to share the benchmark code with you. It uses unittest/gunit mini-benchmark engine from gtest. If somehow you don't have this engine, it's quite simple to write something similar based on std::chrono:

            #define BENCHMARK(func) \
              TEST(Microbenchmarks, func) { internal_do_microbenchmark(#func, func); }
             
            void internal_do_microbenchmark(const char *name, void (*func)(size_t)) {
            //calibration phase
              func(calibration_iterations);
            //estimation phase
              func(num_iterations);
            //print time spent in the func
            }
            

            The benchmark emulates CHAR(120) field. The field is filled with English/Russian/Chinese letters (1/2/3 utf8 bytes per character). The test consists of the cases where the field is filled fully/partially + the first non-equal letter is placed in the beginning/middle/end, or the strings are fully equal.

            I hope this will help. Put the code in unittest/gunit directory, resolve dependencies, estimate the results.

            #include "benchmark.h"
            namespace NMiniBenchmark {
              static size_t bytesSize = 360;
              static size_t charsSize = 120;
              static std::string GenerateStringEnglish(){
                std::string s(bytesSize,' ');
                for(size_t i=0;i<charsSize;++i) s[i]='A';         // 1 byte
                return s;
              }
              static std::string GenerateStringEnglishQuater(){
                std::string s(bytesSize,' '); //360 bytes
                // only 30 out of 120 characters are filled
                for(size_t i=0;i<charsSize/4;++i) s[i]='A';
                return s;
              }
              static std::string GenerateStringEnglish40(){
                std::string s = GenerateStringEnglish();
                s[40]='B'; return s;
              }
              static std::string GenerateStringEnglishZ(){
                std::string s(bytesSize,' ');
                for(size_t i=0;i<charsSize;++i) s[i]='Z';         // 1 byte
                return s;
              }
              static std::string GenerateStringRussian(){
                std::string s(bytesSize,' '); std::string ya="Б"; // 2 bytes
                for(size_t i=0,j=0;i<charsSize;++i,j+=2) memcpy(&s[j], ya.data(), ya.size());
                return s;
              }
              static std::string GenerateStringRussian40(){
                std::string s = GenerateStringRussian(); std::string ya="Б"; // 2 bytes
                memcpy(&s[80], ya.data(), ya.size()); return s;
              }
              static std::string GenerateStringRussianZ(){
                std::string s(bytesSize,' '); std::string ya="Я"; // 2 bytes
                for(size_t i=0,j=0;i<charsSize;++i,j+=2) memcpy(&s[j], ya.data(), ya.size());
                return s;
              }
              static std::string GenerateStringChinese(){
                std::string s(bytesSize,' '); std::string ya="我"; // 3 bytes
                for(size_t i=0,j=0;i<charsSize;++i,j+=3) memcpy(&s[j], ya.data(), ya.size());
                return s;
              }
              static std::string GenerateStringChinese40(){
                std::string s = GenerateStringChinese(); std::string ya="是"; // 3 bytes
                memcpy(&s[120],ya.data(),ya.size()); return s;
              }
             
              static std::string GenerateStringChineseZ(){
                std::string s(bytesSize,' '); std::string ya="是"; // 3 bytes
                for(size_t i=0,j=0;i<charsSize;++i,j+=3) memcpy(&s[j], ya.data(), ya.size());
                return s;
              }
             
              static std::string _English = GenerateStringEnglish();
              static std::string _English40 = GenerateStringEnglish40();
              static std::string _EnglishQ = GenerateStringEnglishQuater();
              static std::string _Russian = GenerateStringRussian();
              static std::string _Russian40 = GenerateStringRussian40();
              static std::string _Chinese = GenerateStringChinese();
              static std::string _Chinese40 = GenerateStringChinese40();
              static std::string _EnglishZ = GenerateStringEnglishZ();
              static std::string _RussianZ = GenerateStringRussianZ();
              static std::string _ChineseZ = GenerateStringChineseZ();
             
              TEST(SkipTrailingSpaceImprovement,CHECK_SIZES){
                std::string yarus="Я";    // 2 bytes
                std::string yachina="我"; // 3 bytes
                ASSERT_EQ(yarus.size(),2u);
                ASSERT_EQ(yachina.size(),3u);
                ASSERT_EQ(skip_trailing_space((const uchar*)_Russian.data(),_Russian.size())-(const uchar*)_Russian.data(), 240u);
                ASSERT_EQ(skip_trailing_space((const uchar*)_Chinese.data(),_Chinese.size())-(const uchar*)_Chinese.data(), 360u);
              }
             
              #define STRNNCOLLSP(i1,i2) \
              void STRNNCOLLSP_##i1 ##i2(size_t iterations){ \
                for(size_t i=0;i<iterations;++i) \
                  my_charset_utf8_general_ci.coll->strnncollsp(&my_charset_utf8_general_ci, \
                    (const uchar*)i1.c_str(),i1.size(), \
                    (const uchar*)i2.c_str(),i2.size(), \
                    0); \
              } \
              BENCHMARK(STRNNCOLLSP_##i1 ##i2)
             
             
              STRNNCOLLSP(_EnglishQ,_EnglishQ)
              STRNNCOLLSP(_English,_English)
              STRNNCOLLSP(_English,_English40)
              STRNNCOLLSP(_English,_EnglishZ)
              STRNNCOLLSP(_Russian,_Russian)
              STRNNCOLLSP(_Russian,_Russian40)
              STRNNCOLLSP(_Russian,_RussianZ)
              STRNNCOLLSP(_Chinese,_Chinese)
              STRNNCOLLSP(_Chinese,_Chinese40)
              STRNNCOLLSP(_Chinese,_ChineseZ)
            }
            

            Sincerely yours,
            Dmitriy Philimonov

            dmitriy.philimonov Dmitriy Philimonov added a comment - Dear Alexander, I'm glad that I was allowed to share the benchmark code with you. It uses unittest/gunit mini-benchmark engine from gtest. If somehow you don't have this engine, it's quite simple to write something similar based on std::chrono: #define BENCHMARK(func) \ TEST(Microbenchmarks, func) { internal_do_microbenchmark(#func, func); }   void internal_do_microbenchmark( const char *name, void (*func)( size_t )) { //calibration phase func(calibration_iterations); //estimation phase func(num_iterations); //print time spent in the func } The benchmark emulates CHAR(120) field. The field is filled with English/Russian/Chinese letters (1/2/3 utf8 bytes per character). The test consists of the cases where the field is filled fully/partially + the first non-equal letter is placed in the beginning/middle/end, or the strings are fully equal. I hope this will help. Put the code in unittest/gunit directory, resolve dependencies, estimate the results. #include "benchmark.h" namespace NMiniBenchmark { static size_t bytesSize = 360; static size_t charsSize = 120; static std::string GenerateStringEnglish(){ std::string s(bytesSize, ' ' ); for ( size_t i=0;i<charsSize;++i) s[i]= 'A' ; // 1 byte return s; } static std::string GenerateStringEnglishQuater(){ std::string s(bytesSize, ' ' ); //360 bytes // only 30 out of 120 characters are filled for ( size_t i=0;i<charsSize/4;++i) s[i]= 'A' ; return s; } static std::string GenerateStringEnglish40(){ std::string s = GenerateStringEnglish(); s[40]= 'B' ; return s; } static std::string GenerateStringEnglishZ(){ std::string s(bytesSize, ' ' ); for ( size_t i=0;i<charsSize;++i) s[i]= 'Z' ; // 1 byte return s; } static std::string GenerateStringRussian(){ std::string s(bytesSize, ' ' ); std::string ya= "Б" ; // 2 bytes for ( size_t i=0,j=0;i<charsSize;++i,j+=2) memcpy (&s[j], ya.data(), ya.size()); return s; } static std::string GenerateStringRussian40(){ std::string s = GenerateStringRussian(); std::string ya= "Б" ; // 2 bytes memcpy (&s[80], ya.data(), ya.size()); return s; } static std::string GenerateStringRussianZ(){ std::string s(bytesSize, ' ' ); std::string ya= "Я" ; // 2 bytes for ( size_t i=0,j=0;i<charsSize;++i,j+=2) memcpy (&s[j], ya.data(), ya.size()); return s; } static std::string GenerateStringChinese(){ std::string s(bytesSize, ' ' ); std::string ya= "我" ; // 3 bytes for ( size_t i=0,j=0;i<charsSize;++i,j+=3) memcpy (&s[j], ya.data(), ya.size()); return s; } static std::string GenerateStringChinese40(){ std::string s = GenerateStringChinese(); std::string ya= "是" ; // 3 bytes memcpy (&s[120],ya.data(),ya.size()); return s; }   static std::string GenerateStringChineseZ(){ std::string s(bytesSize, ' ' ); std::string ya= "是" ; // 3 bytes for ( size_t i=0,j=0;i<charsSize;++i,j+=3) memcpy (&s[j], ya.data(), ya.size()); return s; }   static std::string _English = GenerateStringEnglish(); static std::string _English40 = GenerateStringEnglish40(); static std::string _EnglishQ = GenerateStringEnglishQuater(); static std::string _Russian = GenerateStringRussian(); static std::string _Russian40 = GenerateStringRussian40(); static std::string _Chinese = GenerateStringChinese(); static std::string _Chinese40 = GenerateStringChinese40(); static std::string _EnglishZ = GenerateStringEnglishZ(); static std::string _RussianZ = GenerateStringRussianZ(); static std::string _ChineseZ = GenerateStringChineseZ();   TEST(SkipTrailingSpaceImprovement,CHECK_SIZES){ std::string yarus= "Я" ; // 2 bytes std::string yachina= "我" ; // 3 bytes ASSERT_EQ(yarus.size(),2u); ASSERT_EQ(yachina.size(),3u); ASSERT_EQ(skip_trailing_space(( const uchar*)_Russian.data(),_Russian.size())-( const uchar*)_Russian.data(), 240u); ASSERT_EQ(skip_trailing_space(( const uchar*)_Chinese.data(),_Chinese.size())-( const uchar*)_Chinese.data(), 360u); }   #define STRNNCOLLSP(i1,i2) \ void STRNNCOLLSP_##i1 ##i2( size_t iterations){ \ for ( size_t i=0;i<iterations;++i) \ my_charset_utf8_general_ci.coll->strnncollsp(&my_charset_utf8_general_ci, \ ( const uchar*)i1.c_str(),i1.size(), \ ( const uchar*)i2.c_str(),i2.size(), \ 0); \ } \ BENCHMARK(STRNNCOLLSP_##i1 ##i2)     STRNNCOLLSP(_EnglishQ,_EnglishQ) STRNNCOLLSP(_English,_English) STRNNCOLLSP(_English,_English40) STRNNCOLLSP(_English,_EnglishZ) STRNNCOLLSP(_Russian,_Russian) STRNNCOLLSP(_Russian,_Russian40) STRNNCOLLSP(_Russian,_RussianZ) STRNNCOLLSP(_Chinese,_Chinese) STRNNCOLLSP(_Chinese,_Chinese40) STRNNCOLLSP(_Chinese,_ChineseZ) } Sincerely yours, Dmitriy Philimonov

            Automated message:
            ----------------------------
            Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

            julien.fritsch Julien Fritsch added a comment - Automated message: ---------------------------- Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.
            JIraAutomate JiraAutomate added a comment -

            Automated message:
            ----------------------------
            Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

            JIraAutomate JiraAutomate added a comment - Automated message: ---------------------------- Since this issue has not been updated since 6 weeks, it's time to move it back to Stalled.

            After implementing MDEV-26572 in 10.7, these changes are not needed any more.

            bar Alexander Barkov added a comment - After implementing MDEV-26572 in 10.7, these changes are not needed any more.

            People

              bar Alexander Barkov
              dmitriy.philimonov Dmitriy Philimonov
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.