[MDEV-22849] Reuse skip_trailing_space() in my_hash_sort_utf8mbX Created: 2020-06-10  Updated: 2020-06-10  Resolved: 2020-06-10

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: 10.5.4, 10.2.33, 10.3.24, 10.4.14

Type: Task Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-22720 Improving performance of my_hash_sort... Stalled

 Description   

Replace the slow byte-by-byte loop in my_hash_sort_utf8mbX() with the faster skip_trailing_space(), which consumes 8 bytes per iteration, as follows:

diff --git a/strings/ctype-utf8.c b/strings/ctype-utf8.c
index b8e71b1f7a9..7434f968383 100644
--- a/strings/ctype-utf8.c
+++ b/strings/ctype-utf8.c
@@ -4992,13 +4992,11 @@ static void my_hash_sort_utf8mb3_nopad(CHARSET_INFO *cs, const uchar *s, size_t
 static void my_hash_sort_utf8mb3(CHARSET_INFO *cs, const uchar *s, size_t slen,
                                  ulong *nr1, ulong *nr2)
 {
-  const uchar *e= s+slen;
   /*
     Remove end space. We have to do this to be able to compare
     'A ' and 'A' as identical
   */
-  while (e > s && e[-1] == ' ')
-    e--;
+  const uchar *e= skip_trailing_space(s, slen);
   my_hash_sort_utf8mb3_nopad(cs, s, e - s, nr1, nr2);
 }
 
@@ -7436,13 +7434,11 @@ static void
 my_hash_sort_utf8mb4(CHARSET_INFO *cs, const uchar *s, size_t slen,
                      ulong *nr1, ulong *nr2)
 {
-  const uchar *e= s + slen;
   /*
     Remove end space. We do this to be able to compare
     'A ' and 'A' as identical
   */
-  while (e > s && e[-1] == ' ')
-    e--;
+  const uchar *e= skip_trailing_space(s, slen);
   my_hash_sort_utf8mb4_nopad(cs, s, e - s, nr1, nr2);
 }
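
The speedup comes from comparing whole machine words instead of single bytes. Below is a minimal sketch of the idea, assuming 8-byte words. The names `skip_trailing_space_naive()`, `skip_trailing_space_sketch()` and `check_same()` are invented for this illustration. This is not MariaDB's actual skip_trailing_space(), which may handle alignment and short inputs differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned char uchar;

/* The byte-at-a-time loop that the patch removes. */
static const uchar *skip_trailing_space_naive(const uchar *s, size_t len)
{
  const uchar *e = s + len;
  while (e > s && e[-1] == ' ')
    e--;
  return e;
}

/*
  Hypothetical word-at-a-time variant (illustration only): strips
  8 bytes per iteration while the last 8 bytes are all spaces.
*/
static const uchar *skip_trailing_space_sketch(const uchar *s, size_t len)
{
  static const uint64_t SPACES8 = 0x2020202020202020ULL; /* eight ' ' */
  const uchar *e = s + len;

  while ((size_t)(e - s) >= 8)
  {
    uint64_t word;
    memcpy(&word, e - 8, 8); /* memcpy keeps the load safe on any alignment */
    if (word != SPACES8)
      break;                 /* this word contains a non-space byte */
    e -= 8;
  }
  /* Strip leftovers: fewer than 8 bytes remain, or the last word
     ended with some spaces before a non-space byte. */
  while (e > s && e[-1] == ' ')
    e--;
  return e;
}

/* Hypothetical helper: both variants must return the same end pointer. */
static void check_same(const char *str)
{
  const uchar *p = (const uchar *)str;
  size_t len = strlen(str);
  assert(skip_trailing_space_naive(p, len) == skip_trailing_space_sketch(p, len));
}
```

Because the word contains only 0x20 bytes, the comparison against `SPACES8` is independent of byte order. The word loop only reduces the iteration count on long runs of trailing spaces; the result is identical to the original loop.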

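Both patched functions follow the same pattern: trim the trailing spaces, then hash the remainder with the NO PAD hasher, so that 'A ' and 'A' hash identically. The sketch below illustrates this with a hypothetical byte-level FNV-1a hash standing in for my_hash_sort_utf8mb3_nopad(). The real function hashes collation weights and updates `nr1`/`nr2` rather than returning a value. `trim_trailing_spaces()`, `hash_nopad()` and `hash_pad()` are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef unsigned char uchar;

/* Simple trailing-space trim, kept byte-wise for self-containment. */
static const uchar *trim_trailing_spaces(const uchar *s, size_t len)
{
  const uchar *e = s + len;
  while (e > s && e[-1] == ' ')
    e--;
  return e;
}

/* Hypothetical NO PAD hash: 32-bit FNV-1a over the raw bytes. */
static uint32_t hash_nopad(const uchar *s, size_t len)
{
  uint32_t h = 2166136261u;      /* FNV offset basis */
  for (size_t i = 0; i < len; i++)
  {
    h ^= s[i];
    h *= 16777619u;              /* FNV prime */
  }
  return h;
}

/* PAD SPACE hash: trim, then delegate, mirroring the patched
   my_hash_sort_utf8mb3()/my_hash_sort_utf8mb4() structure. */
static uint32_t hash_pad(const uchar *s, size_t len)
{
  const uchar *e = trim_trailing_spaces(s, len);
  return hash_nopad(s, (size_t)(e - s));
}
```

With this split, only the trimming step differs between the PAD SPACE and NO PAD variants. That is why swapping in a faster trim helper leaves the hash values unchanged.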

Generated at Thu Feb 08 09:17:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.