Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Won't Fix
Version: 10.5.3
Description
According to our performance investigations, the following functions are hot
when a utf8 collation is used:
1. my_hash_sort_utf8
2. my_hash_sort_utf8mb4
3. my_strnncollsp_utf8
4. my_strnncollsp_utf8mb4
Since commit https://github.com/mysql/mysql-server/commit/67ce24796584
(2004), my_hash_sort_utf8 has skipped trailing spaces naively, chopping them
off the end of the string one at a time:
    while (e > s && e[-1] == ' ')
      e--;
For CHAR(N) utf8 columns, all unused space is padded with spaces. Using the
new skip_trailing_space() function in my_hash_sort_utf8 therefore increases
performance significantly here. For reference, see:
https://github.com/mysql/mysql-server/commit/4fd18025f46e
The speedup comes from skipping 8 spaces per loop iteration instead of 1.
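The following minimal sketch contrasts the two approaches. It assumes a target where 8-byte loads through memcpy are cheap. The names skip_trailing_naive, skip_trailing_8 and hash_pad_space are hypothetical, and this is not the server's skip_trailing_space().

Trailing spaces must be excluded from the hash because PAD SPACE comparison treats 'abc' and 'abc   ' as equal, so they must hash equally. The hash itself is plain 64-bit FNV-1a over raw bytes, chosen only for illustration. The real my_hash_sort_utf8 mixes collation weights instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned char uchar; /* stand-in for the server's uchar typedef */

/* Naive trailing-space skip: one byte per iteration (the 2004 loop). */
static const uchar *skip_trailing_naive(const uchar *s, const uchar *e) {
  while (e > s && e[-1] == ' ')
    e--;
  return e;
}

/* Word-at-a-time trailing-space skip (hypothetical helper): compare
   8 bytes per iteration against a word of eight spaces, then finish
   the remainder byte by byte. */
static const uchar *skip_trailing_8(const uchar *s, const uchar *e) {
  const uint64_t spaces = 0x2020202020202020ULL; /* eight ' ' bytes */
  while ((size_t)(e - s) >= 8) {
    uint64_t w;
    memcpy(&w, e - 8, 8); /* memcpy: alignment-safe 8-byte load */
    if (w != spaces)
      break;
    e -= 8;
  }
  return skip_trailing_naive(s, e); /* at most 7 spaces remain to skip */
}

/* Hypothetical PAD SPACE hash: trim trailing spaces, then hash the
   remaining bytes with 64-bit FNV-1a (illustration only). */
static uint64_t hash_pad_space(const uchar *s, size_t len) {
  const uchar *e = skip_trailing_8(s, s + len);
  uint64_t h = 14695981039346656037ULL; /* FNV-1a offset basis */
  for (; s < e; s++) {
    h ^= *s;
    h *= 1099511628211ULL; /* FNV-1a prime */
  }
  return h;
}
```

The 8-byte comparison does not depend on byte order, because every byte of the constant is 0x20. For a CHAR(120) value holding a short string, the word loop does roughly one eighth of the iterations of the naive loop.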
This approach cannot be applied as is to my_strnncollsp_utf8. However, we
showed that aggressive forward space skipping still improves performance
there, with no degradation in any of the other cases. Following this idea,
we introduce an additional function for forward space skipping:
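    #include <stdint.h> /* uint64_t */
    #include <string.h> /* memcpy */

    /**
      Simultaneously skip spaces in two strings (ASCII spaces only).

      Small special-purpose helper for the my_strnncollsp_utf8(mb4) functions.
      uchar is the server's typedef for unsigned char.
    */
    static inline void skip_space(const uchar **sp, const uchar **tp,
                                  const uchar *const se, const uchar *const te) {
      /* Fast path: advance 8 bytes at a time while both strings hold eight
         spaces. Bounds are checked as differences so that no pointer is
         formed past the end of either buffer. */
      while (se - *sp >= 8 && te - *tp >= 8) {
        uint64_t s, t;
        memcpy(&s, *sp, 8); /* alignment-safe 8-byte loads */
        memcpy(&t, *tp, 8);
        if (s != 0x2020202020202020ULL || t != 0x2020202020202020ULL) break;
        *sp += 8;
        *tp += 8;
      }
      /* Slow path: finish the remaining spaces byte by byte. */
      while (*sp < se && *tp < te && **sp == 0x20 && **tp == 0x20) {
        ++*sp;
        ++*tp;
      }
    }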
|
Then, whenever my_strnncollsp_utf8 encounters a space at the same position in
both strings being compared, it calls skip_space instead of running the slow
utf8 processing code:
    while (s < se && t < te) {
      /* aggressive space skipping improves performance */
      if (*s == ' ' && *t == ' ') {
        skip_space(&s, &t, se, te);
        continue;
      }
      ... // utf8 processing
    }
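The following self-contained sketch shows how the skip_space() fast path fits into a comparison loop. It rests on several assumptions:

- The function cmp_pad_space_ascii is hypothetical.
- It compares raw ASCII bytes instead of decoding utf8 and applying collation weights.
- It emulates PAD SPACE by comparing the tail of the longer string against spaces.

The skip_space() helper mirrors the one proposed in this issue, with the bounds written as pointer differences.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef unsigned char uchar; /* stand-in for the server's uchar typedef */

/* Skip spaces common to both strings: 8 bytes at a time, then bytewise. */
static inline void skip_space(const uchar **sp, const uchar **tp,
                              const uchar *const se, const uchar *const te) {
  while (se - *sp >= 8 && te - *tp >= 8) {
    uint64_t s, t;
    memcpy(&s, *sp, 8);
    memcpy(&t, *tp, 8);
    if (s != 0x2020202020202020ULL || t != 0x2020202020202020ULL) break;
    *sp += 8;
    *tp += 8;
  }
  while (*sp < se && *tp < te && **sp == 0x20 && **tp == 0x20) {
    ++*sp;
    ++*tp;
  }
}

/* Hypothetical ASCII-only stand-in for my_strnncollsp_utf8: returns <0, 0
   or >0, treating the shorter string as if padded with spaces. */
static int cmp_pad_space_ascii(const uchar *s, size_t slen,
                               const uchar *t, size_t tlen) {
  const uchar *se = s + slen, *te = t + tlen;
  while (s < se && t < te) {
    if (*s == ' ' && *t == ' ') {
      skip_space(&s, &t, se, te); /* always advances at least one byte */
      continue;
    }
    if (*s != *t) return *s < *t ? -1 : 1; /* bytewise, not collation */
    s++;
    t++;
  }
  /* Compare whatever is left of the longer string against spaces. */
  for (; s < se; s++)
    if (*s != ' ') return *s < ' ' ? -1 : 1;
  for (; t < te; t++)
    if (*t != ' ') return ' ' < *t ? -1 : 1;
  return 0;
}
```

Because both strings must have a space at the same position before the fast path is taken, the check costs one extra comparison per character when they don't. Equal, space-padded values, the case measured in this issue, benefit the most.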
|
With this simple approach we achieved a significant speedup when the two
strings are equal (more than 7 times faster for CHAR(120)). In terms of
overall database performance, single-threaded sysbench OLTP RO improves by up
to 2.5%.
Suggested fix:
1. For my_hash_sort_utf8(mb4): use skip_trailing_space() as is.
2. For my_strnncollsp_utf8(mb4): add aggressive forward space skipping directly
in the main loop that processes utf8 characters one by one.
Issue Links
is blocked by:
- MDEV-22849 Reuse skip_trailing_space() in my_hash_sort_utf8mbX (Closed)
relates to:
- MDEV-23142 Improving performance of 'row_mysql_store_col_in_innobase_format' (Open)
- MDEV-26572 Improve simple multibyte collation performance on the ASCII range (Closed)