Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Duplicate
-
10.2.2, 10.3.0, 10.4.0, 10.5.0, 10.6.0
Description
2022-01-21 Update
This problem was solved under terms of MDEV-25904
Old description
Field_string::cmp() seems to do unnecessary work trimming trailing spaces:
int Field_string::cmp(const uchar *a_ptr, const uchar *b_ptr) const |
{
|
size_t a_len, b_len; |
|
if (mbmaxlen() != 1) |
{
|
size_t char_len= Field_string::char_length(); |
a_len= field_charset()->charpos(a_ptr, a_ptr + field_length, char_len);
|
b_len= field_charset()->charpos(b_ptr, b_ptr + field_length, char_len);
|
}
|
else |
a_len= b_len= field_length;
|
/* |
We have to remove end space to be able to compare multi-byte-characters
|
like in latin_de 'ae' and 0xe4
|
*/
|
return field_charset()->strnncollsp(a_ptr, a_len, |
b_ptr, b_len);
|
}
|
In absolute majority cases, the difference between strings is found in the very beginning of the compared strings. So doing charpos() on the two arguments, before passing them to the actual comparison function, looks like an inefficient waste of CPU.
A better approach would be to implement a new comparison function with this tentative API:
int strnncollsp_nchars(CHARSET_INFO *cs, |
const char *s1, size_t len1, |
const char *s2, size_t len2, |
size_t nchars); |
Internally, the exact virtial implementations of strnncollsp_nchars() would do the same with what strnncollsp() do in the same collation, but with an extra limit on "nchars".
This new function should also help to fix a bug in the similar code in InnoDB: see MDEV-25440 for details.
Attachments
Issue Links
- duplicates
-
MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings
-
- Closed
-
- relates to
-
MDEV-25440 Assertion `cmp_rec_rec(rec, old_rec, offsets, old_offsets, m_index) > 0' failed in PageBulk::insert
-
- Closed
-
-
MDEV-26743 InnoDB: CHAR+nopad does not work well
-
- Closed
-
-
MDEV-26744 MyISAM, Aria, MEMORY: CHAR+nopad does not work well
-
- Open
-
-
MDEV-9711 NO PAD collations
-
- Closed
-
-
MDEV-25904 New collation functions to compare InnoDB style trimmed NO PAD strings
-
- Closed
-
Activity
Field | Original Value | New Value |
---|---|---|
Link |
This issue blocks |
Assignee | Alexander Barkov [ bar ] |
Description |
Field_string::cmp() seems to do unnecessary work trimming trailing spaces:
{code:cpp} int Field_string::cmp(const uchar *a_ptr, const uchar *b_ptr) const { size_t a_len, b_len; if (mbmaxlen() != 1) { size_t char_len= Field_string::char_length(); a_len= field_charset()->charpos(a_ptr, a_ptr + field_length, char_len); b_len= field_charset()->charpos(b_ptr, b_ptr + field_length, char_len); } else a_len= b_len= field_length; /* We have to remove end space to be able to compare multi-byte-characters like in latin_de 'ae' and 0xe4 */ return field_charset()->strnncollsp(a_ptr, a_len, b_ptr, b_len); } {code} In absolute majority cases, the difference between strings is found in the very beginning of the compared strings. So doing charpos() on the two arguments, before passing them to the actual comparison function, looks like a waste of CPU. A better approach would be to implement a new comparison function with this tentative API: {code:cpp} int strnncollsp_nchars(CHARSET_INFO *cs, const char *s1, size_t len1, const char *s2, size_t len2, size_t nchars); {code} Internally, the exact virtial implementations of strnncollsp_nchars() would do the same with what strnncollsp() do in the same collation, but with an extra limit on "nchars". This new function should also help to fix a bug in the similar code in InnoDB: seeMDEV-25440 for details. |
{{Field_string::cmp()}} seems to do unnecessary work trimming trailing spaces:
{code:cpp} int Field_string::cmp(const uchar *a_ptr, const uchar *b_ptr) const { size_t a_len, b_len; if (mbmaxlen() != 1) { size_t char_len= Field_string::char_length(); a_len= field_charset()->charpos(a_ptr, a_ptr + field_length, char_len); b_len= field_charset()->charpos(b_ptr, b_ptr + field_length, char_len); } else a_len= b_len= field_length; /* We have to remove end space to be able to compare multi-byte-characters like in latin_de 'ae' and 0xe4 */ return field_charset()->strnncollsp(a_ptr, a_len, b_ptr, b_len); } {code} In absolute majority cases, the difference between strings is found in the very beginning of the compared strings. So doing charpos() on the two arguments, before passing them to the actual comparison function, looks like an inefficient waste of CPU. A better approach would be to implement a new comparison function with this tentative API: {code:cpp} int strnncollsp_nchars(CHARSET_INFO *cs, const char *s1, size_t len1, const char *s2, size_t len2, size_t nchars); {code} Internally, the exact virtial implementations of strnncollsp_nchars() would do the same with what strnncollsp() do in the same collation, but with an extra limit on "nchars". This new function should also help to fix a bug in the similar code in InnoDB: seeMDEV-25440 for details. |
Description |
{{Field_string::cmp()}} seems to do unnecessary work trimming trailing spaces:
{code:cpp} int Field_string::cmp(const uchar *a_ptr, const uchar *b_ptr) const { size_t a_len, b_len; if (mbmaxlen() != 1) { size_t char_len= Field_string::char_length(); a_len= field_charset()->charpos(a_ptr, a_ptr + field_length, char_len); b_len= field_charset()->charpos(b_ptr, b_ptr + field_length, char_len); } else a_len= b_len= field_length; /* We have to remove end space to be able to compare multi-byte-characters like in latin_de 'ae' and 0xe4 */ return field_charset()->strnncollsp(a_ptr, a_len, b_ptr, b_len); } {code} In absolute majority cases, the difference between strings is found in the very beginning of the compared strings. So doing charpos() on the two arguments, before passing them to the actual comparison function, looks like an inefficient waste of CPU. A better approach would be to implement a new comparison function with this tentative API: {code:cpp} int strnncollsp_nchars(CHARSET_INFO *cs, const char *s1, size_t len1, const char *s2, size_t len2, size_t nchars); {code} Internally, the exact virtial implementations of strnncollsp_nchars() would do the same with what strnncollsp() do in the same collation, but with an extra limit on "nchars". This new function should also help to fix a bug in the similar code in InnoDB: seeMDEV-25440 for details. |
{{Field_string::cmp()}} seems to do unnecessary work trimming trailing spaces:
{code:cpp} int Field_string::cmp(const uchar *a_ptr, const uchar *b_ptr) const { size_t a_len, b_len; if (mbmaxlen() != 1) { size_t char_len= Field_string::char_length(); a_len= field_charset()->charpos(a_ptr, a_ptr + field_length, char_len); b_len= field_charset()->charpos(b_ptr, b_ptr + field_length, char_len); } else a_len= b_len= field_length; /* We have to remove end space to be able to compare multi-byte-characters like in latin_de 'ae' and 0xe4 */ return field_charset()->strnncollsp(a_ptr, a_len, b_ptr, b_len); } {code} In absolute majority cases, the difference between strings is found in the very beginning of the compared strings. So doing charpos() on the two arguments, before passing them to the actual comparison function, looks like an inefficient waste of CPU. A better approach would be to implement a new comparison function with this tentative API: {code:cpp} int strnncollsp_nchars(CHARSET_INFO *cs, const char *s1, size_t len1, const char *s2, size_t len2, size_t nchars); {code} Internally, the exact virtial implementations of strnncollsp_nchars() would do the same with what strnncollsp() do in the same collation, but with an extra limit on "nchars". This new function should also help to fix a bug in the similar code in InnoDB: see |
Issue Type | Task [ 3 ] | Bug [ 1 ] |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Affects Version/s | 10.5.0 [ 23709 ] | |
Affects Version/s | 10.4.0 [ 23115 ] | |
Affects Version/s | 10.3.0 [ 22127 ] | |
Affects Version/s | 10.2.2 [ 22013 ] | |
Affects Version/s | 10.6.0 [ 24431 ] | |
Labels | corruption regression-10.2 tech_debt |
Status | Open [ 1 ] | In Progress [ 3 ] |
Assignee | Alexander Barkov [ bar ] | Marko Mäkelä [ marko ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Assignee | Marko Mäkelä [ marko ] | Eugene Kosov [ kevg ] |
Fix Version/s | 10.6 [ 24028 ] |
Link |
This issue relates to |
Assignee | Eugene Kosov [ kevg ] | Sergei Golubchik [ serg ] |
Assignee | Sergei Golubchik [ serg ] | Alexander Barkov [ bar ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Link |
This issue relates to |
Link | This issue relates to MDEV-26744 [ MDEV-26744 ] |
Workflow | MariaDB v3 [ 121147 ] | MariaDB v4 [ 143686 ] |
Link |
This issue is part of |
Link |
This issue blocks |
Description |
{{Field_string::cmp()}} seems to do unnecessary work trimming trailing spaces:
{code:cpp} int Field_string::cmp(const uchar *a_ptr, const uchar *b_ptr) const { size_t a_len, b_len; if (mbmaxlen() != 1) { size_t char_len= Field_string::char_length(); a_len= field_charset()->charpos(a_ptr, a_ptr + field_length, char_len); b_len= field_charset()->charpos(b_ptr, b_ptr + field_length, char_len); } else a_len= b_len= field_length; /* We have to remove end space to be able to compare multi-byte-characters like in latin_de 'ae' and 0xe4 */ return field_charset()->strnncollsp(a_ptr, a_len, b_ptr, b_len); } {code} In absolute majority cases, the difference between strings is found in the very beginning of the compared strings. So doing charpos() on the two arguments, before passing them to the actual comparison function, looks like an inefficient waste of CPU. A better approach would be to implement a new comparison function with this tentative API: {code:cpp} int strnncollsp_nchars(CHARSET_INFO *cs, const char *s1, size_t len1, const char *s2, size_t len2, size_t nchars); {code} Internally, the exact virtial implementations of strnncollsp_nchars() would do the same with what strnncollsp() do in the same collation, but with an extra limit on "nchars". This new function should also help to fix a bug in the similar code in InnoDB: see |
h2. 2022-01-21 Update
This problem was solved under terms of h2. Old description {{Field_string::cmp()}} seems to do unnecessary work trimming trailing spaces: {code:cpp} int Field_string::cmp(const uchar *a_ptr, const uchar *b_ptr) const { size_t a_len, b_len; if (mbmaxlen() != 1) { size_t char_len= Field_string::char_length(); a_len= field_charset()->charpos(a_ptr, a_ptr + field_length, char_len); b_len= field_charset()->charpos(b_ptr, b_ptr + field_length, char_len); } else a_len= b_len= field_length; /* We have to remove end space to be able to compare multi-byte-characters like in latin_de 'ae' and 0xe4 */ return field_charset()->strnncollsp(a_ptr, a_len, b_ptr, b_len); } {code} In absolute majority cases, the difference between strings is found in the very beginning of the compared strings. So doing charpos() on the two arguments, before passing them to the actual comparison function, looks like an inefficient waste of CPU. A better approach would be to implement a new comparison function with this tentative API: {code:cpp} int strnncollsp_nchars(CHARSET_INFO *cs, const char *s1, size_t len1, const char *s2, size_t len2, size_t nchars); {code} Internally, the exact virtial implementations of strnncollsp_nchars() would do the same with what strnncollsp() do in the same collation, but with an extra limit on "nchars". This new function should also help to fix a bug in the similar code in InnoDB: see |
Fix Version/s | N/A [ 14700 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Resolution | Duplicate [ 3 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Link |
This issue duplicates |
Link |
This issue relates to |
Link |
This issue is part of |