[MDEV-25449] Add MY_COLLATION_HANDLER::strnncollsp_nchars() Created: 2021-04-19  Updated: 2022-01-21  Resolved: 2022-01-21

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.2.2, 10.3.0, 10.4.0, 10.5.0, 10.6.0
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Duplicate Votes: 0
Labels: corruption, regression-10.2, tech_debt

Issue Links:
Duplicate
duplicates MDEV-25904 New collation functions to compare In... Closed
Relates
relates to MDEV-25440 Assertion `cmp_rec_rec(rec, old_rec, ... Closed
relates to MDEV-26743 InnoDB: CHAR+nopad does not work well Closed
relates to MDEV-26744 MyISAM, Aria, MEMORY: CHAR+nopad does... Open
relates to MDEV-9711 NO PAD collations Closed
relates to MDEV-25904 New collation functions to compare In... Closed

 Description   

2022-01-21 Update

This problem was solved as part of MDEV-25904.

Old description

Field_string::cmp() seems to do unnecessary work trimming trailing spaces:

int Field_string::cmp(const uchar *a_ptr, const uchar *b_ptr) const
{
  size_t a_len, b_len;
 
  if (mbmaxlen() != 1)
  {
    size_t char_len= Field_string::char_length();
    a_len= field_charset()->charpos(a_ptr, a_ptr + field_length, char_len);
    b_len= field_charset()->charpos(b_ptr, b_ptr + field_length, char_len);
  }
  else
    a_len= b_len= field_length;
  /*
    We have to remove end space to be able to compare multi-byte-characters
    like in latin_de 'ae' and 0xe4
  */
  return field_charset()->strnncollsp(a_ptr, a_len,
                                      b_ptr, b_len);
}

In the vast majority of cases, the difference between the strings is found near their very beginning. So calling charpos() on both arguments before passing them to the actual comparison function is a waste of CPU.

A better approach would be to implement a new comparison function with this tentative API:

int strnncollsp_nchars(CHARSET_INFO *cs,
                       const char *s1, size_t len1,
                       const char *s2, size_t len2,
                       size_t nchars);

Internally, each collation's virtual implementation of strnncollsp_nchars() would do the same thing as that collation's strnncollsp(), but with an extra limit of "nchars" characters.
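To illustrate the idea, here is a minimal standalone sketch. It is not the MariaDB implementation and not the patch: the names toy_next_char() and toy_strnncollsp_nchars() are hypothetical, the CHARSET_INFO argument is dropped, the "collation" is plain UTF-8 code point order with PAD SPACE behaviour, and the input is assumed to be well-formed UTF-8. It only shows how the character limit can be enforced inside the comparison loop, so that no charpos() pre-scan is needed and the loop stops at the first difference.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decode one UTF-8 character from [s, end).
// Returns the number of bytes consumed, or 0 at end of input
// (in which case *wc is left untouched).
// Simplification: a malformed or truncated sequence is taken as one byte.
static size_t toy_next_char(const unsigned char *s, const unsigned char *end,
                            uint32_t *wc)
{
  if (s >= end)
    return 0;
  unsigned char c= s[0];
  size_t len= c < 0x80 ? 1 :
              (c >> 5) == 0x06 ? 2 :
              (c >> 4) == 0x0E ? 3 :
              (c >> 3) == 0x1E ? 4 : 1;
  if (len == 1 || (size_t) (end - s) < len)
  {
    *wc= c;
    return 1;
  }
  uint32_t v= c & (0xFF >> (len + 1));    // payload bits of the lead byte
  for (size_t i= 1; i < len; i++)
    v= (v << 6) | (s[i] & 0x3F);          // payload bits of continuation bytes
  *wc= v;
  return len;
}

// Hypothetical toy counterpart of the proposed strnncollsp_nchars():
// compares at most nchars characters, treating the shorter string
// as if it were padded with spaces (PAD SPACE).
int toy_strnncollsp_nchars(const char *s1, size_t len1,
                           const char *s2, size_t len2,
                           size_t nchars)
{
  const unsigned char *a= (const unsigned char *) s1, *ae= a + len1;
  const unsigned char *b= (const unsigned char *) s2, *be= b + len2;
  for ( ; nchars > 0; nchars--)
  {
    uint32_t wa= ' ', wb= ' ';            // an exhausted side reads as a space
    size_t la= toy_next_char(a, ae, &wa);
    size_t lb= toy_next_char(b, be, &wb);
    if (la == 0 && lb == 0)
      return 0;                           // both strings fully consumed
    if (wa != wb)
      return wa < wb ? -1 : 1;            // first difference decides
    a+= la;
    b+= lb;
  }
  return 0;                               // equal within the first nchars
}
```

With a function of this shape, a caller such as Field_string::cmp() could presumably pass field_length as both byte lengths and Field_string::char_length() as nchars, without computing a_len and b_len first.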

This new function should also help to fix a bug in the similar code in InnoDB: see MDEV-25440 for details.



 Comments   
Comment by Marko Mäkelä [ 2021-04-23 ]

This refactoring is needed for fixing the InnoDB index corruption bug MDEV-25440.

Comment by Alexander Barkov [ 2021-04-27 ]

Please find the patch here:

https://github.com/MariaDB/server/commit/9b0df8eee7b8298fead770da083ab9916e0267ec
