[MDEV-25449] Add MY_COLLATION_HANDLER::strnncollsp_nchars() - Jira

Alexander Barkov created issue - 2021-04-19 08:08

Alexander Barkov made changes - 2021-04-19 08:08

Field	Original Value	New Value
Link		This issue blocks ~~MDEV-25440~~ [ ~~MDEV-25440~~ ]

Alexander Barkov made changes - 2021-04-19 08:08

Assignee

Alexander Barkov [ bar ]

Alexander Barkov made changes - 2021-04-19 08:09

Description

Field_string::cmp() seems to do unnecessary work trimming trailing spaces:

{code:cpp}
int Field_string::cmp(const uchar *a_ptr, const uchar *b_ptr) const
{
  size_t a_len, b_len;

  if (mbmaxlen() != 1)
  {
    size_t char_len= Field_string::char_length();
    a_len= field_charset()->charpos(a_ptr, a_ptr + field_length, char_len);
    b_len= field_charset()->charpos(b_ptr, b_ptr + field_length, char_len);
  }
  else
    a_len= b_len= field_length;
  /*
    We have to remove end space to be able to compare multi-byte-characters
    like in latin_de 'ae' and 0xe4
  */
  return field_charset()->strnncollsp(a_ptr, a_len,
                                      b_ptr, b_len);
}
{code}

In the vast majority of cases, the difference between strings is found at the very beginning of the compared strings. So calling charpos() on both arguments before passing them to the actual comparison function looks like a waste of CPU.

A better approach would be to implement a new comparison function with this tentative API:

{code:cpp}
int strnncollsp_nchars(CHARSET_INFO *cs,
                       const char *s1, size_t len1,
                       const char *s2, size_t len2,
                       size_t nchars);
{code}

Internally, each virtual implementation of strnncollsp_nchars() would do the same as strnncollsp() does in the same collation, but with an extra limit on "nchars".

This new function should also help to fix a bug in similar code in InnoDB: see MDEV-25440 for details.

{{Field_string::cmp()}} seems to do unnecessary work trimming trailing spaces:

{code:cpp}
int Field_string::cmp(const uchar *a_ptr, const uchar *b_ptr) const
{
  size_t a_len, b_len;

  if (mbmaxlen() != 1)
  {
    size_t char_len= Field_string::char_length();
    a_len= field_charset()->charpos(a_ptr, a_ptr + field_length, char_len);
    b_len= field_charset()->charpos(b_ptr, b_ptr + field_length, char_len);
  }
  else
    a_len= b_len= field_length;
  /*
    We have to remove end space to be able to compare multi-byte-characters
    like in latin_de 'ae' and 0xe4
  */
  return field_charset()->strnncollsp(a_ptr, a_len,
                                      b_ptr, b_len);
}
{code}

In the vast majority of cases, the difference between strings is found at the very beginning of the compared strings. So calling charpos() on both arguments before passing them to the actual comparison function looks like an inefficient use of CPU.

A better approach would be to implement a new comparison function with this tentative API:

{code:cpp}
int strnncollsp_nchars(CHARSET_INFO *cs,
                       const char *s1, size_t len1,
                       const char *s2, size_t len2,
                       size_t nchars);
{code}

Internally, each virtual implementation of strnncollsp_nchars() would do the same as strnncollsp() does in the same collation, but with an extra limit on "nchars".

This new function should also help to fix a bug in similar code in InnoDB: see MDEV-25440 for details.

Alexander Barkov made changes - 2021-04-19 08:10

Description

{{Field_string::cmp()}} seems to do unnecessary work trimming trailing spaces:

{code:cpp}
int Field_string::cmp(const uchar *a_ptr, const uchar *b_ptr) const
{
  size_t a_len, b_len;

  if (mbmaxlen() != 1)
  {
    size_t char_len= Field_string::char_length();
    a_len= field_charset()->charpos(a_ptr, a_ptr + field_length, char_len);
    b_len= field_charset()->charpos(b_ptr, b_ptr + field_length, char_len);
  }
  else
    a_len= b_len= field_length;
  /*
    We have to remove end space to be able to compare multi-byte-characters
    like in latin_de 'ae' and 0xe4
  */
  return field_charset()->strnncollsp(a_ptr, a_len,
                                      b_ptr, b_len);
}
{code}

In the vast majority of cases, the difference between strings is found at the very beginning of the compared strings. So calling charpos() on both arguments before passing them to the actual comparison function looks like an inefficient use of CPU.

A better approach would be to implement a new comparison function with this tentative API:

{code:cpp}
int strnncollsp_nchars(CHARSET_INFO *cs,
                       const char *s1, size_t len1,
                       const char *s2, size_t len2,
                       size_t nchars);
{code}

Internally, each virtual implementation of strnncollsp_nchars() would do the same as strnncollsp() does in the same collation, but with an extra limit on "nchars".

This new function should also help to fix a bug in similar code in InnoDB: see ~~MDEV-25440~~ for details.

Marko Mäkelä made changes - 2021-04-23 13:09

Issue Type

Task [ 3 ]

Bug [ 1 ]

Marko Mäkelä made changes - 2021-04-23 13:11

Link

This issue relates to ~~MDEV-9711~~ [ ~~MDEV-9711~~ ]

Marko Mäkelä made changes - 2021-04-23 13:11

Fix Version/s		10.3 [ 22126 ]
Fix Version/s		10.4 [ 22408 ]
Fix Version/s		10.5 [ 23123 ]
Fix Version/s		10.6 [ 24028 ]
Affects Version/s		10.5.0 [ 23709 ]
Affects Version/s		10.4.0 [ 23115 ]
Affects Version/s		10.3.0 [ 22127 ]
Affects Version/s		10.2.2 [ 22013 ]
Affects Version/s		10.6.0 [ 24431 ]
Labels		corruption regression-10.2 tech_debt

Alexander Barkov made changes - 2021-04-27 11:02

Status

Open [ 1 ]

In Progress [ 3 ]

Alexander Barkov made changes - 2021-04-27 11:36

Assignee	Alexander Barkov [ bar ]	Marko Mäkelä [ marko ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Marko Mäkelä made changes - 2021-04-29 11:01

Assignee

Marko Mäkelä [ marko ]

Eugene Kosov [ kevg ]

Sergei Golubchik made changes - 2021-05-25 12:51

Fix Version/s

10.6 [ 24028 ]

Alexander Barkov made changes - 2021-06-13 05:54

Link

This issue relates to ~~MDEV-25904~~ [ ~~MDEV-25904~~ ]

Sergei Golubchik made changes - 2021-06-14 08:21

Assignee

Eugene Kosov [ kevg ]

Sergei Golubchik [ serg ]

Sergei Golubchik made changes - 2021-07-23 20:45

Assignee	Sergei Golubchik [ serg ]	Alexander Barkov [ bar ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Alexander Barkov made changes - 2021-10-01 14:08

Link

This issue relates to ~~MDEV-26743~~ [ ~~MDEV-26743~~ ]

Alexander Barkov made changes - 2021-10-01 14:08

Link

This issue relates to MDEV-26744 [ MDEV-26744 ]

Sergei Golubchik made changes - 2021-12-06 21:35

Workflow

MariaDB v3 [ 121147 ]

MariaDB v4 [ 143686 ]

Sergei Golubchik made changes - 2022-01-19 12:26

Link

This issue is part of ~~MDEV-25440~~ [ ~~MDEV-25440~~ ]

Sergei Golubchik made changes - 2022-01-19 12:26

Link

This issue blocks ~~MDEV-25440~~ [ ~~MDEV-25440~~ ]

Alexander Barkov made changes - 2022-01-21 09:24

Description

h2. 2022-01-21 Update

This problem was solved as part of ~~MDEV-25904~~.

h2. Old description

{{Field_string::cmp()}} seems to do unnecessary work trimming trailing spaces:

{code:cpp}
int Field_string::cmp(const uchar *a_ptr, const uchar *b_ptr) const
{
  size_t a_len, b_len;

  if (mbmaxlen() != 1)
  {
    size_t char_len= Field_string::char_length();
    a_len= field_charset()->charpos(a_ptr, a_ptr + field_length, char_len);
    b_len= field_charset()->charpos(b_ptr, b_ptr + field_length, char_len);
  }
  else
    a_len= b_len= field_length;
  /*
    We have to remove end space to be able to compare multi-byte-characters
    like in latin_de 'ae' and 0xe4
  */
  return field_charset()->strnncollsp(a_ptr, a_len,
                                      b_ptr, b_len);
}
{code}

In the vast majority of cases, the difference between strings is found at the very beginning of the compared strings. So calling charpos() on both arguments before passing them to the actual comparison function looks like an inefficient use of CPU.

A better approach would be to implement a new comparison function with this tentative API:

{code:cpp}
int strnncollsp_nchars(CHARSET_INFO *cs,
                       const char *s1, size_t len1,
                       const char *s2, size_t len2,
                       size_t nchars);
{code}

Internally, each virtual implementation of strnncollsp_nchars() would do the same as strnncollsp() does in the same collation, but with an extra limit on "nchars".

This new function should also help to fix a bug in similar code in InnoDB: see ~~MDEV-25440~~ for details.
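
To illustrate the idea, here is a minimal standalone sketch of a character-limited, space-padded comparison. It is not the MariaDB implementation: it assumes a single simplified collation (bytewise UTF-8 order with PAD SPACE semantics), drops the CHARSET_INFO argument of the tentative API, and every name in it (utf8_char_len, charpos_sketch, strnncollsp_nchars_sketch, usage_example) is hypothetical. The point is only that the character limit can be enforced inside the comparison loop, so the loop stops at the first difference instead of pre-scanning both arguments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not a MariaDB function): byte length of the UTF-8
// character starting at p, clamped to the bytes remaining before end.
size_t utf8_char_len(const unsigned char *p, const unsigned char *end)
{
  size_t n= 1;                          // ASCII or a stray continuation byte
  if (*p >= 0xF0)      n= 4;
  else if (*p >= 0xE0) n= 3;
  else if (*p >= 0xC0) n= 2;
  return std::min(n, static_cast<size_t>(end - p));
}

// Stand-in for charpos(): byte offset just past the first nchars characters.
// It walks up to nchars characters even if the strings differ at byte 0.
size_t charpos_sketch(const char *s, size_t len, size_t nchars)
{
  const unsigned char *start= reinterpret_cast<const unsigned char *>(s);
  const unsigned char *p= start, *end= start + len;
  for ( ; nchars > 0 && p < end; nchars--)
    p+= utf8_char_len(p, end);
  return static_cast<size_t>(p - start);
}

// Sketch of the proposed function for the assumed collation: compare at
// most nchars characters; a string that runs out behaves as if padded
// with spaces. Returns -1, 0 or 1.
int strnncollsp_nchars_sketch(const char *s1, size_t len1,
                              const char *s2, size_t len2,
                              size_t nchars)
{
  static const unsigned char space= ' ';
  const unsigned char *a= reinterpret_cast<const unsigned char *>(s1);
  const unsigned char *b= reinterpret_cast<const unsigned char *>(s2);
  const unsigned char *ae= a + len1, *be= b + len2;
  for ( ; nchars > 0; nchars--)
  {
    if (a == ae && b == be)
      return 0;                         // both exhausted: equal
    const unsigned char *ca= a < ae ? a : &space;
    const unsigned char *cb= b < be ? b : &space;
    size_t la= a < ae ? utf8_char_len(a, ae) : 1;
    size_t lb= b < be ? utf8_char_len(b, be) : 1;
    int cmp= std::memcmp(ca, cb, std::min(la, lb));
    if (cmp != 0)
      return cmp < 0 ? -1 : 1;          // first differing character decides
    if (la != lb)
      return la < lb ? -1 : 1;          // only reachable on truncated input
    if (a < ae) a+= la;
    if (b < be) b+= lb;
  }
  return 0;                             // equal within the nchars limit
}

void usage_example()
{
  const char x[]= "\xC3\xA9" "x";       // 3 bytes, 2 characters
  const char y[]= "\xC3\xA9" "y";       // 3 bytes, 2 characters
  // Limited to 1 character the strings are equal; with 2 they differ.
  assert(strnncollsp_nchars_sketch(x, 3, y, 3, 1) == 0);
  assert(strnncollsp_nchars_sketch(x, 3, y, 3, 2) < 0);
  // Same answer as the charpos-then-compare path, without the pre-scan.
  size_t lx= charpos_sketch(x, 3, 1), ly= charpos_sketch(y, 3, 1);
  assert(strnncollsp_nchars_sketch(x, lx, y, ly, SIZE_MAX) == 0);
}
```

In this sketch, calling the function with nchars set to SIZE_MAX degenerates to an unlimited space-padded comparison, which mirrors the statement above that the new function would behave like strnncollsp() plus an extra limit. A real collation would compare weights rather than raw bytes, so the per-character step would differ.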

Alexander Barkov made changes - 2022-01-21 09:24

Fix Version/s		N/A [ 14700 ]
Fix Version/s	10.2 [ 14601 ]
Fix Version/s	10.3 [ 22126 ]
Fix Version/s	10.4 [ 22408 ]
Fix Version/s	10.5 [ 23123 ]
Resolution		Duplicate [ 3 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Alexander Barkov made changes - 2022-01-21 09:25

Link

This issue duplicates ~~MDEV-25904~~ [ ~~MDEV-25904~~ ]

Alexander Barkov made changes - 2022-01-21 09:25

Link

This issue relates to ~~MDEV-25440~~ [ ~~MDEV-25440~~ ]

Alexander Barkov made changes - 2022-01-21 09:25

Link

This issue is part of ~~MDEV-25440~~ [ ~~MDEV-25440~~ ]

MariaDB Server

Add MY_COLLATION_HANDLER::strnncollsp_nchars()
