[MDEV-31340] Remove MY_COLLATION_HANDLER::strcasecmp() Created: 2023-05-25  Updated: 2024-01-19

Status: In Review
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: 11.5

Type: Task Priority: Critical
Reporter: Alexander Barkov Assignee: Sergei Golubchik
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-27490 Allow full utf8mb4 for identifiers Stalled
Relates
relates to CONC-682 Collation IDs for utf8mbX_general_as_ci Open
relates to MDEV-33084 LASTVAL(t1) and LASTVAL(T1) do not wo... Open
relates to MDEV-33086 SHOW OPEN TABLES IN DB1 -- is case in... Open
relates to MDEV-33088 Cannot create triggers in the databas... Open
relates to MDEV-33110 HANDLER commands are case insensitive... Open
relates to MDEV-33119 User is case insensitive in INFORMATI... Open
relates to MDEV-33120 System log table names are case insen... Open
relates to MDEV-33050 Build-in schemas like oracle_schema a... Open
relates to MDEV-33085 Tables T1 and t1 do not work well wit... Open
relates to MDEV-33103 LOCK TABLE t1 AS t2 -- alias is not c... Open
relates to MDEV-33109 DROP DATABASE MYSQL -- does not drop ... Open

 Description   

There are two parallel comparison systems in MariaDB collation library, implemented as virtual functions in MY_COLLATION_HANDLER:

  • Comparison according to the collation, provided by these functions

      int     (*strnncoll)(CHARSET_INFO *,
                           const uchar *, size_t, const uchar *, size_t, my_bool);
      int     (*strnncollsp)(CHARSET_INFO *,
                             const uchar *, size_t, const uchar *, size_t);
     
      int     (*strnncollsp_nchars)(CHARSET_INFO *,
                                    const uchar *str1, size_t len1,
                                    const uchar *str2, size_t len2,
                                    size_t nchars,
                                    uint flags);
    

  • Comparison in case insensitive (but accent sensitive) style, implemented by this function:

      int  (*strcasecmp)(CHARSET_INFO *, const char *, const char *);
    

    Note, accent and case sensitivity of the collation does not matter. strcasecmp() always works using accent sensitive case insensitive comparison style.

These two parallel systems are redundant.

Note, strcasecmp() is used mostly to compare identifiers, while the functions of the first group are used to compare data.

Let's get rid of the second comparison system:
1. Remove MY_COLLATION_HANDLER::strcasecmp()
2. Introduce a new collation utf8mb4_general_as_ci. Note, it should work for the entire Unicode range U+0000..U+10FFFF.
3. Replace all calls for:

system_charset_info->coll->strcasecmp()

to calls for

my_charset_utf8mb4_general_as_ci->coll->strnncoll***()

The change would generally be quite mechanic. However, there is one small problem: strcasecmp() accepts 0-terminated strings, while the strnncoll-alike functions accept the pointer and the length. So some refactoring will be needed. Note, Monty earlier changed many MariaDB C data types to use LEX_CSTRING (instead of just a const char pointer) to store names. So this part of the current task will switch some more C data types to LEX_CSTRING.

tolower vs toupper comparison

Another option is to implement utf8mb3_general_as_ci which will compare upper cases (instead of lower cases).

The difference is in a few dozen BMP characters. This script finds all those characters:

CREATE OR REPLACE TABLE t1 (a CHAR(1) CHARACTER SET utf8mb4 COLLATE utf8mb4_uca1400_ai_ci) ENGINE=MyISAM;
DELIMITER $$
FOR i IN 0..0xFFFF
DO
  INSERT INTO t1 VALUES (CHAR(i USING ucs2));
END FOR;
$$
DELIMITER ;
ALTER TABLE t1
  ADD has_casefolding INT DEFAULT (BINARY LOWER(a)<>a OR BINARY UPPER(a)<>a),
  ADD KEY(has_casefolding, a);
 
CREATE OR REPLACE TABLE t21 AS
SELECT
  HEX(t1.a) AS hex_a, HEX(t2.a) AS hex_b,
  BINARY LOWER(t1.a)=LOWER(t2.a) eq_lower,
  BINARY UPPER(t1.a)=UPPER(t2.a) AS eq_upper,
  t1.a AS a, t2.a AS b
FROM
  t1 t1, t1 t2
WHERE
  t1.has_casefolding=1
AND (BINARY LOWER(t1.a)=LOWER(t2.a))<>(BINARY UPPER(t1.a)=UPPER(t2.a));
 
SELECT
  HEX(CONVERT(a USING ucs2)) AS unicode_a,
  HEX(CONVERT(b USING ucs2)) AS unicode_b,
  t21.* FROM t21;

+-----------+-----------+--------+--------+----------+----------+------+------+
| unicode_a | unicode_b | hex_a  | hex_b  | eq_lower | eq_upper | a    | b    |
+-----------+-----------+--------+--------+----------+----------+------+------+
|      1E9E |      00DF | E1BA9E |   C39F |        1 |        0 |    ẞ |    ß |
|      0399 |      0345 |   CE99 |   CD85 |        0 |        1 |    Ι |    ͅ |
|      03B9 |      0345 |   CEB9 |   CD85 |        0 |        1 |    ι |    ͅ |
|      1FBE |      0345 | E1BEBE |   CD85 |        0 |        1 |    ι |    ͅ |
|      212B |      00C5 | E284AB |   C385 |        1 |        0 |    Å |    Å |
|      212B |      00E5 | E284AB |   C3A5 |        1 |        0 |    Å |    å |
|      00C5 |      212B |   C385 | E284AB |        1 |        0 |    Å |    Å |
|      00E5 |      212B |   C3A5 | E284AB |        1 |        0 |    å |    Å |
|      0130 |      0049 |   C4B0 |     49 |        1 |        0 |    İ |    I |
|      0131 |      0049 |   C4B1 |     49 |        0 |        1 |    ı |    I |
|      0130 |      0069 |   C4B0 |     69 |        1 |        0 |    İ |    i |
|      0131 |      0069 |   C4B1 |     69 |        0 |        1 |    ı |    i |
|      0049 |      0130 |     49 |   C4B0 |        1 |        0 |    I |    İ |
|      0069 |      0130 |     69 |   C4B0 |        1 |        0 |    i |    İ |
|      0049 |      0131 |     49 |   C4B1 |        0 |        1 |    I |    ı |
|      0069 |      0131 |     69 |   C4B1 |        0 |        1 |    i |    ı |
|      212A |      004B | E284AA |     4B |        1 |        0 |    K |    K |
|      212A |      006B | E284AA |     6B |        1 |        0 |    K |    k |
|      004B |      212A |     4B | E284AA |        1 |        0 |    K |    K |
|      006B |      212A |     6B | E284AA |        1 |        0 |    k |    K |
|      017F |      0053 |   C5BF |     53 |        0 |        1 |    ſ |    S |
|      017F |      0073 |   C5BF |     73 |        0 |        1 |    ſ |    s |
|      0053 |      017F |     53 |   C5BF |        0 |        1 |    S |    ſ |
|      0073 |      017F |     73 |   C5BF |        0 |        1 |    s |    ſ |
|      1E9B |      1E60 | E1BA9B | E1B9A0 |        0 |        1 |    ẛ |    Ṡ |
|      1E9B |      1E61 | E1BA9B | E1B9A1 |        0 |        1 |    ẛ |    ṡ |
|      1E60 |      1E9B | E1B9A0 | E1BA9B |        0 |        1 |    Ṡ |    ẛ |
|      1E61 |      1E9B | E1B9A1 | E1BA9B |        0 |        1 |    ṡ |    ẛ |
|      03D0 |      0392 |   CF90 |   CE92 |        0 |        1 |    ϐ |    Β |
|      03D0 |      03B2 |   CF90 |   CEB2 |        0 |        1 |    ϐ |    β |
|      0392 |      03D0 |   CE92 |   CF90 |        0 |        1 |    Β |    ϐ |
|      03B2 |      03D0 |   CEB2 |   CF90 |        0 |        1 |    β |    ϐ |
|      03F5 |      0395 |   CFB5 |   CE95 |        0 |        1 |    ϵ |    Ε |
|      03F5 |      03B5 |   CFB5 |   CEB5 |        0 |        1 |    ϵ |    ε |
|      0395 |      03F5 |   CE95 |   CFB5 |        0 |        1 |    Ε |    ϵ |
|      03B5 |      03F5 |   CEB5 |   CFB5 |        0 |        1 |    ε |    ϵ |
|      03D1 |      0398 |   CF91 |   CE98 |        0 |        1 |    ϑ |    Θ |
|      03F4 |      0398 |   CFB4 |   CE98 |        1 |        0 |    ϴ |    Θ |
|      03D1 |      03B8 |   CF91 |   CEB8 |        0 |        1 |    ϑ |    θ |
|      03F4 |      03B8 |   CFB4 |   CEB8 |        1 |        0 |    ϴ |    θ |
|      0398 |      03D1 |   CE98 |   CF91 |        0 |        1 |    Θ |    ϑ |
|      03B8 |      03D1 |   CEB8 |   CF91 |        0 |        1 |    θ |    ϑ |
|      0398 |      03F4 |   CE98 |   CFB4 |        1 |        0 |    Θ |    ϴ |
|      03B8 |      03F4 |   CEB8 |   CFB4 |        1 |        0 |    θ |    ϴ |
|      0345 |      0399 |   CD85 |   CE99 |        0 |        1 |    ͅ |    Ι |
|      1FBE |      0399 | E1BEBE |   CE99 |        0 |        1 |    ι |    Ι |
|      0345 |      03B9 |   CD85 |   CEB9 |        0 |        1 |    ͅ |    ι |
|      1FBE |      03B9 | E1BEBE |   CEB9 |        0 |        1 |    ι |    ι |
|      0345 |      1FBE |   CD85 | E1BEBE |        0 |        1 |    ͅ |    ι |
|      0399 |      1FBE |   CE99 | E1BEBE |        0 |        1 |    Ι |    ι |
|      03B9 |      1FBE |   CEB9 | E1BEBE |        0 |        1 |    ι |    ι |
|      03F0 |      039A |   CFB0 |   CE9A |        0 |        1 |    ϰ |    Κ |
|      03F0 |      03BA |   CFB0 |   CEBA |        0 |        1 |    ϰ |    κ |
|      039A |      03F0 |   CE9A |   CFB0 |        0 |        1 |    Κ |    ϰ |
|      03BA |      03F0 |   CEBA |   CFB0 |        0 |        1 |    κ |    ϰ |
|      039C |      00B5 |   CE9C |   C2B5 |        0 |        1 |    Μ |    µ |
|      03BC |      00B5 |   CEBC |   C2B5 |        0 |        1 |    μ |    µ |
|      00B5 |      039C |   C2B5 |   CE9C |        0 |        1 |    µ |    Μ |
|      00B5 |      03BC |   C2B5 |   CEBC |        0 |        1 |    µ |    μ |
|      03D6 |      03A0 |   CF96 |   CEA0 |        0 |        1 |    ϖ |    Π |
|      03D6 |      03C0 |   CF96 |   CF80 |        0 |        1 |    ϖ |    π |
|      03A0 |      03D6 |   CEA0 |   CF96 |        0 |        1 |    Π |    ϖ |
|      03C0 |      03D6 |   CF80 |   CF96 |        0 |        1 |    π |    ϖ |
|      03F1 |      03A1 |   CFB1 |   CEA1 |        0 |        1 |    ϱ |    Ρ |
|      03F1 |      03C1 |   CFB1 |   CF81 |        0 |        1 |    ϱ |    ρ |
|      03A1 |      03F1 |   CEA1 |   CFB1 |        0 |        1 |    Ρ |    ϱ |
|      03C1 |      03F1 |   CF81 |   CFB1 |        0 |        1 |    ρ |    ϱ |
|      03C2 |      03A3 |   CF82 |   CEA3 |        0 |        1 |    ς |    Σ |
|      03A3 |      03C2 |   CEA3 |   CF82 |        0 |        1 |    Σ |    ς |
|      03C3 |      03C2 |   CF83 |   CF82 |        0 |        1 |    σ |    ς |
|      03C2 |      03C3 |   CF82 |   CF83 |        0 |        1 |    ς |    σ |
|      03D5 |      03A6 |   CF95 |   CEA6 |        0 |        1 |    ϕ |    Φ |
|      03D5 |      03C6 |   CF95 |   CF86 |        0 |        1 |    ϕ |    φ |
|      03A6 |      03D5 |   CEA6 |   CF95 |        0 |        1 |    Φ |    ϕ |
|      03C6 |      03D5 |   CF86 |   CF95 |        0 |        1 |    φ |    ϕ |
|      2126 |      03A9 | E284A6 |   CEA9 |        1 |        0 |    Ω |    Ω |
|      2126 |      03C9 | E284A6 |   CF89 |        1 |        0 |    Ω |    ω |
|      03A9 |      2126 |   CEA9 | E284A6 |        1 |        0 |    Ω |    Ω |
|      03C9 |      2126 |   CF89 | E284A6 |        1 |        0 |    ω |    Ω |
|      1C80 |      0412 | E1B280 |   D092 |        0 |        1 |    ᲀ |    В |
|      1C80 |      0432 | E1B280 |   D0B2 |        0 |        1 |    ᲀ |    в |
|      0412 |      1C80 |   D092 | E1B280 |        0 |        1 |    В |    ᲀ |
|      0432 |      1C80 |   D0B2 | E1B280 |        0 |        1 |    в |    ᲀ |
|      1C81 |      0414 | E1B281 |   D094 |        0 |        1 |    ᲁ |    Д |
|      1C81 |      0434 | E1B281 |   D0B4 |        0 |        1 |    ᲁ |    д |
|      0414 |      1C81 |   D094 | E1B281 |        0 |        1 |    Д |    ᲁ |
|      0434 |      1C81 |   D0B4 | E1B281 |        0 |        1 |    д |    ᲁ |
|      1C82 |      041E | E1B282 |   D09E |        0 |        1 |    ᲂ |    О |
|      1C82 |      043E | E1B282 |   D0BE |        0 |        1 |    ᲂ |    о |
|      041E |      1C82 |   D09E | E1B282 |        0 |        1 |    О |    ᲂ |
|      043E |      1C82 |   D0BE | E1B282 |        0 |        1 |    о |    ᲂ |
|      1C83 |      0421 | E1B283 |   D0A1 |        0 |        1 |    ᲃ |    С |
|      1C83 |      0441 | E1B283 |   D181 |        0 |        1 |    ᲃ |    с |
|      0421 |      1C83 |   D0A1 | E1B283 |        0 |        1 |    С |    ᲃ |
|      0441 |      1C83 |   D181 | E1B283 |        0 |        1 |    с |    ᲃ |
|      1C84 |      0422 | E1B284 |   D0A2 |        0 |        1 |    ᲄ |    Т |
|      1C85 |      0422 | E1B285 |   D0A2 |        0 |        1 |    ᲅ |    Т |
|      1C84 |      0442 | E1B284 |   D182 |        0 |        1 |    ᲄ |    т |
|      1C85 |      0442 | E1B285 |   D182 |        0 |        1 |    ᲅ |    т |
|      0422 |      1C84 |   D0A2 | E1B284 |        0 |        1 |    Т |    ᲄ |
|      0442 |      1C84 |   D182 | E1B284 |        0 |        1 |    т |    ᲄ |
|      1C85 |      1C84 | E1B285 | E1B284 |        0 |        1 |    ᲅ |    ᲄ |
|      0422 |      1C85 |   D0A2 | E1B285 |        0 |        1 |    Т |    ᲅ |
|      0442 |      1C85 |   D182 | E1B285 |        0 |        1 |    т |    ᲅ |
|      1C84 |      1C85 | E1B284 | E1B285 |        0 |        1 |    ᲄ |    ᲅ |
|      A64A |      1C88 | EA998A | E1B288 |        0 |        1 |    Ꙋ |    ᲈ |
|      A64B |      1C88 | EA998B | E1B288 |        0 |        1 |    ꙋ |    ᲈ |
|      1C88 |      A64A | E1B288 | EA998A |        0 |        1 |    ᲈ |    Ꙋ |
|      1C88 |      A64B | E1B288 | EA998B |        0 |        1 |    ᲈ |    ꙋ |
|      1C86 |      042A | E1B286 |   D0AA |        0 |        1 |    ᲆ |    Ъ |
|      1C86 |      044A | E1B286 |   D18A |        0 |        1 |    ᲆ |    ъ |
|      042A |      1C86 |   D0AA | E1B286 |        0 |        1 |    Ъ |    ᲆ |
|      044A |      1C86 |   D18A | E1B286 |        0 |        1 |    ъ |    ᲆ |
|      1C87 |      0462 | E1B287 |   D1A2 |        0 |        1 |    ᲇ |    Ѣ |
|      1C87 |      0463 | E1B287 |   D1A3 |        0 |        1 |    ᲇ |    ѣ |
|      0462 |      1C87 |   D1A2 | E1B287 |        0 |        1 |    Ѣ |    ᲇ |
|      0463 |      1C87 |   D1A3 | E1B287 |        0 |        1 |    ѣ |    ᲇ |
+-----------+-----------+--------+--------+----------+----------+------+------+

Note, toupper comparison considers more pairs as equal than tolower comparison:

SELECT SUM(eq_lower), SUM(eq_upper) FROM t21;

+---------------+---------------+
| SUM(eq_lower) | SUM(eq_upper) |
+---------------+---------------+
|            21 |            96 |
+---------------+---------------+

A more compact table with distinct pairs (the character with a smaller code point is on the left)

SELECT
  HEX(CONVERT(a USING ucs2)) AS unicode_a,
  HEX(CONVERT(b USING ucs2)) AS unicode_b,
  HEX(a), HEX(b),
  BINARY LOWER(a)=LOWER(b) AS eq_lower,
  BINARY UPPER(a)=UPPER(b) AS eq_upper,
  a,b
FROM
(
  SELECT DISTINCT IF(BINARY a<b,a,b) AS a,IF(binary a>=b,a,b) AS b from t21
) d1
ORDER BY eq_lower, unicode_a, unicode_b;

+-----------+-----------+--------+--------+----------+----------+-----+-----+
| unicode_a | unicode_b | HEX(a) | HEX(b) | eq_lower | eq_upper | a   | b   |
+-----------+-----------+--------+--------+----------+----------+-----+-----+
|      0049 |      0131 |     49 |   C4B1 |        0 |        1 |   I |   ı |
|      0053 |      017F |     53 |   C5BF |        0 |        1 |   S |   ſ |
|      00B5 |      039C |   C2B5 |   CE9C |        0 |        1 |   µ |   Μ |
|      0345 |      0399 |   CD85 |   CE99 |        0 |        1 |   ͅ |   Ι |
|      0392 |      03D0 |   CE92 |   CF90 |        0 |        1 |   Β |   ϐ |
|      0395 |      03F5 |   CE95 |   CFB5 |        0 |        1 |   Ε |   ϵ |
|      0398 |      03D1 |   CE98 |   CF91 |        0 |        1 |   Θ |   ϑ |
|      0399 |      1FBE |   CE99 | E1BEBE |        0 |        1 |   Ι |   ι |
|      039A |      03F0 |   CE9A |   CFB0 |        0 |        1 |   Κ |   ϰ |
|      03A0 |      03D6 |   CEA0 |   CF96 |        0 |        1 |   Π |   ϖ |
|      03A1 |      03F1 |   CEA1 |   CFB1 |        0 |        1 |   Ρ |   ϱ |
|      03A3 |      03C2 |   CEA3 |   CF82 |        0 |        1 |   Σ |   ς |
|      03A6 |      03D5 |   CEA6 |   CF95 |        0 |        1 |   Φ |   ϕ |
|      0412 |      1C80 |   D092 | E1B280 |        0 |        1 |   В |   ᲀ |
|      0414 |      1C81 |   D094 | E1B281 |        0 |        1 |   Д |   ᲁ |
|      041E |      1C82 |   D09E | E1B282 |        0 |        1 |   О |   ᲂ |
|      0421 |      1C83 |   D0A1 | E1B283 |        0 |        1 |   С |   ᲃ |
|      0422 |      1C84 |   D0A2 | E1B284 |        0 |        1 |   Т |   ᲄ |
|      042A |      1C86 |   D0AA | E1B286 |        0 |        1 |   Ъ |   ᲆ |
|      0462 |      1C87 |   D1A2 | E1B287 |        0 |        1 |   Ѣ |   ᲇ |
|      1C88 |      A64A | E1B288 | EA998A |        0 |        1 |   ᲈ |   Ꙋ |
|      0049 |      0130 |     49 |   C4B0 |        1 |        0 |   I |   İ |
|      004B |      212A |     4B | E284AA |        1 |        0 |   K |   K |
|      00C5 |      212B |   C385 | E284AB |        1 |        0 |   Å |   Å |
|      00DF |      1E9E |   C39F | E1BA9E |        1 |        0 |   ß |   ẞ |
|      03A9 |      2126 |   CEA9 | E284A6 |        1 |        0 |   Ω |   Ω |
+-----------+-----------+--------+--------+----------+----------+-----+-----+
26 rows in set (0.005 sec)

Cons of toupper vs tolower comparison

  • The tolower variant will compare exactly like strcasecmp() did
  • The toupper variant will compare close to how utf8mb3_general_ci works


 Comments   
Comment by Alexander Barkov [ 2023-11-30 ]

Hello serg,

Please review a patch for this task:

https://github.com/MariaDB/server/commit/02b85f1cf9bbdaf4c0fb65ab4c259eb1d8446e1c

Thanks.

Comment by Alexander Barkov [ 2023-12-28 ]

Hello serg,

This is a new patch version:
https://github.com/MariaDB/server/commit/15c315211befe7da677f9a533f0ceabe21caf9a8

Replied to your review comments by email.

Comment by Marko Mäkelä [ 2024-01-02 ]

I reviewed the InnoDB changes, except those related to MDEV-31953 and mem_pressure, which I assume will not be part of the final version.

For dict_table_schema_check(), it could be better to modify the type of dict_col_meta_t::name and the declarations of table_stats_schema and index_stats_schema. The constants innobase_index_reserve_name_ls, FTS_DOC_ID_COL_NAME_ls and FTS_DOC_ID_INDEX_NAME_ls could be renamed to less obfuscated ones GEN_CLUST_INDEX_ls, FTS_DOC_ID_ls and FTS_DOC_ID_INDEX_ls. Why are those constant defined as static const in header files? Could they be static constexpr instead? Or could the constants be declared in the header file, and defined only once in one compilation unit?

Have you measured the code size and performance impact of defining dict_table_get_col_name_ls() and similar functions inline? As far as I understand, it could be better to emit function calls than to duplicate code that includes loops or conditional branches.

Should the following be removed from client/mysql.cc?

#define cmp_database(A,B) my_strcasecmp_latin1((A), (B))

Should the functions my_strcasecmp_8bit() and my_strcasecmp_latin1() continue to be defined?

Comment by Alexander Barkov [ 2024-01-09 ]

Hi marko! Thanks for your review!
Please find a new version here:
https://github.com/MariaDB/server/commit/0f52062f908a3dbe64f7ba6929584a896117c7f3

I reviewed the InnoDB changes, except those related to MDEV-31953 and mem_pressure, which I assume will not be part of the final version.

True, the mem_pressure part won't be in the final version.

For dict_table_schema_check(), it could be better to modify the type of dict_col_meta_t::name and the declarations of table_stats_schema and index_stats_schema.

Done. It looks better. Thanks for this suggestion.

The constants innobase_index_reserve_name_ls, FTS_DOC_ID_COL_NAME_ls and FTS_DOC_ID_INDEX_NAME_ls could be renamed to less obfuscated ones GEN_CLUST_INDEX_ls, FTS_DOC_ID_ls and FTS_DOC_ID_INDEX_ls.

This rename does not seem to relate to the current task. Can we do it in a separate patch if needed?

Why are those constant defined as static const in header files? Could they be static constexpr instead? Or could the constants be declared in the header file, and defined only once in one compilation unit?

I changed them to static constexpr.

Have you measured the code size and performance impact of defining dict_table_get_col_name_ls() and similar functions inline? As far as I understand, it could be better to emit function calls than to duplicate code that includes loops or conditional branches.

No I have not. dict_table_get_col_name_ls() does not seem to have conditional branches.

Should the following be removed from client/mysql.cc?

#define cmp_database(A,B) my_strcasecmp_latin1((A), (B))

cmp_database() is used in the code. Why remove it?

Should the functions my_strcasecmp_8bit() and my_strcasecmp_latin1() continue to be defined?

Yes, latin1 case insensitive comparison is still used in the code.

Comment by Marko Mäkelä [ 2024-01-09 ]

Right, the function dict_table_get_col_name_ls() may not directly expand to any conditional branches (depending on the build parameters), but its execution will include them, for example strlen(). In fact, dict_table_get_col_name_ls() is duplicating some effort by invoking strlen() one more time, after dict_col_t::name() had already computed it for the returned name.

On my system, even when I enable -O3 build, I observe that dict_table_has_column() is actually invoking dict_table_get_col_name_ls() via a call instruction and not expanding it inline. I think that it would be better to declare and define dict_table_get_col_name_ls() and dict_table_get_v_col_name_ls() without inline. The functions dict_col_t::name() and dict_table_get_v_col_name() had better be refactored, something like this:

static const char *get_name(std::pair<const char*,size_t> n)
{
  if (n.first)
    for (size_t i= 0; i < n.second; i++)
      n.first+= strlen(n.first) + 1;
  return n.first;
}
 
static Lex_ident_column get_name_ls(std::pair<const char*,size_t> n); // to be defined
 
const char *dict_col_t::name_low(const dict_table_t& table) const
{
  // same code as name(), but without the for loop that was refactored to get_name() above
}
const char *dict_col_t::name(const dict_table_t& table) const
{
  return get_name(name_low(table));
}
static const char *dict_table_get_v_col_name_low(const dict_table_t *table, ulint col_nr)
{
  // same as the old code, but without the for loop at the end
}
const char *dict_table_get_v_col_name(const dict_table_t *table, ulint col_nr)
{
  return get_name(dict_table_get_v_col_name_low(table, col_nr));
}
Lex_ident_column dict_col_t::name_ls(const dict_table_t &table) const
{
  return get_name_ls(name_low(table));
}
Lex_ident_column dict_table_get_v_col_name_ls(const dict_table_t * table, ulint col_nr)
{
  return get_name_ls(dict_table_get_v_col_name_low(table, col_nr));
}

The get_name() part is currently duplicated in both functions. You would write get_name_ls() in such a way that the duplicate call to strlen() would be avoided.

Having less inline code defined in common header files should improve the compilation time. Since this is infrequently called code, any runtime speed advantage of inlining should not matter. Reduced code size could lead to faster running and easier-to-debug code.

Comment by Alexander Barkov [ 2024-01-16 ]

Hi marko, thanks for your suggestions.
Please see a new patch here:
https://github.com/MariaDB/server/commits/bb-11.4-bar-MDEV-31340/

Comment by Marko Mäkelä [ 2024-01-16 ]

Thank you, bar, the InnoDB changes look good to me. For any new or rewritten functions, such as dict_table_t::get_name_from_z_list() or dict_col_t::name(), I think that it is best to use the common formatting style instead of the archaic InnoDB one.

Comment by Alexander Barkov [ 2024-01-19 ]

Hi serg,

Please find the latest patch here:

https://github.com/MariaDB/server/commit/be66e975ecf15ae696d2e645da4aaf1a06cf0082

It addresses your and Marko's comments.

Thanks.

Generated at Thu Feb 08 10:23:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.