Details

    Description

      currently a collation lookup (collation name to collation id) is done (in get_collation_number_internal()) by linearly scanning the array of collations.

      This used to work fine, but the number of collations is constantly growing (MDEV-27009, MDEV-20912) and the linear search is starting to become noticeable.

      It would be a good idea to change it to a hash lookup.

      Same can be done for character sets, even though the number of charsets didn't grow that much.

      Attachments

        Issue Links

          Activity

            Hi Team,
            can I look into this task ?

            jitu1511 Jitesh Chawla added a comment - Hi Team, can I look into this task ?

            of course! please, do, thank you

            serg Sergei Golubchik added a comment - of course! please, do, thank you

            Hi Team,
            I have created the PR for task
            https://github.com/MariaDB/server/pull/3881

            jitu1511 Jitesh Chawla added a comment - Hi Team, I have created the PR for task https://github.com/MariaDB/server/pull/3881

            Benchmark for original version (3x utf32_uca1400_croatian_ai_ci and 3x latin1_bin):

            MariaDB [(none)]> DELIMITER /
            MariaDB [(none)]> FOR i IN 1..1000000 DO
                ->   DO COALESCE(_latin1'' collate latin1_bin);
                -> END FOR;
                -> /
            Query OK, 0 rows affected (2,546 sec)
             
            MariaDB [(none)]> DELIMITER ;
            MariaDB [(none)]> DELIMITER /
            MariaDB [(none)]> FOR i IN 1..1000000 DO
                ->   DO COALESCE(_latin1'' collate latin1_bin);
                -> END FOR;
                -> /
            Query OK, 0 rows affected (2,518 sec)
             
            MariaDB [(none)]> DELIMITER ;
            MariaDB [(none)]> DELIMITER /
            MariaDB [(none)]> FOR i IN 1..1000000 DO
                ->   DO COALESCE(_latin1'' collate latin1_bin);
                -> END FOR;
                -> /
            Query OK, 0 rows affected (2,532 sec)
             
            MariaDB [(none)]> DELIMITER ;
            MariaDB [(none)]> DELIMITER /
            MariaDB [(none)]> FOR i IN 1..1000000 DO
                ->   DO COALESCE(_utf32'' collate uca1400_croatian_ai_ci);
                -> END FOR;
                -> /
            Query OK, 0 rows affected (20,274 sec)
             
            MariaDB [(none)]> DELIMITER ;
            MariaDB [(none)]> DELIMITER /
            MariaDB [(none)]> FOR i IN 1..1000000 DO
                ->   DO COALESCE(_utf32'' collate uca1400_croatian_ai_ci);
                -> END FOR;
                -> /
            Query OK, 0 rows affected (20,502 sec)
             
            MariaDB [(none)]> DELIMITER ;
            MariaDB [(none)]> DELIMITER /
            MariaDB [(none)]> FOR i IN 1..1000000 DO
                ->   DO COALESCE(_utf32'' collate uca1400_croatian_ai_ci);
                -> END FOR;
                -> /
            Query OK, 0 rows affected (20,235 sec)
            

            Benchmark for optimised version (3x utf32_uca1400_croatian_ai_ci and 3x latin1_bin):

            MariaDB [(none)]> DELIMITER /
            MariaDB [(none)]> FOR i IN 1..1000000 DO
                ->   DO COALESCE(_utf32'' collate uca1400_croatian_ai_ci);
                -> END FOR;
                -> /
            Query OK, 0 rows affected (3,360 sec)
             
            MariaDB [(none)]> DELIMITER ;
            MariaDB [(none)]> DELIMITER /
            MariaDB [(none)]> FOR i IN 1..1000000 DO
                ->   DO COALESCE(_utf32'' collate uca1400_croatian_ai_ci);
                -> END FOR;
                -> /
            Query OK, 0 rows affected (3,318 sec)
             
            MariaDB [(none)]> DELIMITER ;
            MariaDB [(none)]> DELIMITER /
            MariaDB [(none)]> FOR i IN 1..1000000 DO
                ->   DO COALESCE(_utf32'' collate uca1400_croatian_ai_ci);
                -> END FOR;
                -> /
            Query OK, 0 rows affected (3,320 sec)
             
            MariaDB [(none)]> DELIMITER ;
            MariaDB [(none)]> DELIMITER /
            MariaDB [(none)]> FOR i IN 1..1000000 DO
                ->   DO COALESCE(_latin1'' collate latin1_bin);
                -> END FOR;
                -> /
            Query OK, 0 rows affected (2,597 sec)
             
            MariaDB [(none)]> DELIMITER ;
            MariaDB [(none)]> DELIMITER /
            MariaDB [(none)]> FOR i IN 1..1000000 DO
                ->   DO COALESCE(_latin1'' collate latin1_bin);
                -> END FOR;
                -> /
            Query OK, 0 rows affected (2,598 sec)
             
            MariaDB [(none)]> DELIMITER ;
            MariaDB [(none)]> DELIMITER /
            MariaDB [(none)]> FOR i IN 1..1000000 DO
                ->   DO COALESCE(_latin1'' collate latin1_bin);
                -> END FOR;
                -> /
            Query OK, 0 rows affected (2,610 sec)
            

            Summary:
            Patched no PFS: croation 3.360 3.318 3.320
            Patched no PFS: latin1 2.597 2.598 2.610

            Unpatched no PFS: croation 20.274 20.502 20.235
            Unpatched no PFS: latin1 2.546 2.518 2.532

            Performance improvement for the utf32_uca1400_croatian_ai_ci sweet spot was 6x.
            Performance decline for the worst case latin1_bin was ~2%, which looks acceptable.

            svoj Sergey Vojtovich added a comment - Benchmark for original version (3x utf32_uca1400_croatian_ai_ci and 3x latin1_bin): MariaDB [(none)]> DELIMITER / MariaDB [(none)]> FOR i IN 1..1000000 DO -> DO COALESCE(_latin1'' collate latin1_bin); -> END FOR; -> / Query OK, 0 rows affected (2,546 sec)   MariaDB [(none)]> DELIMITER ; MariaDB [(none)]> DELIMITER / MariaDB [(none)]> FOR i IN 1..1000000 DO -> DO COALESCE(_latin1'' collate latin1_bin); -> END FOR; -> / Query OK, 0 rows affected (2,518 sec)   MariaDB [(none)]> DELIMITER ; MariaDB [(none)]> DELIMITER / MariaDB [(none)]> FOR i IN 1..1000000 DO -> DO COALESCE(_latin1'' collate latin1_bin); -> END FOR; -> / Query OK, 0 rows affected (2,532 sec)   MariaDB [(none)]> DELIMITER ; MariaDB [(none)]> DELIMITER / MariaDB [(none)]> FOR i IN 1..1000000 DO -> DO COALESCE(_utf32'' collate uca1400_croatian_ai_ci); -> END FOR; -> / Query OK, 0 rows affected (20,274 sec)   MariaDB [(none)]> DELIMITER ; MariaDB [(none)]> DELIMITER / MariaDB [(none)]> FOR i IN 1..1000000 DO -> DO COALESCE(_utf32'' collate uca1400_croatian_ai_ci); -> END FOR; -> / Query OK, 0 rows affected (20,502 sec)   MariaDB [(none)]> DELIMITER ; MariaDB [(none)]> DELIMITER / MariaDB [(none)]> FOR i IN 1..1000000 DO -> DO COALESCE(_utf32'' collate uca1400_croatian_ai_ci); -> END FOR; -> / Query OK, 0 rows affected (20,235 sec) Benchmark for optimised version (3x utf32_uca1400_croatian_ai_ci and 3x latin1_bin): MariaDB [(none)]> DELIMITER / MariaDB [(none)]> FOR i IN 1..1000000 DO -> DO COALESCE(_utf32'' collate uca1400_croatian_ai_ci); -> END FOR; -> / Query OK, 0 rows affected (3,360 sec)   MariaDB [(none)]> DELIMITER ; MariaDB [(none)]> DELIMITER / MariaDB [(none)]> FOR i IN 1..1000000 DO -> DO COALESCE(_utf32'' collate uca1400_croatian_ai_ci); -> END FOR; -> / Query OK, 0 rows affected (3,318 sec)   MariaDB [(none)]> DELIMITER ; MariaDB [(none)]> DELIMITER / MariaDB [(none)]> FOR i IN 1..1000000 DO -> DO COALESCE(_utf32'' collate uca1400_croatian_ai_ci); -> END FOR; -> / Query OK, 0 rows affected (3,320 sec)   MariaDB [(none)]> DELIMITER ; MariaDB [(none)]> DELIMITER / MariaDB [(none)]> FOR i IN 1..1000000 DO -> DO COALESCE(_latin1'' collate latin1_bin); -> END FOR; -> / Query OK, 0 rows affected (2,597 sec)   MariaDB [(none)]> DELIMITER ; MariaDB [(none)]> DELIMITER / MariaDB [(none)]> FOR i IN 1..1000000 DO -> DO COALESCE(_latin1'' collate latin1_bin); -> END FOR; -> / Query OK, 0 rows affected (2,598 sec)   MariaDB [(none)]> DELIMITER ; MariaDB [(none)]> DELIMITER / MariaDB [(none)]> FOR i IN 1..1000000 DO -> DO COALESCE(_latin1'' collate latin1_bin); -> END FOR; -> / Query OK, 0 rows affected (2,610 sec) Summary: Patched no PFS: croation 3.360 3.318 3.320 Patched no PFS: latin1 2.597 2.598 2.610 Unpatched no PFS: croation 20.274 20.502 20.235 Unpatched no PFS: latin1 2.546 2.518 2.532 Performance improvement for the utf32_uca1400_croatian_ai_ci sweet spot was 6x. Performance decline for the worst case latin1_bin was ~2%, which looks acceptable.

            Hi svoj. Looks very good. Thanks!

            bar Alexander Barkov added a comment - Hi svoj . Looks very good. Thanks!
            Roel Roel Van de Paar added a comment - - edited

            No new regressions were found.

            Performance testing on a busy server showed a ~4.5x improvement, with max decline observed in a specific case at ~5% (i.e. -1.05x versus +4.5x).

            In general, OK to push if the small performance declines up to 5% are deemed acceptable.

            That said, perhaps some further improvents are needed to avoid small performance regressions.

            For example:

            CS 12.0.0 c92add291e636c797e6d6ddca605905541b2a441 (Optimized) Build 15/02/2025

            12.0.0-opt>FOR i IN 1..100000000 DO DO COALESCE(_utf8mb4''  collate utf8mb4_uca1400_ai_ci); END FOR; /
            Query OK, 0 rows affected (2 min 54.153 sec)
            

            Versus

            MDEV-35876 CS 12.0.0 c3f21762e9db30c4a5dd1e9ac676dbdafa303d4a (Optimized) Build 19/03/2025

            12.0.0-opt>FOR i IN 1..100000000 DO DO COALESCE(_utf8mb4''  collate utf8mb4_uca1400_ai_ci); END FOR; /
            Query OK, 0 rows affected (3 min 0.602 sec)
            

            Which is a 3.7% increase/performance decline.

            Roel Roel Van de Paar added a comment - - edited No new regressions were found. Performance testing on a busy server showed a ~4.5x improvement, with max decline observed in a specific case at ~5% (i.e. -1.05x versus +4.5x). In general, OK to push if the small performance declines up to 5% are deemed acceptable. That said, perhaps some further improvents are needed to avoid small performance regressions. For example: CS 12.0.0 c92add291e636c797e6d6ddca605905541b2a441 (Optimized) Build 15/02/2025 12.0.0-opt>FOR i IN 1..100000000 DO DO COALESCE(_utf8mb4'' collate utf8mb4_uca1400_ai_ci); END FOR; / Query OK, 0 rows affected (2 min 54.153 sec) Versus MDEV-35876 CS 12.0.0 c3f21762e9db30c4a5dd1e9ac676dbdafa303d4a (Optimized) Build 19/03/2025 12.0.0-opt>FOR i IN 1..100000000 DO DO COALESCE(_utf8mb4'' collate utf8mb4_uca1400_ai_ci); END FOR; / Query OK, 0 rows affected (3 min 0.602 sec) Which is a 3.7% increase/performance decline.

            People

              bar Alexander Barkov
              serg Sergei Golubchik
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.