Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25020

SELECT if there is IN clause with binary UUID in binary form is extremely slow since MariaDB 10.5.9

Details

    Description

      table with primery key binary(16)

      `SELECT from table WHERE key IN (binary form ids, ....)` is extremly slow

      while

      `SELECT from table WHERE key IN ((0x11e6a2ad166fc342abde002590a2360a),
      (0x11e6a2ad16750370aeaf002590a2360a),
      (0x11e6a2ad167a186a9e19002590a2360a),....)`
      is perfectly fast.

      IDs in binary form I mean this: https://tinyurl.com/ycfyftby

      Attachments

        1. log_fast.log
          96 kB
        2. log_slow.log
          6 kB
        3. screenshot-1.png
          screenshot-1.png
          44 kB

        Issue Links

          Activity

            forgie Ján Forgáč created issue -
            danblack Daniel Black made changes -
            Field Original Value New Value
            Assignee Sergei Petrunia [ psergey ]
            elenst Elena Stepanova made changes -
            Fix Version/s 10.5 [ 23123 ]
            psergei Sergei Petrunia made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            I was unable to reproduce. Can you share more details:

            • What collation is used in your connection to the database? ( select collation('abc'); will show it)
            • EXPLAINs for both queries
            • ideally, do this:

              set optimizer_trace=1;
              {run the query}
              select * from information_schema.optimizer_trace;
              

              and share the output with us.

            psergei Sergei Petrunia added a comment - I was unable to reproduce. Can you share more details: What collation is used in your connection to the database? ( select collation('abc'); will show it) EXPLAINs for both queries ideally, do this: set optimizer_trace=1; {run the query} select * from information_schema.optimizer_trace; and share the output with us.
            psergei Sergei Petrunia made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            psergei Sergei Petrunia made changes -
            Component/s Optimizer [ 10200 ]
            psergei Sergei Petrunia made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]

            Closing as "Cannot reproduce" for now. Feel free to re-open when more data is available.

            psergei Sergei Petrunia added a comment - Closing as "Cannot reproduce" for now. Feel free to re-open when more data is available.
            psergei Sergei Petrunia made changes -
            Fix Version/s N/A [ 14700 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Cannot Reproduce [ 5 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            forgie Ján Forgáč made changes -
            Attachment log_fast.log [ 58022 ]
            forgie Ján Forgáč made changes -
            Attachment log_slow.log [ 58023 ]
            forgie Ján Forgáč added a comment - - edited

            Rapid performance drop is obvious if in IN (....) are thousands or ten of thousands items.

            • 'collation("abc")' => 'utf8mb4_general_ci'
            • fast trace log: log_fast.log – not many iems in IN query so log is shorter
            • slow trace log: log_slow.log – slightly different query, but uses UUIDS in IN clause + not many iems in IN query so log is shorter + binary UUIDs are not properly represented in the text log ...

            Are these outputs enough for identifiing the problem?

            I think I do not have privilege to reopen this issue. Do I?

            forgie Ján Forgáč added a comment - - edited Rapid performance drop is obvious if in IN (....) are thousands or ten of thousands items. 'collation("abc")' => 'utf8mb4_general_ci' fast trace log: log_fast.log – not many iems in IN query so log is shorter slow trace log: log_slow.log – slightly different query, but uses UUIDS in IN clause + not many iems in IN query so log is shorter + binary UUIDs are not properly represented in the text log ... Are these outputs enough for identifiing the problem? I think I do not have privilege to reopen this issue. Do I?

            Re-opening

            psergei Sergei Petrunia added a comment - Re-opening
            psergei Sergei Petrunia made changes -
            Resolution Cannot Reproduce [ 5 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s N/A [ 14700 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 119650 ] MariaDB v4 [ 143681 ]
            allen.lee@mariadb.com Allen Lee (Inactive) made changes -
            Affects Version/s 10.5.13 [ 26026 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Summary SELECT if there is IN clouse with binary UUID in binary form is extremly slow sinc MariaDB 10.5.9 SELECT if there is IN clause with binary UUID in binary form is extremely slow since MariaDB 10.5.9
            psergei Sergei Petrunia made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            psergei Sergei Petrunia added a comment - - edited

            Analyzing the data in the provided files.
            10.1_test_result, 10.5_test_result:

                    10.5-reg        10.5-binary     10.1-reg    10.1-binary
                10  0.396300086     0.165200131     2.3458       2.0851
               100  2.345800193     1.408499915     11.937       0.1236
              1000  49.52399843     11.65679959     134.73       0.9971
             10000  488.1753836     113.1179959     1209.7       9.2002
            100000  771.0653742     179.5780646     1694         14.288
            

            Query plans:

                   10.5-regular  10.5-binary  10.1-regular  10.1-binary:
                10  range        range         range        range
               100  range        range         range        range
              1000  subquery     range         range        range
             10000  subquery     range         range        range
            100000  subquery     range         range        range
            

            psergei Sergei Petrunia added a comment - - edited Analyzing the data in the provided files. 10.1_test_result, 10.5_test_result: 10.5-reg 10.5-binary 10.1-reg 10.1-binary 10 0.396300086 0.165200131 2.3458 2.0851 100 2.345800193 1.408499915 11.937 0.1236 1000 49.52399843 11.65679959 134.73 0.9971 10000 488.1753836 113.1179959 1209.7 9.2002 100000 771.0653742 179.5780646 1694 14.288 Query plans: 10.5-regular 10.5-binary 10.1-regular 10.1-binary: 10 range range range range 100 range range range range 1000 subquery range range range 10000 subquery range range range 100000 subquery range range range
            psergei Sergei Petrunia made changes -
            Attachment screenshot-1.png [ 63716 ]

            Putting that on a chart:

            That is, for lookups on binary field, 10.1 was faster than 10.2.
            For non-binary (regular) field lookup, 10.5 is faster than 10.2. Note that 10.5 switches to using a subquery plan.

            psergei Sergei Petrunia added a comment - Putting that on a chart: That is, for lookups on binary field, 10.1 was faster than 10.2. For non-binary (regular) field lookup, 10.5 is faster than 10.2. Note that 10.5 switches to using a subquery plan.
            psergei Sergei Petrunia made changes -
            ccalender Chris Calender (Inactive) made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            weijiang Wei Jiang added a comment -

            I have observed the slowness even in 10.5.8 but the symptoms wasn't there on 10.5.4.

            weijiang Wei Jiang added a comment - I have observed the slowness even in 10.5.8 but the symptoms wasn't there on 10.5.4.
            maxmether Max Mether made changes -
            Priority Critical [ 2 ] Blocker [ 1 ]

            explain takes much more time in 10.5 as compared to 10.1 (140 sec vs 0.1 sec, ymmv).

            99.66% of the time 10.5 spends in SEL_ARG::update_weight_locally()

            With

            +  if (param->thd->variables.optimizer_max_sel_arg_weight)
                key1->update_weight_locally();
            

            explain in 10.5 runs with the same speed as in 10.1

            serg Sergei Golubchik added a comment - explain takes much more time in 10.5 as compared to 10.1 (140 sec vs 0.1 sec, ymmv). 99.66% of the time 10.5 spends in SEL_ARG::update_weight_locally() With + if (param->thd->variables.optimizer_max_sel_arg_weight) key1->update_weight_locally(); explain in 10.5 runs with the same speed as in 10.1

            Indeed, the call to update_weight_locally() is the problem.

            update_weight_locally() traverses the whole SEL_ARG tree ("locally" means it doesn't walk across next_key_part edges).

            Lets consider a call

            key_or(tree1, tree2) 
            

            where tree1 has a lot of SEL_ARG nodes,
            and tree2 is a tree that has just one SEL_ARG() node.

            The code in key_or() will use RB-Tree descend to find the location in tree1 where tree2 should be put in (grep for key1->find_range(key2) call). This takes O(LOG(N)). Adding O(N) in update_weight_locally() will indeed cause a big slowdown.

            psergei Sergei Petrunia added a comment - Indeed, the call to update_weight_locally() is the problem. update_weight_locally() traverses the whole SEL_ARG tree ("locally" means it doesn't walk across next_key_part edges). Lets consider a call key_or(tree1, tree2) where tree1 has a lot of SEL_ARG nodes, and tree2 is a tree that has just one SEL_ARG() node. The code in key_or() will use RB-Tree descend to find the location in tree1 where tree2 should be put in (grep for key1->find_range(key2) call). This takes O(LOG(N)). Adding O(N) in update_weight_locally() will indeed cause a big slowdown.

            Note that SEL_ARG::tree_delete() and SEL_ARG::insert() do update the value of SEL_ARG::weight.

            key_or() also messes with SEL_ARG nodes directly, when it joins trees with SEL_ARG objects representing identical or overlapping intervals. I've went through the code and it looks like neither of these direct SEL_ARG modifications change the weight of the tree.

            psergei Sergei Petrunia added a comment - Note that SEL_ARG::tree_delete() and SEL_ARG::insert() do update the value of SEL_ARG::weight. key_or() also messes with SEL_ARG nodes directly, when it joins trees with SEL_ARG objects representing identical or overlapping intervals. I've went through the code and it looks like neither of these direct SEL_ARG modifications change the weight of the tree.

            This patch passes the tests:

            diff --git a/sql/opt_range.cc b/sql/opt_range.cc
            index e3287e1bbea..e469070c8f6 100644
            --- a/sql/opt_range.cc
            +++ b/sql/opt_range.cc
            @@ -10206,6 +10206,9 @@ SEL_ARG *key_and_with_limit(RANGE_OPT_PARAM *param, uint keyno,
                ( 1  <  kp1 <= 2 AND 1 < kp2 < 10 ) OR
                ( 2  <  kp1 < 10 AND 1 < kp2 < 20 ) OR
                ( 10 <= kp1 < 20 AND 4 < kp2 < 20 )
            +
            +   Note: There is no need to manually update key1->weight when joining it with
            +   key2. Every operation we do on SEL_ARG tree key1 maintains its weight.
             */
             static SEL_ARG *
             key_or(RANGE_OPT_PARAM *param, SEL_ARG *key1,SEL_ARG *key2)
            @@ -10828,9 +10831,6 @@ key_or(RANGE_OPT_PARAM *param, SEL_ARG *key1,SEL_ARG *key2)
               }
               key1->use_count++;
             
            -  /* Re-compute the result tree's weight. */
            -  key1->update_weight_locally();
            -
               key1->max_part_no= max_part_no;
               return key1;
             }
            

            The only remaining place that uses update_weight_locally() is and_all_keys(). One can't just remove the update_weight_locally() call from there.

            psergei Sergei Petrunia added a comment - This patch passes the tests: diff --git a/sql/opt_range.cc b/sql/opt_range.cc index e3287e1bbea..e469070c8f6 100644 --- a/sql/opt_range.cc +++ b/sql/opt_range.cc @@ -10206,6 +10206,9 @@ SEL_ARG *key_and_with_limit(RANGE_OPT_PARAM *param, uint keyno, ( 1 < kp1 <= 2 AND 1 < kp2 < 10 ) OR ( 2 < kp1 < 10 AND 1 < kp2 < 20 ) OR ( 10 <= kp1 < 20 AND 4 < kp2 < 20 ) + + Note: There is no need to manually update key1->weight when joining it with + key2. Every operation we do on SEL_ARG tree key1 maintains its weight. */ static SEL_ARG * key_or(RANGE_OPT_PARAM *param, SEL_ARG *key1,SEL_ARG *key2) @@ -10828,9 +10831,6 @@ key_or(RANGE_OPT_PARAM *param, SEL_ARG *key1,SEL_ARG *key2) } key1->use_count++; - /* Re-compute the result tree's weight. */ - key1->update_weight_locally(); - key1->max_part_no= max_part_no; return key1; } The only remaining place that uses update_weight_locally() is and_all_keys(). One can't just remove the update_weight_locally() call from there.
            psergei Sergei Petrunia added a comment - https://github.com/MariaDB/server/commit/d17ed38893fa0213575813975212c8f4227c9972

            Addressed review input by Monty and pushed into 10.5

            psergei Sergei Petrunia added a comment - Addressed review input by Monty and pushed into 10.5
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.5.17 [ 27509 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.6.9 [ 27507 ]
            Fix Version/s 10.7.5 [ 27505 ]
            Fix Version/s 10.8.4 [ 27503 ]
            elenst Elena Stepanova made changes -
            psergei Sergei Petrunia made changes -
            Joriz Joris de Leeuw made changes -
            Joriz Joris de Leeuw made changes -
            forgie Ján Forgáč made changes -
            Comment [ This problem is still present in 10.6.12 and 10.6.15 as well as in 10.5.22.
            I didn't tested more versions. ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 167602

            People

              psergei Sergei Petrunia
              forgie Ján Forgáč
              Votes:
              2 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.