Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25020

SELECT if there is IN clause with binary UUID in binary form is extremely slow since MariaDB 10.5.9

Details

    Description

      table with primery key binary(16)

      `SELECT from table WHERE key IN (binary form ids, ....)` is extremly slow

      while

      `SELECT from table WHERE key IN ((0x11e6a2ad166fc342abde002590a2360a),
      (0x11e6a2ad16750370aeaf002590a2360a),
      (0x11e6a2ad167a186a9e19002590a2360a),....)`
      is perfectly fast.

      IDs in binary form I mean this: https://tinyurl.com/ycfyftby

      Attachments

        1. log_fast.log
          96 kB
        2. log_slow.log
          6 kB
        3. screenshot-1.png
          screenshot-1.png
          44 kB

        Issue Links

          Activity

            Indeed, the call to update_weight_locally() is the problem.

            update_weight_locally() traverses the whole SEL_ARG tree ("locally" means it doesn't walk across next_key_part edges).

            Lets consider a call

            key_or(tree1, tree2) 
            

            where tree1 has a lot of SEL_ARG nodes,
            and tree2 is a tree that has just one SEL_ARG() node.

            The code in key_or() will use RB-Tree descend to find the location in tree1 where tree2 should be put in (grep for key1->find_range(key2) call). This takes O(LOG(N)). Adding O(N) in update_weight_locally() will indeed cause a big slowdown.

            psergei Sergei Petrunia added a comment - Indeed, the call to update_weight_locally() is the problem. update_weight_locally() traverses the whole SEL_ARG tree ("locally" means it doesn't walk across next_key_part edges). Lets consider a call key_or(tree1, tree2) where tree1 has a lot of SEL_ARG nodes, and tree2 is a tree that has just one SEL_ARG() node. The code in key_or() will use RB-Tree descend to find the location in tree1 where tree2 should be put in (grep for key1->find_range(key2) call). This takes O(LOG(N)). Adding O(N) in update_weight_locally() will indeed cause a big slowdown.

            Note that SEL_ARG::tree_delete() and SEL_ARG::insert() do update the value of SEL_ARG::weight.

            key_or() also messes with SEL_ARG nodes directly, when it joins trees with SEL_ARG objects representing identical or overlapping intervals. I've went through the code and it looks like neither of these direct SEL_ARG modifications change the weight of the tree.

            psergei Sergei Petrunia added a comment - Note that SEL_ARG::tree_delete() and SEL_ARG::insert() do update the value of SEL_ARG::weight. key_or() also messes with SEL_ARG nodes directly, when it joins trees with SEL_ARG objects representing identical or overlapping intervals. I've went through the code and it looks like neither of these direct SEL_ARG modifications change the weight of the tree.

            This patch passes the tests:

            diff --git a/sql/opt_range.cc b/sql/opt_range.cc
            index e3287e1bbea..e469070c8f6 100644
            --- a/sql/opt_range.cc
            +++ b/sql/opt_range.cc
            @@ -10206,6 +10206,9 @@ SEL_ARG *key_and_with_limit(RANGE_OPT_PARAM *param, uint keyno,
                ( 1  <  kp1 <= 2 AND 1 < kp2 < 10 ) OR
                ( 2  <  kp1 < 10 AND 1 < kp2 < 20 ) OR
                ( 10 <= kp1 < 20 AND 4 < kp2 < 20 )
            +
            +   Note: There is no need to manually update key1->weight when joining it with
            +   key2. Every operation we do on SEL_ARG tree key1 maintains its weight.
             */
             static SEL_ARG *
             key_or(RANGE_OPT_PARAM *param, SEL_ARG *key1,SEL_ARG *key2)
            @@ -10828,9 +10831,6 @@ key_or(RANGE_OPT_PARAM *param, SEL_ARG *key1,SEL_ARG *key2)
               }
               key1->use_count++;
             
            -  /* Re-compute the result tree's weight. */
            -  key1->update_weight_locally();
            -
               key1->max_part_no= max_part_no;
               return key1;
             }
            

            The only remaining place that uses update_weight_locally() is and_all_keys(). One can't just remove the update_weight_locally() call from there.

            psergei Sergei Petrunia added a comment - This patch passes the tests: diff --git a/sql/opt_range.cc b/sql/opt_range.cc index e3287e1bbea..e469070c8f6 100644 --- a/sql/opt_range.cc +++ b/sql/opt_range.cc @@ -10206,6 +10206,9 @@ SEL_ARG *key_and_with_limit(RANGE_OPT_PARAM *param, uint keyno, ( 1 < kp1 <= 2 AND 1 < kp2 < 10 ) OR ( 2 < kp1 < 10 AND 1 < kp2 < 20 ) OR ( 10 <= kp1 < 20 AND 4 < kp2 < 20 ) + + Note: There is no need to manually update key1->weight when joining it with + key2. Every operation we do on SEL_ARG tree key1 maintains its weight. */ static SEL_ARG * key_or(RANGE_OPT_PARAM *param, SEL_ARG *key1,SEL_ARG *key2) @@ -10828,9 +10831,6 @@ key_or(RANGE_OPT_PARAM *param, SEL_ARG *key1,SEL_ARG *key2) } key1->use_count++; - /* Re-compute the result tree's weight. */ - key1->update_weight_locally(); - key1->max_part_no= max_part_no; return key1; } The only remaining place that uses update_weight_locally() is and_all_keys(). One can't just remove the update_weight_locally() call from there.
            psergei Sergei Petrunia added a comment - https://github.com/MariaDB/server/commit/d17ed38893fa0213575813975212c8f4227c9972

            Addressed review input by Monty and pushed into 10.5

            psergei Sergei Petrunia added a comment - Addressed review input by Monty and pushed into 10.5

            People

              psergei Sergei Petrunia
              forgie Ján Forgáč
              Votes:
              2 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.