Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6696

Improve UNION with LIMIT execution

    XMLWordPrintable

Details

    • Task
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 11.1(EOL)
    • None
    • None

    Description

      Hi guys, I have some queries like this:

      SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;$;294007/3nfe-1%" 
      UNION 
      SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;294007/3nfe-1%" 
      UNION 
      SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "j;1;%;%294007/3nfe-1%" 
      UNION 
      SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;$;%294007/3nfe-1%" 
      UNION 
      SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;294007/3nfe-1%" 
      UNION 
      SELECT doc_hash_key FROM mov_documentos WHERE doc_hash_key LIKE "%;%;%;%294007/3nfe-1%"
      LIMIT 1

      It's a search about a document number inside all documents (ok i could use others index but that's not the case), the problem is....
      The first SELECT query (with % at end of the query) return really fast (since it can use left side of % to search inside index), but executing others queries are very slow, the question is

      could we optimize this kind of query, since we don't have ORDER BY and we have a LIMIT?

      for example, (i don't know how is the code today and if this could break a shard strategy)
      1) execute the first query,
      2) if this return more than LIMIT + OFFSET rows, stop all unions,
      3) if not go to next query
      3.1) check that SELECT execution order (1,2,3,4,5,...) is necessary since we need deterministic results in UNION order or we will need a ORDER BY and loose this logic
      3.2) maybe if we want something faster we could do an: estimate cost of each query and execute queries using estimate cost order (i think this could be an optimizer option, instead of an SQL_ORDER_BLABLA, this avoid sql standard problems and we could use this in some old systems, if it fail we get back with the standard query order), or extend the first UNION clausule with UNION NO_ORDER, or something like it, i'm considering here two problems... in this example the first query is the faster... and return the right data, if i change the first with the last, maybe i could get the wrong result, in this case the (3.2) optimization couldn't be used, but think about a server that the optimizer stay on and we forget to set it off (like query cache = on and we don't use SQL_NO_CACHE) an NO_ORDER or something like it could be nice to ensure that union will be executed in the right order even with optimizer option turned on just an idea... maybe we could use other better idea, for example of SQL_CACHE, only the first SELECT allow the SQL_CACHE others SELECT don't allow it, something like it is nice
      (select sql_cache sql_no_union_order * form blabal union select * from blabla2 ....)
      with this we could mix in a single complex query both union order and non union order
      select (sql_union order) , (sql_no_union_order) from (any other query)

      the explain of this query today is (count( * )=1.826.326 rows):

      id select_type table type possible_keys key key_len ref rows Extra execution time without flush cache (seconds)
      1 PRIMARY mov_documentos range PRIMARY PRIMARY 77   1 Using where; Using index 0.031 <- fast
      2 UNION mov_documentos range PRIMARY PRIMARY 77   628434 Using where; Using index 6.755 <- slow (maybe a outlier result)
      3 UNION mov_documentos range PRIMARY PRIMARY 77   628434 Using where; Using index 0.671 <- relative fast
      4 UNION mov_documentos index   PRIMARY 77   1826326 Using where; Using index 1.467 <- slow
      5 UNION mov_documentos index   PRIMARY 77   1826326 Using where; Using index 1.373 <- slow
      6 UNION mov_documentos index   PRIMARY 77   1826326 Using where; Using index 1.528 <- slow
        UNION RESULT <union1,2,3,4,5,6> or <union optimized 1,4,2,5,3,6> ALL           (Union limit optimized) Total query time: 5.258

      if we execute a diferent union since we can optimize based in query cost, we could show at last row (7): something like "<union 1,4,2,5,3,6>" and at final Extra column could put "<Union order optimizer>"

      in this case we have a optimization from 5.258 seconds to 0.031 seconds ~= 170x faster

      well that's all =]

      Attachments

        Activity

          People

            psergei Sergei Petrunia
            rspadim roberto spadim
            Votes:
            2 Vote for this issue
            Watchers:
            6 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.