Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20250

Implement early partition pruning before statement start

    XMLWordPrintable

Details

    Description

      Extend MDEV-11084 to also not open automatically excluded partitions

      In particular, INSERT, DELETE, UPDATE and SELECT should not open any partitions that can be excluded by parsing the WHERE clause.

      Contents

      1. Details about what is [not] opened/locked
      1.1. open_tables() call
      1.2. lock_tables (actually statement start) 
      1.3. Statement end: close_thread_tables
       
      2. The goal of this MDEV
      3. Solutions
      3.1 Variant 1: Split open_tables and lock_tables, Early Pruning
      3.1.1 Opening-and-locking problem
      3.1.2 Why two pruning steps
      3.1.3 Early pruning and UPDATE/DELETE
      3.2 Variant 2: Modified locking semantics of ha_partition::lock.
       
      4. Background - How does MySQL do pruning
      4.1 Pruning for SELECTs
      4.2 Pruning for INSERTs
      4.3 Pruning for DELETEs 
      

      1. Details about what is [not] opened/locked

      The steps at query processing are:

      1.1. open_tables() call

      This will create handler object and call handler::open.
      Note that this is done only if the table was not found in the Table Cache. Most queries are expected to hit the table cache and will not do this.

      Example stack traces:

        #1  0x000055555670182b in innobase_create_handler 
        #2  0x00005555563167db in get_new_handler 
        #3  0x00005555566092a6 in ha_partition::create_handlers 
        #4  0x000055555660a013 in ha_partition::setup_engine_array 
        #5  0x000055555660a573 in ha_partition::get_from_handler_file 
        #6  0x0000555556601fc2 in ha_partition::initialize_partition 
        #7  0x0000555556601411 in partition_create_handler 
        #8  0x00005555563167db in get_new_handler 
        #9  0x00005555560b5d9d in TABLE_SHARE::init_from_binary_frm_image 
        #10 0x00005555560b128f in open_table_def 
        #11 0x0000555556236d6a in tdc_acquire_share 
        #12 0x0000555555e85f2e in open_table 
      

        #0  ha_innobase::open 
        #1  0x000055555631eb35 in handler::ha_open 
        #2  0x000055555661a5e4 in ha_partition::open_read_partitions 
        #3  0x000055555660bed2 in ha_partition::open 
        #4  0x000055555631eb35 in handler::ha_open 
        #5  0x00005555560bde90 in open_table_from_share 
        #6  0x0000555555e8656e in open_table 
      

      1.2. lock_tables (actually statement start)

      Historically this was "locking" for MyISAM. Now it's more about informing the storage engine that the statement is about to start. The calls made are handler::store_lock and handler::external_lock. Inside a transaction or statement, handler::start_stmt is called instead.
      Example stack traces:

        #0  ha_innobase::store_lock 
        #1  0x000055555660d248 in ha_partition::store_lock 
        #2  0x00005555564968e4 in get_lock_data 
        #3  0x00005555564950a0 in mysql_lock_tables 
        #4  0x0000555555e8e5d8 in lock_tables 
      

        #0  ha_innobase::external_lock 
        #1  0x000055555632b7f0 in handler::ha_external_lock 
        #2  0x000055555660cd13 in ha_partition::external_lock 
        #3  0x000055555632b7f0 in handler::ha_external_lock 
        #4  0x0000555556495592 in lock_external 
        #5  0x0000555556495218 in mysql_lock_tables 
        #6  0x00005555564950d4 in mysql_lock_tables 
        #7  0x0000555555e8e5d8 in lock_tables 
      

      1.3. Statement end: close_thread_tables

      This will make handler::reset call

        #0  ha_innobase::reset 
        #1  0x000055555632bcb9 in handler::ha_reset 
        #2  0x000055555661af11 in ha_partition::reset 
        #3  0x000055555632bcb9 in handler::ha_reset 
        #4  0x0000555555e83f6e in close_thread_table 
        #5  0x0000555555e83bf3 in close_thread_tables 
      

      2. The goal of this MDEV

      It is about not doing the Step#1.2. lock_tables (actually statement start) for partitions that the statement will not access. TODO: and also Step 3?

      This way, a statement will not "touch" the partitions that can be pruned away by partition pruning. The only exception is when the statement is opening the table (as opposed to getting it from the open table cache which is what should normally happen).

      3. Solutions

      3.1 Variant 1: Split open_tables and lock_tables, Early Pruning

      The code is supposed to work as follows:

      • open_tables (see above: this will not have O(#partitions) overhead)
      • Name Resolution
      • NEW: Early Partition Pruning.
      • "locking tables" (See above, it's not actually locking). Here we would benefit from Early Partition Pruning.
      • Late Partition Pruning (as was done before).

      (See section 3.1.2 below for discussion why we need both Early Partition Pruning and Late Partition Pruning)

      3.1.1 Opening-and-locking problem

      The problem is that currently for some statements, the code works as follows:

      • Step 1: open_tables + "Locking tables"
      • Step 2: Name Resolution.
      • Step 3: Late Partition pruning

      See also details in the comment: https://jira.mariadb.org/browse/MDEV-20250?focusedCommentId=290037&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-290037

      3.1.2 Why two pruning steps

      In most cases, one can do Partition Pruning without having to access the query's table data. This kind of pruning can be done in "Early Partition Pruning" mentioned above.

      However, there are examples where Partition Pruning on tableX requires reading rows from tableY (So, tableY must be "locked"). See the comments, reference to two-partition-prunings.sql.

      Note that we will need to

      • Make sure Early Partition Pruning doesn't attempt to read tables.
      • Also do Late Partition Pruning without this restriction.
      • Perhaps, find a way to quickly check if it makes any sense to do Late Partition Pruning and skip it if no (which is more common)?

      3.1.3 Early pruning and UPDATE/DELETE

      A question: if MDEV-28883 has already provided the right code structure for UPDATE/DELETE statements, perhaps we should implement the "do-not-open" logic for these statements, first? (It is not already implemented, is it?)

      3.2 Variant 2: Modified locking semantics of ha_partition::lock.

      A possible alternative

      • Leave all current opening/"locking" code in place.
      • However, all calls made in Section 1.2 lock_tables to ha_partition ( store_lock, external_lock, etc) will not affect the underlying partitions.
      • Add a new handler->lock_partitions() call which is made in the optimizer right after partition pruning is done.
      • ha_partition::lock_partitions() will made appropriate calls to ha_partition's child table handlers.
        • After that, possibly "unlock" the ha_partition itself (TODO does this have any impact except for MyISAM?).

      Will the above be able to cause e.g. deadlocks? It looks like no, but this needs to be elaborated on.

      3.2.1 Where to put handler::lock_partitions call

      See the comments, example with a reference to two-partition-prunings.sql.

      This means that the call to handler::lock_partitions() should be made on a per-SELECT basis.
      In the above example, it will be made for the subquery, then we will do partition pruning for the top-level select, and then we will make it for the top-level select.


      4. Background - How does MySQL do pruning

      Wanted to check how this was implemented in MySQL.

      4.1 Pruning for SELECTs

      Debugging a basic SELECT example, I can see that prune_partitions is called twice: in Query_block::prepare and in Query_block::optimize.

      create table ten(a int primary key);
      insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
      create table t1 (a int, b int) partition by hash(a) partitions 10;
      explain select * from t1 where a in (1,2,3);
      

      #0  prune_partitions (thd=0x7fff38001040, table=0x7fff380d0650, query_block=0x7fff38174170, pprune_cond=0x7fff38175408) at /home/psergey/dev-git/mysql-9.1/sql/range_optimizer/partition_pruning.cc:262
      #1  0x0000555558f127e8 in Query_block::apply_local_transforms (this=0x7fff38174170, thd=0x7fff38001040, prune=true) at /home/psergey/dev-git/mysql-9.1/sql/sql_resolver.cc:879
      #2  0x0000555558f11bc9 in Query_block::prepare (this=0x7fff38174170, thd=0x7fff38001040, insert_field_list=0x0) at /home/psergey/dev-git/mysql-9.1/sql/sql_resolver.cc:642
      #3  0x0000555558f48fed in Sql_cmd_select::prepare_inner (this=0x7fff38acb878, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_select.cc:673
      #4  0x0000555558f489d4 in Sql_cmd_dml::prepare (this=0x7fff38acb878, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_select.cc:587
      #5  0x0000555558f492b7 in Sql_cmd_dml::execute (this=0x7fff38acb878, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_select.cc:728
      #6  0x0000555558eb8acb in mysql_execute_command (thd=0x7fff38001040, first_level=true) at /home/psergey/dev-git/mysql-9.1/sql/sql_parse.cc:4669
      #7  0x0000555558ebafb5 in dispatch_sql_command (thd=0x7fff38001040, parser_state=0x7fffd02658f0, is_retry=false) at /home/psergey/dev-git/mysql-9.1/sql/sql_parse.cc:5331
      

      #0  prune_partitions (thd=0x7fff38001040, table=0x7fff380d0650, query_block=0x7fff38ac9980, pprune_cond=0x7fff38acac18) at /home/psergey/dev-git/mysql-9.1/sql/range_optimizer/partition_pruning.cc:262
      #1  0x0000555558e7b98a in JOIN::prune_table_partitions (this=0x7fff38acda28) at /home/psergey/dev-git/mysql-9.1/sql/sql_optimizer.cc:2848
      #2  0x0000555558e73f35 in JOIN::optimize (this=0x7fff38acda28, finalize_access_paths=true) at /home/psergey/dev-git/mysql-9.1/sql/sql_optimizer.cc:506
      #3  0x0000555558f4cef2 in Query_block::optimize (this=0x7fff38ac9980, thd=0x7fff38001040, finalize_access_paths=true) at /home/psergey/dev-git/mysql-9.1/sql/sql_select.cc:2109
      #4  0x0000555559009a23 in Query_expression::optimize (this=0x7fff38ac98c0, thd=0x7fff38001040, materialize_destination=0x0, finalize_access_paths=true) at /home/psergey/dev-git/mysql-9.1/sql/sql_union.cc:508
      #5  0x0000555558f4a70c in Sql_cmd_dml::execute_inner (this=0x7fff38acd658, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_select.cc:1094
      #6  0x0000555558f4971b in Sql_cmd_dml::execute (this=0x7fff38acd658, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_select.cc:792
      #7  0x0000555558eb8acb in mysql_execute_command (thd=0x7fff38001040, first_level=true) at /home/psergey/dev-git/mysql-9.1/sql/sql_parse.cc:4669
      #8  0x0000555558ebafb5 in dispatch_sql_command (thd=0x7fff38001040, parser_state=0x7fffd02658f0, is_retry=false) at /home/psergey/dev-git/mysql-9.1/sql/sql_parse.cc:5331
      

      4.2 Pruning for INSERTs

      insert into t1 values (11,11), (12,12);
      

      This is done in Sql_cmd_dml::prepare():

      #0  Sql_cmd_insert_base::prepare_inner (this=0x7fff38ad4a78, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_insert.cc:1473
      #1  0x0000555558f489d4 in Sql_cmd_dml::prepare (this=0x7fff38ad4a78, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_select.cc:587
      #2  0x0000555558f492b7 in Sql_cmd_dml::execute (this=0x7fff38ad4a78, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_select.cc:728
      #3  0x0000555558eb5524 in mysql_execute_command (thd=0x7fff38001040, first_level=true) at /home/psergey/dev-git/mysql-9.1/sql/sql_parse.cc:3656
      #4  0x0000555558ebafb5 in dispatch_sql_command (thd=0x7fff38001040, parser_state=0x7fffd02658f0, is_retry=false) at /home/psergey/dev-git/mysql-9.
      1/sql/sql_parse.cc:5331
      

      4.3 Pruning for DELETEs

      explain delete from t1 where a in (1,2);
      

      Again, there are two calls to prune_partitions:

      #0  prune_partitions (thd=0x7fff38001040, table=0x7fff380d0650, query_block=0x7fff38ad2a80, pprune_cond=0x7fff38ad3578) at /home/psergey/dev-git/mysql-9.1/sql/range_optimizer/partition_pruning.cc:252
      #1  0x0000555558f127e8 in Query_block::apply_local_transforms (this=0x7fff38ad2a80, thd=0x7fff38001040, prune=true) at /home/psergey/dev-git/mysql-9.1/sql/sql_resolver.cc:879
      #2  0x00005555596d1378 in Sql_cmd_delete::prepare_inner (this=0x7fff38ad4b18, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_delete.cc:887
      #3  0x0000555558f489d4 in Sql_cmd_dml::prepare (this=0x7fff38ad4b18, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_select.cc:587
      #4  0x0000555558f492b7 in Sql_cmd_dml::execute (this=0x7fff38ad4b18, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_select.cc:728
      #5  0x0000555558eb5524 in mysql_execute_command (thd=0x7fff38001040, first_level=true) at /home/psergey/dev-git/mysql-9.1/sql/sql_parse.cc:3656
      

      and

      #0  prune_partitions (thd=0x7fff38001040, table=0x7fff380d0650, query_block=0x7fff38ad2a80, pprune_cond=0x7fff38ad3578) at /home/psergey/dev-git/mysql-9.1/sql/range_optimizer/partition_pruning.cc:262
      #1  0x00005555596cf1dc in Sql_cmd_delete::delete_from_single_table (this=0x7fff38ad4b18, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_delete.cc:393
      #2  0x00005555596d1609 in Sql_cmd_delete::execute_inner (this=0x7fff38ad4b18, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_delete.cc:913
      #3  0x0000555558f4971b in Sql_cmd_dml::execute (this=0x7fff38ad4b18, thd=0x7fff38001040) at /home/psergey/dev-git/mysql-9.1/sql/sql_select.cc:792
      #4  0x0000555558eb5524 in mysql_execute_command (thd=0x7fff38001040, first_level=true) at /home/psergey/dev-git/mysql-9.1/sql/sql_parse.cc:3656
      

      Attachments

        Issue Links

          Activity

            People

              holyfoot Alexey Botchkov
              serg Sergei Golubchik
              Votes:
              1 Vote for this issue
              Watchers:
              18 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.