Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-39927

Optimizing for distributed engines

    XMLWordPrintable

Details

    • New Feature
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • None
    • None
    • None

    Description

      Optimizing for distributed engines

      Assuming engine L1 (local table) and D1 (distributed table) optimizer
      decided that we will use L1 before D1 and one can use a key to access
      D1 from L1.

      The optimizer will use the following algorithm:

      • Read one row from L1
      • Ask engine to read all matching rows based on key access from D1
      • Loop over all the returned rows from D1
      • Check if the row combination L1:D1 matches the where clause
      • If yes, send the row combination forwards (to join with next
        table, send it the the end user etc.

      This means that for every row in L1 there will be a remote access to D1,
      which can be very slow.

      Another problem is that pushdown conditions to remote engines cannot
      include fields from L1 or any other previous table as the values
      in the pushdown changes for every row combination.

      The task is to create a new read-by-key interface that reduced the number
      of remote access and also optimizes pushdown conditions for remote tables

      Note that if D1 would be the first table, this performance problem does
      not exists. In this case the engine would scan D1 and the engine would
      stream the values to the server and do local key lookups in L1 (which
      is fast).

      ---------------------

      Suggested solution:

      When the SQL layer notices that the next table is a distributed table
      it will push all previous row combination into a buffer (like we do
      with hash join). When buffer is full it will do one request for
      all the buffered row combinations to D1.

      New handler interfaces:

      /*
        Prepare a multi-read by key for a distributed engine
       
        index        Index number
        index_parts  How may index parts are used
        param        Parameters used by the pushdown condition,
                     in left-to-right order
        params       Number of params in the pushdown condition
      */
       
      int handler::index_multi_key_read_with_pushdown_prepare(
          uint index, uint index_parts, Field *param, uint params,
          uint64 record_length, Item *pushdown);
      

      /*
         Start a read of of multiple rows
       
         row_data           pointer to record with key & params
         rows               number of rows in row_data
       
         Row data is record (like for other handler operations) consisting
         of data for 'index_parts' key field (exactly like for a index_read())
         followed by data for each param.  The layout is exactly as for
         a normal record, except that null map is after the first key.
         The 'param' will initially point to the first record in row_data.
      */
      int handler::index_multi_key_read_params(uchar **row_data, uint32 rows);
      

      /*
         Read a row from the result of index_mult_key_read()
       
         record   Record for read data
         offset   offset (starting from 0) for the first matching row in row_data
      */
       
      int handler::index_multi_key_read(uchar *record, uint32 *offset);
      

      The call sequence for the handle will be:

      • index_multi_key_read_with_pushdown_prepare()
      • until no more data in L1
      • index_multi_key_read_params()
      • call index_multi_key_read() until HA_ERR_END_OF_FILE
      • index_end()

      Pushdown condition and params:

      • The pushdown condition will contain constants, function calls,
        parameters (pointers to elements in the param list) and fields in D1.
      • One can move the param fields to other rows in row_data with
        the field::move_field_offset().

      Notes:

      • The SQL layer will sort row data in key_parts + param data order.
      • The SQL layer will remove all duplicates of key + param combinations
        before sending to the engine.
      • The above will reduce calls for example if there is a lot of
        request to the same row in D1.

      As the pushdown can contain things like BETWEEN, any returned row from
      D1 may match several rows in L1. The SQL layer has to check all returned
      rows from D1 with the corresponding row L1 with the same key combination.
      This is why having the 'offset' in index_multi_key_read() would be useful
      as it speeds up finding from where to start the check.
      The SQL Layer should also check the WHERE clause for all row combinations
      as the engine may have ignored checking some functions it does not support.

      Attachments

        Activity

          People

            nikitamalyavin Nikita Malyavin
            monty Michael Widenius
            Votes:
            0 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:

              Time Tracking

                Estimated:
                Original Estimate - 15d
                15d
                Remaining:
                Remaining Estimate - 15d
                15d
                Logged:
                Time Spent - Not Specified
                Not Specified

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.