Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26013

distinct not work properly in some cases for spider tables

Details

    Description

      To reproduce:

      SET @@session.spider_same_server_link = ON; 
       
      GRANT ALL PRIVILEGES ON *.* TO 'spinne'@'127.0.0.1'  IDENTIFIED BY 'Widow2021!';
       
      CREATE SERVER data1
      FOREIGN DATA WRAPPER mysql
      OPTIONS(
      HOST '127.0.0.1',
      DATABASE 'test',
      USER 'spinne',
      PASSWORD 'Widow2021!'
      ); 
       
      CREATE TABLE IF NOT EXISTS `spider_tab` (
        `col1` varchar(10) NOT NULL,
        `col2` varchar(10) NOT NULL,
        PRIMARY KEY (`col1`,`col2`)
      ) ENGINE=SPIDER DEFAULT CHARSET=utf8 COMMENT='wrapper "mariadb", table "spider_data_tab"'
      PARTITION BY LIST COLUMNS(`col2`)
      (PARTITION `pt1` DEFAULT COMMENT = 'srv "data1"' ENGINE = SPIDER);
       
      CREATE TABLE IF NOT EXISTS `spider_data_tab` (
        `col1` varchar(10) NOT NULL,
        `col2` varchar(10) NOT NULL,
        `col3` datetime NOT NULL DEFAULT '0001-01-01 00:00:00',
        PRIMARY KEY (`col1`,`col2`, `col3`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8
       PARTITION BY RANGE  COLUMNS(`col3`)
      (PARTITION `pmax` VALUES LESS THAN (MAXVALUE) ENGINE = InnoDB);
       
      insert into `spider_data_tab` VALUES
      ("aaaaaaaaa","00100","2020-05-05 00:00:00")
      ,("bbbbbbbbb","00999","2020-05-06 00:00:00")
      ,("bbbbbbbbb","00100","2020-05-06 00:00:00")
      ,("ccccccccc","00999","2020-05-06 00:00:00")
      ,("ddddddddd","00999","2020-05-06 00:00:00")
      ,("eeeeeeeee","00999","2020-05-06 00:00:00")
      ,("fffffffff","00999","2020-05-06 00:00:00")
      ,("ggggggggg","00999","2020-05-06 00:00:00")
      ,("hhhhhhhhh","00999","2020-05-06 00:00:00");
       
       SELECT distinct col2 FROM spider_tab WHERE col2="00999";
       
      SELECT distinct col2 FROM spider_data_tab WHERE col2="00999";
      

      Attachments

        Activity

          I dug the code of 10.4 HEAD (fe9a0bf). There are a few assignments to JOIN::select_distinct and most of them substitute FALSE to the variable. The variable can be set to TRUE only by JOIN::init()

            void init(THD *thd_arg, List<Item> &fields_arg, ulonglong select_options_arg,
                 select_result *result_arg)
            {
              ...
              select_distinct= MY_TEST(select_options & SELECT_DISTINCT);
              ...
            }
          

          nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - I dug the code of 10.4 HEAD (fe9a0bf). There are a few assignments to JOIN::select_distinct and most of them substitute FALSE to the variable. The variable can be set to TRUE only by JOIN::init() void init(THD *thd_arg, List<Item> &fields_arg, ulonglong select_options_arg, select_result *result_arg) { ... select_distinct= MY_TEST(select_options & SELECT_DISTINCT); ... }

          In fact, JOIN::select_distinct is TRUE at JOIN::init(). The variable is set to FALSE in JOIN::optimize_stage2().

          (gdb) watch -location select_distinct
          Hardware watchpoint 2: -location select_distinct
          (gdb) c
          Continuing.
          ...
          Thread 27 "mysqld" hit Hardware watchpoint 2: -location select_distinct
           
          Old value = true
          New value = false
          JOIN::optimize_stage2 (this=0x7fb798016258) at /home/vagrant/repo/mariadb-server/sql/sql_select.cc:2737
          

          2735	/*  Change DISTINCT to GROUP BY */
          2736	select_distinct= 0;
          2737	no_order= !order;
          2738	if (all_order_fields_used
          

          nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - In fact, JOIN::select_distinct is TRUE at JOIN::init() . The variable is set to FALSE in JOIN::optimize_stage2() . (gdb) watch -location select_distinct Hardware watchpoint 2: -location select_distinct (gdb) c Continuing. ... Thread 27 "mysqld" hit Hardware watchpoint 2: -location select_distinct   Old value = true New value = false JOIN::optimize_stage2 (this=0x7fb798016258) at /home/vagrant/repo/mariadb-server/sql/sql_select.cc:2737 2735 /* Change DISTINCT to GROUP BY */ 2736 select_distinct= 0; 2737 no_order= !order; 2738 if (all_order_fields_used

          I think I finally understand the problem. The function JOIN::optimize_stage2() optimizes away DISTINCT (i.e., sets select_distinct = 0) and sets no_order = 1 but nullifies group_list (sql_select.cc:2765). This could result in a wrong handling of DISTINCT clause in group_by_handlers. So, a possible fix would be the following:

          diff --git a/sql/sql_select.cc b/sql/sql_select.cc
          index d8116b20cbe..72b5be0f63e 100644
          --- a/sql/sql_select.cc
          +++ b/sql/sql_select.cc
          @@ -3275,7 +3275,7 @@ bool JOIN::make_aggr_tables_info()
               if (ht && ht->create_group_by)
               {
                 /* Check if the storage engine can intercept the query */
          -      Query query= {&all_fields, select_distinct, tables_list, conds,
          +      Query query= {&all_fields, (select_distinct || (no_order && group_optimized_away)), tables_list, conds,
                               group_list, order ? order : group_list, having};
                 group_by_handler *gbh= ht->create_group_by(thd, &query);
          

          nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - - edited I think I finally understand the problem. The function JOIN::optimize_stage2() optimizes away DISTINCT (i.e., sets select_distinct = 0 ) and sets no_order = 1 but nullifies group_list (sql_select.cc:2765). This could result in a wrong handling of DISTINCT clause in group_by_handlers. So, a possible fix would be the following: diff --git a/sql/sql_select.cc b/sql/sql_select.cc index d8116b20cbe..72b5be0f63e 100644 --- a/sql/sql_select.cc +++ b/sql/sql_select.cc @@ -3275,7 +3275,7 @@ bool JOIN::make_aggr_tables_info() if (ht && ht->create_group_by) { /* Check if the storage engine can intercept the query */ - Query query= {&all_fields, select_distinct, tables_list, conds, + Query query= {&all_fields, (select_distinct || (no_order && group_optimized_away)), tables_list, conds, group_list, order ? order : group_list, having}; group_by_handler *gbh= ht->create_group_by(thd, &query);
          nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - serg I pushed my fix. Please review: https://github.com/MariaDB/server/commit/5e16a49f5d95488bf3297cd51a1a0772612a2e35

          serg I replied to the last email from you. Please take another look at https://github.com/MariaDB/server/commit/ee59187906d87a71106080d1c5728f32cdf59382

          nayuta-yanagisawa Nayuta Yanagisawa (Inactive) added a comment - - edited serg I replied to the last email from you. Please take another look at https://github.com/MariaDB/server/commit/ee59187906d87a71106080d1c5728f32cdf59382

          People

            nayuta-yanagisawa Nayuta Yanagisawa (Inactive)
            Richard Richard Stracke
            Votes:
            2 Vote for this issue
            Watchers:
            9 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.