Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36307

group by handler missing constant fields when selecting from a view

Details

    Description

      We use spider as an example. The test spider/bugfix.mdev_26345 fails with --view-protocol.

      A simpler testcase, without using view protocol, could be:

      # ... (setup)
      create table t2 (a int, b int, PRIMARY KEY (a, b));
      create table t1 (a int, b int, PRIMARY KEY (a, b)) ENGINE=Spider
      COMMENT='WRAPPER "mysql", srv "srv",TABLE "t2"';
      insert into t1 VALUES (1,4), (1,2), (2,11);
      create view v as SELECT MIN(b), a FROM t1 WHERE a=1;
      select * from v;
      # ... (cleanup)
      

      The expected output is

      MIN(b)	a
      2	1
      

      but the actual one is

      MIN(b)	a
      NULL	1
      

      The reason is that an Item_temptable_field is created for MIN(b), but the field itself is not added to the actual temp table because it is constant.

      More specifically, when a group by handler is successfully created (in JOIN::make_aggr_tables_info), we first call create_tmp_table() to create a temp table for the GBH, but we skip adding the field to the table (in the table->field linked list):

      // Create_tmp_table::add_fields > create_tmp_table > JOIN::make_aggr_tables_info:
            if (item->const_item() &&
                param->hidden_field_count < (fieldnr + 1))
              continue; // We don't have to store this
      

      We then call change_to_use_tmp_fields(), in which we create an Item_temptable_field for each of the select field (later become the join->fields Item list after the call join->set_items_ref_array(join->items1) in do_select()):

      // change_to_use_tmp_fields > JOIN::make_aggr_tables_info:
          else if ((field= item->get_tmp_table_field()))
          {
            if (item->type() == Item::SUM_FUNC_ITEM && field->table->group)
              item_field= ((Item_sum*) item)->result_item(thd, field);
            else
              item_field= (Item *) new (thd->mem_root) Item_temptable_field(thd, field);
      

      Therefore GBH has no way to store the query result to the (skipped) table field, even if it has the result. Later when writing results from join->fields to the record (select_unit::fill_record) then from the record to a tmp row (select_unit::write_record) that will be sent as results (select_send::send_data), the underlying Field has no value thus NULL.

      Why does it work with view but without GBH? Without GBH, no temp table is created for the GBH to store results to, and no Item_temptable_fields is created either. So the mismatch between the temp table fields and join->fields does not happen.

      Why does it work without view but with GBH? Without view, TMP_TABLE_ALL_COLUMNS is not set (see the code block below), causing item->get_tmp_table_field() to return NULL, so no Item_temptable_field will be created for MIN(b), and the item will remain the Item_sum_min, whose constant value was computed during optimization, so it will not be NULL without GBH storing any results in the (skipped) field.

      // mysql_derived_prepare:
        if (!(derived->table) &&
            derived->derived_result->create_result_table(thd, &unit->types, FALSE,
                                                         (first_select->options |
                                                         thd->variables.option_bits |
                                                         TMP_TABLE_ALL_COLUMNS),
                                                         &derived->alias,
                                                         FALSE, FALSE, FALSE,
                                                         0))
      

      This is a general issue that could happen to any GBH. However, in the mariadb codebase, only spider and the sequence engine use GBH, and I am not sure if a case can be produced for the sequence engine due to restrictions of sequence tables.

      Attachments

        Issue Links

          Activity

            ycp Yuchen Pei added a comment -

            The same issue happens for derived tables, since TMP_TABLE_ALL_COLUMNS is set too:

            select * from (SELECT MIN(b), a FROM t1 WHERE a=1) as v;

            outputs the same wrong results (NULL, 1).

            One can easily check if a TABLE_LIST belongs to a view by checking that its belong_to_view field is not NULL. However this is not the case for derived tables as belong_to_derived is only true if the derived table has type DTYPE_MERGE, which is not the case in the example above.

            ycp Yuchen Pei added a comment - The same issue happens for derived tables, since TMP_TABLE_ALL_COLUMNS is set too: select * from ( SELECT MIN (b), a FROM t1 WHERE a=1) as v; outputs the same wrong results (NULL, 1) . One can easily check if a TABLE_LIST belongs to a view by checking that its belong_to_view field is not NULL. However this is not the case for derived tables as belong_to_derived is only true if the derived table has type DTYPE_MERGE, which is not the case in the example above.
            ycp Yuchen Pei added a comment -

            Another idea is: we could use a sledgehammer use the check the NULLness of thd->derived_tables to determine whether to create a GBH. This fixes the coverage of non-merged derived tables. We could even do this at the sql layer.

            ycp Yuchen Pei added a comment - Another idea is: we could use a sledgehammer use the check the NULLness of thd->derived_tables to determine whether to create a GBH. This fixes the coverage of non-merged derived tables. We could even do this at the sql layer.

            People

              ycp Yuchen Pei
              ycp Yuchen Pei
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.