Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20371

Invalid reads at plan refinement stage: join->positions instead of best_positions

    XMLWordPrintable

Details

    Description

      The semi-join optimizer has two phases:

      • When building the join order, advance_sj_state() is invoked after a table has been added to the join prefix.
      • When the join order is picked, fix_semijoin_strategies_for_picked_join_order() is called to make the final fixes in semi-join optimization

      Both function share some code (e.g. calls optimize_wo_join_buffering()). The issue is:

      • the first phase operates on current join prefix: join->positions.
      • the second phase operates on the picked join order, join->best_positions.

      but some code just refers to the join->positions. That array contains the last-considered query plan. (Sometimes, it's the same as the picked query plan. In general case, it is not).

      This can lead to hard-to-catch bugs.

      Let's apply this patch to make the issue visible:

      diff --git a/sql/opt_subselect.cc b/sql/opt_subselect.cc
      index 599642b3a26..f0e74157f9a 100644
      --- a/sql/opt_subselect.cc
      +++ b/sql/opt_subselect.cc
      @@ -2786,6 +2786,18 @@ void advance_sj_state(JOIN *join, table_map remaining_tables, uint idx,
           &pos->dups_weedout_picker,
           NULL,
         };
      +  
      +  MEM_UNDEFINED(&pos->firstmatch_picker, sizeof(pos->firstmatch_picker));
      +  MEM_UNDEFINED(&pos->loosescan_picker, sizeof(pos->loosescan_picker));
      +  MEM_UNDEFINED(&pos->sjmat_picker, sizeof(pos->sjmat_picker));
      +  MEM_UNDEFINED(&pos->dups_weedout_picker, sizeof(pos->dups_weedout_picker));
      +
      +  // The following classes use virtual functions so their memory needs to
      +  // be properly initialized
      +  new (&pos->firstmatch_picker) Firstmatch_picker;
      +  new (&pos->loosescan_picker) LooseScan_picker;
      +  new (&pos->sjmat_picker) Sj_materialization_picker;
      +  new (&pos->dups_weedout_picker) Duplicate_weedout_picker;
       
         if (join->emb_sjm_nest)
         {
      diff --git a/sql/sql_select.cc b/sql/sql_select.cc
      index ff07a7aea89..b01769209fe 100644
      --- a/sql/sql_select.cc
      +++ b/sql/sql_select.cc
      @@ -8146,6 +8146,9 @@ choose_plan(JOIN *join, table_map join_tables)
           if (greedy_search(join, join_tables, search_depth, prune_level,
                             use_cond_selectivity))
             DBUG_RETURN(TRUE);
      +    for (uint k=0; k < join->table_count; k++) {
      +      MEM_UNDEFINED(&join->positions[k], sizeof(join->positions[k]));
      +    }
         }
       
         /* 
      

      Running

      ./mysql-test-run --valgrind --mem --parallel=6 --force main/subselect*test
      

      Produces a lot of valgrind failures.
      Example error

      main.subselect_exists2in                 w2 [ fail ]  Found warnings/errors in server log file!
              Test ended at 2019-08-17 21:45:39
      line
      ==20460== Thread 6:
      ==20460== Use of uninitialised value of size 8
      ==20460==    at 0x95378C: prev_record_reads(st_position*, unsigned int, unsigned long long) (sql_select.cc:9775)
      ==20460==    by 0x94C2DC: best_access_path(JOIN*, st_join_table*, unsigned long long, unsigned int, bool, double, st_position*, st_position*) (sql_select.cc:7303)
      ==20460==    by 0xAD1970: fix_semijoin_strategies_for_picked_join_order(JOIN*) (opt_subselect.cc:3773)
      ==20460==    by 0x95419A: JOIN::get_best_combination() (sql_select.cc:10163)
      ==20460==    by 0x93B8CF: JOIN::optimize_stage2() (sql_select.cc:2241)
      ==20460==    by 0x93B732: JOIN::optimize_inner() (sql_select.cc:2220)
      ==20460==    by 0x9391D9: JOIN::optimize() (sql_select.cc:1563)
      ==20460==    by 0x94417B: mysql_select(THD*, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) (sql_select.cc:4592)
      ==20460==    by 0x93463C: handle_select(THD*, LEX*, select_result*, unsigned long) (sql_select.cc:413)
      ==20460==    by 0x8FD029: execute_sqlcom_select(THD*, TABLE_LIST*) (sql_parse.cc:6356)
      ==20460==    by 0x8F2D64: mysql_execute_command(THD*) (sql_parse.cc:3898)
      ==20460==    by 0x900DBE: mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) (sql_parse.cc:7908)
      ==20460==    by 0x8ECF02: dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) (sql_parse.cc:1842)
      ==20460==    by 0x8EB647: do_command(THD*) (sql_parse.cc:1359)
      ==20460==    by 0xA6B307: do_handle_one_connection(CONNECT*) (sql_connect.cc:1404)
      ==20460==    by 0xA6B056: handle_one_connection (sql_connect.cc:1306)
      

      Attachments

        Issue Links

          Activity

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.