Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34911

Make SUBSTR(col, 1, n) = const_str sargable

Details

    Description

      Conditions looking like

      SUBSTR(str_col, 1, n) = const_str 
      

      where the prefix of str_col is compared against a constant value can be rewritten as range conditions as

      const_str_low <= str_col <= const_str_high
      

      which will improve the optimizer capabilities of building better execution plan (for example, using indexes on column str_col and applying range optimizations).
      The similar transformation is already implemented for conditions like

      str_col LIKE 'foo%'
      

      aiming for the same goal.

      Similar rewrite is implemented for DATE() and YEAR() function at MDEV-8320, and that approach may be used as a reference.

      Attachments

        Issue Links

          Activity

            ycp Yuchen Pei added a comment - - edited

            Testing at the current main f0961301c81c7f5b009c012c076abc326b203b4a, I'm getting the same "Illegal mix of collations" error, despite it containing the MDEV-35041 patch.

            So I cannot easily compare LIKE with SUBSTR in this case.

            However, if we do explain format=json select * from t where c like 'b%_c%';, then the error disappears and the output has access_type range. So something fishy is happening with the escape here

            ycp Yuchen Pei added a comment - - edited Testing at the current main f0961301c81c7f5b009c012c076abc326b203b4a, I'm getting the same "Illegal mix of collations" error, despite it containing the MDEV-35041 patch. So I cannot easily compare LIKE with SUBSTR in this case. However, if we do explain format=json select * from t where c like 'b%_c%'; , then the error disappears and the output has access_type range. So something fishy is happening with the escape here
            ycp Yuchen Pei added a comment - - edited

            I did some investigation. It failed at escaping, more specifically when appending a backslash:

            // escape_like_characters:
                  if ((ret= my_ci_wc_mb(cs, (my_wc_t) '\\', dst, dst_end)) <= 0)
                    return true; /* No space - no LIKE optimize */
            

            the wc to mb function resolves to my_wc_mb_8bit

            int my_wc_mb_8bit(CHARSET_INFO *cs,my_wc_t wc,
            		  uchar *str,
            		  uchar *end)
            {
              MY_UNI_IDX *idx;
             
              if (str >= end)
                return MY_CS_TOOSMALL;
              
              for (idx=cs->tab_from_uni; idx->tab ; idx++)
              {
                if (idx->from <= wc && idx->to >= wc)
                {
                  str[0]= idx->tab[wc - idx->from];
                  return (!str[0] && wc) ? MY_CS_ILUNI : 1; // returns MY_CS_ILUNI here
                }
              }
              return MY_CS_ILUNI;
            }
            

            The cs->tab_from_uni->tab has the following arrangement:

            (rr) p *cs->tab_from_uni->tab@128
            $51 = "\000\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037 !\"#$%&'()*+,-./0123456789:;<=>?\000ABCDEFGHIJKLMNOPQRSTUVWXYZ\000\000\000\000_\000abcdefghijklmnopqrstuvwxyz\000\000\000\000"
            

            and unlike ascii, the 92nd element is not backslash, but a \000.

            ycp Yuchen Pei added a comment - - edited I did some investigation. It failed at escaping, more specifically when appending a backslash: // escape_like_characters: if ((ret= my_ci_wc_mb(cs, (my_wc_t) '\\' , dst, dst_end)) <= 0) return true ; /* No space - no LIKE optimize */ the wc to mb function resolves to my_wc_mb_8bit int my_wc_mb_8bit(CHARSET_INFO *cs,my_wc_t wc, uchar *str, uchar *end) { MY_UNI_IDX *idx;   if (str >= end) return MY_CS_TOOSMALL; for (idx=cs->tab_from_uni; idx->tab ; idx++) { if (idx->from <= wc && idx->to >= wc) { str[0]= idx->tab[wc - idx->from]; return (!str[0] && wc) ? MY_CS_ILUNI : 1; // returns MY_CS_ILUNI here } } return MY_CS_ILUNI; } The cs->tab_from_uni->tab has the following arrangement: (rr) p *cs->tab_from_uni->tab@128 $51 = "\000\001\002\003\004\005\006\a\b\t\n\v\f\r\016\017\020\021\022\023\024\025\026\027\030\031\032\033\034\035\036\037 !\"#$%&'()*+,-./0123456789:;<=>?\000ABCDEFGHIJKLMNOPQRSTUVWXYZ\000\000\000\000_\000abcdefghijklmnopqrstuvwxyz\000\000\000\000" and unlike ascii, the 92nd element is not backslash, but a \000 .
            ycp Yuchen Pei added a comment - - edited

            The "good" news is that the corresponding

            c like 'b\\%_c%'
            

            failed for the same reason, i.e. my_wc_mb_8bit failing to convert a backslash to the swe7 charset. So I think it is ok and can be regarded as intended behaviour that this case does not result in a sargable substr. psergei

            In below the returned conv is NULL due to the conversion failure of the backslash char, which leads to the error reporting by my_coll_agg_error(args, nargs, fname.str, item_sep);

            // Type_std_attributes::agg_item_set_converter:
              for (i= 0, arg= args; i < nargs; i++, arg+= item_sep)
              {
                Item* conv= (*arg)->safe_charset_converter(thd, coll.collation); // the returned conv == NULL
                if (conv == *arg)
                  continue;
             
                if (!conv)
                {
                  if (nargs >=2 && nargs <= 3)
                  {
                    /* restore the original arguments for better error message */
                    args[0]= safe_args[0];
                    args[item_sep]= safe_args[1];
                  }
                  if (nargs == 1 && single_err)
                  {
                    /*
                      Use *single_err to produce an error message mentioning two
                      collations.
                    */
                    if (single_err->first)
                      my_coll_agg_error(args[0]->collation, single_err->coll, fname.str);
                    else
                      my_coll_agg_error(single_err->coll, args[0]->collation, fname.str);
                  }
                  else
                    my_coll_agg_error(args, nargs, fname.str, item_sep);
                  return TRUE;
                }
            

            Trace:

            my_wc_mb_8bit > my_convert_using_func > my_convert > copy_and_convert > String::copy > String::copy > Item_string::Item_string > Item::const_charset_converter > Item::const_charset_converter > Item_string::safe_charset_converter > Type_std_attributes::agg_item_set_converter > Type_std_attributes::agg_arg_charsets > Type_std_attributes::agg_arg_charsets_for_comparison > Item_func_or_sum::agg_arg_charsets_for_comparison > Item_func_like::fix_length_and_dec > Item_func::fix_fields > Item_func_like::fix_fields > Item::fix_fields_if_needed > Item::fix_fields_if_needed_for_scalar > Item::fix_fields_if_needed_for_bool > setup_conds > setup_without_group > JOIN::prepare > mysql_select

            ycp Yuchen Pei added a comment - - edited The "good" news is that the corresponding c like 'b\\%_c%' failed for the same reason, i.e. my_wc_mb_8bit failing to convert a backslash to the swe7 charset. So I think it is ok and can be regarded as intended behaviour that this case does not result in a sargable substr. psergei In below the returned conv is NULL due to the conversion failure of the backslash char, which leads to the error reporting by my_coll_agg_error(args, nargs, fname.str, item_sep); // Type_std_attributes::agg_item_set_converter: for (i= 0, arg= args; i < nargs; i++, arg+= item_sep) { Item* conv= (*arg)->safe_charset_converter(thd, coll.collation); // the returned conv == NULL if (conv == *arg) continue ;   if (!conv) { if (nargs >=2 && nargs <= 3) { /* restore the original arguments for better error message */ args[0]= safe_args[0]; args[item_sep]= safe_args[1]; } if (nargs == 1 && single_err) { /* Use *single_err to produce an error message mentioning two collations. */ if (single_err->first) my_coll_agg_error(args[0]->collation, single_err->coll, fname.str); else my_coll_agg_error(single_err->coll, args[0]->collation, fname.str); } else my_coll_agg_error(args, nargs, fname.str, item_sep); return TRUE; } Trace: my_wc_mb_8bit > my_convert_using_func > my_convert > copy_and_convert > String::copy > String::copy > Item_string::Item_string > Item::const_charset_converter > Item::const_charset_converter > Item_string::safe_charset_converter > Type_std_attributes::agg_item_set_converter > Type_std_attributes::agg_arg_charsets > Type_std_attributes::agg_arg_charsets_for_comparison > Item_func_or_sum::agg_arg_charsets_for_comparison > Item_func_like::fix_length_and_dec > Item_func::fix_fields > Item_func_like::fix_fields > Item::fix_fields_if_needed > Item::fix_fields_if_needed_for_scalar > Item::fix_fields_if_needed_for_bool > setup_conds > setup_without_group > JOIN::prepare > mysql_select

            Testing done.
            MDEV-35564 is marked as not a bug, so this task is ok to push

            lstartseva Lena Startseva added a comment - Testing done. MDEV-35564 is marked as not a bug, so this task is ok to push
            ycp Yuchen Pei added a comment -

            Thanks for the testing - pushed e021770667851233c8cda34e9360606adfd3ea0c to main.

            btw julien.fritsch, there are two unreleased 11.8.* versions in the dropdown menu of fix version/s, is that right? I picked the first one 11.8.0

            ycp Yuchen Pei added a comment - Thanks for the testing - pushed e021770667851233c8cda34e9360606adfd3ea0c to main. btw julien.fritsch , there are two unreleased 11.8.* versions in the dropdown menu of fix version/s, is that right? I picked the first one 11.8.0

            People

              ycp Yuchen Pei
              oleg.smirnov Oleg Smirnov
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.