Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6878

Use of uninitialized saved_primary_key in Mrr_ordered_index_reader::resume_read()

Details

    Description

      I believe that Mrr_ordered_index_reader::resume_read() is using saved_primary_key uninitialized if the current read hasn't been interrupted. This manifests itself in our case with the following case:

      SET SESSION
        optimizer_switch="mrr=on,mrr_sort_keys=on",
        join_cache_level=8; 
       
      SELECT DISTINCT a.x FROM a LEFT JOIN b ON (a.x = b.x) WHERE ...

      Our query produces an EXPLAIN containing:

      "Using join buffer (flat, BKAH join); Key-ordered Rowid-ordered scan"

      With the result of the query we get many warnings like:

      "Warning 1366 Incorrect string value: '\xE6S\x01\x00\x00\x00...' for column 'y' at row 93"

      This is because the buffer being purported to be column 'y' is uninitialized or random data, and doesn't pass as valid UTF-8.

      Unfortunately I don't have a minimal test case for this yet, but I am able to reproduce it with sensitive data locally and can prove that the below patch fixes the symptom.

      Patch follows:

      --- sql/multi_range_read.cc        2014-07-10 23:01:30.000000000 -0700
      +++ sql/multi_range_read.cc        2014-10-15 19:34:56.000000000 -0700
      @@ -467,6 +467,9 @@ void Mrr_ordered_index_reader::position(
       
       void Mrr_ordered_index_reader::resume_read()
       {
      +  if (have_saved_rowid == FALSE)
      +    return;
      +
         TABLE *table= file->get_table();
         KEY *used_index= &table->key_info[file->active_index];
         key_restore(table->record[0], saved_key_tuple, 
      @@ -477,6 +480,8 @@ void Mrr_ordered_index_reader::resume_re
                       &table->key_info[table->s->primary_key],
                       table->key_info[table->s->primary_key].key_length);
         }
      +
      +  have_saved_rowid= FALSE;
       }

      Attachments

        Activity

          psergei Sergei Petrunia added a comment - - edited

          I could create a small testcase:

          create table ten(a int);
          insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
           
          create table t14 (
            pk varchar(32) character set utf8 primary key,
            kp1 char(32) not null,
            col1 varchar(32),
            key (kp1)
          );
           
          insert into t14 
          select 
            concat('pk-', 1000 +A.a),
            concat('kp1-', 1000 +A.a),
            concat('val-', 1000 +A.a)
          from test.ten A ;
           
          create table t16 as select kp1 as a from t14;
           
          set join_cache_level=8;
          set optimizer_switch='mrr=on,mrr_sort_keys=on';
          select * from t16 straight_join t14 force index(kp1) where t14.kp1=t16.a;
          show warnings;
          +---------+------+---------------------------------------------------------------------------------+
          | Level   | Code | Message                                                                         |
          +---------+------+---------------------------------------------------------------------------------+
          | Warning | 1366 | Incorrect string value: '\xA5\xA5\xA5\xA5\xA5\xA5...' for column 'pk' at row 11 |
          +---------+------+---------------------------------------------------------------------------------+

          explain select * from t16 straight_join t14 force index(kp1) where t14.kp1=t16.a;
          +------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+
          | id   | select_type | table | type | possible_keys | key  | key_len | ref       | rows | Extra                                                               |
          +------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+
          |    1 | SIMPLE      | t16   | ALL  | NULL          | NULL | NULL    | NULL      |   10 |                                                                     |
          |    1 | SIMPLE      | t14   | ref  | kp1           | kp1  | 32      | j12.t16.a |    1 | Using join buffer (flat, BKAH join); Key-ordered Rowid-ordered scan |
          +------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+

          psergei Sergei Petrunia added a comment - - edited I could create a small testcase: create table ten(a int); insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);   create table t14 ( pk varchar(32) character set utf8 primary key, kp1 char(32) not null, col1 varchar(32), key (kp1) );   insert into t14 select concat('pk-', 1000 +A.a), concat('kp1-', 1000 +A.a), concat('val-', 1000 +A.a) from test.ten A ;   create table t16 as select kp1 as a from t14;   set join_cache_level=8; set optimizer_switch='mrr=on,mrr_sort_keys=on'; select * from t16 straight_join t14 force index(kp1) where t14.kp1=t16.a; show warnings; +---------+------+---------------------------------------------------------------------------------+ | Level | Code | Message | +---------+------+---------------------------------------------------------------------------------+ | Warning | 1366 | Incorrect string value: '\xA5\xA5\xA5\xA5\xA5\xA5...' for column 'pk' at row 11 | +---------+------+---------------------------------------------------------------------------------+ explain select * from t16 straight_join t14 force index(kp1) where t14.kp1=t16.a; +------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+ | 1 | SIMPLE | t16 | ALL | NULL | NULL | NULL | NULL | 10 | | | 1 | SIMPLE | t14 | ref | kp1 | kp1 | 32 | j12.t16.a | 1 | Using join buffer (flat, BKAH join); Key-ordered Rowid-ordered scan | +------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+

          I'm a little concerned about checking have_saved_rowid. It is defined as

            /* TRUE <=> saved_rowid has the last saved rowid */
            bool have_saved_rowid;

          which I suspect is not exactly the same as "whether saved_key_tuple and saved_primary_key point to valid tuples that resume_read() should restore"

          psergei Sergei Petrunia added a comment - I'm a little concerned about checking have_saved_rowid. It is defined as /* TRUE <=> saved_rowid has the last saved rowid */ bool have_saved_rowid; which I suspect is not exactly the same as "whether saved_key_tuple and saved_primary_key point to valid tuples that resume_read() should restore"

          The patch is here:

          http://lists.askmonty.org/pipermail/commits/2014-October/006793.html

          @ jeremycole, could you test your case with this patch?

          psergei Sergei Petrunia added a comment - The patch is here: http://lists.askmonty.org/pipermail/commits/2014-October/006793.html @ jeremycole , could you test your case with this patch?
          jeremycole Jeremy Cole added a comment -

          psergey: I was wondering about that. I thought about adding a new boolean was_interrupted but it seemed like (to my uninformed self) that have_saved_rowid was already for that purpose. It looks like your patch also corrects the symptom we were seeing with production data/query as well, so the patch looks fine to me.

          I am a bit concerned that none of this fairly important optimizer code (it would seem most of MRR) has any meaningful test coverage, such that even my potentially incorrect patch doesn't cause any tests to fail, and so that these bugs can exist up to now.

          jeremycole Jeremy Cole added a comment - psergey : I was wondering about that. I thought about adding a new boolean was_interrupted but it seemed like (to my uninformed self) that have_saved_rowid was already for that purpose. It looks like your patch also corrects the symptom we were seeing with production data/query as well, so the patch looks fine to me. I am a bit concerned that none of this fairly important optimizer code (it would seem most of MRR) has any meaningful test coverage, such that even my potentially incorrect patch doesn't cause any tests to fail, and so that these bugs can exist up to now.
          jeremycole Jeremy Cole added a comment -

          All tests pass here, with the new patch.

          jeremycole Jeremy Cole added a comment - All tests pass here, with the new patch.

          (didn't see the comments)

          psergei Sergei Petrunia added a comment - (didn't see the comments)

          jeremycole, there was some RQG-based testing before (the interrupt_read()/resume_read() functions were added as fix for https://bugs.launchpad.net/maria/+bug/671340, before 5.3/5.5 was GA). MP AB's jira shows there was more RQG testing done after 5.3/5.5 GA, also.

          I agree with your argument that mtr tests do not have much coverage - to fully test code around interrupt/resume_read one needs to run out of buffer space and/or lookup keys at some particular points, etc. This makes it hard to test with MTR - one needs to run loads of similar queries.

          I've asked elenst to run RQG tests again on fix for this bug.

          psergei Sergei Petrunia added a comment - jeremycole , there was some RQG-based testing before (the interrupt_read()/resume_read() functions were added as fix for https://bugs.launchpad.net/maria/+bug/671340 , before 5.3/5.5 was GA). MP AB's jira shows there was more RQG testing done after 5.3/5.5 GA, also. I agree with your argument that mtr tests do not have much coverage - to fully test code around interrupt/resume_read one needs to run out of buffer space and/or lookup keys at some particular points, etc. This makes it hard to test with MTR - one needs to run loads of similar queries. I've asked elenst to run RQG tests again on fix for this bug.

          People

            psergei Sergei Petrunia
            jeremycole Jeremy Cole
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.