[MDEV-6878] Use of uninitialized saved_primary_key in Mrr_ordered_index_reader::resume_read() Created: 2014-10-16  Updated: 2014-10-28  Resolved: 2014-10-16

Status: Closed
Project: MariaDB Server
Component/s: Optimizer
Affects Version/s: 10.0.12
Fix Version/s: 10.0.15

Type: Bug Priority: Critical
Reporter: Jeremy Cole Assignee: Sergei Petrunia
Resolution: Fixed Votes: 0
Labels: ds-mrr, optimizer


 Description   

I believe that Mrr_ordered_index_reader::resume_read() is using saved_primary_key uninitialized if the current read hasn't been interrupted. This manifests itself in our case with the following case:

SET SESSION
  optimizer_switch="mrr=on,mrr_sort_keys=on",
  join_cache_level=8; 
 
SELECT DISTINCT a.x FROM a LEFT JOIN b ON (a.x = b.x) WHERE ...

Our query produces an EXPLAIN containing:

"Using join buffer (flat, BKAH join); Key-ordered Rowid-ordered scan"

With the result of the query we get many warnings like:

"Warning 1366 Incorrect string value: '\xE6S\x01\x00\x00\x00...' for column 'y' at row 93"

This is because the buffer being purported to be column 'y' is uninitialized or random data, and doesn't pass as valid UTF-8.

Unfortunately I don't have a minimal test case for this yet, but I am able to reproduce it with sensitive data locally and can prove that the below patch fixes the symptom.

Patch follows:

--- sql/multi_range_read.cc        2014-07-10 23:01:30.000000000 -0700
+++ sql/multi_range_read.cc        2014-10-15 19:34:56.000000000 -0700
@@ -467,6 +467,9 @@ void Mrr_ordered_index_reader::position(
 
 void Mrr_ordered_index_reader::resume_read()
 {
+  if (have_saved_rowid == FALSE)
+    return;
+
   TABLE *table= file->get_table();
   KEY *used_index= &table->key_info[file->active_index];
   key_restore(table->record[0], saved_key_tuple, 
@@ -477,6 +480,8 @@ void Mrr_ordered_index_reader::resume_re
                 &table->key_info[table->s->primary_key],
                 table->key_info[table->s->primary_key].key_length);
   }
+
+  have_saved_rowid= FALSE;
 }



 Comments   
Comment by Sergei Petrunia [ 2014-10-16 ]

I could create a small testcase:

create table ten(a int);
insert into ten values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
 
create table t14 (
  pk varchar(32) character set utf8 primary key,
  kp1 char(32) not null,
  col1 varchar(32),
  key (kp1)
);
 
insert into t14 
select 
  concat('pk-', 1000 +A.a),
  concat('kp1-', 1000 +A.a),
  concat('val-', 1000 +A.a)
from test.ten A ;
 
create table t16 as select kp1 as a from t14;
 
set join_cache_level=8;
set optimizer_switch='mrr=on,mrr_sort_keys=on';
select * from t16 straight_join t14 force index(kp1) where t14.kp1=t16.a;
show warnings;
+---------+------+---------------------------------------------------------------------------------+
| Level   | Code | Message                                                                         |
+---------+------+---------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xA5\xA5\xA5\xA5\xA5\xA5...' for column 'pk' at row 11 |
+---------+------+---------------------------------------------------------------------------------+

explain select * from t16 straight_join t14 force index(kp1) where t14.kp1=t16.a;
+------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref       | rows | Extra                                                               |
+------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+
|    1 | SIMPLE      | t16   | ALL  | NULL          | NULL | NULL    | NULL      |   10 |                                                                     |
|    1 | SIMPLE      | t14   | ref  | kp1           | kp1  | 32      | j12.t16.a |    1 | Using join buffer (flat, BKAH join); Key-ordered Rowid-ordered scan |
+------+-------------+-------+------+---------------+------+---------+-----------+------+---------------------------------------------------------------------+

Comment by Sergei Petrunia [ 2014-10-16 ]

I'm a little concerned about checking have_saved_rowid. It is defined as

  /* TRUE <=> saved_rowid has the last saved rowid */
  bool have_saved_rowid;

which I suspect is not exactly the same as "whether saved_key_tuple and saved_primary_key point to valid tuples that resume_read() should restore"

Comment by Sergei Petrunia [ 2014-10-16 ]

The patch is here:

http://lists.askmonty.org/pipermail/commits/2014-October/006793.html

@ jeremycole, could you test your case with this patch?

Comment by Jeremy Cole [ 2014-10-16 ]

psergey: I was wondering about that. I thought about adding a new boolean was_interrupted but it seemed like (to my uninformed self) that have_saved_rowid was already for that purpose. It looks like your patch also corrects the symptom we were seeing with production data/query as well, so the patch looks fine to me.

I am a bit concerned that none of this fairly important optimizer code (it would seem most of MRR) has any meaningful test coverage, such that even my potentially incorrect patch doesn't cause any tests to fail, and so that these bugs can exist up to now.

Comment by Jeremy Cole [ 2014-10-16 ]

All tests pass here, with the new patch.

Comment by Sergei Petrunia [ 2014-10-16 ]

(didn't see the comments)

Comment by Sergei Petrunia [ 2014-10-16 ]

jeremycole, there was some RQG-based testing before (the interrupt_read()/resume_read() functions were added as fix for https://bugs.launchpad.net/maria/+bug/671340, before 5.3/5.5 was GA). MP AB's jira shows there was more RQG testing done after 5.3/5.5 GA, also.

I agree with your argument that mtr tests do not have much coverage - to fully test code around interrupt/resume_read one needs to run out of buffer space and/or lookup keys at some particular points, etc. This makes it hard to test with MTR - one needs to run loads of similar queries.

I've asked elenst to run RQG tests again on fix for this bug.

Generated at Thu Feb 08 07:15:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.