Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-5177

ha_partition and innodb index intersection produce fewer rows (MySQL Bug#70703)

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • 5.5.33a
    • 5.5.37
    • None
    • None

    Description

      Even after fix for MDEV-5112 (and MySQL Bug#69581), a query that runs index_merge intersection (union is probably affected too) over partitioned table, may return wrong query result.

      Testcase:

      create table t11 (
        a int not null,
        b int not null,
        pk int not null,
        primary key (pk),
        key(a),
        key(b)
      ) partition by hash(pk) partitions 10;
       
      insert into t11 values (1,2,4); -- both
      insert into t11 values (1,0,17);  -- left
      insert into t11 values (1,2,25);   -- both
       
      insert into t11 values (10,20,122); 
      insert into t11 values (10,20,123);
       
      -- Now, fill in some data so that the optimizer choses index_merge
      create table t12 (a int);
      insert into t12 values (0),(1),(2),(3),(4),(5),(6),(7),(8),(9);
       
      insert into t11 select 1,2, 200 + A.a + 10*B.a + 100*C.a from t12 A, t12 B, t12 C;
       
      insert into t11 select 10+A.a + 10*B.a + 100*C.a + 1000*D.a,
                             10+A.a + 10*B.a + 100*C.a  + 1000*D.a, 
                             2000 + A.a + 10*B.a + 100*C.a + 1000*D.a
                             from t12 A, t12 B, t12 C ,t12 D;
       
      -- This should show index_merge, using intersect
      explain select * from t11 where a=1 and b=2 and  pk between 1 and 999999 ;
      -- 794 rows in output
      select * from t11 where a=1 and b=2 and  pk between 1 and 999 ;
      -- 802 rows in output
      select * from t11 ignore index(a,b)  where a=1 and b=2 and  pk between 1 and 999 ;

      Attachments

        Issue Links

          Activity

            psergei Sergei Petrunia created issue -
            agadal Arnaud Gadal added a comment -

            It seems it's ok with MySQL 5.6.16, I obtain 802 rows in either cases.
            MariaDB 5.5.35 still returns 794 and 802.

            agadal Arnaud Gadal added a comment - It seems it's ok with MySQL 5.6.16, I obtain 802 rows in either cases. MariaDB 5.5.35 still returns 794 and 802.
            psergei Sergei Petrunia made changes -
            Field Original Value New Value
            Fix Version/s 5.5.36 [ 14600 ]

            Repeatable on MySQL 5.6.15 though.

            psergei Sergei Petrunia added a comment - Repeatable on MySQL 5.6.15 though.

            But not on mysql-5.5.35. It seems the bug is somewhere in the code common between mariadb-5.5 and mysql-5.6.

            psergei Sergei Petrunia added a comment - But not on mysql-5.5.35. It seems the bug is somewhere in the code common between mariadb-5.5 and mysql-5.6.

            MDEV-5555 might be the same issue as this one.

            psergei Sergei Petrunia added a comment - MDEV-5555 might be the same issue as this one.
            elenst Elena Stepanova added a comment - - edited

            The bug was fixed in mysql/5.6 tree by the following revision:

            revno: 5592 [merge]
            revision-id: aditya.a@oracle.com-20131105140125-tecvsaelewn37kla
            parent: tor.didriksen@oracle.com-20131105112641-n71zwdnmx9q8kq8f
            parent: aditya.a@oracle.com-20131105135526-agm7nxh5mbf6s6fn
            committer: Aditya A <aditya.a@oracle.com>
            branch nick: mysql-5.6
            timestamp: Tue 2013-11-05 19:31:25 +0530
            message:
              Bug#17588348: INDEX MERGE USED ON PARTITIONED TABLE 
                             CAN RETURN WRONG RESULT SET
              
              [Merge from 5.5]
              -
                ------------------------------------------------------------
                revno: 2875.437.288
                revision-id: aditya.a@oracle.com-20131105135526-agm7nxh5mbf6s6fn
                parent: tor.didriksen@oracle.com-20131105090257-p7pfzgf7wgs8eau1
                committer: Aditya A <aditya.a@oracle.com>
                branch nick: mysql-5.5
                timestamp: Tue 2013-11-05 19:25:26 +0530
                message:
                  Bug#17588348: INDEX MERGE USED ON PARTITIONED TABLE 
                                 CAN RETURN WRONG RESULT SET
                  
                  PROBLEM
                  -------
                  In ha_partition::cmp_ref() we were only calling the 
                  underlying cmp_ref() of storage engine if the records
                  are in the same partiton,else we sort by partition and
                  returns the result.But the index merge intersect 
                  algorithm expects first to sort by row-id first and 
                  then by partition id.
                  
                  FIX
                  ---
                  Compare the refernces first using storage engine cmp_ref
                  and then if references are equal(only happens if 
                  non clustered index is used) then sort it by partition id.
                  
                  [Approved by Mattiasj #rb3755]
                  -

            But it does not fix the test case from MDEV-5555. It fails on the current mysql/5.6 tree.

            elenst Elena Stepanova added a comment - - edited The bug was fixed in mysql/5.6 tree by the following revision: revno: 5592 [merge] revision-id: aditya.a@oracle.com-20131105140125-tecvsaelewn37kla parent: tor.didriksen@oracle.com-20131105112641-n71zwdnmx9q8kq8f parent: aditya.a@oracle.com-20131105135526-agm7nxh5mbf6s6fn committer: Aditya A <aditya.a@oracle.com> branch nick: mysql-5.6 timestamp: Tue 2013-11-05 19:31:25 +0530 message: Bug#17588348: INDEX MERGE USED ON PARTITIONED TABLE CAN RETURN WRONG RESULT SET [Merge from 5.5] - ------------------------------------------------------------ revno: 2875.437.288 revision-id: aditya.a@oracle.com-20131105135526-agm7nxh5mbf6s6fn parent: tor.didriksen@oracle.com-20131105090257-p7pfzgf7wgs8eau1 committer: Aditya A <aditya.a@oracle.com> branch nick: mysql-5.5 timestamp: Tue 2013-11-05 19:25:26 +0530 message: Bug#17588348: INDEX MERGE USED ON PARTITIONED TABLE CAN RETURN WRONG RESULT SET PROBLEM ------- In ha_partition::cmp_ref() we were only calling the underlying cmp_ref() of storage engine if the records are in the same partiton,else we sort by partition and returns the result.But the index merge intersect algorithm expects first to sort by row-id first and then by partition id. FIX --- Compare the refernces first using storage engine cmp_ref and then if references are equal(only happens if non clustered index is used) then sort it by partition id. [Approved by Mattiasj #rb3755] - But it does not fix the test case from MDEV-5555 . It fails on the current mysql/5.6 tree.

            Again, full analysis:

            • index_merge/intersect requires that scans over key=const are ROR-scans
              (i.e. the storage engine produces records in rowid order).
            • ha_partition::cmp_ref() defines rowid order as lexicographic order of
              (partition_id, underlying_engine_rowid) pairs.
            • ha_partition::handle_ordered_index_scan() merges row streams from underlying
              partitions. Its priority queue is set to sort rows by key column values.
              partition_id values are ignored. As a result, index scans do not have the ROR
              property.

            MySQL's solution is to change ha_partition::cmp_ref() so that the rowid order
            is (underlying_engine_rowid, partition_id). This doesn't solve the problem.
            The priority queue in handle_ordered_index_scan()
            there is still no warranty that handle_ordered_index_scan() will produce rows
            in rowid order.

            Our solution is to make ha_partition::handle_ordered_index_scan() to produce rows in the order of (key_value, partition_id).

            psergei Sergei Petrunia added a comment - Again, full analysis: index_merge/intersect requires that scans over key=const are ROR-scans (i.e. the storage engine produces records in rowid order). ha_partition::cmp_ref() defines rowid order as lexicographic order of (partition_id, underlying_engine_rowid) pairs. ha_partition::handle_ordered_index_scan() merges row streams from underlying partitions. Its priority queue is set to sort rows by key column values. partition_id values are ignored. As a result, index scans do not have the ROR property. MySQL's solution is to change ha_partition::cmp_ref() so that the rowid order is (underlying_engine_rowid, partition_id). This doesn't solve the problem. The priority queue in handle_ordered_index_scan() there is still no warranty that handle_ordered_index_scan() will produce rows in rowid order. Our solution is to make ha_partition::handle_ordered_index_scan() to produce rows in the order of (key_value, partition_id).
            psergei Sergei Petrunia made changes -
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]
            agadal Arnaud Gadal added a comment -

            I've just tried on the latest MariaDB 5.5.36 and I still get 794 and 802 rows although this bug is tagged as fixed...

            agadal Arnaud Gadal added a comment - I've just tried on the latest MariaDB 5.5.36 and I still get 794 and 802 rows although this bug is tagged as fixed...

            Indeed, the bug was fixed by revno 4074, but immediately re-appered with the postfix 4075:

            ------------------------------------------------------------
            revno: 4074 [merge]
            revision-id: psergey@askmonty.org-20140217145354-3apakilmvbwmbflc
            parent: sergii@pisem.net-20140217100940-phj213cgrun90d4j
            parent: psergey@askmonty.org-20140217145054-90puw5v7oi6iapi4
            committer: Sergey Petrunya <psergey@askmonty.org>
            branch nick: 5.5-push
            timestamp: Mon 2014-02-17 18:53:54 +0400
            message:
              Merge
                ------------------------------------------------------------
                revno: 4066.1.1
                revision-id: psergey@askmonty.org-20140217145054-90puw5v7oi6iapi4
                parent: svoj@mariadb.org-20140213124108-tyzdv1v2u8zg2dfu
                committer: Sergey Petrunya <psergey@askmonty.org>
                branch nick: 5.5
                timestamp: Mon 2014-02-17 18:50:54 +0400
                message:
                  MDEV-5177: ha_partition and innodb index intersection produce fewer rows (MySQL Bug#70703)
                  MDEV-5555: Incorrect index_merge on BTREE indices
                  - In ha_partition, make ordered index reads return rows in rowid order
                    when index columns are the same.

            revno: 4075
            revision-id: psergey@askmonty.org-20140217192007-zp0pafkydlzgd0yw
            parent: psergey@askmonty.org-20140217145354-3apakilmvbwmbflc
            committer: Sergey Petrunya <psergey@askmonty.org>
            branch nick: 5.5
            timestamp: Mon 2014-02-17 23:20:07 +0400
            message:
              Post-merge fixes: merge MariaDB's fix for MDEV-5177 and MDEV-5555 
              with Oracle's fix for Bug#17588348 by reverting Oracle's fix.

            elenst Elena Stepanova added a comment - Indeed, the bug was fixed by revno 4074, but immediately re-appered with the postfix 4075: ------------------------------------------------------------ revno: 4074 [merge] revision-id: psergey@askmonty.org-20140217145354-3apakilmvbwmbflc parent: sergii@pisem.net-20140217100940-phj213cgrun90d4j parent: psergey@askmonty.org-20140217145054-90puw5v7oi6iapi4 committer: Sergey Petrunya <psergey@askmonty.org> branch nick: 5.5-push timestamp: Mon 2014-02-17 18:53:54 +0400 message: Merge ------------------------------------------------------------ revno: 4066.1.1 revision-id: psergey@askmonty.org-20140217145054-90puw5v7oi6iapi4 parent: svoj@mariadb.org-20140213124108-tyzdv1v2u8zg2dfu committer: Sergey Petrunya <psergey@askmonty.org> branch nick: 5.5 timestamp: Mon 2014-02-17 18:50:54 +0400 message: MDEV-5177: ha_partition and innodb index intersection produce fewer rows (MySQL Bug#70703) MDEV-5555: Incorrect index_merge on BTREE indices - In ha_partition, make ordered index reads return rows in rowid order when index columns are the same. revno: 4075 revision-id: psergey@askmonty.org-20140217192007-zp0pafkydlzgd0yw parent: psergey@askmonty.org-20140217145354-3apakilmvbwmbflc committer: Sergey Petrunya <psergey@askmonty.org> branch nick: 5.5 timestamp: Mon 2014-02-17 23:20:07 +0400 message: Post-merge fixes: merge MariaDB's fix for MDEV-5177 and MDEV-5555 with Oracle's fix for Bug#17588348 by reverting Oracle's fix.
            elenst Elena Stepanova made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Reopened [ 4 ]

            It is interesting, how the bug could reappear.. mysql-test/t/partition.test has a testcase for MDEV-5177. Why didn't the testcase fail?

            psergei Sergei Petrunia added a comment - It is interesting, how the bug could reappear.. mysql-test/t/partition.test has a testcase for MDEV-5177 . Why didn't the testcase fail?

            Still, somehow it reappeared. If I run manually the testcase provided here, I'm able to observe different numbers of rows to be returned.

            psergei Sergei Petrunia added a comment - Still, somehow it reappeared. If I run manually the testcase provided here, I'm able to observe different numbers of rows to be returned.

            Ok, the problem with the testcase is that I've added into a file that runs with storage_engine=myisam. This was appropriate for MDEV-5555 (which has a testcase with MyISAM), but not for this bug.

            psergei Sergei Petrunia added a comment - Ok, the problem with the testcase is that I've added into a file that runs with storage_engine=myisam. This was appropriate for MDEV-5555 (which has a testcase with MyISAM), but not for this bug.

            Debugging the query, I see that merged index scans do not return records in
            rowid order:
            index=1 rowid=0400-04000000
            index=1 rowid=0700-11000000
            index=1 rowid=0500-19000000
            index=1 rowid=0000-c8000000
            index=1 rowid=0100-c9000000
            index=1 rowid=0200-ca000000
            index=1 rowid=0300-cb000000
            index=1 rowid=0400-cc000000
            index=1 rowid=0500-cd000000
            index=1 rowid=0600-ce000000
            index=1 rowid=0700-cf000000
            index=1 rowid=0800-d0000000
            ...

            Debugging further, one can see the reason of this. The table's keys are:

            primary key (pk),
            key(a),
            key(b)

            extended_keys feature causes the key to be (pk, a)

            When I added cmp_key_then_part_id(), it defined the ordering as
            1. order by key
            2. then order by part_id.

            index scan is done on a=1, which means #1 is constant. From that it followed that cmp_key_then_part_id will return records ordered by part_id. That ordering will match the ordering defined by ha_partition::cmp_ref() which starts by comparing part_id.

            In reality, the following happens: cmp_key_then_part_id() defines ordering as
            1. order by key, extended_key_parts (i.e. pk)
            2. then order by part_id.

            index scan returns values with a=1, but different values of pk. The ordering doesn't match ordering of ha_partition::cmp_ref anymore.

            psergei Sergei Petrunia added a comment - Debugging the query, I see that merged index scans do not return records in rowid order: index=1 rowid=0400-04000000 index=1 rowid=0700-11000000 index=1 rowid=0500-19000000 index=1 rowid=0000-c8000000 index=1 rowid=0100-c9000000 index=1 rowid=0200-ca000000 index=1 rowid=0300-cb000000 index=1 rowid=0400-cc000000 index=1 rowid=0500-cd000000 index=1 rowid=0600-ce000000 index=1 rowid=0700-cf000000 index=1 rowid=0800-d0000000 ... Debugging further, one can see the reason of this. The table's keys are: primary key (pk), key(a), key(b) extended_keys feature causes the key to be (pk, a) When I added cmp_key_then_part_id(), it defined the ordering as 1. order by key 2. then order by part_id. index scan is done on a=1, which means #1 is constant. From that it followed that cmp_key_then_part_id will return records ordered by part_id. That ordering will match the ordering defined by ha_partition::cmp_ref() which starts by comparing part_id. In reality, the following happens: cmp_key_then_part_id() defines ordering as 1. order by key, extended_key_parts (i.e. pk) 2. then order by part_id. index scan returns values with a=1, but different values of pk. The ordering doesn't match ordering of ha_partition::cmp_ref anymore.

            == Ideas on how this could be fixed ==

            Ordered index scans over ha_partition return records in the order of (key, extended_key_parts, ...).
            There is no way to change that.

            It could be nice to get index_merge to inform the storage engine that it needs the index scans to be rowid-ordered but not key-ordered, but this is a too big change for a GA version.

            psergei Sergei Petrunia added a comment - == Ideas on how this could be fixed == Ordered index scans over ha_partition return records in the order of (key, extended_key_parts, ...). There is no way to change that. It could be nice to get index_merge to inform the storage engine that it needs the index scans to be rowid-ordered but not key-ordered, but this is a too big change for a GA version.

            A possible solution:

            index scan on ha_partition must return records ordered by

            key_value, ext_key_parts (if there are any)

            When ROR-index_merge is used, the key_value part is fixed, which means that
            index scans will return rows ordered by:

            ext_key_parts (if there are any)

            On the other hand, ROR-index_merge requires that rows come in the order that
            matches the ordering imposed by ha_partition::cmp_ref().

            The solution is:
            1. Let index scan on ha_partition return rows ordered by

            key_rec_cmp, then part_id - for tables with extended keys (we do this now)
            key_rec_cmp, then underlying_rowid, then part_id - for tables without extended keys.

            2. Let ha_partition::cmp_ref() compare rowids by comparing underlying_rowid
            first, then comparing part_id.

            psergei Sergei Petrunia added a comment - A possible solution: index scan on ha_partition must return records ordered by key_value, ext_key_parts (if there are any) When ROR-index_merge is used, the key_value part is fixed, which means that index scans will return rows ordered by: ext_key_parts (if there are any) On the other hand, ROR-index_merge requires that rows come in the order that matches the ordering imposed by ha_partition::cmp_ref(). The solution is: 1. Let index scan on ha_partition return rows ordered by key_rec_cmp, then part_id - for tables with extended keys (we do this now) key_rec_cmp, then underlying_rowid, then part_id - for tables without extended keys. 2. Let ha_partition::cmp_ref() compare rowids by comparing underlying_rowid first, then comparing part_id.

            Committed another fix. I'll need a review for it.

            psergei Sergei Petrunia added a comment - Committed another fix. I'll need a review for it.
            psergei Sergei Petrunia made changes -
            Fix Version/s 5.5.37 [ 15000 ]
            Fix Version/s 5.5.36 [ 14600 ]
            psergei Sergei Petrunia made changes -
            Status Reopened [ 4 ] In Progress [ 3 ]
            psergei Sergei Petrunia made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]

            igor, as you have requested: please find the attached testcase (mdev5555-innodb.test) which fails in MySQL 5.6.

            psergei Sergei Petrunia added a comment - igor , as you have requested: please find the attached testcase (mdev5555-innodb.test) which fails in MySQL 5.6.
            psergei Sergei Petrunia made changes -
            Attachment mdev5555-innodb.test [ 27000 ]
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Igor Babaev [ igor ]
            igor Igor Babaev (Inactive) made changes -
            Assignee Igor Babaev [ igor ] Sergei Petrunia [ psergey ]
            elenst Elena Stepanova made changes -

            Fix pushed into 5.5 tree

            psergei Sergei Petrunia added a comment - Fix pushed into 5.5 tree
            psergei Sergei Petrunia made changes -
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow defaullt [ 29413 ] MariaDB v2 [ 43392 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Workflow MariaDB v2 [ 43392 ] MariaDB v3 [ 63153 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 63153 ] MariaDB v4 [ 147156 ]

            People

              psergei Sergei Petrunia
              psergei Sergei Petrunia
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.