Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6231

Wrong result (extra rows) via cursor in a stored procedure

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Cannot Reproduce
    • 5.5.37, 10.0.11
    • 5.5.49
    • OTHER
    • None
    • 5.5.44, 5.5.58

    Description

      Initially reported as http://bugs.skysql.com/show_bug.cgi?id=429

      Test case:

      DROP TABLE IF EXISTS test_a, test_b, test_result;
      DROP PROCEDURE IF EXISTS sp_test;
       
      CREATE TABLE test_a (num INT);
      CREATE TABLE test_b (num INT);
      CREATE TABLE test_result (num INT, cnt INT);
       
      --delimiter |
      CREATE PROCEDURE sp_test() 
      BEGIN
      DECLARE v_num  INT;
      DECLARE v_done   INT DEFAULT 0;
       
      /* Declare cursors */
      DECLARE v_cursor CURSOR FOR
          SELECT A.num
          FROM   test_a A
          WHERE  ( SELECT COUNT(*)    FROM   test_b    WHERE  num = A.num ) = 0;
       
      DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = 1;
       
      START TRANSACTION;
       
      OPEN v_cursor;
          DELETE FROM test_result;
      v_loop: LOOP
       
          FETCH v_cursor INTO v_num;
          /* No more rows*/
          IF v_done = 1 THEN
              LEAVE v_loop;
          END IF;
       
          INSERT INTO test_result VALUES (v_num, (SELECT count(*) FROM test_b where num = v_num));
       
      END LOOP v_loop;
      CLOSE v_cursor;
      COMMIT;
          SELECT * FROM test_result;
      END |
      --delimiter ;
       
      INSERT INTO test_a VALUES (1),(2),(3),(4),(5);
      INSERT INTO test_b VALUES (2),(3);
       
      CALL sp_test();
      CALL sp_test();
       
      DROP TABLE IF EXISTS test_a, test_b, test_result;
      DROP PROCEDURE IF EXISTS sp_test;

      Actual result:

      CALL sp_test();
      num     cnt
      1       0
      4       0
      5       0
      CALL sp_test();
      num     cnt
      1       0
      2       1
      3       1
      4       0
      5       0

      Expected result:

      CALL sp_test();
      num     cnt
      1       0
      4       0
      5       0
      CALL sp_test();
      num     cnt
      1       0
      4       0
      5       0

      The problem appeared in 5.5 tree with the following revision:

                  revno: 3402.1.4
                  revision-id: timour@askmonty.org-20120529211853-hww47vl7d4u4ae23
                  parent: timour@askmonty.org-20120524110828-r0mm8sm1vn8a095e
                  committer: timour@askmonty.org
                  branch nick: 5.5-lpb944706
                  timestamp: Wed 2012-05-30 00:18:53 +0300
                  message:
                    Patch for mdev-287: CHEAP SQ: A query with subquery in SELECT list, EXISTS, inner joins takes hundreds t
      imes longer
                    
                    Analysis:
                    
                    The fix for lp:944706 introduces early subquery optimization.
                    While a subquery is being optimized some of its predicates may be
                    removed. In the test case, the EXISTS subquery is constant, and is
                    evaluated to TRUE. As a result the whole OR is TRUE, and thus the
                    correlated condition "b = alias1.b" is optimized away. The subquery
                    becomes non-correlated.
                    
                    The subquery cache is designed to work only for correlated subqueries.
                    If constant subquery optimization is disallowed, then the constant
                    subquery is not evaluated, the subquery remains correlated, and its
                    execution is cached. As a result execution is fast.
                    
                    However, when the constant subquery was optimized away, it was neither
                    cached by the subquery cache, nor it was cached by the internal subquery
                    caching. The latter was due to the fact that the subquery still appeared
                    as correlated to the subselect_XYZ_engine::exec methods, and they
                    re-executed the subquery on each call to Item_subselect::exec.
                    
                   Solution:
                    
                    The solution is to update the correlated status of the subquery after it has
                    been optimized. This status consists of:
                    - st_select_lex::is_correlated
                    - Item_subselect::is_correlated
                    - SELECT_LEX::uncacheable
                    - SELECT_LEX_UNIT::uncacheable
                    The status is updated by st_select_lex::update_correlated_cache(), and its
                    caller st_select_lex::optimize_unflattened_subqueries. The solution relies
                    on the fact that the optimizer already called
                    st_select_lex::update_used_tables() for each subquery. This allows to
                    efficiently update the correlated status of each subquery without walking
                    the whole subquery tree.
                    
                    Notice that his patch is an improvement over MySQL 5.6 and older, where
                    subqueries are not pre-optimized, and the above analysis is not possible.

      Attachments

        Issue Links

          Activity

            shin shin added a comment -

            Is this bug will be resolved on 5.5.3.8 ?
            And also this bug is occured on MariaDB 10.

            shin shin added a comment - Is this bug will be resolved on 5.5.3.8 ? And also this bug is occured on MariaDB 10.

            Debugging, one can see that st_select_lex::update_correlated_cache() changes the subquery from correlated to uncorrelated on the second CALL statement.

            (gdb) p dbug_print_item(sl->master_unit()->item)
            $173 = 0x152da20 "(select count(0) from j1.test_b where (j1.A.num = j1.test_b.num))"

            Note: making PS instead of cursor and re-running it produces a correct result each time.

            psergei Sergei Petrunia added a comment - Debugging, one can see that st_select_lex::update_correlated_cache() changes the subquery from correlated to uncorrelated on the second CALL statement. (gdb) p dbug_print_item(sl->master_unit()->item) $173 = 0x152da20 "(select count(0) from j1.test_b where (j1.A.num = j1.test_b.num))" Note: making PS instead of cursor and re-running it produces a correct result each time.

            When we are in st_select_lex::update_correlated_cache() during the first CALL:

            (gdb) p ((Item*)join->conds)
              $230 = (Item_func_eq *) 0x7fffcd0f2960
            (gdb) p dbug_print_item(join->conds)
              $231 = 0x152da20 "(`j1`.`test_b`.`num` = `j1`.`A`.`num`)"
            (gdb) p/x join->conds->used_tables()
              $232 = 0x4000000000000001

            Everything is OK: the WHERE condition contains a reference to outside. This
            causes the subquery to be correlated.

            Some details about the condition:

            (gdb) p ((Item*)join->conds)->args[0]
              $233 = (Item_field *) 0x7fffcd0efdd0
            (gdb) p dbug_print_item(((Item*)join->conds)->args[0])
              $234 = 0x152da20 "`j1`.`test_b`.`num`"
            (gdb) p (((Item*)join->conds)->args[0])->used_tables()
              $235 = 1
            (gdb)  
            (gdb) p ((Item*)join->conds)->args[1]
              $236 = (Item_field *) 0x7fffcd0efed0
            (gdb) p dbug_print_item(((Item*)join->conds)->args[1])
              $237 = 0x152da20 "`j1`.`A`.`num`"
            (gdb) p/x (((Item*)join->conds)->args[1])->used_tables()
              $239 = 0x4000000000000000

            (I am a little bit surprised that there is an Item_field that has
            used_tables()=OUTER_REF_TABLE_BIT. Shouldn't it be an Item_ref?)

            psergei Sergei Petrunia added a comment - When we are in st_select_lex::update_correlated_cache() during the first CALL: (gdb) p ((Item*)join->conds) $230 = (Item_func_eq *) 0x7fffcd0f2960 (gdb) p dbug_print_item(join->conds) $231 = 0x152da20 "(`j1`.`test_b`.`num` = `j1`.`A`.`num`)" (gdb) p/x join->conds->used_tables() $232 = 0x4000000000000001 Everything is OK: the WHERE condition contains a reference to outside. This causes the subquery to be correlated. Some details about the condition: (gdb) p ((Item*)join->conds)->args[0] $233 = (Item_field *) 0x7fffcd0efdd0 (gdb) p dbug_print_item(((Item*)join->conds)->args[0]) $234 = 0x152da20 "`j1`.`test_b`.`num`" (gdb) p (((Item*)join->conds)->args[0])->used_tables() $235 = 1 (gdb) (gdb) p ((Item*)join->conds)->args[1] $236 = (Item_field *) 0x7fffcd0efed0 (gdb) p dbug_print_item(((Item*)join->conds)->args[1]) $237 = 0x152da20 "`j1`.`A`.`num`" (gdb) p/x (((Item*)join->conds)->args[1])->used_tables() $239 = 0x4000000000000000 (I am a little bit surprised that there is an Item_field that has used_tables()=OUTER_REF_TABLE_BIT. Shouldn't it be an Item_ref?)

            When we are in st_select_lex::update_correlated_cache() during the second CALL:

            (gdb) p/x join->conds->used_tables()
              $217 = 0x1
            (gdb) p dbug_print_item(join->conds)
              $218 = 0x152da20 "(j1.A.num = j1.test_b.num)"

            The condition's used_tables() doesn't include OUTER_REF_TABLE_BIT. This is
            incorrect.

            Let's explore the details:

            (gdb) p ((Item*)join->conds)->args[0]
              $221 = (Item_field *) 0x7fffcd0efed0
            (gdb) p dbug_print_item(((Item*)join->conds)->args[0])
              $222 = 0x152da20 "j1.A.num"
            (gdb) p (((Item*)join->conds)->args[0])->used_tables()
              $223 = 1
             
            (gdb) p ((Item*)join->conds)->args[1]
              $224 = (Item_field *) 0x7fffcd0efdd0
            (gdb) p dbug_print_item(((Item*)join->conds)->args[1])
              $225 = 0x152da20 "j1.test_b.num"
            (gdb) p (((Item*)join->conds)->args[1])->used_tables()
              $226 = 1

            psergei Sergei Petrunia added a comment - When we are in st_select_lex::update_correlated_cache() during the second CALL: (gdb) p/x join->conds->used_tables() $217 = 0x1 (gdb) p dbug_print_item(join->conds) $218 = 0x152da20 "(j1.A.num = j1.test_b.num)" The condition's used_tables() doesn't include OUTER_REF_TABLE_BIT. This is incorrect. Let's explore the details: (gdb) p ((Item*)join->conds)->args[0] $221 = (Item_field *) 0x7fffcd0efed0 (gdb) p dbug_print_item(((Item*)join->conds)->args[0]) $222 = 0x152da20 "j1.A.num" (gdb) p (((Item*)join->conds)->args[0])->used_tables() $223 = 1   (gdb) p ((Item*)join->conds)->args[1] $224 = (Item_field *) 0x7fffcd0efdd0 (gdb) p dbug_print_item(((Item*)join->conds)->args[1]) $225 = 0x152da20 "j1.test_b.num" (gdb) p (((Item*)join->conds)->args[1])->used_tables() $226 = 1

            It looks like fixed, checking when:

            REATE TABLE test_a (num INT);
            CREATE TABLE test_b (num INT);
            CREATE TABLE test_result (num INT, cnt INT);
            CREATE PROCEDURE sp_test() 
            BEGIN
            DECLARE v_num  INT;
            DECLARE v_done   INT DEFAULT 0;
            /* Declare cursors */
            DECLARE v_cursor CURSOR FOR
            SELECT A.num
            FROM   test_a A
            WHERE  ( SELECT COUNT(*)    FROM   test_b    WHERE  num = A.num ) = 0;
            DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = 1;
            START TRANSACTION;
            OPEN v_cursor;
            DELETE FROM test_result;
            v_loop: LOOP
            FETCH v_cursor INTO v_num;
            /* No more rows*/
            IF v_done = 1 THEN
            LEAVE v_loop;
            END IF;
            INSERT INTO test_result VALUES (v_num, (SELECT count(*) FROM test_b where num = v_num));
            END LOOP v_loop;
            CLOSE v_cursor;
            COMMIT;
            SELECT * FROM test_result;
            END |
            INSERT INTO test_a VALUES (1),(2),(3),(4),(5);
            INSERT INTO test_b VALUES (2),(3);
            CALL sp_test();
            num	cnt
            1	0
            4	0
            5	0
            CALL sp_test();
            num	cnt
            1	0
            4	0
            5	0
            DROP TABLE test_a, test_b, test_result;
            DROP PROCEDURE sp_test;
            

            sanja Oleksandr Byelkin added a comment - It looks like fixed, checking when: REATE TABLE test_a (num INT); CREATE TABLE test_b (num INT); CREATE TABLE test_result (num INT, cnt INT); CREATE PROCEDURE sp_test() BEGIN DECLARE v_num INT; DECLARE v_done INT DEFAULT 0; /* Declare cursors */ DECLARE v_cursor CURSOR FOR SELECT A.num FROM test_a A WHERE ( SELECT COUNT(*) FROM test_b WHERE num = A.num ) = 0; DECLARE CONTINUE HANDLER FOR NOT FOUND SET v_done = 1; START TRANSACTION; OPEN v_cursor; DELETE FROM test_result; v_loop: LOOP FETCH v_cursor INTO v_num; /* No more rows*/ IF v_done = 1 THEN LEAVE v_loop; END IF; INSERT INTO test_result VALUES (v_num, (SELECT count(*) FROM test_b where num = v_num)); END LOOP v_loop; CLOSE v_cursor; COMMIT; SELECT * FROM test_result; END | INSERT INTO test_a VALUES (1),(2),(3),(4),(5); INSERT INTO test_b VALUES (2),(3); CALL sp_test(); num cnt 1 0 4 0 5 0 CALL sp_test(); num cnt 1 0 4 0 5 0 DROP TABLE test_a, test_b, test_result; DROP PROCEDURE sp_test;

            It was fixed somewhere before 5.5.49 (can't build earlier version)

            sanja Oleksandr Byelkin added a comment - It was fixed somewhere before 5.5.49 (can't build earlier version)

            People

              sanja Oleksandr Byelkin
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.