Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13603

innodb_fast_shutdown=0 may fail to purge all history

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/kvm-deb-xenial-ppc64le/builds/1363/steps/mtr/logs/stdio

      innodb.dml_purge '64k,innodb'            w3 [ fail ]
              Test ended at 2017-08-20 16:06:45
       
      CURRENT_TEST: innodb.dml_purge
      --- /usr/share/mysql/mysql-test/suite/innodb/r/dml_purge.result	2017-08-20 14:24:53.000000000 -0400
      +++ /dev/shm/var/3/log/dml_purge.reject	2017-08-20 16:06:45.122366993 -0400
      @@ -15,8 +15,8 @@
        DB_ROLL_PTR=0x80000000000000,
        b=0x80000002)
       header=0x000018090074 (a=0x80000003,
      - DB_TRX_ID=0x000000000000,
      - DB_ROLL_PTR=0x80000000000000,
      + DB_TRX_ID=0x000000000506,
      + DB_ROLL_PTR=0x04000001370317,
        b=0x7ffffffd)
       header=0x030008030000 (a=0x73757072656d756d00)
       SELECT * FROM t1;
       
      mysqltest: Result content mismatch
      

      Attachments

        Issue Links

          Activity

            Also the test innodb.table_flags can occasionally fail due to the same problem:

            CURRENT_TEST: innodb.table_flags
            /usr/local/mariadb-10.3.5-linux-x86_64/bin/mysqltest: Error on delete of '/usr/local/mariadb-10.3.5-linux-x86_64/mysql-test/var/4/tmp/table_flags/ibtmp1' (Errcode: 2 "No such file or directory")
            --- /usr/local/mariadb-10.3.5-linux-x86_64/mysql-test/suite/innodb/r/table_flags.result	2018-02-07 17:51:05.000000000 +0200
            +++ /usr/local/mariadb-10.3.5-linux-x86_64/mysql-test/suite/innodb/r/table_flags.reject	2018-02-08 02:06:12.954068170 +0200
            @@ -72,8 +72,8 @@
              CLUSTER_NAME=NULL(0 bytes),
              SPACE=0x00000002)
             header=0x000048150310 (NAME='test/td',
            - DB_TRX_ID=0x000000000000,
            - DB_ROLL_PTR=0x80000000000000,
            + DB_TRX_ID=0x00000000000b,
            + DB_ROLL_PTR=0x85000001390110,
              ID=0x0000000000000012,
              N_COLS=0x80000001,
              TYPE=0x00000021,
            @@ -82,8 +82,8 @@
              CLUSTER_NAME=NULL(0 bytes),
              SPACE=0x00000003)
             header=0x000058150200 (NAME='test/tp',
            - DB_TRX_ID=0x000000000000,
            - DB_ROLL_PTR=0x80000000000000,
            + DB_TRX_ID=0x00000000000f,
            + DB_ROLL_PTR=0x870000013b0110,
              ID=0x0000000000000014,
              N_COLS=0x80000001,
              TYPE=0x000009a1,
            @@ -102,8 +102,8 @@
              CLUSTER_NAME=NULL(0 bytes),
              SPACE=0x00000001)
             header=0x000050150074 (NAME='test/tz',
            - DB_TRX_ID=0x000000000000,
            - DB_ROLL_PTR=0x80000000000000,
            + DB_TRX_ID=0x00000000000d,
            + DB_ROLL_PTR=0x860000013a0110,
              ID=0x0000000000000013,
              N_COLS=0x80000001,
              TYPE=0x00000023,
             
            mysqltest: Result content mismatch
            

            marko Marko Mäkelä added a comment - Also the test innodb.table_flags can occasionally fail due to the same problem: CURRENT_TEST: innodb.table_flags /usr/local/mariadb-10.3.5-linux-x86_64/bin/mysqltest: Error on delete of '/usr/local/mariadb-10.3.5-linux-x86_64/mysql-test/var/4/tmp/table_flags/ibtmp1' (Errcode: 2 "No such file or directory") --- /usr/local/mariadb-10.3.5-linux-x86_64/mysql-test/suite/innodb/r/table_flags.result 2018-02-07 17:51:05.000000000 +0200 +++ /usr/local/mariadb-10.3.5-linux-x86_64/mysql-test/suite/innodb/r/table_flags.reject 2018-02-08 02:06:12.954068170 +0200 @@ -72,8 +72,8 @@ CLUSTER_NAME=NULL(0 bytes), SPACE=0x00000002) header=0x000048150310 (NAME='test/td', - DB_TRX_ID=0x000000000000, - DB_ROLL_PTR=0x80000000000000, + DB_TRX_ID=0x00000000000b, + DB_ROLL_PTR=0x85000001390110, ID=0x0000000000000012, N_COLS=0x80000001, TYPE=0x00000021, @@ -82,8 +82,8 @@ CLUSTER_NAME=NULL(0 bytes), SPACE=0x00000003) header=0x000058150200 (NAME='test/tp', - DB_TRX_ID=0x000000000000, - DB_ROLL_PTR=0x80000000000000, + DB_TRX_ID=0x00000000000f, + DB_ROLL_PTR=0x870000013b0110, ID=0x0000000000000014, N_COLS=0x80000001, TYPE=0x000009a1, @@ -102,8 +102,8 @@ CLUSTER_NAME=NULL(0 bytes), SPACE=0x00000001) header=0x000050150074 (NAME='test/tz', - DB_TRX_ID=0x000000000000, - DB_ROLL_PTR=0x80000000000000, + DB_TRX_ID=0x00000000000d, + DB_ROLL_PTR=0x860000013a0110, ID=0x0000000000000013, N_COLS=0x80000001, TYPE=0x00000023,   mysqltest: Result content mismatch

            The problem appears to be that slow shutdown is not always running purge to completion. This can be repeated more easily by applying the patch from the description of MDEV-13697 to the test, and adjusting the result.

            ./mtr --mysqld=--debug=d,purge --mem --parallel=4 --repeat=10 innodb.dml_purge,4k{,,,}
            

            When it fails, in the failed run there will be messages

            ?func: purge: reset DB_TRX_ID=0x…

            corresponding to the DB_TRX_ID that are shown in the result difference.
            These messages were issued after server restart and not during the slow shutdown, as purge is supposed to be run.

            This can be repeated with different values of -mysqld=-innodb-purge-threads=1 to 32 (the default is 4).

            If I change the test to avoid restart (use wait_all_purged.inc and FLUSH TABLE t1 FOR EXPORT) instead of performing the slow shutdown, it does not fail for me.

            marko Marko Mäkelä added a comment - The problem appears to be that slow shutdown is not always running purge to completion. This can be repeated more easily by applying the patch from the description of MDEV-13697 to the test, and adjusting the result. ./mtr --mysqld=--debug=d,purge --mem --parallel=4 --repeat=10 innodb.dml_purge,4k{,,,} When it fails, in the failed run there will be messages ?func: purge: reset DB_TRX_ID=0x… corresponding to the DB_TRX_ID that are shown in the result difference. These messages were issued after server restart and not during the slow shutdown, as purge is supposed to be run. This can be repeated with different values of - mysqld= -innodb-purge-threads=1 to 32 (the default is 4). If I change the test to avoid restart (use wait_all_purged.inc and FLUSH TABLE t1 FOR EXPORT ) instead of performing the slow shutdown, it does not fail for me.

            The problem turns out to be an incorrect check:

            @@ -2463,12 +2459,7 @@ srv_purge_should_exit(ulint n_purged)
             		return(true);
             	}
             	/* Slow shutdown was requested. */
            -	if (n_purged) {
            -		/* The previous round still did some work. */
            -		return(false);
            -	}
            -	/* Exit if there are no active transactions to roll back. */
            -	return(trx_sys.any_active_transactions() == 0);
            +	return !trx_sys.any_active_transactions() && !trx_sys.history_size();
             }
             
             /*********************************************************************//**
            

            If we happened to have n_purged==0 while some transaction was still active, and then that transaction was added to the history list, we would prematurely stop the purge. It is more appropriate to first check for trx_sys.any_active_transactions() == 0 (that count can only decrease during shutdown) and then for trx_sys.history_size() == 0 (that count typically decreases, but can increase when the remaining active transactions are committed or rolled back). It does not make any sense to check n_purged at all.

            marko Marko Mäkelä added a comment - The problem turns out to be an incorrect check: @@ -2463,12 +2459,7 @@ srv_purge_should_exit(ulint n_purged) return(true); } /* Slow shutdown was requested. */ - if (n_purged) { - /* The previous round still did some work. */ - return(false); - } - /* Exit if there are no active transactions to roll back. */ - return(trx_sys.any_active_transactions() == 0); + return !trx_sys.any_active_transactions() && !trx_sys.history_size(); } /*********************************************************************//** If we happened to have n_purged==0 while some transaction was still active, and then that transaction was added to the history list, we would prematurely stop the purge. It is more appropriate to first check for trx_sys.any_active_transactions() == 0 (that count can only decrease during shutdown) and then for trx_sys.history_size() == 0  (that count typically decreases, but can increase when the remaining active transactions are committed or rolled back). It does not make any sense to check n_purged at all.

            The issue exists already in MariaDB 5.5:

            		do {
            			n_pages_purged = trx_purge(srv_purge_batch_size);
             
            			n_total_purged += n_pages_purged;
             
            		} while (n_pages_purged > 0 && !srv_fast_shutdown);
            

            I think that it suffices to fix this in MariaDB 10.3 only. Thanks to MDEV-12289, MariaDB can upgrade from older versions without a slow shutdown.

            marko Marko Mäkelä added a comment - The issue exists already in MariaDB 5.5: do { n_pages_purged = trx_purge(srv_purge_batch_size);   n_total_purged += n_pages_purged;   } while (n_pages_purged > 0 && !srv_fast_shutdown); I think that it suffices to fix this in MariaDB 10.3 only. Thanks to MDEV-12289 , MariaDB can upgrade from older versions without a slow shutdown.

            The issue exists already in MySQL 3.23.49 (the first InnoDB version that I have access to). In the function srv_master_thread(), there are two loops that simply wait for n_pages_purged==0, without first waiting for the number of active transactions to reach 0.

            marko Marko Mäkelä added a comment - The issue exists already in MySQL 3.23.49 (the first InnoDB version that I have access to). In the function srv_master_thread() , there are two loops that simply wait for n_pages_purged==0 , without first waiting for the number of active transactions to reach 0.

            People

              marko Marko Mäkelä
              alice Alice Sherepa
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.