Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-16952

Introduce SET GLOBAL innodb_max_purge_lag_wait

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/kvm-rpm-centos74-amd64-debug/builds/755

      gcol.innodb_virtual_debug_purge 'innodb' w3 [ fail ]
              Test ended at 2018-08-10 21:48:13
       
      CURRENT_TEST: gcol.innodb_virtual_debug_purge
      --- /usr/share/mysql-test/suite/gcol/r/innodb_virtual_debug_purge.result	2018-08-10 19:08:32.000000000 +0000
      +++ /dev/shm/var/3/log/innodb_virtual_debug_purge.reject	2018-08-10 21:48:13.389484458 +0000
      @@ -200,7 +200,7 @@
       disconnect prevent_purge;
       connection default;
       SET DEBUG_SYNC='now WAIT_FOR purge_start';
      -InnoDB		1 transactions not purged
      +InnoDB		2 transactions not purged
       SET DEBUG_SYNC='now SIGNAL release';
       SET GLOBAL debug_dbug=@old_dbug;
       connection truncate;
       
      mysqltest: Result content mismatch
       
       - skipping '/dev/shm/var/3/log/gcol.innodb_virtual_debug_purge-innodb/'
       
      Retrying test gcol.innodb_virtual_debug_purge, attempt(2/3)...
      

      Attachments

        Issue Links

          Activity

            Earlier in the test, we successfully got '0 transactions not purged'.
            The problem ought to be somewhere between these lines in the test:

            --source ../../innodb/include/wait_all_purged.inc
             
            SET DEBUG_SYNC= 'now SIGNAL purged';
             
            connection default;
            --echo /* connection default */ ALTER TABLE t1 ADD COLUMN c INT GENERATED ALWAYS AS(a+b), ADD INDEX idx (c), ALGORITHM=INPLACE, LOCK=SHARED;
            --reap
            SHOW CREATE TABLE t1;
             
            SELECT * FROM t1;
             
            DROP TABLE t1;
             
            # Test adding index on existing virtual column
            CREATE TABLE t1 (a INT, b INT, c INT GENERATED ALWAYS AS(a+b));
             
            INSERT INTO t1(a, b) VALUES (1, 1), (2, 2), (3, 3), (4, 4);
             
            connection con1;
            --echo # disable purge
            BEGIN; SELECT * FROM t0;
             
            connection default;
            DELETE FROM t1 WHERE a = 1;
             
            UPDATE t1 SET a = 2, b = 2 WHERE a = 5;
             
            INSERT INTO t1(a, b) VALUES (6, 6);
             
            SET DEBUG_SYNC= 'inplace_after_index_build SIGNAL uncommitted WAIT_FOR purged';
            send ALTER TABLE t1 ADD INDEX idx (c), ALGORITHM=INPLACE, LOCK=NONE;
             
            connection con1;
            SET DEBUG_SYNC= 'now WAIT_FOR uncommitted';
             
            DELETE FROM t1 WHERE a = 3;
             
            UPDATE t1 SET a = 7, b = 7 WHERE a = 4;
             
            INSERT INTO t1(a, b) VALUES (8, 8);
             
            --echo # enable purge
            COMMIT;
             
            --echo # wait for purge to process the deleted/updated records.
            let $wait_all_purged=1;
            --source ../../innodb/include/wait_all_purged.inc
            

            Unfortunately, I am unable to repeat this, and there were no additional messages in the server error log.

            marko Marko Mäkelä added a comment - Earlier in the test, we successfully got '0 transactions not purged'. The problem ought to be somewhere between these lines in the test: --source ../../innodb/include/wait_all_purged.inc   SET DEBUG_SYNC= 'now SIGNAL purged' ;   connection default ; --echo /* connection default */ ALTER TABLE t1 ADD COLUMN c INT GENERATED ALWAYS AS(a+b), ADD INDEX idx (c), ALGORITHM=INPLACE, LOCK=SHARED; --reap SHOW CREATE TABLE t1;   SELECT * FROM t1;   DROP TABLE t1;   # Test adding index on existing virtual column CREATE TABLE t1 (a INT , b INT , c INT GENERATED ALWAYS AS (a+b));   INSERT INTO t1(a, b) VALUES (1, 1), (2, 2), (3, 3), (4, 4);   connection con1; --echo # disable purge BEGIN ; SELECT * FROM t0;   connection default ; DELETE FROM t1 WHERE a = 1;   UPDATE t1 SET a = 2, b = 2 WHERE a = 5;   INSERT INTO t1(a, b) VALUES (6, 6);   SET DEBUG_SYNC= 'inplace_after_index_build SIGNAL uncommitted WAIT_FOR purged' ; send ALTER TABLE t1 ADD INDEX idx (c), ALGORITHM=INPLACE, LOCK=NONE;   connection con1; SET DEBUG_SYNC= 'now WAIT_FOR uncommitted' ;   DELETE FROM t1 WHERE a = 3;   UPDATE t1 SET a = 7, b = 7 WHERE a = 4;   INSERT INTO t1(a, b) VALUES (8, 8);   --echo # enable purge COMMIT ;   --echo # wait for purge to process the deleted/updated records. let $wait_all_purged=1; --source ../../innodb/include/wait_all_purged.inc Unfortunately, I am unable to repeat this, and there were no additional messages in the server error log.

            I suspect that the test sporadically fails due to the CI system being overloaded. The client-side polling is much less efficient than server-side polling would be. The client would request a large amount of expensive-to-compute information and throw out most of the SHOW ENGINE INNODB STATUS output, 10 times per second, for 60 seconds.

            I think that it is more efficient and more useful to implement server-side waiting:

            SET GLOBAL innodb_max_purge_lag_wait=0;
            

            If purge is not allowed to proceed due to an active read view that is held by some connection, potentially including the current one, then the wait may be interrupted to normal statement or test case timeout. But it would no longer limited to the 60 seconds that the wait_all_purged.inc implemented.

            This new dummy global variable would also assist in upgrades from 10.2 or earlier to 10.3 or later, avoiding MDEV-15912 that was caused by the undo log format change that MDEV-12288 introduced in 10.3.

            marko Marko Mäkelä added a comment - I suspect that the test sporadically fails due to the CI system being overloaded. The client-side polling is much less efficient than server-side polling would be. The client would request a large amount of expensive-to-compute information and throw out most of the SHOW ENGINE INNODB STATUS output, 10 times per second, for 60 seconds. I think that it is more efficient and more useful to implement server-side waiting: SET GLOBAL innodb_max_purge_lag_wait=0; If purge is not allowed to proceed due to an active read view that is held by some connection, potentially including the current one, then the wait may be interrupted to normal statement or test case timeout. But it would no longer limited to the 60 seconds that the wait_all_purged.inc implemented. This new dummy global variable would also assist in upgrades from 10.2 or earlier to 10.3 or later, avoiding MDEV-15912 that was caused by the undo log format change that MDEV-12288 introduced in 10.3.

            Marko, any chances this flag can come with some regression on the replication thread?.
            We are testing 10.4.17 (it was running 10.4.15 before) on a host that receives quite a bunch of REPLACEs. Same hardware with a host running 10.4.15 shows no lag.
            The only difference I have found is that by default

            innodb_max_purge_lag_wait

            was set to 4294967295 (the default) when we upgraded it to 10.4.17.
            I have set it back to 0 and restarted mariadb, and so far it looks like the host is catching up. Early to say this was the culprit as we have some strange writes pattern, but I would like to know if this could be the reason of this sudden lag?

            marostegui Manuel Arostegui added a comment - Marko, any chances this flag can come with some regression on the replication thread?. We are testing 10.4.17 (it was running 10.4.15 before) on a host that receives quite a bunch of REPLACEs. Same hardware with a host running 10.4.15 shows no lag. The only difference I have found is that by default innodb_max_purge_lag_wait was set to 4294967295 (the default) when we upgraded it to 10.4.17. I have set it back to 0 and restarted mariadb, and so far it looks like the host is catching up. Early to say this was the culprit as we have some strange writes pattern, but I would like to know if this could be the reason of this sudden lag?

            marostegui, this is a special variable whose value should always read as a constant, even after SET GLOBAL was executed.

            The SET GLOBAL statement is special: as long as the connection that executed as is active, that connection will wait until the history list length is at most the desired value. If that connection is holding an open read view, then it is possible that the history list length will keep growing. Say, if you executed

            START TRANSACTION WITH CONSISTENT SNAPSHOT;
            SET GLOBAL innodb_max_purge_lag_wait=0;
            

            then that would be almost guaranteed to kill performance. Only in the special case that there was nothing to purge, we would get through. In any other case, the read view that we opened before initiating the wait, would prevent purge of any transactions that were committed after the read view was created.

            The idea was that this SET GLOBAL statement would typically only be executed while the server is idle, and no new write transactions are arriving. It should be possible to use KILL QUERY or KILL CONNECTION to abort the wait. Or just disconnect the client.

            If SELECT @@GLOBAL.innodb.max_purge_lag_wait; is returning different values, that is a bug that could be filed and fixed. My intention was that it would always return a constant. I did not test that.

            Note: Specifying an initial value of innodb_max_purge_lag_wait in the start-up configuration has no effect.

            marko Marko Mäkelä added a comment - marostegui , this is a special variable whose value should always read as a constant, even after SET GLOBAL was executed. The SET GLOBAL statement is special: as long as the connection that executed as is active, that connection will wait until the history list length is at most the desired value. If that connection is holding an open read view, then it is possible that the history list length will keep growing. Say, if you executed START TRANSACTION WITH CONSISTENT SNAPSHOT; SET GLOBAL innodb_max_purge_lag_wait=0; then that would be almost guaranteed to kill performance. Only in the special case that there was nothing to purge, we would get through. In any other case, the read view that we opened before initiating the wait, would prevent purge of any transactions that were committed after the read view was created. The idea was that this SET GLOBAL statement would typically only be executed while the server is idle, and no new write transactions are arriving. It should be possible to use KILL QUERY or KILL CONNECTION to abort the wait. Or just disconnect the client. If SELECT @@GLOBAL.innodb.max_purge_lag_wait; is returning different values, that is a bug that could be filed and fixed. My intention was that it would always return a constant. I did not test that. Note: Specifying an initial value of innodb_max_purge_lag_wait in the start-up configuration has no effect.

            Thank you for the detailed answer. We will keep looking for the culprit of these lags!
            Thanks again

            marostegui Manuel Arostegui added a comment - Thank you for the detailed answer. We will keep looking for the culprit of these lags! Thanks again
            TheWitness Larry Adams added a comment -

            Is there any good documentation for this yet guys? I've got a massive history list due to very large number of update/insert's over the day, and then at the end of the day I rename that table. I wonder if that creates some sort of issue. But the history list is in the billions and my ibdata1 is 3+ TB. Not sure how to allow MariaDB to catch up if ever.

            TheWitness Larry Adams added a comment - Is there any good documentation for this yet guys? I've got a massive history list due to very large number of update/insert's over the day, and then at the end of the day I rename that table. I wonder if that creates some sort of issue. But the history list is in the billions and my ibdata1 is 3+ TB. Not sure how to allow MariaDB to catch up if ever.
            TheWitness Larry Adams added a comment -

            Make that hundreds of millions.

            MariaDB [cacti]> show global status like '%history%';
            +----------------------------+-----------+
            | Variable_name              | Value     |
            +----------------------------+-----------+
            | Innodb_history_list_length | 176100483 |
            +----------------------------+-----------+
            1 row in set (0.007 sec)
            

            TheWitness Larry Adams added a comment - Make that hundreds of millions. MariaDB [cacti]> show global status like '%history%'; +----------------------------+-----------+ | Variable_name | Value | +----------------------------+-----------+ | Innodb_history_list_length | 176100483 | +----------------------------+-----------+ 1 row in set (0.007 sec)

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.