MariaDB Server / MDEV-34166

Server could hang with BP < 80M under stress

Details

    Description

      This issue was found while analysing MDEV-28800. With a small buffer pool (BP), the scenario sometimes hangs instead of exiting with a fatal error. The issue has probably existed for a long time but is not easily repeatable. It is worth fixing because it can affect internal tests that use a small BP.

      The issue can be reproduced with a debug injection point that forces InnoDB to allocate more lock objects. The expected behaviour is an ER_LOCK_TABLE_FULL error instead of a server hang.

      Code

      diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc
      index 0891ad5ceb7..d9a3c96aab0 100644
      --- a/storage/innobase/lock/lock0lock.cc
      +++ b/storage/innobase/lock/lock0lock.cc
      @@ -1739,6 +1739,11 @@ lock_rec_find_similar_on_page(
       	const trx_t*    trx)            /*!< in: transaction */
       {
       	ut_ad(lock_mutex_own());
      +	DBUG_EXECUTE_IF("innodb_skip_lock_bitmap", {
      +		if (!trx->in_rollback) {
      +			return nullptr;
      +		}
      +	});
       
       	for (/* No op */;
       	     lock != NULL;
      

      Test

      --source include/have_innodb.inc
      --source include/have_debug.inc
      --source include/have_debug_sync.inc
       
      call mtr.add_suppression("\\[Warning\\] InnoDB: Over 67 percent of the buffer pool.*");
       
      CREATE TABLE t1 (col1 INT) ENGINE=InnoDB;
      INSERT INTO t1 VALUES (1),(2),(3),(4),(5);
       
      SET DEBUG_DBUG="+d,innodb_skip_lock_bitmap";
       
      --error ER_LOCK_TABLE_FULL
      INSERT INTO t1 SELECT a.* FROM t1 a, t1 b, t1 c, t1 d, t1 e, t1 f, t1 g LIMIT 45000;
       
      SET DEBUG_DBUG="-d,innodb_skip_lock_bitmap";
       
      SELECT COUNT(*) FROM t1;
       
      DROP TABLE t1;
      

          Activity

            Debarun Banerjee created the issue and added the following comment:

            BUF_LRU_MIN_LEN (256 pages) is too high for a small buffer pool (BP). With the default 16 KiB page size, 256 pages take 4 MiB: that is more than 5% of any BP smaller than 80 MiB, and 80% of the smallest BP of 5 MiB. Non-data objects such as explicit locks can occupy part of the BP, reducing the number of pages available to the LRU list. If the LRU list reaches its minimum length and no free pages are available, the server hangs because the page cleaner cannot free any more pages.

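            Fix: to avoid such a hang, we lower the LRU minimum below the limit for data objects that buf_LRU_check_size_of_non_data_objects() checks, i.e. to one page less than 5% of the BP, so that free and LRU pages together can still drop below that limit and trigger the check instead of the server hanging.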

            marko: patch uploaded for review.

            Debarun Banerjee made changes:
              Fix Version/s: 10.6
            Debarun Banerjee made changes:
              Status: Open → Confirmed
            Debarun Banerjee made changes:
              Assignee: Debarun Banerjee → Marko Mäkelä
              Status: Confirmed → In Review
            Roel Van de Paar made changes:
              Labels: hang
            Marko Mäkelä made changes:
              Assignee: Marko Mäkelä → Debarun Banerjee
              Status: In Review → Stalled
            Debarun Banerjee made changes:
              Fix Version/s: 10.5.26
              Fix Version/s: 10.6
              Resolution: Fixed
              Status: Stalled → Closed
            JiraAutomate made changes:
              Fix Version/s: 10.6.19, 10.11.9, 11.1.6, 11.2.5, 11.4.3
            Roel Van de Paar made changes:
              Description: updated (the current text is shown under Description above)
            Marko Mäkelä made changes.
            Jira Automation (IT) made changes:
              Zendesk active tickets: 206179
            Marko Mäkelä made changes.

            People

              Assignee: Debarun Banerjee
              Reporter: Debarun Banerjee
              Votes: 0
              Watchers: 2
