  1. MariaDB Server
  2. MDEV-34166

Server could hang with BP < 80M under stress

Details

    Description

      This issue was found while analysing MDEV-28800. With a small buffer pool, the scenario sometimes hangs instead of exiting with a fatal error. The issue has likely existed for a long time but is not easy to reproduce. It is worth fixing because it can affect internal tests that run with a small buffer pool (BP).

      The issue can be reproduced with a debug injection that forces more locks to be allocated. The expected behaviour is ER_LOCK_TABLE_FULL rather than a server hang.

      Code

      diff --git a/storage/innobase/lock/lock0lock.cc b/storage/innobase/lock/lock0lock.cc
      index 0891ad5ceb7..d9a3c96aab0 100644
      --- a/storage/innobase/lock/lock0lock.cc
      +++ b/storage/innobase/lock/lock0lock.cc
      @@ -1739,6 +1739,11 @@ lock_rec_find_similar_on_page(
       	const trx_t*    trx)            /*!< in: transaction */
       {
       	ut_ad(lock_mutex_own());
      +	DBUG_EXECUTE_IF("innodb_skip_lock_bitmap", {
      +		if (!trx->in_rollback) {
      +			return nullptr;
      +		}
      +	});
       
       	for (/* No op */;
       	     lock != NULL;
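
To illustrate why this injection raises lock memory pressure, here is a toy C++ model (not InnoDB code). It assumes that a transaction's record locks on one page normally share a single lock struct with one bitmap bit per record, and that returning nullptr from lock_rec_find_similar_on_page() leads to a fresh lock struct for every record lock. The name lock_structs_needed and the figure of 300 records per page are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: counts lock structs, not real InnoDB memory usage.
// reuse_bitmap == true  : record locks on the same page share one struct.
// reuse_bitmap == false : every record lock gets its own struct, which is
//                         the effect assumed for innodb_skip_lock_bitmap.
std::size_t lock_structs_needed(std::size_t locked_records,
                                std::size_t records_per_page,
                                bool reuse_bitmap) {
  assert(records_per_page > 0);
  if (!reuse_bitmap) {
    return locked_records;
  }
  // Round up: a partially filled page still needs one struct.
  return (locked_records + records_per_page - 1) / records_per_page;
}
```

Under these assumptions, 45000 locked records at 300 records per page need 150 lock structs with reuse and 45000 without it, which is the kind of growth that can exhaust a small buffer pool.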
      

      Test

      --source include/have_innodb.inc
      --source include/have_debug.inc
      --source include/have_debug_sync.inc
       
      call mtr.add_suppression("\\[Warning\\] InnoDB: Over 67 percent of the buffer pool.*");
       
      CREATE TABLE t1 (col1 INT) ENGINE=InnoDB;
      INSERT INTO t1 VALUES (1),(2),(3),(4),(5);
       
      SET DEBUG_DBUG="+d,innodb_skip_lock_bitmap";
       
      --error ER_LOCK_TABLE_FULL
      INSERT INTO t1 SELECT a.* FROM t1 a, t1 b, t1 c, t1 d, t1 e, t1 f, t1 g LIMIT 45000;
       
      SET DEBUG_DBUG="-d,innodb_skip_lock_bitmap";
       
      SELECT COUNT(*) FROM t1;
       
      DROP TABLE t1;
      

          Activity

            Debarun Banerjee (debarun) added a comment:

            BUF_LRU_MIN_LEN (256) is too high a value for a small buffer pool (BP). For example, with a 16 KiB page size, the limit is more than 5% of the total BP for any BP smaller than 80M, and for the smallest BP of 5M it is 80% of the BP. Non-data objects such as explicit locks can occupy part of the buffer pool, reducing the pages available for the LRU. If the LRU reaches its minimum length and no free pages are available, the server hangs because the page cleaner cannot free any more pages.

            Fix: To avoid such a hang, we adjust the LRU limit to be lower than the limit for data objects as checked in buf_LRU_check_size_of_non_data_objects(), i.e. one page less than 5% of the BP.

            marko Uploaded patch for review.
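
The percentages in the comment can be checked with a small C++ sketch. It takes BUF_LRU_MIN_LEN = 256 from the comment and reads "80M" and "5M" as MiB. The helper names are hypothetical, and capping the adjusted limit at BUF_LRU_MIN_LEN for larger buffer pools is an assumption, not something the comment states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Value quoted in the comment above.
constexpr std::size_t BUF_LRU_MIN_LEN = 256;

// Hypothetical helper: number of pages in a buffer pool of bp_bytes.
std::size_t bp_pages(std::size_t bp_bytes, std::size_t page_bytes) {
  assert(page_bytes > 0);
  return bp_bytes / page_bytes;
}

// Share of the buffer pool, in percent, that BUF_LRU_MIN_LEN pages represent.
double min_lru_percent(std::size_t bp_bytes, std::size_t page_bytes) {
  return 100.0 * static_cast<double>(BUF_LRU_MIN_LEN) /
         static_cast<double>(bp_pages(bp_bytes, page_bytes));
}

// Hypothetical adjusted limit: one page less than 5% of the BP,
// assumed here to be capped at BUF_LRU_MIN_LEN for larger pools.
std::size_t adjusted_min_lru_len(std::size_t bp_bytes, std::size_t page_bytes) {
  const std::size_t five_percent = bp_pages(bp_bytes, page_bytes) / 20;
  const std::size_t below = five_percent > 0 ? five_percent - 1 : 0;
  return std::min(BUF_LRU_MIN_LEN, below);
}
```

With 16 KiB pages, an 80 MiB pool has 5120 pages, so 256 pages is exactly 5%; a 5 MiB pool has 320 pages, so 256 pages is 80%. Under the stated assumption, the adjusted limit for the 5 MiB pool would be 15 pages.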

            People

              debarun Debarun Banerjee