Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34167

We fail to terminate transaction early with ER_LOCK_TABLE_FULL when lock memory is growing

Details

    Description

      This issue was found while analysing MDEV-28800. When lock memory started occupying large amount of buffer pool we fail to terminate the transaction with ER_LOCK_TABLE_FULL. It eventually leads to a server crash with Innodb exiting with Fatal Error.

      ib::fatal() << "Over 95 percent of the buffer pool is"
                  " occupied by lock heaps"
      

      This is a regression looks to have been introduced in 10.6 by

      commit b6a2472489accf0ae9ac3655ffe9b2997ab267ba
      Author: Daniel Black <daniel@mariadb.org>
      Date:   Tue Feb 22 17:42:59 2022 +1100
       
          MDEV-27891: SIGSEGV in InnoDB buffer pool resize
      

         bool running_out() const
         {
           return !recv_recovery_is_on() &&
      -      UNIV_UNLIKELY(UT_LIST_GET_LEN(free) + UT_LIST_GET_LEN(LRU) <
      -                    std::min(curr_size, old_size) / 4);
      +      UT_LIST_GET_LEN(free) + UT_LIST_GET_LEN(LRU) <
      +        n_chunks_new / 4 * chunks->size;
      +  }
      

      The integer division (n_chunks_new / 4) becomes zero whenever the total number of chunks are < 4 making the check completely ineffective for such cases. Also the check is inaccurate for larger chunks and needs to be corrected.

      I will put forward a patch soon.

      Attachments

        Issue Links

          Activity

            debarun Debarun Banerjee created issue -

            The issue can be reproduced by the same test mentioned in MDEV-28800

            Start server with --innodb_buffer_pool_size=20M
             
            CREATE TABLE t1 (d DOUBLE) ENGINE=InnoDB;
            INSERT INTO t1 VALUES (0x0061),(0x0041),(0x00E0),(0x00C0),(0x1EA3),(0x1EA2),(0x00E3),(0x00C3),(0x00E1),(0x00C1),(0x1EA1),(0x1EA0);
            INSERT INTO t1 SELECT t1.* FROM t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6;
            

            Also, the mtr test in MDEV-34166 would be the easiest to test it once we have the bug merged to 10.6.

            I have uploaded the patch for review.

            debarun Debarun Banerjee added a comment - The issue can be reproduced by the same test mentioned in MDEV-28800 Start server with --innodb_buffer_pool_size=20M   CREATE TABLE t1 (d DOUBLE) ENGINE=InnoDB; INSERT INTO t1 VALUES (0x0061),(0x0041),(0x00E0),(0x00C0),(0x1EA3),(0x1EA2),(0x00E3),(0x00C3),(0x00E1),(0x00C1),(0x1EA1),(0x1EA0); INSERT INTO t1 SELECT t1.* FROM t1,t1 t2,t1 t3,t1 t4,t1 t5,t1 t6; Also, the mtr test in MDEV-34166 would be the easiest to test it once we have the bug merged to 10.6. I have uploaded the patch for review.
            debarun Debarun Banerjee made changes -
            Field Original Value New Value
            Fix Version/s 10.6 [ 24028 ]
            Labels regression-10.6
            debarun Debarun Banerjee made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            debarun Debarun Banerjee made changes -
            Assignee Debarun Banerjee [ JIRAUSER54513 ] Marko Mäkelä [ marko ]
            Status Confirmed [ 10101 ] In Review [ 10002 ]
            marko Marko Mäkelä made changes -

            Thank you. I wonder if we should revise sel_set_rec_lock() as well. All other code paths that result in ER_LOCK_TABLE_FULL are guarded solely by buf_pool.running_out(), but in that function we are additionally checking the number of explicit locks held by the current transaction.

            marko Marko Mäkelä added a comment - Thank you. I wonder if we should revise sel_set_rec_lock() as well. All other code paths that result in ER_LOCK_TABLE_FULL are guarded solely by buf_pool.running_out() , but in that function we are additionally checking the number of explicit locks held by the current transaction.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Debarun Banerjee [ JIRAUSER54513 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]

            marko Interesting observation but I think it is not void of impact and I would be sceptic making such changes part of this patch in 10.6. I checked the commit that introduced "UT_LIST_GET_LEN(trx->lock.trx_locks) > 10000" in sel_set_rec_lock() but no specific comment there. Intuitively, we are trying to avoid throwing error for transactions that haven't allocated a lot of locks in locking select path. So, it could cause a change in behaviour and perhaps something we could consider in 11.* versions later.

            debarun Debarun Banerjee added a comment - marko Interesting observation but I think it is not void of impact and I would be sceptic making such changes part of this patch in 10.6. I checked the commit that introduced "UT_LIST_GET_LEN(trx->lock.trx_locks) > 10000" in sel_set_rec_lock() but no specific comment there. Intuitively, we are trying to avoid throwing error for transactions that haven't allocated a lot of locks in locking select path. So, it could cause a change in behaviour and perhaps something we could consider in 11.* versions later.
            debarun Debarun Banerjee made changes -
            Fix Version/s 10.6.19 [ 29833 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            JIraAutomate JiraAutomate made changes -
            Fix Version/s 10.11.9 [ 29834 ]
            Fix Version/s 11.1.6 [ 29835 ]
            Fix Version/s 11.2.5 [ 29836 ]
            Fix Version/s 11.4.3 [ 29837 ]

            debarun, thank you. I see that the condition was originally added (MySQL link) almost 20 years ago in MySQL 4.1.8.

            marko Marko Mäkelä added a comment - debarun , thank you. I see that the condition was originally added ( MySQL link ) almost 20 years ago in MySQL 4.1.8.

            CREATE TABLE t (c INT, d INT, KEY d(d)) ENGINE=InnoDB;
            INSERT INTO t VALUES (0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8);
            INSERT INTO t SELECT a.* FROM t a,t b,t c,t d,t e,t f,t g;
            

            Before the fix:

            11.5.0 e4afa610539ae01164485554e2de839bea9de816 (Optimized)

            11.5.0-opt>INSERT INTO t SELECT a.* FROM t a,t b,t c,t d,t e,t f,t g;
            ERROR 1206 (HY000): The total number of locks exceeds the lock table size
            

            11.5.0 e4afa610539ae01164485554e2de839bea9de816 (Optimized)

            2024-05-29 16:15:11 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=126M. Starting the InnoDB Monitor to print diagnostics.
             
            =====================================
            2024-05-29 16:15:11 0x14d3585f06c0 INNODB MONITOR OUTPUT
            

            After the fix (10.6 used as 11.5 does not contain the fix yet):

            10.6.19 5ba542e9eec0ef5d69883c89755211f7d0e5d0e1 (Optimized)

            10.6.19-opt>INSERT INTO t SELECT a.* FROM t a,t b,t c,t d,t e,t f,t g;
            Query OK, 4782969 rows affected (32.084 sec)
            Records: 4782969  Duplicates: 0  Warnings: 0
            

            Thank you debarun for fixing this!

            Roel Roel Van de Paar added a comment - CREATE TABLE t (c INT , d INT , KEY d(d)) ENGINE=InnoDB; INSERT INTO t VALUES (0,0),(1,1),(2,2),(3,3),(4,4),(5,5),(6,6),(7,7),(8,8); INSERT INTO t SELECT a.* FROM t a,t b,t c,t d,t e,t f,t g; Before the fix: 11.5.0 e4afa610539ae01164485554e2de839bea9de816 (Optimized) 11.5.0-opt>INSERT INTO t SELECT a.* FROM t a,t b,t c,t d,t e,t f,t g; ERROR 1206 (HY000): The total number of locks exceeds the lock table size 11.5.0 e4afa610539ae01164485554e2de839bea9de816 (Optimized) 2024-05-29 16:15:11 4 [Warning] InnoDB: Over 67 percent of the buffer pool is occupied by lock heaps or the adaptive hash index! Check that your transactions do not set too many row locks. innodb_buffer_pool_size=126M. Starting the InnoDB Monitor to print diagnostics.   ===================================== 2024-05-29 16:15:11 0x14d3585f06c0 INNODB MONITOR OUTPUT After the fix (10.6 used as 11.5 does not contain the fix yet): 10.6.19 5ba542e9eec0ef5d69883c89755211f7d0e5d0e1 (Optimized) 10.6.19-opt>INSERT INTO t SELECT a.* FROM t a,t b,t c,t d,t e,t f,t g; Query OK, 4782969 rows affected (32.084 sec) Records: 4782969 Duplicates: 0 Warnings: 0 Thank you debarun for fixing this!
            Roel Roel Van de Paar made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk active tickets 206179

            People

              debarun Debarun Banerjee
              debarun Debarun Banerjee
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.