Details

    Description

      Since MariaDB 10.2.2, InnoDB never holds any mutexes or RW-locks across handler API calls. (Before that version, btr_search_latch for the adaptive hash index could be held across calls, and there was a special call, handlerton::release_temporary_latches, to release it.)

      During UPDATE operations, and possibly also during reads that perform range scans, it could help considerably to reuse the same InnoDB mini-transaction across calls and to keep the current page protected by its page latch (buf_block_t::lock):

      1. Introduce row_prebuilt_t::mtr and keep it open.
      2. Avoid mtr_t::commit() between row reads.
      3. Avoid storing and restoring the btr_pcur_t position.
      4. If there is any possibility of a delay (such as waiting for a row read from another table, or waiting for client connection I/O), call btr_pcur_store_position(); mtr.commit(); before the wait and mtr.start(); btr_pcur_restore_position(); after it.
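      The four steps above can be modelled with a small sketch. It is not InnoDB code: MiniTransaction, Cursor and Prebuilt are hypothetical stand-ins for mtr_t, btr_pcur_t and row_prebuilt_t that only count operations. Consecutive reads share one open mini-transaction, and the cursor position is stored and restored only around a possible wait.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for mtr_t: tracks whether it is open and counts calls.
struct MiniTransaction {
  bool active = false;
  int starts = 0;
  int commits = 0;
  void start() { assert(!active); active = true; ++starts; }
  void commit() { assert(active); active = false; ++commits; }
};

// Hypothetical stand-in for btr_pcur_t: a position on the latched page.
struct Cursor {
  std::size_t pos = 0;
  std::size_t saved = 0;
  int stores = 0, restores = 0;
  void store_position() { saved = pos; ++stores; }
  void restore_position() { pos = saved; ++restores; }
};

// Hypothetical analogue of row_prebuilt_t owning a long-lived mini-transaction.
struct Prebuilt {
  MiniTransaction mtr;
  Cursor pcur;
};

// Read the next row; the mini-transaction (and thus the page latch) stays open.
int read_next(Prebuilt& p, const std::vector<int>& page) {
  if (!p.mtr.active) p.mtr.start();
  return page.at(p.pcur.pos++);
}

// Before a possible delay: remember the position and release the latch.
void before_wait(Prebuilt& p) {
  p.pcur.store_position();
  p.mtr.commit();
}

// After the delay: reacquire the latch and reposition the cursor.
void after_wait(Prebuilt& p) {
  p.mtr.start();
  p.pcur.restore_position();
}
```

      In this model, two reads followed by a wait and one more read cost two mini-transaction starts and a single store/restore pair, rather than one pair per row.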

      This change could eliminate any benefit of row_prebuilt_t::fetch_cache (which, after 4 consecutive row reads, prefetches 8 rows). Removing this cache would greatly reduce InnoDB memory usage for partitioned tables.
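      A simplified model of the prefetch heuristic described above may help show what the cache saves. It is an assumption-laden sketch, not InnoDB's implementation: the first 4 reads fetch one row each, after which every cache miss fetches a batch of 8, and each "fetch" stands for one cursor restore plus mini-transaction. The names FetchCacheModel, kSingleReads and kBatch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical parameters taken from the description above.
constexpr std::size_t kSingleReads = 4;  // reads served one row at a time
constexpr std::size_t kBatch = 8;        // rows prefetched per cache miss

struct FetchCacheModel {
  const std::vector<int>& rows;  // the rows of the "index", in order
  std::size_t next = 0;          // next row to fetch from the index
  std::size_t consecutive = 0;   // consecutive reads so far
  std::size_t fetches = 0;       // underlying fetch operations performed
  std::deque<int> cache;         // the prefetched rows

  explicit FetchCacheModel(const std::vector<int>& r) : rows(r) {}

  // Returns false when all rows have been read.
  bool read(int& out) {
    if (cache.empty()) {
      if (next >= rows.size()) return false;
      std::size_t n = consecutive < kSingleReads ? 1 : kBatch;
      for (std::size_t i = 0; i < n && next < rows.size(); ++i)
        cache.push_back(rows[next++]);
      ++fetches;
    }
    out = cache.front();
    cache.pop_front();
    ++consecutive;
    return true;
  }
};
```

      Reading 20 rows this way costs 6 fetches (4 single reads plus 2 batches of 8). If the mini-transaction is kept open instead, each row is just a cursor move, so the batch buffer would add memory without saving work.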

      Mini-transactions for single-row UPDATE/DELETE

      1. Search and S-latch the PRIMARY KEY leaf page (acquire an explicit transactional lock)
      2. X-latch the PRIMARY KEY leaf page, update the transaction directory page (rollback segment header page), and allocate and initialize the first undo log page of the transaction
      3. Write undo log record
      4. Modify the PRIMARY KEY index
      5. (For each off-page column, use 1 mini-transaction per page written.)
      6. (For each secondary index, modify the index.)
      7. Commit the user transaction

      That is 1 read-only mini-transaction and 4 read-write mini-transactions for a 1-row user transaction! (Starting with MariaDB 10.3.5, there are only 3 read-write mini-transactions, because the first two writes were merged.)
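      The count above can be checked with a small tally. This sketch assumes no off-page columns and no secondary indexes (so steps 5 and 6 are absent) and reads "the first two writes" as steps 2 and 3; the types and names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One entry per mini-transaction in the single-row UPDATE/DELETE flow.
struct Mtr {
  std::string does;
  bool writes;  // true for a read-write mini-transaction
};

struct MtrCount {
  int read_only = 0;
  int read_write = 0;
};

MtrCount count_mtrs(const std::vector<Mtr>& flow) {
  MtrCount c;
  for (const Mtr& m : flow) {
    if (m.writes)
      ++c.read_write;
    else
      ++c.read_only;
  }
  return c;
}

// Steps 1, 2, 3, 4 and 7, each in its own mini-transaction.
const std::vector<Mtr> kBefore1035 = {
    {"search, S-latch PRIMARY KEY leaf, explicit lock", false},
    {"X-latch leaf, update rollback segment header, init undo page", true},
    {"write undo log record", true},
    {"modify PRIMARY KEY index", true},
    {"commit user transaction", true},
};

// Assumed reading of MariaDB 10.3.5: steps 2 and 3 share one mini-transaction.
const std::vector<Mtr> kSince1035 = {
    {"search, S-latch PRIMARY KEY leaf, explicit lock", false},
    {"X-latch leaf, update rollback segment header, init undo page, "
     "write undo log record", true},
    {"modify PRIMARY KEY index", true},
    {"commit user transaction", true},
};
```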

      We can actually use a single mini-transaction for all of this. Multiple mini-transactions are needed only if there are secondary indexes or off-page columns:

      1. Search and X-latch the PRIMARY KEY leaf page, update the transaction directory page, allocate and initialize the first undo log page, write the undo log record, and modify the PRIMARY KEY index (with implicit transactional locking)
      2. (For each off-page column, use 1 mini-transaction per page written.)
      3. (For each secondary index, modify the index.)
      4. Commit the user transaction

      If there are no off-page columns or secondary indexes, the user transaction commit can be merged into the same mini-transaction. (This is a special case for a single-row user transaction.)
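      Under the proposed scheme, the number of mini-transactions per single-row change can be written as a small function. This is a sketch: it uses one mini-transaction per off-page page written (as stated above) and assumes one per secondary index; the function and parameter names are hypothetical.

```cpp
#include <cassert>

// Mini-transactions needed for one single-row UPDATE/DELETE (proposed scheme).
int proposed_mtrs(int off_page_pages, int secondary_indexes) {
  assert(off_page_pages >= 0 && secondary_indexes >= 0);
  int mtrs = 1;               // search, X-latch, undo, PRIMARY KEY change
  mtrs += off_page_pages;     // one per off-page page written
  mtrs += secondary_indexes;  // assumed: one per secondary index
  // The commit needs its own mini-transaction unless nothing else was
  // modified, in which case it is merged into the first one.
  if (off_page_pages != 0 || secondary_indexes != 0) mtrs += 1;
  return mtrs;
}
```

      For example, a row with no off-page columns in a table without secondary indexes needs a single mini-transaction, compared with the 4 (or 5) counted for the current flow.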

      Merging the 'read' and 'write' steps under a single page latch would implement implicit locking for UPDATE and DELETE. When there is no locking conflict, this should greatly reduce contention on lock_sys.mutex.
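      In InnoDB, a clustered index record is implicitly locked when its DB_TRX_ID belongs to a still-active transaction; a conflicting transaction converts that into an explicit lock. The toy model below (hypothetical LockSys and Record types, not InnoDB code) counts how often a global lock-system mutex is taken, to show why the uncontended implicit path avoids it entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <set>

// Toy lock system: counts acquisitions of its global mutex.
struct LockSys {
  std::mutex mutex;
  int acquisitions = 0;
  void create_record_lock() {
    std::lock_guard<std::mutex> guard(mutex);
    ++acquisitions;  // a real lock manager would also create a lock object
  }
};

struct Record {
  std::uint64_t trx_id = 0;  // like DB_TRX_ID: last modifying transaction
};

// Explicit locking: take a record lock, then modify.
void update_explicit(LockSys& ls, Record& r, std::uint64_t trx) {
  ls.create_record_lock();
  r.trx_id = trx;
}

// Implicit locking: stamping the record with the writer's id (done under the
// page latch) is the lock; the lock system is not touched.
void update_implicit(Record& r, std::uint64_t trx) { r.trx_id = trx; }

// A transaction that finds the record's writer still active converts the
// implicit lock into an explicit one; only then is the mutex taken.
bool conflicts(LockSys& ls, const Record& r,
               const std::set<std::uint64_t>& active, std::uint64_t me) {
  if (r.trx_id != me && active.count(r.trx_id) != 0) {
    ls.create_record_lock();
    return true;
  }
  return false;
}
```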

      Using fewer mini-transactions for writes also means less communication with the redo log buffer, which should reduce contention on log_sys.mutex (or on whatever MDEV-14425 replaces it with).
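      The effect on the redo log buffer can be illustrated with a toy model. RedoLogBuffer and commit_groups are hypothetical, not log_sys: each mini-transaction buffers its records privately and appends them under one shared mutex at commit, so grouping the same records into fewer mini-transactions means fewer trips through that mutex.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Toy shared redo log buffer guarded by a single mutex.
struct RedoLogBuffer {
  std::mutex mutex;
  std::string buf;
  int mutex_acquisitions = 0;

  void append(const std::string& records) {
    std::lock_guard<std::mutex> guard(mutex);
    ++mutex_acquisitions;
    buf += records;
  }
};

// Each inner vector is one mini-transaction's redo records.
void commit_groups(RedoLogBuffer& log,
                   const std::vector<std::vector<std::string>>& mtrs) {
  for (const auto& mtr : mtrs) {
    std::string local;  // records collected privately by the mini-transaction
    for (const auto& rec : mtr) local += rec;
    log.append(local);  // one shared-buffer append per mini-transaction
  }
}
```

      Splitting a single-row change into 4 mini-transactions costs 4 acquisitions of the shared mutex; the merged form writes the same bytes with 1.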

      Note: For any record modifications, we must always commit and restart the mini-transaction between rows, because we cannot move to another B-tree page after acquiring an undo page lock. Reads can reuse the same mini-transaction.
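      The ordering constraint in the note can be sketched with a toy latch checker. The two-level Latch enum and MtrLatches type are hypothetical simplifications, not InnoDB's latch-order machinery: once an undo page is latched, a B-tree page may not be, so a multi-row modification has to commit between rows, while a read that never latches an undo page can keep going.

```cpp
#include <cassert>
#include <vector>

enum class Latch { BTreePage, UndoPage };

// Latches held by one (toy) mini-transaction.
struct MtrLatches {
  bool holds_undo = false;
  std::vector<Latch> held;

  // Returns false if the acquisition would violate the ordering rule.
  bool acquire(Latch l) {
    if (l == Latch::BTreePage && holds_undo) return false;
    if (l == Latch::UndoPage) holds_undo = true;
    held.push_back(l);
    return true;
  }
  void commit() {
    held.clear();
    holds_undo = false;
  }
};

// Modify n rows on different pages; returns the number of mtr commits used.
int modify_rows(int n) {
  MtrLatches mtr;
  int commits = 0;
  for (int i = 0; i < n; ++i) {
    bool ok = mtr.acquire(Latch::BTreePage);  // page holding row i
    if (!ok) {                                // undo page already latched:
      mtr.commit();                           // commit and restart first
      ++commits;
      ok = mtr.acquire(Latch::BTreePage);
    }
    assert(ok);
    mtr.acquire(Latch::UndoPage);  // write the undo log record for row i
  }
  mtr.commit();
  return commits + 1;
}
```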

          Activity

            marko Marko Mäkelä created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.4 [ 22408 ]
            marko Marko Mäkelä made changes -
            Component/s Locking [ 10900 ]
            Component/s Optimizer [ 10200 ]
            NRE Projects RM_105_CANDIDATE
            psergei Sergei Petrunia made changes -
            Attachment psergey-mdev16232-poc-r1.diff [ 47938 ]
            kevg Eugene Kosov (Inactive) made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Eugene Kosov [ kevg ]
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 10.5 [ 23123 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Assignee Eugene Kosov [ kevg ] Thirunarayanan Balathandayuthapani [ thiru ]
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.8 [ 26121 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 87363 ] MariaDB v4 [ 131825 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.8 [ 26121 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.9 [ 26905 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 10.10 [ 27530 ]
            AirFocus AirFocus made changes -
            marko Marko Mäkelä made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Sergei Petrunia [ psergey ]
            psergei Sergei Petrunia made changes -
            Fix Version/s 10.12 [ 28320 ]
            Fix Version/s 10.11 [ 27614 ]
            psergei Sergei Petrunia made changes -
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 11.1 [ 28549 ]
            Fix Version/s 11.0 [ 28320 ]
            psergei Sergei Petrunia made changes -
            Assignee Sergei Petrunia [ psergey ] Oleg Smirnov [ JIRAUSER50405 ]
            psergei Sergei Petrunia made changes -
            Assignee Oleg Smirnov [ JIRAUSER50405 ] Thirunarayanan Balathandayuthapani [ thiru ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            Fix Version/s 11.2 [ 28603 ]
            Fix Version/s 11.1 [ 28549 ]
            julien.fritsch Julien Fritsch made changes -
            julien.fritsch Julien Fritsch made changes -
            marko Marko Mäkelä made changes -
            Fix Version/s 11.3 [ 28565 ]
            Fix Version/s 11.2 [ 28603 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.4 [ 29301 ]
            Fix Version/s 11.3 [ 28565 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.5 [ 29506 ]
            Fix Version/s 11.4 [ 29301 ]
            julien.fritsch Julien Fritsch made changes -
            Issue Type Task [ 3 ] New Feature [ 2 ]
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 11.6 [ 29515 ]
            Fix Version/s 11.5 [ 29506 ]
            julien.fritsch Julien Fritsch made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Debarun Banerjee [ JIRAUSER54513 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.7 [ 29815 ]
            Fix Version/s 11.6 [ 29515 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 156702 103675
            ralf.gebhardt Ralf Gebhardt made changes -
            Issue Type New Feature [ 2 ] Task [ 3 ]
            marko Marko Mäkelä made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s 11.8 [ 29921 ]
            Fix Version/s 11.7 [ 29815 ]
            julien.fritsch Julien Fritsch made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 11.8 [ 29921 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -

            People

              debarun Debarun Banerjee
              marko Marko Mäkelä
              Votes: 5
              Watchers: 21
