Details
- Type: Task
- Status: Stalled
- Priority: Major
- Resolution: Unresolved
None
Description
Since MariaDB 10.2.2, InnoDB never holds any mutexes or RW-locks across handler API calls. (Until that version, btr_search_latch for the adaptive hash index could be held, and there was a special call handlerton::release_temporary_latches.)
During UPDATE operations, and also possibly during reads that perform range scans, it could help a lot to reuse the same InnoDB mini-transaction and to protect the current page with the page latch (buf_block_t::lock) across calls:
- Introduce row_prebuilt_t::mtr and keep it open.
- Avoid mtr_t::commit() between row reads.
- Avoid storing and restoring the btr_pcur_t position.
- If there is any possibility of a delay (such as waiting for a row read from another table, or waiting for client connection I/O), then btr_pcur_store_position(); mtr.commit() will have to be called before the wait, and mtr.start(); btr_pcur_restore_position(); after it.
This change could remove any benefit of the row_prebuilt_t::fetch_cache (after 4 consecutive row reads, it prefetches 8 rows). Removing this cache would greatly reduce InnoDB memory usage for partitioned tables.
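The read path described above can be sketched with a toy model. This is only an illustration under stated assumptions: ToyMtr, ToyCursor, ToyPrebuilt and scan_rows are hypothetical stand-ins, not the real mtr_t, btr_pcur_t or row_prebuilt_t, and the "position" is just an index into a vector. It shows the mini-transaction staying open across row reads and being committed and restarted only around a possible wait.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for mtr_t: tracks only whether it is active
// and how many times it has been committed.
struct ToyMtr {
  bool active = false;
  std::size_t commits = 0;
  void start() { assert(!active); active = true; }
  void commit() { assert(active); active = false; ++commits; }
};

// Hypothetical stand-in for btr_pcur_t: the position is a plain index.
struct ToyCursor {
  std::size_t pos = 0;
  std::size_t stored = 0;
  std::size_t stores = 0;  // how often the position had to be saved
  void store_position() { stored = pos; ++stores; }
  void restore_position() { pos = stored; }
};

// Hypothetical stand-in for row_prebuilt_t with an embedded mini-transaction.
struct ToyPrebuilt {
  ToyMtr mtr;
  ToyCursor pcur;
};

// Reads every row once. may_wait[i] tells whether a delay is possible
// after reading row i (for example, waiting for client I/O).
inline std::size_t scan_rows(ToyPrebuilt& p, const std::vector<bool>& may_wait) {
  std::size_t rows_read = 0;
  p.mtr.start();
  for (p.pcur.pos = 0; p.pcur.pos < may_wait.size(); ++p.pcur.pos) {
    ++rows_read;  // the row is read while the mini-transaction is open
    if (may_wait[p.pcur.pos]) {
      // Release everything before the wait, then resume afterwards.
      p.pcur.store_position();
      p.mtr.commit();
      // ... the wait would happen here ...
      p.mtr.start();
      p.pcur.restore_position();
    }
  }
  p.mtr.commit();
  return rows_read;
}
```

In this model, a scan of four rows with one possible wait commits the mini-transaction twice and stores the cursor position once, rather than once per row.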
Mini-transactions for single-row UPDATE/DELETE
- Search and S-latch the PRIMARY KEY leaf page (acquire an explicit transactional lock)
- X-latch the PRIMARY KEY leaf page, update the transaction directory page (rollback segment header page), allocate and initialize the first undo log page of the transaction
- Write undo log record
- Modify the PRIMARY KEY index
- (For each off-page column, use 1 mini-transaction per page written.)
- (For each secondary index, modify the index.)
- Commit the user transaction
That is 1 read-only mini-transaction and 4 read-write mini-transactions for a 1-row user transaction! (With MariaDB 10.3.5, there are only 3 read-write mini-transactions, because the first two writes were merged.)
We can actually use a single mini-transaction for all this. Multiple mini-transactions will be needed only if there are secondary indexes or off-page columns:
- Search and X-latch the PRIMARY KEY leaf page, update the transaction directory page, allocate and initialize the first undo log page, write the undo log record, modify the PRIMARY KEY index (with implicit transactional locking)
- (For each off-page column, use 1 mini-transaction per page written.)
- (For each secondary index, modify the index.)
- Commit the user transaction
If there are no off-page columns or secondary indexes, the user transaction commit can be merged into the same mini-transaction. (This is a special case for a single-row user transaction.)
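The arithmetic behind the two step lists can be written down as a small counting model. This is a sketch, not InnoDB code: MtrCount, current_scheme and proposed_scheme are hypothetical names, and the model assumes that each secondary index modification costs exactly one mini-transaction, which the step lists suggest but do not state.

```cpp
#include <cassert>

// Number of mini-transactions used by a single-row UPDATE/DELETE.
struct MtrCount {
  unsigned read_only;
  unsigned read_write;
};

// Current scheme: one read-only search, then separate read-write
// mini-transactions for the X-latch/undo page setup, the undo log record,
// the PRIMARY KEY modification and the commit (4 in total), or 3 when the
// first two writes are merged as in MariaDB 10.3.5.
// Assumption: one extra mini-transaction per off-page page written and
// one per secondary index.
inline MtrCount current_scheme(unsigned off_page_pages,
                               unsigned secondary_indexes,
                               bool first_two_writes_merged) {
  const unsigned base = first_two_writes_merged ? 3u : 4u;
  return MtrCount{1u, base + off_page_pages + secondary_indexes};
}

// Proposed scheme: search, undo logging and PRIMARY KEY modification share
// one mini-transaction. The commit is merged into it only when there are
// no off-page columns and no secondary indexes.
inline MtrCount proposed_scheme(unsigned off_page_pages,
                                unsigned secondary_indexes) {
  const unsigned extra = off_page_pages + secondary_indexes;
  const unsigned read_write = (extra == 0) ? 1u : 1u + extra + 1u;
  return MtrCount{0u, read_write};
}
```

For example, `current_scheme(0, 0, false)` gives 1 read-only and 4 read-write mini-transactions, while `proposed_scheme(0, 0)` gives a single read-write one.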
The merging of the 'read' and 'write' steps under a single page lock would implement implicit locking for UPDATE and DELETE. When there is no locking conflict, this should greatly reduce the contention on lock_sys.mutex.
Using fewer mini-transactions for writes also means less communication with the redo log buffer, which should reduce contention on log_sys.mutex or whatever MDEV-14425 replaces it with.
Note: For any record modifications, we must always commit and restart the mini-transaction between rows, because we cannot move to another B-tree page after acquiring an undo page lock. Reads can reuse the same mini-transaction.
Issue Links
- blocks
  - MDEV-21452 Use condition variables and normal mutexes instead of InnoDB os_event and mutex (Closed)
  - MDEV-30078 SQL Layer support for: Use fewer InnoDB mini-transactions (Stalled)
- relates to
  - MDEV-17603 Allow statement-based replication for REPLACE and INSERT…ON DUPLICATE KEY UPDATE (Closed)
  - MDEV-21974 InnoDB DML under backup locks make buffer pool usage grow permanently (Open)
  - MDEV-24813 Locking full table scan fails to use table-level locking (In Review)
  - MDEV-33251 Redundant check on prebuilt::n_rows_fetched overflow (Closed)
  - MDEV-34791 Redundant page lookups hurt performance (Closed)
  - MDEV-10962 Deadlock with 3 concurrent DELETEs by unique key (Closed)
  - MDEV-11215 Several locks taken to same record inside a transaction. (Stalled)
  - MDEV-14425 Change the InnoDB redo log format to reduce write amplification (Closed)
  - MDEV-16168 Performance regression on sysbench write benchmarks from 10.2 to 10.3 (Closed)
  - MDEV-16675 Unnecessary explicit lock acquisition during UPDATE or DELETE (Closed)
  - MDEV-18746 Reduce the amount of mem_heap_create() or malloc() (Open)
  - MDEV-22413 Server hangs upon UPDATE/DELETE on a view reading from versioned partitioned table (Closed)
  - MDEV-24224 Gap lock on delete in 10.5 using READ COMMITTED (Closed)
  - MDEV-26779 reduce lock_sys.wait_mutex contention by using spinloop construct (Closed)
  - MDEV-30835 Inconsistent blocking of UPDATE and DELETE with the same WHERE clause (Open)
  - MDEV-36308 Unreasonable block in repeatable read (Open)
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue relates to |
Link | This issue relates to |
Link | This issue relates to |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Link | This issue relates to MDEV-11215 [ MDEV-11215 ] |
Link | This issue relates to |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.4 [ 22408 ] |
Component/s | Locking [ 10900 ] | |
Component/s | Optimizer [ 10200 ] | |
NRE Projects | RM_105_CANDIDATE |
Attachment | psergey-mdev16232-poc-r1.diff [ 47938 ] |
Assignee | Thirunarayanan Balathandayuthapani [ thiru ] | Eugene Kosov [ kevg ] |
Link | This issue relates to |
Fix Version/s | 10.5 [ 23123 ] |
Link | This issue relates to MDEV-18746 [ MDEV-18746 ] |
Link | This issue relates to MDEV-21974 [ MDEV-21974 ] |
Link | This issue relates to |
Link | This issue blocks |
Link | This issue relates to |
Link | This issue relates to MDEV-24813 [ MDEV-24813 ] |
Link | This issue relates to |
Assignee | Eugene Kosov [ kevg ] | Thirunarayanan Balathandayuthapani [ thiru ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Fix Version/s | 10.8 [ 26121 ] |
Status | Confirmed [ 10101 ] | In Progress [ 3 ] |
Workflow | MariaDB v3 [ 87363 ] | MariaDB v4 [ 131825 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Fix Version/s | 10.9 [ 26905 ] | |
Fix Version/s | 10.8 [ 26121 ] |
Fix Version/s | 10.10 [ 27530 ] | |
Fix Version/s | 10.9 [ 26905 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Fix Version/s | 10.11 [ 27614 ] | |
Fix Version/s | 10.10 [ 27530 ] |
Assignee | Thirunarayanan Balathandayuthapani [ thiru ] | Sergei Petrunia [ psergey ] |
Fix Version/s | 10.12 [ 28320 ] | |
Fix Version/s | 10.11 [ 27614 ] |
Link | This issue relates to MDEV-30078 [ MDEV-30078 ] |
Link | This issue is blocked by MDEV-30078 [ MDEV-30078 ] |
Link | This issue relates to MDEV-30078 [ MDEV-30078 ] |
Fix Version/s | 11.1 [ 28549 ] | |
Fix Version/s | 11.0 [ 28320 ] |
Assignee | Sergei Petrunia [ psergey ] | Oleg Smirnov [ JIRAUSER50405 ] |
Assignee | Oleg Smirnov [ JIRAUSER50405 ] | Thirunarayanan Balathandayuthapani [ thiru ] |
Link | This issue relates to MDEV-30835 [ MDEV-30835 ] |
Link | This issue blocks MDEV-16402 [ MDEV-16402 ] |
Fix Version/s | 11.2 [ 28603 ] | |
Fix Version/s | 11.1 [ 28549 ] |
Link | This issue blocks MDEV-30078 [ MDEV-30078 ] |
Link | This issue is blocked by MDEV-30078 [ MDEV-30078 ] |
Fix Version/s | 11.3 [ 28565 ] | |
Fix Version/s | 11.2 [ 28603 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Fix Version/s | 11.4 [ 29301 ] | |
Fix Version/s | 11.3 [ 28565 ] |
Fix Version/s | 11.5 [ 29506 ] | |
Fix Version/s | 11.4 [ 29301 ] |
Issue Type | Task [ 3 ] | New Feature [ 2 ] |
Link | This issue relates to |
Fix Version/s | 11.6 [ 29515 ] | |
Fix Version/s | 11.5 [ 29506 ] |
Assignee | Thirunarayanan Balathandayuthapani [ thiru ] | Debarun Banerjee [ JIRAUSER54513 ] |
Fix Version/s | 11.7 [ 29815 ] | |
Fix Version/s | 11.6 [ 29515 ] |
Zendesk Related Tickets | 156702 103675 |
Issue Type | New Feature [ 2 ] | Task [ 3 ] |
Link | This issue relates to |
Fix Version/s | 11.8 [ 29921 ] | |
Fix Version/s | 11.7 [ 29815 ] |
Priority | Critical [ 2 ] | Major [ 3 ] |
Fix Version/s | 11.8 [ 29921 ] |
Link | This issue relates to MDEV-36308 [ MDEV-36308 ] |
Link | This issue blocks MDEV-16402 [ MDEV-16402 ] |