MariaDB Server / MDEV-11634

Improve the InnoDB change buffer




      The InnoDB change buffer (https://blogs.oracle.com/mysqlinnodb/entry/mysql_5_5_innodb_change) aims to make the write patterns of leaf pages of non-unique, non-spatial indexes more sequential. If a leaf page is not present in the buffer pool, an operation on it can be buffered by writing a record to the special change buffer B-tree, provided that no page overflow or underflow can occur. When the page is later read into the buffer pool for whatever reason, the buffered changes will be merged into it.
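The buffering rule above can be sketched as a toy model. This is only an illustration, not InnoDB code: `ToyIndex`, `ToyChangeBuffer`, and the caller-supplied `may_overflow_or_underflow` flag are hypothetical names, and pages are modeled as plain lists of records.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    unique: bool = False
    spatial: bool = False


@dataclass
class ToyChangeBuffer:
    buffer_pool: dict = field(default_factory=dict)  # page_id -> records of resident pages
    disk: dict = field(default_factory=dict)         # page_id -> records of pages on "disk"
    pending: dict = field(default_factory=dict)      # page_id -> buffered operations

    def can_buffer(self, index, page_id, may_overflow_or_underflow):
        # Only leaf pages of non-unique, non-spatial indexes qualify, and only
        # while the page is absent from the buffer pool and the operation
        # cannot cause a page overflow or underflow.
        return (not index.unique
                and not index.spatial
                and page_id not in self.buffer_pool
                and not may_overflow_or_underflow)

    def insert(self, index, page_id, record, may_overflow_or_underflow=False):
        if self.can_buffer(index, page_id, may_overflow_or_underflow):
            self.pending.setdefault(page_id, []).append(("insert", record))
            return "buffered"
        # Otherwise the page has to be read (if needed) and changed directly.
        self.read_page(page_id).append(record)
        return "applied"

    def read_page(self, page_id):
        # Reading a page into the buffer pool merges its buffered changes.
        if page_id not in self.buffer_pool:
            page = list(self.disk.get(page_id, []))
            for op, record in self.pending.pop(page_id, []):
                if op == "insert":
                    page.append(record)
            self.buffer_pool[page_id] = page
        return self.buffer_pool[page_id]
```

In this model, an insert into a non-resident page of a non-unique index is buffered without any page read, and the first later read of that page returns the merged contents.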

      The change buffer format has severe design problems. We still support all change buffer formats (MySQL 4.0 and earlier, MySQL 4.1 with innodb_file_per_table, 5.0 with ROW_FORMAT=COMPACT, 5.5 with delete and purge buffering), even though an upgrade should always be preceded by a slow shutdown, which should have emptied the change buffer.

      The key in the 5.5 format is (tablespace_id, page_number, operation_count), followed by the operation code (insert/delete/purge), record metadata, and the actual data of the record.
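To illustrate what this key ordering gives, here is a small sketch that models the change buffer as a sorted list of (key, operation) pairs. The `ChangeKey` type, the sample entries, and `ops_for_page` are made up for illustration; the real B-tree records also carry the metadata and record data.

```python
import bisect
from typing import NamedTuple


class ChangeKey(NamedTuple):
    tablespace_id: int
    page_number: int
    operation_count: int


# Sorting by key keeps all operations for one page adjacent and in the
# order in which they were buffered.
entries = sorted([
    (ChangeKey(5, 12, 1), "delete"),
    (ChangeKey(5, 3, 0), "insert"),
    (ChangeKey(5, 12, 0), "insert"),
    (ChangeKey(4, 99, 0), "purge"),
    (ChangeKey(5, 12, 2), "purge"),
])


def ops_for_page(entries, tablespace_id, page_number):
    """Return the buffered operations for one page, oldest first."""
    keys = [key for key, _ in entries]
    lo = bisect.bisect_left(keys, (tablespace_id, page_number, 0))
    hi = bisect.bisect_left(keys, (tablespace_id, page_number + 1, 0))
    return [op for _, op in entries[lo:hi]]
```

Merging the changes for a page is then one contiguous range scan, for example `ops_for_page(entries, 5, 12)`. Note that nothing in the key identifies the index that the page belongs to.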

      On DROP INDEX or DROP TABLE in a shared tablespace, InnoDB cannot easily delete all buffered records for the tablespace, so it does not even try. Instead, on page allocation, InnoDB will try to drop any buffered changes that still exist for the page.
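A minimal sketch of this lazy cleanup, assuming a hypothetical `LazyDiscardBuffer` that keeps buffered operations per (tablespace_id, page_number); none of these names exist in InnoDB:

```python
class LazyDiscardBuffer:
    def __init__(self):
        # (tablespace_id, page_number) -> list of buffered operations
        self.buffered = {}

    def buffer_change(self, tablespace_id, page_number, op):
        self.buffered.setdefault((tablespace_id, page_number), []).append(op)

    def drop_index(self, freed_pages):
        # The pages are freed, but their buffered changes are left alone,
        # because finding them all would mean scanning the change buffer.
        return list(freed_pages)

    def allocate_page(self, tablespace_id, page_number):
        # A reused page must not inherit changes that were buffered for its
        # previous owner, so stale entries are discarded here.
        stale = self.buffered.pop((tablespace_id, page_number), [])
        return len(stale)
```

The cost of the cleanup is thus moved from the DROP to whichever later operation happens to allocate the page.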

      If the InnoDB change buffer key were something like (tablespace_id, index_id, page_number), it would be easy to discard all buffered changes for a given index. We could even avoid writing index metadata to the change buffer records. But this would require that the dictionary metadata be available to the buffer pool interface that takes care of merging buffered changes.
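Why such a key would make the discard easy can be shown with a sorted list standing in for the B-tree. This is a sketch of the proposed layout only; `discard_index` is a hypothetical helper and the key values are invented.

```python
import bisect


def discard_index(sorted_keys, tablespace_id, index_id):
    """Remove and return every key that belongs to the given index.

    With (tablespace_id, index_id, page_number) keys, all entries of one
    index form a single contiguous range in sort order.
    """
    lo = bisect.bisect_left(sorted_keys, (tablespace_id, index_id))
    hi = bisect.bisect_left(sorted_keys, (tablespace_id, index_id + 1))
    removed = sorted_keys[lo:hi]
    del sorted_keys[lo:hi]
    return removed


keys = sorted([(5, 20, 3), (5, 21, 3), (5, 20, 9), (6, 20, 1), (5, 19, 7)])
removed = discard_index(keys, 5, 20)
```

With the 5.5 key, the same operation would need the page numbers of every leaf page of the index, because the index is not part of the key.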

      Allocating the change buffer in the InnoDB system tablespace is problematic. IMPORT/EXPORT would work better if this link to the system tablespace did not exist.
      On the other hand, while having a dedicated change buffer in each tablespace would make IMPORT/EXPORT easier, the page write access patterns would be less sequential than with the current global change buffer in the system tablespace.

      If the InnoDB change buffer is to be preserved, it would be good to define it as a no-rollback persistent table that privileged users can read.


      Assignee: Unassigned
      Reporter: Marko Mäkelä