[MDEV-12288] Reset DB_TRX_ID when the history is removed, to speed up MVCC Created: 2017-03-17 Updated: 2023-12-19 Resolved: 2017-07-07 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Fix Version/s: | 10.3.1 |
| Type: | Task | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | performance, transactions |
| Sprint: | 10.3.1-2 |
| Description |
|
The InnoDB clustered index record system columns DB_TRX_ID and DB_ROLL_PTR are used by multi-versioning and for determining whether a record is implicitly locked. After the history is no longer needed, these columns can safely be reset to 0 and 1<<55 (to indicate a fresh insert). When a reader sees 0 in the DB_TRX_ID column, it can instantly determine that the record is present in the read view. There is no need to acquire the transaction system mutex to check if the transaction exists, because writes can never be conducted by a transaction whose ID is 0.

The persistent InnoDB undo log is split into two parts: insert_undo and update_undo. The insert_undo log is discarded at transaction commit or rollback, and the update_undo log is processed by the purge subsystem. As part of this change, we must merge the two types of undo logs into one, and the purge subsystem will reset the DB_TRX_ID whenever a clustered index record is ‘touched’.

Upgrade considerations

This will change the persistent InnoDB file formats, not only in the undo log and redo log, but also in the data files. There are some debug assertions that would not allow any record to contain DB_TRX_ID=0.

A new redo log format tag must be introduced so that the writes of the system columns can be properly redo-logged. (See

The undo log format will be changed as well. To be able to get rid of legacy code, InnoDB startup should detect if any old-format undo logs are present. If so, startup will be refused, and the user must perform a slow shutdown (SET GLOBAL innodb_fast_shutdown=0) with the old server in order to empty the undo logs.

A proof-of-concept implementation for 10.2 consists of 4 consecutive commits. It is missing all of the above-mentioned upgrade logic. |
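The visibility shortcut described above can be illustrated with a small model. This is only a sketch, not InnoDB code: the `Record` and `ReadView` classes, the `purge_reset` and `is_visible` helpers, and the simplified three-way read view check are all assumptions made for illustration. Only the reset values (0 for DB_TRX_ID, 1<<55 for DB_ROLL_PTR) come from the description.

```python
from dataclasses import dataclass

# Reset values named in the description: DB_TRX_ID=0 and DB_ROLL_PTR=1<<55
# (the latter marks the record as a fresh insert with no older version).
RESET_TRX_ID = 0
RESET_ROLL_PTR = 1 << 55


@dataclass
class Record:
    """Hypothetical stand-in for a clustered index record's system columns."""
    db_trx_id: int
    db_roll_ptr: int


@dataclass
class ReadView:
    """Simplified read view (an assumption, not the real InnoDB class).

    IDs below up_limit_id are visible, IDs at or above low_limit_id are not,
    and IDs in between are visible unless they were active at view creation.
    """
    up_limit_id: int
    low_limit_id: int
    active: frozenset = frozenset()

    def sees(self, trx_id: int) -> bool:
        if trx_id < self.up_limit_id:
            return True
        if trx_id >= self.low_limit_id:
            return False
        return trx_id not in self.active


def purge_reset(rec: Record) -> None:
    """Model of purge resetting the system columns once history is gone."""
    rec.db_trx_id = RESET_TRX_ID
    rec.db_roll_ptr = RESET_ROLL_PTR


def is_visible(rec: Record, view: ReadView) -> bool:
    """A record whose DB_TRX_ID is 0 is visible without consulting the view."""
    if rec.db_trx_id == RESET_TRX_ID:
        return True  # no transaction can have ID 0, so nothing to look up
    return view.sees(rec.db_trx_id)
```

In this model, a record last written by a transaction that is too new for the view is hidden until `purge_reset` runs, after which `is_visible` returns true for every view without any further checks.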
| Comments |
| Comment by Marko Mäkelä [ 2017-03-17 ] |
|
I attached experimental patches to port this to MySQL 5.7.17 for performance evaluation. The second patch also depends on a patch that is attached to

Based on a quick test with MySQL 5.7.17 and these patches, the setting internal_tmp_disk_storage_engine=MyISAM should be used when testing the patches related to temporary tables. (MariaDB does not support InnoDB as the optimizer-internal storage engine, and |
| Comment by Marko Mäkelä [ 2017-04-18 ] |
|
The following tasks are prerequisites for including the patch in a release:
|
| Comment by Marko Mäkelä [ 2017-07-06 ] |
|
bb-10.3-marko passes an upgrade test (see the commit message for details). The only thing that remains to be done is redo logging for setting DB_TRX_ID,DB_ROLL_PTR on ROW_FORMAT=COMPRESSED pages. We probably have to introduce a new redo log record type for that. |
| Comment by Jan Lindström (Inactive) [ 2017-07-06 ] |
|
Firstly, the changes look correct, but I have to say that most of the critical changes are in code that I am not familiar with in great detail. Secondly, if the upgrade tests pass, this code works as well. There could be a new test case in which both insert and update undo records are written to persistent storage, followed by a check that crash recovery works correctly (on different page sizes, including the compressed row format). OK to push. |
| Comment by Marko Mäkelä [ 2017-07-06 ] |
|
Thank you for the review. I responded to your review comments, and will next implement the remaining redo log format changes. |
| Comment by Marko Mäkelä [ 2017-07-07 ] |
|
I implemented a new redo log format version and the MLOG_ZIP_WRITE_TRX_ID record. Nevertheless, the fields do get reset sometimes (during innodb_zip.bug56680 even for ROW_FORMAT=COMPRESSED tables). The implemented file format changes will allow the resetting to be improved later. So, I would push this now, before the file formats are frozen. |
| Comment by Marko Mäkelä [ 2017-07-07 ] |
|
When testing the recovery of the added MLOG_ZIP_WRITE_TRX_ID record, I noticed that the system columns are not being reset in every case, such as soon after committing an INSERT. I decided to push this nevertheless, so that we will have the necessary file format changes in place. The bug that the history is not always being reset can be fixed later when time permits. |
| Comment by Marko Mäkelä [ 2017-08-28 ] |
|
The resetting of the DB_TRX_ID column was fixed and regression tests added in |
| Comment by Marko Mäkelä [ 2017-12-04 ] |
|
svoj noticed that the function lock_rec_convert_impl_to_expl() was unnecessarily looking up trx_id=0, and acquiring trx_sys->mutex when doing the futile lookup. |
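The kind of early return that avoids this futile lookup might look as follows. This is a hedged sketch rather than the actual fix: `TrxSys`, `find_active`, and `implicit_lock_holder` are hypothetical names, and the counter exists only to make the mutex traffic observable.

```python
import threading


class TrxSys:
    """Hypothetical transaction system with a mutex-protected lookup table."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.active = {}             # trx_id -> transaction object
        self.mutex_acquisitions = 0  # instrumentation for this sketch only

    def find_active(self, trx_id):
        """Look up an active transaction; this takes the global mutex."""
        with self.mutex:
            self.mutex_acquisitions += 1
            return self.active.get(trx_id)


def implicit_lock_holder(trx_sys, rec_trx_id):
    """Return the transaction that implicitly locks the record, if any.

    A DB_TRX_ID of 0 means the history was reset, so no transaction can
    hold an implicit lock and the mutex-protected lookup is skipped.
    """
    if rec_trx_id == 0:
        return None
    return trx_sys.find_active(rec_trx_id)
```

With the early return, records whose DB_TRX_ID has been reset never touch the mutex, while records carrying a nonzero ID still go through the normal lookup.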
| Comment by Marko Mäkelä [ 2018-03-07 ] |
|
For the record: Due to this change, InnoDB moved to a single persistent undo log. By design, this ought to fix the upstream MySQL Bug #55283, which to my knowledge is still open. The bug should be present in all upstream InnoDB versions at least since MySQL 5.0, where the two-phase commit mechanism was introduced. |