[MDEV-6076] Persistent AUTO_INCREMENT for InnoDB Created: 2014-04-11 Updated: 2024-01-19 Resolved: 2016-12-29 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Fix Version/s: | 10.2.4 |
| Type: | Task | Priority: | Major |
| Reporter: | Jan Lindström (Inactive) | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | innodb | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | 10.2.4-1 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
The current auto_increment behavior in the InnoDB engine is sub-optimal. As it currently functions, the auto_increment value is stored until the server shuts down or resets, and then is rebuilt based on values in the table when it starts up again. Furthermore, in 5.6 this ought to become even worse, because tables can be evicted from the InnoDB data dictionary cache. We may get a too low auto-increment value even without shutdown/restart. When a table is evicted, InnoDB will forget the current auto-increment value, and it will do SELECT MAX(auto_inc_column) next time when the table is accessed. |
| Comments |
| Comment by Jan Lindström (Inactive) [ 2014-09-01 ] | |||||||||||||||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2014-12-02 ] | |||||||||||||||||||||||||||||||||||
|
Current problem: When we update a auto_increment value for a table t1 on one client and auto_increment value for a table t2 on second client, this leads lock wait. | |||||||||||||||||||||||||||||||||||
| Comment by Rick James (Inactive) [ 2016-02-16 ] | |||||||||||||||||||||||||||||||||||
|
See also http://bugs.mysql.com/bug.php?id=199 | |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-11-30 ] | |||||||||||||||||||||||||||||||||||
|
MySQL 8.0.0 implements WL#6204: InnoDB persistent max value for autoinc columns. The patch is complex, but one thing is better than in the contributed patch that I have seen. I think that we can combine the two ideas as follows:
With the above approach, no changes should be necessary to crash recovery, backup tools and redo-log based replication. They would simply start scanning the redo log from some log checkpoint until the latest completed mini-transaction. If an auto-increment value was updated, there would be a redo log record for it. The last write would win; the value as of the latest completed mini-transaction would ultimately appear in the root page. The same approach should be possible for any dynamic metadata that we could imagine: UPDATE_TIME, maximum LSN or TRX_ID in a table, number of delete-marked records in an index, number of purgeable records, whatever we can imagine. We would use some bytes within an InnoDB tablespace for persisting the value, and we would have 2 fields in a main-memory struct: the last redo-logged value and the last value written to the page. | |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-01 ] | |||||||||||||||||||||||||||||||||||
|
In other words, my idea is to split the main-memory representation of a buffer pool page into two parts: the base part, and a ‘shadow’ portion that is resides in some auxiliary data structure outside the page frame in the buffer pool. A correct implementation of the idea must fulfill the following invariants:
| |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-01 ] | |||||||||||||||||||||||||||||||||||
|
Premature optimization is the source of all evil. I think that we should run some benchmarks before starting to implement complicated logic. What if we updated the AUTO_INCREMENT value directly in the root page, but protected it with SX latch instead of X latch? What would the performance be in a workload that involves concurrent readers and writers and frequent page allocations? The test setup could be something like this:
The above startup parameters ensure that readers will actually traverse the index tree and that the B-tree will consist of multiple levels, increasing the contention on the root page latch. The initialization could be something like this:
The workload would be something like this: A few readers executing
and one or more writers executing
or (in another benchmark)
or (yet another benchmark)
With the benchmark, I would like to see if there is any noticeable performance regression from the simple patch (write the auto-increment counter directly to the root page, protected by SX-latch) with respect to the baseline (no persistent auto-increment counter). | |||||||||||||||||||||||||||||||||||
| Comment by zhangyuan [ 2016-12-01 ] | |||||||||||||||||||||||||||||||||||
|
Use SX lock is a good idea! But so far we have not supported SX lock in AliSQL5.6. I made a sysbench test in AliSQL5.6, the result shows that the impact of x latch is so little.
with different oltp-tables-count
summary | |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-01 ] | |||||||||||||||||||||||||||||||||||
|
Thanks, these results are encouraging. Somewhat similar to the MySQL 8.0.0 implementation, I would persist the auto-increment counter in row_ins_clust_index_entry_low(), letting btr_cur_search_to_nth_level() acquire and hold the root page SX-latch or X-latch for us. | |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-12 ] | |||||||||||||||||||||||||||||||||||
|
The persistent AUTO_INCREMENT counter will introduce some changes to semantics. Some of them correspond to changes introduced in MySQL 8.0.0, while others are unique to the implementation in MariaDB. First the common changes:
| |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-14 ] | |||||||||||||||||||||||||||||||||||
|
Please review the branch bb-10.2-mdev-6076. | |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-14 ] | |||||||||||||||||||||||||||||||||||
|
mysql_upgrade will need some work to set the AUTO_INCREMENT attributes for InnoDB tables. We may also want to file a follow-up task for removing the requirement of having an index on AUTO_INCREMENT columns. | |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-15 ] | |||||||||||||||||||||||||||||||||||
|
I think that we can do upgrade from old data files transparently (without additions to mysql_upgrade) as follows: If PAGE_ROOT_AUTO_INC is 0, the table must be either empty or in old format. If the table is nonempty and an index on AUTO_INCREMENT column exists, initialize dict_table_t::autoinc from MAX(auto_increment_column) as we used to. Else, effectively let the AUTO_INCREMENT sequence start from 1. Note: I filed MDEV-11578 to remove the requirement for AUTO_INCREMENT columns to be indexed. | |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-15 ] | |||||||||||||||||||||||||||||||||||
|
I was unsure if PAGE_ROOT_AUTO_INC always was 0 on other than secondary index leaf pages. (Before MySQL 5.5, InnoDB had the bad habit of writing uninitialized garbage to unused data fields in pages.) Luckily it turns out that page_create() in mysql-3.23.49 initializes PAGE_MAX_TRX_ID to 0 whenever it is creating a B-tree page. This means that the upgrade method that I implemented (if PAGE_ROOT_AUTO_INC is 0 and the root page is not empty, read MAX(auto_inc_column)) is sound. I can only think of one possible glitch: Import a data file from a
The last statement would reset the persistent AUTO_INCREMENT sequence to the current maximum value. (MDEV-11578 may change the semantics.) | |||||||||||||||||||||||||||||||||||
| Comment by zhangyuan [ 2016-12-15 ] | |||||||||||||||||||||||||||||||||||
|
When delete all records from root page(not by truncate), root page will recreate by page_create_empty. PAGE_ROOT_AUTO_INC will be reset to 0.
| |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-15 ] | |||||||||||||||||||||||||||||||||||
|
Thanks, Zhangyuan! Yes, page_create_empty() or its callers must be adjusted so that PAGE_ROOT_AUTO_INC will be preserved. | |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-15 ] | |||||||||||||||||||||||||||||||||||
|
Some more code changes are needed to accommodate virtual columns (fix the test failures in --suite=vcol). | |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-15 ] | |||||||||||||||||||||||||||||||||||
|
I pushed a fix to the conflict with | |||||||||||||||||||||||||||||||||||
| Comment by Jan Lindström (Inactive) [ 2016-12-16 ] | |||||||||||||||||||||||||||||||||||
|
This looks good now. | |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-16 ] | |||||||||||||||||||||||||||||||||||
|
Thanks! I just pushed to bb-10.2-mdev-6076 rebased to latest 10.2, so that we avoid one test failure due to a bug in slow shutdown that was introduced in The only code change since the review is the correction of a debug assertion that I had added late yesterday:
I also removed many restarts from the test innodb.autoinc_persist, testing more things across each restart. | |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2016-12-29 ] | |||||||||||||||||||||||||||||||||||
|
Now that 10.2.3 was released and 10.1 was merged to 10.2, I finally pushed this. | |||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2017-04-19 ] | |||||||||||||||||||||||||||||||||||
|
|