[MDEV-17520] Instant ALTER TABLE for failure-free column type changes
Created: 2018-10-22 Updated: 2024-01-18
| Status: | Stalled |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Fix Version/s: | None |
| Type: | Task | Priority: | Critical |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | ddl, instant | ||
| Description |
|
The original InnoDB ROW_FORMAT=REDUNDANT essentially stores each field as variable-length and possibly NULL. For that format, we can trivially allow an instantaneous change of a column from NOT NULL to NULL.

The space-optimized row formats COMPACT and DYNAMIC do not allow NULL values to be represented for columns that were originally declared as NOT NULL. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for ROW_FORMAT=REDUNDANT in

Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV-15563:
The clustered index leaf pages of the table would be gradually converted into something that resembles ROW_FORMAT=REDUNDANT as a result of modifications. This will increase the space usage a little. Also, ROW_FORMAT=REDUNDANT limits the maximum in-page record size to 16,383 bytes, which for innodb_page_size=64k is less than the limit for COMPACT or DYNAMIC.

Because secondary index records would remain in the original ROW_FORMAT, secondary indexes may have to be rebuilt when an indexed column is changed. That is, changing an indexed column from NOT NULL to NULL will require the indexes to be rebuilt if ROW_FORMAT is not REDUNDANT.

Any INSERT or UPDATE after an instant ALTER that removes a NOT NULL constraint (or changes a column to a wider type later in MDEV-15563) will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles ROW_FORMAT=REDUNDANT, with the following differences:
In ROW_FORMAT=REDUNDANT, the record header will store the length of each column (including fixed-length and NULL columns), using n_fields or 2·n_fields bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·n_fields+1 bytes.

The metadata BLOB that was introduced in MDEV-15562 will be augmented. The flag 1U << 14 will be set in dict_instant_t::non_pk_col_map[] for ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC columns that were originally created as NOT NULL but no longer carry this attribute. Based on this information, we will initialize n_nullable and n_core_null_bytes for the clustered index based on the original column definition, instead of the latest one. Secondary indexes and ROW_FORMAT=REDUNDANT will use the latest definition.
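The size arithmetic and the flag bit described above can be made concrete with a small sketch. It is not InnoDB code: the constant and function names are invented for illustration, the choice between 1-byte and 2-byte length entries is passed in as a parameter rather than derived, and treating an element of dict_instant_t::non_pk_col_map[] as a 16-bit integer is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-length record header sizes quoted in the description.
constexpr std::size_t COMPACT_FIXED_HEADER = 5;
constexpr std::size_t REDUNDANT_FIXED_HEADER = 6;

// Header size of a REDUNDANT-like record: the fixed part plus one length
// entry per field. two_byte_lengths is a hypothetical parameter; how InnoDB
// chooses between 1-byte and 2-byte entries is not modelled here.
constexpr std::size_t redundant_header_size(std::size_t n_fields,
                                            bool two_byte_lengths)
{
  return REDUNDANT_FIXED_HEADER + n_fields * (two_byte_lengths ? 2 : 1);
}

// Growth relative to an original header of compact_header_size bytes
// (5 fixed bytes plus whatever NULL flags and lengths the record carried).
// Precondition: the new header is not smaller than the original one.
inline std::size_t header_growth(std::size_t n_fields, bool two_byte_lengths,
                                 std::size_t compact_header_size)
{
  assert(compact_header_size >= COMPACT_FIXED_HEADER);
  const std::size_t new_size =
      redundant_header_size(n_fields, two_byte_lengths);
  assert(new_size >= compact_header_size);
  return new_size - compact_header_size;
}

// Upper bound from the description: a record that had only the 5-byte
// fixed header grows by 2*n_fields + 1 bytes when 2-byte entries are used.
static_assert(redundant_header_size(4, true) - COMPACT_FIXED_HEADER
                  == 2 * 4 + 1,
              "matches the 2*n_fields+1 bound");

// Flag value named in the description. Treating a map element as a
// 16-bit integer is an assumption made for this sketch.
constexpr std::uint16_t WAS_NOT_NULL = 1U << 14;

// True if the element marks a column that was originally NOT NULL.
constexpr bool was_originally_not_null(std::uint16_t element)
{
  return (element & WAS_NOT_NULL) != 0;
}
```

For a 4-column record whose original header consisted of only the 5 fixed bytes, header_growth(4, true, 5) gives 9, which is the 2·n_fields+1 upper bound.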
| Comments |
| Comment by Marko Mäkelä [ 2018-11-07 ] |
|
I created bb-10.4-MDEV-17520 with a refactored version of what was originally developed in I changed the scope of this task and
| Comment by Marko Mäkelä [ 2018-11-09 ] |
|
Some code changes of this work will be needed in the final version of
| Comment by Marko Mäkelä [ 2018-11-25 ] |
|
Currently, this only fully works for ROW_FORMAT=REDUNDANT. For DYNAMIC and COMPRESSED, the basic case of INSERT after the instantaneous removal of NOT NULL will convert the pages to the flexible format. The UPDATE code paths have not been adjusted yet, and SELECT will misinterpret old pages, wrongly expecting there to be ‘is null’ flags in the record headers. Due to these omissions, the functionality is only enabled for ROW_FORMAT=REDUNDANT at the moment, and MDEV-17520.patch
To eliminate the effect of the adaptive hash index and the file system operations, we test as follows:
As expected, the INSERT performance is a little worse than with ROW_FORMAT=REDUNDANT, but a little better than with the default ROW_FORMAT=DYNAMIC.
This is because the file format is constant in that case. The INSERT duration would only vary by some tens of seconds. Finally, here are the results with the baseline version (the latest 10.4 that was merged to the branch):
There does not seem to be significant performance degradation, or the degradation is within noise levels (the times can vary a couple of tens of seconds when rerunning). For the table ti, the results are not comparable, because the format is different.
| Comment by Marko Mäkelä [ 2018-11-25 ] |
|
The following benchmark shows a slight performance improvement (instead of degradation) for the branch (1883da2a7e7f1fff067b62fbfdc9174e24b93a12) compared to plain 10.4 (27f3329ff6cb755b600d536347669bef1a7d98b5):
The difference is about 2%, while the noise should be 1% or less. For the slower row_format=dynamic, the test takes 5 minutes to run on my system.
| Comment by Marko Mäkelä [ 2018-11-30 ] |
|
The latest bb-10.4-MDEV-17520 passes the full mysql-test-run. Known problems:
| Comment by Marko Mäkelä [ 2018-12-03 ] |
|
I reran my above-mentioned benchmarks using clang 7.0.1 -O2 (instead of gcc 8.2.0 -O2, which I used last time).
| Comment by Marko Mäkelä [ 2018-12-03 ] |
|
The test main.function_defaults_innodb turned out to be crashing randomly, and after my fix, it is returning a wrong result. That is currently the only disabled test.
| Comment by Matthias Leich [ 2018-12-04 ] |
|
| Comment by Marko Mäkelä [ 2018-12-05 ] |
|
mleich, the assertions on rec_offs_comp() or page_rec_is_comp() or page_is_comp() are certainly failing due to this work. I just need a list of all such assertion failures (with stack traces), so that I can evaluate them. I have relaxed many such assertions with || index->dual_format(), and it should be applicable also in this case. Today I finished addressing the correctness-critical FIXME comments, except the one in dict_stats_analyze_index_level().
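As an illustration of what relaxing an assertion with || index->dual_format() might look like, here is a minimal sketch. The structs are hypothetical, heavily reduced stand-ins rather than InnoDB's real index and record types; only the name dual_format() comes from the comment above, and what it returns in the branch is assumed.

```cpp
#include <cassert>

// Hypothetical stand-in for an index descriptor (not InnoDB's dict_index_t).
struct index_stub {
  bool compact_table;  // table was created with a COMPACT-style row format
  bool dual;           // leaf pages may also hold REDUNDANT-like records

  // Assumed meaning: records of either format may appear in this index.
  bool dual_format() const { return dual; }
};

// Hypothetical stand-in for a record; comp says whether it is in the
// COMPACT-style format.
struct rec_stub {
  bool comp;
};

// Strict form of the check: the record format must match the table format.
// Relaxed form: a mismatch is tolerated for a dual-format index.
inline bool rec_format_ok(const rec_stub& rec, const index_stub* index)
{
  return rec.comp == index->compact_table || index->dual_format();
}

// Debug assertion built on the relaxed predicate.
inline void check_rec(const rec_stub& rec, const index_stub* index)
{
  assert(rec_format_ok(rec, index));
}
```

With such a predicate, a REDUNDANT-like record found in a table created as COMPACT or DYNAMIC no longer trips the assertion once the index is marked as dual-format, while ordinary indexes keep the strict check.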
| Comment by Marko Mäkelä [ 2018-12-27 ] |
|
I rebased the work on the latest 10.4 (slightly after the 10.4.1 release) and pushed to bb-10.4-MDEV-17520-2. The main reason why the work was stalled is the increased space usage due to the dual-format leaf pages. For ROW_FORMAT=COMPACT and ROW_FORMAT=DYNAMIC, a better approach would be to store multiple versions of the table definition in the metadata BLOB that was introduced in
| Comment by Marko Mäkelä [ 2019-04-03 ] |
|
MDEV-17520_sec_indexes.10.4.3.diff