[MDEV-20950] Reduce size of record offsets Created: 2019-11-02 Updated: 2023-02-24 Resolved: 2019-12-13 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Fix Version/s: | 10.2.31, 10.3.22, 10.4.12, 10.5.1 |
| Type: | Task | Priority: | Major |
| Reporter: | Eugene Kosov (Inactive) | Assignee: | Eugene Kosov (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||
| Description |
|
Now offsets have stack part and heap part which is used when stack part can not hold that big amount of fields in a table. `mem_heap_t` is used to very wide tables. And the idea is to replace this dichotomy with stack only wider array with optimized offset size: uint16_t instead of ulint. |
| Comments |
| Comment by Marko Mäkelä [ 2019-11-03 ] | |||||||||||||||
|
The maximum number of index fields varies as follows:
Also, in some contexts, such as accessing DB_TRX_ID or DB_ROLL_PTR, we will only need a prefix of the clustered index leaf page record fields. I hope that we can avoid allocating 1024*2 bytes for the offsets in many cases. | |||||||||||||||
| Comment by Eugene Kosov (Inactive) [ 2019-11-04 ] | |||||||||||||||
|
This is one possible solution. https://godbolt.org/z/mQ9kTj Accessing some offset is 2 pointer dereferences unless compiler will optimize it e.g. inside a loop. Design notes:
| |||||||||||||||
| Comment by Marko Mäkelä [ 2019-11-20 ] | |||||||||||||||
|
I mostly liked your changes so far. For innodb_page_size=64k, you must check the actual maximum record size of ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC records. I am rather sure that it is more than 16383 bytes, which you are assuming. Extrapolating your changes to 10.3, which introduces the REC_OFFS_DEFAULT bit, it looks like the maximum record size would shrink to 8191 bytes, which would become a problem also with innodb_page_size=32k. Perhaps we can do with a single flag bit, REC_OFFS_EXTERNAL = 0x8000? The REC_OFFS_SQL_NULL flag could be replaced with a magic value, such as 0x8000. We know that REC_OFFS_EXTERNAL can only be set if index->is_primary(), and hence we also know that the start offset of externally stored fields must be at least 6+7 (the combined size of the system columns DB_TRX_ID and DB_ROLL_PTR). For the rare case where a NULL value is updated in place to a NOT NULL value, or vice versa, for a ROW_FORMAT=REDUNDANT record, we can use rec_…_old() accessor functions directly. Similarly, in 10.3+, REC_OFFS_DEFAULT would no longer be a flag, but we could use a magic constant value 0x8001. Anything in the range of 0x8000 and 0x800c would be distinguishable from REC_OFFS_EXTERNAL. As far as I understand, the following changes are yet to be made:
| |||||||||||||||
| Comment by Eugene Kosov (Inactive) [ 2019-11-25 ] | |||||||||||||||
|
I have refactored my patch so it now uses uppers bits of the offset differently. Every offset has 2 bits for flags. It can contain 4 different values. My patch uses it to store offset type (normal, off-page, NULL). One more type INSTANTLY ADDED can easily be added while merging patch to 10.3 | |||||||||||||||
| Comment by Marko Mäkelä [ 2019-11-25 ] | |||||||||||||||
|
Thanks, this looks better. Still, I’d like to see how we would set the flags in the offsets header in the 10.3 version. We should have more bits available in that header field; the maximum extra_size should be around 2005 bytes, which fits in 11 bits. So, we should have 5 flag bits at our disposal. Also, I would like you to remove the heap parameter of rec_get_offsets(). | |||||||||||||||
| Comment by Marko Mäkelä [ 2019-11-25 ] | |||||||||||||||
|
Please file a separate 10.5 task for removing the heap parameter of rec_get_offsets(). It could be a too big change for GA versions. | |||||||||||||||
| Comment by Matthias Leich [ 2019-12-03 ] | |||||||||||||||
|
| |||||||||||||||
| Comment by Etienne Guesnet [ 2020-01-30 ] | |||||||||||||||
|
The way this task was resolved conflicts with a system typedef in AIX. offset_t is already defined. Can I suggest to use a more specific name? E.g. inno_offset_t. |