I mostly liked your changes so far. For innodb_page_size=64k, you must check the actual maximum record size of ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC records. I am rather sure that it is more than the 16383 bytes that you are assuming. Extrapolating your changes to 10.3, which introduces the REC_OFFS_DEFAULT bit, it looks like the maximum record size would shrink to 8191 bytes, which would also become a problem with innodb_page_size=32k.
Perhaps we can make do with a single flag bit, REC_OFFS_EXTERNAL = 0x8000? The REC_OFFS_SQL_NULL flag could be replaced with a magic value, such as 0x8000 (the flag bit combined with an offset of 0). We know that REC_OFFS_EXTERNAL can only be set if index->is_primary(), and hence we also know that the start offset of externally stored fields must be at least 6+7 (the combined size of the system columns DB_TRX_ID and DB_ROLL_PTR). For the rare case where a NULL value is updated in place to a NOT NULL value, or vice versa, for a ROW_FORMAT=REDUNDANT record, we can use the rec_…_old() accessor functions directly. Similarly, in 10.3+, REC_OFFS_DEFAULT would no longer be a flag, but we could use a magic constant value 0x8001. Any value from 0x8000 to 0x800c would be distinguishable from a REC_OFFS_EXTERNAL offset, because such an offset is at least 0x8000 + 13 = 0x800d.
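To make the idea concrete, here is a rough sketch of the encoding, not actual InnoDB code. All names (offs_t, OFFS_*, offs_*()) are hypothetical, and it assumes 16-bit offset elements whose low 15 bits hold the offset. It also assumes that the magic values carry no offset of their own, so a caller would have to derive the position of such a field from a neighbouring entry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical element type for the offsets array (assumption: 16 bits).
typedef uint16_t offs_t;

// The single remaining flag bit.
constexpr offs_t OFFS_EXTERNAL = 0x8000;

// Magic values: the flag bit combined with an offset that an externally
// stored field can never have.
constexpr offs_t OFFS_SQL_NULL = 0x8000;
constexpr offs_t OFFS_DEFAULT = 0x8001;  // would only be needed in 10.3+

// DB_TRX_ID (6 bytes) + DB_ROLL_PTR (7 bytes) precede any externally
// stored field in a clustered index record.
constexpr offs_t MIN_EXTERNAL_OFFSET = 6 + 7;

// Encode the offset of an externally stored field.
inline offs_t offs_make_external(offs_t offset) {
  assert(offset >= MIN_EXTERNAL_OFFSET);
  assert(offset < OFFS_EXTERNAL);
  return static_cast<offs_t>(OFFS_EXTERNAL | offset);
}

inline bool offs_is_sql_null(offs_t o) { return o == OFFS_SQL_NULL; }

inline bool offs_is_default(offs_t o) { return o == OFFS_DEFAULT; }

// Values 0x8000..0x800c are reserved for magic constants, so a real
// external offset starts at 0x800d.
inline bool offs_is_external(offs_t o) {
  return o >= (OFFS_EXTERNAL | MIN_EXTERNAL_OFFSET);
}

// Strip the flag bit to obtain the plain offset.
inline offs_t offs_value(offs_t o) {
  return static_cast<offs_t>(o & 0x7fff);
}
```

With one flag bit instead of two or three, 15 bits remain for the offset itself in this sketch.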
As far as I understand, the following changes are yet to be made:
- In non-debug builds, make rec_offs_set_n_alloc() a no-op and omit the corresponding element of the offsets header. The new first element of offsets would be rec_offs_n_fields().
- Remove the heap parameter of rec_get_offsets(), and possibly adjust some mem_heap_create() or mem_heap_alloc() calls accordingly.
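The first item could look roughly like the following sketch. It is a simplification, not the real header layout: SKETCH_DEBUG stands in for the debug-build macro, all sketch_* names are hypothetical, and the header is reduced to just the allocated size and the field count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef uint16_t offs_t;  // hypothetical element type

#ifdef SKETCH_DEBUG
// Debug layout: [n_alloc][n_fields][per-field entries...]
constexpr size_t OFFS_HEADER_SIZE = 2;

inline void sketch_set_n_alloc(offs_t* offsets, size_t n_alloc) {
  offsets[0] = static_cast<offs_t>(n_alloc);
}
#else
// Non-debug layout: [n_fields][per-field entries...]
constexpr size_t OFFS_HEADER_SIZE = 1;

// No-op: the element does not exist in this build.
inline void sketch_set_n_alloc(offs_t*, size_t) {}
#endif

// The field count is always the last header element, which is the
// first element of the array in non-debug builds.
inline void sketch_set_n_fields(offs_t* offsets, size_t n_fields) {
  assert(n_fields < 0x10000);
  offsets[OFFS_HEADER_SIZE - 1] = static_cast<offs_t>(n_fields);
}

inline size_t sketch_n_fields(const offs_t* offsets) {
  return offsets[OFFS_HEADER_SIZE - 1];
}

// Per-field entries start right after the header.
inline offs_t* sketch_field_entries(offs_t* offsets) {
  return offsets + OFFS_HEADER_SIZE;
}
```

Callers would keep calling the setter unconditionally; only the header size and the setter body differ between the two builds.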
The maximum number of index fields varies as follows:
Also, in some contexts, such as accessing DB_TRX_ID or DB_ROLL_PTR, we will only need a prefix of the clustered index leaf page record fields.
I hope that we can avoid allocating 1024*2 bytes for the offsets in many cases.