[MDEV-20950] Reduce size of record offsets Created: 2019-11-02  Updated: 2023-02-24  Resolved: 2019-12-13

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: 10.2.31, 10.3.22, 10.4.12, 10.5.1

Type: Task Priority: Major
Reporter: Eugene Kosov (Inactive) Assignee: Eugene Kosov (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by MDEV-19230 Get rid of dynamic part of record off... Closed
Problem/Incident
causes MDEV-21595 typedef offset_t conflicts on AIX Closed
causes MDEV-23198 mysqld crash when running REPLACE update Closed
causes MDEV-30567 rec_get_offsets() is not optimal Closed
Relates
relates to MDEV-21313 Introduce a class to store record off... Open
relates to MDEV-18746 Reduce the amount of mem_heap_create(... Open
relates to MDEV-30720 Expert feedback on changing the type ... Open

 Description   

Now offsets have stack part and heap part which is used when stack part can not hold that big amount of fields in a table. `mem_heap_t` is used to very wide tables. And the idea is to replace this dichotomy with stack only wider array with optimized offset size: uint16_t instead of ulint.



 Comments   
Comment by Marko Mäkelä [ 2019-11-03 ]

The maximum number of index fields varies as follows:

  • Clustered index non-leaf pages: At most 17 (up to 16 fields of PRIMARY KEY and a child page number)
  • Clustered index leaf pages: 1021 or 1024
  • Secondary index non-leaf pages: 16+16+1 (up to 16 fields of KEY and 16 fields of PRIMARY KEY) and a child page number
  • Secondary index leaf pages: 16+16

Also, in some contexts, such as accessing DB_TRX_ID or DB_ROLL_PTR, we will only need a prefix of the clustered index leaf page record fields.

I hope that we can avoid allocating 1024*2 bytes for the offsets in many cases.

Comment by Eugene Kosov (Inactive) [ 2019-11-04 ]

This is one possible solution. https://godbolt.org/z/mQ9kTj

Accessing some offset is 2 pointer dereferences unless compiler will optimize it e.g. inside a loop.

Design notes:

  • Only caller creates offsets either on stack or on heap.
  • Caller should have an ability to create offsets on a heap, because of possible long recursion on FK checks.
Comment by Marko Mäkelä [ 2019-11-20 ]

I mostly liked your changes so far. For innodb_page_size=64k, you must check the actual maximum record size of ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC records. I am rather sure that it is more than 16383 bytes, which you are assuming. Extrapolating your changes to 10.3, which introduces the REC_OFFS_DEFAULT bit, it looks like the maximum record size would shrink to 8191 bytes, which would become a problem also with innodb_page_size=32k.

Perhaps we can do with a single flag bit, REC_OFFS_EXTERNAL = 0x8000? The REC_OFFS_SQL_NULL flag could be replaced with a magic value, such as 0x8000. We know that REC_OFFS_EXTERNAL can only be set if index->is_primary(), and hence we also know that the start offset of externally stored fields must be at least 6+7 (the combined size of the system columns DB_TRX_ID and DB_ROLL_PTR). For the rare case where a NULL value is updated in place to a NOT NULL value, or vice versa, for a ROW_FORMAT=REDUNDANT record, we can use rec_…_old() accessor functions directly. Similarly, in 10.3+, REC_OFFS_DEFAULT would no longer be a flag, but we could use a magic constant value 0x8001. Anything in the range of 0x8000 and 0x800c would be distinguishable from REC_OFFS_EXTERNAL.

As far as I understand, the following changes are yet to be made:

  • In non-debug builds, make rec_offs_set_n_alloc() no-op and omit the corresponding element of the offsets header. The new first element of offsets would be rec_offs_n_fields().
  • Remove the heap parameter of rec_get_offsets(), and possibly adjust some mem_heap_create() or mem_heap_alloc() calls accordingly.
Comment by Eugene Kosov (Inactive) [ 2019-11-25 ]

I have refactored my patch so it now uses uppers bits of the offset differently. Every offset has 2 bits for flags. It can contain 4 different values. My patch uses it to store offset type (normal, off-page, NULL). One more type INSTANTLY ADDED can easily be added while merging patch to 10.3

Comment by Marko Mäkelä [ 2019-11-25 ]

Thanks, this looks better. Still, I’d like to see how we would set the flags in the offsets header in the 10.3 version. We should have more bits available in that header field; the maximum extra_size should be around 2005 bytes, which fits in 11 bits. So, we should have 5 flag bits at our disposal.

Also, I would like you to remove the heap parameter of rec_get_offsets().

Comment by Marko Mäkelä [ 2019-11-25 ]

Please file a separate 10.5 task for removing the heap parameter of rec_get_offsets(). It could be a too big change for GA versions.

Comment by Matthias Leich [ 2019-12-03 ]

Results of RQG testing on bb-10.2-MDEV-20950-stack-offsets commit 1f1bfc025505a9444333f3202795148b198cae92
a) Grammar is a derivate of table_stress_innodb.yy  (excessive concurrent DDL and DML but neither partitioning nor foreign keys nor versioning)
     1475 finished tests,
     1470 * pass
      5 * mysqld: /storage/innobase/handler/handler0alter.cc:5111: bool prepare_inplace_alter_table_dict(Alter_inplace_info*, const TABLE*, const TABLE*, const char*, ulint, ulint, ulint, bool, bool): Assertion `user_table->get_ref_count() == 1 || ctx->online' failed.  hit.
       The statement mentioned in server error log was all time like
        ALTER TABLE t1 ADD FULLTEXT INDEX IF NOT EXISTS `<name1>` ( col_text ), ADD FULLTEXT INDEX IF NOT EXISTS `<name2>` ( col_text )           (name1 != name2)
        and affecting the column col_text TEXT.
        It is likely that other threads run DDL affecting col_text or around FULLTEXT too.
      There are similar bug reports in JIRA but none of them is open.  I tend to guess that the assert above is in actual 10.2 too.
b) RQG tests for broad range coverage
     1172 finished tests
      1161 * pass
     11* fail     These bugs are already in JIRA.
      

Comment by Etienne Guesnet [ 2020-01-30 ]

The way this task was resolved conflicts with a system typedef in AIX. offset_t is already defined.

Can I suggest to use a more specific name? E.g. inno_offset_t.
I have open an issue MDEV-21595.

Generated at Thu Feb 08 09:03:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.