Details

    Description

      Now offsets have stack part and heap part which is used when stack part can not hold that big amount of fields in a table. `mem_heap_t` is used to very wide tables. And the idea is to replace this dichotomy with stack only wider array with optimized offset size: uint16_t instead of ulint.

      Attachments

        Issue Links

          Activity

            The maximum number of index fields varies as follows:

            • Clustered index non-leaf pages: At most 17 (up to 16 fields of PRIMARY KEY and a child page number)
            • Clustered index leaf pages: 1021 or 1024
            • Secondary index non-leaf pages: 16+16+1 (up to 16 fields of KEY and 16 fields of PRIMARY KEY) and a child page number
            • Secondary index leaf pages: 16+16

            Also, in some contexts, such as accessing DB_TRX_ID or DB_ROLL_PTR, we will only need a prefix of the clustered index leaf page record fields.

            I hope that we can avoid allocating 1024*2 bytes for the offsets in many cases.

            marko Marko Mäkelä added a comment - The maximum number of index fields varies as follows: Clustered index non-leaf pages: At most 17 (up to 16 fields of PRIMARY KEY and a child page number) Clustered index leaf pages: 1021 or 1024 Secondary index non-leaf pages: 16+16+1 (up to 16 fields of KEY and 16 fields of PRIMARY KEY ) and a child page number Secondary index leaf pages: 16+16 Also, in some contexts, such as accessing DB_TRX_ID or DB_ROLL_PTR , we will only need a prefix of the clustered index leaf page record fields. I hope that we can avoid allocating 1024*2 bytes for the offsets in many cases.

            This is one possible solution. https://godbolt.org/z/mQ9kTj

            Accessing some offset is 2 pointer dereferences unless compiler will optimize it e.g. inside a loop.

            Design notes:

            • Only caller creates offsets either on stack or on heap.
            • Caller should have an ability to create offsets on a heap, because of possible long recursion on FK checks.
            kevg Eugene Kosov (Inactive) added a comment - This is one possible solution. https://godbolt.org/z/mQ9kTj Accessing some offset is 2 pointer dereferences unless compiler will optimize it e.g. inside a loop. Design notes: Only caller creates offsets either on stack or on heap. Caller should have an ability to create offsets on a heap, because of possible long recursion on FK checks.
            marko Marko Mäkelä added a comment - - edited

            I mostly liked your changes so far. For innodb_page_size=64k, you must check the actual maximum record size of ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC records. I am rather sure that it is more than 16383 bytes, which you are assuming. Extrapolating your changes to 10.3, which introduces the REC_OFFS_DEFAULT bit, it looks like the maximum record size would shrink to 8191 bytes, which would become a problem also with innodb_page_size=32k.

            Perhaps we can do with a single flag bit, REC_OFFS_EXTERNAL = 0x8000? The REC_OFFS_SQL_NULL flag could be replaced with a magic value, such as 0x8000. We know that REC_OFFS_EXTERNAL can only be set if index->is_primary(), and hence we also know that the start offset of externally stored fields must be at least 6+7 (the combined size of the system columns DB_TRX_ID and DB_ROLL_PTR). For the rare case where a NULL value is updated in place to a NOT NULL value, or vice versa, for a ROW_FORMAT=REDUNDANT record, we can use rec_…_old() accessor functions directly. Similarly, in 10.3+, REC_OFFS_DEFAULT would no longer be a flag, but we could use a magic constant value 0x8001. Anything in the range of 0x8000 and 0x800c would be distinguishable from REC_OFFS_EXTERNAL.

            As far as I understand, the following changes are yet to be made:

            • In non-debug builds, make rec_offs_set_n_alloc() no-op and omit the corresponding element of the offsets header. The new first element of offsets would be rec_offs_n_fields().
            • Remove the heap parameter of rec_get_offsets(), and possibly adjust some mem_heap_create() or mem_heap_alloc() calls accordingly.
            marko Marko Mäkelä added a comment - - edited I mostly liked your changes so far. For innodb_page_size=64k , you must check the actual maximum record size of ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC records. I am rather sure that it is more than 16383 bytes, which you are assuming. Extrapolating your changes to 10.3, which introduces the REC_OFFS_DEFAULT bit, it looks like the maximum record size would shrink to 8191 bytes, which would become a problem also with innodb_page_size=32k . Perhaps we can do with a single flag bit, REC_OFFS_EXTERNAL = 0x8000 ? The REC_OFFS_SQL_NULL flag could be replaced with a magic value, such as 0x8000 . We know that REC_OFFS_EXTERNAL can only be set if index->is_primary() , and hence we also know that the start offset of externally stored fields must be at least 6+7 (the combined size of the system columns DB_TRX_ID and DB_ROLL_PTR ). For the rare case where a NULL value is updated in place to a NOT NULL value, or vice versa, for a ROW_FORMAT=REDUNDANT record, we can use rec_…_old() accessor functions directly. Similarly, in 10.3+, REC_OFFS_DEFAULT would no longer be a flag, but we could use a magic constant value 0x8001 . Anything in the range of 0x8000 and 0x800c would be distinguishable from REC_OFFS_EXTERNAL . As far as I understand, the following changes are yet to be made: In non-debug builds, make rec_offs_set_n_alloc() no-op and omit the corresponding element of the offsets header. The new first element of offsets would be rec_offs_n_fields() . Remove the heap parameter of rec_get_offsets() , and possibly adjust some mem_heap_create() or mem_heap_alloc() calls accordingly.

            I have refactored my patch so it now uses uppers bits of the offset differently. Every offset has 2 bits for flags. It can contain 4 different values. My patch uses it to store offset type (normal, off-page, NULL). One more type INSTANTLY ADDED can easily be added while merging patch to 10.3

            kevg Eugene Kosov (Inactive) added a comment - I have refactored my patch so it now uses uppers bits of the offset differently. Every offset has 2 bits for flags. It can contain 4 different values. My patch uses it to store offset type (normal, off-page, NULL). One more type INSTANTLY ADDED can easily be added while merging patch to 10.3

            Thanks, this looks better. Still, I’d like to see how we would set the flags in the offsets header in the 10.3 version. We should have more bits available in that header field; the maximum extra_size should be around 2005 bytes, which fits in 11 bits. So, we should have 5 flag bits at our disposal.

            Also, I would like you to remove the heap parameter of rec_get_offsets().

            marko Marko Mäkelä added a comment - Thanks, this looks better. Still, I’d like to see how we would set the flags in the offsets header in the 10.3 version. We should have more bits available in that header field; the maximum extra_size should be around 2005 bytes, which fits in 11 bits. So, we should have 5 flag bits at our disposal. Also, I would like you to remove the heap parameter of rec_get_offsets() .

            Please file a separate 10.5 task for removing the heap parameter of rec_get_offsets(). It could be a too big change for GA versions.

            marko Marko Mäkelä added a comment - Please file a separate 10.5 task for removing the heap parameter of rec_get_offsets() . It could be a too big change for GA versions.
            mleich Matthias Leich added a comment - - edited

            Results of RQG testing on bb-10.2-MDEV-20950-stack-offsets commit 1f1bfc025505a9444333f3202795148b198cae92
            a) Grammar is a derivate of table_stress_innodb.yy  (excessive concurrent DDL and DML but neither partitioning nor foreign keys nor versioning)
                 1475 finished tests,
                 1470 * pass
                  5 * mysqld: /storage/innobase/handler/handler0alter.cc:5111: bool prepare_inplace_alter_table_dict(Alter_inplace_info*, const TABLE*, const TABLE*, const char*, ulint, ulint, ulint, bool, bool): Assertion `user_table->get_ref_count() == 1 || ctx->online' failed.  hit.
                   The statement mentioned in server error log was all time like
                    ALTER TABLE t1 ADD FULLTEXT INDEX IF NOT EXISTS `<name1>` ( col_text ), ADD FULLTEXT INDEX IF NOT EXISTS `<name2>` ( col_text )           (name1 != name2)
                    and affecting the column col_text TEXT.
                    It is likely that other threads run DDL affecting col_text or around FULLTEXT too.
                  There are similar bug reports in JIRA but none of them is open.  I tend to guess that the assert above is in actual 10.2 too.
            b) RQG tests for broad range coverage
                 1172 finished tests
                  1161 * pass
                 11* fail     These bugs are already in JIRA.
                  
            

            mleich Matthias Leich added a comment - - edited Results of RQG testing on bb-10.2-MDEV-20950-stack-offsets commit 1f1bfc025505a9444333f3202795148b198cae92 a) Grammar is a derivate of table_stress_innodb.yy (excessive concurrent DDL and DML but neither partitioning nor foreign keys nor versioning) 1475 finished tests, 1470 * pass 5 * mysqld: /storage/innobase/handler/handler0alter.cc:5111: bool prepare_inplace_alter_table_dict(Alter_inplace_info*, const TABLE*, const TABLE*, const char*, ulint, ulint, ulint, bool, bool): Assertion `user_table->get_ref_count() == 1 || ctx->online' failed. hit. The statement mentioned in server error log was all time like ALTER TABLE t1 ADD FULLTEXT INDEX IF NOT EXISTS `<name1>` ( col_text ), ADD FULLTEXT INDEX IF NOT EXISTS `<name2>` ( col_text ) (name1 != name2) and affecting the column col_text TEXT. It is likely that other threads run DDL affecting col_text or around FULLTEXT too. There are similar bug reports in JIRA but none of them is open. I tend to guess that the assert above is in actual 10.2 too. b) RQG tests for broad range coverage 1172 finished tests 1161 * pass 11* fail These bugs are already in JIRA.
            EGuesnet Etienne Guesnet added a comment - - edited

            The way this task was resolved conflicts with a system typedef in AIX. offset_t is already defined.

            Can I suggest to use a more specific name? E.g. inno_offset_t.
            I have open an issue MDEV-21595.

            EGuesnet Etienne Guesnet added a comment - - edited The way this task was resolved conflicts with a system typedef in AIX. offset_t is already defined. Can I suggest to use a more specific name? E.g. inno_offset_t. I have open an issue MDEV-21595 .

            People

              kevg Eugene Kosov (Inactive)
              kevg Eugene Kosov (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.