Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30720

Expert feedback on changing the type of rec_offs typedef and its impact on the code

Details

    Description

      thejaka posted this as a comment, but it deserves its own story to properly tackle. Thejaka’s comment is reprinted below-

      When analyzing the performance issue, one thing we noticed was higher rate of TLB cache misses. The profiling data showed that rec_get_offset_func was the bottleneck compared to 10.2 code. A noticeable difference between 10.2 code and 10.4 code is the type change of rec_offs type. In 10.2 it is ulint and in 10.2 unsigned short int. In 64 bit binaries ulint takes 8 bytes and unsigned short int is 2 bytes.

      Further, in C/C++ applications memory alignment is a popular problem, especially, when it comes to Intel processors like Xeon. In addition, the type change is also present in 10.2.43 which shows performance variations just like 10.4. We want to experiment whether the performance variation is due to the type change of rec_offs type. To experiment this, we would like to change the typedef and run few queries we identified. 

      E.g,,

      -typedef unsigned short int rec_offs;

      +typedef ulint rec_offs; 

      OR

       -typedef unsigned short int rec_offs;

      +typedef uint32_t rec_offs; 

      However, we would like to understand the full impact of this change.

      Specifically, we have the following questions:

      Does this change affect persisted data?

      Can we make a such change without modifying the upgrade logic?

      If this change breaks anything, can mtrs catch it? OR are there any other tests we need to run to detect issues with this change?

      In my brief analysis, I figured out that the debug builds can have impact from this change, because of the following code:

      /* Length of the rec_get_offsets() header */
      static const size_t REC_OFFS_HEADER_SIZE=
      #ifdef UNIV_DEBUG
      #ifndef UNIV_INNOCHECKSUM
          sizeof(rec_t *) / sizeof(rec_offs) +
          sizeof(dict_index_t *) / sizeof(rec_offs) +
      #endif /* UNIV_INNOCHECKSUM */
      #endif /* UNIV_DEBUG */
          2;

      For debug builds rec_offs header size is longer than 2. If we change the rec_offs type, the header size will become shorter for debug builds. I am not sure whether that matters, probably it does but yet to find that code. Seems release builds are not affected. Further, we mostly utilize compact row format and persisted data seems to be not affected by the type change.

      We would like to get your expert feedback on changing the type of rec_offs typedef and its impact on the code.

      Attachments

        Issue Links

          Activity

            I think that before spending time on evaluating whether the format change made in MDEV-20950 should be reverted, we need to fix MDEV-30567. As far as I can see, MDEV-20950 introduced some conditional branches where simple bitwise arithmetics would have worked. Fixing that is the subject of MDEV-30567.

            In debug builds, rec_offs_validate() checks that the offsets have been computed for the specified record and index. To facilitate this, the pointers to the record and index were stored in one element of the offsets array, each. Now that the underlying type no longer is size_t but essentially uint16_t, each pointer is being stored in 2 or 4 elements.

            This change has no impact on the persistent storage format. The maximum length of a record should be 16383 bytes, even with innodb_page_size=64k.

            marko Marko Mäkelä added a comment - I think that before spending time on evaluating whether the format change made in MDEV-20950 should be reverted, we need to fix MDEV-30567 . As far as I can see, MDEV-20950 introduced some conditional branches where simple bitwise arithmetics would have worked. Fixing that is the subject of MDEV-30567 . In debug builds, rec_offs_validate() checks that the offsets have been computed for the specified record and index. To facilitate this, the pointers to the record and index were stored in one element of the offsets array, each. Now that the underlying type no longer is size_t but essentially uint16_t , each pointer is being stored in 2 or 4 elements. This change has no impact on the persistent storage format. The maximum length of a record should be 16383 bytes, even with innodb_page_size=64k .

            I think that we must do the following before considering any changes to the storage format of rec_offs:

            1. Identify and remove redundant calls to rec_get_offsets(). One such case is when we are reading the child page number after page_cur_search_with_match() had already computed and thrown away the offsets.
            2. Implement a forward iterator of index record fields and use it instead of invoking rec_get_offsets(). One place where this could be useful is page_cur_search_with_match() when not all columns of a multi-column key need to be compared.
            marko Marko Mäkelä added a comment - I think that we must do the following before considering any changes to the storage format of rec_offs : Identify and remove redundant calls to rec_get_offsets() . One such case is when we are reading the child page number after page_cur_search_with_match() had already computed and thrown away the offsets. Implement a forward iterator of index record fields and use it instead of invoking rec_get_offsets() . One place where this could be useful is page_cur_search_with_match() when not all columns of a multi-column key need to be compared.

            People

              marko Marko Mäkelä
              midenok Aleksey Midenkov
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.