Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33219

Optimize handler::binlog_log_row usage

    XMLWordPrintable

Details

    Description

      MDEV-16329 adds a little per-row overhead related to the handler::binlog_log_row function.

      1. A new function call.
      2. what's more important, another double dispatch+branch table->s->online_alter_binlog != 0

      A patch binlog_log_row (1).diff has been provided by monty however the impact has been put in doubt.
      A few tests have been made to measure the performance impact of the original version and the supposed improvement.

      Build options: gcc 13.2, DBUG=ON, WSREP=OFF, ASAN=0, -O3, -flto, -fuse-ld=mold

      Note: rows_changed++ is included into the scope of measurement to avoid the conflicts during patching.

      1. time measurement with std::chrono::high_resolution_clock.

      commit is compared against the patched version.

      Results:
      Monty: 15ns avg
      Nikita: 20ns avg

      Also an approach was made to tune up likely/unlikely, which didn't change the situation.

      An assumption is made that 5ns appears because of serialization that could be caused by high_resolution_clock, especially if it was implemented with a system call.

      2. clock_gettime

      clock_gettime should be implemented through vdso on linux, that is, avoiding any extra synchronization.

      Results:
      Monty: 15ns avg
      Nikita: 20ns avg

      The difference is still suspiciuosly big hinting that a syncronization is been made, or a misprediction constantly happens.
      Later turned out that clock_gettime is also not helpful when a non-synchronized measurement is had to be made:
      https://bert-hubert.blogspot.com/2017/03/on-linux-vdso-and-clockgettime.html
      https://btorpey.github.io/blog/2014/02/18/clock-sources-in-linux/

      3. rdtsc

      In comparison with the rest rdtsc does not make any synchronization.
      https://bert-hubert.blogspot.com/2017/03/on-linux-vdso-and-clockgettime.html
      https://btorpey.github.io/blog/2014/02/18/clock-sources-in-linux/

      commit

      Monty: 45 cycles
      Nikita: 45 cycles

      Now we see that a cost of roughly having an extra if + function call is negligible, unless a synchronization follows.

      Note: rdtscp is supposed to be a wrong indicator, exactly since it involves some synchronization. We don't need to know how long the operation takes synchronized, but an actual impact rather, which requires the measurement to be unintrusive.

      Does the compiler generate an efficient code?

      Now let's see what assembly is actually generated by the lto-powered build. We'll look at

      objdump -d ../sql/mariadbd | grep -A 300 '_ZN7handler13ha_update_rowEPKhS1_>:'
      

      to see if the code is actually effective.

      1. Is the function inlined?

      The expected answer is yes, why would a compiler need a hint to make an efficient code to offload the cpu's prefetcher. The actual anwer is:

        fa7e03:	0f 31                	rdtsc
        fa7e05:	48 83 83 28 02 00 00 	addq   $0x1,0x228(%rbx) # rows_changed++ is placed interestingly.
        fa7e0c:	01 
        fa7e0d:	89 c0                	mov    %eax,%eax
        fa7e0f:	4c 89 ee             	mov    %r13,%rsi
        fa7e12:	48 89 df             	mov    %rbx,%rdi
        fa7e15:	48 c1 e2 20          	shl    $0x20,%rdx
      # end of rdtsc
       
        fa7e19:	48 8d 0d 80 6b fe ff 	lea    -0x19480(%rip),%rcx
        fa7e20:	48 09 c2             	or     %rax,%rdx
        fa7e23:	49 89 d6             	mov    %rdx,%r14
        fa7e26:	4c 89 e2             	mov    %r12,%rdx
        fa7e29:	e8 42 e3 ff ff       	call   fa6170 <_ZN7handler14binlog_log_rowEPKhS1_PFbP3THDP5TABLEP9Event_logP17binlog_cache_databmS1_S1_E>
        fa7e2e:	89 c5                	mov    %eax,%ebp
      

      no.
      Hinting a function with inline keyword works perfect though.

      2. Does likely/unlikely work?

      After inlining the function we get the following assembly:

        fa7b67:	0f 31                	rdtsc
        fa7b69:	48 c1 e2 20          	shl    $0x20,%rdx
        fa7b6d:	89 c0                	mov    %eax,%eax
        fa7b6f:	48 83 83 28 02 00 00 	addq   $0x1,0x228(%rbx)
        fa7b76:	01 
        fa7b77:	48 09 c2             	or     %rax,%rdx
      # end rdtsc
       
        fa7b7a:	80 bb 20 05 00 00 00 	cmpb   $0x0,0x520(%rbx) # if (row_logging)
        fa7b81:	48 89 55 80          	mov    %rdx,-0x80(%rbp) # 
        fa7b85:	0f 85 65 01 00 00    	jne    fa7cf0           # jump if != 0, good
       
      # looks like `error` comparison is skipped and moved 
      # to the pessimistic branch. Smart!
       
        fa7b8b:	48 8b 7b 10          	mov    0x10(%rbx),%rdi  # this->table
        fa7b8f:	48 8b 07             	mov    (%rdi),%rax      # table->s
        fa7b92:	48 83 b8 d8 05 00 00 	cmpq   $0x0,0x5d8(%rax) # s->online_alter_binlog
        fa7b99:	00 
        fa7b9a:	74 12                	je     fa7bae # skip second double dispatch
        fa7b9c:	48 8b 47 08          	mov    0x8(%rdi),%rax  # this->table
        fa7ba0:	48 8b 40 68          	mov    0x68(%rax),%rax # table->file
        fa7ba4:	48 39 43 68          	cmp    %rax,0x68(%rbx) # ht != file->ht
        fa7ba8:	0f 84 d2 02 00 00    	je     fa7e80 # 
        fa7bae:	45 84 ff             	test   %r15b,%r15b # if (likely(count_time))
        fa7bb1:	0f 84 80 fe ff ff    	je     fa7a37      # ...
        fa7bb7:	0f 31                	rdtsc
      

      if (row_logging) is guessed to be unlikely. Considering the common (featureless) case to be prioritized, it's good.
      However, we can see that a second double (triple?) dispatch can begin evaluating before table->s->online_alter_binlog.
      Interestingly, this is fetched from %rbx sometimes, not %rdi. As systemV abi dictates:

      2. If the class is INTEGER, the next available register of the sequence %rdi,
      %rsi, %rdx, %rcx, %r8 and %r9 is used

      Good to see that error comparison is not made for nothing in the optimistic path!

      So, this should be found in %rdi anyway.

      So it shows up that a code generated is not ideal. We may try to tune up likely/unlikely:

      -  if (unlikely(!error && table->s->online_alter_binlog && is_root_handler()))
      +  if (likely(!error) && unlikely(table->s->online_alter_binlog) && is_root_handler())
           error= online_alter_log_row(table, before_record, after_record,
                                       log_func);
      

      Now the assembly is perfect:

        fa7c57:	0f 31                	rdtsc
        fa7c59:	48 c1 e2 20          	shl    $0x20,%rdx
        fa7c5d:	89 c0                	mov    %eax,%eax
        fa7c5f:	48 83 83 28 02 00 00 	addq   $0x1,0x228(%rbx)
        fa7c66:	01 
        fa7c67:	48 09 c2             	or     %rax,%rdx
      # end rdtsc
       
        fa7c6a:	80 bb 20 05 00 00 00 	cmpb   $0x0,0x520(%rbx) # if (row_logging)
        fa7c71:	48 89 55 80          	mov    %rdx,-0x80(%rbp) #
        fa7c75:	0f 85 55 01 00 00    	jne    fa7dd0           # jump if != 0, good
       
        fa7c7b:	48 8b 7b 10          	mov    0x10(%rbx),%rdi  # table
        fa7c7f:	48 8b 07             	mov    (%rdi),%rax      # table->s
        fa7c82:	48 83 b8 d8 05 00 00 	cmpq   $0x0,0x5d8(%rax) # s->online_alter_binlog
        fa7c89:	00                                              #
        fa7c8a:	0f 85 e8 03 00 00    	jne    fa8078           # jump away if != 0
       
        fa7c90:	45 84 ff             	test   %r15b,%r15b # if (likely(count_time))
        fa7c93:	0f 84 8e fe ff ff    	je     fa7b27      # continue to exit
        fa7c99:	0f 31                	rdtsc
      

      Attachments

        Activity

          People

            monty Michael Widenius
            nikitamalyavin Nikita Malyavin
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.