[MDEV-33219] Optimize handler::binlog_log_row usage Created: 2024-01-10 Updated: 2024-01-31 |
|
| Status: | Stalled |
| Project: | MariaDB Server |
| Component/s: | Data Definition - Alter Table |
| Fix Version/s: | 11.2 |
| Type: | Task | Priority: | Critical |
| Reporter: | Nikita Malyavin | Assignee: | Nikita Malyavin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | online-ddl, performance | ||
| Attachments: |
|
| Description |
|
1. A new function call. A patch binlog_log_row (1).diff Build options: gcc 13.2, DBUG=ON, WSREP=OFF, ASAN=0, -O3, -flto, -fuse-ld=mold Note: rows_changed++ is included into the scope of measurement to avoid the conflicts during patching. 1. time measurement with std::chrono::high_resolution_clock.commit is compared against the patched version. Results: Also an approach was made to tune up likely/unlikely, which didn't change the situation. An assumption is made that 5ns appears because of serialization that could be caused by high_resolution_clock, especially if it was implemented with a system call. 2. clock_gettimeclock_gettime should be implemented through vdso on linux, that is, avoiding any extra synchronization. Results: The difference is still suspiciuosly big hinting that a syncronization is been made, or a misprediction constantly happens. 3. rdtscIn comparison with the rest rdtsc does not make any synchronization. Monty: 45 cycles Now we see that a cost of roughly having an extra if + function call is negligible, unless a synchronization follows. Note: rdtscp is supposed to be a wrong indicator, exactly since it involves some synchronization. We don't need to know how long the operation takes synchronized, but an actual impact rather, which requires the measurement to be unintrusive. Does the compiler generate an efficient code?Now let's see what assembly is actually generated by the lto-powered build. We'll look at
to see if the code is actually effective. 1. Is the function inlined?The expected answer is yes, why would a compiler need a hint to make an efficient code to offload the cpu's prefetcher. The actual anwer is:
no. 2. Does likely/unlikely work?After inlining the function we get the following assembly:
if (row_logging) is guessed to be unlikely. Considering the common (featureless) case to be prioritized, it's good.
Good to see that error comparison is not made for nothing in the optimistic path! So, this should be found in %rdi anyway. So it shows up that a code generated is not ideal. We may try to tune up likely/unlikely:
Now the assembly is perfect:
|
| Comments |
| Comment by Nikita Malyavin [ 2024-01-10 ] |
|
To address the assembly issue, the patch ccf80f7dd8b (link) is made. |
| Comment by Michael Widenius [ 2024-01-24 ] |
|
The above does not address the full code in my patch. For the common insert, my suggested codes uses 1 jump if and 1 memory access to replace a call+return, 3 ifs and 7 memory accesses (edited) Also, making the function inline is not good as it causes code blot which will an affect if many users are calling insert/update/delete at the same time. Also, if we need to add more code later to binlog_row(), the current solution with inline will be even worse. I rather go with my suggested solution, at least until you show the difference with my patch and yours, without inline. |
| Comment by Michael Widenius [ 2024-01-24 ] |
|
Not convinced this is a good idea. |