To assess the impact of MDEV-28313, I repeated a quick Sysbench 8×100,000-row oltp_update_index test without MDEV-28313 and with innodb_flush_log_at_trx_commit=1.
The version column legend is the same as in the previous comments, except for the introduction of 10.9+merge of async, which is the same as patched but without the MDEV-28313 changes.
| version | 20 | 40 | 80 | 160 | 320 | 640 |
| 10.9+merge of async | 40062.04 | 82227.38 | 154505.53 | 149740.18 | 123871.06 | 131360.35 |
| 10.9 | 42809.14 | 87178.33 | 152955.76 | 151528.31 | 124043.59 | 131941.35 |
We can observe some insignificant improvement at 80 concurrent connections (which is "polluted" by the checkpoint flush that occurred during that test), and otherwise a performance regression or no improvement.
This 30-second benchmark run is too short to draw any definite conclusion. In fact, the bottom row of this table comes from a setup equivalent to that of the bottom row of the previous table.
One interesting change is that with the MDEV-28313 change included, we saw a slight improvement at 160 concurrent connections, but without it, we can observe a regression.
I reran the test in a different way (prepare+run 30 seconds with 80 clients, prepare+run 30 seconds with 160 clients) to gain some more confidence:
| version | 80 | 160 |
| 10.9+merge of async | 151006.08 | 154541.20 |
| 10.9 | 150857.26 | 158157.07 |
This time, no checkpoint flushing occurred during the 80-client run, and we see no significant improvement. The clear regression at 160 clients remained.
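For reference, the prepare+run sequence used for this rerun could be scripted roughly as follows. This is only a sketch: the helper name sysbench_commands and the socket path are hypothetical, and the connection options would have to match the actual server setup. It assumes sysbench 1.0 with its bundled oltp_update_index script and that the tables do not exist before each prepare step.

```python
import shlex

def sysbench_commands(test, threads, tables=8, table_size=100_000,
                      seconds=30, extra=()):
    """Build the 'prepare' and 'run' command lines for one benchmark round.

    Hypothetical helper; it only builds argument lists and runs nothing.
    """
    common = ["sysbench", test,
              f"--tables={tables}", f"--table-size={table_size}", *extra]
    prepare = common + ["prepare"]
    run = common + [f"--threads={threads}", f"--time={seconds}", "run"]
    return prepare, run

if __name__ == "__main__":
    # The socket path is an assumption; adjust it to the server under test.
    connection = ["--mysql-socket=/tmp/mysql.sock"]
    for clients in (80, 160):
        for cmd in sysbench_commands("oltp_update_index", clients,
                                     extra=connection):
            print(shlex.join(cmd))
```

The printed lines can be pasted into a shell, or the lists passed to subprocess.run(), once the server has been restarted with the settings under test.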
The counterintuitive performance regression could partly be addressed by MDEV-28313. With the test oltp_update_non_index, performance problems related to the lock-free hash table trx_sys.rw_trx_hash (MDEV-21423) should matter less:
| version | 20 | 40 | 80 | 160 | 320 | 640 |
| 10.9+merge of async | 38514.14 | 89237.51 | 167100.82 | 192394.20 | 189902.25 | 193034.80 |
| 10.9 | 42022.65 | 97957.34 | 169509.91 | 187099.23 | 191413.50 | 199397.91 |
Traversal of the entire trx_sys.rw_trx_hash table is necessary not only for checking locks on secondary indexes, but also for read view creation. Let us additionally specify --transaction-isolation=READ-UNCOMMITTED to reduce that activity (purge_sys.view must still be updated), and test it also with the MDEV-28313 improvements:
| version | 20 | 40 | 80 | 160 | 320 | 640 |
| patched | 38794.57 | 89714.06 | 168784.54 | 191521.04 | 189094.02 | 192025.07 |
| 10.9+MDEV-28313 | 41801.26 | 97290.21 | 170614.28 | 187754.89 | 196493.73 | 197833.70 |
| 10.9+merge of async | 38383.13 | 89040.30 | 168254.81 | 192200.60 | 195663.06 | 193661.70 |
| 10.9 | 43503.02 | 98087.92 | 169159.82 | 189859.61 | 194903.15 | 199073.90 |
In this scenario with reduced activity around trx_sys.rw_trx_hash, MDEV-28313 should matter less, that is, the difference between the 2nd and 4th row should be mostly noise. However, we can still observe a consistent performance regression due to the asynchronous log writing.
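To make the row comparisons above easier to check, the relative differences can be computed directly from the READ UNCOMMITTED table. This is just a small sketch: the numbers are copied from that table, I assume that they are throughput figures where higher is better, and the helper name relative_change is mine.

```python
# Figures copied from the READ UNCOMMITTED oltp_update_non_index table.
# Assumption: these are throughput values (higher is better).
clients = [20, 40, 80, 160, 320, 640]
results = {
    "patched": [38794.57, 89714.06, 168784.54, 191521.04, 189094.02, 192025.07],
    "10.9+MDEV-28313": [41801.26, 97290.21, 170614.28, 187754.89, 196493.73, 197833.70],
    "10.9+merge of async": [38383.13, 89040.30, 168254.81, 192200.60, 195663.06, 193661.70],
    "10.9": [43503.02, 98087.92, 169159.82, 189859.61, 194903.15, 199073.90],
}

def relative_change(candidate, baseline):
    """Percentage change of candidate relative to baseline, per client count."""
    return [100.0 * (c - b) / b for c, b in zip(candidate, baseline)]

if __name__ == "__main__":
    comparisons = [
        ("10.9+MDEV-28313", "10.9"),          # effect of MDEV-28313 alone
        ("patched", "10.9+MDEV-28313"),       # async log writes, with MDEV-28313
        ("10.9+merge of async", "10.9"),      # async log writes, without MDEV-28313
    ]
    for candidate, baseline in comparisons:
        deltas = relative_change(results[candidate], results[baseline])
        cells = ", ".join(f"{n}: {d:+.2f}%" for n, d in zip(clients, deltas))
        print(f"{candidate} vs {baseline}: {cells}")
```

Because each cell comes from a single 30-second run, differences of a percent or two should be treated as noise unless they repeat across runs.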
We will need deeper analysis to identify the bottleneck that causes the counterintuitive performance regression. MDEV-21423 may or may not fix this. An artificial benchmark that concurrently updates a very large number of SEQUENCE objects (MDEV-10139) should completely rule out the InnoDB transaction subsystem, because operations on SEQUENCE objects only generate redo log, no undo log at all.
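A SEQUENCE-based workload of that kind might be generated along the following lines. This is a rough sketch, not a tested benchmark: the seq_N naming and both helper functions are hypothetical, and NOCACHE is my assumption so that every NEXT VALUE call has to update the sequence instead of being served from a cache.

```python
def sequence_ddl(count, prefix="seq_"):
    """CREATE statements for `count` InnoDB-backed SEQUENCE objects.

    The naming scheme is hypothetical. NOCACHE is assumed here so that
    each NEXT VALUE call results in a write.
    """
    return [f"CREATE SEQUENCE {prefix}{i} NOCACHE ENGINE=InnoDB"
            for i in range(1, count + 1)]

def sequence_workload(client_id, count, iterations, prefix="seq_"):
    """Statements for one client; clients start on different sequences
    and cycle through all of them."""
    statements = []
    for step in range(iterations):
        n = (client_id + step) % count + 1
        statements.append(f"SELECT NEXT VALUE FOR {prefix}{n}")
    return statements

if __name__ == "__main__":
    for stmt in sequence_ddl(3):
        print(stmt + ";")
    for stmt in sequence_workload(client_id=0, count=3, iterations=4):
        print(stmt + ";")
```

Each client would run its statement list over its own connection. Spreading the clients across many sequences is meant to avoid contention on a single object, so that the log write path dominates.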
Off-CPU analysis (http://www.brendangregg.com/offcpuanalysis.html) could be useful if it did not emit most call frames as "unknown" in my recent tests. I should investigate whether https://github.com/iovisor/bcc/issues/3884 would fix that.
While working on MDEV-27812, I noticed that in MDEV-14425, I accidentally removed checks that the log writes succeed (MDEV-27916). Similar error handling must be implemented also for the asynchronous log write code path.
Can you try to merge this with the work-in-progress MDEV-27812 branch?