Details
-
New Feature
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
None
Description
When doing server recovery, the active transactions will be rolled
back by InnoDB background rollback thread automatically. The
prepared transactions will be committed or rolled back accordingly
by binlog recovery. Binlog recovery is done in main thread before
the server can provide service to users. If there is a big
transaction to rollback, the server will not available for a long
time.
It is better to make the prepared transactions to be rolled back by the background rollback thread.
Attachments
- MDEV-33853.pdf
- 30 kB
- MDEV-33853C.pdf
- 37 kB
Issue Links
- causes
-
MDEV-35265 wsrep.wsrep-recover, wsrep.wsrep-recover-v25 fail on assertion !wsrep_is_wsrep_xid(&trx->xid)
-
- Closed
-
- relates to
-
MDEV-34909 DDL during SET GLOBAL innodb_log_file_size may hang when using PMEM
-
- Closed
-
Activity
danblack howdy! The PR is approved, so the rest must be done on your side as I gather.
Thanks Elkin, yep much appreciate the review, and I or TheLinuxJedi will handle the merge (once 11.6 branch exists).
danblack, recently marko suggested to regard this work as bugfixes and it has been agreed on a server lead meeting to fix it starting from 10.6 with introduction of a server option which would be OFF by default until 11.6. That would let us test the new recovery mode internally by mtr rather extensively.
I am taking this part on myself, if no objection from you or libing.
Elkin, wasn't the suggestion from marko a compiler option? From my point of view a server option is not the same
I have suggested a Boolean cmake configuration option for enabling this logic, similar to (say) WITH_WSREP. We would disable this optimization at compilation time by default in 10.6, but we could easily enable this option when compiling code for internal testing. The compile-time option could be enabled by default in the next major development version, such as 11.6.
There are at least the following reasons why introducing any run-time parameters can be a bad idea:
- Conditional branches can incur a considerable overhead on superscalar processors. (OK, this one could be a special case if the condition does not need to be tested after server startup.)
- There is no clear, agreed-upon procedure for deprecating and removing runtime configuration parameters. See for example
MDEV-32917, MDEV-33958,MDEV-30332.
Ouch, someone needs a reality check here.
This is a new feature which is a performance-improvement only.
There's no way this is appropriate for a GA release, let alone 10.6.
Thanks.
- Kristian.
knielsen, with all respect, I thought that it would be acceptable to make some changes that are disabled at compilation time and can be easily identified as such by the use of some #ifdef. We do apply some performance fixes in GA releases. I think that is just a matter of an adequate amount of testing. For frequently executed low-level code, it is easy to achieve reasonable coverage even in our regression test suite. For this change, it is more challenging. I was thinking that this could be covered by some existing InnoDB stress testing where the binlog is enabled. Because most of that testing takes place in the currently latest GA branches 10.6 and 10.11, it seemed reasonable to me to enable the functionality in the internal builds (not on any CI system; this feature would have to be explicitly enabled at compilation time). I do not think that an adequate level of testing can be achieved by letting a CI system run the regression test suite. Commits to the latest development branches are relatively rare, because they run per-push, and most git push are bug fixes to earlier branches.
knielsen, I am to blame on distorting Marko's idea with my light-headed "plan". A 10.6 conditional build feature is not something we do regularly, but perhaps is not something really unacceptable.
Hi Marko and Andrei,
I switched the base branch to 10.6 and added option WITH_ASYNC_BINLOG_ROLLBACK for compiling the patch. If WITH_ASYNC_BINLOG_ROLLBACK is enabled, '-async-binlog-rollback' suffix is added into version variable. Thus it can skip the test case if the patch is not compiled.
This ticket requires InnoDB, replication and performance testing.
mleich is doing the InnoDB part, Roel the replication part, and axel can you please take care of the performance test. Thank you
11.6 Feature branch: https://github.com/SongLibing/mariadb-server/tree/binlog_recover_async_rollback
git clone --recurse-submodules -j8 --depth=1 --branch=binlog_recover_async_rollback https://github.com/SongLibing/mariadb-server/ binlog_recover_async_rollback
|
Before knielsen’s comment in 2024-05-03 I would have hoped that we could have tested this in an earlier maintained branch, to avoid some significant and easily foreseeable friction due to our workflow. The basic problem is that the 11.6 branch is based on a 11.5 branch from some 3 months ago. There have been some merges from some earlier branches, but as far as I can see, more than 20 InnoDB changes from earlier branches such as 10.5, 10.6, and 10.11 are missing.
Now that we are close to completing the quarterly releases, there exists a bb-11.5-release branch that is fairly reasonably up to date (but not a final merge yet). I created a new branch bb-11.6-MDEV-33853 for testing. Locally, one test is failing, but I do not think that it is related to these changes:
bb-11.6-MDEV-33853 abea3ec5673d5eebc12118e2a84d93c28b0eea2c |
rpl.purge_binlog 'row' w44 [ fail ]
|
Test ended at 2024-08-05 13:22:44
|
|
CURRENT_TEST: rpl.purge_binlog
|
mysqltest: At line 13: query 'create table t1 (a int, b varchar(32768))' failed: ER_TOO_BIG_FIELDLENGTH (1074): Column length too big for column 'b' (max = 16383); use BLOB or TEXT instead
|
|
The result from queries just before the failure was:
|
include/master-slave.inc
|
[connection master]
|
#
|
# MDEV-34504 PURGE BINARY LOGS not working anymore
|
#
|
select @@slave_connections_needed_for_purge;
|
@@slave_connections_needed_for_purge
|
0
|
set @old_dbug= @@global.debug_dbug;
|
create table t1 (a int, b varchar(32768));
|
Based on another failure, this could be related to a change to the default collation that was made in MDEV-19123:
bb-11.6-MDEV-33853 abea3ec5673d5eebc12118e2a84d93c28b0eea2c |
CURRENT_TEST: innodb.import_bugs
|
--- /mariadb/11/mysql-test/suite/innodb/r/import_bugs.result 2024-08-05 13:08:40.914658348 +0300
|
+++ /mariadb/11/mysql-test/suite/innodb/r/import_bugs.reject 2024-08-05 13:28:21.437615420 +0300
|
@@ -89,7 +89,7 @@
|
`c1` int(11) NOT NULL,
|
`c2` int(11) DEFAULT NULL,
|
PRIMARY KEY (`c1`)
|
-) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci
|
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_uca1400_ai_ci
|
DROP TABLE t2, t1;
|
# End of 10.5 tests
|
#
|
|
Result length mismatch
|
mleich, please test if this branch is any better.
I finished replication testing both the original branch, as well as the new one provided by Marko. Note no recovery testing was done, as this was already included in Matthias' testing.
Though a number of bugs were found, everything was present in base versions also, and nothing new related to this ticket was found.
There were however a number of shutdown timeout issues accross the runs, which remain in testcase reduction, as they have proven to be very sporadic and hard to reduce.
They are not frequent enough imho to highlight a major new problem within the branches, and almost all usual runs show some shutdown issues.
However, if it is suspected that shutdown timeouts (i.e. hanging mariadb-admin shutdown) are a possiblity due to specific code updates, please let me know and I'll review them more closely.
UPDATE: a good number of the shutdown issue were confirmed not to be related to these branches.
I have updated the branch bb-11.6-MDEV-33853 that is now the main branch (11.7) with the pull request merged to it. Edit: I merged some recent InnoDB fixes (such as MDEV-34750) to 11.6 and main and updated this branch.
I have run a general InnoDB mixed workload performance test on commit e5145b22629 in branch bb-11.6-MDEV-33853 and its predecessor commit 9811d23b6d0. The binlog was enabled in all tests; once async and once sync:
The only worrying result is for t_oltp_writes_innodb (OLTP write-only) with sync binlog. There are also differences in t_oltp_full_innodb (OLTP read/write) but they are in favor of MDEV-33853. And for t_oltp_insert_innodb_batched (10x INSERT per trx) - those are probably bogus. The numbers are generally quite unstable for async binlog. I will repeat the test to see if it's reproducible ...
The second run of the general InnoDB tests completed. I put all results in one plot:
It turns out, that for the workload in t_oltp_full_innodb we have a high likelihood that MDEV-33853 is indeed faster with async binlog. The differences for t_oltp_insert_innodb_batched seem also seem to be real. But they concern just 2 thread counts. For 32 threads MDEV-33853 is slower and for 64 threads it's faster. So overall it's a draw.
All other differences turned out bogus. So from that point of view MDEV-33853 is ok.
axel, thank you for testing this. If I understood it correctly, you are running Sysbench workloads that do not involve restarting the server, nor any XA transactions for that matter. For that kind of test scenario, I don’t think that this code change can make any difference. The InnoDB changes are strictly limited to server startup, and I guess so are the changes to xarecover_do_commit_or_rollback() and xarecover_complete_and_count().
That is, the observed differences ought to be due to random noise, or possibly due to slightly changed code layout (the number of MMU pages that the busy part of the executable code is residing on).
Thanks marko. I have not yet a test case for measuring server startup time. I thought of running a sysbench prepare job and killing the server in between. Then save the resulting datadir and use it for starting different server builds.
But if this reqires the transactions to be XA, then it will not work out of the box. Sysbench does not use XA:
Thank you, https://github.com/MariaDB/server/pull/3030 now looks almost OK to me, with a couple of minor changes needed. I think that knielsen and Elkin need to review the changes outside InnoDB.