[MDEV-31949] slow parallel replication of user xa Created: 2023-08-18 Updated: 2024-02-06 |
|
| Status: | In Review |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.5, 10.6, 10.9, 10.10, 10.11, 11.0, 11.1, 11.2, 11.3 |
| Fix Version/s: | 10.5, 10.6, 10.11, 11.1, 11.2 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Andrei Elkin | Assignee: | Kristian Nielsen |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | triage | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
With some type of load, such as one statement "light" transactions, the user xa may perform much poorer on the optimistic parallel slave than an equivalent normal - BEGIN..COMMIT - load. This ticket aims at fixing the slowness while the XA high-availability is to be preserved. The whole work is arranged in two parts: |
| Comments |
| Comment by Kristian Nielsen [ 2023-08-22 ] | |
|
I'm trying to follow what the underlying issue here is... | |
| Comment by Marko Mäkelä [ 2023-08-23 ] | |
|
knielsen, according to vlad.lesin’s analysis with offcputime from https://github.com/iovisor/bcc/ (which involved rebuilding some libraries with -fno-omit-frame-pointer so that we can get valid stack traces), the function Xid_apply_log_event::do_apply_event(rpl_group_info*) is waiting 34.46% of the execution time when the workload uses XA PREPARE and XA COMMIT. With a similar workload that uses regular transactions, the wait time in that function would be 1.85%. | |
| Comment by Andrei Elkin [ 2023-08-24 ] | |
|
knielsen, there's no good written summary yet, I am close to compose one. In short the slowness must be caused by skew of xa transactions load distribution to parallel works. At this point I've found a way to subject XAP and XAC distributions to round-robin (to round-robin XAP alone and assign the following XAC to XAP's worker is much simpler but the skew remains and to my benchmarking the performance was still not good). Discussions started on slack 'cos it dealt with a customer case, their data in particular which we used to emulate their load etc. Myself I need to learn how to release at some checkpoints even raw results of my personal r&d process. | |
| Comment by Andrei Elkin [ 2023-08-24 ] | |
|
After exercising a finally constructed (but yet fully stable yet) round-robin XAP, XAC I definitely see both a grown performance and scale up with the number of worker. | |
| Comment by Andrei Elkin [ 2023-09-19 ] | |
|
After reviewing on a poc level, I am taking it back to full completion. | |
| Comment by Roel Van de Paar [ 2023-09-26 ] | |
|
Status update: patch nearing completion, 10.6 rebase next by developers. Also commenced preliminary runs against patch branch. Created | |
| Comment by Roel Van de Paar [ 2023-09-26 ] | |
|
Created MENT-1958 safe_mutex: Found wrong usage of mutex 'flush_list_mutex' and 'mutex' | |
| Comment by Roel Van de Paar [ 2023-09-28 ] | |
|
Created | |
| Comment by Roel Van de Paar [ 2023-10-04 ] | |
|
Created | |
| Comment by Roel Van de Paar [ 2023-10-13 ] | |
|
Created | |
| Comment by Andrei Elkin [ 2023-10-14 ] | |
|
Roel, while bb-10.6-MDEV-31949 is ready for retesting, I am squashing lately collected commits into one and also invite Kristian Nielsen for external review once that's done (by the end of today). Considering the customer pressure we need to focus fully on this case, and prepare ourselves to retesting after KN's notes (if any) too. | |
| Comment by Andrei Elkin [ 2023-10-18 ] | |
|
Howdy Kristian, it'd be great to have your note of the patch. As we exchanged on Zulip its current 10.6-based branch may not be one for the CS. Also feel free to ignore the top commit whose test's failure If it suits you well, exchanges about the branch would be good have via github and naturally Zulip. Looking forward to hear from you soon, Andrei | |
| Comment by Kristian Nielsen [ 2023-10-22 ] | |
|
Review sent: | |
| Comment by Roel Van de Paar [ 2023-11-21 ] | |
|
Status update: patch development continues, now in progress in MDEV-32830 which is a pre-req for this ticket. That ticket is in review by KN & early tests are being run. | |
| Comment by Andrei Elkin [ 2023-11-24 ] | |
|
The part 1 of the description is at review. | |
| Comment by Andrei Elkin [ 2023-11-28 ] | |
|
BB has showed just one rpl.rpl_parallel_optimistic_xa_lsu_off failure on binlog-less slave. | |
| Comment by Andrei Elkin [ 2023-12-08 ] | |
|
Michael Widenius, there're a good news! I finally got a pettry good performance with one little hunk (to be added) to bb-10.6-MDEV-31949. Performance just rocks now. | |
| Comment by Andrei Elkin [ 2023-12-15 ] | |
|
bb-10.6-MDEV-31949 is updated into close-to-review-ready branch. The branch structure is of three I-III parts each consists with 2-3 commits dedicated to a specific "big enough to keep it separate" issue. | |
| Comment by Marko Mäkelä [ 2023-12-16 ] | |
|
Note: Unlike normal ROLLBACK, XA ROLLBACK must really be durably written, because there already must have been a preceding durable write of a XA PREPARE. Non-XA transactions are not written to the binlog before commit, so for normal ROLLBACK there is no need for any durability. InnoDB will simply roll back any recovered incomplete transactions that did not reach the XA PREPARE state. The only mentions about durability that I was able to find in the commit messages were about XA PREPARE and XA COMMIT. The current commit messages mostly say ‘what’ was changed but not ‘why’. I would like to see a clear description how durability is guaranteed and recovery will work for all of XA PREPARE, XA ROLLBACK, and XA COMMIT in various scenarios. This should also cover a clear description (including call stacks) of the InnoDB mini-transaction commits and the writes of the binlog events, as well as where the writes are being made durable. I did not see any tracking of the mini-transaction commit LSN among the code changes. If we are deferring some calls to durable writes, we must keep track of the LSN so that log_write_up_to(lsn, true) can be called by the time it is really needed. (Other InnoDB threads may write the buffered log records, maybe even durably.) I do not see any tests that would try to prevent InnoDB page flushing or log checkpoints from occurring. I believe that when the safe setting of sync_binlog=1 is enabled, each of these operations XA PREPARE, XA ROLLBACK, XA COMMIT must be durably written to both the binlog and the storage engine before an OK packet can be sent to the client or any replication connections. It could be helpful to make use of the interface that was implemented in | |
| Comment by Andrei Elkin [ 2023-12-16 ] | |
|
marko, good notes and questions, thanks for checking! To the how durability is guaranteed and recovery will work, that is the subject of MDEV-21469 binlog replay for XA recovery. And a measure taken in the III.2 to restore write's durability is for the current branch not to show any different with BASE in this regard. Integration of binlogging with | |
| Comment by Andrei Elkin [ 2023-12-19 ] | |
|
Howdy Kristian, bb-10.6-MDEV-31949 is ready for review for all its three parts; commits in the branch are titled with a roman number of the part. Part I I formally arrange as MDEV-32830, it's on Serg's laps now. (I am going to preserve the branch structure with the parts at accounting of any notes that are expected of course). | |
| Comment by Andrei Elkin [ 2023-12-19 ] | |
|
I have to push twice to settle down tests, a new binlog.binlog_xa_mdev-31949 particular. No functional changes. | |
| Comment by Kristian Nielsen [ 2023-12-21 ] | |
|
Let me just state once and for all: The idea to binlog first and prepare in engine later for XA prepare will not be approved. Don't spend more effort in this direction.
For these reasons, the engine prepare must come before the binlog write for XA PREPARE. Again, patches that does the opposite will be rejected in review. | |
| Comment by Andrei Elkin [ 2023-12-21 ] | |
|
knielsen, thanks you for expressing your opinion! I read in your comments both technical arguments, that I comment on once again at the end of this message, MDEV-21469 idea of recovery has its practical reasons. There are many actually, and one major one as you know or remember is I've not given up on the "conventional" ordering though and continued with experiments and analysis after having send MDEV-31949 for review. Perhaps we'll have something good out of that soon. Fingers crossed. So to the technical part, despite all scares about the STATEMENT format danger there exists only the ROW format BUG-32020 which is of a specific and familiar class of ROW format replication. Should the indexes be specified with the event, the bug must be gone. | |
| Comment by Mason Sharp [ 2024-01-05 ] | |
|
What are the next steps? knielsen are you still reviewing, or Elkin are you making changes? | |
| Comment by Andrei Elkin [ 2024-01-06 ] | |
|
masonmariadb, the whole work includes now the IVth recovery commit. Sergei is reviewing the general binlogging part I. split from this ticket as MDEV-32830, which has been refined to reflect the recovery concern raised by Kristian. | |
| Comment by Kristian Nielsen [ 2024-01-07 ] | |
|
I've started implementing MDEV-32020 to fix the XA stuff properly. | |
| Comment by Andrei Elkin [ 2024-01-08 ] | |
|
The branch is force-updated
to preserve its structure of four parts. knielsen, 82aedbe9213 combined with the upcoming Xid_log_list_event must be a recovery solution that you prefer. I'd be really glad to discuss your plan of implementation of xa-prepare deferred execution that you must've been about in the MDEV-32020 c omment. | |
| Comment by Andrei Elkin [ 2024-01-09 ] | |
|
Updated the branch in the IV recovery part/HEAD of MDEV-33168:
The current HEAD is enriched with a number of mtr tests and fixes few glitches in implementation of business logics. | |
| Comment by Andrei Elkin [ 2024-01-22 ] | |
|
Howdy julien.fritsch! The whole issue is of 3 parts: The first two were ready for review for some time. serg ] has provided some notes to the part I. There must be more to come after my reply about a week ago. knielsen did so to the part II of a previous version of the patch and those of his notes have been accounted in the current branch. He also strongly suggested to change the part 3's plan from the |