With some types of load, such as single-statement "light" transactions, user XA transactions may perform much worse on an optimistic parallel slave than an equivalent normal (BEGIN..COMMIT) load.
This ticket aims to fix the slowness while preserving XA high availability.
The whole work is arranged in two parts:
1. refactoring of the user XA binary logging to facilitate slave-side parallel execution (MDEV-32830);
2. a slave-side change to the parallel scheduler to apply Round-Robin distribution to user XA transactions.
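As a rough illustration of the Round-Robin distribution named in part 2, the following Python sketch assigns each incoming XA transaction to the next worker in rotation. The function name `distribute_round_robin`, the XID strings and the worker count are all hypothetical; this is not the server's scheduler code, only a minimal model of the idea.

```python
from itertools import cycle


def distribute_round_robin(xids, n_workers):
    """Assign each XA transaction (identified by a hypothetical XID string)
    to a worker queue in strict rotation."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    queues = [[] for _ in range(n_workers)]
    workers = cycle(range(n_workers))  # 0, 1, ..., n_workers-1, 0, 1, ...
    for xid in xids:
        queues[next(workers)].append(xid)
    return queues


if __name__ == "__main__":
    # Five hypothetical XA transactions spread over two workers.
    print(distribute_round_robin(["x1", "x2", "x3", "x4", "x5"], 2))
```

With this kind of rotation, consecutive transactions land on different workers regardless of their XIDs, so many small transactions spread evenly across the pool.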
Issue Links
causes
MDEV-32257 Assertion `thd->is_killed()' failed in XID_cache_element::acquired_to_recovered (opt) or acquire_xid (dbg) on XA ROLLBACK or XA COMMIT
Closed
MDEV-32347 Stack smashing/looping, ASAN use-after-poison in xid_t::eq/event_xid_t::serialize, SIGSEGV in serialize_xid and Assertion `is_async_xac || thd->lex->xid->eq(thd->transaction->xid_state.get_xid())' failed in binlog_rollback_flush_trx_cache upon LOAD INDEX
Closed
MDEV-32463 SIGSEGV in __memmove_avx_unaligned_erms from a memcpy in xid_t::set (sql/handler.h:896) from Gtid_log_event::Gtid_log_event
Closed
MDEV-32469 MDEV-31949: MTR outcome difference: MTR testcase from MDEV-27512 crashes in BASE but not in PATCH
Closed
MDEV-32470 MDEV-31949: use-after-poison in xid_t::key_length()
Closed
MDEV-33673 Parallel replica threads lockup when using many XA threads in bb-10.6-MDEV-31949_ver0
Closed
is blocked by
MDEV-32257 Assertion `thd->is_killed()' failed in XID_cache_element::acquired_to_recovered (opt) or acquire_xid (dbg) on XA ROLLBACK or XA COMMIT
Closed
MDEV-32272 lock_release_on_prepare_try() does not release lock if supremum bit is set along with other bits set in lock's bitmap
Closed
MDEV-32347 Stack smashing/looping, ASAN use-after-poison in xid_t::eq/event_xid_t::serialize, SIGSEGV in serialize_xid and Assertion `is_async_xac || thd->lex->xid->eq(thd->transaction->xid_state.get_xid())' failed in binlog_rollback_flush_trx_cache upon LOAD INDEX
Closed
MDEV-32470 MDEV-31949: use-after-poison in xid_t::key_length()
Closed
MDEV-32830 refactor XA binlogging for better integration with BGC/replication/recovery
Andrei Elkin
added a comment:
Howdy julien.fritsch!
The whole issue consists of 3 parts:
1. refactoring of XA binlogging (MDEV-32830)
2. parallel slave XA execution performance
3. XA binlog recovery (MDEV-33168)
The first two have been ready for review for some time. serg has provided some notes on part 1; there should be more to come after my reply about a week ago. knielsen did the same for part 2 of a previous version of the patch, and his notes have been taken into account in the current branch. He also strongly suggested changing the plan for part 3 from the MDEV-21649 method to the MDEV-33168 one.
Part 3 has been under my testing for the last few days. I am about to publish it for review and preliminary QA testing: ETA 1-2 days.
Andrei Elkin
added a comment (edited):
Updated the branch in part IV (recovery), the HEAD, of MDEV-33168:
82aedbe9213...a71f13489d0 HEAD -> bb-10.6-MDEV-31949 (forced update)
The current HEAD adds a number of mtr tests and fixes a few glitches in the implementation of the business logic.
Integration with Xid_log_list_event is scheduled for tomorrow.
Andrei Elkin
added a comment (edited):
The branch is force-updated
ff8c19e7b64...82aedbe9213 HEAD -> bb-10.6-MDEV-31949 (forced update)
to preserve its structure of four parts.
Parts I, II and III remain in In-Review status; part IV needs to be integrated with a Xid_log_list_event piece being worked on by Brandon.
knielsen, 82aedbe9213 combined with the upcoming Xid_log_list_event should be the recovery solution that you prefer.
Feel free to check it soon.
I am sure it should no longer hold you back from reviewing part II.
I'd be really glad to discuss your plan for implementing the deferred execution of XA PREPARE that you must have been referring to in the MDEV-32020 comment.
Having your design presented may help me understand what you mean by 'proper'. I have pointed out a number of times already that that bug case itself relates to the XA replication framework only rather remotely. It deals much more tightly with the peculiar specifics that Rows events lacking a unique key have manifested in various ways. It is resolvable within a few hours by standard means. While deferring XA transaction execution may be a solution too, I do feel uncomfortable about it, especially when it is rather clear that the deferring method must entail a number of design hurdles, especially recovery-related ones.
Andrei Elkin
added a comment:
masonmariadb, the whole work now includes the part IV recovery commit. Sergei is reviewing the general binlogging part I, split from this ticket as MDEV-32830, which has been refined to reflect the recovery concern raised by Kristian.
Part II is with Kristian; I'll notify him once part IV is finally ready for review by Sergei and him (ETA: Mon).
Part III is about a few InnoDB changes that Marko needs to look at, also once part IV is out.