Details
Type: Bug
Status: Closed
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.6, 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL), 11.3(EOL), 11.4, 11.5(EOL)
Labels: None
Description
This ticket offers an alternative to the MDEV-33667 solution for improving parallel slave performance under an XA transaction load.
Round-robin distribution is combined with assigning XA "COMPLETE" events (XA COMMIT or XA ROLLBACK) to the same
slave workers that may currently be processing the respective XA PREPARE parts.
The algorithm tries to reach a fair distribution, but it cannot execute an
XA PREPARE in parallel with its XA COMMIT (or ROLLBACK).
Currently the knielsen_xa_sched_minimal_fix branch contains such a solution.
10.6 Rebase branch: bb-10.6-MDEV-33668
https://github.com/MariaDB/server/commits/bb-10.6-MDEV-33668/
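As a rough illustration of the scheduling idea above (a minimal sketch with invented names such as Scheduler and xid_to_worker_, not the actual server code): event groups are distributed round-robin, except that an XA "COMPLETE" event group is routed to the worker that received the matching XA PREPARE.

    #include <cstddef>
    #include <string>
    #include <unordered_map>

    // Illustrative sketch only; names are invented for this example.
    enum class EventKind { ORDINARY, XA_PREPARE, XA_COMPLETE /* COMMIT or ROLLBACK */ };

    struct EventGroup {
      EventKind kind;
      std::string xid;      // empty for ordinary transactions
    };

    class Scheduler {
    public:
      explicit Scheduler(size_t n_workers) : n_workers_(n_workers) {}

      // Returns the worker index chosen for this event group.
      size_t schedule(const EventGroup &ev)
      {
        if (ev.kind == EventKind::XA_COMPLETE)
        {
          // Route the COMMIT/ROLLBACK to the worker that may still be
          // processing the corresponding XA PREPARE part.
          auto it = xid_to_worker_.find(ev.xid);
          if (it != xid_to_worker_.end())
          {
            size_t w = it->second;
            xid_to_worker_.erase(it);
            return w;
          }
          // PREPARE not tracked (e.g. applied long ago): fall through.
        }
        size_t w = next_rr_++ % n_workers_;   // plain round-robin
        if (ev.kind == EventKind::XA_PREPARE)
          xid_to_worker_[ev.xid] = w;         // remember for the COMPLETE part
        return w;
      }

    private:
      size_t n_workers_;
      size_t next_rr_ = 0;
      std::unordered_map<std::string, size_t> xid_to_worker_;
    };

Because the COMPLETE part always lands on the same worker as its PREPARE, the two cannot execute in parallel with each other, which is the limitation noted above.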
Attachments
Issue Links
- is blocked by
  MDEV-27512 Assertion `! thd->transaction_rollback_request' failed in rows_event_stmt_cleanup (Closed)
- relates to
  MDEV-742 LP:803649 - Xa recovery failed on client disconnection (Closed)
  MDEV-21107 Assertion `!tmp_gco->next_gco || tmp_gco->last_sub_id > sub_id' failed in finish_event_group (Confirmed)
- split from
  MDEV-31949 slow parallel replication of user xa (Stalled)
Activity
Pushed a small fix to bb-10.6-MDEV-33668, taking the same fix already used in Andrei's rev0 and rev1 patches.
There was a wait_for_prior_commit() in XA_prepare_log_event::do_commit() before doing the actual XA PREPARE. This would prevent all XA PREPARE events from group-committing with any prior transactions, which would hurt performance of parallel replication of workloads with a significant fraction of XA transactions.
Andrei's ver0 work (ref) includes a lot of testcase work. Will this need to be added to this patch also, and will it be the same?
These tests (rpl_xa_prepare_gtid_fail, rpl_xa_concurrent_2pc, rpl_xa_empty_transaction, rpl_parallel_xa_same_xid and binlog_xa_prepared_bugs) have proven stable in bb-10.6-MDEV-33668 when copied (including related files) from ver0.
roel, salve! I'm going to push fixes to stabilize rpl_xa_prepare_gtid_fail separately. The other 31949 branches are not relevant in the current context. E.g. the rpl_parallel_xa_same_xid changes in origin/bb-10.6-MDEV-31949_ver0_opt are only for "patient" repeating of duplicate xid:s, while in this ticket's branch not even a single retry can occur for this reason (just as in the base 10.6).
On the other hand, a new rpl_parallel_multi_domain_xa test has been added to the branch with the purpose of proving
multiple-gtid-domain parallel applying.
Could you please look into it, as we need this sort of setup tested rather aggressively as well.
I have not yet heard from axel about the benchmarking results, but it looks like this branch will be pushed as the solution to the slowness issue.
Cheers!
Howdy knielsen. I've left a few notes on your branch and followed up with a few small commits. It's a great piece of work, and I've grown convinced we should go along with it rather than with MDEV-33667 as a "hot fix" for the slowness.
While we wait for Axel's benchmarking (a blessing I have no doubts about), could you please review the current branch in full on your own? My only concern was about the maybe_active_xid... array size and the scan search in it. The size is kept under control, and to my understanding the latter can't really be a concern while the number of workers is not big.
Elkin, I looked over the complete patch series, looks good to me.
I squashed some commits to simplify the commit history and force-pushed the bb-10.6-MDEV-33668 branch. I kept the preparatory/refactoring patch separate; I think it's nice to keep the distinction between the refactoring and the change of logic in XA scheduling.
Agree with your comments about the size of maybe_active_xid. There will be some cost to the scanning of the array to search for XID. This could be improved using a hash table; also a more careful accounting of history could reduce the size of generations. But I'm not sure it will be a problem in practice, so maybe just go with the simple approach for now. Even if the size of the array would be significant, this only affects the SQL driver thread, and it seems unlikely that the impact would be severe enough to make it a bottleneck.
Let me know if there is something else I should do on this MDEV at this point.
- Kristian.
monty (knielsen), there is now a clearer picture of performance, produced by axel. I am pasting the comparison table for the few branches investigated.
Optimistic parallel mode slave is run with --log-bin and --log-slave-updates.
#thd | A (tps) | B (tps) | C (tps) | D (tps)
-----+---------+---------+---------+--------
   1 |    2150 |    2092 |    2109 |    2217
   2 |    2226 |    2395 |    2503 |    2845
   4 |    2572 |    3434 |    3380 |    3851
   8 |    3024 |    5180 |    4419 |    5560
  16 |    3649 |    7801 |    5746 |    8171
  32 |    4358 |   10628 |    7852 |   11011
  64 |    5107 |   15094 |    9323 |   14565

A: mariadb-10.6-567c0973591, branch 10.6
B: mariadb-10.6-093efc509b1, branch bb-10.6-MDEV-31949_ver0_opt
C: mariadb-10.6-0dc13677f3d, branch bb-10.6-MDEV-33668
D: mariadb-10.6-1bdc9bc8077, branch bb-10.6-MDEV-31949
(bb-10.6-MDEV-31949_ver0_opt should've been renamed to bb-10.6-MDEV-33667).
As you can see, with 4 workers bb-10.6-MDEV-33668 is better than 10.6 by about 30%,
and with 8 workers by about 50%. This is somewhat worse than my "capsule" benchmarking showed.
Testing has been restarted on the bb-10.6-MDEV-33668 branch as updated yesterday.
I looked over the patch and have the following thoughts. Points 2-4 are quite minor (just for consideration), though I think it would be good to address the first point (and it wouldn't impact Roel's testing).
1) The test doesn't cover a pretty big aspect of the patch: generations. It'd be good to wrap the main loop with another loop that separates the XA PREPARE from the XA COMMIT/ROLLBACK by <slave_parallel_workers>*<index_of_new_loop> transactions, for 0-4 generations.
2) The granularity of a generation seems like it can overzealously fix the scheduling for transactions in the last vulnerable generation - can we infer it based on sub_id? I imagine that may allow a little more parallelism.
3) The freeze_size(&maybe_active_xid) call can be done after the whole event group has been queued, so we don't impede "real work to be done" with maintenance work.
4) Though I doubt comparing each xid for just three generations would be too much overhead, I wonder if a sliding Bloom filter could be used as a quick initial check of XID existence before attempting to compare each xid (a rough sketch follows after this list). I don't necessarily think we should incorporate this now (after all, the first rule of code optimization is: don't optimize your code), but I'm just leaving it as a future thought.
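To make the "future thought" in point 4 a bit more concrete, here is a hedged sketch of what a sliding (two-segment) Bloom filter over recently tracked XIDs could look like; all names are invented for illustration and nothing like this exists in the patch:

    #include <array>
    #include <bitset>
    #include <cstddef>
    #include <functional>
    #include <string>
    #include <utility>

    // Two alternating Bloom segments: inserts go to the current segment,
    // lookups consult both; when older XIDs no longer need tracking, the
    // stale segment is cleared and becomes current.  A false positive only
    // costs a (correct but redundant) scan of the exact XID list; false
    // negatives cannot occur for XIDs still being tracked.
    class SlidingXidFilter {
    public:
      void insert(const std::string &xid)
      {
        std::pair<size_t, size_t> h = hashes(xid);
        segs_[cur_].set(h.first);
        segs_[cur_].set(h.second);
      }

      bool maybe_contains(const std::string &xid) const
      {
        std::pair<size_t, size_t> h = hashes(xid);
        for (const auto &s : segs_)
          if (s.test(h.first) && s.test(h.second))
            return true;
        return false;
      }

      // Call when a generation's XIDs no longer need to be remembered.
      void advance_generation()
      {
        cur_ ^= 1;
        segs_[cur_].reset();
      }

    private:
      static constexpr size_t kBits = 1 << 14;

      static std::pair<size_t, size_t> hashes(const std::string &xid)
      {
        size_t h = std::hash<std::string>{}(xid);
        return { h % kBits, (h / kBits) % kBits };
      }

      std::array<std::bitset<kBits>, 2> segs_{};
      int cur_ = 0;
    };

Lookups that hit the filter would still fall back to the exact per-generation XID comparison; the filter only short-circuits the common case where the XID has not been seen recently.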
Agree that a test for generations would be good.
The use of generations in the patch was a minimal way of ensuring
correctness in the (presumably rare) case of duplicate XID between
transactions. The granularity of generations is calculated in a very
conservative manner, and generations can be quite a lot longer than they
need to be. This does not impact parallelism if XIDs are all distinct. But it
does lead to a longer linear search for the XID inside the SQL driver
thread, which will have little impact as long as the SQL driver thread is
not a bottleneck.
As I understand, this was found not to be a problem in practice in
performance testing. But if/when we want to improve it, the obvious idea
would be to have a hash table mapping XID to the sched_bucket it was last
scheduled to. This would eliminate the linear search length that depends on
the estimated size of generations.
In pathological cases it can be necessary to keep track of an XID for
arbitrarily long. In principle, we could schedule thousands of event groups
between workers W1 and W2 due to XA dependencies, and then the next event
group scheduled on W3 could in principle run in parallel with any or all of
them. So I don't think sub_id by itself is enough.
Some improvement is though possible using sub_id. For example, suppose we
schedule event groups E1 and E10 (say) to W1, and we scheduled E15 to W2,
where 1, 10, 15 are the sub_ids for the event groups. Then we know that E15
will wait for E10 to commit (because 15>10, sub_id comparison). So then it's
safe to schedule the next event group to W2, even if it has the same XA XID
as E1.
So in summary, the generations is just a simple and low-cost way to
conservatively bound the time for which an XID needs to be remembered. And a
hash table of the XIDs could be used to eliminate the linear search through
the XID history, if required.
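A hedged sketch (invented names and structure, not the actual branch code) of how the summary above could look when combined: a hash table maps each recently seen XID to the bucket and sub_id of its last event group, and the sub_id refinement from the E1/E10/E15 example decides when pinning a duplicate XID to the same bucket is unnecessary.

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <unordered_map>

    // Illustrative sketch only.  Per recently seen XID, remember which
    // bucket (worker) it was last scheduled to and the sub_id of that
    // event group.  A later event group with the same XID is pinned to
    // that bucket unless the bucket round-robin would pick already holds
    // newer work that, by sub_id comparison, commits after the remembered
    // event group.
    struct XidHistory {
      struct Entry { size_t bucket; uint64_t sub_id; };

      // Call for every event group scheduled to `bucket`, XA or not.
      void note_scheduled(size_t bucket, uint64_t sub_id)
      {
        last_sub_id_on_bucket_[bucket] = sub_id;
      }

      // Record the scheduling decision for an XA event group carrying `xid`.
      void remember_xid(const std::string &xid, size_t bucket, uint64_t sub_id)
      {
        history_[xid] = Entry{bucket, sub_id};
      }

      // Decide the bucket for a new event group carrying `xid`, given the
      // bucket plain round-robin would pick.
      size_t choose_bucket(const std::string &xid, size_t rr_bucket) const
      {
        auto it = history_.find(xid);
        if (it == history_.end())
          return rr_bucket;              // XID not seen recently: no constraint
        const Entry &prev = it->second;
        auto lb = last_sub_id_on_bucket_.find(rr_bucket);
        if (lb != last_sub_id_on_bucket_.end() && lb->second > prev.sub_id)
          return rr_bucket;              // rr_bucket already waits for prev (the E15 vs E10 case)
        return prev.bucket;              // otherwise pin to the predecessor's bucket
      }

      // Pruning of entries older than the tracked generations (what bounds
      // the size of this table) is omitted here.
      std::unordered_map<std::string, Entry> history_;
      std::unordered_map<size_t, uint64_t> last_sub_id_on_bucket_;
    };

The pruning policy, i.e. how long an XID stays in the table, is where the generation bookkeeping described above would come in.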
I'm not sure about point (3), maybe I did not catch your meaning. This all
runs in the SQL driver thread, and freeze_size() is just a realloc(), I
don't understand how moving this could significantly affect "real work to be
done"?
To clarify point (3), I only mean it can be moved to a point after we've queued all events of the current transaction, so the resizing can be done (by the SQL thread) in parallel with the worker thread executing the transaction. Though I wouldn't classify it as significant, just for thought.
Agreed that sub_id isn't enough alone to replace the idea of a generation. My thought seemed to parallel the example you gave of using sub_id as an improvement. That is, to calculate a relative generation using sub_id (while still maintaining an XID list for active generations), then I imagine we wouldn't need to store the XIDs of the third/last generation (which seems to be in line with your example). Just wanted to define it to be sure we're on the same page. Again, not something I'd say is significant.
Thanks for the in-depth elaboration!
As promised, a detailed testing status update.
* I am currently focusing on debugging these issues seen during testing thus far:
1. [ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 0-1-20, Internal MariaDB error code: 1964 (and variations thereof)
2. [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'could not find next log; the first event '.' at 4, the last event read from 'binlog.000009' at 2595, the last byte read from 'binlog.000009' at 2702.', Internal MariaDB error code: 1236 (and variations thereof)
3. [ERROR] Slave SQL: Error executing row event: 'You can't combine write-locking of system tables with other tables or lock types', Gtid 0-15-143, Internal MariaDB error code: 1428 (and variations thereof)
4. [Warning] Slave: Got error 140 "Wrong create options" from storage engine InnoDB Error_code: 1030 (w/ slave halt). This one may be XA related.
5. (Slave Crash) SIGSEGV|dict_table_t::is_active_ddl|trx_t::rollback_low|trx_t::rollback|trx_rollback_last_sql_stat_for_mysql
6. (Slave Assertion) thd->transaction->stmt.is_empty() || thd->in_sub_stmt || (thd->state_flags & Open_tables_state::BACKUPS_AVAIL)|SIGABRT|close_thread_tables|mysql_execute_command|mysql_parse|Query_log_event::do_apply_event
7. Other crashes/assertions which may or may not be related
* After that, or concurrently where feasible, the plan is to do the following tests:
8. Stress test related XA replication MTR tests (and verify any failures against BASE as needed)
9. Run the MENT-1905 XA lua scripts in various configurations. This is important as these scripts have repeatedly shown various issues in this area.
My total overall estimate, keeping in mind my availability (4 days/week), remains the 17th of April. If it is possible to finish testing before then, it will be done.
Andrei also mentioned that multiple GTID domains are very important to check in connection with this ticket; he indicated this is one area where the fixes may behave unexpectedly. For this, I propose that knielsen or Elkin add an MTR test.
Also of some interest is the assertion
!look_for_rollover || start_scan_slot != slot|SIGABRT|trx_assign_rseg_low|trx_start_low|trx_start_if_not_started_xa_low|create_table_info_t::create_foreign_keys
though it is seen in base runs as well.
Roel, can you please create sub-tasks for all of the above-listed errors (including how to repeat them / what you did to get them) so that we can get them fixed?
Yes; all the issues above - as well as various others seen in trunk/base during testing - will be debugged and logged, provided they prove to be bugs.
In case it was not clear, the list of issues 1-6 above were only seen in the patch tree, not in trunk/base.
Hi bnestere,
I pushed a couple patches to branch knielsen_xa_sched_minimal_fix that implement
what I described in my previous comment.
One is a testcase for generations, as you also mentioned. It runs a lot of
random XA transactions with different distances between XA PREPARE and XA
COMMIT, and with some duplicate XIDs. This test was successful at failing
when I deliberately introduced a couple bugs in the generation accounting.
Unfortunately that test fails sporadically, both with my patches and without
(vanilla 10.6). I think perhaps this is because the slave's GTID position is
updated too early, so there is a race where MASTER_GTID_WAIT() has returned
successfully but the corresponding transaction is not yet visible in the
InnoDB table. Something for Andrei to look into perhaps?
The second is a small patch that rewrites the computation of when a new
generation starts. This seems much better than the original code, both
simpler and more precise: it just uses the FIFO order of scheduling to see
when the last bucket of the previous generation has been scheduled for the
current generation, instead of the overly clever index manipulation that
significantly over-estimates the time before a new generation is reached.
The third patch is more complex. It replaces the linear array scan of
recently scheduled XIDs with a single hash table lookup. So it avoids the
potentially costly scan in the original patch if the workload has a high
density of XA transactions and the number of worker threads is large. It
also implements the refinement using sub_id to avoid XID dependencies in
some cases where they would otherwise be implied by generation
considerations alone.
I don't necessarily suggest including these new patches in the upcoming
release or custom build. They are not critical; the functionality should be
the same with or without. I just wanted to show how I envisioned the minimal
patch's accounting of dependencies and generations could look with a more
complex and full implementation.
- Kristian.
A few reviews were done. Two commits were pushed:
89c907bd4f7..d90a2b44add HEAD -> 10.6
Elkin, what will happen with the additional patches that bnestere and knielsen discussed/proposed? Also, where does this leave MDEV-31949 / MENT-1905?
Roel, that code is currently in Kristian's branch. I thought the best place for it would be 11.5, as a polished version of this bug's fixes.
MDEV-31949 is still aimed at 11.6.
Elkin Ack, thank you. Should a new ticket be created for those patches? As they will require testing, I think we should look at 11.6 for those as well.
Also, is this patch (MDEV-33668 inc the additional patches) fully compatible with MDEV-31949?
Roel, a new ticket, of course. I'll report one when I have fully understood its merge with MDEV-31949. The latter is different on the slave side; maybe the merge with this solution will lessen that difference, and 31949 also refines the master side because of recovery (MDEV-21777, MDEV-32830).
From the list above (all bugs reproduced in BASE as well, except #6 for the moment):
1. [ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 0-1-20, Internal MariaDB error code: 1964 (and variations thereof)
> This is now MDEV-34010
2. [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'could not find next log; the first event '.' at 4, the last event read from 'binlog.000009' at 2595, the last byte read from 'binlog.000009' at 2702.', Internal MariaDB error code: 1236 (and variations thereof)
> Filtered after analysis
3. [ERROR] Slave SQL: Error executing row event: 'You can't combine write-locking of system tables with other tables or lock types', Gtid 0-15-143, Internal MariaDB error code: 1428 (and variations thereof)
> This is now MDEV-34011
4. [Warning] Slave: Got error 140 "Wrong create options" from storage engine InnoDB Error_code: 1030 (w/ slave halt). This one may be XA related.
> This is now MDEV-33961
5. (Slave Crash) SIGSEGV|dict_table_t::is_active_ddl|trx_t::rollback_low|trx_t::rollback|trx_rollback_last_sql_stat_for_mysql
> This is now MDEV-34051
6. (Slave Assertion) thd->transaction->stmt.is_empty() || thd->in_sub_stmt || (thd->state_flags & Open_tables_state::BACKUPS_AVAIL)|SIGABRT|close_thread_tables|mysql_execute_command|mysql_parse|Query_log_event::do_apply_event
> Ref MDEV-32372
7. !look_for_rollover || start_scan_slot != slot|SIGABRT|trx_assign_rseg_low|trx_start_low|trx_start_if_not_started_xa_low|create_table_info_t::create_foreign_keys and
!look_for_rollover || start_scan_slot != slot|SIGABRT|trx_assign_rseg_low|trx_start_low|trx_start_internal_low|dict_stats_fetch_from_ps
> This is now MDEV-33917
This comment, previously updated as work on the bugs progressed, is now complete.
Set myself to review Kristian's branch.