  MariaDB Server / MDEV-33668

Adapt parallel slave's round-robin scheduling to XA events

Details

    Description

      This ticket offers an alternative to the MDEV-33667 solution for improving parallel slave performance under XA transaction load.
      Round-robin distribution is combined with assigning XA-"COMPLETE" events to the same
      slave workers that may currently be processing the respective XA-PREPARE parts.

      The algorithm tries to reach a fair distribution, but it cannot run an XA-PREPARE in
      parallel with its XA-COMMIT (or ROLLBACK).

      Currently the knielsen_xa_sched_minimal_fix branch contains such a solution.

      10.6 Rebase branch: bb-10.6-MDEV-33668
      https://github.com/MariaDB/server/commits/bb-10.6-MDEV-33668/
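
      For illustration only, a minimal sketch of the scheduling idea above (hypothetical names, not the code in the branch): XA-PREPARE parts are handed out round-robin and their XIDs are remembered; an XA-"COMPLETE" part whose XID is still remembered is sent to the same worker, otherwise it is scheduled round-robin like any other event group.

      // Minimal sketch; all names here are hypothetical.
      #include <cstddef>
      #include <string>
      #include <vector>

      struct XidEntry { std::string xid; size_t worker; };

      class XaAffinityScheduler {
      public:
        explicit XaAffinityScheduler(size_t n_workers) : n_workers_(n_workers) {}

        // XA PREPARE: plain round-robin, but remember which worker got the XID.
        size_t schedule_xa_prepare(const std::string &xid) {
          size_t w = next_++ % n_workers_;
          recent_xids_.push_back({xid, w});
          return w;
        }

        // XA COMMIT/ROLLBACK: if the XID was recently scheduled, reuse that
        // worker (it may still be busy with the PREPARE part); otherwise fall
        // back to round-robin.  The real patch bounds this history by
        // "generations" of scheduling rather than keeping it forever.
        size_t schedule_xa_complete(const std::string &xid) {
          for (auto it = recent_xids_.rbegin(); it != recent_xids_.rend(); ++it)
            if (it->xid == xid)
              return it->worker;
          return next_++ % n_workers_;
        }

      private:
        size_t n_workers_;
        size_t next_ = 0;
        std::vector<XidEntry> recent_xids_;
      };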

          Activity

            Elkin Andrei Elkin added a comment -

            Set myself to review Kristian's branch.

            Elkin Andrei Elkin added a comment - - edited

            monty, serg, knielsen: FYI.

            knielsen Kristian Nielsen added a comment -

            Pushed a small fix to bb-10.6-MDEV-33668, taking the same fix already used in Andrei's rev0 and rev1 patches.

            There was a wait_for_prior_commit() in XA_prepare_log_event::do_commit() before doing the actual XA PREPARE. This would prevent all XA PREPARE events from group-committing with any prior transactions, which would hurt performance of parallel replication of workloads with a significant fraction of XA transactions.
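
            For illustration only (a toy model with made-up names, not server code): if every XA PREPARE must first wait for all prior transactions to commit, it can never join the commit group they form, so each one ends up leading a group of its own.

            #include <cstddef>
            #include <cstdio>
            #include <vector>

            struct Trx { bool waits_for_prior_commit; };

            // Count how many commit groups a FIFO of ready transactions collapses
            // into.  A transaction that must wait for all prior commits cannot join
            // the group being formed; it can only lead a new group afterwards.
            static int count_commit_groups(const std::vector<Trx> &fifo) {
              int groups = 0;
              std::size_t i = 0;
              while (i < fifo.size()) {
                groups++;                              // fifo[i] leads a new group
                i++;
                while (i < fifo.size() && !fifo[i].waits_for_prior_commit)
                  i++;                                 // followers join the same group
              }
              return groups;
            }

            int main() {
              // Ten transactions, every second one an XA PREPARE.
              std::vector<Trx> with_wait, without_wait;
              for (int i = 0; i < 10; i++) {
                bool is_xa_prepare = (i % 2 == 1);
                with_wait.push_back({is_xa_prepare});  // blanket wait before XA PREPARE
                without_wait.push_back({false});       // with the fix, no blanket wait
              }
              std::printf("groups with wait:    %d\n", count_commit_groups(with_wait));
              std::printf("groups without wait: %d\n", count_commit_groups(without_wait));
              return 0;
            }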


            Roel Roel Van de Paar added a comment -

            Restarted testing on the revised bb-10.6-MDEV-33668.


            Roel Roel Van de Paar added a comment -

            Andrei's ver0 work (ref) includes a lot of testcase work. Will this need to be added to this patch also, and will it be the same?


            Roel Roel Van de Paar added a comment -

            These tests (rpl_xa_prepare_gtid_fail, rpl_xa_concurrent_2pc, rpl_xa_empty_transaction, rpl_parallel_xa_same_xid and binlog_xa_prepared_bugs) have proven stable in bb-10.6-MDEV-33668 when copied (including related files) from ver0.

            Elkin Andrei Elkin added a comment -

            Roel, good point! I'll consider their relevance.

            Elkin Andrei Elkin added a comment - - edited

            roel, salve! I'm going to push fixes to stabilize rpl_xa_prepare_gtid_fail separately. The other 31949 branches' changes are not relevant in the current context. E.g. the rpl_parallel_xa_same_xid changes in origin/bb-10.6-MDEV-31949_ver0_opt are only for "patient" retrying of duplicate xid:s, while in this ticket's branch not even a single retry can occur for this reason (just as in the base 10.6).

            Otoh, a new rpl_parallel_multi_domain_xa test is added to the branch with the purpose of proving
            parallel applying across multiple gtid domains.
            Could you please look into it, as we need this sort of setup tested rather aggressively as well?
            I've not yet heard from axel about benchmarking results, but it all looks like this branch will be pushed as the solution to the slowness issue.

            Cheers!

            Elkin Andrei Elkin added a comment -

            Howdy knielsen. I've made a few notes on your branch and followed up with a few small commits. It's a great piece of work, and I've grown convinced we should go along with it rather than with MDEV-33667 as the "hot fix" for the slowness.
            While we're waiting for Axel's benchmarking as a blessing (I have no doubts), could you please review the current total branch on your own? I only had some concern about the maybe_active_xid... array size and the scan search in it. The size is kept under control, and to my understanding the scan can't be a real concern while the number of workers is not big.


            knielsen Kristian Nielsen added a comment -

            Elkin, I looked over the complete patch series; looks good to me.
            I squashed some commits to simplify the commit history and force-pushed the bb-10.6-MDEV-33668 branch. I kept the preparatory/refactoring patch separate; I think it's nice to keep the difference between the refactoring and the change of logic in XA scheduling.

            Agree with your comments about the size of maybe_active_xid. There will be some cost to the scanning of the array to search for XID. This could be improved using a hash table; also a more careful accounting of history could reduce the size of generations. But I'm not sure it will be a problem in practice, so maybe just go with the simple approach for now. Even if the size of the array would be significant, this only affects the SQL driver thread, and it seems unlikely that the impact would be severe enough to make it a bottleneck.

            Let me know if there is something else I should do on this MDEV at this point.

            - Kristian.

            Elkin Andrei Elkin added a comment - - edited

            monty (knielsen), there's now a clearer picture of the performance, measured by axel. I am pasting the comparison table for the few branches investigated.
            The optimistic parallel mode slave is run with --log-bin and --log-slave-updates.

            #thd     A (tps)    B (tps)    C (tps)    D (tps)
            1        2150       2092       2109       2217
            2        2226       2395       2503       2845
            4        2572       3434       3380       3851
            8        3024       5180       4419       5560
            16       3649       7801       5746       8171
            32       4358       10628      7852       11011
            64       5107       15094      9323       14565

            A: mariadb-10.6-567c0973591, branch 10.6
            B: mariadb-10.6-093efc509b1, branch bb-10.6-MDEV-31949_ver0_opt
            C: mariadb-10.6-0dc13677f3d, branch bb-10.6-MDEV-33668
            D: mariadb-10.6-1bdc9bc8077, branch bb-10.6-MDEV-31949

            (bb-10.6-MDEV-31949_ver0_opt should've been renamed to bb-10.6-MDEV-33667.)
            As you can see, with 4 workers bb-10.6-MDEV-33668 is better than 10.6 by about 30% (3380 vs 2572 tps),
            and with 8 workers by about 50% (4419 vs 3024 tps). This is somewhat worse than my "capsule" benchmarking showed.


            Roel Roel Van de Paar added a comment -

            Testing has been restarted on the bb-10.6-MDEV-33668 branch as updated yesterday.

            bnestere Brandon Nesterenko added a comment - - edited

            Hi knielsen, monty!

            I looked over the patch, and have the following thoughts. 2-4 are quite minor (just for consideration), though I think it would be good to address the first point (and it wouldn't impact Roel's testing).

            1) The test doesn't cover a pretty big aspect of the patch, generations. It'd be good to wrap that main loop with another loop that separates out XAP from XAC/XAR by <slave_parallel_workers>*<index_of_new_loop> transactions for 0-4 generations.
            2) The granularity of generations seems like it can overzealously fix the scheduling for transactions in the last vulnerable generation - can we infer it based on sub_id? I imagine that may allow a little more parallelism.
            3) The freeze_size(&maybe_active_xid) condition can be done after the whole event group has been queued, so we don't impede "real work to be done" with maintenance work.
            4) Though I doubt comparing each xid for just three generations would be too much overhead, I wonder if a sliding bloom filter could be used for a quick initial xid existence check, before attempting to compare each xid. I don't necessarily think we should incorporate this now (after all, the first rule of code optimization is: don't optimize your code), but am just leaving it as a future thought (a rough sketch follows this list).
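
            A rough sketch of the sliding-filter thought in point 4 (hypothetical names, illustration only, not a patch proposal): keep two small Bloom filters, insert XIDs into the current one, rotate them periodically, and only fall back to the exact per-generation XID comparison when the filter answers "maybe".

            #include <bitset>
            #include <cstddef>
            #include <functional>
            #include <string>

            class SlidingXidFilter {
            public:
              void insert(const std::string &xid) {
                current_.set(h1(xid) % kBits);
                current_.set(h2(xid) % kBits);
                if (++inserts_since_rotate_ >= kRotateEvery) {
                  previous_ = current_;          // keep one window of history
                  current_.reset();
                  inserts_since_rotate_ = 0;
                }
              }

              // false => XID definitely not seen in the two most recent windows, so
              // the exact comparison can be skipped; true => maybe seen, so do the
              // exact per-generation comparison as before.
              bool maybe_contains(const std::string &xid) const {
                std::size_t a = h1(xid) % kBits, b = h2(xid) % kBits;
                return (current_.test(a) && current_.test(b)) ||
                       (previous_.test(a) && previous_.test(b));
              }

            private:
              static std::size_t h1(const std::string &s) {
                return std::hash<std::string>{}(s);
              }
              static std::size_t h2(const std::string &s) {
                return h1(s) * 0x9e3779b97f4a7c15ULL + 0x7f4a7c15ULL;
              }
              static constexpr std::size_t kBits = 4096;
              static constexpr std::size_t kRotateEvery = 512;
              std::bitset<kBits> current_, previous_;
              std::size_t inserts_since_rotate_ = 0;
            };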


            knielsen Kristian Nielsen added a comment -

            Agree that a test for generations would be good.

            The use of generations in the patch was a minimal way of ensuring
            correctness in the (presumably rare) case of duplicate XID between
            transactions. The granularity of generations is calculated in a very
            conservative manner, and generations can be quite a lot longer than they
            need to be. This does not impact parallelism if XIDs are all distinct. But it
            does lead to a longer linear search for the XID inside the SQL driver
            thread. Which will have little impact as long as the SQL driver thread is
            not a bottleneck.

            As I understand, this was found not to be a problem in practice in
            performance testing. But if/when we want to improve it, the obvious idea
            would be to have a hash table mapping XID to the sched_bucket it was last
            scheduled to. This would eliminate the linear search length that depends on
            the estimated size of generations.
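
            For concreteness, a rough sketch of that idea (hypothetical types, not taken from any branch): a hash table keyed on the XID, remembering the bucket and sub_id of the last scheduling, in place of the linear scan.

            #include <cstdint>
            #include <string>
            #include <unordered_map>

            struct last_xid_sched { uint32_t bucket_idx; uint64_t sub_id; };

            class XidScheduleMap {
            public:
              // Remember where, and at which sub_id, an XID was last scheduled.
              void record(const std::string &xid, uint32_t bucket_idx, uint64_t sub_id) {
                map_[xid] = {bucket_idx, sub_id};
              }

              // O(1) lookup instead of scanning the recent-XID array; returns
              // nullptr when the XID has no remembered (possibly still active)
              // earlier scheduling.
              const last_xid_sched *find(const std::string &xid) const {
                auto it = map_.find(xid);
                return it == map_.end() ? nullptr : &it->second;
              }

              // Entries whose sub_id is older than the still-vulnerable window can
              // be dropped, keeping the table bounded.
              void expire_before(uint64_t oldest_live_sub_id) {
                for (auto it = map_.begin(); it != map_.end(); ) {
                  if (it->second.sub_id < oldest_live_sub_id)
                    it = map_.erase(it);
                  else
                    ++it;
                }
              }

            private:
              std::unordered_map<std::string, last_xid_sched> map_;
            };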

            In pathological cases it can be necessary to keep track of an XID for
            arbitrarily long. In principle, we could schedule thousands of event groups
            between workers W1 and W2 due to XA dependencies, and then the next event
            group scheduled on W3 could in principle run in parallel with any or all of
            them. So I don't think sub_id by itself is enough.

            Some improvement is, though, possible using sub_id. For example, suppose we
            schedule event groups E1 and E10 (say) to W1, and we scheduled E15 to W2,
            where 1, 10, 15 are the sub_ids for the event groups. Then we know that E15
            will wait for E10 to commit (because 15>10, sub_id comparison). So then it's
            safe to schedule the next event group to W2, even if it has the same XA XID
            as E1.
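
            Restating that example as a tiny check (illustration only, hypothetical names): commits happen in sub_id order, so a strictly larger sub_id is guaranteed to commit after a smaller one.

            #include <cstdint>
            #include <cstdio>

            // In-order commit: the group with the larger sub_id commits after the
            // group with the smaller sub_id.
            static bool commits_after(uint64_t later_sub_id, uint64_t earlier_sub_id) {
              return later_sub_id > earlier_sub_id;
            }

            int main() {
              const uint64_t E1 = 1, E10 = 10, E15 = 15;  // sub_ids from the example
              // E1 and E10 were scheduled to W1, E15 to W2.  E10 commits after E1
              // (same worker, and 10 > 1) and E15 commits after E10 (15 > 10), so
              // anything scheduled to W2 behind E15 runs only after the old
              // occurrence of the XID (E1) has fully committed.
              if (commits_after(E10, E1) && commits_after(E15, E10))
                std::printf("safe to schedule the next group with E1's XID to W2\n");
              return 0;
            }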

            So in summary, the generations are just a simple and low-cost way to
            conservatively bound the time for which an XID needs to be remembered. And a
            hash table of the XIDs could be used to eliminate the linear search through
            the XID history, if required.

            I'm not sure about point (3), maybe I did not catch your meaning. This all
            runs in the SQL driver thread, and freeze_size() is just a realloc(), I
            don't understand how moving this could significantly affect "real work to be
            done"?


            bnestere Brandon Nesterenko added a comment -

            To clarify point (3), I only mean it can be moved to a point after we've queued all events of the current transaction, so the resizing can be done (by the SQL thread) in parallel with the worker thread executing the transaction. Though I wouldn't classify it as significant, just for thought.

            Agreed that sub_id isn't enough alone to replace the idea of a generation. My thought seemed to parallel the example you gave of using sub_id as an improvement. That is, to calculate a relative generation using sub_id (while still maintaining an XID list for active generations), then I imagine we wouldn't need to store the XIDs of the third/last generation (which seems to be in-line with your example). Just wanted to define it to be sure we're on the same page. Again, not something I'd say that is significant.

            Thanks for the in-depth elaboration!

            Roel Roel Van de Paar added a comment - - edited

            As promised, a detailed testing status update.

            * I am currently focusing on debugging these issues seen during testing thus far:

            1. [ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 0-1-20, Internal MariaDB error code: 1964 (and variations thereof)
            2. [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'could not find next log; the first event '.' at 4, the last event read from 'binlog.000009' at 2595, the last byte read from 'binlog.000009' at 2702.', Internal MariaDB error code: 1236 (and variations thereof)
            3. [ERROR] Slave SQL: Error executing row event: 'You can't combine write-locking of system tables with other tables or lock types', Gtid 0-15-143, Internal MariaDB error code: 1428 (and variations thereof)
            4. [Warning] Slave: Got error 140 "Wrong create options" from storage engine InnoDB Error_code: 1030 (w/ slave halt). This one may be XA related.
            5. (Slave Crash) SIGSEGV|dict_table_t::is_active_ddl|trx_t::rollback_low|trx_t::rollback|trx_rollback_last_sql_stat_for_mysql
            6. (Slave Assertion) thd->transaction->stmt.is_empty() || thd->in_sub_stmt || (thd->state_flags & Open_tables_state::BACKUPS_AVAIL)|SIGABRT|close_thread_tables|mysql_execute_command|mysql_parse|Query_log_event::do_apply_event
            7. Other crashes/assertions which may or may not be related

            * After that, or concurrently where feasible, the plan is to do the following tests:

            8. Stress test related XA replication MTR tests (and verify any failures against BASE as needed)
            9. Run the MENT-1905 XA lua scripts in various configurations. This is important as these scripts have repeatedly shown various issues in this area.

            My total overall estimate, keeping in mind my availability (4 days/week), remains the 17th of April. If it is possible to finish testing before then, it will be done.

            Andrei also mentioned that multiple GTID domains are very important to check in connection with this ticket; he indicated this is one area where the fixes may play out in an unexpected way. For this, I propose knielsen or Elkin add an MTR test.

            Roel Roel Van de Paar added a comment - - edited

            Also of some interest is the assertion

            !look_for_rollover || start_scan_slot != slot|SIGABRT|trx_assign_rseg_low|trx_start_low|trx_start_if_not_started_xa_low|create_table_info_t::create_foreign_keys
            

            Though it is seen in base runs as well.


            monty Michael Widenius added a comment -

            Roel, can you please create sub-tasks for all of the above listed errors (including how to repeat them / what you did to get them) so that we get them fixed?


            Roel Roel Van de Paar added a comment -

            Yes; all the issues above - as well as various others seen in trunk/base during testing - will be debugged and logged, provided they prove to be bugs.

            In case it was not clear, the list of issues 1-6 above were only seen in the patch tree, not in trunk/base.


            knielsen Kristian Nielsen added a comment -

            Hi bnestere,

            I pushed a couple patches to branch knielsen_xa_sched_minimal_fix that implement
            what I described in my previous comment.

            One is a testcase for generations, as you also mentioned. It runs a lot of
            random XA transactions with different distances between XA PREPARE and XA
            COMMIT, and with some duplicate XIDs. This test was successful at failing
            when I deliberately introduced a couple bugs in the generation accounting.

            Unfortunately that test fails sporadically, both with my patches and without
            (vanilla 10.6). I think perhaps this is because the slave's GTID position is
            updated too early, so there is a race where MASTER_GTID_WAIT() has returned
            successfully but the corresponding transaction is not yet visible in the
            InnoDB table. Something for Andrei to look into perhaps?

            The second is a small patch that re-writes the computation for when a new
            generation starts. This seems much better than the original code, both
            simpler and more precise. It just uses the FIFO-order of scheduling to see
            when the last bucket of the previous generation has been scheduled for the
            current generation, instead of the overly clever index manipulation that
            significantly over-estimates the time before a new generation is reached.
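
            A rough sketch of how that FIFO trick can be read (hypothetical names, simplified, not the branch code): remember the bucket that sits at the tail of the scheduling FIFO when a generation starts; once that bucket is scheduled again, a full generation has gone by.

            #include <cstdint>
            #include <cstdio>
            #include <deque>

            class GenerationTracker {
            public:
              explicit GenerationTracker(uint32_t n_buckets) {
                for (uint32_t i = 0; i < n_buckets; i++)
                  fifo_.push_back(i);
                boundary_bucket_ = fifo_.back();    // tail when the generation starts
              }

              // Schedule a bucket (head of the FIFO for round-robin, or a specific
              // one for XID affinity): move it to the tail, and bump the generation
              // when the remembered boundary bucket comes around.
              void schedule_bucket(uint32_t b) {
                for (auto it = fifo_.begin(); it != fifo_.end(); ++it)
                  if (*it == b) { fifo_.erase(it); break; }
                fifo_.push_back(b);
                if (b == boundary_bucket_) {
                  generation_++;
                  boundary_bucket_ = fifo_.back();  // new tail starts the next one
                }
              }

              uint32_t round_robin_pick() const { return fifo_.front(); }
              uint64_t generation() const { return generation_; }

            private:
              std::deque<uint32_t> fifo_;
              uint32_t boundary_bucket_;
              uint64_t generation_ = 0;
            };

            int main() {
              GenerationTracker t(4);
              for (int i = 0; i < 12; i++) {
                uint32_t b = t.round_robin_pick();
                t.schedule_bucket(b);
                std::printf("group %2d -> bucket %u (generation %llu)\n",
                            i, b, (unsigned long long) t.generation());
              }
              return 0;
            }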

            The third patch is more complex. It replaces the linear array scan of
            recently scheduled XIDs with a single hash table lookup. So it avoids the
            potentially costly scan in the original patch if the workload has a high
            density of XA transactions and the number of worker threads is large. It
            also implements the refinement using sub_id to avoid XID dependencies in
            some cases where they would otherwise be implied by generation
            considerations alone.

            I don't necessarily suggest to include these new patches in the upcoming
            release or custom build. They are not critical, the functionality should be
            the same with or without. I just wanted to show how I envisioned the minimal
            patch's accounting of dependencies and generations could look with a more
            complex and full implementation.

            - Kristian.

            Elkin Andrei Elkin added a comment -

            A few reviews were done. Two commits were pushed:
            89c907bd4f7..d90a2b44add HEAD -> 10.6

            Roel Roel Van de Paar added a comment - - edited

            Elkin What will happen with the additional patches that bnestere and knielsen discussed/proposed? Also, where does this leave MDEV-31949/ MENT-1905?

            Elkin Andrei Elkin added a comment -

            Roel, that code is currently in Kristian's branch. I thought the best place for it would be 11.5, as a polished version of this bug's fixes.
            MDEV-31949 keeps aiming at 11.6.


            Roel Roel Van de Paar added a comment -

            Elkin Ack, thank you. Should a new ticket be created for those patches? As they will require testing, I think we should look at 11.6 for those as well.
            Also, is this patch (MDEV-33668 including the additional patches) fully compatible with MDEV-31949?

            Elkin Andrei Elkin added a comment -

            Roel, a new ticket, of course. I'll report one once I have fully understood its merge with MDEV-31949. The latter is different on the slave side (maybe the merge with this solution will lessen that), and 31949 also refines the master side because of recovery (MDEV-21777, MDEV-32830).

            Roel Roel Van de Paar added a comment - - edited

            From the list above (all bugs reproduced in BASE as well, except #6 for the moment):

            1. [ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 0-1-20, Internal MariaDB error code: 1964 (and variations thereof)
            > This is now MDEV-34010

            2. [ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'could not find next log; the first event '.' at 4, the last event read from 'binlog.000009' at 2595, the last byte read from 'binlog.000009' at 2702.', Internal MariaDB error code: 1236 (and variations thereof)
            > Filtered after analysis

            3. [ERROR] Slave SQL: Error executing row event: 'You can't combine write-locking of system tables with other tables or lock types', Gtid 0-15-143, Internal MariaDB error code: 1428 (and variations thereof)
            > This is now MDEV-34011

            4. [Warning] Slave: Got error 140 "Wrong create options" from storage engine InnoDB Error_code: 1030 (w/ slave halt). This one may be XA related.
            > This is now MDEV-33961

            5. (Slave Crash) SIGSEGV|dict_table_t::is_active_ddl|trx_t::rollback_low|trx_t::rollback|trx_rollback_last_sql_stat_for_mysql
            > This is now MDEV-34051

            6. (Slave Assertion) thd->transaction->stmt.is_empty() || thd->in_sub_stmt || (thd->state_flags & Open_tables_state::BACKUPS_AVAIL)|SIGABRT|close_thread_tables|mysql_execute_command|mysql_parse|Query_log_event::do_apply_event
            > Ref MDEV-32372

            7. !look_for_rollover || start_scan_slot != slot|SIGABRT|trx_assign_rseg_low|trx_start_low|trx_start_if_not_started_xa_low|create_table_info_t::create_foreign_keys and
            !look_for_rollover || start_scan_slot != slot|SIGABRT|trx_assign_rseg_low|trx_start_low|trx_start_internal_low|dict_stats_fetch_from_ps
            > This is now MDEV-33917

            Work on the above bugs is now complete.

