  MariaDB Server / MDEV-7340

[PATCH] parallel replication status variables

Details

    Description

      From pivanof on the mailing list, in reference to parallel replication:

      > some status variables which could be plotted over time and show (or at
      > least hint at) whether this is a significant bottleneck for performance
      > or not.

      > This could be something like total time (in both wall time and
      > accumulated CPU time) spent executing transactions in parallel, time
      > spent rolling back transactions due to this lock conflict, time spent
      > rolling back transactions because of other reasons (e.g. due to STOP
      > SLAVE or reconnect after master crash), maybe also time spent waiting
      > in one parallel thread while a transaction is executing in another
      > thread, etc.


          Activity

            danblack Daniel Black added a comment -

            From MDEV-7396: max consecutive parallel deadlocks is probably a useful statistic.

            danblack Daniel Black added a comment -

            Any more suggestions here? I may get started on this soon. Note them here and they might be included.

            danblack Daniel Black added a comment -

            slave-domain-parallel-threads reached - number of times this limit is reached? (per domain?)

            microsecond_interval_timer is wall clock, right? Any tips on CPU time functions?
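
            (For reference: a minimal sketch of a per-thread CPU-time reader using
            POSIX clock_gettime(); the helper name thread_cpu_microseconds is
            hypothetical, not something in the MariaDB source.)

              #include <stdint.h>
              #include <time.h>

              /* CPU time consumed by the calling thread, in microseconds, as
                 opposed to the wall-clock style interval returned by
                 microsecond_interval_timer(). */
              static uint64_t thread_cpu_microseconds(void)
              {
                struct timespec ts;
                if (clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts) != 0)
                  return 0;                     /* clock not supported here */
                return (uint64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
              }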


            jgagne Jean-François Gagné added a comment -

            I would be interested in slave_parallel_workers_reached: number of times all workers were active executing a transaction (if this happens too often, slave_parallel_workers might be increased).

            I would also be interested in slave_parallel_workers_waiting: number of times a worker waited for a previous worker to complete (if this happens too often, slave_parallel_workers might be decreased).

            I would also be interested in relaylog_group_commits: count of the group commits received from masters (in combination with binlog_group_commits, this allows monitoring the parallelism gained or lost with log-slave-updates: see http://blog.booking.com/better_parallel_replication_for_mysql.html for more details).
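
            (Illustrative arithmetic: together with a matching total-commit counter, such as the relay_commits suggested in a later comment, this allows computing the average group size. If the slave received 10000 transactions in 2000 relay log group commits, the average group is 10000 / 2000 = 5 transactions, i.e. roughly 5 transactions at a time are candidates for parallel apply.)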

            More ideas to come later.

            Thanks for improving monitoring and tuning-ability.

            danblack Daniel Black added a comment -

            If a relay_commits counter is possible at the same time as relay_group_commits, we can get an approximate graph of active threads over time. Is slave_parallel_workers_reached still useful then?

            s/_worker/_thread/g

            Not sure slave_parallel_workers_waiting helps; all but one thread will be waiting at some point. Please correct me if I'm missing something. An indicator of relay log events waiting in a group but held back by the slave_parallel_threads limit might be useful (parallel_in_order_group_thread_exhaustion?). On thread utilisation, the original busy vs wait time vs rollback breakdown probably covers that well enough.

            Further down the track: a breakdown of which slave_parallel_mode decision each transaction reaches (and how many per relevant category are rolled back).

            And then there is monitoring the out-of-order commits from slave_domain_parallel_threads and the limit it imposes; however, I'm thinking of presenting an information_schema table for that, per domain, if possible.


            knielsen Kristian Nielsen added a comment -

            I made a patch that adds some status variables, measuring time spent by
            worker threads being idle, processing events, and waiting for other
            transactions:

            http://lists.askmonty.org/pipermail/commits/2015-July/008126.html

            More details in the commit message in the link. This is not necessarily
            meant to be the final form (or any form) of this MDEV-7340, but it might be
            interesting, at least. Testing of the patch welcome.
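
            (A simplified sketch of the phase-accounting pattern the patch appears
            to use, based on the statistic_add() call quoted in a later comment;
            the counter and struct names here are illustrative, not the patch's
            actual names.)

              #include <stdint.h>

              extern uint64_t microsecond_interval_timer();

              static uint64_t worker_idle_time;     /* accumulated per phase */
              static uint64_t worker_busy_time;

              /* Each worker remembers which counter the current phase charges
                 to and when that phase started; on every phase change the
                 elapsed time is added to the finished phase's counter. */
              struct worker_phase_timer
              {
                uint64_t *current_counter;
                uint64_t phase_start;

                void switch_phase(uint64_t *new_counter)
                {
                  uint64_t now= microsecond_interval_timer();
                  *current_counter+= now - phase_start; /* charge old phase */
                  current_counter= new_counter;
                  phase_start= now;
                }
              };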

            danblack Daniel Black added a comment -

            Nice, thank you. I was looking to see if anything other than status_lock could be used but, like you, didn't see an easy approach. The status vars look good.

            knielsen Kristian Nielsen added a comment - edited

            Do you mean LOCK_status?

              statistic_add(*current_status_var,
                            new_time - slave_worker_phase_start_time, &LOCK_status);

            If my understanding is correct, this statistic_add compiles into an atomic
            add operation on platforms of interest. The lock is not actually used,
            except on some weird platform that lacks atomic operations.

            EDIT: Actually, it appears that neither lock nor atomic operations are
            used. There is a SAFE_STATISTICS define that causes them to be used, but it
            is never enabled. So there is no locking apparently, and the statistics can
            at least theoretically be off, trading 100% accuracy for improved
            performance.
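
            (For context, a paraphrase of the statistic_add() macro family from
            include/my_sys.h; the exact text may differ between versions.)

              #ifdef SAFE_STATISTICS
              #define statistic_add(V,C,L) thread_safe_add((V),(C),(L))
              #else
              /* Plain non-atomic add: fast, but concurrent writers can lose
                 updates, hence the "theoretically off" statistics above. */
              #define statistic_add(V,C,L) (V)+= (C)
              #endif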

            danblack Daniel Black added a comment -

            Thanks for looking this up.

            I'm quite happy with this tradeoff.

            If you're happy with its final form, any chance of a backport? It applies to 10.0 with minimal fuzz.

            danblack Daniel Black added a comment -

            Any chance of a port to 10.0? I don't particularly care if it changes slightly in the future. Something is better than nothing.

            danblack Daniel Black added a comment -

            Amazingly this patch still applies (with fuzz level 3) to 10.1 head. I didn't test its current correctness, however.


            People

              knielsen Kristian Nielsen
              danblack Daniel Black
              Votes: 1
              Watchers: 4
