[MDEV-20220] Merge 5.7 P_S replication table 'replication_applier_status_by_worker' Created: 2019-07-30 Updated: 2021-04-23 Resolved: 2021-04-08 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication, Variables |
| Fix Version/s: | 10.6.0 |
| Type: | Task | Priority: | Critical |
| Reporter: | Geoff Montee (Inactive) | Assignee: | Sujatha Sivakumar (Inactive) |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None |
| Description |
|
On a replication slave that has parallel replication enabled, if slave_parallel_max_queued is set too small, then the SQL thread's queue can sometimes be too small to assign work to all of the slave worker threads. When this happens, the SQL thread's state will be "Waiting for room in worker thread event queue", and the idle worker thread's state will be "Waiting for work from SQL thread".
I think it would be useful to add a status variable that gets incremented whenever a slave parallel worker thread is idle because the SQL thread's queue is full. |
| Comments |
| Comment by Sujatha Sivakumar (Inactive) [ 2020-05-05 ] |
|
Hello Andrei,
Please review the fix: https://github.com/MariaDB/server/commit/92fbb4e911d700576d9b894637a476b18324f1ed
Added a new status variable 'Slave_idle_parallel_worker_count'.
On master:
On slave: One of the workers will execute the above DDL and
The counter is cleared during START SLAVE.
|
| Comment by Sujatha Sivakumar (Inactive) [ 2020-06-03 ] |
|
Hello GeoffMontee,
A draft patch has been implemented as per the task description. Would it be helpful if the status variable displays the total
Please let us know. |
| Comment by Andrei Elkin [ 2020-06-08 ] |
|
I would go with the most "lazy" one of FLUSH STATUS. |
| Comment by Geoff Montee (Inactive) [ 2020-06-16 ] |
|
Hi sujatha.sivakumar and Elkin,
This raises a question for me. Let's say that you have a slave with 4 parallel worker threads. And let's say that during a 1 second window, 3 of those worker threads were idle because their queues were full. What would "the total count of idle time spent by workers" be in this case? Would it be "1 second", because all 3 worker threads were idle for the same 1 second time period? Or would it be "3 seconds", because there were 3 worker threads, and each one was idle for 1 second? Or would it be something else? Thanks! |
| Comment by Andrei Elkin [ 2020-06-16 ] |
|
GeoffMontee,
"And let's say that during a 1 second window, 3 of those worker threads were idle because their queues were full." That means 3 are 'busy', does it not? I am guessing that is what you meant, since an 'idle' worker is one with an empty queue. In that case we count the total idleness: if all N workers have nothing to do within time t, then the idle period is N*t. |
| Comment by Geoff Montee (Inactive) [ 2020-06-16 ] |
|
Hi Elkin,
No, I meant what I said for the example--that 3 worker threads are idle because their queues do not have room, and only 1 worker thread is performing work. I am specifically referring to cases like the one shown in the ticket description, where the worker thread has the state "Waiting for work from SQL thread", and the SQL thread has the state "Waiting for room in worker thread event queue":
I don't completely understand why this happens. It seems somewhat contradictory to me. If the worker thread isn't performing work, then shouldn't the worker thread's queue have room? I assume that this happens because of some internal implementation details. Do you know?
OK, I see. So if 3 worker threads are idle for the same 1 second period, then the status variable would say that the total idle time is 3 seconds. Thanks! |
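The accounting agreed on in this exchange (N workers idle over the same period t contribute N*t of total idle time) can be illustrated with a small model. This is only a sketch in Python, not MariaDB code: the function names `total_idle_time` and `wall_clock_idle_time`, the worker labels, and the interval representation are all assumptions made for illustration. The second function computes the alternative "1 second" reading from the question, for contrast.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # hypothetical (start, end) of an idle period, in seconds


def total_idle_time(idle_intervals: Dict[str, List[Interval]]) -> float:
    """Sum idle time per worker; overlapping periods on different workers all count."""
    total = 0.0
    for intervals in idle_intervals.values():
        for start, end in intervals:
            if end < start:
                raise ValueError("interval end precedes start")
            total += end - start
    return total


def wall_clock_idle_time(idle_intervals: Dict[str, List[Interval]]) -> float:
    """Length of the union of all idle periods (the alternative '1 second' reading)."""
    merged_total = 0.0
    current_start = current_end = None
    for start, end in sorted(iv for ivs in idle_intervals.values() for iv in ivs):
        if current_end is None or start > current_end:
            # Close the previous merged run, if any, and start a new one.
            if current_end is not None:
                merged_total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlaps the current run: extend it.
            current_end = max(current_end, end)
    if current_end is not None:
        merged_total += current_end - current_start
    return merged_total


# The hypothetical scenario from the thread: 4 workers, 3 idle for the same second.
example = {
    "worker_1": [(0.0, 1.0)],
    "worker_2": [(0.0, 1.0)],
    "worker_3": [(0.0, 1.0)],
    "worker_4": [],  # busy for the whole window
}

print(total_idle_time(example))       # 3.0, the N*t accounting
print(wall_clock_idle_time(example))  # 1.0, the union of idle periods
```

Under the N*t rule the three overlapping one-second idle periods add up to 3 seconds, whereas the union of the periods would report only 1 second.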
| Comment by Andrei Elkin [ 2020-06-16 ] |
Well, they are not really idle, unless
The other form of idling is by the Driver thread
In your paste the idler is really one Worker.
|
| Comment by Geoff Montee (Inactive) [ 2020-06-16 ] |
|
Hi Elkin,
Yes, I understand that.
Yes, I understand that. I do not have example text showing "3 idle worker threads, 1 active worker thread". That was a hypothetical example that I came up with to ask you how your new status variable would behave in that scenario. |
| Comment by Sujatha Sivakumar (Inactive) [ 2020-08-17 ] |
|
Hello serg, GeoffMontee and Elkin,
As part of the current task, can we add a new `performance schema` table similar to upstream? performance_schema.replication_applier_status_by_worker
|
| Comment by Geoff Montee (Inactive) [ 2020-08-17 ] |
|
That sounds good to me. |
| Comment by Sujatha Sivakumar (Inactive) [ 2020-09-23 ] |
|
Hello serg,
Can you please review the changes?
Patch: https://github.com/MariaDB/server/commit/0922edb1b162c5aa73c11867157e0e118f79bbc6 |
| Comment by Sujatha Sivakumar (Inactive) [ 2020-11-05 ] |
|
Hello serg,
Since the fix version is 10.6, I ported my 10.5 changes to 10.6 and tested them. Please find the 10.6 patch: https://github.com/MariaDB/server/commit/1d5f0db0158c5a4c2e82abd3b881de46a719412d
This patch is for the 'replication_applier_status_by_worker' table alone. The current patch will be used as the base of
Thank you. |
| Comment by Sujatha Sivakumar (Inactive) [ 2021-03-30 ] |
|
Hello Sergei,
Good Morning. I have addressed your review comments. Please review the new set of changes.
Patch: https://github.com/MariaDB/server/commit/9abbb589e6d6f61e03cad0fc5c5aee31077dd00c
BuildBot Results: http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.6-sujatha
Thank you. |