Kristian Nielsen added a comment - Ok, I (finally) got to look at this patch.
> Attached patch adds a total status for all threads for the
> Slave_parallel_eventqueue_size/Slave_parallel_eventqueue_freepending.
The patch exposes the loc_qev_size and qev_free_pending fields as status
variables. I don't really see how this is useful?
This is a completely internal detail of the memory management in parallel
replication. It is the size of an internal free list of buffers that each
thread keeps to handle event queueing efficiently. Its size does not seem
to mean much for how parallel replication is working?
> Rather/in addition to totals, would push a per thread status as
> slave_parallel_eventqueue_0_size be acceptable?
Do you mean here that there would be N status variables, one for each worker
thread? Maybe there are better places to expose per-thread statistics, like
performance schema or information_schema? But as I said, it seems to me that
these particular values will be more confusing than useful.
> I thought some additional status would be helpful.
> Anything else useful to capture/graph here?
I 100% agree that more monitoring of parallel replication is needed.
With respect to the size of event queues, the issue is that the code does not
update the queue size after every event execution, in order to reduce lock
contention. So the information is not easily available in the current code.
I'm trying to think of a way to get size of pending events without introducing
additional locking overhead.
The SQL driver thread takes LOCK_rpl_thread whenever an event is queued. And
the worker thread takes LOCK_parallel_entry whenever a new event group
(transaction) is started. So maybe we could do something while these locks are
held?
Under LOCK_parallel_entry, a worker thread could update a counter of size of
events processed but not yet freed (in class rpl_parallel_entry). And under
LOCK_rpl_thread, the SQL driver thread could increment per-thread size of
events queued. And somehow the status variable would combine these to obtain
the right value. But it sounds a bit too complicated... would be nice to come
up with a simpler idea to add good monitoring of parallel replication status
(not just queue size).
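To make the idea above concrete, here is a minimal hypothetical sketch (all names invented, using plain std::mutex in place of the server's LOCK_rpl_thread and LOCK_parallel_entry): the driver thread adds to a "queued bytes" counter under one lock, workers add to a "processed bytes" counter under the other, and the status read briefly takes both locks and subtracts. This is not the actual server code, just an illustration of the two-counter scheme.

```cpp
#include <cstdint>
#include <mutex>

struct EventQueueStats {
  std::mutex lock_rpl_thread;      // stands in for LOCK_rpl_thread
  std::mutex lock_parallel_entry;  // stands in for LOCK_parallel_entry
  uint64_t bytes_queued = 0;       // updated only by the SQL driver thread
  uint64_t bytes_processed = 0;    // updated only by worker threads

  // Driver thread: it already holds LOCK_rpl_thread when queueing an
  // event, so in the real code this increment would add no extra locking.
  void on_event_queued(uint64_t size) {
    std::lock_guard<std::mutex> g(lock_rpl_thread);
    bytes_queued += size;
  }

  // Worker thread: it already holds LOCK_parallel_entry when a new event
  // group starts, so the counter can be updated there.
  void on_events_processed(uint64_t size) {
    std::lock_guard<std::mutex> g(lock_parallel_entry);
    bytes_processed += size;
  }

  // Status-variable read: take both locks briefly and combine the two
  // counters into the current pending size.
  uint64_t pending_bytes() {
    std::scoped_lock g(lock_rpl_thread, lock_parallel_entry);
    return bytes_queued - bytes_processed;
  }
};
```

The cost on the hot paths is just one addition under a lock that is already held; only the (rare) status read takes both locks at once.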
In general, I'm unsure how to balance the need for more monitoring against the
overhead of locking/atomics needed to maintain such monitoring. There is
already significant locking overhead in parallel replication, and not much
benchmarking has been done to understand the significance of this overhead.