Kristian Nielsen added a comment - Ok, I (finally) got to look at this patch.
> Attached patch adds a total status for all threads for the
> Slave_parallel_eventqueue_size/Slave_parallel_eventqueue_freepending.
The patch exposes the loc_qev_size and qev_free_pending fields as status
variables. I don't really see how this is useful?
This is a completely internal detail of the memory management in parallel
replication. It is the size of an internal free list of buffers that each
thread keeps to handle event queueing efficiently. Its size does not seem
to mean much for how parallel replication is working?
> Rather/in addition to totals, would push a per thread status as
> slave_parallel_eventqueue_0_size be acceptable?
Do you mean here that there would be N status variables, one for each worker
thread? Maybe there are better places to expose per-thread statistics, like
performance schema or information_schema? But as I said, it seems to me that
these particular values will be more confusing than useful.
> I thought some additional status would be helpful.
> Anything else useful to capture/graph here?
I 100% agree that more monitoring of parallel replication is needed.
With respect to the size of event queues, the issue is that the code does not
update the queue size after every event execution, in order to reduce lock
contention. So the information is not easily available in the current code.
I'm trying to think of a way to get size of pending events without introducing
additional locking overhead.
The SQL driver thread takes LOCK_rpl_thread whenever an event is queued. And
the worker thread takes LOCK_parallel_entry whenever a new event group
(transaction) is started. So maybe we could do something while these locks are
held?
Under LOCK_parallel_entry, a worker thread could update a counter of size of
events processed but not yet freed (in class rpl_parallel_entry). And under
LOCK_rpl_thread, the SQL driver thread could increment per-thread size of
events queued. And somehow the status variable would combine these to obtain
the right value. But it sounds a bit too complicated... would be nice to come
up with a simpler idea to add good monitoring of parallel replication status
(not just queue size).
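To make the idea above concrete, here is a minimal hypothetical sketch (all names invented, using plain std::mutex in place of the server's LOCK_rpl_thread and LOCK_parallel_entry): the driver thread adds to a "queued bytes" counter under one lock, workers add to a "processed bytes" counter under the other, and the status read briefly takes both locks and subtracts. This is not the actual server code, just an illustration of the two-counter scheme.

```cpp
#include <cstdint>
#include <mutex>

struct EventQueueStats {
  std::mutex lock_rpl_thread;      // stands in for LOCK_rpl_thread
  std::mutex lock_parallel_entry;  // stands in for LOCK_parallel_entry
  uint64_t bytes_queued = 0;       // updated only by the SQL driver thread
  uint64_t bytes_processed = 0;    // updated only by worker threads

  // Driver thread: it already holds LOCK_rpl_thread when queueing an
  // event, so in the real code this increment would add no extra locking.
  void on_event_queued(uint64_t size) {
    std::lock_guard<std::mutex> g(lock_rpl_thread);
    bytes_queued += size;
  }

  // Worker thread: it already holds LOCK_parallel_entry when a new event
  // group starts, so the counter can be updated there.
  void on_events_processed(uint64_t size) {
    std::lock_guard<std::mutex> g(lock_parallel_entry);
    bytes_processed += size;
  }

  // Status-variable read: take both locks briefly and combine the two
  // counters into the current pending size.
  uint64_t pending_bytes() {
    std::scoped_lock g(lock_rpl_thread, lock_parallel_entry);
    return bytes_queued - bytes_processed;
  }
};
```

The cost on the hot paths is just one addition under a lock that is already held; only the (rare) status read takes both locks at once.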
In general, I'm unsure how to balance the need for more monitoring against the
overhead of locking/atomics needed to maintain such monitoring. There is
already significant locking overhead in parallel replication, and not much
benchmarking has been done to understand the significance of this overhead.