Description
| Id | User | Host | db | Command | Time | State | Info | Progress |
| 150 | system user | | NULL | Slave_IO | 66361 | Waiting for master to send event | NULL | 0.000 |
| 152 | system user | | NULL | Slave_worker | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 153 | system user | | NULL | Slave_worker | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 154 | system user | | NULL | Slave_worker | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 155 | system user | | NULL | Slave_worker | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 156 | system user | | NULL | Slave_worker | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 157 | system user | | NULL | Slave_worker | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 158 | system user | | NULL | Slave_worker | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 159 | system user | | NULL | Slave_worker | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 160 | system user | | NULL | Slave_worker | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 162 | system user | | NULL | Slave_worker | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 161 | system user | | NULL | Slave_worker | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 163 | system user | | NULL | Slave_worker | 0 | Write_rows_log_event::write_row(-1) | NULL | 0.000 |
| 151 | system user | | NULL | Slave_SQL | 1090 | Reading event from the relay log | NULL | 0.000 |
| 2415 | root | 127.0.0.1:42392 | NULL | Query | 835 | Killing slave | STOP SLAVE | 0.000 |
Slave machine configuration:
sync_master_info = 500000
sync_relay_log = 100000
sync_relay_log_info = 500000
slave_parallel_max_queued = 67108864
slave_parallel_mode = optimistic
slave_parallel_threads = 12
STOP SLAVE has been in the "Killing slave" state for 835 seconds. Is that normal?
Issue Links
- blocks MDEV-30458 Consolidate Serial Replica to Parallel Replica with 1 Worker Thread (Open)
Something missing from the discussion here: the main reason STOP SLAVE is slow in parallel replication is not that it doesn't roll back running transactions. The main problem is that in many cases parallel replication will first replicate all queued events (up to the @@slave_parallel_max_queued limit) before it stops.
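To get a feel for how much work can be queued with the configuration reported in the description, here is a rough back-of-the-envelope calculation. It assumes that slave_parallel_max_queued is a per-worker-thread limit in bytes (as the MariaDB documentation appears to describe it; treat this as an assumption) and that every worker's queue is full, which is a worst case rather than a measurement.

```python
# Values taken from the slave configuration in the description.
slave_parallel_max_queued = 67108864  # bytes
slave_parallel_threads = 12

# ASSUMPTION: the limit applies per worker thread, and all queues are full.
per_thread_mib = slave_parallel_max_queued / (1024 * 1024)
worst_case_total_mib = per_thread_mib * slave_parallel_threads

print(f"per worker: {per_thread_mib:.0f} MiB")
print(f"worst case total: {worst_case_total_mib:.0f} MiB")
```

Under those assumptions, a STOP SLAVE that waits for everything queued could have several hundred MiB of relay log events left to apply.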
I think this is a left-over from when only conservative mode existed. The current STOP SLAVE mechanism can be seen in do_gco_wait(): replication continues until the current GCO is completed (wait_count > entry->stop_count). But in optimistic mode a GCO can be very large, potentially covering all queued events, so the stop is delayed longer than needed.
I think a much simpler solution is to fix this directly: initialise stop_count to largest_started_sub_id and compare it against rgi->gtid_sub_id, so that event groups that have not yet started are skipped regardless of which GCO they belong to.
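The difference between the two stop conditions can be illustrated with a toy model. This is only a sketch of the idea, not the server code: EventGroup, make_queue and both applied_with_* functions are hypothetical names, the queue sizes are made up, and the real logic in do_gco_wait() involves locking and waiting that is ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EventGroup:
    sub_id: int  # stands in for rgi->gtid_sub_id (queue order)
    gco: int     # index of the GCO this event group belongs to


def make_queue(n_groups, gco_size):
    """Queue n_groups event groups, gco_size of them per GCO."""
    return [EventGroup(i, (i - 1) // gco_size) for i in range(1, n_groups + 1)]


def applied_with_gco_stop(queue, largest_started_sub_id):
    """Current rule as described: finish the whole GCO that is in progress."""
    current_gco = max(g.gco for g in queue if g.sub_id <= largest_started_sub_id)
    return [g.sub_id for g in queue if g.gco <= current_gco]


def applied_with_sub_id_stop(queue, largest_started_sub_id):
    """Proposed rule: only event groups that had already started still run."""
    return [g.sub_id for g in queue if g.sub_id <= largest_started_sub_id]


# Hypothetical workloads: 1000 queued event groups in both cases.
conservative = make_queue(1000, gco_size=10)    # many small GCOs
optimistic = make_queue(1000, gco_size=1000)    # one GCO spans the queue
started = 12  # e.g. one event group started per worker thread

for name, queue in (("conservative", conservative), ("optimistic", optimistic)):
    print(name,
          len(applied_with_gco_stop(queue, started)),
          len(applied_with_sub_id_stop(queue, started)))
```

In this model the GCO-based rule applies 20 event groups before stopping in the conservative case but all 1000 in the optimistic case, while the sub_id-based rule applies only the 12 that had already started in both.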
This will not roll back an existing long-running transaction, but I think that's actually good. Forcing an immediate stop will cause a massive rollback when many threads are configured (Jean Francois Gagné tested using > 1000 threads), which seems undesirable. And forcing a stop does not guarantee a fast stop anyway, since a long-running statement will not be aborted.