InnoDB in MySQL 5.6 (and MariaDB 10.0) introduced multiple inodb_purge_threads (4 by default).
When there are few modifications that require purging, the purge threads will either be mostly sitting idle, or they will form ‘thundering herds’ whenever some work turns up. In such a scenario, it would be better to have only one purge worker.
With this work, the poorly working parameter innodb_max_purge_lag should be revised or removed as well. The logic behind that parameter does not take into account the actual number of purgeable records; committed modifications cannot be purged while old read views exist that can access the old data.
Benchmarking is the key
This work must involve creating a benchmark to measure purge performance under various conditions:
Ensure that everything has been purged (history list length is 0).
Block purge by issuing START TRANSACTION WITH CONSISTENT SNAPSHOT from a new connection.
From other connections, perform modifications that create workload for purge. Try different variations: tables with many indexes, many small transactions on many tables, large transactions on a single table.
Commit or disconnect the connection that blocked purge.
Measure the time to reach History list length 0 in SHOW ENGINE INNODB STATUS. (Some workaround for the batching logic related to innodb_purge_rseg_truncate_frequency may be needed.)
Alternatively, measure the time to shutdown with innodb_fast_shutdown=0. But be sure to separate the impact of innodb_change_buffering. (Slow shutdown runs both purge and change buffer merge to completion.)
We might also want to run with innodb_force_recovery=2 to disable the purge altogether, and then restart without that parameter and initiate a slow shutdown, to measure how long it takes to process all the undo log that was accumulated.
The first step should be to replace the purge threads with the common work queue that will be implemented in MDEV-16264. After that, it should become meaningful to measure purge performance and the meaningfulness of the parameters.
Apart from removing or revising innodb_max_purge_lag, we should also consider if it makes sense to have the parameter innodb_purge_batch_size. At the very least, all parameters related to purge should be settable at runtime.
Marko Mäkelä
added a comment - The first step should be to replace the purge threads with the common work queue that will be implemented in MDEV-16264 . After that, it should become meaningful to measure purge performance and the meaningfulness of the parameters.
Apart from removing or revising innodb_max_purge_lag , we should also consider if it makes sense to have the parameter innodb_purge_batch_size . At the very least, all parameters related to purge should be settable at runtime.
It looks like we may have to reduce the scope of this task, so that something can be delivered in the 10.5 release.
wlad mentioned that we could dynamically scale the number of purge workers to make a slow shutdown (innodb_fast_shutdown=0) faster. That would not require any interface changes and could thus be done even after a feature freeze. This is now filed as MDEV-21751.
Marko Mäkelä
added a comment - - edited It looks like we may have to reduce the scope of this task, so that something can be delivered in the 10.5 release.
wlad mentioned that we could dynamically scale the number of purge workers to make a slow shutdown ( innodb_fast_shutdown=0 ) faster. That would not require any interface changes and could thus be done even after a feature freeze. This is now filed as MDEV-21751 .
I question whether "Measure the time to reach History list length 0" is the appropriate metric. I fear that minimizing that time will throw all CPU and I/O resources at the purge task while starving queries of such resources.
Rick James
added a comment - I question whether "Measure the time to reach History list length 0" is the appropriate metric. I fear that minimizing that time will throw all CPU and I/O resources at the purge task while starving queries of such resources.
rjasdfiii, my note in the Description was only giving ideas for benchmarks. For dynamically tuning the system, we would have to use multiple metrics and parameters, such as innodb_io_capacity and innodb_io_capacity_max.
Marko Mäkelä
added a comment - rjasdfiii , my note in the Description was only giving ideas for benchmarks. For dynamically tuning the system, we would have to use multiple metrics and parameters, such as innodb_io_capacity and innodb_io_capacity_max .
As highlighted in MDEV-22577, there is a design problem with the purge, which should be addressed as well: Much of the time, only the coordinator is active, gathering work to be distributed to workers. Could we eliminate the coordinator, distributing its work to symmetric worker tasks and make it scale better in that way?
Marko Mäkelä
added a comment - As highlighted in MDEV-22577 , there is a design problem with the purge, which should be addressed as well: Much of the time, only the coordinator is active, gathering work to be distributed to workers. Could we eliminate the coordinator, distributing its work to symmetric worker tasks and make it scale better in that way?
1. sysbench --db-driver=mysql --mysql-host=192.168.10.4 --mysql-port=$port --threads=128 \
--warmup-time=6 --time=30 \
$HOME/sysbench/src/lua/oltp_update_non_index.lua --tables=50 --table_size=1000000 run
2. SIGTERM on mariadb
3. perf record -g -F 30 -o ../mariadb-sys-purge-shutdown2.perf -p $(pidof mysqld) – sleep 30
(30 seconds seemed to be about the time from signal to termination based on previous runs)
2020-06-04 19:17:02 0 [Note] InnoDB: Uses system mutexes
2020-06-04 19:17:02 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-06-04 19:17:02 0 [Note] InnoDB: Number of pools: 1
2020-06-04 19:17:02 0 [Note] InnoDB: Using POWER8 crc32 instructions
2020-06-04 19:17:02 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
2020-06-04 19:17:02 0 [Note] InnoDB: Initializing buffer pool, total size = 28991029248, chunk size = 134217728
2020-06-04 19:17:10 0 [Note] InnoDB: Completed initialization of buffer pool
2020-06-04 19:17:10 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-06-04 19:17:11 0 [Note] InnoDB: 128 rollback segments are active.
2020-06-04 21:49:51 0 [Note] InnoDB: Using Linux native AIO
2020-06-04 21:49:51 0 [Note] InnoDB: Uses system mutexes
2020-06-04 21:49:51 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2020-06-04 21:49:51 0 [Note] InnoDB: Number of pools: 1
2020-06-04 21:49:51 0 [Note] InnoDB: Using POWER8 crc32 instructions
2020-06-04 21:49:51 0 [Note] mysqld: O_TMPFILE is not supported on /tmp (disabling future attempts)
2020-06-04 21:49:51 0 [Note] InnoDB: Initializing buffer pool, total size = 28991029248, chunk size = 134217728
2020-06-04 21:50:00 0 [Note] InnoDB: Completed initialization of buffer pool
2020-06-04 21:50:00 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-06-04 21:50:00 0 [Note] InnoDB: 128 rollback segments are active.
Daniel Black
added a comment - https://www.kernel.org/doc/html/latest/accounting/psi.html "Actually the I/O pressure could be something for controlling the rate of purging transaction history" (like MDEV-24670 )
Will any new GLOBAL STATUS values be added to help users gain insight? Perhaps something like a "exponential moving average" of, say, the history_length.
Rick James
added a comment - Will any new GLOBAL STATUS values be added to help users gain insight? Perhaps something like a "exponential moving average" of, say, the history_length.
The first step should be to replace the purge threads with the common work queue that will be implemented in
MDEV-16264. After that, it should become meaningful to measure purge performance and the meaningfulness of the parameters.Apart from removing or revising innodb_max_purge_lag, we should also consider if it makes sense to have the parameter innodb_purge_batch_size. At the very least, all parameters related to purge should be settable at runtime.