MariaDB Server / MDEV-26356

Performance regression after dict_sys.mutex removal

Details

    Description

      Users can configure the number of purge threads to use.

      Currently, the existing logic looks at the growing history length and increases the number of purge threads accordingly (respecting the user-set limit). An active user workload causes the history length to grow, which in turn leads to more purge threads being scheduled.
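      For context, here is a minimal sketch of that history-length-driven scaling; the function name and thresholds are illustrative only and do not reproduce the actual MariaDB source:

      #include <cstddef>

      // Illustration only: scale the number of purge tasks with the history
      // list length, capped by the user-configured innodb_purge_threads.
      // The thresholds are hypothetical, not MariaDB's actual values.
      static unsigned purge_tasks_for_history(size_t history_len,
                                              unsigned max_purge_threads)
      {
        unsigned n = 1;
        // one extra purge task per order of magnitude of backlog
        for (size_t t = 1000; t <= history_len && n < max_purge_threads; t *= 10)
          n++;
        return n;
      }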

      The purge operation generates redo records in addition to the user workload's redo records. This dual redo generation puts intense pressure on the redo log, which can easily hit its capacity threshold and thereby cause jitter in overall throughput (including furious flushing).
      To improve user workload performance, it is important to ensure that background activity such as purge is not overdone, while still keeping the history length in check so that SELECT latency does not increase.

      The proposed patch explores an adaptive purge thread scheduling approach based on the redo log fill factor. The logic dynamically increases or decreases the number of purge threads according to how much of the redo log is filled, leaving enough space/resources/flush bandwidth for the user workload.
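      A minimal sketch of that idea follows; it is not the submitted patch, and the fill-factor thresholds are made-up placeholders:

      #include <algorithm>
      #include <cstdint>

      // Illustration only: derive the number of active purge tasks from how
      // full the redo log currently is, so purge backs off when user work
      // needs the remaining log capacity and flush bandwidth.
      static unsigned purge_tasks_for_redo_fill(uint64_t checkpoint_age,
                                                uint64_t log_capacity,
                                                unsigned max_purge_threads)
      {
        const double fill = double(checkpoint_age) / double(log_capacity);
        if (fill > 0.8)                          // log nearly full: back off
          return 1;
        if (fill > 0.5)                          // moderate pressure: half strength
          return std::max(1u, max_purge_threads / 2);
        return max_purge_threads;                // plenty of headroom: full strength
      }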

      Testing done so far has shown quite encouraging results, especially with slower disks, where flushing is unable to keep up with the redo log generation rate. The resulting increase in history length does not appear to have a regressive effect on query performance.

      Attachments

        1. adaptive_purge.gnumeric (25 kB)
        2. adaptive_purge.tar.gz (63 kB)
        3. arm.pt16.8.png (147 kB)
        4. arm.pt4.slowdisk.png (165 kB)
        5. arm.pt8.slowdisk.png (166 kB)
        6. patched2-ssd40.txt (57 kB)
        7. patched2-ssd40-80.txt (57 kB)
        8. patched2-ssd40-80-32.txt (57 kB)
        9. purge-thread=8 + nvme + cpu bound.png (38 kB)
        10. purge-thread=8 + nvme disk + iobound workload.png (38 kB)
        11. x86.pt16.8.png (153 kB)
        12. x86.pt4.slowdisk.png (161 kB)
        13. x86.pt8.slowdisk.png (166 kB)


          Activity

            Marko Mäkelä added a comment:

            I reran bench.sh from adaptive_purge.tar.gz on the SATA SSD, with innodb_log_file_size=10g and innodb_buffer_pool_size=40g (and 12GiB of data files). The previous run should correspond to the file patched-ssd40.txt inside the archive. In adaptive_purge.gnumeric I think that this corresponds to the tab "SSD 40G/40G". In that data, we can see a few 5-second intervals where the throughput dropped to 0, both for the baseline and the originally tested patch.

            With my experimental revert in MDEV-32050, I see no intervals of 0 tps, but the average and maximum throughput in the worst case dropped to half and in the best case doubled. It is possible that MDEV-26827 and MDEV-26055, as well as the other changes in MDEV-32050, have contributed to this. I attached the raw log from this experiment as patched2-ssd40.txt.


            Marko Mäkelä added a comment:

            During the 1-hour test run (most of the time spent in a sequential load of data), the 10GiB redo log was overwritten more than 7 times (the LSN grew to 73G):

            2023-09-27 16:06:24 0 [Note] /dev/shm/10.6g/sql/mariadbd: ready for connections.
            Version: '10.6.16-MariaDB'  socket: '/mnt/sbtest/mysqld.sock'  port: 3306  Source distribution
            2023-09-27 17:06:47 0 [Note] /dev/shm/10.6g/sql/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
            2023-09-27 17:06:47 0 [Note] InnoDB: FTS optimize thread exiting.
            2023-09-27 17:06:47 0 [Note] InnoDB: to purge 26366271 transactions
            2023-09-27 17:12:19 0 [Note] InnoDB: Starting shutdown...
            2023-09-27 17:12:19 0 [Note] InnoDB: Dumping buffer pool(s) to /mnt/sbtest/ib_buffer_pool
            2023-09-27 17:12:19 0 [Note] InnoDB: Buffer pool(s) dump completed at 230927 17:12:19
            2023-09-27 17:12:19 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
            2023-09-27 17:12:19 0 [Note] InnoDB: Shutdown completed; log sequence number 73559731602; transaction id 173642215
            2023-09-27 17:12:19 0 [Note] /dev/shm/10.6g/sql/mariadbd: Shutdown complete
            

            This is clearly a misconfigured (on purpose) system; the innodb_log_file_size should be large enough to accommodate 1 or 2 hours of writes.
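            To make that sizing rule concrete, here is an editorial back-of-the-envelope calculation using the LSN from the log above; it assumes the LSN started near zero, so LSN growth roughly equals the redo bytes written during the hour:

            #include <cstdint>
            #include <cstdio>

            int main()
            {
              const uint64_t lsn_growth = 73559731602ULL; // LSN at shutdown
              const double   hours      = 1.0;            // length of the run
              const double   gib        = 1ULL << 30;

              const double redo_per_hour = lsn_growth / hours;
              std::printf("redo written: %.1f GiB/hour\n", redo_per_hour / gib);
              std::printf("log size for 1-2 hours of writes: %.0f-%.0f GiB\n",
                          redo_per_hour / gib, 2 * redo_per_hour / gib);
              // prints about 68.5 GiB/hour, i.e. roughly 69-137 GiB of redo log
              // by this rule of thumb, which is in line with the
              // innodb_log_file_size=80g used in the next run.
              return 0;
            }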


            Marko Mäkelä added a comment:

            I repeated the experiment with innodb_log_file_size=80g; the raw data is in patched2-ssd40-80.txt. Now the history during the workload would grow 5×, to 110794627 transactions, and also the size of the data directory (excluding the redo log) would grow from 12GiB to 47GiB. The slow shutdown reached a history size of 47105527 in about 18½ minutes. Only about 1½ CPUs are busy; the workload is I/O bound because the 47GiB of data exceeds the buffer pool size of 40GiB.

            I think that better performance could be achieved by not setting innodb_max_dirty_pages_pct_lwm=10 or touching the LRU parameters, and perhaps setting innodb_purge_threads=32. After all, we are testing with 10, 20, 40, 80, and 160 concurrent writer connections.


            Marko Mäkelä added a comment:

            I did one more run, with a 40GiB buffer pool, an 80GiB log, and no flushing-related parameters. I intended this to employ 32 purge threads, hence the file name for the raw data: patched2-ssd40-80-32.txt. But it was in fact running with the default innodb_purge_threads=4, because the script had passed the parameter twice.

            This time, the history list grew to 89 million transactions, and the slow shutdown took almost 25 minutes:

            2023-09-27 19:27:25 0 [Note] /dev/shm/10.6g/sql/mariadbd: ready for connections.
            Version: '10.6.16-MariaDB'  socket: '/mnt/sbtest/mysqld.sock'  port: 3306  Source distribution
            2023-09-27 20:24:17 0 [Note] /dev/shm/10.6g/sql/mariadbd (initiated by: root[root] @ localhost []): Normal shutdown
            2023-09-27 20:24:17 0 [Note] InnoDB: FTS optimize thread exiting.
            2023-09-27 20:24:17 0 [Note] InnoDB: to purge 89374144 transactions
            2023-09-27 20:49:08 0 [Note] InnoDB: Starting shutdown...
            2023-09-27 20:49:08 0 [Note] InnoDB: Dumping buffer pool(s) to /mnt/sbtest/ib_buffer_pool
            2023-09-27 20:49:09 0 [Note] InnoDB: Buffer pool(s) dump completed at 230927 20:49:09
            2023-09-27 20:49:09 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
            2023-09-27 20:49:09 0 [Note] InnoDB: Shutdown completed; log sequence number 110644210169; transaction id 359504570
            2023-09-27 20:49:09 0 [Note] /dev/shm/10.6g/sql/mariadbd: Shutdown complete
            

            The slow shutdown (which automatically switches to the maximum innodb_purge_threads=32) fully employed only about 1.7 CPU cores. I checked a perf record profile while it was running and noticed that quite some CPU time was being spent on table lookups. Maybe each purge task should maintain a larger local cache that maps table IDs to table pointers; eviction would be prevented by reference counts. This test uses 100 tables of 1½ million rows each.
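            As an editorial illustration of that idea (not the actual purge code), a per-task cache could look roughly like this; Table, acquire_table() and release() are hypothetical stand-ins for dict_table_t and its reference-counting API:

            #include <cstdint>
            #include <unordered_map>

            struct Table { void release(); };          // stand-in for dict_table_t
            Table *acquire_table(uint64_t table_id);   // stand-in: dict_sys lookup that
                                                       // pins the table via its ref count

            // Per-purge-task cache of table_id -> pinned table. While a table is
            // cached its reference count stays elevated, so it cannot be evicted,
            // and repeated undo records for the same table skip the dict_sys
            // lookup that showed up in the profile.
            class purge_table_cache
            {
              std::unordered_map<uint64_t, Table*> m_tables;
            public:
              Table *get(uint64_t table_id)
              {
                auto it = m_tables.find(table_id);
                if (it != m_tables.end())
                  return it->second;                   // cache hit: no lookup needed
                Table *t = acquire_table(table_id);    // cache miss: look up and pin
                if (t)
                  m_tables.emplace(table_id, t);
                return t;
              }
              ~purge_table_cache()                     // drop all pins at end of batch
              {
                for (auto &p : m_tables)
                  p.second->release();
              }
            };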

            Unlike my test in MDEV-32050, this oltp_update_non_index test should not involve any bottleneck due to secondary indexes (to be fixed by MDEV-17598).


            Marko Mäkelä added a comment:

            In MDEV-32050, I confirmed that this bottleneck is related to table lookups. Up to 32 tables being modified in the test is no problem.


            People

              Marko Mäkelä
              Krunal Bauskar

