MariaDB Server / MDEV-34178

Performance regression with IO-bound insert benchmark

Details

    Description

      This is similar to regressions I found with CPU-bound sysbench (MDEV-33966), but I opened a separate issue because this one occurs with the IO-bound Insert Benchmark.

      I ran an IO-bound Insert Benchmark (IO-bound because the working set and database are much larger than memory) on a small server (8 cores, 16G RAM) to compare MariaDB LTS releases for 10.2, 10.3, 10.4, 10.5, 10.6, 10.11 and upcoming 11.4 with MySQL 5.6, 5.7 and 8.0.

      A result for a CPU-bound benchmark is here (https://smalldatum.blogspot.com/2024/05/the-insert-benchmark-mariadb-mysql-new.html); that test uses a smaller database that can be cached, so it isn't IO-bound. In the cached case, MariaDB doesn't have large regressions from 10.2 to 11.4. Here, with an IO-bound setup, there are regressions.

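      The IO-bound tests were run for 1 and 4 clients:

      • all DBMS - 1 client (https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.all/all.html#summary) and 4 clients (https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.all/all.html#summary)
      • all MariaDB LTS - 1 client (https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma/all.html#summary) and 4 clients (https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.ma/all.html#summary)
      • all MySQL - 1 client (https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.my/all.html#summary) and 4 clients (https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.my/all.html#summary)
      • MariaDB vs MySQL - 1 client (https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.mavsmy/all.html#summary) and 4 clients (https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.mavsmy/all.html#summary)
      • MariaDB 10.11 - 1 client (https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma10/all.html#summary) and 4 clients (https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.ma10/all.html#summary)
      • MariaDB 11.4 - 1 client (https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma11/all.html#summary) and 4 clients (https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.ma11/all.html#summary)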

      The way I label results provides context on the DBMS version and my.cnf:

      • ma101107_rel.cz11a_bee - MariaDB 10.11.7 with the cz11a_bee config that uses innodb_flush_method=O_DIRECT_NO_FSYNC and innodb_change_buffering=none
      • ma101107_rel.cz11b_bee - MariaDB 10.11.7 with the cz11b_bee config that uses innodb_flush_method=O_DIRECT and innodb_change_buffering=none
      • ma110401_rel.cz11b_bee - MariaDB 11.4.1 with the cz11b_bee config that uses innodb_flush_method=O_DIRECT and the InnoDB change buffer has been removed
      • my8036_rel.cz11a_bee - MySQL 8.0.36 with the cz11a_bee config that uses innodb_flush_method=O_DIRECT_NO_FSYNC and innodb_change_buffering=all
      • my8036_rel.cz11d_bee - MySQL 8.0.36 with the cz11d_bee config that uses innodb_flush_method=O_DIRECT_NO_FSYNC and innodb_change_buffering=none

      Note that the cz11d_bee config for MySQL 8.0.36 is similar to the cz11a_bee config for MariaDB.
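
      The labels follow a regular pattern, so they can be decoded mechanically. The Python sketch below is a hypothetical helper, not part of the benchmark tooling: the digit layouts (two digits each for MariaDB major/minor/patch, one/one/two digits for MySQL) are assumptions inferred from the labels in this issue, and CONFIG_SETTINGS records only the two settings listed above. The last lines reflect the note that MySQL's cz11d_bee matches MariaDB's cz11a_bee.

```python
import re
from typing import NamedTuple


class RunLabel(NamedTuple):
    dbms: str     # "MariaDB" or "MySQL"
    version: str  # e.g. "10.11.7"
    config: str   # my.cnf name, e.g. "cz11a_bee"


# Assumed layouts, inferred from the labels in this issue:
#   MariaDB: "ma" + 2-digit major + 2-digit minor + 2-digit patch (ma101107 -> 10.11.7)
#   MySQL:   "my" + 1-digit major + 1-digit minor + 2-digit patch (my8036 -> 8.0.36)
_LABEL_RE = re.compile(r"^(ma|my)(\d+)_rel\.(\w+)$")

# Only the settings named in the issue; the configs contain more than this.
# Per the issue, 11.4.1 (ma110401_rel.cz11b_bee) has no change buffer at all.
CONFIG_SETTINGS = {
    ("MariaDB", "cz11a_bee"): {"innodb_flush_method": "O_DIRECT_NO_FSYNC", "innodb_change_buffering": "none"},
    ("MariaDB", "cz11b_bee"): {"innodb_flush_method": "O_DIRECT", "innodb_change_buffering": "none"},
    ("MySQL", "cz11a_bee"): {"innodb_flush_method": "O_DIRECT_NO_FSYNC", "innodb_change_buffering": "all"},
    ("MySQL", "cz11d_bee"): {"innodb_flush_method": "O_DIRECT_NO_FSYNC", "innodb_change_buffering": "none"},
}


def parse_label(label: str) -> RunLabel:
    """Split a result label such as 'ma101107_rel.cz11a_bee' into its parts."""
    m = _LABEL_RE.match(label)
    if m is None:
        raise ValueError(f"unrecognized label: {label!r}")
    prefix, digits, config = m.groups()
    if prefix == "ma" and len(digits) == 6:
        dbms, parts = "MariaDB", (digits[0:2], digits[2:4], digits[4:6])
    elif prefix == "my" and len(digits) == 4:
        dbms, parts = "MySQL", (digits[0], digits[1], digits[2:4])
    else:
        raise ValueError(f"unexpected version digits in {label!r}")
    # int() drops leading zeros, so "04" becomes "4"
    version = ".".join(str(int(p)) for p in parts)
    return RunLabel(dbms, version, config)


print(parse_label("ma110401_rel.cz11b_bee"))
# RunLabel(dbms='MariaDB', version='11.4.1', config='cz11b_bee')
print(CONFIG_SETTINGS[("MySQL", "cz11d_bee")] == CONFIG_SETTINGS[("MariaDB", "cz11a_bee")])
# True
```
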

      My claim about the regressions is based on the following:

      • start with the MariaDB vs MySQL comparison for 1 client (https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.mavsmy/all.html#summary) and 4 clients (https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.mavsmy/all.html#summary): the results for MySQL 8.0.36 (my8036_rel.cz11a_bee and my8036_rel.cz11d_bee) are much better than for MariaDB 10.11.7 (ma101107_rel.cz11a_bee) and 11.4.1 (ma110401_rel.cz11b_bee).
      • then look at results for MariaDB LTS releases with 1 client (https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma/all.html#summary) and 4 clients (https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.ma/all.html#summary) and see some regressions from ma100433 (10.4.33) to ma100524 (10.5.24) and larger regressions from ma100524 to ma100617 (10.6.17).
      • then look at the HW metrics for MariaDB LTS releases from the 1 client setup. These are values from vmstat and iostat normalized by query and insert rates to understand HW efficiency. For the write-heavy benchmark steps from 10.4.33 through 10.6.17 I see a ~20% increase in context switches per insert (cspq) and a ~20% decrease in CPU per insert (cpupq); see the metrics for l.i1 (https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma/all.html#l.i1.metrics) and l.i2 (https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma/all.html#l.i2.metrics). Most of the change is from 10.5.24 to 10.6.17. I assume this is a result of the changes in 10.6 that switched some mutexes and rw-locks from spinning to non-spinning. So less CPU is burned, but more lock waiters go to sleep.
      • the read-write benchmark steps show a similar pattern as the write rate increases. See the 1 client results for range queries (https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma/all.html#qr1000.L5.metrics) and point queries (https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma/all.html#qp1000.L6.metrics) when the background write rate is 1000/s. Here, though, I see an increase in cspq (more context switches per query == more threads going to sleep) but not a large decrease in cpupq (CPU per query).
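
      The HW metrics above divide vmstat rates by benchmark throughput. A minimal sketch of that normalization follows, assuming cspq is vmstat's cs column (context switches per second) divided by operations per second, and cpupq is vmstat's us + sy converted to CPU-microseconds per operation. The issue does not state the exact scaling the reports use, so the function names and the microsecond unit are illustrative assumptions, and the numbers in the usage lines are made up, not taken from the reports.

```python
def per_op(rate_per_sec: float, ops_per_sec: float) -> float:
    """Divide a per-second hardware rate by the benchmark's operations per second."""
    if ops_per_sec <= 0:
        raise ValueError("ops_per_sec must be positive")
    return rate_per_sec / ops_per_sec


def cspq(cs_per_sec: float, ops_per_sec: float) -> float:
    """Context switches per operation, from vmstat's cs column (switches/second)."""
    return per_op(cs_per_sec, ops_per_sec)


def cpupq_usecs(us_pct: float, sy_pct: float, ncpu: int, ops_per_sec: float) -> float:
    """CPU microseconds per operation, from vmstat's us and sy columns.

    us and sy are percentages of total CPU capacity, so (us + sy) / 100 * ncpu
    is the number of busy CPUs; multiplying by 1e6 gives CPU-microseconds/second.
    """
    busy_usecs_per_sec = (us_pct + sy_pct) / 100.0 * ncpu * 1_000_000
    return per_op(busy_usecs_per_sec, ops_per_sec)


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Hypothetical numbers, not from the reports: 8 CPUs, throughput drops from
# 10000 to 9000 inserts/s while vmstat cs rises and us+sy falls.
old_cs, new_cs = cspq(50_000, 10_000), cspq(54_000, 9_000)  # 5.0 and 6.0
old_cpu = cpupq_usecs(40, 10, 8, 10_000)                    # 400.0
new_cpu = cpupq_usecs(30, 6, 8, 9_000)                      # about 320
print(round(pct_change(old_cs, new_cs)), round(pct_change(old_cpu, new_cpu)))  # 20 -20
```

      Normalizing by throughput matters here: a release that does fewer inserts per second can show lower absolute CPU use while spending more context switches on each insert.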

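      The issue attributes the higher cspq and lower cpupq in 10.6 to some mutexes and rw-locks no longer spinning. The Python class below is a hypothetical toy model of that trade-off, not MariaDB's lock code: spin_rounds controls how many non-blocking retries a waiter makes before it blocks, and the two counters stand in for CPU burned while spinning and for waiters going to sleep. Because of Python's GIL it says nothing about real lock performance; it only makes the two counted outcomes concrete.

```python
import threading


class SpinThenBlockLock:
    """Retry a bounded number of non-blocking acquires before blocking.

    spin_rounds=0 models a non-spinning lock: a contended acquire blocks
    immediately, so the waiting thread sleeps. Larger values model spinning:
    waiters burn CPU retrying but are less likely to sleep.
    """

    def __init__(self, spin_rounds: int) -> None:
        self._lock = threading.Lock()
        self._spin_rounds = spin_rounds
        self._stats_lock = threading.Lock()
        self.spins = 0   # failed retries after contention (proxy for CPU burned)
        self.blocks = 0  # acquires that fell back to blocking (proxy for sleeps)

    def acquire(self) -> None:
        if self._lock.acquire(blocking=False):
            return  # uncontended fast path
        for _ in range(self._spin_rounds):
            with self._stats_lock:
                self.spins += 1
            if self._lock.acquire(blocking=False):
                return
        with self._stats_lock:
            self.blocks += 1
        self._lock.acquire()  # may put this thread to sleep

    def release(self) -> None:
        self._lock.release()

    def __enter__(self) -> "SpinThenBlockLock":
        self.acquire()
        return self

    def __exit__(self, *exc) -> bool:
        self.release()
        return False
```

      With spin_rounds=0 every contended acquire is counted in blocks; raising it shifts contended acquires toward spins when the holder releases quickly, which is the CPU-versus-context-switch trade-off described above.
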
      Attachments

        1. image-2024-06-26-16-08-14-626.png (81 kB)
        2. MDEV-34178_g1_g2.pdf (52 kB)
        3. MDEV-34178.pdf (76 kB)
        4. test_output_sudo2.txt (0.4 kB)
        5. test_output1.txt (3 kB)
        6. test_output3.txt (77 kB)
        7. update_index_10.11.txt (106 kB)
        8. update_index_10.4.txt (103 kB)

          Activity

            mdcallag Mark Callaghan created issue -
            mdcallag Mark Callaghan made changes -
            Field Original Value New Value
            Description This is similar to regressions I found with CPU-bound sysbench ([MDEV-33966|https://jira.mariadb.org/browse/MDEV-33966]), but I opened a separate issue because this occurs with IO-bound Insert Benchmark.

            I ran an IO-bound Insert Benchmark (IO-bound because the working set and database are much larger than memory) on a small server (8 cores, 16G RAM) to compare MariaDB LTS releases for 10.2, 10.3, 10.4, 10.5, 10.6, 10.11 and upcoming 11.4 with MySQL 5.6, 5.7 and 8.0.

            A result for a CPU-bound benchmark [is here|https://smalldatum.blogspot.com/2024/05/the-insert-benchmark-mariadb-mysql-new.html]; that test uses a smaller database that can be cached, so it isn't IO-bound. In the cached case, MariaDB doesn't have large regressions from 10.2 to 11.4. Here, with an IO-bound setup, there are regressions.

            The IO-bound tests were run for 1 and 4 clients:
            * all DBMS - [1 client|https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.all/all.html#summary] and [4 clients|https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.all/all.html#summary]
            * all MariaDB LTS - [1 client|https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma/all.html#summary] and [4 clients|https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.ma/all.html#summary]
            * all MySQL - [1 client|https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.my/all.html#summary] and [4 clients|https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.my/all.html#summary]
            * MariaDB vs MySQL - [1 client|https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.mavsmy/all.html#summary] and [4 clients|https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.mavsmy/all.html#summary]
            * MariaDB 10.11 - [1 client|https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma10/all.html#summary] and [4 clients|https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.ma10/all.html#summary]
            * MariaDB 11.4 - [1 client|https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma11/all.html#summary] and [4 clients|https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.ma11/all.html#summary]

            The way I label results provides context on the DBMS version and my.cnf:
            * ma101107_rel.cz11a_bee - MariaDB 10.11.7 with the cz11a_bee config that uses innodb_flush_method=O_DIRECT_NO_FSYNC and innodb_change_buffering=none
            * ma101107_rel.cz11b_bee - MariaDB 10.11.7 with the cz11b_bee config that uses innodb_flush_method=O_DIRECT and innodb_change_buffering=none
            * ma110401_rel.cz11b_bee - MariaDB 11.4.1 with the cz11b_bee config that uses innodb_flush_method=O_DIRECT and the InnoDB change buffer has been removed
            * my8036_rel.cz11a_bee - MySQL 8.0.36 with the cz11a_bee config that uses innodb_flush_method=O_DIRECT_NO_FSYNC and innodb_change_buffering=all
            * my8036_rel.cz11d_bee - MySQL 8.0.36 with the cz11d_bee config that uses innodb_flush_method=O_DIRECT_NO_FSYNC and innodb_change_buffering=none

            Note that the cz11d_bee config for MySQL 8.0.36 is similar to the cz11a_bee config for MariaDB.

            My claim about the regressions is based on the following:
            * start with the MariaDB vs MySQL comparison for [1 client|https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.mavsmy/all.html#summary] and [4 clients|https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.mavsmy/all.html#summary]: the results for MySQL 8.0.36 (my8036_rel.cz11a_bee and my8036_rel.cz11d_bee) are much better than for MariaDB 10.11.7 (ma101107_rel.cz11a_bee) and 11.4.1 (ma110401_rel.cz11b_bee).
            * then look at results for MariaDB LTS releases with [1 client|https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma/all.html#summary] and [4 clients|https://mdcallag.github.io/reports/24_05_15.ib.4u.1tno.io.amd3.ma/all.html#summary] and see some regressions from ma100433 (10.4.33) to ma100524 (10.5.24) and larger regressions from ma100524 to ma100617 (10.6.17).
            * then look at the HW metrics for MariaDB LTS releases from the 1 client setup. These are values from vmstat and iostat normalized by query and insert rates to understand HW efficiency. For the write-heavy benchmark steps from 10.4.33 through 10.6.17 I see a ~20% increase in context switches per insert (cspq) and a ~20% decrease in CPU per insert (cpupq); see the metrics [for l.i1|https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma/all.html#l.i1.metrics] and [for l.i2|https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma/all.html#l.i2.metrics]. Most of the change is from 10.5.24 to 10.6.17. I assume this is a result of the changes in 10.6 that switched some mutexes and rw-locks from spinning to non-spinning. So less CPU is burned, but more lock waiters go to sleep.
            * the read-write benchmark steps show a similar pattern as the write rate increases. See the 1 client results for [range queries|https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma/all.html#qr1000.L5.metrics] and [point queries|https://mdcallag.github.io/reports/24_05_15.ib.1u.1tno.io.amd3.ma/all.html#qp1000.L6.metrics] when the background write rate is 1000/s. Here, though, I see an increase in cspq (more context switches per query == more threads going to sleep) but not a large decrease in cpupq (CPU per query).
            serg Sergei Golubchik made changes -
            Priority Minor [ 4 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Assignee Marko Mäkelä [ marko ]
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.4 [ 29301 ]
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Debarun Banerjee [ JIRAUSER54513 ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            debarun Debarun Banerjee made changes -
            Assignee Debarun Banerjee [ JIRAUSER54513 ] Marko Mäkelä [ marko ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            issue.field.resolutiondate 2024-06-19 11:32:16.0 2024-06-19 11:32:15.851
            marko Marko Mäkelä made changes -
            Component/s Storage Engine - InnoDB [ 10129 ]
            Component/s Server [ 13907 ]
            Fix Version/s 10.6.19 [ 29833 ]
            Fix Version/s 10.11.9 [ 29834 ]
            Fix Version/s 11.1.6 [ 29835 ]
            Fix Version/s 11.2.5 [ 29836 ]
            Fix Version/s 11.4.3 [ 29837 ]
            Fix Version/s 11.5.2 [ 29838 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.4 [ 29301 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            axel Axel Schwenke made changes -
            Attachment MDEV-34178_g1_g2.pdf [ 73680 ]
            axel Axel Schwenke made changes -
            Attachment MDEV-34178.pdf [ 73682 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            wlad Vladislav Vaintroub made changes -
            Attachment image-2024-06-26-16-08-14-626.png [ 73701 ]
            wlad Vladislav Vaintroub made changes -
            Attachment update_index_10.4.txt [ 73702 ]
            Attachment update_index_10.11.txt [ 73703 ]
            wlad Vladislav Vaintroub made changes -
            kirill.perov@mariadb.com Kirill Perov (Inactive) made changes -
            Assignee Marko Mäkelä [ marko ] Kirill Perov [ JIRAUSER51446 ]
            kirill.perov@mariadb.com Kirill Perov (Inactive) made changes -
            Assignee Kirill Perov [ JIRAUSER51446 ] Marko Mäkelä [ marko ]
            kirill.perov@mariadb.com Kirill Perov (Inactive) made changes -
            Attachment test_output_sudo2.txt [ 73970 ]
            Attachment test_output1.txt [ 73971 ]
            kirill.perov@mariadb.com Kirill Perov (Inactive) made changes -
            Attachment test_output3.txt [ 73974 ]
            marko Marko Mäkelä made changes -

            People

              marko Marko Mäkelä
              mdcallag Mark Callaghan
              Votes: 0
              Watchers: 10
