Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-26390

Performance regression for select queries that return large number of rows after upgrading from 10.3 to 10.5

    XMLWordPrintable

Details

    • Bug
    • Status: Open (View Workflow)
    • Major
    • Resolution: Unresolved
    • 10.5.10, 10.5.11
    • None
    • None
    • None
    • CentOS 7

    Description

      After recently upgrading MariaDB we started seeing some of our queries' performance significantly decreased.

      Given these two tables

      MariaDB [(none)]> show create table `region`.`hr_blocked_by_users` \G
      *************************** 1. row ***************************
             Table: hr_blocked_by_users
      Create Table: CREATE TABLE `hr_blocked_by_users` (
        `user_id` int(11) unsigned NOT NULL,
        `hr_id` int(11) unsigned NOT NULL,
        `date` int(10) unsigned NOT NULL DEFAULT 0,
        PRIMARY KEY (`user_id`,`hr_id`),
        KEY `user_id__date` (`user_id`,`date`),
        KEY `hr_id` (`hr_id`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
      1 row in set (0.01 sec)
       
      MariaDB [(none)]> show create table `users_ru`.`reg_users_resumes` \G
      *************************** 1. row ***************************
             Table: reg_users_resumes
      Create Table: CREATE TABLE `reg_users_resumes` (
        `id_user` int(10) unsigned NOT NULL DEFAULT 0,
        `id_resume` int(10) unsigned NOT NULL DEFAULT 0,
        PRIMARY KEY (`id_resume`),
        KEY `id_user` (`id_user`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC
      1 row in set (0.00 sec)
      

      and a query

      SELECT
        `rur`.`id_resume`
      FROM
        `region`.`hr_blocked_by_users` `blocked`
        JOIN `users_ru`.`reg_users_resumes` `rur` ON blocked.user_id = rur.id_user
      WHERE
        blocked.hr_id = '3675742'
      

      we've seen 30-80% slowdown after upgrading from 10.3.22 to 10.5.10.

      Query plans seem almost identical, but execution time is much faster for 10.3.22.

      -- 10.3.22
      ANALYZE FORMAT=JSON SELECT SQL_NO_CACHE `rur`.`id_resume` FROM region.`hr_blocked_by_users` `blocked` JOIN users_ru.`reg_users_resumes` `rur` ON blocked.user_id = rur.id_user WHERE blocked.hr_id = '3018027'\G
      *************************** 1. row ***************************
      ANALYZE: {
        "query_block": {
          "select_id": 1,
          "r_loops": 1,
          "r_total_time_ms": 93.691,
          "table": {
            "table_name": "blocked",
            "access_type": "ref",
            "possible_keys": ["PRIMARY", "user_id__date", "hr_id"],
            "key": "hr_id",
            "key_length": "4",
            "used_key_parts": ["hr_id"],
            "ref": ["const"],
            "r_loops": 1,
            "rows": 69858,
            "r_rows": 30940,
            "r_total_time_ms": 5.2686,
            "filtered": 100,
            "r_filtered": 100,
            "using_index": true
          },
          "table": {
            "table_name": "rur",
            "access_type": "ref",
            "possible_keys": ["id_user"],
            "key": "id_user",
            "key_length": "4",
            "used_key_parts": ["id_user"],
            "ref": ["region.blocked.user_id"],
            "r_loops": 30940,
            "rows": 1,
            "r_rows": 1.4677,
            "r_total_time_ms": 82.421,
            "filtered": 100,
            "r_filtered": 100,
            "using_index": true
          }
        }
      }
      

      -- 10.5.10
      ANALYZE FORMAT=JSON SELECT SQL_NO_CACHE `rur`.`id_resume` FROM region.`hr_blocked_by_users` `blocked` JOIN users_ru.`reg_users_resumes` `rur` ON blocked.user_id = rur.id_user WHERE blocked.hr_id = '3018027'\G
      *************************** 1. row ***************************
      ANALYZE: {
        "query_block": {
          "select_id": 1,
          "r_loops": 1,
          "r_total_time_ms": 168.1655,
          "table": {
            "table_name": "blocked",
            "access_type": "ref",
            "possible_keys": ["PRIMARY", "user_id__date", "hr_id"],
            "key": "hr_id",
            "key_length": "4",
            "used_key_parts": ["hr_id"],
            "ref": ["const"],
            "r_loops": 1,
            "rows": 66354,
            "r_rows": 30940,
            "r_table_time_ms": 13.15574,
            "r_other_time_ms": 3.069465273,
            "filtered": 100,
            "r_filtered": 100,
            "using_index": true
          },
          "table": {
            "table_name": "rur",
            "access_type": "ref",
            "possible_keys": ["id_user"],
            "key": "id_user",
            "key_length": "4",
            "used_key_parts": ["id_user"],
            "ref": ["region.blocked.user_id"],
            "r_loops": 30940,
            "rows": 1,
            "r_rows": 1.4677117,
            "r_table_time_ms": 145.4823186,
            "r_other_time_ms": 6.440280838,
            "filtered": 100,
            "r_filtered": 100,
            "using_index": true
          }
        }
      }
      

      Also, in both cases almost all of the accounted time in a query profile is attributed to "Sending data" phase.

      -- 10.3.22
      SHOW PROFILE ALL;
      +------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------+---------------+-------------+
      | Status                 | Duration | CPU_user | CPU_system | Context_voluntary | Context_involuntary | Block_ops_in | Block_ops_out | Messages_sent | Messages_received | Page_faults_major | Page_faults_minor | Swaps | Source_function | Source_file   | Source_line |
      +------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------+---------------+-------------+
      | Starting               | 0.000072 | 0.000029 |   0.000030 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 1 |     0 | NULL            | NULL          |        NULL |
      | Checking permissions   | 0.000015 | 0.000011 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 1 |     0 | <unknown>       | sql_parse.cc  |        6506 |
      | Opening tables         | 0.000022 | 0.000022 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_base.cc   |        4076 |
      | After opening tables   | 0.000007 | 0.000007 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_base.cc   |        4326 |
      | System lock            | 0.000007 | 0.000007 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | lock.cc       |         335 |
      | Table lock             | 0.000007 | 0.000007 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | lock.cc       |         340 |
      | Init                   | 0.000020 | 0.000020 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_select.cc |        4273 |
      | Optimizing             | 0.000021 | 0.000021 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_select.cc |        1552 |
      | Statistics             | 0.000079 | 0.000050 |   0.000030 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 3 |     0 | <unknown>       | sql_select.cc |        1940 |
      | Preparing              | 0.000028 | 0.000027 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 1 |     0 | <unknown>       | sql_select.cc |        2012 |
      | Executing              | 0.000006 | 0.000006 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_select.cc |        3910 |
      | Sending data           | 0.094987 | 0.076359 |   0.025799 |               220 |                   0 |            0 |            24 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_select.cc |        4092 |
      | End of update loop     | 0.000052 | 0.000084 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_select.cc |        4308 |
      | Query end              | 0.000009 | 0.000013 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_parse.cc  |        6041 |
      | Commit                 | 0.000007 | 0.000000 |   0.000014 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_parse.cc  |        6104 |
      | Closing tables         | 0.000006 | 0.000000 |   0.000013 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_base.cc   |         728 |
      | Unlocking tables       | 0.000006 | 0.000007 |   0.000004 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | lock.cc       |         426 |
      | Closing tables         | 0.000014 | 0.000029 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | lock.cc       |         439 |
      | Starting cleanup       | 0.000007 | 0.000023 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_parse.cc  |        6176 |
      | Freeing items          | 0.000011 | 0.000030 |   0.000030 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 1 |     0 | <unknown>       | sql_parse.cc  |        7835 |
      | Updating status        | 0.000020 | 0.000100 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_parse.cc  |        2408 |
      | Reset for next command | 0.000015 | 0.000043 |   0.000030 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_parse.cc  |        2432 |
      +------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------+---------------+-------------+
      

      -- 10.5.10
      SHOW PROFILE ALL;
      +------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------+---------------+-------------+
      | Status                 | Duration | CPU_user | CPU_system | Context_voluntary | Context_involuntary | Block_ops_in | Block_ops_out | Messages_sent | Messages_received | Page_faults_major | Page_faults_minor | Swaps | Source_function | Source_file   | Source_line |
      +------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------+---------------+-------------+
      | Starting               | 0.000062 | 0.000150 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | NULL            | NULL          |        NULL |
      | checking permissions   | 0.000010 | 0.000031 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_parse.cc  |        6760 |
      | Opening tables         | 0.000028 | 0.000083 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_base.cc   |        4207 |
      | After opening tables   | 0.000010 | 0.000030 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_base.cc   |        4464 |
      | System lock            | 0.000027 | 0.000000 |   0.000080 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | lock.cc       |         337 |
      | table lock             | 0.000013 | 0.000037 |   0.000003 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | lock.cc       |         342 |
      | init                   | 0.000019 | 0.000056 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_select.cc |        4735 |
      | Optimizing             | 0.000017 | 0.000053 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_select.cc |        1833 |
      | Statistics             | 0.000078 | 0.000235 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_select.cc |        2292 |
      | Preparing              | 0.000031 | 0.000010 |   0.000084 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_select.cc |        2367 |
      | Executing              | 0.000010 | 0.000027 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_select.cc |        4303 |
      | Sending data           | 0.167589 | 0.375163 |   0.104263 |              1239 |                   2 |           32 |            88 |             0 |                 0 |                 0 |                92 |     0 | <unknown>       | sql_select.cc |        4499 |
      | End of update loop     | 0.000110 | 0.000192 |   0.000083 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 1 |     0 | <unknown>       | sql_select.cc |        4779 |
      | Query end              | 0.000010 | 0.000038 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_parse.cc  |        6067 |
      | Commit                 | 0.000011 | 0.000042 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_parse.cc  |        6113 |
      | closing tables         | 0.000009 | 0.000037 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_base.cc   |         784 |
      | Unlocking tables       | 0.000010 | 0.000040 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | lock.cc       |         429 |
      | closing tables         | 0.000021 | 0.000084 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | lock.cc       |         442 |
      | Starting cleanup       | 0.000009 | 0.000034 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_parse.cc  |        6179 |
      | Freeing items          | 0.000015 | 0.000061 |   0.000000 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_parse.cc  |        8116 |
      | Updating status        | 0.000035 | 0.000163 |   0.000084 |                 0 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_parse.cc  |        2472 |
      | Reset for next command | 0.000026 | 0.000170 |   0.000000 |                 1 |                   0 |            0 |             0 |             0 |                 0 |                 0 |                 0 |     0 | <unknown>       | sql_parse.cc  |        2496 |
      +------------------------+----------+----------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------+---------------+-------------+
      

      To be fair, these servers are not quite identical, but I believe the difference is not enough to explain this much of a difference.
      Server with 10.3.22:

      • Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz
      • 512G RAM
      • MariaDB 10.3.22 (official CentOS 7 rpm package)
      • innodb_buffer_pool_size=400G
      • is a replica, replication threads are running
      • no other workload

      Server with 10.5.10:

      • Intel(R) Xeon(R) CPU E7-8891 v4 @ 2.80GHz
      • 512G RAM
      • MariaDB 10.5.10 (official CentOS 7 rpm package)
      • innodb_buffer_pool_size=400G
      • is a replica, replication threads are running
      • read-only workload

      Unfortunately, I couldn't reproduce the issue in a "clean" manner.
      I tried the following ways of reproducing:

      1. loading these tables from sql dump into a fresh 10.5.10 docker container with similar settings;
      2. loading these tables from sql dump into a fresh 10.3.22 docker container with similar settings and performing an upgrade to 10.5.10;
      3. importing tablespaces from a read-only replica.
        In all of these cases performance did not differ from a container running 10.3.22.

      Also, I tried to test the performance of this query on different replicas.
      For measurement these scripts were used:

      (
        # warmup
        for i in $(seq 1 3); do
          mysql -rN -h $MYSQL_HOST < $SQL_FILE > /dev/null
        done
        echo "r_total_time_ms";
        for i in $(seq 1 50); do
          (
            echo "ANALYZE FORMAT=JSON"
            cat $SQL_FILE
          ) | mysql -rN -h $MYSQL_HOST | jq .query_block.r_total_time_ms
        done
      ) | python stat.py
      

      # cat stat.py
      import pandas as pd
      import sys
       
      df = pd.read_csv(sys.stdin)
      col = df['r_total_time_ms']
      mean = col.mean()
      std = col.std()
      print("Time   (mean ± σ): {:4.2f} ± {:4.2f}".format(mean, std))
      print("Range (-3σ ... +3σ)  {:4.2f} ... {:4.2f}".format(mean - 3 * std, mean + 3 * std))
      print("      (min ... max): {:4.2f} ... {:4.2f}".format(col.min(), col.max()))
      

      Results were the following:

      server #1
      Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz
      512G RAM
      MariaDB 10.3.22 (official rpm package)
      innodb_buffer_pool_size=400G
      active replica
      no other workload
       
      Time   (mean ± σ): 89.20 ± 2.03
      Range (-3σ ... +3σ)  83.10 ... 95.31
            (min ... max): 86.81 ... 98.48
      

      server #2
      Intel(R) Xeon(R) CPU E7-8891 v4 @ 2.80GHz
      512G RAM
      MariaDB 10.5.10 (official rpm package)
      innodb_buffer_pool_size=400G
      active replica
      'work_mirror' workload
       
      Time   (mean ± σ): 187.52 ± 6.04
      Range (-3σ ... +3σ)  169.40 ... 205.64
            (min ... max): 175.38 ... 208.00
      

      server #3
      Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz
      128G RAM
      MariaDB 10.5.11 (official rpm package)
      innodb_buffer_pool_size=80G
      active replica
      read-only workload
       
      Time   (mean ± σ): 127.07 ± 2.38
      Range (-3σ ... +3σ)  119.92 ... 134.21
            (min ... max): 119.95 ... 133.93
      

      server #4
      Intel(R) Xeon(R) CPU E5-2667 v2 @ 3.30GHz
      128G RAM
      MariaDB 10.5.11 (official rpm package)
      innodb_buffer_pool_size=80G
      active replica
      no workload
       
      Time   (mean ± σ): 117.29 ± 3.17
      Range (-3σ ... +3σ)  107.79 ... 126.79
            (min ... max): 111.47 ... 123.27
      

      Performance becomes somewhat better when performance_schema is disabled (or even with performance-schema-instrument='stage/%=OFF')

      server #4 with performance-schema-instrument='stage/%=OFF'
      Time   (mean ± σ): 110.02 ± 1.79
      Range (-3σ ... +3σ)  104.66 ... 115.39
            (min ... max): 104.99 ... 113.35
      

      What's puzzling for me is that, as I mentioned earlier, this doesn't manifest in a docker environment.

      developer machine
      Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz
      CentOS 7 + mainline kernel (5.2.2)
      32G RAM
      MariaDB 10.5.10/10.3.22 (official docker image)
      innodb_buffer_pool_size=24G
      no replication
      no workload
       
      10.3.22
      Time   (mean ± σ): 72.17 ± 0.71
      Range (-3σ ... +3σ)  70.05 ... 74.29
            (min ... max): 71.03 ... 74.48
       
      10.5.10
      Time   (mean ± σ): 71.20 ± 1.13
      Range (-3σ ... +3σ)  67.81 ... 74.59
            (min ... max): 69.95 ... 77.26
      

      Because of that I'm inclined to think that the issue may very well be not in the MariaDB code itself, but that MariaDB 10.5 requires some system settings to be changed.

      Attachments

        1. SHOW_GLOBAL_VARIABLES_10.3.22.txt
          18 kB
          Alex Sladkov
        2. SHOW_GLOBAL_VARIABLES_10.5.10.txt
          19 kB
          Alex Sladkov
        3. mariadb_10.3.22_2021-08-30.svg
          378 kB
          Alex Sladkov
        4. mariadb_10.5.11_2021-08-30.svg
          467 kB
          Alex Sladkov

        Activity

          People

            Unassigned Unassigned
            narkq Alex Sladkov
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.