MariaDB Server / MDEV-10064

performance regression with threadpool

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Not a Bug
    • 10.0.25, 10.1.14
    • 10.1.25
    • OTHER
    • Ubuntu x86_64
    • 10.0.26, 10.0.28, 5.5.55, 10.0.30

    Description

      Enabling the thread pool leads to about 5% performance loss in MariaDB 10.0 and 10.1, but not in MariaDB 5.5. I tested 5.5.49 vs. 10.0.25 vs. 10.1.14.

      The benchmark is sysbench OLTP read-only with 1000 point-selects per transaction. The benchmark machine has 16 cores (32 hyperthreads).

      my.cnf:

      [mysqld]
      max_connections = 1300
      table_open_cache = 2600
      query_cache_type = 0
       
      innodb_buffer_pool_size = 512M
      innodb_buffer_pool_instances = 10
      innodb_adaptive_hash_index_partitions = 20
       
      thread_handling=pool-of-threads
      

      See the attached spreadsheet for numbers.

      Attachments

        1. one_thread.txt
          77 kB
        2. pool.txt
          77 kB
        3. threadpool.ods
          56 kB
        4. tp10.png
          60 kB
        5. tp1000.png
          57 kB

        Activity

          wlad Vladislav Vaintroub added a comment (edited)

          OK, I measured some more, with and without taskset. Without taskset, one can see what may appear to be a very slight regression, specifically in the threadpool case. But this is a phantom regression. As mentioned elsewhere (e.g. in the threadpool documentation, in the section on how to run benchmarks), the benchmark driver seems to take a bigger share of the overall CPU. Concretely, in this case in 10.1, without pinning, you can get a situation where sysbench-0.5 is using 10 CPUs out of 32 while mysqld is using 22, as shown by "top". Idle time is 0%; all 32 CPUs are busy. However, mysqld can do more if affinitized: it can use up to 24 CPUs, which results in better throughput, but then sysbench needs to be restricted.

          In all of my affinitized tests, threadpool outperforms thread-per-connection (the latter can be affinitized or not). In all tests overall, threadpool continues to scale above 1024 concurrent selects.

          Either I am doing something wrong on my end, or the benchmarks were not run properly: the same hardware can do better, and threadpool can outperform thread-per-connection in all aspects, including raw throughput, if the benchmark is run using taskset, as mentioned in the threadpool documentation.
          taskset really makes a visible difference.

          I shared my results in
          https://docs.google.com/spreadsheets/d/12KPobxrP89BzrevPaCoGxGUPnI4kuLWRtTLjTfPJw78/edit#gid=0

          axel, I'm reassigning this back to you. Could you please confirm my findings (in which case I think the MDEV can be closed), or tell me whether I am doing something wrong?

          I shared details of how I ran the benchmarks, including the sysbench and mysqld parameters (and the taskset parameters), in this comment:

          https://jira.mariadb.org/browse/MDEV-10064?focusedCommentId=84510&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-84510
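
The pinning described above amounts to giving mysqld and sysbench disjoint CPU sets. The following is a minimal sketch of how such a split could be computed and turned into `taskset -c` prefixes. The helper names `cpu_split` and `taskset_prefix` are hypothetical, and the sketch assumes that CPUs are simply numbered 0..31 and that a contiguous split is acceptable; the actual CPU lists used in the tests are in the linked comment, and the mapping of CPU numbers to cores and hyperthreads is machine-specific.

```python
def cpu_split(total_cpus, server_cpus):
    """Split CPUs 0..total_cpus-1 into a server range and a client range.

    Returns two strings in the list syntax accepted by `taskset -c`.
    Assumption: a contiguous split by CPU number is good enough; real
    machines may need a split that follows the core/hyperthread layout.
    """
    if not 0 < server_cpus < total_cpus:
        raise ValueError("server_cpus must leave at least one CPU for the client")
    last = total_cpus - 1
    server = "0" if server_cpus == 1 else f"0-{server_cpus - 1}"
    client = str(last) if server_cpus == last else f"{server_cpus}-{last}"
    return server, client


def taskset_prefix(cpu_list):
    """Build the argv prefix that pins the following command to cpu_list."""
    return ["taskset", "-c", cpu_list]


if __name__ == "__main__":
    # 24 CPUs for mysqld and the remaining 8 for sysbench, as in the comment.
    server, client = cpu_split(32, 24)
    print(" ".join(taskset_prefix(server) + ["mysqld"]))
    print(" ".join(taskset_prefix(client) + ["sysbench"]))
```

The printed prefixes would be placed in front of the usual mysqld and sysbench command lines; the server and benchmark options themselves are deliberately left out here.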


          marko Marko Mäkelä added a comment -

          If the origin of this regression is suspected to be this Percona XtraDB commit, then I presume that the condition

          	} else if (free_len > max_free_len / 5) {

          should be preserved intact.

          wlad Vladislav Vaintroub added a comment -

          marko, I think this comment belongs to MDEV-10409, not this one.
          axel Axel Schwenke added a comment -

          Added a test case to the regression test suite to test thread pool behavior for all MariaDB releases, starting with 5.5.

          axel Axel Schwenke added a comment -

          Could not find any regression with a 16:16 split of hyperthreads. Performance with threadpool enabled is flat across releases, and performance at high thread counts is slightly better with threadpool enabled than with one-thread-per-connection.


          People

            axel Axel Schwenke
            axel Axel Schwenke