MariaDB Server / MDEV-5448

Performance regression between 10.0.4 and 10.0.5 (~8%)

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 10.0.6
    • Fix Version/s: 10.0.8
    • Component/s: None
    • Labels: None

    Description

      As Axel mentioned in his e-mail, there is a performance regression between 10.0.4 and 10.0.5:

      Date: Thu, 21 Nov 2013 18:32:45 +0100
      From: Axel Schwenke <axel@askmonty.org>
      To: "maria-developers@lists.launchpad.net" <maria-developers@lists.launchpad.net>
      Subject: [Maria-developers] MariaDB-10.0-beta sysbench results

      While looking for this regression, I see a clear performance drop with the following revision:

      revno: 3427.1.258
      revision-id: knielsen@knielsen-hq.org-20130823120213-pbhsq4zc1h3jwa0i
      parent: knielsen@knielsen-hq.org-20130823081643-f3yhupp15yw9cpy4
      committer: knielsen@knielsen-hq.org
      branch nick: work-10.0-mdev26
      timestamp: Fri 2013-08-23 14:02:13 +0200
      message:
        MDEV-26: Global transaction ID.
       
        Implement @@gtid_binlog_state. This is the internal state of the binlog
        (most recent GTID logged for every domain_id and server_id). This allows
        to save the state before RESET MASTER and restore it afterwards.

      Specifically, the sys_vars.cc part:

      static unsigned char opt_gtid_binlog_state_dummy;
      static Sys_var_gtid_binlog_state Sys_gtid_binlog_state(
             "gtid_binlog_state",
             "The internal GTID state of the binlog, used to keep track of all "
             "GTIDs ever logged to the binlog.",
             GLOBAL_VAR(opt_gtid_binlog_state_dummy), NO_CMD_LINE);

      If I comment it out, I get a nice performance boost. Note that it doesn't seem to have anything to do with the gtid functionality accessed by Sys_var_gtid_binlog_state methods: I removed all references to gtid code and still observed the performance degradation.

      It seems to be caused somehow by the increase in the number of system variables. If I add a new system variable (on revision 3816), I can see the performance degradation:

      static ulong table_cache_instances1;
      static Sys_var_ulong Sys_table_cache_instances1(
             "table_open_cache_instances1",
             "MySQL 5.6 compatible option. Not used or needed in MariaDB",
             READ_ONLY GLOBAL_VAR(table_cache_instances1), CMD_LINE(REQUIRED_ARG),
             VALID_RANGE(1, 64), DEFAULT(1),
             BLOCK_SIZE(1), NO_MUTEX_GUARD, NOT_IN_BINLOG, ON_CHECK(NULL),
             ON_UPDATE(NULL), NULL);

      The difference looks like this:
      64 threads, time spent: 60s, queries executed: 9326530, qps: 155442, 1 thread qps: 2428

      vs

      64 threads, time spent: 60s, queries executed: 9879031, qps: 164650, 1 thread qps: 2572
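
To put a number on the gap between the two runs above, the figures can be recomputed from the raw query counts. This is only a sketch: the `Run` struct and helper names are made up for illustration, and it assumes the first line is the run with the extra system variable and the second is the baseline, which the ticket implies but does not state outright.

```cpp
#include <cstdint>

// Hypothetical helper type: one benchmark line from the ticket.
struct Run {
  std::uint64_t queries;  // "queries executed"
  unsigned seconds;       // "time spent"
  unsigned threads;       // client threads
};

// Queries per second, truncated the same way the ticket's numbers are.
constexpr std::uint64_t qps(Run r) { return r.queries / r.seconds; }

// Per-thread throughput ("1 thread qps" in the ticket).
constexpr std::uint64_t qps_per_thread(Run r) { return qps(r) / r.threads; }

// Fraction of throughput lost in `slow` relative to `fast`.
constexpr double relative_drop(Run slow, Run fast) {
  return 1.0 - static_cast<double>(qps(slow)) / static_cast<double>(qps(fast));
}

// Assumption: first line = with the extra variable, second = baseline.
constexpr Run with_extra_var{9326530, 60, 64};
constexpr Run baseline{9879031, 60, 64};

static_assert(qps(with_extra_var) == 155442, "matches the first line");
static_assert(qps(baseline) == 164650, "matches the second line");
static_assert(qps_per_thread(with_extra_var) == 2428, "per-thread qps, slow");
static_assert(qps_per_thread(baseline) == 2572, "per-thread qps, fast");

// Roughly a 5.6% drop for this particular pair of runs.
static_assert(relative_drop(with_extra_var, baseline) > 0.055 &&
              relative_drop(with_extra_var, baseline) < 0.057,
              "drop is about 5.6%");
```

Under that assumption this pair of runs differs by about 5.6%, somewhat less than the ~8% in the issue title, which refers to the 10.0.4 versus 10.0.5 comparison.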

      I was unable to reproduce the performance boost with a fresh 10.0 by commenting out gtid_binlog_state.

      An even simpler patch against revision 3816 that shows the performance degradation:

      === modified file 'sql/sys_vars.cc'
      --- sql/sys_vars.cc	2013-08-14 08:48:50 +0000
      +++ sql/sys_vars.cc	2013-12-14 18:24:15 +0000
      @@ -2694,6 +2694,8 @@
              BLOCK_SIZE(1), NO_MUTEX_GUARD, NOT_IN_BINLOG, ON_CHECK(NULL),
              ON_UPDATE(NULL), NULL);
       
      +char buf[sizeof(Sys_table_cache_instances)];
      +
       static Sys_var_ulong Sys_thread_cache_size(
              "thread_cache_size",
              "How many threads we should keep in a cache for reuse",
       

          Activity

            svoj Sergey Vojtovich added a comment - When we add a new system variable (e.g. ptr= 0x1061d40, size= 208), the addresses of other global C++ variables may change. Among other things, the addresses of LOCK_open and unused_tables change.

            rev.3816 (fast):
            LOCK_open: 0x1074120, size= 48 (cache line starts 0x1074100)
            unused_tables: 0x1074150, size= 8 (cache line starts 0x1074140)

            rev.3816 + "char buf[sizeof(Sys_table_cache_instances)]" (slow):
            LOCK_open: 0x1074200, size= 48 (cache line starts 0x1074200)
            unused_tables: 0x1074230, size= 8 (cache line starts 0x1074200)

            Note that in the fast version LOCK_open spans 2 cache lines (32 bytes on the first + 16 bytes on the second). The second cache line is shared with unused_tables. But since these last 16 bytes are quite static, there should be no false sharing issues.

            In the slow version LOCK_open resides on 1 cache line, which it shares with unused_tables.
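
The cache-line arithmetic above can be checked mechanically. The following is a minimal sketch, assuming 64-byte cache lines (consistent with the line starts quoted in the comment); the helper names are illustrative and are not part of the MariaDB source.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines, as on the Sandy Bridge box in the ticket.
constexpr std::uintptr_t kCacheLine = 64;

// Address of the first byte of the cache line containing `addr`.
constexpr std::uintptr_t line_start(std::uintptr_t addr) {
  return addr & ~(kCacheLine - 1);
}

// Number of cache lines touched by an object of `size` bytes at `addr`.
constexpr std::size_t lines_spanned(std::uintptr_t addr, std::size_t size) {
  return (line_start(addr + size - 1) - line_start(addr)) / kCacheLine + 1;
}

// How many of the object's bytes fall on its first cache line.
constexpr std::size_t bytes_on_first_line(std::uintptr_t addr, std::size_t size) {
  return size < kCacheLine - (addr % kCacheLine)
             ? size
             : kCacheLine - (addr % kCacheLine);
}

// rev.3816 (fast): LOCK_open straddles two lines; unused_tables sits on the
// second one, next to the last 16 bytes of the mutex only.
static_assert(line_start(0x1074120) == 0x1074100, "LOCK_open, fast");
static_assert(line_start(0x1074150) == 0x1074140, "unused_tables, fast");
static_assert(lines_spanned(0x1074120, 48) == 2, "32 + 16 bytes");
static_assert(bytes_on_first_line(0x1074120, 48) == 32, "first 32 bytes");

// rev.3816 + buf (slow): the whole mutex and unused_tables share one line.
static_assert(lines_spanned(0x1074200, 48) == 1, "LOCK_open, slow");
static_assert(line_start(0x1074230) == line_start(0x1074200),
              "unused_tables shares the line with all of LOCK_open");
```

The checks are `static_assert`s, so a successful compile confirms that the addresses and sizes quoted in the comment are consistent with the layout it describes.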

            oprofile proves that LLC_MISSES increase in the slow version:
            3816 (fast)
            CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated)
            Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
            samples  %        image name          symbol name
            43387    37.4148  no-vmlinux          /no-vmlinux
            21919    18.9019  libpthread-2.15.so  pthread_mutex_lock
            6986     6.0244   libpthread-2.15.so  pthread_mutex_unlock
            5427     4.6800   mysqld              tc_release_table(TABLE*)
            3741     3.2261   mysqld              TABLE::init(THD*, TABLE_LIST*)
            3168     2.7319   mysqld              tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**)
            3014     2.5991   mysqld              open_tables(THD*, TABLE_LIST*, unsigned int, unsigned int, Prelocking_strategy*)
            2199     1.8963   libpthread-2.15.so  pthread_rwlock_unlock
            2151     1.8549   libpthread-2.15.so  __lll_lock_wait
            2134     1.8403   mysqld              dispatch_command(enum_server_command, THD*, char*, unsigned int)

            3816 (slow)
            CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated)
            Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
            samples  %        image name          symbol name
            43059    39.1488  no-vmlinux          /no-vmlinux
            20065    18.2429  libpthread-2.15.so  pthread_mutex_lock
            5736     5.2151   mysqld              tc_release_table(TABLE*)
            5633     5.1215   libpthread-2.15.so  pthread_mutex_unlock
            3331     3.0285   mysqld              TABLE::init(THD*, TABLE_LIST*)
            2913     2.6485   mysqld              open_tables(THD*, TABLE_LIST*, unsigned int, unsigned int, Prelocking_strategy*)
            2666     2.4239   mysqld              tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**)
            2198     1.9984   libpthread-2.15.so  pthread_rwlock_unlock
            1998     1.8166   libpthread-2.15.so  __lll_lock_wait
            1976     1.7966   mysqld              dispatch_command(enum_server_command, THD*, char*, unsigned int)

            3816 (slow + padding)
            CPU: Intel Sandy Bridge microarchitecture, speed 2.701e+06 MHz (estimated)
            Counted LLC_MISSES events (Last level cache demand requests from this core that missed the LLC) with a unit mask of 0x41 (No unit mask) count 10000
            samples  %        image name          symbol name
            43144    37.7159  no-vmlinux          /no-vmlinux
            21324    18.6412  libpthread-2.15.so  pthread_mutex_lock
            5930     5.1839   libpthread-2.15.so  pthread_mutex_unlock
            5889     5.1481   mysqld              tc_release_table(TABLE*)
            3678     3.2153   mysqld              TABLE::init(THD*, TABLE_LIST*)
            3469     3.0326   mysqld              tdc_acquire_share(THD*, char const*, char const*, char const*, unsigned int, unsigned int, TABLE**)
            3221     2.8158   mysqld              open_tables(THD*, TABLE_LIST*, unsigned int, unsigned int, Prelocking_strategy*)
            2418     2.1138   libpthread-2.15.so  pthread_rwlock_unlock
            2165     1.8926   mysqld              dispatch_command(enum_server_command, THD*, char*, unsigned int)
            2144     1.8743   libpthread-2.15.so  __lll_lock_wait

            Adding dummy padding around LOCK_open restores performance:
            +char pada[1024];
            mysql_mutex_t LOCK_open;
            +char padb[1024];
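
The padding experiment above depends on where the linker happens to place `pada` and `padb`. A more explicit way to get the same isolation in C++11 is to align the lock to a cache-line boundary, which also rounds its size up to a whole number of lines. The sketch below is not what the ticket did: it uses `std::mutex` as a stand-in for `mysql_mutex_t` so that it compiles on its own, assumes 64-byte cache lines, and all names are hypothetical.

```cpp
#include <cstddef>
#include <mutex>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// Wrapper that starts on a cache-line boundary. Because sizeof is always a
// multiple of the alignment, the tail of the last line is padding, so no
// other global can land on a line occupied by the mutex.
struct alignas(kCacheLine) PaddedLock {
  std::mutex mutex;  // stand-in for mysql_mutex_t LOCK_open
};

static_assert(alignof(PaddedLock) == kCacheLine,
              "lock starts on a cache-line boundary");
static_assert(sizeof(PaddedLock) % kCacheLine == 0,
              "lock occupies whole cache lines");

// Hypothetical globals mirroring the ticket's layout problem: wherever the
// linker puts the second one, it cannot share a line with the first.
PaddedLock lock_open_sketch;
void *unused_tables_sketch = nullptr;
```

Compared with two 1024-byte arrays, this costs at most one extra cache line per lock and states the intent in the type, although it only protects the lock itself and not other frequently written globals that may end up next to each other.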


            svoj Sergey Vojtovich added a comment - MDEV-5388 removes unused_tables, so this particular performance regression is fixed.

            People

              Assignee: svoj Sergey Vojtovich
              Reporter: svoj Sergey Vojtovich