MariaDB Server / MDEV-32176

Contention in ha_innobase::info_low (dict_table_t::lock_mutex_lock)

Details

    Description

      Running sysbench under Intel's VTune, I noticed considerable contention in ha_innobase::info_low() when flag contains HA_STATUS_VARIABLE.

      This is a cached workload: the database is started with a 10G buffer pool, which is large enough to hold the data.
      sysbench is run with 1 table x 1000000 rows, oltp_point_select, 400 users.

      .\sysbench.exe oltp_point_select --table-size=10000000 --mysql-user=root --report-interval=1 --time=30 --threads=400 --point-selects=0 --mysql-db=sbtest --mysql-ssl=off --histogram=1 run

      I'm attaching the flamegraph as screenshots (sorry for this, but it does not seem to be possible to export it as SVG from VTune).

      The code in question is this:

      		ib_table->stats_mutex_lock();
       
      		ut_a(ib_table->stat_initialized);
       
      		n_rows = ib_table->stat_n_rows;
       
      		stat_clustered_index_size
      			= ib_table->stat_clustered_index_size;
       
      		stat_sum_of_other_index_sizes
      			= ib_table->stat_sum_of_other_index_sizes;
       
      		ib_table->stats_mutex_unlock();
      

      The call stack is:

      ntdll.dll ! ZwWaitForAlertByThreadId
      ntdll.dll ! RtlpWaitOnAddressWithTimeout + 0x80
      ntdll.dll ! RtlpWaitOnAddress + 0xd7
      ntdll.dll ! RtlWaitOnAddress + 0x12
      KERNELBASE.dll ! WaitOnAddress + 0x32
      server.dll ! srw_mutex_impl<1>::wait + 0x18 - srw_lock.cc:202
      server.dll ! srw_mutex_impl<1>::wait_and_lock + 0x60 - srw_lock.cc:324
      server.dll ! srw_mutex_impl<1>::wr_lock + 0x10 - srw_lock.h:134
      server.dll ! dict_table_t::lock_mutex_lock + 0x7 - dict0mem.h:2011
      server.dll ! dict_table_t::stats_mutex_lock - dict0mem.h:2039
      server.dll ! ha_innobase::info_low + 0x202 - ha_innodb.cc:14787
      server.dll ! make_join_statistics + 0x20c - sql_select.cc:5499
      server.dll ! JOIN::optimize_inner + 0x1cff - sql_select.cc:2625
      server.dll ! JOIN::optimize + 0x8b - sql_select.cc:1944
      server.dll ! mysql_select + 0x2e1 - sql_select.cc:5235
      server.dll ! handle_select + 0x1f9 - sql_select.cc:628
      server.dll ! execute_sqlcom_select + 0x2e5 - sql_parse.cc:6012
      server.dll ! mysql_execute_command + 0xac6 - sql_parse.cc:3911
      server.dll ! Prepared_statement::execute + 0x2b3 - sql_prepare.cc:5027
      ...
      

      Mutual exclusion seems like overkill here, as this code path only reads the statistics and changes no data. Perhaps a slim rwlock would be better?
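
      To illustrate the idea, here is a minimal sketch of reading the three statistics fields under a shared latch, so that concurrent readers do not exclude one another. It is not the InnoDB code and not the eventual patch: table_stats, stats_snapshot, update_stats() and read_stats() are hypothetical names, and std::shared_mutex stands in for whatever slim rwlock the server would actually use.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the statistics members of dict_table_t.
struct table_stats {
  // Assumption: a reader-writer latch replaces the exclusive mutex.
  mutable std::shared_mutex latch;
  bool stat_initialized = false;
  std::uint64_t stat_n_rows = 0;
  std::uint64_t stat_clustered_index_size = 0;
  std::uint64_t stat_sum_of_other_index_sizes = 0;
};

// Plain copy of the values that the reader needs.
struct stats_snapshot {
  std::uint64_t n_rows;
  std::uint64_t clustered_index_size;
  std::uint64_t sum_of_other_index_sizes;
};

// Writer side: statistics updates still take the latch exclusively.
inline void update_stats(table_stats& t, std::uint64_t n_rows,
                         std::uint64_t clustered_size,
                         std::uint64_t other_sizes) {
  std::unique_lock<std::shared_mutex> guard(t.latch);
  t.stat_n_rows = n_rows;
  t.stat_clustered_index_size = clustered_size;
  t.stat_sum_of_other_index_sizes = other_sizes;
  t.stat_initialized = true;
}

// Reader side: a shared latch is enough, because nothing is modified.
inline stats_snapshot read_stats(const table_stats& t) {
  std::shared_lock<std::shared_mutex> guard(t.latch);
  assert(t.stat_initialized);  // mirrors ut_a(ib_table->stat_initialized)
  return {t.stat_n_rows, t.stat_clustered_index_size,
          t.stat_sum_of_other_index_sizes};
}
```

      Many threads can hold the shared latch at once, so point selects would no longer queue behind each other here. Acquiring a shared latch still writes to a shared cache line, so whether this removes the hot spot would have to be confirmed by profiling.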

      Attachments

        1. contention_larger_view.png
          59 kB
          Vladislav Vaintroub
        2. ha_innobase_info_low_contention.png
          50 kB
          Vladislav Vaintroub
        3. info_low_point_select.patch.png
          34 kB
          Vladislav Vaintroub
        4. info_low_point_select.pre-patch.png
          31 kB
          Vladislav Vaintroub

          Activity

            wlad Vladislav Vaintroub added a comment - Assigning to @marko. I see it is still unassigned after almost a year.

            marko Marko Mäkelä added a comment - There are also a couple of other places where holding a shared latch would indeed suffice, both when accessing table locks and when accessing table statistics. We can also attempt lock elision (MDEV-26769) in ha_innobase::info_low(HA_STATUS_VARIABLE).

            Somewhat related to this, I had written a note that in innobase_build_v_templ() it would suffice to hold an exclusive per-table latch instead of an exclusive dict_sys.latch. I would fix that as well as part of this.

            marko Marko Mäkelä added a comment - https://github.com/MariaDB/server/pull/3320 also includes a tentative fix for mdcallag's MDEV-34178, which could be interesting to test at the same time.

            wlad Vladislav Vaintroub added a comment - I confirm that the patch reduces CPU time in ha_innobase::info_low in the original test from 1.2% to 0.3%.

            marko Marko Mäkelä added a comment - Pushed to 10.6 and merged to 10.11 (with no conflicts).

            mleich Matthias Leich added a comment - origin/10.6-MDEV-32176 e10acdf6c4624f891fbc22116e859ec3e7f86d9a 2024-06-24T15:53:41+03:00 behaved well in RQG testing.

            People

              marko Marko Mäkelä
              wlad Vladislav Vaintroub