MariaDB Server: MDEV-7943

pthread_getspecific() takes 0.76% in OLTP RO

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.1 (EOL)
    • Fix Version/s: 10.1.6
    • Component/s: OTHER
    • Labels: None

    Description

      Data comes from a Sandy Bridge system running sysbench OLTP RO with 1 thread against 1 table.

      Call graphs:

      -   0.76%  mysqld  libpthread-2.15.so   [.] pthread_getspecific
         - pthread_getspecific
            + 19.28% trx_is_interrupted(trx_t const*)
            + 8.56% net_real_write
            + 7.94% vio_io_wait
            + 5.19% execute_sqlcom_select(THD*, TABLE_LIST*)
            + 4.35% my_free
            + 3.82% String_list::append_str(st_mem_root*, char const*)
            + 3.70% my_real_read(st_net*, unsigned long*, char)
            + 3.26% Item_equal::add_const(Item*, Item*)
            + 3.11% MYSQLparse(THD*)
            + 3.04% make_select(TABLE*, unsigned long long, unsigned long long, Item*, bool, int*)
            + 2.62% Item_equal::Item_equal(Item*, Item*, bool)
            + 2.61% Item_func::fix_fields(THD*, Item**)
            + 2.41% get_best_combination(JOIN*)
            + 2.39% st_select_lex::init_query()
            + 2.16% check_simple_equality(Item*, Item*, Item*, COND_EQUAL*)
            + 1.80% Item_ident::Item_ident(Name_resolution_context*, char const*, char const*, char const*)
            + 1.79% build_equal_items(JOIN*, Item*, COND_EQUAL*, List<TABLE_LIST>*, bool, COND_EQUAL**, bool) [clone .constprop.262]
            + 1.77% mysql_select(THD*, Item***, TABLE_LIST*, unsigned int, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_
            + 1.63% st_select_lex::add_joined_table(TABLE_LIST*)
            + 1.59% make_leaves_list(List<TABLE_LIST>&, TABLE_LIST*, bool, TABLE_LIST*)
            + 1.55% my_malloc
            + 1.44% DsMrr_impl::dsmrr_info_const(unsigned int, st_range_seq_if*, void*, unsigned int, unsigned int*, unsigned int*, Cost_estimate*)
            + 1.34% Item_bool_func2::Item_bool_func2(Item*, Item*)
            + 1.31% Item_int::Item_int(char const*, long long, unsigned int)
            + 1.17% st_select_lex::add_item_to_list(THD*, Item*)
            + 1.06% Eq_creator::create(Item*, Item*) const
            + 0.85% cmp_item::get_comparator(Item_result, Item*, charset_info_st const*)
            + 0.85% st_select_lex::save_leaf_tables(THD*)
            + 0.72% ha_innobase::multi_range_read_init(st_range_seq_if*, void*, unsigned int, unsigned int, st_handler_buffer*)
            + 0.71% Item_func::setup_args_and_comparator(THD*, Arg_comparator*)
            + 0.61% key_and(RANGE_OPT_PARAM*, SEL_ARG*, SEL_ARG*, unsigned int) [clone .part.152]
            + 0.60% get_quick_keys(PARAM*, QUICK_RANGE_SELECT*, st_key_part*, SEL_ARG*, unsigned char*, unsigned int, unsigned char*, unsigned int)
            + 0.56% Item_func_between::Item_func_between(Item*, Item*, Item*)
            + 0.52% sql_memdup(void const*, unsigned long)
            + 0.51% Item_cache::get_cache(Item const*, Item_result)

      The most frequent caller is trx_is_interrupted()/thd_kill_level(): it calls current_thd unconditionally.
      Note: it may be fixed in Monty's fastconnect tree.
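
      To illustrate the pattern described above, here is a minimal sketch, not the server's actual code. The names session_t, current_session(), is_interrupted_lookup() and is_interrupted_arg() are hypothetical stand-ins for THD, current_thd and the interrupt check. It only assumes that the current-thread lookup is backed by pthread_getspecific(), as the call graph suggests.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-connection THD object. */
typedef struct session { int killed; } session_t;

static pthread_key_t session_key;
static pthread_once_t session_key_once = PTHREAD_ONCE_INIT;
/* Instrumentation only; a plain counter, so single-threaded demos only. */
static unsigned long lookup_count;

static void make_session_key(void) {
    int rc = pthread_key_create(&session_key, NULL);
    assert(rc == 0);
    (void) rc;
}

/* Bind a session to the calling thread. */
void set_current_session(session_t *s) {
    pthread_once(&session_key_once, make_session_key);
    int rc = pthread_setspecific(session_key, s);
    assert(rc == 0);
    (void) rc;
}

/* Analogue of current_thd: one pthread_getspecific() per call. */
session_t *current_session(void) {
    pthread_once(&session_key_once, make_session_key);
    lookup_count++;
    return (session_t *) pthread_getspecific(session_key);
}

/* Variant A: performs the thread-specific lookup unconditionally. */
int is_interrupted_lookup(void) {
    session_t *s = current_session();
    return s != NULL && s->killed;
}

/* Variant B: the caller already holds the session and passes it in. */
int is_interrupted_arg(const session_t *s) {
    return s != NULL && s->killed;
}

unsigned long lookups_so_far(void) { return lookup_count; }
```

      Calling variant A in a hot loop costs one lookup per iteration, while variant B costs none beyond the lookup the caller already did.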

        Activity

          svoj Sergey Vojtovich created issue -
          svoj Sergey Vojtovich made changes -
          Field: Original Value → New Value
          Epic Link: (none) → MDEV-7941 [ 50796 ]
          svoj Sergey Vojtovich made changes -
          Affects Version/s: (none) → 10.1 [ 16100 ]
          serg Sergei Golubchik made changes -
          Fix Version/s: (none) → 10.1 [ 16100 ]

          serg Sergei Golubchik added a comment - One option would be to use thread-local variables in gcc. They might be faster (this needs to be tested), and with macros one can easily hide the underlying implementation (getspecific or TLS) from the caller.
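
          A rough sketch of what such a macro layer could look like, assuming GCC's __thread storage class for the TLS backend. The names conn_ctx_t, CURRENT_CTX, SET_CURRENT_CTX and the USE_GETSPECIFIC switch are made up for illustration and are not taken from the server source.

```c
#include <stddef.h>

/* Hypothetical per-connection context, standing in for THD. */
typedef struct conn_ctx { int id; } conn_ctx_t;

#ifdef USE_GETSPECIFIC
/* Portable backend: POSIX thread-specific data. */
#include <assert.h>
#include <pthread.h>

static pthread_key_t ctx_key;
static pthread_once_t ctx_key_once = PTHREAD_ONCE_INIT;

static void ctx_key_init(void) {
    int rc = pthread_key_create(&ctx_key, NULL);
    assert(rc == 0);
    (void) rc;
}

static conn_ctx_t *ctx_get(void) {
    pthread_once(&ctx_key_once, ctx_key_init);
    return (conn_ctx_t *) pthread_getspecific(ctx_key);
}

static void ctx_set(conn_ctx_t *c) {
    pthread_once(&ctx_key_once, ctx_key_init);
    pthread_setspecific(ctx_key, c);
}

#define CURRENT_CTX()      (ctx_get())
#define SET_CURRENT_CTX(c) (ctx_set(c))
#else
/* GCC/Clang backend: compiler-level thread-local storage. */
static __thread conn_ctx_t *tls_ctx;

#define CURRENT_CTX()      (tls_ctx)
#define SET_CURRENT_CTX(c) ((void) (tls_ctx = (c)))
#endif

/* Caller code is identical for both backends. */
int current_ctx_id(void) {
    conn_ctx_t *c = CURRENT_CTX();
    return c != NULL ? c->id : -1;
}
```

          Building with -DUSE_GETSPECIFIC switches the backend without touching any call site, which makes it straightforward to benchmark one against the other.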
          svoj Sergey Vojtovich made changes -
          Assignee: (none) → Sergey Vojtovich [ svoj ]

          svoj Sergey Vojtovich added a comment - serg, please review 3 patches for this task.
          svoj Sergey Vojtovich made changes -
          Assignee: Sergey Vojtovich [ svoj ] → Sergei Golubchik [ serg ]
          Status: Open [ 1 ] → In Review [ 10002 ]
          serg Sergei Golubchik made changes -
          Assignee: Sergei Golubchik [ serg ] → Sergey Vojtovich [ svoj ]
          Status: In Review [ 10002 ] → Stalled [ 10000 ]

          svoj Sergey Vojtovich added a comment - serg, please also review the 3rd patch for this task.
          svoj Sergey Vojtovich made changes -
          Assignee: Sergey Vojtovich [ svoj ] → Sergei Golubchik [ serg ]
          Status: Stalled [ 10000 ] → In Review [ 10002 ]
          ratzpo Rasmus Johansson (Inactive) made changes -
          Workflow: MariaDB v2 [ 60398 ] → MariaDB v3 [ 65208 ]

          kaamos Alexey Kopytov added a comment - Out of curiosity, what happened to the thread-local variables idea? Did it turn out not to be fast enough to replace pthread_getspecific() calls?

          svoj Sergey Vojtovich added a comment - alexeykopytov, according to my study (without good benchmarks, though), TLS should be faster than pthread_getspecific(), but still slower than passing function arguments.

          So far we have reduced the number of pthread_getspecific() calls from ~1100 to ~300 per OLTP RO transaction. Alas, there are other workloads that won't benefit from this.

          The plan is: pass THD through wherever possible, otherwise fall back to TLS where there are worthy cases.
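
          One way to picture that plan, as a sketch under assumptions rather than the actual patch: an entry point whose signature cannot carry the session reads TLS once, and everything below it receives the pointer as an argument. All names here (session_t, bind_session, on_request, send_rows) are hypothetical, and __thread assumes GCC or Clang.

```c
#include <stddef.h>

/* Hypothetical stand-in for THD. */
typedef struct session { unsigned long rows_sent; } session_t;

/* Fallback storage, used only where no argument is available. */
static __thread session_t *tls_session;
/* Instrumentation: how many times this thread fell back to TLS. */
static __thread unsigned long tls_reads;

void bind_session(session_t *s) { tls_session = s; }

static session_t *session_from_tls(void) {
    tls_reads++;
    return tls_session;
}

/* Inner helpers take the session explicitly: no lookup needed. */
static void send_row(session_t *s) { s->rows_sent++; }

static void send_rows(session_t *s, unsigned n) {
    for (unsigned i = 0; i < n; i++)
        send_row(s);
}

/* Entry point with a fixed signature (for example a callback type)
   that has no room for a session parameter: fall back to TLS once,
   then pass the pointer down. */
void on_request(unsigned n_rows) {
    session_t *s = session_from_tls();
    if (s != NULL)
        send_rows(s, n_rows);
}

unsigned long tls_read_count(void) { return tls_reads; }
```

          With this shape the number of lookups scales with the number of such entry points per request, not with the number of helper calls beneath them.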

          kaamos Alexey Kopytov added a comment - I see, thanks. I was asking because I considered the same idea for Percona Server a few years ago. Leveraging thread-local storage looked like low-hanging fruit for optimizing all those pthread_getspecific() call sites without invasive code changes, but I never got around to evaluating it.
          svoj Sergey Vojtovich made changes -
          Sprint: (none) → 10.1.6-1 [ 6 ]
          svoj Sergey Vojtovich made changes -
          Rank: Ranked higher
          serg Sergei Golubchik made changes -
          Assignee: Sergei Golubchik [ serg ] → Sergey Vojtovich [ svoj ]
          Status: In Review [ 10002 ] → Stalled [ 10000 ]

          svoj Sergey Vojtovich added a comment - serg, please review another patch for this bug:

          [Commits] a5799f5: MDEV-7943 - pthread_getspecific() takes 0.76% in OLTP RO
          svoj Sergey Vojtovich made changes -
          Assignee: Sergey Vojtovich [ svoj ] → Sergei Golubchik [ serg ]
          Status: Stalled [ 10000 ] → In Review [ 10002 ]
          serg Sergei Golubchik made changes -
          Assignee: Sergei Golubchik [ serg ] → Sergey Vojtovich [ svoj ]
          Status: In Review [ 10002 ] → Stalled [ 10000 ]

          svoj Sergey Vojtovich added a comment - The number of pthread_getspecific() calls was reduced from ~1100 to 290. Further improvements (if any) will be done separately.
          svoj Sergey Vojtovich made changes -
          Fix Version/s: (none) → 10.1.6 [ 19401 ]
          Fix Version/s: 10.1 [ 16100 ] → (none)
          Resolution: (none) → Fixed [ 1 ]
          Status: Stalled [ 10000 ] → Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Workflow: MariaDB v3 [ 65208 ] → MariaDB v4 [ 149026 ]

          People

            Assignee: svoj Sergey Vojtovich
            Reporter: svoj Sergey Vojtovich
            Votes: 0
            Watchers: 4
