[MDEV-14659] InnoDB scalability issue found in MariaDB code for complex 'select' queries on Arm platform - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.3.2
Fix Version/s: 10.6.0
Component/s: Storage Engine - InnoDB
Labels:
- innodb
- performance
Environment:
ubuntu 16.0.2

Epic Link:
arm64 optimization
Sprint:
10.0.34

Description

Hi Sergey,

While testing the performance of complex queries, I see a huge degradation in performance as the number of client threads increases. I used the mysqlslap benchmark to evaluate the complex queries.

Background of tables: Two tables, each populated with 4096 records, share a common column that is used to join the two tables.

Sample mysqlslap command used:

mysqlslap -uroot --concurrency=24 --create-schema=test --no-drop --number-of-queries=500 --iterations=10 --query='select count(*), category from task inner join incident on task.sys_id=incident.sys_id group by incident.category' -p

Sample output: The above command runs 24 concurrent client threads that issue 500 queries in total, and repeats the same run 10 times. Number of clients running queries: 24. Average number of queries per client: 20.
Attached is the time taken, in seconds, for each thread count tested (for example 1, 8, and 16 threads).

On profiling, I see that atomic compare-and-swap (cmpxchg) operations are the hottest spot.

Arm platform:

       │                                 MY_MEMORY_ORDER_RELAXED);
       │             while (lock_copy > threshold) {
  0.00 │     ↓ b.le   a0cf14 <pfs_rw_lock_s_lock_func(rw_lock_t*, unsigned long, char const*, unsigned int) [clone .constprop.112]+0x1
       │                     if (my_atomic_cas32_strong_explicit(&lock->lock_word,
  0.04 │       ldr    w2, [x29,#80]
  8.68 │ f0:   ldaxr  w0, [x19]
  0.00 │       cmp    w0, w2
  0.01 │     ↓ b.ne   a0cefc <pfs_rw_lock_s_lock_func(rw_lock_t*, unsigned long, char const*, unsigned int) [clone .constprop.112]+0x1
 88.19 │       stxr   w3, w1, [x19]
  0.00 │     ↑ cbnz   f0
  2.29 │104: ↑ b.ne   a0ced4 <pfs_rw_lock_s_lock_func(rw_lock_t*, unsigned long, char const*, unsigned int) [clone .constprop.112]+0xd
       │     rw_lock_s_lock_low():

Samples: 15M of event 'cycles:ppp', Event count (approx.): 9845312975908

Overhead  Command  Shared Object  Symbol
  32.58%  mysqld   mysqld         [.] pfs_rw_lock_s_lock_func
  26.85%  mysqld   mysqld         [.] row_search_mvcc
  18.42%  mysqld   mysqld         [.] pfs_rw_lock_s_unlock_func
  11.13%  mysqld   mysqld         [.] pfs_rw_lock_s_unlock_func

I tried to relax the memory order from seq_cst to acquire/relaxed, which changed the generated instruction pair from ldaxr/stlxr to ldaxr/stxr, but it brought little benefit.
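
For readers who want to reproduce the memory-order experiment outside the server, here is a minimal sketch using std::atomic in place of my_atomic_cas32_strong_explicit. The function names (s_lock_try, s_lock_try_seq_cst, s_lock_try_acquire) are hypothetical and not part of MariaDB; only the loop shape follows the profiled code. Compiling both wrappers for AArch64 and comparing the disassembly is one way to check for the ldaxr/stlxr versus ldaxr/stxr difference reported above.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not MariaDB code): try to take a shared lock by
// decrementing `word` while it stays above `threshold`.
// `Success` is the memory order applied when the CAS succeeds; the failure
// order is relaxed because a failed CAS publishes nothing.
template <std::memory_order Success>
bool s_lock_try(std::atomic<int32_t>& word, int32_t amount, int32_t threshold) {
    assert(amount > 0);
    int32_t seen = word.load(std::memory_order_relaxed);
    while (seen > threshold) {
        // On failure, compare_exchange_strong stores the current value in `seen`,
        // so the loop condition is re-evaluated against fresh data.
        if (word.compare_exchange_strong(seen, seen - amount, Success,
                                         std::memory_order_relaxed)) {
            return true;
        }
    }
    return false;
}

// Sequentially consistent variant (the starting point of the experiment).
inline bool s_lock_try_seq_cst(std::atomic<int32_t>& word, int32_t amount,
                               int32_t threshold) {
    return s_lock_try<std::memory_order_seq_cst>(word, amount, threshold);
}

// Relaxed variant: acquire on success, relaxed on failure.
inline bool s_lock_try_acquire(std::atomic<int32_t>& word, int32_t amount,
                               int32_t threshold) {
    return s_lock_try<std::memory_order_acquire>(word, amount, threshold);
}
```

Both variants still perform an exclusive store to the same cache line on every successful acquisition, which would be consistent with the observation that relaxing the order alone did not help much.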

Basically, these are low-level functions that try to lock an rw-lock in s-mode and perform no spinning.

On Intel, I don't see this function being hot:

   0.02%  mysqld   mysqld   [.] pfs_rw_lock_s_lock_func
   0.02%  mysqld   mysqld   [.] buf_page_get_gen
   0.02%  mysqld   mysqld   [.] page_cur_search_with_match_bytes
   0.02%  mysqld   mysqld   [.] row_search_mvcc
   0.02%  mysqld   mysqld   [.] pfs_rw_lock_s_lock_fun

Can we do away with the atomic operations, since these are select queries? If not, why can't we add spinning/pause to every lock function that tries to acquire the lock?
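
To illustrate what "spinning/pause" could look like, here is a rough sketch of a retry loop that backs off with a CPU relax hint after each failed compare-and-swap. Everything in it is an assumption made for illustration: cpu_relax, s_lock_try_with_pause, the max_spins budget, and the doubling backoff capped at 64 are hypothetical and are not what InnoDB does. The choice of `yield` on AArch64 is also just one option.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical CPU relax hint. The per-architecture choices are assumptions;
// the fallback is only a compiler barrier.
inline void cpu_relax() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_ia32_pause();
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ __volatile__("yield" ::: "memory");
#else
    std::atomic_signal_fence(std::memory_order_seq_cst);
#endif
}

// Hypothetical s-lock attempt with backoff. Returns true once the decrement
// succeeds. Returns false if the word is at or below `threshold` (lock not
// available in s-mode) or if `max_spins` attempts were used up.
bool s_lock_try_with_pause(std::atomic<int32_t>& word, int32_t amount,
                           int32_t threshold, unsigned max_spins) {
    assert(amount > 0);
    unsigned delay = 1;
    for (unsigned attempt = 0; attempt < max_spins; ++attempt) {
        // Plain load first, so waiting threads do not issue exclusive stores.
        int32_t seen = word.load(std::memory_order_relaxed);
        if (seen <= threshold) {
            return false;
        }
        if (word.compare_exchange_weak(seen, seen - amount,
                                       std::memory_order_acquire,
                                       std::memory_order_relaxed)) {
            return true;
        }
        // Lost the race (or spurious failure): pause before retrying.
        for (unsigned i = 0; i < delay; ++i) {
            cpu_relax();
        }
        if (delay < 64) {
            delay *= 2;
        }
    }
    return false;
}
```

Whether such a backoff helps on a given Arm core would need to be measured with the same mysqlslap workload; this sketch only shows the shape of the idea.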

Sample code path

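bool
rw_lock_lock_word_decr(
/*===================*/
	rw_lock_t*	lock,		/*!< in/out: rw-lock */
	ulint		amount,		/*!< in: amount to decrement */
	lint		threshold)	/*!< in: threshold of judgement */
{
#ifdef INNODB_RW_LOCKS_USE_ATOMICS
	lint local_lock_word;

	os_rmb;
	local_lock_word = lock->lock_word;
	while (local_lock_word > threshold) {
		/* Try to decrement; failure means another thread changed lock_word. */
		if (os_compare_and_swap_lint(&lock->lock_word,
					     local_lock_word,
					     local_lock_word - amount)) {
			return(true);
		}
		/* Reload and retry while the word is still above the threshold. */
		local_lock_word = lock->lock_word;
	}
	return(false);
#endif /* INNODB_RW_LOCKS_USE_ATOMICS (non-atomic branch not shown in this excerpt) */
}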
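
To make the roles of amount and threshold in the code path above concrete, here is a self-contained model of the lock word using std::atomic. It is a simplified sketch, not the InnoDB implementation: the constants kXLockDecr and kXLockHalfDecr and the calling convention (readers call with amount 1 and threshold 0, a writer calls with kXLockDecr and kXLockHalfDecr) are assumptions chosen for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed values for illustration: the lock word starts at kXLockDecr when
// the lock is free.
constexpr int32_t kXLockDecr = 0x20000000;
constexpr int32_t kXLockHalfDecr = 0x10000000;

// Model of the decrement loop: subtract `amount` from `word` as long as the
// current value is above `threshold`. Returns false without modifying the
// word once the value is at or below the threshold.
bool lock_word_decr(std::atomic<int32_t>& word, int32_t amount,
                    int32_t threshold) {
    assert(amount > 0);
    int32_t seen = word.load(std::memory_order_relaxed);
    while (seen > threshold) {
        if (word.compare_exchange_strong(seen, seen - amount,
                                         std::memory_order_acquire,
                                         std::memory_order_relaxed)) {
            return true;
        }
        // `seen` now holds the value another thread installed; retry.
    }
    return false;
}

// Release one shared hold by adding the reader's decrement back.
void s_unlock(std::atomic<int32_t>& word) {
    word.fetch_add(1, std::memory_order_release);
}
```

In this model every reader writes to the shared word twice (once to lock, once to unlock), so even a read-only workload keeps moving the same cache line between cores. Once a writer has subtracted kXLockDecr, the word drops to zero or below and further calls with threshold 0 fail until the writer is done.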

Attachments

thread.png
11 kB
2017-12-15 09:05

Issue Links

is blocked by

MDEV-24142 rw_lock_t has unnecessarily complex wait logic

Closed

is part of

MDEV-14442 Optimization for ARM64 platform.

Open

relates to

MDEV-22850 Reduce buf_pool.page_hash latch contention

Closed

MDEV-22871 Contention on the buf_pool.page_hash

Closed

People

Assignee: Marko Mäkelä

Reporter: Sandeep sethia

Votes: 0

Watchers: 7

Dates

Created: 2017-12-15 09:07

Updated: 2022-08-26 10:00

Resolved: 2022-08-26 10:00
