[MDEV-14659] InnoDB scalability issue found in MariaDB code for complex 'select' queries on Arm platform Created: 2017-12-15 Updated: 2022-08-26 Resolved: 2022-08-26 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.3.2 |
| Fix Version/s: | 10.6.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | Sandeep sethia | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | innodb, performance |
| Environment: | Ubuntu 16.0.2 |
| Epic Link: | arm64 optimization |
| Sprint: | 10.0.34 |
| Description |
|
Hi Sergey, while testing the performance of complex queries, I see a large degradation in performance as the number of client threads increases. I used the mysqlslap benchmark to evaluate the complex queries. Table setup: two tables are populated with 4096 records and share a common column that is used to join them. Sample mysqlslap command used:
Sample output: the above command runs 24 client threads for a total of 500 queries, and repeats the same operation 10 times.
Number of clients running queries: 24
Average number of queries per client: 20
On profiling, I see that atomic operations such as cmpxchg are the hottest functions on the Arm platform:
I tried relaxing the memory order from seq_cst to acquire/relaxed, which changed the instruction pair from ldaxr/stlxr to ldaxr/stxr, but it brought little benefit. Basically, these are low-level functions that try to lock an rw-lock in S mode ("Low-level function which tries to lock an rw-lock in s-mode. Performs no"). On Intel, I don't see this function being very hot.
Can we do away with the atomic operation, given that these are only SELECT queries? If not, why can't we add spinning/pause to every lock function we try to acquire? Sample code path:
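For illustration, a minimal sketch of the kind of S-mode acquire loop described above: a shared lock word is decremented with compare-and-swap, using acquire ordering on success instead of seq_cst (the relaxation tried above). This is not the InnoDB source; the names `LockWord`, `UNLOCKED`, `try_s_lock` and `s_unlock`, and the lock-word layout, are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical lock-word layout (not InnoDB's actual constants): an
// unlocked word holds UNLOCKED, each S holder subtracts 1, and a value of
// 0 or less means an exclusive holder or waiter is present.
constexpr int32_t UNLOCKED = 0x20000000;

struct LockWord {
  std::atomic<int32_t> word{UNLOCKED};
};

// Try to take the lock in S mode without spinning or waiting.
bool try_s_lock(LockWord& l) {
  int32_t cur = l.word.load(std::memory_order_relaxed);
  while (cur > 0) {
    // Success ordering: acquire keeps the critical section after the lock.
    // Failure ordering: relaxed, since we only need the fresh value in cur.
    // compare_exchange_weak may fail spuriously where CAS is an LL/SC loop
    // (e.g. AArch64 without LSE atomics: ldaxr/stxr), so it is retried.
    if (l.word.compare_exchange_weak(cur, cur - 1,
                                     std::memory_order_acquire,
                                     std::memory_order_relaxed))
      return true;
  }
  return false;  // exclusive holder/waiter present; caller must wait
}

// Release an S lock; release ordering publishes the critical section.
void s_unlock(LockWord& l) {
  l.word.fetch_add(1, std::memory_order_release);
}
```

Even with relaxed ordering, every S-lock and unlock still writes the same cache line, so with many reader threads that line keeps moving between cores; this is consistent with the report that relaxing the memory order alone did not help much.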
|
| Comments |
| Comment by Sandeep sethia [ 2017-12-15 ] |
|
The rw_lock_lock_word_decr I quoted was mistakenly copied from MySQL, because I was verifying the code path there. The code is almost identical, so the issue is confirmed.
| Comment by Sandeep sethia [ 2017-12-15 ] |
|
I tried 4096k records in both tables and copied a few of them into one table so that the join can happen.
| Comment by Sergey Vojtovich [ 2017-12-15 ] |
|
Generally, I agree that this code is far from perfect performance-wise. We could try optimising it, but that requires refactoring the rw-locks. A much simpler option is to try adding UT_RELAX_CPU() into this loop (or MY_RELAX_CPU() if this is a recent 10.3). But none of my guesses explains why ARM is so much slower here.
| Comment by Sandeep sethia [ 2017-12-15 ] |
|
I tried adding the RELAX_CPU code to the loop, but found no improvement. I increased the relax_cpu count to 5, 10, and 50 iterations, but performance degraded. I also tried a compiler barrier to insert a delay of 50 iterations in a while loop, but saw no benefit. To answer your question, I did run with 30 and 50 iterations, but the results are the same. I am comparing MariaDB on Intel with MariaDB on ARM, but the issue seems to exist in MySQL as well. I feel that cmpxchg is not performing well on the ARM platform, so we need some other workaround.
| Comment by Sergey Vojtovich [ 2017-12-15 ] |
|
Thanks for confirming. Now I can think of only two options: false sharing and refactoring. Am I correct that the cache line size on ARM is 128 bytes?
| Comment by Sandeep sethia [ 2017-12-15 ] |
|
Yes, I believe it is 128.
| Comment by Sergey Vojtovich [ 2017-12-16 ] |
|
ssethia, could you check whether this patch makes any difference, so that we can rule out the false-sharing guess?
| Comment by Sandeep sethia [ 2017-12-17 ] |
|
I see a small improvement with the above patch, around 5-8%, but the issue persists.
| Comment by Sandeep sethia [ 2017-12-20 ] |
|
A queued spinlock in user space could be a solution, but I still need to try it. I tried adding some backoff in the while loop, but saw no benefit.
| Comment by Marko Mäkelä [ 2020-07-15 ] |
|
I wonder if
| Comment by Marko Mäkelä [ 2022-08-26 ] |
|
I think that this was fixed by