[MDEV-10813] Clean-up InnoDB atomics, memory barriers and mutexes Created: 2016-09-15  Updated: 2020-06-10  Resolved: 2016-10-17

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Fix Version/s: 10.2.3

Type: Task Priority: Critical
Reporter: Sergey Vojtovich Assignee: Sergey Vojtovich
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Problem/Incident
causes MDEV-22859 Possible hang in InnoDB rw_lock_lock_... Closed
Relates
relates to MDEV-6113 merge 5.7 innodb Closed
relates to MDEV-6654 Combine ib_mutex_t::waiters and ib_mu... Closed
relates to MDEV-14304 Unnecessary conditions in buf_page_ge... Closed
relates to MDEV-18724 Replace buf_block_t::mutex with more ... Closed
Sprint: 10.2.3-1, 10.2.3-2

 Description   

The InnoDB atomics implementation is getting worse: the 5.7 implementation is not maintainable. We should remove it and use the atomics provided by the server instead.

InnoDB is now full of meaningless memory barriers; these should be removed along with the memory barrier implementation.

We should restore fixes for InnoDB rwlocks.

Mutexes should be cleaned up so that their code is readable. Also we should ensure that mutexes issue proper memory barriers.

NOTE: there are 4 mutex implementations in InnoDB: futex-based, spin-lock based (never used), pthread/critical_section based, and event based (the default). We should consider removing 3 of these implementations and leaving only the pthread/critical_section based solution.



 Comments   
Comment by Sergey Vojtovich [ 2016-09-28 ]

serg, please review patches in bb-10.2-mdev10813.

Comment by Sergey Vojtovich [ 2016-09-28 ]

Apparently the only failure detected by buildbot is the Windows one. The fulltest failures occur in 10.2 as well.

Comment by Vladislav Vaintroub [ 2016-09-28 ]

Apparently, the Windows failure is not exactly a sign of health. It is a crash in bootstrap.

Comment by Sergey Vojtovich [ 2016-09-28 ]

wlad, it definitely is. I was going to ask whether you would be willing to help me solve this. The problem should be fairly simple: it's startup, so no heavy concurrency should be involved. I guess either I used the wrong atomic operation (different type) or something is missing to make my_atomic.h emit the right code. I'm mostly curious about a dump of the rwlock structure.

Comment by Vladislav Vaintroub [ 2016-09-28 ]

But first the crash - it is an assertion
ut_ad(lock->lock_word <= threshold);

lock->lock_word = 0x00000000f0000000 volatile __int64
threshold = 0xfffffffff0000000 __int64

ulint, which is the datatype of lock->lock_word, is not synonymous in InnoDB with the datatype "long". It is an unsigned integer of the size of a pointer; the best standard approximation for that is the size_t datatype.

Now, long is 32-bit on all Windows, while a pointer is of course 64-bit on Win64.

Thus my_atomic_caslong and my_atomic_addlong are wrong inside sync0rw.ic.
They need to be substituted with int32 or int64 variants, I think.

Comment by Jan Lindström (Inactive) [ 2016-09-29 ]

Wish: if MDEV-6654 is relevant to the new implementation, it would be nice to fix it in this context as well.

Comment by Sergey Vojtovich [ 2016-09-29 ]

jplindst, InnoDB 5.7 comes with this idea for the futex and spin-lock implementations. I fixed the event-based one too.

Comment by Sergey Vojtovich [ 2016-09-29 ]

Just pushed a patch that attempts to fix the Win64 build failure. I don't completely like it: I'd rather change to types with a deterministic size, but that could be fairly intrusive.

Comment by Vladislav Vaintroub [ 2016-09-29 ]

Or, we can have my_atomic_cas_ssize_t and my_atomic_add_ssize_t. Absolutely anything is better than "long".

Comment by Vladislav Vaintroub [ 2016-09-29 ]

Pushed a fix for the Win64 compilation failure (missing casts) to bb-10.2-mdev10813.

Comment by Vladislav Vaintroub [ 2016-09-30 ]

This failed on OS X too: http://buildbot.askmonty.org/buildbot/builders/labrador/builds/8036/steps/compile/logs/stdio

/private/var/lib/buildslave/maria-slave/labrador/build/storage/innobase/include/sync0rw.ic:341: error: invalid conversion from '_opaque_pthread_t*' to 'int64'
/private/var/lib/buildslave/maria-slave/labrador/build/storage/innobase/include/sync0rw.ic:341: error: initializing argument 2 of 'void my_atomic_store64(volatile int64*, int64)'

Comment by Sergei Golubchik [ 2016-10-09 ]

I suppose bb-10.2-mdev10813 branch (at commit d2a4536be9) is good for review now.

Looking at the diff

Comment by Axel Schwenke [ 2016-10-13 ]

I did a round of sysbench OLTP. On our 16-core / 32-thread benchmark machine the latest changes in bb-10.2-mdev10813 make nearly no difference. Here are the numbers (queries per second; bigger is better):

# data set 01 -> mdev-10813 head
# data set 02 -> before (commit 5058ced)
 
# read only
#thd    01      02
1       9676.6  9977.2
2       21338   21649
4       39786   40760
8       71194   75239
16      137518  136853
32      195938  196105
64      191771  191567
128     188573  188105
256     191704  191259
 
# read/write (10% writes)
#thd    01      02
1       6530.6  6663.4
2       13575   13533
4       24298   24414
8       45236   45446
16      86590   87654
32      148783  154582
64      194451  196712
128     206457  206319
256     203748  203821
 
# read/write (22% writes)
#thd    01      02
1       4238.5  4221.1
2       8492.5  8608.8
4       14292   14354
8       26833   27170
16      51832   52092
32      104019  104240
64      151609  149283
128     168852  169022
256     170582  170851
 
# read/write (80% writes)
#thd    01      02
1       8890.4  9261.9
2       17155   17560
4       32284   32438
8       60511   60207
16      112681  115927
32      183503  186966
64      219418  221771
128     227306  229264
256     225811  215256

On the 64-core / 128-thread machine it looks a bit different. The benchmark is still running; I'll add the numbers when they arrive.

Comment by Axel Schwenke [ 2016-10-13 ]

Here are the numbers (again queries per second) from the 64-core / 128-thread machine. Read-only is a bit faster for the head of the tree. All in all, performance on this hardware is rather poor, probably caused by contention in the server layers above InnoDB.

# data set 01 -> mdev-10813 head
# data set 02 -> before (commit 5058ced)
 
# read only
#thd    01      02
1       8543.9  8550.4
2       16886   16951
4       32710   31471
8       60611   59759
16      84002   82922
32      61391   60169
64      38369   36371
128     39930   39849
256     20656   20974
512     23212   23343
1024    78925   73784
 
# read/write (10% writes)
#thd    01      02
1       5987.6  5849.4
2       12810   12402
4       24669   24228
8       45750   45463
16      67585   67701
32      67475   66547
64      52460   50705
128     28924   30246
256     17840   17841
512     17094   17108
1024    22048   20821
 
# read/write (22% writes)
#thd    01      02
1       4166.1  4110.3
2       9475.2  9261.5
4       16945   16598
8       31238   30851
16      53908   54055
32      64890   63974
64      49891   50204
128     23642   23546
256     16068   15895
512     15360   15484
1024    15319   15389
 
# read/write (80% writes)
#thd    01      02
1       8007.6  7786.9
2       15966   15724
4       25405   25198
8       43806   43682
16      69070   67875
32      81656   81641
64      77996   78498
128     80387   77480
256     64941   63605
512     57655   58633
1024    56908   56282

Comment by Sergey Vojtovich [ 2016-10-14 ]

According to the above, there seems to be no performance/scalability regression introduced by the relevant patches.

Generated at Thu Feb 08 07:45:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.