[MDEV-24869] Replication suddenly stops for N minutes in versions after 10.4.15 Created: 2021-02-15  Updated: 2021-04-25  Resolved: 2021-04-25

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.4.17, 10.5.8
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Alexandr Hacicheant Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 3
Labels: need_feedback

Issue Links:
Relates
relates to MDEV-24275 InnoDB persistent stats analyze force... Closed
relates to MDEV-24378 Crashes on Semaphore wait > 600 seconds Closed

 Description   

After updating to 10.4.17, we noticed that replication suddenly stops for N minutes at a time.
The output of SHOW ENGINE INNODB STATUS shows:

SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 440929294
--Thread 139722467284736 has waited at btr0cur.cc line 1492 for 60.00 seconds the semaphore:
SX-lock on RW-latch at 0x7ef2b8011158 created in file dict0dict.cc line 1954
a writer (thread id 139581332059904) has reserved it in mode  SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1968
OS WAIT ARRAY INFO: signal count 853851161
RW-shared spins 1894185518, rounds 11977859561, OS waits 89917364
RW-excl spins 928170891, rounds 6504095554, OS waits 24124968
RW-sx spins 18253227, rounds 252368118, OS waits 3117148
Spin rounds per wait: 6.32 RW-shared, 7.01 RW-excl, 13.83 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 60395546748
Purge done for trx's n:o < 60395546744 undo n:o < 0 state: running but idle
History list length 67
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421197791234584, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421197791230328, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421197791226072, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421197791179256, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421197791217560, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 60395546745, ACTIVE (PREPARED) 60 sec
3 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 2
MySQL thread id 12, OS thread handle 139722802640640, query id 9925262534 Waiting for prior transaction to commit
---TRANSACTION 60395546740, ACTIVE (PREPARED) 60 sec
3 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 2
MySQL thread id 14, OS thread handle 139722468513536, query id 9925262528 Waiting for prior transaction to commit
---TRANSACTION 60395546737, ACTIVE 60 sec updating or deleting
mysql tables in use 1, locked 1
2 lock struct(s), heap size 1128, 1 row lock(s)
MySQL thread id 18, OS thread handle 139722467284736, query id 9925262524 Update_rows_log_event::ha_update_row(-1)
UPDATE tableName
            SET `column` = '[1,2,3,4,5]', `updatedAt` = '2021-02-03T14:54:28Z'
            WHERE id = 1234
---TRANSACTION 60395546747, ACTIVE (PREPARED) 60 sec
3 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 2
MySQL thread id 19, OS thread handle 139581271136000, query id 9925262538 Waiting for prior transaction to commit
---TRANSACTION 60395546746, ACTIVE (PREPARED) 60 sec
3 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 2
MySQL thread id 13, OS thread handle 139722468820736, query id 9925262537 Waiting for prior transaction to commit
---TRANSACTION 60395546741, ACTIVE (PREPARED) 60 sec
3 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 2
MySQL thread id 17, OS thread handle 139722467591936, query id 9925262530 Waiting for prior transaction to commit
---TRANSACTION 60395546744, ACTIVE (PREPARED) 60 sec
3 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 2
MySQL thread id 15, OS thread handle 139722468206336, query id 9925262532 Waiting for prior transaction to commit
---TRANSACTION 60395546739, ACTIVE (PREPARED) 60 sec
3 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 2
MySQL thread id 16, OS thread handle 139722467899136, query id 9925262526 Waiting for prior transaction to commit
---TRANSACTION 421197791175000, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421197791170744, not started
0 lock struct(s), heap size 1128, 0 row lock(s)

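Also, the warning "Thread 139722467284736 has waited at btr0cur.cc line 1492 for 60.00 seconds the semaphore" may be a clue: 139722467284736 is the OS thread handle of MySQL thread id 18, which is applying the UPDATE, while the other replication worker threads are waiting for prior transactions to commit.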
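The semaphore warning quoted above can be matched against the TRANSACTIONS section by OS thread handle, which is useful when scanning many status dumps. Below is a minimal Python sketch, assuming the exact line formats quoted in this report (other server versions may print them differently). `parse_semaphore_waits`, `SemaphoreWait` and `SAMPLE` are hypothetical names for illustration, not part of any MariaDB tool. For each waiting thread it records the wait site, the wait time, the latch holder, the last write-lock site, and the MySQL thread id whose OS thread handle matches.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Patterns modelled on the SHOW ENGINE INNODB STATUS lines quoted in this
# report (assumption: other versions may format these lines differently).
WAIT_RE = re.compile(
    r"(?:--)?Thread (\d+) has waited at (\S+) line (\d+) "
    r"for ([\d.]+) seconds the semaphore"
)
HOLDER_RE = re.compile(r"a writer \(thread id (\d+)\) has reserved it in mode\s+(\S+)")
LAST_WRITE_RE = re.compile(r"Last time write locked in file (\S+) line (\d+)")
SESSION_RE = re.compile(r"MySQL thread id (\d+), OS thread handle (\d+)")


@dataclass
class SemaphoreWait:
    waiter_handle: int                     # OS thread handle of the waiting thread
    wait_site: str                         # "file:line" where the thread is blocked
    seconds: float                         # reported wait time
    holder_handle: Optional[int] = None    # thread that reserved the latch
    holder_mode: Optional[str] = None      # e.g. "SX"
    last_write_site: Optional[str] = None  # "file:line" of the last write lock
    waiter_mysql_id: Optional[int] = None  # from the TRANSACTIONS section, if found


def parse_semaphore_waits(status: str) -> List[SemaphoreWait]:
    # Map OS thread handles to MySQL thread ids using the session list.
    handle_to_id: Dict[int, int] = {
        int(handle): int(tid) for tid, handle in SESSION_RE.findall(status)
    }
    waits: List[SemaphoreWait] = []
    for line in status.splitlines():
        m = WAIT_RE.search(line)
        if m:
            handle = int(m.group(1))
            waits.append(SemaphoreWait(
                waiter_handle=handle,
                wait_site=f"{m.group(2)}:{m.group(3)}",
                seconds=float(m.group(4)),
                waiter_mysql_id=handle_to_id.get(handle),
            ))
            continue
        if not waits:
            continue  # detail lines are only meaningful after a wait header
        current = waits[-1]
        m = HOLDER_RE.search(line)
        if m:
            current.holder_handle = int(m.group(1))
            current.holder_mode = m.group(2)
            continue
        m = LAST_WRITE_RE.search(line)
        if m:
            current.last_write_site = f"{m.group(1)}:{m.group(2)}"
    return waits


# Excerpt copied from the status output in this report.
SAMPLE = """\
--Thread 139722467284736 has waited at btr0cur.cc line 1492 for 60.00 seconds the semaphore:
SX-lock on RW-latch at 0x7ef2b8011158 created in file dict0dict.cc line 1954
a writer (thread id 139581332059904) has reserved it in mode  SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1968
MySQL thread id 18, OS thread handle 139722467284736, query id 9925262524 Update_rows_log_event::ha_update_row(-1)
"""

if __name__ == "__main__":
    for wait in parse_semaphore_waits(SAMPLE):
        print(wait)
```

On this excerpt the sketch links the 60-second wait at btr0cur.cc:1492 to MySQL thread id 18 and reports dict0stats.cc:1968 as the last write-lock site. Using `search` rather than `match` lets it tolerate the variant without the leading `--` seen in the second log in this thread.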

We tried upgrading the server to 10.5.8, but that version has the same issue.

When we downgraded to 10.4.15, the problem was solved.

I hope you can find the difference and port the fix to 10.5.x.



 Comments   
Comment by Marko Mäkelä [ 2021-02-15 ]

Could this report be a duplicate of MDEV-24275?

Comment by Christian Jaentsch [ 2021-02-23 ]

Same issue here on 10.4.17. Write operations hang for several minutes.

Thread 139728565298944 has waited at btr0cur.cc line 1492 for 241.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f15040765c8 created in file dict0dict.cc line 1954
a writer (thread id 139728678123264) has reserved it in mode  SX
number of readers 0, waiters flag 1, lock_word: 10000000
Last time write locked in file dict0stats.cc line 1968

Comment by Christian Jaentsch [ 2021-02-23 ]

We have now downgraded to 10.4.15 as well. Can anyone confirm that the issue is fixed in 10.4.18?

Comment by Marko Mäkelä [ 2021-02-26 ]

cjaentsch or disc, did you test 10.4.18, which was released this week? It includes fixes for MDEV-24275 (which I would expect to cause this kind of a ‘recoverable hang’) and MDEV-24188 (an infinite loop).

Comment by Christian Jaentsch [ 2021-02-26 ]

We have only experienced this behaviour on our production system. Once it resulted in a crash and a lot of annoyed customers, so unfortunately we can't risk testing there. 10.4.15 is running stably for now. We'll wait a few more weeks and then give 10.4.18 a try.

Comment by Alexandr Hacicheant [ 2021-03-26 ]

@marko We've been running 10.4.18 for two weeks, and no replication issues have appeared.

Comment by Elena Stepanova [ 2021-04-25 ]

Assuming for now that this is a duplicate of the above-mentioned bugs.

Generated at Thu Feb 08 09:33:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.