[MDEV-27930] halts on simple update Created: 2022-02-23  Updated: 2022-04-08  Resolved: 2022-04-08

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.5.14
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Anton Petin Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: None
Environment:

10.5.14-MariaDB



 Description   

Today we founded a problem, all our slaves (about 7) halts on the same simple update request.

Such updates we have about 1k per second.
All slaves halts for about 3k +- seconds with this update.

In the error_log there is no any error strings about broken tables or so on.
Table about 300Gb

But in the
show ENGINE INNODB STATUS\G

SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 368051
OS WAIT ARRAY INFO: signal count 102855006
RW-shared spins 488816321, rounds 725777394, OS waits 174145
RW-excl spins 47801661, rounds 104209191, OS waits 23569
RW-sx spins 6190, rounds 89642, OS waits 2454
Spin rounds per wait: 1.48 RW-shared, 2.18 RW-excl, 14.48 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 52986560748
Purge done for trx's n:o < 52986560748 undo n:o < 0 state: running
History list length 1
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 52986560743, ACTIVE 4077 sec fetching rows
mysql tables in use 1, locked 1
2948218 lock struct(s), heap size 309649528, 107520080 row lock(s)
MySQL thread id 32697781, OS thread handle 140184323569408, query id 5358576609 Updating
update orders_history set comm_amount = '0', comm_amount_btc = '0', comm_amount_usdt = '0', 
                                        rate = '0', status = '1', amount_processed = NULL,amount_comm = '0.0',amount_usd_comm = '0.00000000',
                                        amount_comm_btc = '0', amount_processed_btc = '0',amount_usd_comm_btc = '0', ws = '1', amount_usd = '0.0151800000'
                                 WHERE id = '6650624433'
---TRANSACTION 421757010240208, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421757010235944, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
---TRANSACTION 421757010227416, not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
Pending flushes (fsync) log: 0; buffer pool: 0
7160284 OS file reads, 3106599900 OS file writes, 2440913 OS fsyncs
1 pending reads, 0 pending writes
1007.03 reads/s, 16384 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 60258, free list len 117901, seg size 178160, 837646 merges
merged operations:
 insert 3120906, delete mark 7425274, delete 3244254
discarded operations:
 insert 4009, delete mark 3174, delete 3174
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 43069848715182
Log flushed up to   43069848715182
Pages flushed up to 43067012044564
Last checkpoint at  43067012044564
0 pending log flushes, 0 pending chkp writes
3076190170 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 96670318592
Dictionary memory allocated 573942576
Buffer pool size   5806800
Free buffers       0
Database pages     5787901
Old database pages 2136530
Modified db pages  82836
Percent of dirty pages(LRU & free pages): 1.431
Max dirty pages percent: 90.000
Pending reads 1
Pending writes: LRU 0, flush list 0
Pages made young 2664314, not young 184580935
0.00 youngs/s, 32689.62 non-youngs/s
Pages read 7136189, created 1948676, written 30182641
1007.03 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 969 / 1000, young-making rate 0 / 1000 not 999 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 5787901, unzip_LRU len: 0
I/O sum[57798]:cur[2683], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 read views open inside InnoDB
Process ID=0, Main thread ID=0, state: sleeping
Number of rows inserted 24630023, updated 360585336, deleted 14268884, read 2556790303
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 31686.22 reads/s
Number of system rows inserted 2596316843, updated 0, deleted 2596316863, read 2596316908
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================

And very huge hdd reads iops.
As i see the problem in the "1 pending reads"

So what you advice to do and how to solve it?

All slaves halts on same update.



 Comments   
Comment by Anton Petin [ 2022-02-24 ]

Also after my investigation, some bug in our application sends an invalid "id" in update where clause, it's "6650624433" but the ID field is INT, not BIGINT
But when i try to do update on such ID directly on slave - there is no any lag

Comment by Daniel Black [ 2022-02-24 ]

Glad you resolved this Anton. Thanks for updating this issue.

Comment by Anton Petin [ 2022-02-24 ]

No, as i said, if i try to do it directly on my slave there is no lag, but via replication it halts....i think that something wrong....

Comment by Daniel Black [ 2022-02-24 ]

Opps. The number of row locks make it look like its doing a table scan. Is id the primary/unique key of orders_history?

binlog_format=ROW might help ifs its a single row.

Is there a lag if the id value isn't quoted?

Comment by Anton Petin [ 2022-02-24 ]

As i said, if i do such request directly on slave - there is no a problem.

If such request goes from master via binlog - yea - full scan of table, ssd fully loaded, but this key not exists, this is must be millisec check. Cause ID is a PRIMARY KEY
I'm using Maxscale and perl DBI with placeholders, quote or not quote not in my permission =)

Comment by Anton Petin [ 2022-02-24 ]

and replication with MIXED binlog format.

Comment by Andrei Elkin [ 2022-02-24 ]

antonp1976: could you please upload the slave stack threads ?
Also does

All slaves halts for about 3k +- seconds with this update

the slave unhalt after those 3 000 +/- seconds? (I am guessing this can't be though, so probably it just takes long time to process replication events?)

Comment by Anton Petin [ 2022-02-25 ]

Some slaves 3k secs, one slave with raid sync on at this moment - 15k secs

Comment by Anton Petin [ 2022-02-25 ]

As i said before thos id not exists which is where clause

Comment by Sergei Golubchik [ 2022-03-10 ]

What does EXPLAIN UPDATE show if you run it directly? What does ANALYZE UPDATE show?

Generated at Thu Feb 08 09:56:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.