[MDEV-24870] Got error 128 - Out of memory in engine, from time to time Created: 2021-02-15  Updated: 2021-07-20

Status: Open
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.5.8
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Oli Sennhauser Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None
Environment: Linux



 Description   

From time to time we see the following message in the MariaDB error log:

2021-02-14 19:21:43 53526568 [ERROR] Got error 128 when reading table './enswitch/active'

No other messages appear for a while before or after it. Most of the time it does no harm, but sometimes connections pile up until max_connections is reached, and then we have to fail over to the slave.

The system configuration is not optimal. We still have to tune it.
RAM: 512G, InnoDB buffer pool: 320G (50% of pages free!); some other variables are also sub-optimal.

The connection causing this should be running the following queries:

select count(*) as existing from active where server='?' and callid='?' limit 1
select recording from active where server='?' and uniqueid='?' limit 1
select uniqueid from active where ( stype='?' and snumber='?' and stransferred=? and processing=? and park='?' ) or ( dtype='?' and dnumber='?' and dtransferred=? and processing=? and park='?' ) for update
insert ignore into active ( server, uniqueid, callid, sip_callid, start, answered, scustomer, stype, snumber, spresent, stransferred, ctype, cnumber, dcustomer, dtype, dnumber, callerid_internal, callerid_external, callername, ingroup, outgroup, peer, channel, recording, queue, queue_status, card, overmax, note ) values ( '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?')
select server, delivery_server, dtype, dnumber from active where ( ( stype='?' and snumber='?' and stransferred=? ) or ( dtype='?' and dnumber='?' and dtransferred=? ) ) and ( server!='?' or uniqueid!='?' ) and processing=? order by start desc limit 1

I will update the ticket when we have more information.



 Comments   
Comment by Oli Sennhauser [ 2021-02-18 ]

We changed all configuration variables to (IMHO) reasonable values:
innodb_buffer_pool_size = 160G
join_buffer_size, read_buffer_size, read_rnd_buffer_size and sort_buffer_size back to their defaults
disabled P_S (Performance Schema) again
innodb_log_files_in_group 80 -> 1
innodb_open_files = default

But it did not help. We still get the error at random on a hotspot table:

2021-02-16 8:46:33 1623505 [ERROR] Got error 128 when reading table './xxx/active'
2021-02-16 8:50:16 1645219 [ERROR] Got error 128 when reading table './xxx/active'
2021-02-16 9:13:33 1796018 [ERROR] Got error 128 when reading table './xxx/active'
2021-02-16 9:14:30 1803493 [ERROR] Got error 128 when reading table './xxx/active'
2021-02-16 9:20:25 1852460 [ERROR] Got error 128 when reading table './xxx/active'
2021-02-16 9:22:41 1871102 [ERROR] Got error 128 when reading table './xxx/active'
2021-02-16 9:35:17 1984802 [ERROR] Got error 128 when reading table './xxx/active'
2021-02-16 9:37:10 2000628 [ERROR] Got error 128 when reading table './xxx/active'
2021-02-16 9:45:53 2077023 [ERROR] Got error 128 when reading table './xxx/active'
2021-02-16 9:55:05 2156673 [ERROR] Got error 128 when reading table './xxx/active'
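To check whether these errors cluster at particular times, the log lines can be bucketed per hour and the gaps between them measured. The following is only a rough sketch, assuming the error log keeps exactly the line format quoted above; the names parse_line, errors_per_hour, gaps_in_seconds and SAMPLE are made up for illustration and are not part of any MariaDB tooling.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line format, taken from the lines quoted in this ticket:
# "<date> <time> <thread id> [ERROR] Got error <N> when reading table '<path>'"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2})\s+(\d{1,2}:\d{2}:\d{2})\s+(\d+)\s+\[ERROR\]\s+"
    r"Got error (\d+) when reading table '([^']+)'"
)

# Three of the lines from the comment above, used as sample input.
SAMPLE = [
    "2021-02-16 8:46:33 1623505 [ERROR] Got error 128 when reading table './xxx/active'",
    "2021-02-16 8:50:16 1645219 [ERROR] Got error 128 when reading table './xxx/active'",
    "2021-02-16 9:13:33 1796018 [ERROR] Got error 128 when reading table './xxx/active'",
]


def parse_line(line):
    """Return (timestamp, thread_id, error_code, table), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    date, time, thread_id, code, table = m.groups()
    ts = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return ts, int(thread_id), int(code), table


def errors_per_hour(lines, code=128):
    """Count matching errors per (table, hour bucket)."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None or parsed[2] != code:
            continue
        ts, _, _, table = parsed
        counts[(table, ts.strftime("%Y-%m-%d %H:00"))] += 1
    return counts


def gaps_in_seconds(lines, code=128):
    """Return the gaps, in seconds, between consecutive matching errors."""
    parsed = (parse_line(line) for line in lines)
    stamps = sorted(p[0] for p in parsed if p is not None and p[2] == code)
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    for key, n in sorted(errors_per_hour(SAMPLE).items()):
        print(key, n)
    print(gaps_in_seconds(SAMPLE))
```

Feeding the whole error log through errors_per_hour instead of SAMPLE would show whether the errors line up with peak load on the hotspot table or are spread evenly over the day.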

We assume this is a bug and will wait for the next release.

Comment by Oli Sennhauser [ 2021-03-18 ]

10.5.9 did NOT fix the issue...

Comment by Oli Sennhauser [ 2021-03-18 ]

Error messages of the same type occur with partitioning (https://jira.mariadb.org/browse/MDEV-19613), but partitioning is NOT used in our scenario.

Comment by Maurice Gasco [ 2021-07-20 ]

I get the same error on 10.4.18

Generated at Thu Feb 08 09:33:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.