Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Incomplete
-
10.5.8
-
Ubuntu 64 bits. CPU: Intel Xeon E5-1660v3 - 8c/16t - 3GHz /3.5GHz
RAM: 128GB DDR4 ECC 2400 MHz
Disks: SoftRaid 2x4TB
Description
Mariadb stops working after 3-5 weeks, this is happening 4th times after upgrading database version to 10.5.*
and all our application servers and slave replication suspends until we restart database again
Unfortunately there is no log, and database is running but all current connections to database becomes unresponsive but i can create new connection using SSH to database and execute queries
For example i can execute this command and see results
_show processlist;_
|
but i can not kill process id, or when i execute this command i never get any response
_show status where `variable_name` = 'Threads_connected';_ |
When i executed this command
_show open tables WHERE In_use > 0;_ |
I saw almost all tables had > 1000 in_use
It seems that something is locked and not allowing other queries to be executed
I saw more than 2000 queries in processlist list waiting to be executed
These are the last logs before mariadb becomes unresponsive
The first two logs are usual and always happening
Jan 30 15:15:45 ns535551 mariadbd[19731]: 2021-01-30 15:15:45 3221 [ERROR] mariadbd: Can't find record in 'MessageContent'
Jan 30 15:15:45 ns535551 mariadbd[19731]: 2021-01-30 15:15:45 3221 [Warning] Slave SQL: Could not execute Update_rows_v1 event on table angel.MessageContent; Can't find record in 'MessageContent', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mariadb-bin.065897, end_log_pos 52767840, Gtid 0-1-4286777278, Internal MariaDB error code: 1032
Jan 30 15:20:37 ns535551 mariadbd[19731]: 2021-01-30 15:20:37 322378 [Note] InnoDB: Number of pools: 2
Jan 30 16:07:35 ns535551 mariadbd[19731]: 2021-01-30 16:07:35 323399 [Note] InnoDB: Number of pools: 3_
and these are the first 2 queries which becomes unresponsive
select pe.id as id,ac.name as name, pe.birthDate as birthDate,pe.sex as sex,'*' as aboutMe, '*' as aboutOpponent, pe.status as status,'*' as profileHeader, pe.loginDate as loginDate,(select count(1) from FileData fd where fd.personId=pe.id and fd.status='A' and (fd.indicator='P' or fd.indicator is null) and (fd.albumId is null or fd.albumId=0)) as photoCount, 0 as videoCount, pe.lat as lat,pe.crDate as crDate, pe.height as height, pe.weight as weight, (select CONVERT(group_concat(fd.id) USING 'utf8') from FileData fd where fd.personId=pe.id and fd.status='A' and (fd.indicator='P' or fd.indicator is null) and (fd.albumId is null or fd.albumId=0)) as photos, 0 as privatePhotoCount, pe.userAgent as userAgent,pe.lng as lng, 0 as travelOfferCount,0 as travelSeekerCount, pe.profilePhotoId as profilePhotoId, ac.membership as membership,ac.verified as verified, pe.currCity as currCity,pe.currCountry as currCountry, pe.avatarId as avatarId, 100*st_distance(location, POINT(41.1144 ,28.9967)) as cdistance , pe.distance as distance, ac.lang as lang,'*' as email,'*' as password,'*' as username, pe.currCountryCode as currCountryCode, pe.autoLocation as autoLocation, pe.ipAddress, ac.phoneVerified, ac.photoVerified, '*' as score, '*' as gift, pe.hideMyAge, pe.region, '*' as interests, '*' as activities, '*' as music, '*' as tvShows, '*' as movies, '*' as books, (select amount*unix_timestamp(startDate) from CoinUsage use index (coinusage_profileidservicestatus_index) where profileId=pe.id and service='TS' and status='A' and endDate > now() limit 1) as topServiceValue from Person as pe use index for order by (person_statusphotologindate_index) left join Account ac on ac.id=pe.id where ac.membership not in ('R','X','D','E') and ac.id not in (select profileId from ProfileValidation) and pe.sex='F' and ac.id not in (select secondPersonId from Friend where relationType=2 and firstPersonId=4607996 and secondPersonId is not null) and st_contains(ST_GEOMFROMTEXT( 'Polygon((42.01372160591873 30.190387034446076, 42.01372160591873 27.803012965553926,40.215078394081274 27.803012965553926, 40.215078394081274 30.190387034446076, 42.01372160591873 30.190387034446076))'),pe.location) order by pe.status desc, pe.hasPhoto desc,pe.loginDate desc limit 50 |
select pe.id as id,ac.name as name, pe.birthDate as birthDate,pe.sex as sex,'*' as aboutMe, '*' as aboutOpponent, pe.status as status,'*' as profileHeader, pe.loginDate as loginDate,(select count(1) from FileData fd where fd.personId=pe.id and fd.status='A' and (fd.indicator='P' or fd.indicator is null) and (fd.albumId is null or fd.albumId=0)) as photoCount, 0 as videoCount, pe.lat as lat,pe.crDate as crDate, pe.height as height, pe.weight as weight, (select CONVERT(group_concat(fd.id) USING 'utf8') from FileData fd where fd.personId=pe.id and fd.status='A' and (fd.indicator='P' or fd.indicator is null) and (fd.albumId is null or fd.albumId=0)) as photos, 0 as privatePhotoCount, pe.userAgent as userAgent,pe.lng as lng, 0 as travelOfferCount,0 as travelSeekerCount, pe.profilePhotoId as profilePhotoId, ac.membership as membership,ac.verified as verified, pe.currCity as currCity,pe.currCountry as currCountry, pe.avatarId as avatarId, 100*st_distance(location, POINT(36.75 ,3.05)) as cdistance , pe.distance as distance, ac.lang as lang,'*' as email,'*' as password,'*' as username, pe.currCountryCode as currCountryCode, pe.autoLocation as autoLocation, pe.ipAddress, ac.phoneVerified, ac.photoVerified, '*' as score, '*' as gift, pe.hideMyAge, pe.region, '*' as interests, '*' as activities, '*' as music, '*' as tvShows, '*' as movies, '*' as books, (select amount*unix_timestamp(startDate) from CoinUsage use index (coinusage_profileidservicestatus_index) where profileId=pe.id and service='TS' and status='A' and endDate > now() limit 1) as topServiceValue from Person as pe use index for order by (person_statusphotologindate_index) left join Account ac on ac.id=pe.id where (ac.username = '%' or ac.name like '%') and ac.membership not in ('X','D','E') order by pe.status desc, pe.hasPhoto desc,pe.loginDate desc limit 50 |
I just had an idea about the problem, as you pay attention to following part at my last query
(ac.username = '%' or ac.name like '%') |
Database was unresponsive 4 times before and we always had this part at the last queries
since, query parameter is empty, all Account table data with other joins are loaded into memory, it is > 3 million rows and > 50 rows, I guess as a result of this, buffers are overloaded and system stops responding to other requests
We will fix query but even if we have this case, system must be responsive because we reorder results list and fetch only 50 rows from it.