Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24744

MariaDB unresponsive

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Incomplete
    • 10.5.8
    • N/A
    • OTHER
    • Ubuntu 64 bits. CPU: Intel Xeon E5-1660v3 - 8c/16t - 3GHz /3.5GHz
      RAM: 128GB DDR4 ECC 2400 MHz
      Disks: SoftRaid 2x4TB

    Description

      Mariadb stops working after 3-5 weeks, this is happening 4th times after upgrading database version to 10.5.*
      and all our application servers and slave replication suspends until we restart database again

      Unfortunately there is no log, and database is running but all current connections to database becomes unresponsive but i can create new connection using SSH to database and execute queries

      For example i can execute this command and see results

      _show processlist;_
      

      but i can not kill process id, or when i execute this command i never get any response

       _show status where `variable_name` = 'Threads_connected';_
      

      When i executed this command

      _show open tables WHERE In_use > 0;_
      

      I saw almost all tables had > 1000 in_use

      It seems that something is locked and not allowing other queries to be executed
      I saw more than 2000 queries in processlist list waiting to be executed

      These are the last logs before mariadb becomes unresponsive
      The first two logs are usual and always happening

      Jan 30 15:15:45 ns535551 mariadbd[19731]: 2021-01-30 15:15:45 3221 [ERROR] mariadbd: Can't find record in 'MessageContent'
      Jan 30 15:15:45 ns535551 mariadbd[19731]: 2021-01-30 15:15:45 3221 [Warning] Slave SQL: Could not execute Update_rows_v1 event on table angel.MessageContent; Can't find record in 'MessageContent', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mariadb-bin.065897, end_log_pos 52767840, Gtid 0-1-4286777278, Internal MariaDB error code: 1032

      Jan 30 15:20:37 ns535551 mariadbd[19731]: 2021-01-30 15:20:37 322378 [Note] InnoDB: Number of pools: 2
      Jan 30 16:07:35 ns535551 mariadbd[19731]: 2021-01-30 16:07:35 323399 [Note] InnoDB: Number of pools: 3_

      and these are the first 2 queries which becomes unresponsive

      select  pe.id as id,ac.name as name, pe.birthDate as birthDate,pe.sex as sex,'*' as aboutMe, '*' as aboutOpponent, pe.status as status,'*' as profileHeader,  pe.loginDate as loginDate,(select count(1) from FileData fd where fd.personId=pe.id and fd.status='A' and  (fd.indicator='P' or fd.indicator is null) and (fd.albumId is null or fd.albumId=0))  as photoCount, 0 as videoCount,  pe.lat as lat,pe.crDate as crDate,  pe.height as height,  pe.weight as weight, (select CONVERT(group_concat(fd.id) USING 'utf8') from FileData fd where fd.personId=pe.id and fd.status='A' and  (fd.indicator='P' or fd.indicator is null) and (fd.albumId is null or fd.albumId=0))  as photos, 0 as privatePhotoCount, pe.userAgent as userAgent,pe.lng as lng, 0 as travelOfferCount,0 as travelSeekerCount, pe.profilePhotoId as profilePhotoId,  ac.membership as membership,ac.verified as verified,  pe.currCity as currCity,pe.currCountry as currCountry,  pe.avatarId as avatarId,  100*st_distance(location,  POINT(41.1144 ,28.9967)) as cdistance , pe.distance as distance, ac.lang as lang,'*' as email,'*' as password,'*' as username, pe.currCountryCode as currCountryCode,  pe.autoLocation as autoLocation,  pe.ipAddress,  ac.phoneVerified,  ac.photoVerified, '*' as score, '*' as gift,  pe.hideMyAge,  pe.region, '*' as interests, '*' as activities, '*' as music, '*' as tvShows, '*' as movies, '*' as books, (select amount*unix_timestamp(startDate) from CoinUsage use index (coinusage_profileidservicestatus_index) where profileId=pe.id and  service='TS' and status='A' and endDate > now() limit 1) as topServiceValue  from Person as pe   use index for order by (person_statusphotologindate_index)  left join Account ac on ac.id=pe.id  where ac.membership not in  ('R','X','D','E')  and ac.id not in (select profileId from ProfileValidation)  and pe.sex='F'  and ac.id  not in (select secondPersonId from Friend where relationType=2 and firstPersonId=4607996  and secondPersonId is not null) and st_contains(ST_GEOMFROMTEXT( 'Polygon((42.01372160591873 30.190387034446076, 42.01372160591873 27.803012965553926,40.215078394081274 27.803012965553926, 40.215078394081274 30.190387034446076, 42.01372160591873 30.190387034446076))'),pe.location) order by pe.status desc, pe.hasPhoto desc,pe.loginDate desc  limit 50
      

      select  pe.id as id,ac.name as name, pe.birthDate as birthDate,pe.sex as sex,'*' as aboutMe, '*' as aboutOpponent, pe.status as status,'*' as profileHeader,  pe.loginDate as loginDate,(select count(1) from FileData fd where fd.personId=pe.id and fd.status='A' and  (fd.indicator='P' or fd.indicator is null) and (fd.albumId is null or fd.albumId=0))  as photoCount, 0 as videoCount,  pe.lat as lat,pe.crDate as crDate,  pe.height as height,  pe.weight as weight, (select CONVERT(group_concat(fd.id) USING 'utf8') from FileData fd where fd.personId=pe.id and fd.status='A' and  (fd.indicator='P' or fd.indicator is null) and (fd.albumId is null or fd.albumId=0))  as photos, 0 as privatePhotoCount, pe.userAgent as userAgent,pe.lng as lng, 0 as travelOfferCount,0 as travelSeekerCount, pe.profilePhotoId as profilePhotoId,  ac.membership as membership,ac.verified as verified,  pe.currCity as currCity,pe.currCountry as currCountry,  pe.avatarId as avatarId,  100*st_distance(location,  POINT(36.75 ,3.05)) as cdistance , pe.distance as distance, ac.lang as lang,'*' as email,'*' as password,'*' as username, pe.currCountryCode as currCountryCode,  pe.autoLocation as autoLocation,  pe.ipAddress,  ac.phoneVerified,  ac.photoVerified, '*' as score, '*' as gift,  pe.hideMyAge,  pe.region, '*' as interests, '*' as activities, '*' as music, '*' as tvShows, '*' as movies, '*' as books, (select amount*unix_timestamp(startDate) from CoinUsage use index (coinusage_profileidservicestatus_index) where profileId=pe.id and  service='TS' and status='A' and endDate > now() limit 1) as topServiceValue  from Person as pe   use index for order by (person_statusphotologindate_index)  left join Account ac on ac.id=pe.id  where (ac.username = '%' or ac.name like '%')   and ac.membership not in ('X','D','E') order by pe.status desc, pe.hasPhoto desc,pe.loginDate desc  limit 50
      

      I just had an idea about the problem, as you pay attention to following part at my last query

      (ac.username = '%' or ac.name like '%') 
      

      Database was unresponsive 4 times before and we always had this part at the last queries
      since, query parameter is empty, all Account table data with other joins are loaded into memory, it is > 3 million rows and > 50 rows, I guess as a result of this, buffers are overloaded and system stops responding to other requests

      We will fix query but even if we have this case, system must be responsive because we reorder results list and fetch only 50 rows from it.

      Attachments

        Activity

          People

            Unassigned Unassigned
            mehmet Mehmet
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.