Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version: 10.0.25
    • Fix Version: N/A
    • Component: Plugin - unix_socket
    • Environment: Linux Debian (10.0.25-MariaDB-0+deb8u1 (Debian))

    Description

      I have a standard LAMP setup with PHP-FPM connecting to the MariaDB server on the local host via a Unix domain socket.

      I've had an issue that has been getting progressively worse over the last few weeks: for some reason the MariaDB server just stops responding to the PHP processes. I can log into the MariaDB console and run "SHOW PROCESSLIST", and I can see that all the PHP processes are connected to the database but are all in the sleeping state.

      All the PHP processes are in a spin lock waiting for results from the database, but the database is not responding. The PHP processes run at 100% CPU, while MariaDB is at about 5% CPU. If I stop the database, the PHP processes stop spinning. The same problem occurs whether I use PHP 5 or PHP 7 processes.

      This state continues until I log into the console and type a command like "SHOW PROCESSLIST;" or "SET GLOBAL general_log = OFF;", and for some reason this sparks the database back into action. Restarting the database does not fix the problem; after a few transactions, it locks up again.

      The database seems to lock and sometimes unlock itself, blocking all PHP processes while it is locked.

      There is nothing in any of the log files, and I've tried changing every setting that I thought would help. Nothing has fixed the issue, and I don't know which resource limit to increase.

      The problem is getting worse as the database has grown larger and the query volume has increased.

      Attachments

        Activity

          4dallasc Dallas Clarke added a comment -

          Upping the innodb_buffer_pool_size to 12GB has helped, but the issue still exists. The MariaDB process now has twice the memory and is doing far less work, but something is still blocking it.

          Also, the khugepaged process runs at the start of each event.
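khugepaged activity alongside a stalled mysqld often points at transparent huge pages (THP); a commonly recommended mitigation for database servers is to disable THP. This is a hedged sketch, not a confirmed fix for this ticket; the sysfs paths are standard on Debian kernels, but verify them on your system:

```shell
# Check the current transparent huge page setting (the active value appears in brackets).
cat /sys/kernel/mm/transparent_hugepage/enabled

# Disable THP for the current boot (requires root). khugepaged compaction
# can stall memory-heavy processes such as mysqld, which is why many
# database vendors recommend turning THP off.
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```

Note this setting does not persist across reboots; it would need to be reapplied from an init script or kernel boot parameter.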

          4dallasc Dallas Clarke added a comment -

          After a dramatic crash of the system last night during a peak period, and after repeatedly restarting Apache, php-fpm, and MariaDB, I was out of ideas on how to fix the system and in desperation rebooted it. Like a classic old Windows server, that fixed the problems, and the system is now well behaved and responsive. Who knew that Linux is becoming more and more like Windows.

          So it looks like the root cause of the bug was within Linux and not MariaDB, although better error messages and warnings from MariaDB would help system administrators resolve such issues more predictably. Logging warnings about inefficient queries would also help system administrators identify problems before they become problems.

          Looking at it as a learning experience, I now know most of the configurable options of a LAMP server, and mine can now handle about 50% greater load.

          4dallasc Dallas Clarke added a comment -

          After rebooting, the problem steadily returns, and I'm now considering rebooting the server every night.

          I've also discovered this problem on a second AWS server, where process connections are randomly being blocked. With HHVM, those threads are blocked for hours.

          Is there any method of automatically killing a connection that has been sleeping for more than 10 seconds?
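For reference, MariaDB can drop idle connections server-side via its timeout variables. A minimal sketch follows; the socket path is an assumption, and the 10-second value simply mirrors the question above (as noted in a later comment, it is likely far too strict for production):

```shell
# Set the idle-connection timeouts at runtime (takes effect for new connections).
# wait_timeout governs non-interactive clients such as PHP-FPM;
# interactive_timeout governs interactive clients like the mysql console.
mysql --socket=/var/run/mysqld/mysqld.sock -e "
  SET GLOBAL wait_timeout = 10;
  SET GLOBAL interactive_timeout = 10;
"

# To persist across server restarts, add to the [mysqld] section of my.cnf:
#   wait_timeout = 10
#   interactive_timeout = 10
```

A connection sleeping past the timeout is closed by the server, and the client sees a "server has gone away" style error on its next query, so the application must be prepared to reconnect.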

          elenst Elena Stepanova added a comment - edited

          Probably 10 seconds is far too strict, but you might want to check and adjust wait_timeouts.
          However, nothing in the provided information indicates that the ball is in the server's court. The connections are there; they are not hanging on any queries; they only spent a few seconds each in 'sleep', so apparently they were doing something else before. There is not enough information to draw any conclusions, but the first guess is: if PHP takes a lot of CPU, and if you mostly catch database connections in the 'sleep' state, it might mean that PHP has some kind of trouble processing information (either incoming requests or result sets from the server).

          1) set up a process which will collect SHOW PROCESSLIST output every few seconds, to see exactly how it progresses;
          2) enable your general log again for a while and check the log of a connection from the beginning. Importantly, see whether it sets AUTOCOMMIT=0 at the beginning, and if it does, whether it issues proper COMMITs while sending queries to the server;
          3) check the disk activity and health; it's suspicious that the general_log was such an obstacle. It is bound to affect performance, of course, but on an instance with 14 parallel connections it should not be a big issue unless something is seriously wrong with the disk or the log itself;
          4) the request for a gdb stack trace of all threads still applies. In the log above, you were trying to use the mysql binary, so naturally it didn't work. You need mysqld.
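The first and fourth suggestions above can be sketched as shell commands. The log paths and the 5-second interval are arbitrary assumptions; adjust as needed:

```shell
#!/bin/sh
# 1) Collect SHOW PROCESSLIST output every few seconds, with a timestamp,
#    so the progression of connection states can be reviewed afterwards.
while true; do
    date >> /tmp/processlist.log
    mysql -e "SHOW FULL PROCESSLIST" >> /tmp/processlist.log
    sleep 5
done &

# 4) Dump stack traces of all server threads. Note that gdb must attach to
#    the running mysqld daemon (not the mysql client) by PID.
gdb -p "$(pidof mysqld)" -batch -ex "thread apply all bt" > /tmp/mysqld_stacks.txt
```

Attaching gdb briefly pauses the server, so on a struggling production instance the stack dump is best taken during a quiet moment or while the hang is actually in progress.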


          Please comment if you have further information on the issue.


          People

            Assignee: Unassigned
            Reporter: 4dallasc Dallas Clarke
            Votes: 0
            Watchers: 3

            Dates

              Created:
              Updated:
              Resolved:
