[MDEV-10482] PHP FPM Connection problem Created: 2016-08-02  Updated: 2016-10-24  Resolved: 2016-10-24

Status: Closed
Project: MariaDB Server
Component/s: Plugin - unix_socket
Affects Version/s: 10.0.25
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Dallas Clarke Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

Linux Debian (10.0.25-MariaDB-0+deb8u1 (Debian))


Attachments: Text File Engine Innodb Status.txt     PNG File MariaDB Crunch.png     Text File Show Global Status.txt     Text File Show Global Variables.txt    

 Description   

I have a standard LAMP setup with PHP-FPM connecting to the MariaDB server on the local host via a Unix domain socket.

I've been hitting an issue that has been getting steadily worse over the last few weeks: for some reason the MariaDB server just stops responding to the PHP processes. I can log onto the MariaDB console and run "SHOW PROCESSLIST", and I can see that all the PHP processes are connected to the database, but they are all in the sleeping state.

All the PHP processes are in a spin lock waiting for results from the database, but the database is not responding. The PHP processes run at 100% CPU while MariaDB sits at about 5%. If I stop the database, the PHP processes stop spinning. The same problem exists whether I use php5 or php7 processes.

This state continues until I log into the console and type a command like "SHOW PROCESSLIST;" or "set global general_log_file = OFF;", which for some reason sparks the database into action. Restarting the database does not fix the problem; after a few transactions, it locks up again.

The database seems to lock up and sometimes recover on its own, but it blocks all PHP processes while it is locked.

There is nothing in any of the log files, and I've tried changing as many of the settings as I thought would help. Nothing has fixed the issue, and I don't know which resource limit to increase.

The problem is getting worse as the database has grown larger and the volume of queries has increased.



 Comments   
Comment by Daniel Black [ 2016-08-02 ]

Are you actually using a unix socket authenticated user for the PHP fpm connections?

Are you short of RAM?

Running with the general log enabled in production doesn't seem like a good idea performance-wise; however, if you tail it, are queries actually going through?

Have you enabled/looked at the slow query log?

Are the PHP processes starving MariaDB of CPU?

SHOW PROCESSLIST, SHOW ENGINE INNODB STATUS, SHOW GLOBAL STATUS and SHOW GLOBAL VARIABLES output would be useful.
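One way to capture those four outputs while a stall is in progress (this assumes the mysql command-line client can still connect over the local socket; add -u/-p options as needed for your setup):

```shell
# Dump each requested diagnostic into its own file so it can be attached to the ticket.
mysql -e "SHOW FULL PROCESSLIST\G"     > processlist.txt
mysql -e "SHOW ENGINE INNODB STATUS\G" > innodb_status.txt
mysql -e "SHOW GLOBAL STATUS"          > global_status.txt
mysql -e "SHOW GLOBAL VARIABLES"       > global_variables.txt
```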

Using https://mariadb.com/kb/en/mariadb/debugging-a-running-server-on-linux/ to get a backtrace would also be useful (even if you are running the packaged version and not a custom-built one).

Comment by Dallas Clarke [ 2016-08-03 ]

The system has 30GB of RAM with about 7GB free at any one time. It has 16 CPU cores and I've limited the FPM processes to 14 to prevent starvation, plus I have dropped the process priority of the FPM processes to 39 while lifting the MariaDB process to 0.

It seems to happen as the database has gotten larger and inefficient queries are executed. It is potentially a disk I/O problem as well, as the server is about 18 months old.

I had one really bad query that searched a newsletter subscriber table for an email address and, if it wasn't already there, added it. There are about 170K subscribers and no index on the field, and when multiple people did this at the same time it could have been a cause of deadlock. I have since added an index on the email field; the problem still exists, but the database now gets through it in about 20 seconds.
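For reference, the "insert if absent" pattern described above can be made both fast and race-free with a UNIQUE index rather than a plain one (the database, table and column names here are hypothetical, since the schema isn't shown in the ticket):

```shell
# A UNIQUE index lets the existence check use the index, and makes the insert
# atomic: concurrent duplicates are rejected by the server instead of racing
# a SELECT-then-INSERT pair in application code.
mysql newsletter -e "
  ALTER TABLE subscribers ADD UNIQUE INDEX idx_email (email);
  INSERT IGNORE INTO subscribers (email) VALUES ('user@example.com');
"
```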

The event happens constantly: the FPM processes lock up waiting for the database, but the database is just sleeping. Then about 10-20 seconds later the DB will just jump into gear, run at 100% CPU and serve the FPM processes. Things settle down, then 10-30 seconds later we go through the same cycle again.

Comment by Dallas Clarke [ 2016-08-03 ]

Attaching the debugger locked up the entire box.

admin@ip-172-31-17-205:~$ sudo gdb -p 39839 /usr/bin/mysql
GNU gdb (Debian 7.7.1+dfsg-5) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/bin/mysql...(no debugging symbols found)...done.
Attaching to program: /usr/bin/mysql, process 39839
Cannot access memory at address 0x894c000002aa8417
Cannot access memory at address 0x894c000002aa840f
(gdb) set height 0
(gdb) set logging file /tmp/mysql.log
(gdb) set logging on
Copying output to /tmp/mysql.log.
(gdb) thread apply all backtrace full

Thread 1 (process 39839):
#0 0x00007f57e5347d3d in ?? ()
No symbol table info available.
#1 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) exit
Undefined command: "exit". Try "help".
(gdb) quit
A debugging session is active.

Inferior 1 [process 39839] will be detached.

Quit anyway? (y or n) y
Detaching from program: /usr/bin/mysql, process 39839

Comment by Daniel Black [ 2016-08-03 ]

Attaching /tmp/mysql.log is important.

Your innodb_buffer_pool_size is 128M - far too small - 12G may be more appropriate.
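A sketch of the corresponding my.cnf change, assuming a box dedicated mostly to the database; the 12G figure is the suggestion above, while the log file size is only an illustrative value, not a recommendation from this ticket (both settings require a server restart on 10.0):

```ini
[mysqld]
innodb_buffer_pool_size = 12G    # ~40% of the 30GB of RAM, per the suggestion above
innodb_log_file_size    = 512M   # illustrative; larger helps write-heavy workloads
```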

It's quite possible that your innodb_log_file_size is too small if you do lots of writes.

Comment by Dallas Clarke [ 2016-08-03 ]

The only thing in the mysql.log file was:

Thread 1 (process 39839):
#0 0x00007f57e5347d3d in ?? ()
No symbol table info available.
#1 0x0000000000000000 in ?? ()
No symbol table info available.
Undefined command: "exit". Try "help".
A debugging session is active.

Inferior 1 [process 39839] will be detached.

Quit anyway? (y or n) Detaching from program: /usr/bin/mysql, process 39839

The entire box stops. I can't run the debugger as there are about 200 current users.

Comment by Daniel Black [ 2016-08-03 ]

You can drive gdb from a script so it runs non-interactively. That should probably be documented.
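A minimal non-interactive variant, attached to the server process (mysqld) rather than the mysql client, so a wedged terminal can't leave a live debugging session behind:

```shell
# Batch mode: gdb runs the given commands, writes the output, and detaches
# on its own instead of waiting at an interactive prompt.
PID=$(pidof mysqld)
gdb -p "$PID" --batch \
    -ex "set height 0" \
    -ex "thread apply all backtrace full" > /tmp/mysqld_backtrace.log 2>&1
```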

Comment by Dallas Clarke [ 2016-08-03 ]

Upping the innodb_buffer_pool_size to 12GB has helped, but the issue still exists. The MariaDB process now has twice the memory and is doing far less work, but something is still blocking it.

Also the khugepaged process is being called at the start of an event.

Comment by Dallas Clarke [ 2016-08-04 ]

After a dramatic crash of the system last night during a peak period, and after repeatedly restarting Apache, php-fpm and MariaDB, I was out of ideas on how to fix the system and in desperation rebooted it. Like a classic old Windows server, that fixed the problems and the system is now well behaved and responding promptly. Who knew that Linux was becoming more and more like Windows?

So it looks like the root cause of the bug was within Linux and not MariaDB, although better error messages and warnings from MariaDB would help system administrators resolve such issues more predictably. Logging warnings about inefficient queries would also help system administrators identify problems before they become problems.

Looking at the learning experience, I now know most of the configurable options of a LAMP server, and mine can now handle about 50% greater load.

Comment by Dallas Clarke [ 2016-08-15 ]

After rebooting, the problem steadily returns, and I'm now considering rebooting the server every night.

I've also discovered this problem on a second AWS server, where process connections are randomly being blocked. And with HHVM, those threads are being blocked for hours.

Is there any method of automatically killing a process that has been sleeping for more than 10 seconds?

Comment by Elena Stepanova [ 2016-09-25 ]

Probably 10 seconds is far too strict, but you might want to check and adjust wait_timeout.
However, nothing in the provided information indicates that the ball is in the server's court. The connections are there; they are not hanging on any queries; they have only spent a few seconds each in 'Sleep', so apparently they were doing something else before. There is not enough information to draw any conclusions, but the first guess is: if PHP takes a lot of CPU, and you mostly catch database connections in the 'Sleep' state, it might mean that PHP has some kind of trouble processing information (either incoming requests or result sets from the server).
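For reference, wait_timeout is the number of seconds the server waits on an idle connection before closing it. A sketch of adjusting it (120 is an arbitrary example value, and a SET GLOBAL change only applies to connections opened after the change):

```shell
mysql -e "SET GLOBAL wait_timeout = 120;"
# To persist it across restarts, set it in my.cnf under [mysqld]:
#   wait_timeout = 120
```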

1) set up a process which will collect SHOW PROCESSLIST output every few seconds, to see how exactly it progresses;
2) enable your general log again for a while and check the log of a connection from the beginning. Importantly, see whether it sets AUTOCOMMIT=0 at the start, and if it does, whether it issues proper COMMITs while sending queries to the server;
3) check the disk activity and health; it's suspicious that the general_log was such an obstacle. It is bound to affect performance, of course, but on an instance with 14 parallel connections it should not be a big issue unless something is seriously wrong with the disk or the log itself;
4) the request for a gdb stack trace of all threads still applies. In the log above you were attaching to the mysql client binary, so naturally it didn't work. You need mysqld.
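Step 1 above can be as simple as a timestamped loop (the log path and 5-second interval are arbitrary choices):

```shell
# Append a timestamped snapshot of the process list every 5 seconds, so the
# progression through a stall can be reconstructed afterwards.
while true; do
    { date; mysql -e "SHOW FULL PROCESSLIST"; } >> /tmp/processlist.log
    sleep 5
done
```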

Comment by Elena Stepanova [ 2016-10-24 ]

Please comment if you have further information on the issue.

Generated at Thu Feb 08 07:42:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.