[MDEV-8251] MariaDB server hang with all threads idle
Created: 2015-06-01
Updated: 2016-09-23
Resolved: 2016-09-23

Status: Closed
Project: MariaDB Server
Component/s: Platform FreeBSD
Affects Version/s: 10.0.17
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Alex Strange
Assignee: Vladislav Vaintroub
Resolution: Incomplete
Votes: 0
Labels: None
Environment:

FreeBSD 10.1-RELEASE-p7 / x86-64 / MariaDB 10.0.17


Attachments: mariadb-hang.txt, mysql gdb.txt

 Description   

I have a production MariaDB server that occasionally hangs: it refuses new connections, does not create any more threads, and stays at 0% CPU until it is forcibly restarted.

I'm using threadpool:
thread_handling=pool-of-threads
thread_pool_size=48
thread_pool_max_threads=128
thread_pool_idle_timeout=30

and xtrabackup. The issue always occurs around the same time of day (once every week or two), so I suspect it may be related to xtrabackup pausing the server to take a backup.

I captured a gdb backtrace in this state, but then restarted the process as a workaround rather than trying to debug further, since the server is in production use.



 Comments   
Comment by Elena Stepanova [ 2015-06-01 ]

Hi,

Did you, by any chance, happen to capture two consecutive backtraces in this state, so we could be sure it had frozen completely and wasn't doing anything at all?

In the provided stack trace, at least thread 88 does not appear to be waiting on anything, so it would be useful to understand whether it was really stuck in this strange place, or whether it was still doing something, albeit so slowly that the CPU usage appeared to be 0.

Comment by Elena Stepanova [ 2015-07-02 ]

jplindst,

Could you please take a look at the stack trace – does it look like a hang to you? And if it does, maybe you've seen it before?

Comment by Jan Lindström (Inactive) [ 2015-07-21 ]

Hi,

Firstly, I suggest migrating to the latest MariaDB version, 10.0.20. If the problem repeats, I would need output from the following:

(1) SHOW PROCESSLIST; (do this several times, e.g. every 1 minute)
(2) SHOW ENGINE INNODB STATUS; (do this several times as well, e.g. every 1 minute)
(3) wait at least 600s
(4) provide the full unedited error log (make sure you have it enabled)
(5) take several gdb stack outputs, e.g. 3 with a 1 minute wait between them
(6) check top or similar several times, e.g. every 1 minute
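The collection steps above could be scripted roughly as follows. This is only a sketch under assumptions not stated in this report: the `mysql` client and `gdb` are on the PATH, client credentials come from an option file, and `SHOW ENGINE INNODB STATUS` is used as the statement for the InnoDB status output. The helper names `build_commands` and `collect` are hypothetical. The top sampling step is left out because the flags for non-interactive top differ between platforms.

```python
import subprocess
import time
from datetime import datetime, timezone

# Assumption: credentials are supplied by a client option file, so no
# user/password arguments are passed here.
SQL_SNAPSHOTS = ["SHOW FULL PROCESSLIST", "SHOW ENGINE INNODB STATUS"]


def build_commands(pid):
    """Return the command lines for one snapshot round (hypothetical helper)."""
    cmds = [["mysql", "-e", sql] for sql in SQL_SNAPSHOTS]
    # Batch mode: attach, print every thread's stack, then detach and exit.
    cmds.append(["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"])
    return cmds


def collect(pid, rounds=3, interval=60, timeout=120,
            run=subprocess.run, sleep=time.sleep):
    """Run `rounds` snapshot rounds, `interval` seconds apart.

    Returns a list of (utc_timestamp, command, output) tuples.
    """
    results = []
    for i in range(rounds):
        stamp = datetime.now(timezone.utc).isoformat()
        for cmd in build_commands(pid):
            try:
                # Caution: on timeout the child is killed. Killing an attached
                # gdb may take the server down with it, so keep `timeout`
                # generous for a production process.
                proc = run(cmd, capture_output=True, text=True, timeout=timeout)
                output = proc.stdout + proc.stderr
            except (subprocess.TimeoutExpired, OSError) as exc:
                # A hung server may refuse connections; record that as data.
                output = f"failed: {exc}"
            results.append((stamp, cmd, output))
        if i < rounds - 1:
            sleep(interval)
    return results
```

Calling `collect(pid)` with the mysqld process ID yields three rounds one minute apart; a failed or timed-out `mysql` call is itself useful evidence that the server is refusing connections. The error log still has to be gathered separately.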

From the current stack trace, it does not look like a hang.

R: Jan

Comment by Elena Stepanova [ 2015-09-01 ]

astrange,

Did you have a chance to upgrade? Are you still experiencing the problem?

Comment by Alex Strange [ 2015-09-02 ]

We're running the latest 10.0 version, but have a workaround in place where it gets restarted on a schedule, which is preventing it from reoccurring. I'll disable that.

Comment by Elena Stepanova [ 2015-09-30 ]

astrange,

Did you disable the workaround? What was the result?

Comment by Elena Stepanova [ 2015-10-28 ]

Closing for now, but if you have any new information, please comment to re-open the issue.

Comment by Alex Strange [ 2015-11-13 ]

OK, I have seen it once with 10.0.21 just recently. Have not updated to 10.1 yet.

I captured one backtrace (attached), but when I did cont/wait a while/stop again to get a second one, gdb hung. When I killed gdb it killed mysqld as well, so info was lost.

Comment by Elena Stepanova [ 2016-01-21 ]

wlad, could you please look into it when you have a chance? Thanks.

Comment by Vladislav Vaintroub [ 2016-01-21 ]

I'd say thread_pool_max_threads is set too low; perhaps 1000 or so would be about right.

From the stacks provided it is hard to guess what happened (in the November gdb dump it looks like the thread pool has nothing to do because the clients are idle, and the May dump shows one thread that is still active).
If you are still able to log in with a new user, SHOW PROCESSLIST output would be nice to have.

Comment by Vladislav Vaintroub [ 2016-02-10 ]

astrange, any news? Did this ever occur again? As I said, thread_pool_max_threads is too low; this makes the chances of a deadlock higher, at least theoretically, but in that case there would be some output in the error log.
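As a rough illustration of the setting under discussion, the following sketch flags a thread pool configuration whose thread_pool_max_threads is below a chosen value. The 1000 figure is only the suggestion made in this thread, not an official rule, and the helper names `parse_options` and `check_thread_pool` are hypothetical.

```python
def parse_options(text):
    """Parse 'key=value' lines in my.cnf style into a dict.

    Blank lines, comments and [section] headers are skipped; dashes in
    option names are normalised to underscores.
    """
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        opts[key.strip().replace("-", "_")] = value.strip()
    return opts


def check_thread_pool(opts, suggested_max=1000):
    """Return warnings for a thread pool setup with a low thread cap.

    `suggested_max` is an assumption taken from this discussion.
    """
    warnings = []
    if opts.get("thread_handling") != "pool-of-threads":
        return warnings  # thread pool not enabled, nothing to check
    if "thread_pool_max_threads" not in opts:
        return warnings  # server default applies; not judged here
    max_threads = int(opts["thread_pool_max_threads"])
    if max_threads < suggested_max:
        warnings.append(
            f"thread_pool_max_threads={max_threads} is below {suggested_max}"
        )
    return warnings
```

Fed the four settings from the description, `check_thread_pool(parse_options(...))` returns a single warning about the value 128.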

Comment by Alex Strange [ 2016-02-11 ]

The issue has not occurred since I increased max threads to 1000. I moved the xtrabackup job to a separate replicated server and will check shortly whether that server has seen any problems. I'll close this if it hasn't.

Comment by Vladislav Vaintroub [ 2016-09-23 ]

astrange, any news? It's been a while since the last comment. Has it occurred since?

Comment by Alex Strange [ 2016-09-23 ]

Rarely. It doesn't track xtrabackup, but does track other scheduled backups on the server that don't touch the database. I think this is a FreeBSD issue… harder to debug one of those of course.

This can be closed.
