[MDEV-8251] MariaDB server hang with all threads idle - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.0.17
Fix Version/s: N/A
Component/s: Platform FreeBSD
Labels:
None
Environment:
FreeBSD 10.1-RELEASE-p7 / x86-64 / MariaDB 10.0.17

Description

I have a production MariaDB server that occasionally hangs: it refuses new connections, does not create any more threads, and stays at 0% CPU until it is forcibly restarted.

I'm using threadpool:
thread_handling=pool-of-threads
thread_pool_size=48
thread_pool_max_threads=128
thread_pool_idle_timeout=30

and xtrabackup. The issue always occurs around the same time of day (once every week or two), so I suspect it may be related to xtrabackup pausing the server to take a backup.

I captured a gdb backtrace in this state, but applied a workaround of restarting the process rather than trying to debug further, since the server is in production use.

Attachments

mariadb-hang.txt
85 kB
2015-11-13 10:05
mysql gdb.txt
166 kB
2015-06-01 06:29

Activity

Alex Strange created issue - 2015-06-01 06:32

Elena Stepanova added a comment - 2015-06-01 18:51

Hi,

Did you, by any chance, capture two consecutive backtraces in this state, so we could be sure it had completely frozen and wasn't doing anything at all?

In the provided stack trace, at least thread 88 does not appear to be waiting on anything, so it would be useful to know whether it was really stuck in this strange place, or still doing something, just so slowly that CPU usage appeared to be 0.
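The point of two consecutive backtraces is to see which threads changed between them. A minimal sketch of that comparison follows; `parse_backtrace` and `changed_threads` are hypothetical helpers, and the parsing assumes gdb's usual `thread apply all bt` layout (`Thread N (...)` headers followed by `#K  0x... in func (...)` frames). Only function names are compared, so identical stacks are a hint that a thread is stuck, not proof: a thread looping slowly inside the same function looks unchanged too.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumed gdb output shape: "Thread 88 (...):" then "#0  0x... in func (...) ..."
THREAD_RE = re.compile(r"^Thread (\d+) ")
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+)")


def parse_backtrace(text: str) -> Dict[int, Tuple[str, ...]]:
    """Map gdb thread number -> tuple of function names, innermost frame first."""
    stacks: Dict[int, List[str]] = {}
    current: Optional[int] = None
    for line in text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            stacks[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(1))
    return {tid: tuple(frames) for tid, frames in stacks.items()}


def changed_threads(before: str, after: str) -> List[int]:
    """Thread numbers whose call stacks differ between two dumps (or appear in only one)."""
    a, b = parse_backtrace(before), parse_backtrace(after)
    return sorted(tid for tid in a.keys() | b.keys() if a.get(tid) != b.get(tid))
```

An empty result from `changed_threads(first_dump, second_dump)` on dumps taken a minute or more apart would support the "totally froze" reading; a non-empty one points at the threads that were still making progress.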

Elena Stepanova made changes - 2015-06-01 18:51

Field	Original Value	New Value
Labels		need_feedback

Elena Stepanova added a comment - 2015-07-02 20:11

jplindst,

Could you please take a look at the stack trace – does it look like a hang to you? And if it does, maybe you've seen it before?

Elena Stepanova made changes - 2015-07-02 20:11

Assignee

Jan Lindström [ jplindst ]

Jan Lindström (Inactive) added a comment - 2015-07-21 15:15

Hi,

First, I suggest that you migrate to the latest MariaDB version, 10.0.20. If the problem recurs, I would need the output of the following:

(1) SHOW PROCESSLIST; (run this several times, e.g. every minute)
(2) SHOW ENGINE INNODB STATUS; (also several times, e.g. every minute)
(3) wait at least 600 s
(4) the full, unedited error log (make sure it is enabled)
(5) several gdb stack traces, e.g. 3 taken 1 minute apart
(6) top or similar output, several times, e.g. every minute

From the current stack trace, it does not look like a hang.

R: Jan
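The repeated snapshots Jan asks for can be scripted. The following is a minimal sketch, not a tool from this thread: `snapshot_commands` and `collect` are hypothetical names, the `mysql` client is assumed to find credentials in `~/.my.cnf`, and `ps` stands in for "top or similar". gdb is run with `-batch`, which attaches, dumps every thread's stack, and exits on its own, instead of the interactive cont/stop cycle that hung later in this report. The error log (step 4) still has to be collected separately.

```python
import shlex
import subprocess
import time
from typing import List


def snapshot_commands(pid: int) -> List[List[str]]:
    """Commands for one diagnostic snapshot of a possibly hung mysqld (hypothetical helper)."""
    return [
        # Assumes the mysql client reads credentials from ~/.my.cnf.
        ["mysql", "-e", "SHOW FULL PROCESSLIST"],
        ["mysql", "-E", "-e", "SHOW ENGINE INNODB STATUS"],
        # -batch: attach, print all thread stacks, then detach and exit.
        ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
        # Stand-in for "top or similar": CPU and memory of the server process.
        ["ps", "-o", "pid,pcpu,rss", "-p", str(pid)],
    ]


def collect(pid: int, rounds: int = 3, interval_s: float = 60.0, dry_run: bool = False) -> None:
    """Take `rounds` snapshots, `interval_s` seconds apart, printing everything to stdout."""
    for i in range(rounds):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for cmd in snapshot_commands(pid):
            print(f"=== round {i + 1} at {stamp}: {shlex.join(cmd)}")
            if dry_run:
                continue
            # A hung server may never answer the mysql client, so bound those calls.
            # Never kill gdb while it is attached: in this report, killing gdb
            # also killed mysqld, so gdb gets no timeout.
            timeout = None if cmd[0] == "gdb" else 120
            try:
                result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
                print(result.stdout, result.stderr)
            except subprocess.TimeoutExpired:
                print("(timed out)")
        if i + 1 < rounds:
            time.sleep(interval_s)
```

For example, `collect(pid=<mysqld pid>, dry_run=True)` only prints the commands it would run; dropping `dry_run` executes them and produces three snapshots one minute apart.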

Sergei Golubchik made changes - 2015-08-03 20:48

Assignee

Jan Lindström [ jplindst ]

Elena Stepanova added a comment - 2015-09-01 01:16

astrange,

Did you have a chance to upgrade? Are you still experiencing the problem?

Alex Strange added a comment - 2015-09-02 03:43

We're running the latest 10.0 version, but have a workaround in place that restarts the server on a schedule, which is preventing the problem from recurring. I'll disable that.

Elena Stepanova added a comment - 2015-09-30 13:35

astrange,

Did you disable the workaround? What was the result?

Elena Stepanova added a comment - 2015-10-28 19:41

Closing for now, but if you have any new information, please comment to re-open the issue.

Elena Stepanova made changes - 2015-10-28 19:41

Component/s		OTHER [ 10125 ]
Fix Version/s		N/A [ 14700 ]
Resolution		Incomplete [ 4 ]
Status	Open [ 1 ]	Closed [ 6 ]

Alex Strange added a comment - 2015-11-13 10:04

OK, I have just seen it once with 10.0.21. I have not updated to 10.1 yet.

I captured one backtrace (attached), but when I continued, waited a while, and stopped again to get a second one, gdb hung. When I killed gdb, it killed mysqld as well, so that information was lost.

Alex Strange made changes - 2015-11-13 10:05

Attachment

mariadb-hang.txt [ 40415 ]

Elena Stepanova made changes - 2015-11-13 12:05

Labels

need_feedback

Elena Stepanova made changes - 2015-11-13 12:05

Resolution	Incomplete [ 4 ]
Status	Closed [ 6 ]	Stalled [ 10000 ]

Elena Stepanova added a comment - 2016-01-21 00:45

wlad, could you please look into it when you have a chance? Thanks.

Elena Stepanova made changes - 2016-01-21 00:45

Fix Version/s		10.0 [ 16000 ]
Fix Version/s	N/A [ 14700 ]
Assignee		Vladislav Vaintroub [ wlad ]

Vladislav Vaintroub added a comment - 2016-01-21 01:48 - edited

I'd say thread_pool_max_threads is set too low; something around 1000 should be about right.

From the stacks provided, it is hard to guess what happened (in the November gdb dump, it looks like the threadpool has nothing to do because the clients are idle, and the May dump shows one thread that is still active).
If you are still able to log in as a new user, SHOW PROCESSLIST output would be nice to have.

Vladislav Vaintroub added a comment - 2016-02-10 23:29

astrange, any news? Did this ever occur again? As I said, thread_pool_max_threads is too low; this makes a deadlock more likely, at least theoretically, but in that case there would be some output in the error log.
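Why a low thread cap can make a deadlock more likely can be illustrated with a generic bounded worker pool. This Python sketch is an analogy under stated assumptions, not a model of MariaDB's threadpool internals: `simulate` and its `clients` are hypothetical, and each client task occupies a worker and then needs a second worker to finish (for instance, another request that must release a lock it is waiting on). When every worker is held by such a task, the helper work is queued but never scheduled, and the waits can only end by timing out; a higher cap lets it run.

```python
import concurrent.futures as cf
import threading


def simulate(max_workers: int, clients: int = 2, timeout: float = 0.5) -> str:
    """Return "ok" if every client task finishes, "starved" if any had to give up."""
    # The barrier makes all clients hold a worker before any of them asks for help,
    # so the outcome does not depend on thread scheduling.
    barrier = threading.Barrier(clients)
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:

        def client() -> bool:
            try:
                barrier.wait(timeout=timeout)
            except threading.BrokenBarrierError:
                return False  # not enough workers even to run every client
            # The client now depends on work that needs another pool worker.
            helper = pool.submit(lambda: True)
            try:
                return helper.result(timeout=timeout)
            except cf.TimeoutError:
                return False  # helper stayed queued: all workers are busy waiting

        futures = [pool.submit(client) for _ in range(clients)]
        results = [f.result() for f in futures]
    return "ok" if all(results) else "starved"


print(simulate(max_workers=2))  # → starved
print(simulate(max_workers=4))  # → ok
```

The timeouts stand in for the stall a real server would show; without them, the `max_workers=2` case would wait forever, much like an idle, unresponsive server.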

Alex Strange added a comment - 2016-02-11 00:00

The issue has not occurred since I increased thread_pool_max_threads to 1000. I moved xtrabackup to a separate replicated server and will check shortly that it hasn't seen any problems. I'll close this if it hasn't.

Vladislav Vaintroub added a comment - 2016-09-23 19:28

astrange, any news? It's been a while since the last comment. Has it occurred since?

Alex Strange added a comment - 2016-09-23 19:37

Rarely. It doesn't correlate with xtrabackup, but it does correlate with other scheduled backups on the server that don't touch the database. I think this is a FreeBSD issue, which is of course harder to debug.

This can be closed.

Vladislav Vaintroub made changes - 2016-09-23 20:33

Component/s		Platform FreeBSD [ 10139 ]
Component/s	OTHER [ 10125 ]
Fix Version/s		N/A [ 14700 ]
Fix Version/s	10.0 [ 16000 ]
Resolution		Incomplete [ 4 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Sergei Golubchik made changes - 2021-12-06 21:41

Workflow

MariaDB v3 [ 69800 ]

MariaDB v4 [ 149227 ]

People

Assignee: Vladislav Vaintroub

Reporter: Alex Strange

Votes: 0

Watchers: 6

Dates

Created: 2015-06-01 06:32

Updated: 2016-09-23 20:33

Resolved: 2016-09-23 20:33

MariaDB Server
