  MariaDB Server / MDEV-8251

MariaDB server hang with all threads idle

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 10.0.17
    • Fix Version/s: N/A
    • Component/s: Platform FreeBSD
    • Labels: None
    • Environment: FreeBSD 10.1-RELEASE-p7 / x86-64 / MariaDB 10.0.17

    Description

      I have a production MariaDB server that occasionally hangs: it refuses new connections, does not create any more threads, and stays at 0% CPU until it is forcibly restarted.

      I'm using the thread pool:
      thread_handling=pool-of-threads
      thread_pool_size=48
      thread_pool_max_threads=128
      thread_pool_idle_timeout=30

      I'm also using xtrabackup. The issue always occurs around the same time of day (once every week or two), so I suspect it may be related to xtrabackup pausing the server for the backup.

      I captured a gdb backtrace in this state, but then put a workaround in place (restarting the process) rather than trying to debug further, since the server is in production use.


        Activity

          astrange Alex Strange created issue -

          elenst Elena Stepanova added a comment:

          Hi,

          Did you, by any chance, capture two consecutive backtraces in this state, so we can be sure the server had frozen completely and was not doing anything at all?

          In the provided stack trace, at least thread 88 does not appear to be waiting on anything, so it would be useful to know whether it was really stuck in that strange place or was still doing something, just so slowly that CPU usage appeared to be 0%.
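A rough way to act on this request, sketched in Python under stated assumptions: given two gdb "thread apply all bt" dumps saved as text, the hypothetical helpers `parse_backtrace` and `compare_snapshots` below reduce each thread to the list of function names on its stack and report which threads kept an identical stack and which changed. A changed stack shows the server was still making progress; an identical stack is consistent with a hang but does not prove one, since idle threads also wait in the same place. The regular expressions assume gdb's usual "Thread N ..." and "#N ..." line formats and may need adjusting for other gdb versions.

```python
import re
from typing import Dict, List, Tuple

# Header gdb prints per thread in "thread apply all bt" output, e.g.
# "Thread 88 (Thread 0x80200a400 (LWP 100512)):" -> thread number 88.
THREAD_RE = re.compile(r"^Thread (\d+) ")
# Frame line, e.g. "#1  0x0000000801 in pthread_cond_wait () from /lib/libthr.so.3";
# captures only the function name so changing addresses and arguments are ignored.
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+)")


def parse_backtrace(text: str) -> Dict[int, Tuple[str, ...]]:
    """Map each gdb thread number to the tuple of function names on its stack."""
    stacks: Dict[int, List[str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            stacks[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            stacks[current].append(frame.group(1))
    return {tid: tuple(frames) for tid, frames in stacks.items()}


def compare_snapshots(first: str, second: str) -> Tuple[List[int], List[int]]:
    """Return (threads with identical stacks, threads whose stacks changed)."""
    a, b = parse_backtrace(first), parse_backtrace(second)
    common = sorted(a.keys() & b.keys())
    unchanged = [t for t in common if a[t] == b[t]]
    changed = [t for t in common if a[t] != b[t]]
    return unchanged, changed
```

Only threads present in both dumps are compared; a thread that appears or disappears between snapshots is itself a sign that the server was not completely frozen.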
          elenst Elena Stepanova made changes -
          Field Original Value New Value
          Labels need_feedback

          elenst Elena Stepanova added a comment:

          jplindst,

          Could you please take a look at the stack trace? Does it look like a hang to you? And if it does, have you seen it before?
          elenst Elena Stepanova made changes -
          Assignee Jan Lindström [ jplindst ]

          jplindst Jan Lindström (Inactive) added a comment:

          Hi,

          First, I suggest migrating to the latest MariaDB version, 10.0.20. If the problem repeats, I would need the following:

          (1) SHOW PROCESSLIST; (run this several times, e.g. every minute)
          (2) SHOW ENGINE INNODB STATUS; (also several times, e.g. every minute)
          (3) wait at least 600 seconds
          (4) the full, unedited error log (make sure it is enabled)
          (5) several gdb stack dumps, e.g. 3, with a 1-minute wait between them
          (6) top or similar output, several times, e.g. every minute

          From the current stack trace, it does not look like a hang.

          R: Jan
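The collection steps above could be scripted roughly as follows. This is a hedged Python sketch, not a MariaDB tool: `build_plan` and `collect` are hypothetical names; it assumes the `mysql` client can log in non-interactively (for example via an option file) while the server still accepts connections, and that gdb is installed. It uses SHOW ENGINE INNODB STATUS, the current form of the InnoDB status command. It covers the processlist, InnoDB status, gdb and top steps; the error log still has to be collected separately. Attaching gdb briefly stops mysqld, and killing gdb while it is attached may take the server down with it, so the sketch deliberately sets no timeout on any step.

```python
import platform
import subprocess
import time
from pathlib import Path
from typing import List, Tuple


def build_plan(pid: int) -> List[Tuple[str, List[str]]]:
    """Hypothetical helper: (label, argv) pairs to run in each collection round."""
    # FreeBSD's top takes "-d <count>" for the number of displays; Linux procps top uses "-n".
    if platform.system() == "FreeBSD":
        top_argv = ["top", "-b", "-d", "1"]
    else:
        top_argv = ["top", "-b", "-n", "1"]
    return [
        # Assumes the mysql client can connect without prompting (e.g. via ~/.my.cnf).
        ("processlist", ["mysql", "-e", "SHOW FULL PROCESSLIST"]),
        ("innodb_status", ["mysql", "-e", r"SHOW ENGINE INNODB STATUS\G"]),
        # -batch: run the command, then detach from mysqld and exit.
        ("gdb", ["gdb", "-batch", "-p", str(pid), "-ex", "thread apply all bt"]),
        ("top", top_argv),
    ]


def collect(pid: int, rounds: int = 3, interval_s: int = 60, outdir: str = "hang-diag") -> None:
    """Run every step `rounds` times, `interval_s` seconds apart, saving output to files."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    for i in range(rounds):
        for label, argv in build_plan(pid):
            # No timeout on purpose: killing gdb while it is attached may kill mysqld too.
            result = subprocess.run(
                argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
            )
            (out / f"round{i + 1}-{label}.txt").write_text(result.stdout)
        if i + 1 < rounds:
            time.sleep(interval_s)
```

It would be run with the process ID of mysqld, as a user allowed to attach a debugger to it, e.g. `collect(pid, rounds=3, interval_s=60)`.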
          serg Sergei Golubchik made changes -
          Assignee Jan Lindström [ jplindst ]

          elenst Elena Stepanova added a comment:

          astrange,

          Did you have a chance to upgrade? Are you still experiencing the problem?
          astrange Alex Strange added a comment:

          We're running the latest 10.0 version, but we have a workaround in place that restarts the server on a schedule, which is preventing the problem from recurring. I'll disable that.

          elenst Elena Stepanova added a comment:

          astrange,

          Did you disable the workaround? What was the result?

          elenst Elena Stepanova added a comment:

          Closing for now, but if you have any new information, please comment to re-open the issue.
          elenst Elena Stepanova made changes -
          Component/s OTHER [ 10125 ]
          Fix Version/s N/A [ 14700 ]
          Resolution Incomplete [ 4 ]
          Status Open [ 1 ] Closed [ 6 ]
          astrange Alex Strange added a comment:

          OK, I have seen it once recently with 10.0.21. I have not updated to 10.1 yet.

          I captured one backtrace (attached), but when I did cont, waited a while, and stopped again to get a second one, gdb hung. When I killed gdb, it killed mysqld as well, so the information was lost.
          astrange Alex Strange made changes -
          Attachment mariadb-hang.txt [ 40415 ]
          elenst Elena Stepanova made changes -
          Labels need_feedback
          elenst Elena Stepanova made changes -
          Resolution Incomplete [ 4 ]
          Status Closed [ 6 ] Stalled [ 10000 ]

          elenst Elena Stepanova added a comment:

          wlad, could you please look into it when you have a chance? Thanks.
          elenst Elena Stepanova made changes -
          Fix Version/s 10.0 [ 16000 ]
          Fix Version/s N/A [ 14700 ]
          Assignee Vladislav Vaintroub [ wlad ]
          wlad Vladislav Vaintroub added a comment (edited):

          I'd say thread_pool_max_threads is set too low; something like 1000 should be about right.

          From the stacks provided, it is hard to guess what happened. In the November gdb dump, it looks like the thread pool has nothing to do because the clients are idle, and the May dump shows one thread that is still active.
          If you are still able to log in with a new user, SHOW PROCESSLIST output would be nice to have.
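The first point can be checked mechanically. The sketch below is an assumption-laden Python example, not a MariaDB tool: `mysqld_options` and `check_thread_pool` are hypothetical helpers that read the [mysqld] group of a my.cnf-style text with `configparser` and flag a thread_pool_max_threads below the 1000 suggested in this comment. It does not follow !include or !includedir directives, does not understand size suffixes, and does not know the server's compiled-in defaults.

```python
import configparser
from typing import Dict, List

# Rule of thumb from this thread ("1000 or so"); an assumption, not a documented limit.
SUGGESTED_MIN_MAX_THREADS = 1000


def mysqld_options(cnf_text: str) -> Dict[str, str]:
    """Read the [mysqld] group, treating '-' and '_' in option names as equivalent."""
    parser = configparser.ConfigParser(
        allow_no_value=True,                 # my.cnf allows bare flags such as skip-name-resolve
        inline_comment_prefixes=("#", ";"),  # strip trailing "# comment" text from values
        strict=False,                        # tolerate repeated options instead of raising
        interpolation=None,                  # '%' has no special meaning in my.cnf
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return {}
    return {key.replace("-", "_"): (value or "") for key, value in parser.items("mysqld")}


def check_thread_pool(cnf_text: str) -> List[str]:
    """Return warnings about the thread pool settings (empty list if none)."""
    opts = mysqld_options(cnf_text)
    if opts.get("thread_handling") != "pool-of-threads":
        return []
    raw = opts.get("thread_pool_max_threads")
    if raw is None:
        return ["thread_pool_max_threads not set; check the default for your server version"]
    if int(raw) < SUGGESTED_MIN_MAX_THREADS:
        return [f"thread_pool_max_threads={raw} is below {SUGGESTED_MIN_MAX_THREADS}"]
    return []
```

Fed the settings from the issue description (with thread_pool_max_threads=128), it reports a single warning; with the value raised to 1000 it reports none.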

          wlad Vladislav Vaintroub added a comment:

          astrange, any news? Did this ever occur again? As I said, thread_pool_max_threads is too low. This makes the chances of a deadlock higher, at least theoretically, but in that case there would be some output in the error log.
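The deadlock risk mentioned here can be illustrated with a toy model. The Python sketch below is an assumption, not MariaDB's thread pool code: a fixed-size `ThreadPoolExecutor` stands in for a pool capped by thread_pool_max_threads, some tasks block on an event (standing in for, say, queries waiting on a lock held during a backup), and one more task, queued behind them, is the only one that can release it. `run_model` is a hypothetical name. When every worker is blocked, the releaser never gets a thread and nothing progresses; one spare worker is enough to let it run.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, wait


def run_model(max_workers: int, blocked_clients: int, timeout_s: float = 1.0) -> bool:
    """Return True if every task finished within timeout_s (toy model, not MariaDB code)."""
    released = threading.Event()

    def blocked_client() -> None:
        released.wait()  # occupies a worker until the releaser runs

    def releaser() -> None:
        released.set()   # e.g. the session that would unlock the tables

    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(blocked_client) for _ in range(blocked_clients)]
    futures.append(pool.submit(releaser))  # queued behind the blocked clients
    _, not_done = wait(futures, timeout=timeout_s)
    released.set()         # unblock everything so the pool can shut down cleanly
    pool.shutdown(wait=True)
    return len(not_done) == 0
```

For example, `run_model(2, 2)` returns False after its timeout, while `run_model(3, 2)` returns True. As far as we understand, the real pool can start extra threads when workers block, but only up to thread_pool_max_threads, which is why a higher cap makes this kind of stall less likely.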
          astrange Alex Strange added a comment:

          The issue has not occurred since I increased thread_pool_max_threads to 1000. I moved xtrabackup to a separate replicated server and will check shortly that it hasn't been seeing any problems. I'll close this if it hasn't.

          wlad Vladislav Vaintroub added a comment:

          astrange, any news? It's been a while since the last comment. Has it occurred since?
          astrange Alex Strange added a comment:

          Rarely. It doesn't coincide with xtrabackup, but it does coincide with other scheduled backups on the server that don't touch the database. I think this is a FreeBSD issue, which is of course harder to debug.

          This can be closed.
          wlad Vladislav Vaintroub made changes -
          Component/s Platform FreeBSD [ 10139 ]
          Component/s OTHER [ 10125 ]
          Fix Version/s N/A [ 14700 ]
          Fix Version/s 10.0 [ 16000 ]
          Resolution Incomplete [ 4 ]
          Status Stalled [ 10000 ] Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 69800 ] MariaDB v4 [ 149227 ]

          People

            Assignee: wlad Vladislav Vaintroub
            Reporter: astrange Alex Strange
            Votes: 0
            Watchers: 6
