Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30481

Hard lock up with queries in "Opening tables" state

    XMLWordPrintable

Details

    Description

      We have a lot of servers that are using this same set up and configuration, and on one of them recently we started having a really odd issue that I can't explain.

      The short version is that one of the databases in the server stops responding. When we review the process list, it shows that queries are stuck in the "Opening tables" state. Other databases on this same server are still responding normally at this time.

      Killing the SQL processes from the mysql command line doesn't work. No error, it just doesn't kill it.

      Then, if we try to issue the standard systemctl restart mariadb, it looks like it tried to start shutting down, but never can. At that point, the process list looks like this (username and database name have been replaced):

      | 36649 | USERNAME | localhost | DATABASE | Killed  | 4044 | Opening tables | SELECT post_id, meta_key, meta_value FROM wp_postmeta WHERE post_id IN (3210213,3209688,1894564,1578 |    0.000 |
      | 36650 | USERNAME | localhost | DATABASE | Killed  | 4038 | Opening tables | SELECT post_id, meta_key, meta_value FROM wp_postmeta WHERE post_id IN (3210213,3209688,1894564,1578 |    0.000 |
      | 36653 | USERNAME | localhost | DATABASE | Killed  | 4038 | Opening tables | SELECT post_id, meta_key, meta_value FROM wp_postmeta WHERE post_id IN (3210213,3209688,1894564,1578 |    0.000 |
      ...
      ...
      ...
      

      Each time this has happened, we ended up having to issue a "kill -9" on the Mariadb master process to get it back. Once we did that, it started right back up and is running normally again.

      I have found no errors at the system level and Mariadb is not even recording any errors. Logging is still working because Mariadb does log some things during the event, like when a user doesn't use the right password. But there are zero errors logged.

      Just reviewing the documentation about this, it says that it could be caused by table_open_cache settings. On this server, there are about 1000 tables, including the tables in sys and mysql, etc. And we're using the default value of 2000 for table_open_cache right now. So I can't see how it could have anything to do with that setting.

      It's a difficult issue to debug since there is basically no data recorded about the problem and it's basically just locked up when I get to it.

      Any ideas?

      Attachments

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              wk_bradp Brad
              Votes:
              3 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.