[MDEV-15222] Connections hanging in 'checking permissions' state - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Won't Fix
Affects Version/s: 5.5.45, 5.5.59
Fix Version/s: N/A
Component/s: OTHER
Labels:
None
Environment:
CentOS release 6.7 (Final)

Description

We are consistently experiencing a blocking issue on our main database server.

Symptoms:

The enduser keeps waiting forever after a query to get appointments (this is a huge query with a lot of subqueries that normally takes only seconds).
On the server we see the connections that have the issue in the 'checking permissions' state for a long time (more then 2 minutes).
There are a many connections in that state simultaneously and their active time just keeps on going up.
When we kill the hanging connections sometimes the problem gets solved, but not always (The client automatically reconnects and retries the query.)
We have around a 1000 schemas and when the problem happens it only happens for a few of them. We have no clue yet what is different for them.
The issue appears usually in the early afternoon when the load on the server is higher.
We had the issue the first time a few months ago. Now it is back every day since a week.
The application is a swing application that connects to the database via jdbc.
The customers connect from all over canada to our database server directly. When the issue happens we can reproduce it also from our offices where we have a fat internet line.
We see nothing appear in the mysql error logs.
mysqld runs on a 31G 8cpu machine and it takes 17G. When the issue is manifesting mysqld takes up to 6 of the 8 cpus at 100%.

They have lasted from 100 seconds upwards of 2000 seconds, and then either clear out or they restart mysqld.

It is only a certain query that causes this, and while the query is run against multiple schemas, it generally only hangs on a couple of the schemas. Note that most of the time this query runs fast (couple seconds), but only every now and again, this hanging in "checking permissions" starts. Generally the load is higher, but not always.

They are using MariaDB 5.5.59 (latest 5.5), but it also occurred on an earlier 5.5.45. I had them upgrade to rule that out. Also, early indications seemed like their table_cache and table_definition_cache (and max_open_tables) were quite low. So we increased those significantly, but the problem still persists.

Nothing is logged to the error log about it.

The oldest active transaction is the one hung, so it is confusing what could be blocking this.

For instance, here are the last 3 transactions from the latest SHOW ENGINE INNODB STATUS when this occurred:

---TRANSACTION 2A765CFDB, ACTIVE 2044 sec

mysql tables in use 12, locked 0

MySQL thread id 95, OS thread handle 0x7f59a8d2a700, query id 74203968 X.X.X.X db1 checking permissions

select ...

Trx read view will not see trx with id >= 2A765CFDC, sees < 2A765458C

---TRANSACTION 2A7657FA5, ACTIVE 2061 sec

mysql tables in use 12, locked 0

MySQL thread id 224, OS thread handle 0x7f59a5a00700, query id 352097814 X.X.X.X db1 checking permissions

select ...

Trx read view will not see trx with id >= 2A7657FA6, sees < 2A765458C

---TRANSACTION 2A7654ADE, ACTIVE 2075 sec

mysql tables in use 12, locked 0

MySQL thread id 76, OS thread handle 0x7f59a9295700, query id 70280283 X.X.X.X db1 checking permissions

select ...

Trx read view will not see trx with id >= 2A7654ADF, sees < 2A765458C

----------------------------

END OF INNODB MONITOR OUTPUT

============================

Attachments

Issue Links

relates to

MDEV-15649 optimize search for user/db privileges

Closed

Activity

Ascending order - Click to sort in descending order

Sergei Golubchik added a comment - 2018-02-11 18:20

Do you have a lot of rows in the mysql.user or mysql.db table? What does

select count(*) from mysql.user;

select count(*) from mysql.db;

return?

Sergei Golubchik added a comment - 2018-02-11 18:20 Do you have a lot of rows in the mysql.user or mysql.db table? What does select count (*) from mysql. user ; select count (*) from mysql.db; return?

Chris Calender (Inactive) added a comment - 2018-02-12 18:55 - edited

Hello serg

Here are the results:

select count(*) from mysql.user = 567

select count(*) from mysql.db = 566

Chris Calender (Inactive) added a comment - 2018-02-12 18:55 - edited Hello serg Here are the results: select count(*) from mysql.user = 567 select count(*) from mysql.db = 566

Sergei Golubchik added a comment - 2018-02-13 18:03 - edited

According to the stack traces (trace apply all bt in gdb) there are 5 threads stuck in acl_getroot() (called from set_routine_security_ctx() from sp_head::execute_function()), trying to lock the acl_cache->lock mutex. And one thread in in strcmp() invoked from acl_getroot().

Looks like a clear case showing that linear scanning or user/db lists is slow. Still, it cannot possibly take minutes, so might be coupled with some scheduler glitch?

Anyway, there are few ideas how to speed up acl_getroot():

ignore the definer completely if the function does not access any tables (seems to be the case here)
cache the security context in the sp_head, to avoid calling acl_getroot() on every invocation
generally, get rid of linear scanning, and only scan for entries with wildcards, while using a hash for entries without.

Sergei Golubchik added a comment - 2018-02-13 18:03 - edited According to the stack traces ( trace apply all bt in gdb) there are 5 threads stuck in acl_getroot() (called from set_routine_security_ctx() from sp_head::execute_function() ), trying to lock the acl_cache->lock mutex. And one thread in in strcmp() invoked from acl_getroot() . Looks like a clear case showing that linear scanning or user/db lists is slow. Still, it cannot possibly take minutes, so might be coupled with some scheduler glitch? Anyway, there are few ideas how to speed up acl_getroot() : ignore the definer completely if the function does not access any tables (seems to be the case here) cache the security context in the sp_head , to avoid calling acl_getroot() on every invocation generally, get rid of linear scanning, and only scan for entries with wildcards, while using a hash for entries without.

People

Assignee:: Sergei Golubchik

Reporter:: Chris Calender (Inactive)

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 2018-02-06 16:26

Updated:: 2024-07-08 00:22

Resolved:: 2019-07-16 19:46

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server