[MDEV-15222] Connections hanging in 'checking permissions' state Created: 2018-02-06 Updated: 2019-07-16 Resolved: 2019-07-16 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | OTHER |
| Affects Version/s: | 5.5.45, 5.5.59 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Chris Calender (Inactive) | Assignee: | Sergei Golubchik |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS release 6.7 (Final) |
||
| Description |
|
We are consistently experiencing a blocking issue on our main database server. Symptoms:
The hangs have lasted from 100 seconds to upwards of 2000 seconds, and then either clear out on their own or they restart mysqld. Only one particular query causes this, and while the query is run against multiple schemas, it generally hangs on only a couple of them. Most of the time this query runs fast (a couple of seconds); only every now and again does it start hanging in "checking permissions". Load is generally higher when this happens, but not always.

They are using MariaDB 5.5.59 (latest 5.5), but the problem also occurred on the earlier 5.5.45; I had them upgrade to rule the version out. Early indications also suggested that their table_cache and table_definition_cache (and max_open_tables) were quite low, so we increased those significantly, but the problem still persists. Nothing about it is logged to the error log.

The oldest active transaction is the hung one, so it is unclear what could be blocking it. For instance, here are the last 3 transactions from the latest SHOW ENGINE INNODB STATUS when this occurred:
|
| Comments |
| Comment by Sergei Golubchik [ 2018-02-11 ] | ||
|
Do you have a lot of rows in the mysql.user or mysql.db table? What does
return? | ||
| Comment by Chris Calender (Inactive) [ 2018-02-12 ] | ||
|
Hello serg, here are the results:
| ||
| Comment by Sergei Golubchik [ 2018-02-13 ] | ||
|
According to the stack traces (thread apply all bt in gdb) there are 5 threads stuck in acl_getroot() (called from set_routine_security_ctx() from sp_head::execute_function()), trying to lock the acl_cache->lock mutex, and one thread in strcmp() invoked from acl_getroot(). This looks like a clear case showing that linear scanning of the user/db lists is slow. Still, it cannot possibly take minutes, so it might be coupled with some scheduler glitch. Anyway, there are a few ideas on how to speed up acl_getroot():
|
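To illustrate the contention pattern described in the comment above, here is a minimal C++ sketch. It is not MariaDB's code and not necessarily one of the ideas referred to there: AclEntry, AclCache, add_entry(), find_linear() and find_indexed() are all hypothetical names invented for this example. It only shows why a linear strcmp() scan performed while holding a single mutex makes every waiting thread pay for the whole scan, and how a hash index would shorten the time the lock is held. The sketch assumes exact user/host matches and unique entries; real host matching involves wildcards, so an exact-match index like this would not be a drop-in replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ACL entry (not MariaDB's real structure).
struct AclEntry {
    std::string user;
    std::string host;
    unsigned long privileges;
};

// Hypothetical cache guarded by one global mutex, loosely mirroring the
// role that acl_cache->lock plays in the stack traces.
struct AclCache {
    std::mutex lock;
    std::vector<AclEntry> entries;                       // scanned linearly
    std::unordered_map<std::string, std::size_t> index;  // "user@host" -> position
};

// Assumption: exact-match key; entries are assumed to be unique.
std::string make_key(const std::string& user, const std::string& host) {
    return user + "@" + host;
}

void add_entry(AclCache& cache, AclEntry e) {
    std::lock_guard<std::mutex> guard(cache.lock);
    cache.index[make_key(e.user, e.host)] = cache.entries.size();
    cache.entries.push_back(std::move(e));
}

// Linear scan: up to 2 * n strcmp() calls while the mutex is held, so
// every other thread waits for the full scan.
bool find_linear(AclCache& cache, const char* user, const char* host,
                 unsigned long& privileges) {
    std::lock_guard<std::mutex> guard(cache.lock);
    for (const AclEntry& e : cache.entries) {
        if (std::strcmp(e.user.c_str(), user) == 0 &&
            std::strcmp(e.host.c_str(), host) == 0) {
            privileges = e.privileges;
            return true;
        }
    }
    return false;
}

// Hash lookup: average O(1) work under the same mutex, so the lock is
// held for a much shorter time when the list is long.
bool find_indexed(AclCache& cache, const char* user, const char* host,
                  unsigned long& privileges) {
    std::lock_guard<std::mutex> guard(cache.lock);
    auto it = cache.index.find(make_key(user, host));
    if (it == cache.index.end()) {
        return false;
    }
    privileges = cache.entries[it->second].privileges;
    return true;
}

// Small self-check: both lookups must agree on hits and misses.
void self_check() {
    AclCache cache;
    add_entry(cache, {"app", "10.0.0.1", 1UL});
    unsigned long a = 0, b = 0;
    assert(find_linear(cache, "app", "10.0.0.1", a));
    assert(find_indexed(cache, "app", "10.0.0.1", b));
    assert(a == b);
    assert(!find_linear(cache, "nobody", "10.0.0.9", a));
    assert(!find_indexed(cache, "nobody", "10.0.0.9", b));
}
```

With thousands of rows in the user/db lists and several connections calling the lookup at once, the linear variant serializes all callers behind one long scan, which matches the shape of the stack traces (several threads waiting on the mutex, one inside strcmp()). Whether that alone explains hangs lasting minutes is left open, as noted in the comment.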