[MDEV-18027] Running out of file descriptors and eventual crash Created: 2018-12-18 Updated: 2023-10-10 Resolved: 2020-02-05
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Configuration, Server |
| Affects Version/s: | 10.2.12, 10.2, 10.3, 10.4 |
| Fix Version/s: | 10.2.32, 10.3.23, 10.4.13 |
| Type: | Bug | Priority: | Critical |
| Reporter: | David Crimmins | Assignee: | Oleksandr Byelkin |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | regression, regression-10.2 |
| Environment: | Linux OL7 |
| Attachments: |
| Description |
The server generates errors and eventually crashes after exceeding the limit on the number of open file descriptors. This occurs when additional table_open_cache instances are created: the calculation for open_files_limit does not account for the fact that there may be multiple instances. I expect (but have not proven) that the problem could be avoided by adjusting the config file to limit table_open_cache_instances or to increase open_files_limit; currently neither of these is set in our config.
| Comments |
| Comment by Elena Stepanova [ 2018-12-28 ] |
Could you please paste or attach the configuration and log files?
Very importantly, it has to be a consistent set of data, all from the same single run.
| Comment by David Crimmins [ 2019-01-03 ] |
Config and log files attached. N.B. the log file has been truncated as it was too big. Further information as requested, plus limits for the running db process:
| Comment by Elena Stepanova [ 2019-01-21 ] |
Thanks for the information. Indeed, with the MariaDB implementation of table_open_cache_instances, the total table_open_cache size might grow much higher than configured, and neither the calculation of the number of open files to request from the system nor the auto-adjustment of max_connections and table_open_cache takes this into account. In this case, with only 1024 open files as the initial value (raised to 2005 based on the configuration), and with two detected occurrences of contention, which make the total table open cache jump to 1500, the problem is inevitable. (I would expect "Too many open files" rather than "Bad file descriptor"; maybe it depends on the Linux flavor.) Instead, it should have requested not 2005 open files ((max_connections + extra_max_connections) * 5), but 8431 (extra_files + max_connections + extra_max_connections + tc_size * 2 * tc_instances). That wouldn't succeed, of course, with the hard limit of 4096, so the auto-sized values would have to be recalculated. max_connections would probably stay the same, although barely, but table_open_cache would drop to ~230.
| Comment by Valerii Kravchuk [ 2019-04-26 ] |
Note that since table_open_cache_instances was introduced in 10.2.2+ and is 8 by default, users upgrading from 10.1, for example, may start to get "Too many open files" errors under a load that worked well in 10.1. It's a regression of a kind.
| Comment by Sergey Vojtovich [ 2019-05-15 ] |
One possible way of fixing this is to set the hard limit according to table_open_cache_instances. The initial soft limit should stay low. When the number of table cache instances goes up, raise the soft limit accordingly. Don't let the number of table cache instances go up if the soft limit cannot be raised.
| Comment by Oleksandr Byelkin [ 2019-07-10 ] |
My concern is whether it really should go into 10.2, because the next complaint from support will be that a user upgraded and now the number of connections and the table cache decreased because there are not enough file handles...
| Comment by Oleksandr Byelkin [ 2019-07-10 ] |
commit fb27ed99a79f7e9b6c4e838d8a788a4685cfbee4 (HEAD)
For automatic number of opened files limit take into account number of table instances for table cache
| Comment by Sergey Vojtovich [ 2019-07-10 ] |
It wouldn't be the case if it were implemented as I suggested on May 15: don't let the number of table cache instances go up if the soft limit cannot be raised.
| Comment by Sergei Golubchik [ 2019-07-15 ] |
I think svoj suggested a better fix than fb27ed99a79f7e9b6c4e838d8a788a4685cfbee4. New cache instances are created when the contention is too high, which normally means there is some hot table accessed by many connections concurrently. There are many common workloads without a hot table; in these cases there will be only one table cache instance. Your fix in fb27ed99a79f7e9b6c4e838d8a788a4685cfbee4 would unnecessarily penalize these workloads: they'll have a smaller table cache for no good reason. I'd suggest auto-reducing tc_instances instead.
| Comment by Oleksandr Byelkin [ 2019-07-18 ] |
I still do not understand how playing with the soft limit can solve the problem described above. We promise a big number of instances and a big cache, and then in the middle of the game we say no, we cannot open more. How is that different from what we have now (inability to open files and a crash)?
| Comment by Sergey Vojtovich [ 2019-07-18 ] |
Don't increment the number of instances if raising the soft limit fails. Then everything is under control, right?
| Comment by Oleksandr Byelkin [ 2019-11-05 ] |
commit edc9059c31bddfaa5294423dafc6adfd5a3eabc0 (HEAD)
For automatic number of opened files limit take into account number of table instances for table cache