MariaDB Server / MDEV-18027

Running out of file descriptors and eventual crash

Details

    Description

      The server generates errors and eventually crashes after exceeding the limit on the number of open file descriptors.

      This occurs when additional table_open_cache_instances are created. The calculation of open_files_limit does not account for the fact that there may be multiple instances.

      I expect (but have not proven) that the problem could be avoided by adjusting the config file to limit table_open_cache_instances or to increase open_files_limit. Currently neither of these is set in our config.

          Activity

            elenst Elena Stepanova added a comment

            Could you please paste or attach

            • the exact messages that you see – warnings upon the server startup about adjusting values, errors, and the crash report which you are getting later;
            • the output of

              select @@max_connections, @@open_files_limit, @@table_open_cache, @@table_open_cache_instances;

            • the output of

              ulimit -a
              ulimit -aH

            • your server config file(s) and command-line options if you use any.

            Very importantly, it has to be a consistent set of data, all from the same single run.

            David Crimmins David Crimmins added a comment - - edited

            Config and log files attached. NB log file has been truncated as it was too big.

            Further information as requested plus limits for running db process:

            Welcome to the MariaDB monitor.  Commands end with ; or \g.
            Your MariaDB connection id is 3
            Server version: 10.2.12-MariaDB-log MariaDB Server
             
            Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.
             
            Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
             
            MariaDB [(none)]> select @@max_connections, @@open_files_limit, @@table_open_cache, @@table_open_cache_instances;
            +-------------------+--------------------+--------------------+------------------------------+
            | @@max_connections | @@open_files_limit | @@table_open_cache | @@table_open_cache_instances |
            +-------------------+--------------------+--------------------+------------------------------+
            |               400 |               2005 |                500 |                            8 |
            +-------------------+--------------------+--------------------+------------------------------+
            1 row in set (0.00 sec)
            

            bri-lin7 Entuity # ulimit -a
            core file size          (blocks, -c) 0
            data seg size           (kbytes, -d) unlimited
            scheduling priority             (-e) 0
            file size               (blocks, -f) unlimited
            pending signals                 (-i) 10908
            max locked memory       (kbytes, -l) 64
            max memory size         (kbytes, -m) unlimited
            open files                      (-n) 1024
            pipe size            (512 bytes, -p) 8
            POSIX message queues     (bytes, -q) 819200
            real-time priority              (-r) 0
            stack size              (kbytes, -s) 8192
            cpu time               (seconds, -t) unlimited
            max user processes              (-u) 10908
            virtual memory          (kbytes, -v) unlimited
            file locks                      (-x) unlimited
            

            bri-lin7 Entuity # ulimit -aH
            core file size          (blocks, -c) unlimited
            data seg size           (kbytes, -d) unlimited
            scheduling priority             (-e) 0
            file size               (blocks, -f) unlimited
            pending signals                 (-i) 10908
            max locked memory       (kbytes, -l) 64
            max memory size         (kbytes, -m) unlimited
            open files                      (-n) 4096
            pipe size            (512 bytes, -p) 8
            POSIX message queues     (bytes, -q) 819200
            real-time priority              (-r) 0
            stack size              (kbytes, -s) unlimited
            cpu time               (seconds, -t) unlimited
            max user processes              (-u) 10908
            virtual memory          (kbytes, -v) unlimited
            file locks                      (-x) unlimited
            

            bri-lin7 Entuity # cat /proc/`pgrep mysqld`/limits
            Limit                     Soft Limit           Hard Limit           Units
            Max cpu time              unlimited            unlimited            seconds
            Max file size             unlimited            unlimited            bytes
            Max data size             unlimited            unlimited            bytes
            Max stack size            8388608              unlimited            bytes
            Max core file size        unlimited            unlimited            bytes
            Max resident set          unlimited            unlimited            bytes
            Max processes             10908                10908                processes
            Max open files            2005                 2005                 files
            Max locked memory         65536                65536                bytes
            Max address space         unlimited            unlimited            bytes
            Max file locks            unlimited            unlimited            locks
            Max pending signals       10908                10908                signals
            Max msgqueue size         819200               819200               bytes
            Max nice priority         0                    0
            Max realtime priority     0                    0
            Max realtime timeout      unlimited            unlimited            us
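
The limits listing above can also be checked programmatically. The following is a minimal sketch, not part of the original report: the function name `parse_open_files_limit` is hypothetical, and it assumes the Linux `/proc/<pid>/limits` text layout shown above.

```python
def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    A value of 'unlimited' is returned as None.
    """
    def to_int(field):
        return None if field == "unlimited" else int(field)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: 'Max', 'open', 'files', <soft>, <hard>, 'files'
            fields = line.split()
            return to_int(fields[3]), to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


# Row copied from the listing in this comment.
sample = "Max open files            2005                 2005                 files"
soft, hard = parse_open_files_limit(sample)
print(soft, hard)
```

For a live process the text would come from reading `/proc/<pid>/limits`. In the listing above both the soft and the hard limit of the running mysqld process read 2005, while the shell reports 1024 (soft) and 4096 (hard).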
            

            elenst Elena Stepanova added a comment - - edited

            Thanks for the information.

            Indeed, with the MariaDB implementation of table_open_cache_instances, the total table_open_cache size might grow much higher than configured, and neither the calculation of the number of open files to request from the system, nor the auto-adjustment of max_connections and table_open_cache takes it into account.

            In this case, with only 1024 open files as the initial value (raised to 2005 based on the configuration), and with two detected occurrences of contention, which make the total table open cache jump to 1500, the problem is inevitable. (I would expect "Too many open files" rather than "Bad file descriptor"; maybe it depends on the Linux flavor.)

            Instead, it should have requested not 2005 open files ((max_connections + extra_max_connections) * 5), but 8431 ((extra_files + max_connections + extra_max_connections + tc_size * 2 * tc_instances)). It wouldn't succeed of course with the hard limit 4096, so the auto-sized values would have to be recalculated. max_connections would probably stay the same, although barely, but table_open_cache would drop to ~230.
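
The arithmetic in this comment can be reproduced with a short sketch. It uses only the two formulas quoted above and the values reported in this thread. Two inputs are assumptions rather than reported values: `EXTRA_MAX_CONNECTIONS = 1` and `EXTRA_FILES = 30` are chosen because they make the quoted totals (2005 and 8431) come out. The function names are hypothetical.

```python
# Values reported in this thread.
MAX_CONNECTIONS = 400
TC_SIZE = 500            # table_open_cache
TC_INSTANCES = 8         # table_open_cache_instances
HARD_LIMIT = 4096        # open files hard limit from `ulimit -aH`

# Assumptions: not stated in the thread, inferred from the quoted totals.
EXTRA_MAX_CONNECTIONS = 1
EXTRA_FILES = 30


def requested_files_current(max_conn, extra_max_conn):
    """Formula quoted for the value the server actually requested."""
    return (max_conn + extra_max_conn) * 5


def wanted_files(extra_files, max_conn, extra_max_conn, tc_size, tc_instances):
    """Formula quoted for the value that should have been requested."""
    return extra_files + max_conn + extra_max_conn + tc_size * 2 * tc_instances


def tc_size_that_fits(limit, extra_files, max_conn, extra_max_conn, tc_instances):
    """Largest per-instance cache size whose worst case stays within `limit`."""
    return (limit - extra_files - max_conn - extra_max_conn) // (2 * tc_instances)


current = requested_files_current(MAX_CONNECTIONS, EXTRA_MAX_CONNECTIONS)
wanted = wanted_files(EXTRA_FILES, MAX_CONNECTIONS, EXTRA_MAX_CONNECTIONS,
                      TC_SIZE, TC_INSTANCES)
fitted = tc_size_that_fits(HARD_LIMIT, EXTRA_FILES, MAX_CONNECTIONS,
                           EXTRA_MAX_CONNECTIONS, TC_INSTANCES)
print(current, wanted, fitted)  # 2005 8431 229
```

Under these assumptions the fitted table_open_cache is 229, in line with the "~230" estimate. The same per-table factor of 2 explains why the jump to a total cache of 1500 is a problem: 1500 * 2 = 3000 file handles, which already exceeds the 2005 that were requested.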

            valerii Valerii Kravchuk added a comment - - edited

            Note that as table_open_cache_instances was introduced in 10.2.2 and is 8 by default, users upgrading from 10.1, for example, may start to get "Too many open files" errors with a load that worked well in 10.1. It's a regression of a kind.


            svoj Sergey Vojtovich added a comment

            One possible way of fixing this is to set the hard limit according to table_open_cache_instances. The initial soft limit should stay low. When the number of table cache instances goes up, raise the soft limit accordingly. Don't let the number of table cache instances go up if the soft limit cannot be raised.
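
The proposal above could be sketched roughly as follows. This is an illustration only, not the server's code: `try_add_instance` and its parameters are hypothetical names, the file-count formula is borrowed from the earlier comment in this thread, and only the soft-limit part of the proposal is shown (setting the hard limit at startup is left out).

```python
import resource


def _get_nofile():
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def _set_nofile(soft, hard):
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


def try_add_instance(instances, tc_size, base_files,
                     get_limit=_get_nofile, set_limit=_set_nofile):
    """Return instances + 1 if the soft limit covers (or can be raised to
    cover) one more table cache instance, otherwise return instances."""
    # Worst case: two file handles per cached table, per instance.
    needed = base_files + tc_size * 2 * (instances + 1)
    soft, hard = get_limit()
    if soft == resource.RLIM_INFINITY or needed <= soft:
        return instances + 1
    if hard != resource.RLIM_INFINITY and needed > hard:
        return instances  # cannot raise the soft limit far enough
    try:
        set_limit(needed, hard)
    except (ValueError, OSError):
        return instances  # raising failed: keep the current instance count
    return instances + 1
```

The limit accessors are injectable so the decision logic can be exercised without changing the limits of the calling process. With a soft limit of 2005, a hard limit of 4096, tc_size 500 and 431 other files, this sketch would allow growth from one to three instances and refuse the fourth.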
            sanja Oleksandr Byelkin added a comment - - edited

            My concern is whether it really should go to 10.2, because the next complaint from support will be that a user upgraded and now the number of connections and the table cache decreased because there are not enough file handles...


            sanja Oleksandr Byelkin added a comment

            commit fb27ed99a79f7e9b6c4e838d8a788a4685cfbee4 (HEAD -> bb-10.2-MDEV-18027, origin/bb-10.2-MDEV-18027)
            Author: Oleksandr Byelkin <sanja@mariadb.com>
            Date: Wed Jul 10 13:40:54 2019 +0200

            MDEV-18027: Running out of file descriptors and eventual crash

            For automatic number of opened files limit take into account number of table instances for table cache

            svoj Sergey Vojtovich added a comment

            "My concern is whether it really should go to 10.2, because the next complaint from support will be that a user upgraded and now the number of connections and the table cache decreased because there are not enough file handles..."

            It wouldn't be the case if it were implemented as I suggested on May 15: don't let the number of table cache instances go up if the soft limit cannot be raised.

            serg Sergei Golubchik added a comment

            I think svoj suggested a better fix than fb27ed99a79f7e9b6c4e838d8a788a4685cfbee4.

            New cache instances are created when the contention is too high, which normally means there is some hot table accessed by many connections concurrently.

            There are many common workloads without a hot table; in these cases there will be only one table cache instance. Your fix in fb27ed99a79f7e9b6c4e838d8a788a4685cfbee4 would unnecessarily penalize these workloads: they'll have a smaller table cache for no good reason. I'd suggest auto-reducing tc_instances instead.
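
One way to read the suggestion to auto-reduce tc_instances is sketched below. This is an assumption about how such a reduction might be computed, not the fix that was committed: `fit_tc_instances` is a hypothetical name, and the factor of two file handles per cached table comes from the formula quoted earlier in this thread.

```python
def fit_tc_instances(limit, base_files, tc_size, wanted_instances):
    """Largest instance count, capped at wanted_instances, whose worst-case
    file usage stays within `limit`. Never returns less than 1, even if a
    single instance does not fit."""
    per_instance = tc_size * 2  # two file handles per cached table
    fits = (limit - base_files) // per_instance
    return max(1, min(wanted_instances, fits))


# Values from this thread; 431 = 30 + 400 + 1 is an assumed breakdown of the
# non-cache file handles (extra files plus connections).
print(fit_tc_instances(4096, 431, 500, 8))
print(fit_tc_instances(2005, 431, 500, 8))
```

With this approach table_open_cache keeps its configured size of 500, and only the maximum number of instances shrinks (to 3 under the 4096 limit, to 1 under 2005), so workloads without a hot table are not affected.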

            sanja Oleksandr Byelkin added a comment

            I still do not understand how playing with the soft limit can solve the problem described above.

            We promise a big number of instances and a big cache, and in the middle of the game we say no, we cannot open more. How is that different from what we have now (inability to open files and a crash)?

            svoj Sergey Vojtovich added a comment

            Don't increment the number of instances if raising the soft limit failed. Then everything is under control, right?

            sanja Oleksandr Byelkin added a comment

            commit edc9059c31bddfaa5294423dafc6adfd5a3eabc0 (HEAD -> bb-10.2-MDEV-18027, origin/bb-10.2-MDEV-18027)
            Author: Oleksandr Byelkin <sanja@mariadb.com>
            Date: Wed Jul 10 13:40:54 2019 +0200

            MDEV-18027: Running out of file descriptors and eventual crash

            For automatic number of opened files limit take into account number of table instances for table cache

            People

              sanja Oleksandr Byelkin
              David Crimmins David Crimmins