MariaDB Server / MDEV-10814

Feature request: Optionally exclude large buffers from core dumps

Details

    Description

      As core dump files are by default about as large as the process itself, producing core dumps can become an issue on systems with large memory buffers, especially a large innodb_buffer_pool_size.

      There needs to be enough file system space to store such large core dumps, and writing multi-gigabyte files takes a non-trivial amount of time, which delays process restart quite a bit.

      And then there's also a security aspect to it: as the core dump contains the complete innodb buffer pool it contains a substantial amount, or even all, of the actual user data in the database.

      At the same time actual buffer contents are rarely needed when doing a post mortem analysis (usually we only need stack frames and a few pieces of local data).

      So I'm proposing a server option to exclude certain buffers from core dumps by marking them with MADV_DONTDUMP via the madvise() system call.
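
      As a rough illustration of the proposal, the following sketch maps an anonymous buffer and asks the kernel to leave it out of core dumps. The helper name alloc_undumped_buffer() and its out-parameter are hypothetical and not taken from the MariaDB source; the madvise() call is treated as advisory, so a failure only means the buffer stays in the dump.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper (not MariaDB code): map an anonymous buffer and ask
 * the kernel to omit it from core dumps.
 * Returns NULL if the mapping itself fails. *excluded is set to 1 if the
 * MADV_DONTDUMP request succeeded and to 0 otherwise. */
void *alloc_undumped_buffer(size_t len, int *excluded)
{
    assert(len > 0 && excluded != NULL);

    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;

    *excluded = 0;
#ifdef MADV_DONTDUMP
    /* Available since Linux 3.4; on failure the buffer is simply dumped. */
    if (madvise(buf, len, MADV_DONTDUMP) == 0)
        *excluded = 1;
#endif
    return buf;
}
```

      The buffer remains fully readable and writable after the call; only the contents of a later core dump are affected.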

          Activity

            hholzgra Hartmut Holzgraefe created issue -

            hholzgra Hartmut Holzgraefe added a comment -

            I have had a proof of concept for this lying around for quite a while. It is a bit outdated by now, as I last touched it almost a year ago and then got carried away with other things. The GitHub branch for this is

            https://github.com/hholzgra/mariadb-server/tree/hartmut-coredump-exclusions

            which also contains some usage and implementation documentation:

            https://raw.githubusercontent.com/hholzgra/mariadb-server/c7d32f8265183a7f32b8c4a2f59bf39a54aa7c22/Docs/README-core-dump-exclusion
            elenst Elena Stepanova made changes -
            Field Original Value New Value
            Environment Linux
            Issue Type Bug [ 1 ] Task [ 3 ]
            elenst Elena Stepanova made changes -
            Priority Trivial [ 5 ] Minor [ 4 ]
            danblack Daniel Black made changes -
            Labels patch
            danblack Daniel Black made changes -
            Labels patch contribution patch
            danblack Daniel Black added a comment -

            Thanks hholzgra. Rebased your work. Good goal.

            danblack Daniel Black added a comment -

            Other candidates for non-dumping:

            - innodb change buffer (sized by innodb_change_buffer_max_size, which defaults to 25% of buffer pool size)
            - innodb log buffer (though this might be too small to worry about; see https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#innodb_log_buffer_size)
            - tokudb_cache_size
            - tokudb_loader_memory_size
            - query_cache

            Thoughts welcome.

            This will also need to take account of the dynamic innodb buffer pool in 10.2.
            danblack Daniel Black made changes -
            Summary Feature requst: Optionally exclude large buffers from core dumps Feature request: Optionally exclude large buffers from core dumps
            svoj Sergey Vojtovich made changes -
            Fix Version/s 10.1 [ 16100 ]
            Assignee Vladislav Vaintroub [ wlad ]
            Due Date 2017-05-05
            Labels contribution patch contribution foundation patch
            Priority Minor [ 4 ] Major [ 3 ]
            svoj Sergey Vojtovich made changes -
            Assignee Vladislav Vaintroub [ wlad ] Marko Mäkelä [ marko ]

            marko Marko Mäkelä added a comment -

            The InnoDB change buffer consists of persistent pages that reside in the system tablespace. (It may buffer changes to secondary index leaf pages.) Some change buffer pages can reside in the buffer pool, but I guess we’d just want to omit the whole InnoDB buffer pool from the core dump if we want to omit things. It’d be tricky and probably "too little" to omit just the change buffer pages of the buffer pool.

            I agree that the InnoDB redo log buffers (log_sys->buf and recv_sys) are rather useless to include in the core dump; I cannot think of a scenario where they could help me debug anything. A crash during crash recovery should normally be repeatable by rerunning recovery on the same files (in the state they were in before recovery was attempted). I guess omitting the log write or read buffers would not save much space, but it would avoid including confidential data. We should probably omit them if we omit the buffer pool from the core dump.
            marko Marko Mäkelä made changes -
            Component/s Storage Engine - InnoDB [ 10129 ]
            Fix Version/s 10.3.5 [ 22905 ]
            Fix Version/s 10.1 [ 16100 ]
            Resolution Fixed [ 1 ]
            Status Open [ 1 ] Closed [ 6 ]

            hholzgra Hartmut Holzgraefe added a comment -

            Why is this closed? In my proof-of-concept implementation the feature was configurable, to enable full classic core dumps where needed, and it also had support for excluding the MyISAM key buffer. I see neither of these in

            https://github.com/MariaDB/server/commit/b600f30786816e33c1706dd36cdabf21034dc781
            hholzgra Hartmut Holzgraefe made changes -
            Resolution Fixed [ 1 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            danblack Daniel Black added a comment -

            The fully configurable aspects got rejected in the first review (https://github.com/MariaDB/server/pull/333#issuecomment-295460913).

            MyISAM key buffer wasn't done as it didn't track the allocated size needed when you munmap it (resize). (and I didn't care enough about MyISAM)

            Query cache DONT_DUMP in progress: https://github.com/MariaDB/server/pull/366

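
            The size-tracking issue mentioned for the MyISAM key buffer can be illustrated with a small sketch: both madvise() and munmap() need the length of the region, so whoever allocates it has to remember that length across a resize. The struct and function names here are hypothetical and do not correspond to the MyISAM key cache code; the resize in this sketch does not preserve contents.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical bookkeeping record: the length is kept next to the pointer
 * because madvise() and munmap() both require it. */
struct undumped_region {
    void  *ptr;
    size_t len;
};

/* Map len bytes and request exclusion from core dumps. Returns 0 or -1. */
int region_alloc(struct undumped_region *r, size_t len)
{
    assert(r != NULL && len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        r->ptr = NULL;
        r->len = 0;
        return -1;
    }
    r->ptr = p;
    r->len = len;
#ifdef MADV_DONTDUMP
    (void) madvise(p, len, MADV_DONTDUMP); /* advisory; failure is ignored */
#endif
    return 0;
}

/* Unmap using the recorded length. Returns the munmap() result. */
int region_free(struct undumped_region *r)
{
    assert(r != NULL);
    if (r->ptr == NULL)
        return 0;
    int rc = munmap(r->ptr, r->len);
    r->ptr = NULL;
    r->len = 0;
    return rc;
}

/* Resize by releasing and re-creating the region (contents are dropped). */
int region_resize(struct undumped_region *r, size_t new_len)
{
    if (region_free(r) != 0)
        return -1;
    return region_alloc(r, new_len);
}
```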
            serg Sergei Golubchik made changes -
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.3.5 [ 22905 ]

            marko Marko Mäkelä added a comment -

            I am reassigning to the reviewer of pull request 366 (excluding the query cache from core dumps). As far as I can tell, there are no more InnoDB data structures that could reasonably be omitted from core dumps.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Oleksandr Byelkin [ sanja ]
            marko Marko Mäkelä made changes -
            Component/s Query Cache [ 10120 ]

            sanja Oleksandr Byelkin added a comment -

            I checked the QC part and it looks OK, but I do not know how madvise works, and the failed test on GitHub makes me doubt whether these changes should be allowed.

            hholzgra Hartmut Holzgraefe added a comment -

            It should not affect mysqld at runtime at all; it is only evaluated when actually writing a core dump.

            From the madvise(2) man page:

            MADV_DONTDUMP (since Linux 3.4)
            Exclude from a core dump those pages in the range specified by
            addr and length. This is useful in applications that have
            large areas of memory that are known not to be useful in a
            core dump. The effect of MADV_DONTDUMP takes precedence over
            the bit mask that is set via the /proc/[pid]/coredump_filter
            file (see core(5)).
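
            One way to observe that the advice is only bookkeeping on the mapping is to look at the VmFlags line in /proc/self/smaps, where Linux reports the "dd" flag for mappings marked as not to be dumped. The helper below is a hypothetical sketch and assumes a kernel that exposes VmFlags in smaps; it returns -1 when that information is not available.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: report whether the mapping that contains addr
 * carries the "dd" (do not dump) flag in /proc/self/smaps.
 * Returns 1 if flagged, 0 if not, -1 if this could not be determined. */
int is_excluded_from_core(const void *addr)
{
    assert(addr != NULL);

    FILE *f = fopen("/proc/self/smaps", "r");
    if (f == NULL)
        return -1;

    unsigned long target = (unsigned long) (uintptr_t) addr;
    char line[1024];
    int in_region = 0;
    int result = -1;

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;
        /* Each mapping starts with a "start-end perms ..." header line. */
        if (sscanf(line, "%lx-%lx", &start, &end) == 2) {
            in_region = (target >= start && target < end);
        } else if (in_region && strncmp(line, "VmFlags:", 8) == 0) {
            /* Flags are two-letter tokens separated by spaces. */
            result = (strstr(line, " dd ") != NULL ||
                      strstr(line, " dd\n") != NULL);
            break;
        }
    }
    fclose(f);
    return result;
}
```

            Calling this on a buffer before and after madvise(..., MADV_DONTDUMP) should show the flag appearing while the buffer itself stays usable.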
            danblack Daniel Black added a comment -

            Github / Travis-CI tests have been failing for other reasons (MDEV-15838)

            sanja Oleksandr Byelkin made changes -
            Fix Version/s 10.3.7 [ 23005 ]
            Fix Version/s 10.3 [ 22126 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]

            marko Marko Mäkelä added a comment -

            kpenza reported on the maria-discuss list that this change may cause messages like the following to be logged on startup and shutdown:

            Sep 25 10:40:53 srv1 mysqld: 2018-09-25 10:40:53 0 [Warning] InnoDB: Failed to set memory to DODUMP: Invalid argument ptr 0x2aaac5400000 size 2097152
            …
            Sep 25 10:41:19 srv1 mysqld: 2018-09-25 10:41:19 0 [Warning] InnoDB: Failed to set memory to DODUMP: Invalid argument ptr 0x2aaac3400000 size 33554432

            The reason for this turned out to be a Linux kernel bug, for which danblack contributed a fix to the Linux 4.19 kernel:

            mm: madvise(MADV_DODUMP): allow hugetlbfs pages

            The fix was also included in the backport queue for older kernels.

            The messages in the MariaDB server error log can be ignored, and they should disappear after upgrading the kernel.
            danblack Daniel Black added a comment -

            Notes for anyone else that comes across this:

            The kernel fix has been released in 4.19 and in stable kernels 4.18.14, 4.14.76, 4.9.133, 4.4.161, and 3.18.124, and it will be in 3.16.62. Red Hat has confirmed it will be in their kernels (probably out already).

            Without this fix, the impact is that a core dump may not contain all the information.

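
            For readers wondering how such a warning can arise without affecting the server, here is a minimal sketch of treating a failed MADV_DODUMP request as non-fatal. The function name mark_dumpable() is hypothetical and the message only imitates the log lines quoted above; it is not the InnoDB implementation.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: ask the kernel to include a region in core dumps
 * again. A failure is reported as a warning and otherwise ignored.
 * Returns 0 on success, or the errno value reported by madvise(). */
int mark_dumpable(void *ptr, size_t len)
{
    assert(len > 0);
#ifdef MADV_DODUMP
    if (madvise(ptr, len, MADV_DODUMP) != 0) {
        int err = errno;
        fprintf(stderr,
                "[Warning] Failed to set memory to DODUMP: %s ptr %p size %zu\n",
                strerror(err), ptr, len);
        return err;
    }
#endif
    return 0;
}
```

            On a kernel without the hugetlbfs fix, a call like this on a huge-page mapping would be expected to fail with EINVAL ("Invalid argument"), matching the warnings quoted in this issue, while the process continues normally.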
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 77108 ] MariaDB v4 [ 132947 ]

            People

              Assignee: sanja Oleksandr Byelkin
              Reporter: hholzgra Hartmut Holzgraefe
              Votes: 4
              Watchers: 12
