[MDEV-32686] crash information to include Distro information Created: 2023-11-06 Updated: 2023-11-10 |
|
| Status: | In Review |
| Project: | MariaDB Server |
| Component/s: | Server |
| Affects Version/s: | 10.4.31 |
| Fix Version/s: | 10.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Daniel Black | Assignee: | Sergei Golubchik |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Description |
|
When resolving bug reports, sometimes is not immediately obvious which distro is in use. Sometimes core dumps are provided, and with the right disto, this is resolvable without further user questioning. Linux distros have converged on /etc/os-release to communicate the release information in a standard form. There's sometimes up to 22 lines of information in this file, which is a bit verbose. Looking at the samples in the following urls, 3 lines appears sufficient to communicate distro and version across a significant set of distros. So lets include these in the crash handler information. |
| Comments |
| Comment by Sergei Golubchik [ 2023-11-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
please, let's not add even more code into the crash handler and even more output to the already long crash message. OS information can be easily added to the log after the crash by some external tool. E.g. mariadbd-safe can do, like
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2023-11-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
mariadbd-safe, like most external tools, an obsolete script that no-one uses any more. The purpose was with a few syscalls get the most information as quickly as possible, and since os-release is pretty much all of Linux, FreeBSD and Solaris compatible, was an easy one shot to get all of them. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2023-11-06 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Correct, I only used mariadbd-safe as an example, I didn't dare to write an example of how it could've been done in systemd service, you know it much better than I. The point is still the same, it's not something that needs to be read from the memory of the crashed process. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2023-11-07 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
So containers execute mariadbd directly and no other process is left running. systemd services run mariadbd directly. So handlers we can count on are kernel.core_pattern, whatever the user has it set to. This is also kernel wide and still isn't settable in a namespace. On kernel.core_pattern handers, systemd-coredump would record enough. If users reported with coredump like below, there's enough information to determine OS,
On Ubuntu system abrt would have enough info, but not enabled by default. But as we've seen, it seldom happens. Some recent work (3 weeks ago) added delation from the systemd-coredump to a container systemd instigates. Then we'd be packing a core hander in the container, which we can do. Not the most common of use case. On container, at least the Docker Official Image, we might (conditional on Docker Inc accepting it) be able to have a handler that happens to match common kernel.core_pattern paths. On systemd level there is ExecStopPost which has access to the error code and some reporting information could be made here. It seems a little beyond intended purpose however. In containers the filesystem namespace is a temporary construct tied to the process, so its mariadbd that has visibility of this and the namespace disappears with the process. So I'm still thinking the server is the right place for this. On output, yes its a bit long. Can we remove the following?
Seems dated, thread_count might have some use, but a full backtrace will show this anyway. And remove/replace this?
Could be shorter "Please include the information from the server start to the end information below". | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Sergei Golubchik [ 2023-11-09 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Yes, please, by all means feel free to remove old outdated text from the crash message. But it won't fix the problem that crash handler does too much (when the memory is possibly corrupted so anything it's doing increases a chance of something going very wrong). And prints too much. For example, limits there aren't helpful at all. Only the coredump size limit is (and only when core dumping failed, but I'm not sure you can always detect it). I suppose if it has to be printed, this information must be collected before the crash into some fixed buffer. And then printed in case of a crash. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Black [ 2023-11-10 ] | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
> Yes, please, by all means feel free to remove old outdated text from the crash message. Ok removed in update PR. > But it won't fix the problem that crash handler does too much (when the memory is possibly corrupted so anything it's doing increases a chance of something going very wrong). Removing outdated text now access less THD data (query, connection, optimizer_switch - had doubts about removing this). > And prints too much. For example, limits there aren't helpful at all. Only the coredump size limit is (and only when core dumping failed, but I'm not sure you can always detect it). I added a patch to exclude resource limits that where softlimit to unlimited which reduces the size of it by ~1/2. > I suppose if it has to be printed, this information must be collected before the crash into some fixed buffer. And then printed in case of a crash. Yes, reusing the same buffers that were already there. Crash sample with change in PR:
|