[MDEV-11499] mysqltest, Windows : improve diagnostics if server fails to shutdown Created: 2016-12-07 Updated: 2021-09-24 Resolved: 2021-09-24 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Platform Windows, Tests |
| Fix Version/s: | 10.3.32, 10.4.22, 10.5.13, 10.6.5 |
| Type: | Task | Priority: | Major |
| Reporter: | Vladislav Vaintroub | Assignee: | Vladislav Vaintroub |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
MySQL commit 74726c59b7a650b948929b2839e83e26f9853d4d |
| Comments |
| Comment by Marko Mäkelä [ 2017-01-27 ] | ||
|
Normally, a forced kill of InnoDB should be harmless, because crash recovery will take care of it. However, innodb_read_only (introduced in MySQL 5.6 and MariaDB 10.0) prevents crash recovery, so the forced kills on shutdown_server timeout will be visible as failures of tests that use the innodb_read_only option. Now that Before
To be able to find the root cause of the shutdown hangs, we must first fix the shutdown_server statement in mtr by applying the MySQL commits mentioned in the description. We should also make sure that a core dump is produced when the shutdown hangs, and that mtr will dump the stack traces from the core dump, so that the buildbot logs will show some hints as to why the shutdown is hanging.
I think that this work should initially target 10.1 and ultimately be ported to 10.0 if possible. | ||
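As a rough illustration of the behaviour asked for here (abort on shutdown timeout so that a core dump exists, with a hard kill as a fallback), a minimal Python sketch follows. It is not the mysqltest or MTR code: the function names `exited_within` and `stop_server`, the default timeouts and the returned labels are all hypothetical, and it assumes POSIX signals, so a Windows implementation would need a different mechanism.

```python
import signal
import subprocess


def exited_within(proc, timeout):
    """Return True if the process exits within `timeout` seconds."""
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        return False


def stop_server(proc, shutdown_timeout=60.0, abort_timeout=10.0):
    """Wait for a shutdown that was already requested, then escalate.

    Hypothetical helper: returns "exited", "aborted" or "killed" so the
    caller knows whether looking for a core dump makes sense.
    """
    # Assumption: the shutdown request was sent before this call.
    if exited_within(proc, shutdown_timeout):
        return "exited"

    # Shutdown hangs: SIGABRT normally terminates with a core dump.
    proc.send_signal(signal.SIGABRT)
    if exited_within(proc, abort_timeout):
        return "aborted"

    # SIGABRT did not end the process; kill it only if it still exists.
    if proc.poll() is None:
        proc.kill()  # SIGKILL on POSIX
        proc.wait()
        return "killed"
    return "aborted"
```

Only the "aborted" outcome is expected to leave a core dump behind; "killed" means the stack trace is lost and at most the hang itself can be reported.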
| Comment by Elena Stepanova [ 2017-01-27 ] | ||
|
I have some scattered notes from my previous short brush with this problem. There aren't really useful suggestions in there, so you can safely ignore them, but maybe they'll just make you consider some things that you'd otherwise miss at first.
Initially the main goal was to catch problems on server startup and shutdown which happen outside a test – that is, when MTR tries to start the server before a test, or when it discards a server which is no longer needed.
The problem that marko mentions in the previous comment is a bit different. The mysqltest command shutdown_server only affects server restarts which happen inside a test, when the test uses it either directly or through a chain of include files. This should be easy to fix with a one-liner; I'll give it a try. You can later re-do it by merging the MySQL 5.7 changes in mysqltest.cc, although that seemed very intrusive when I looked at it.
But it's not a solution to the general problem. For usual startup and shutdown MTR doesn't use the shutdown_server command; it's MTR's own doing, in safe_process.cc and such. In a short attempt, I couldn't get it done reliably: it would either kill too little or too much. It must be doable, I guess I just didn't dig up the right place.
Please remember that we can't rely on SIGABRT always terminating the server; it's known to hang. There must be a SIGKILL after another timeout, either unconditionally or upon a check that the process still exists.
The actual challenge is reporting, as just SIGABRT-ing the server isn't helping if we don't get a stack trace out of it.
When it happens on server startup, it should be somewhat easier, and possibly it will even happen automatically if you solve the problem of killing the hanging server (and will abort it instead). When a crash happens on startup, we are about to enter a test, so MTR is somewhere in check-testcase and is already able to report a failure and produce a stack trace if there is a coredump.
Maybe it's not reliable and needs to be fixed, but at least there is a mechanism for that.
But the shutdown (server restart between tests) is a problem. There is no mechanism for processing it properly; everything needs to be added. MTR discards the server, so it doesn't care what's happening to it. It only checks for "warnings generated in error logs during shutdown" (it's the exact line from mysql-test-run.pl if you need to find the place). Somewhere around it, but not in the elsif itself of course, a search for coredumps and calls to My::CoreDump->show probably need to be added. It's also important to honor opt_max_save_core there if it's set, because otherwise on a bad build we can cause a big problem with disk space.
There is no mechanism for attaching a hang/crash on shutdown to any specific test; they can only be reported the same way as "warnings during shutdown" are reported now, after a chain of tests. The problem is that they are very often overlooked – when there are any test failures in the output, people are usually only looking for them. It would be nice to make them somewhat more visible; ideally, maybe, to add a separate "server_startup_shutdown_report" pseudo-test, much like we have valgrind_report when valgrind is enabled. It might be complicated though, and can definitely wait.
On a somewhat different (yet related) note, I might have spotted another reason why a stack trace isn't printed even when it should be. The theory remained unchecked, so unless I do it before, you might want to look at it.
It shouldn't be so: it's one thing not to save a datadir and quite another not to check for useful stuff before removing it. And we do run with max-save-datadir=1 on some builders, so it might be a problem. |
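The comment above suggests searching for core dumps after a server is discarded, while honoring opt_max_save_core so that a bad build cannot fill the disk. The idea can be sketched as follows. This is a Python stand-in, not MTR's Perl code: the function name `prune_core_dumps`, the `core*` file name pattern and the `already_saved` counter are assumptions made for illustration only.

```python
from pathlib import Path


def prune_core_dumps(vardir, max_save_core, already_saved=0):
    """Keep at most the remaining budget of core files under `vardir`.

    Hypothetical helper: returns (kept, removed) lists of paths. A real
    implementation would print stack traces from the cores before
    deleting any of them.
    """
    # Assumption: core files have names starting with "core".
    cores = sorted(p for p in Path(vardir).rglob("core*") if p.is_file())

    # How many more cores may be saved in this test run.
    budget = max(0, max_save_core - already_saved)
    kept, removed = cores[:budget], cores[budget:]

    for path in removed:
        path.unlink()  # free the disk space taken by surplus cores
    return kept, removed
```

A caller would add `len(kept)` to its running count and pass that count back in as `already_saved` on the next call, so the cap applies to the whole run and not to each shutdown separately.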