[MDEV-23511] shutdown_server 10 times out, causing server kill at shutdown Created: 2020-08-19 Updated: 2020-10-06 Resolved: 2020-08-21 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication, Tests |
| Affects Version/s: | 10.2, 10.3, 10.4, 10.5, 10.6 |
| Fix Version/s: | 10.2.35, 10.3.26, 10.4.16, 10.5.7 |
| Type: | Bug | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | shutdown |
| Description |
|
Occasionally, during a test that is restarting the server, a SIGABRT will be sent to the server even though there is no hang. I think that I have seen this happen on buildbot, but I do not know how to efficiently search for this in the cross-reference. The most recent occurrence was in a different CI environment:
According to the server error log file, the signal is triggered during the second shutdown that the test is initiating (by executing the statement shutdown_server 10):
In the above output, we can see that the InnoDB shutdown almost completed within those 10 seconds. The last messages are interleaved with the stack trace output. In the core dump, only two threads exist: the signal handler, and the main thread. The SIGABRT is delivered to the signal_hand() thread that is executing a syscall inside the following:
The mysqld_main() is waiting for that thread to terminate:
I think that the 10-second timeout is unreasonably short for some CI environments. Many tests use a 30-second timeout (which is already borderline for debug builds), and the default timeout is 60 seconds. Since this change in MariaDB 10.3.1, a shutdown timeout results in a prominent test failure, because a SIGABRT is sent to the server before the final SIGKILL. Please fix all tests to use a more reasonable timeout than 10 seconds, and also consider replacing the 30-second timeouts with 60-second ones.
to find what to change. |
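The escalation described above (wait for the timeout, then send SIGABRT, then fall back to SIGKILL) can be illustrated with a minimal Python sketch. This is not the mysqltest implementation: the function name `stop_with_escalation`, the `grace` parameter, and the returned labels are hypothetical, and a sleeping child process stands in for a server that is slow to shut down. It assumes a POSIX system.

```python
import signal
import subprocess
import sys


def stop_with_escalation(proc, timeout, grace=5.0):
    """Wait for `proc` to exit on its own, escalating if it does not.

    Hypothetical helper: returns "clean", "aborted" or "killed" depending
    on how the process ended.
    """
    try:
        # The process is given `timeout` seconds to finish its shutdown.
        proc.wait(timeout=timeout)
        return "clean"
    except subprocess.TimeoutExpired:
        pass

    # Timeout expired: SIGABRT makes the failure visible (core dump or
    # stack trace) even if the process was merely slow, not hung.
    proc.send_signal(signal.SIGABRT)
    try:
        proc.wait(timeout=grace)
        return "aborted"
    except subprocess.TimeoutExpired:
        # Last resort if SIGABRT did not terminate the process.
        proc.kill()
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # A child that needs far longer than the timeout it is given.
    slow = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_with_escalation(slow, timeout=0.2), slow.returncode)
```

With a short timeout, a process that is only slow is aborted just like one that is truly hung, which is why a larger timeout makes such tests less fragile on loaded CI hosts.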
| Comments |
| Comment by Andrei Elkin [ 2020-08-20 ] |
|
Marko, could you please check out the commit? Thanks. Andrei. |
| Comment by Andrei Elkin [ 2020-08-21 ] |
|
Marko reviewed this on Slack. When merging 10.2 into 10.3, pick 'ours' for the two conflicts (including a file that was deleted in 10.3). |
| Comment by Marko Mäkelä [ 2020-08-21 ] |
|
Thanks, it looks good. In 10.3, there were a few more non-zero arguments to shutdown_server, which I removed on the merge. |