[MDEV-23777] mtr --rr leaves a broken trace in some cases Created: 2020-09-21  Updated: 2020-09-22

Status: Open
Project: MariaDB Server
Component/s: None
Fix Version/s: None

Type: Task Priority: Major
Reporter: Nikita Malyavin Assignee: Aleksey Midenkov
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-23787 mtr --rr fixes Closed

 Description   

We have a working patch that fixes the problem by terminating the child's process group with SIGINT instead of SIGKILL:

 Index: mysql-test/lib/My/SafeProcess/safe_process.cc
 ===================================================================
 --- mysql-test/lib/My/SafeProcess/safe_process.cc    (revision a7e7508a75f6bd87ac8e8d1f32930e1f3799d226)
 +++ mysql-test/lib/My/SafeProcess/safe_process.cc    (date 1596518504000)
 @@ -97,7 +97,7 @@
    message("Killing child: %d", child_pid);
    // Terminate whole process group
    if (! was_killed)
 -    kill(-child_pid, SIGKILL);
 +    kill(-child_pid, SIGINT);
  
    pid_t ret_pid= waitpid(child_pid, &status, 0);
    if (ret_pid == child_pid)

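However, it needs a hang defence before it can go into trunk: SIGINT can be caught or ignored, so if the server hangs during shutdown, the blocking waitpid() that follows never returns.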

Marko: IMO that one should be replaced with something like: SIGINT, wait a bit, SIGABRT (to get proof of a server shutdown hang), wait a bit more, then SIGKILL.
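
The escalation Marko describes could look roughly like the sketch below. It is not the patch under discussion: the helper names wait_for_exit and kill_child_escalating and the grace_seconds parameter are assumptions made for illustration. Only kill(), waitpid() and nanosleep() are real POSIX calls, and the grace period would need tuning for slow rr runs.

```cpp
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <time.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: poll for the child's exit for up to `seconds` seconds.
// Returns 1 if the child was reaped, 0 on timeout, -1 on a waitpid() error.
static int wait_for_exit(pid_t pid, int seconds, int *status)
{
  const struct timespec tick = {0, 100 * 1000 * 1000};  // 100 ms
  for (int i = 0; i < seconds * 10; i++)
  {
    pid_t ret = waitpid(pid, status, WNOHANG);
    if (ret == pid)
      return 1;
    if (ret == -1 && errno != EINTR)
      return -1;
    nanosleep(&tick, nullptr);
  }
  return 0;
}

// Hypothetical replacement for the single kill() in kill_child():
// SIGINT, wait, SIGABRT, wait, then SIGKILL.
// Returns the last signal sent before the child was reaped, or -1 on error.
static int kill_child_escalating(pid_t child_pid, int grace_seconds, int *status)
{
  assert(child_pid > 1);  // kill(0, ...) and kill(-1, ...) would hit other processes

  const int soft_signals[] = {SIGINT, SIGABRT};
  for (int sig : soft_signals)
  {
    kill(-child_pid, sig);  // negative pid: signal the whole process group
    int rc = wait_for_exit(child_pid, grace_seconds, status);
    if (rc != 0)
      return rc == 1 ? sig : -1;
  }

  // Last resort: SIGKILL cannot be caught or ignored, so a blocking wait
  // is expected to return promptly here.
  kill(-child_pid, SIGKILL);
  while (waitpid(child_pid, status, 0) == -1)
    if (errno != EINTR)
      return -1;
  return SIGKILL;
}
```

With a child that ignores SIGINT, a call such as kill_child_escalating(child_pid, 10, &status) would be expected to return SIGABRT after roughly ten seconds, so the abort itself serves as evidence of the shutdown hang.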



 Comments   
Comment by Marko Mäkelä [ 2020-09-21 ]

I think that sometimes this is not enough. But if we apply all of my patch below (which I used on 10.5 to make one test rr-friendly), replication tests will start failing massively (because apparently they like to SIGKILL processes).

diff --git a/client/mysqltest.cc b/client/mysqltest.cc
index 417d3615995..48b8f132eb2 100644
--- a/client/mysqltest.cc
+++ b/client/mysqltest.cc
@@ -5141,7 +5141,7 @@ void do_shutdown_server(struct st_command *command)
     if (timeout)
       (void) my_kill(pid, SIGABRT);
     /* Give server a few seconds to die in all cases */
-    if (!timeout || wait_until_dead(pid, timeout < 5 ? 5 : timeout))
+    if (!timeout || wait_until_dead(pid, timeout < 60 ? 60 : timeout))
     {
       (void) my_kill(pid, SIGKILL);
     }
diff --git a/mysql-test/lib/My/SafeProcess/safe_process.cc b/mysql-test/lib/My/SafeProcess/safe_process.cc
index 4d0d1e2a3a0..abc167a4300 100644
--- a/mysql-test/lib/My/SafeProcess/safe_process.cc
+++ b/mysql-test/lib/My/SafeProcess/safe_process.cc
@@ -144,7 +144,7 @@ static int kill_child(bool was_killed)
   message("Killing child: %d", child_pid);
   // Terminate whole process group
   if (! was_killed)
-    kill(-child_pid, SIGKILL);
+    kill(-child_pid, SIGABRT);
 
   pid_t ret_pid= waitpid(child_pid, &status, 0);
   if (ret_pid == child_pid)
diff --git a/mysql-test/lib/v1/mtr_process.pl b/mysql-test/lib/v1/mtr_process.pl
index fd9f3817699..ee9a370c467 100644
--- a/mysql-test/lib/v1/mtr_process.pl
+++ b/mysql-test/lib/v1/mtr_process.pl
@@ -456,8 +456,8 @@ sub mtr_kill_leftovers () {
         my $retries= 10;                    # 10 seconds
         do
         {
-          mtr_debug("Sending SIGKILL to pids: " . join(' ', @pids));
-          kill(9, @pids);
+          mtr_debug("Sending SIGABRT to pids: " . join(' ', @pids));
+          kill(6, @pids);
           mtr_report("Sleep 1 second waiting for processes to die");
           sleep(1)                      # Wait one second
         } while ( $retries-- and  kill(0, @pids) );

Maybe a subset of this would be safe to apply?
