MDEV-25802

mtr: race condition if the test quickly restarts twice

      Description

      include/wait_until_connected_again.inc has the following loop:

      while ($mysql_errno)
      {
        --error 0,ER_ACCESS_DENIED_ERROR,ER_SERVER_SHUTDOWN,ER_CONNECTION_KILLED,ER_LOCK_WAIT_TIMEOUT,2002,2006,2013
        show status;
       
        dec $counter;
        if (!$counter)
        {
          --die Server failed to restart
        }
        -- <cut out for brevity...> --
        --sleep 0.1
      }
      

      Note that the loop sleeps at the end of every iteration, even when show status has just succeeded and $mysql_errno is 0. Michael Widenius tried to fix this in commit 9ec2129f715 in 10.6, but that broke some tests, for example main.events_restart.
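
To make the unconditional sleep concrete, here is a small Python model of the loop as quoted above. It is only a sketch: probe() stands in for show status, the sleep function is injected so the calls can be counted, and wait_without_trailing_sleep() is a hypothetical variant for comparison, not necessarily what commit 9ec2129f715 does.

```python
from typing import Callable, Iterable


def make_probe(results: Iterable[int]) -> Callable[[], int]:
    """Return a fake `show status` that yields the given errno values in order."""
    it = iter(results)
    return lambda: next(it)


def wait_as_quoted(probe: Callable[[], int],
                   sleep: Callable[[float], None],
                   counter: int = 5000) -> None:
    """Model of the quoted loop: the sleep runs on every iteration."""
    errno = 1  # assumed non-zero start value so the loop is entered
    while errno:
        errno = probe()
        counter -= 1
        if not counter:
            raise RuntimeError("Server failed to restart")
        sleep(0.1)  # executed even when probe() has just returned 0


def wait_without_trailing_sleep(probe: Callable[[], int],
                                sleep: Callable[[float], None],
                                counter: int = 5000) -> None:
    """Hypothetical variant: return as soon as the probe succeeds."""
    while True:
        if probe() == 0:
            return
        counter -= 1
        if not counter:
            raise RuntimeError("Server failed to restart")
        sleep(0.1)
```

With a probe that fails twice and then succeeds, the first function sleeps three times and the second only twice. That one saved sleep is what lets the test get ahead of mtr in the scenario described in this report.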

      What happens is this: the test restarts the server with --server-id=100, and mtr enters the sleep_until_file_created() function, which does the following:

        for ( my $loop= 1; $loop <= $loops; $loop++ )
        {
          if ( -r $pidfile ) { return 1; }
          my $seconds= ($loop * $sleeptime) / 1000;
          if ( defined $proc and ! $proc->wait_one(0) )
          {
            mtr_warning("Process $proc died after mysql-test-run waited $seconds " .
                        "seconds for $pidfile to be created.");
            return 0;
          }
          mtr_debug("Sleep $sleeptime milliseconds waiting for $pidfile");
          mtr_milli_sleep($sleeptime);
        }
      

      (also reformatted for brevity). On each iteration this function checks that there is no pid file yet and that the process still exists (wait_one(0)), and then it sleeps. While it sleeps, include/wait_until_connected_again.inc notices that the server is up and passes control back to the test. The test verifies that server_id is 100 and that the event scheduler is up, and then wants to restart the server with the default --server-id=1. The test stops the server, mtr comes back from the sleep, notices that there is no pid file and no process, and reports that "Process ... died".
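
The interleaving can be replayed deterministically with a small Python model. This is a sketch under simplifying assumptions: the Server class and poll() are illustrative names, poll() mirrors one iteration of the Perl loop above, and the server is assumed to write its pid file at startup and remove it at shutdown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Server:
    pidfile: bool = False  # True while the pid file is readable
    alive: bool = False    # True while the server process exists


def poll(server: Server) -> str:
    """One iteration of the wait loop, reduced to its three outcomes."""
    if server.pidfile:
        return "started"
    if not server.alive:
        return "died"
    return "sleep"


def replay() -> List[str]:
    """Replay the race: a full start/stop cycle fits inside one mtr sleep."""
    server = Server(alive=True)   # process spawned, pid file not written yet
    seen = [poll(server)]         # mtr: no pid file, process alive, so sleep

    server.pidfile = True         # server finishes starting during the sleep
    # The test reconnects, checks server_id, and stops the server again.
    server.pidfile = False
    server.alive = False

    seen.append(poll(server))     # mtr wakes up: no pid file, no process
    return seen
```

Here replay() yields "sleep" followed by "died": mtr never observes the pid file, because the server both appeared and disappeared while mtr was asleep.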

      This is how it could be fixed. Before restarting the server, the test creates an .expect file. mtr reads it to know that this is an expected server shutdown and not an error. If mtr deleted the .expect file immediately after reading it, then sleep_until_file_created() could check whether this file exists: if it does, this is an expected next shutdown, not a failure of the previous one.
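
A minimal Python sketch of that check follows. It is not the mtr implementation: classify_wait(), the outcome names, and the file names in demo() are all hypothetical. It also relies on the assumption stated above, that mtr deletes the .expect file as soon as it has read it, so that a file found here can only belong to a newer restart request.

```python
import os
import tempfile
from typing import List


def classify_wait(pidfile: str, expect_file: str, process_alive: bool) -> str:
    """Decide what one wake-up of the wait loop should conclude."""
    if os.access(pidfile, os.R_OK):
        return "started"            # pid file is readable: the server is up
    if process_alive:
        return "keep_waiting"       # still starting: sleep and poll again
    if os.path.exists(expect_file):
        return "expected_shutdown"  # the test already asked for the next restart
    return "died"                   # no process and nobody asked for a shutdown


def demo() -> List[str]:
    """Exercise the four outcomes with placeholder files in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        pid = os.path.join(tmp, "server.pid")        # placeholder name
        expect = os.path.join(tmp, "server.expect")  # placeholder name
        out = [classify_wait(pid, expect, True),
               classify_wait(pid, expect, False)]
        open(expect, "w").close()   # the test requests another restart
        out.append(classify_wait(pid, expect, False))
        open(pid, "w").close()      # the server writes its pid file
        out.append(classify_wait(pid, expect, False))
        return out
```

The order of the checks matters: the .expect file is consulted only after both the pid file and the process are found missing, so a normal startup is classified exactly as before.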

      Note that while the 10.6 commit above removes sleep 0.1 from include/wait_until_connected_again.inc, it adds it to include/restart_mysqld.inc precisely to work around this bug. After fixing this bug, please remove that sleep. That commit also reduces the sleep duration in mtr; this could be reverted too if it takes too much CPU.

      People

      Assignee: Sergei Golubchik (serg)
      Reporter: Sergei Golubchik (serg)