Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-201

LP:962540 - Assertion `!thd->spcont' failed in net_send_error on server shutdown

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Fixed
    • None
    • 5.5.22, 10.0.6
    • None
    • None

    Description

      https://bugs.launchpad.net/maria/+bug/962540

      120323 0:32:52 [Note] debug/sql/mysqld: ready for connections.
      Version: '5.5.21-MariaDB-debug-log' socket: 'debug/mysql-test/var/tmp/mysqld.1.sock' port: 16000 Source distribution
      120323 0:33:07 [Note] Start binlog_dump to slave_server(2), pos(master-bin.000001, 4)
      120323 0:33:08 [Note] Semi-sync replication initialized for transactions.
      120323 0:33:08 [Note] Semi-sync replication enabled on the master.
      120323 0:33:08 [Note] Stop asynchronous binlog_dump to slave (server_id: 2)
      120323 0:33:08 [Note] Start binlog_dump to slave_server(2), pos(master-bin.000001, 245)
      120323 0:33:08 [Note] Start semi-sync binlog_dump to slave (server_id: 2), pos(master-bin.000001, 245)
      120323 0:33:09 [Note] debug/sql/mysqld: Normal shutdown
       
      120323 0:33:10 [Note] Event Scheduler: Purging the queue. 0 events
      120323 0:33:10 [Note] Stop semi-sync binlog_dump to slave (server_id: 2)
      120323 0:33:12 [Warning] debug/sql/mysqld: Forcing close of thread 7 user: 'root'
       
      120323 0:33:12 [Warning] debug/sql/mysqld: Forcing close of thread 6 user: 'root'
       
      mysqld: sql/protocol.cc:151: bool net_send_error(THD*, uint, const char*, const char*): Assertion `!thd->spcont' failed.
      120323 0:33:12 [ERROR] mysqld got signal 6 ;
       
      #7 0xb74f50f0 in abort () from /lib/libc.so.6
      #8 0xb74ec014 in __assert_fail () from /lib/libc.so.6
      #9 0x081a7bef in net_send_error (thd=0x97760e8, sql_errno=1053,
          err=0x95b0242 "Server shutdown in progress", sqlstate=0x0)
          at sql/protocol.cc:151
      #10 0x08197d83 in close_connection (thd=0x97760e8, sql_errno=1053)
          at sql/mysqld.cc:2358
      #11 0x081963bc in close_connections () at sql/mysqld.cc:1476
      #12 0x0819672f in kill_server (sig_ptr=0x0) at sql/mysqld.cc:1640
      #13 0x0819676f in kill_server_thread (arg=0xad1a4328) at sql/mysqld.cc:1668
      #14 0x0854933b in pfs_spawn_thread (arg=0x965d9c0)
          at storage/perfschema/pfs.cc:1015
      #15 0xb77cbb25 in start_thread () from /lib/libpthread.so.0

      bzr version-info

      revision-id: wlad@montyprogram.com-20120322192114-4hh3m2imbsx7r2vw
      date: 2012-03-22 20:21:14 +0100
      build-date: 2012-03-23 00:35:31 +0400
      revno: 3345

      Could not reproduce on MySQL 5.5 or MySQL trunk.

      Please note that the test case requires semisync plugins, so they should be built and placed in the plugin folder.
      On the same reason the test case is not applicable to versions below 5.5.

      The test case works pretty reliably for me, but a couple of times (out of dozens) it didn't cause the assertion, so I needed to re-run it.

      1. Test case:

      --source include/master-slave.inc
      --source include/have_binlog_format_mixed.inc
      --source include/have_semisync_plugin.inc
       
      eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_SO';
      SET GLOBAL rpl_semi_sync_master_enabled = 1;
       
      --connection slave
      eval INSTALL PLUGIN rpl_semi_sync_slave SONAME '$SEMISYNC_SLAVE_SO';
      SET GLOBAL rpl_semi_sync_slave_enabled = 1;
      STOP SLAVE;
      START SLAVE;
       
      --connection master
      CREATE TABLE t1 (
        pk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
        a DATETIME
      ) ENGINE = MyISAM;
       
      --delimiter |
      CREATE PROCEDURE pr ()
      BEGIN
        DECLARE done INT DEFAULT 10000;
        wl: WHILE done DO
          INSERT INTO t1 (a) VALUES ( NOW() );
          SET done = done - 1;
        END WHILE wl;
      END|
      --delimiter ;
       
      --send
      CALL pr();
      --sleep 1
      --let $rpl_server_number= 1
      --source include/rpl_stop_server.inc

      Attachments

        Activity

          Modified test:

          --source include/master-slave.inc
          --source include/have_binlog_format_mixed.inc
          --source include/have_semisync_plugin.inc

          eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_SO';
          SET GLOBAL rpl_semi_sync_master_enabled = 1;

          --connection slave
          eval INSTALL PLUGIN rpl_semi_sync_slave SONAME '$SEMISYNC_SLAVE_SO';
          SET GLOBAL rpl_semi_sync_slave_enabled = 1;
          STOP SLAVE;
          START SLAVE;

          --connection master
          CREATE TABLE t1 (
          pk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
          a TEXT
          ) ENGINE = MyISAM;

          --delimiter |
          CREATE PROCEDURE pr ()
          BEGIN
          DECLARE done INT DEFAULT 10000;
          wl: WHILE done DO
          INSERT INTO t1 (a) VALUES ( REPEAT('a',65536) ), ( REPEAT('b',65536) ), ( REPEAT('c',65536) ), ( REPEAT('d',65536) ), ( REPEAT('e',65536) );
          SET done = done - 1;
          END WHILE wl;
          END|
          --delimiter ;

          --send
          CALL pr();

          --let $rpl_server_number= 1
          --source include/rpl_stop_server.inc

          elenst Elena Stepanova added a comment - Modified test: --source include/master-slave.inc --source include/have_binlog_format_mixed.inc --source include/have_semisync_plugin.inc eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_SO'; SET GLOBAL rpl_semi_sync_master_enabled = 1; --connection slave eval INSTALL PLUGIN rpl_semi_sync_slave SONAME '$SEMISYNC_SLAVE_SO'; SET GLOBAL rpl_semi_sync_slave_enabled = 1; STOP SLAVE; START SLAVE; --connection master CREATE TABLE t1 ( pk INT NOT NULL AUTO_INCREMENT PRIMARY KEY, a TEXT ) ENGINE = MyISAM; --delimiter | CREATE PROCEDURE pr () BEGIN DECLARE done INT DEFAULT 10000; wl: WHILE done DO INSERT INTO t1 (a) VALUES ( REPEAT('a',65536) ), ( REPEAT('b',65536) ), ( REPEAT('c',65536) ), ( REPEAT('d',65536) ), ( REPEAT('e',65536) ); SET done = done - 1; END WHILE wl; END| --delimiter ; --send CALL pr(); --let $rpl_server_number= 1 --source include/rpl_stop_server.inc

          Re-assigning back since it seems to be reproducible on perro at the moment.

          elenst Elena Stepanova added a comment - Re-assigning back since it seems to be reproducible on perro at the moment.
          serg Sergei Golubchik added a comment - - edited

          The fix for this bug was removed from 10.0 (probably accidentally in 5.6 merge). Elena, could you please test this again in 10.0 tree to see whether the bug reappears?

          serg Sergei Golubchik added a comment - - edited The fix for this bug was removed from 10.0 (probably accidentally in 5.6 merge). Elena, could you please test this again in 10.0 tree to see whether the bug reappears?
          elenst Elena Stepanova added a comment - - edited

          I couldn't reproduce the failure on 10.0 tree so far (still trying).
          But I'm also not getting "Forcing close of thread ..." warnings, so maybe something changed that was a prerequisite for this use case.

          elenst Elena Stepanova added a comment - - edited I couldn't reproduce the failure on 10.0 tree so far (still trying). But I'm also not getting "Forcing close of thread ..." warnings, so maybe something changed that was a prerequisite for this use case.

          I found when it stopped happening on 10.0 tree. The following revision did it:

          ------------------------------------------------------------
          revno: 3748 [merge]
          revision-id: knielsen@knielsen-hq.org-20130429100354-0kujsw18axni9syq
          timestamp: Mon 2013-04-29 12:03:54 +0200
          message:
          Merge 10.0-base -> 10.0
          ------------------------------------------------------------

          More particularly, this one:

          ------------------------------------------------------------
          revno: 3427.1.194
          revision-id: knielsen@knielsen-hq.org-20130424110540-hf6f77511ahhuios
          timestamp: Wed 2013-04-24 13:05:40 +0200
          message:
          Add missing check for thd->killed in mysql_binlog_send().

          The slave dump thread running on the master only checked thd->killed whenever
          it reached the end of a binlog file, not between events. This could
          unnecessarily delay server shutdown.

          This was found by code inspection while tracking down some occasional "forcing
          close of thread..." errors in Buildbot. Hopefully this will fix the failures,
          but the fix is correct in any case.

          Also increase the wait during server shutdown, 2 seconds is a bit tight in
          case of heavy I/O stall, and it seems better to delay shutdown a bit than
          force-kill threads unnecessarily.

          Also fix some races in test cases that restart the mysqld server. The .expect
          file should be changed with --append_file, --remove_file + --write_file
          creates a short window where mysqld can error out due to .expect file missing.
          ------------------------------------------------------------

          Even more particularly, this change:

          === modified file 'sql/mysqld.cc'
          — sql/mysqld.cc 2013-04-16 11:43:28 +0000
          +++ sql/mysqld.cc 2013-04-24 11:05:40 +0000
          @@ -1494,8 +1494,21 @@
          Events::deinit();
          end_slave();

          • /* Give threads time to die. */
          • for (int i= 0; (volatile int32) &thread_count && i < 100; i++)
            + /*
            + Give threads time to die.
            +
            + In 5.5, this was waiting 100 rounds @ 20 milliseconds/round, so as little
            + as 2 seconds, depending on thread scheduling.
            +
            + From 10.0, we increase this to 1000 rounds / 20 seconds. The rationale is
            + that on a server with heavy I/O load, it is quite possible for eg. an
            + fsync() of the binlog or whatever to cause something like LOCK_log to be
            + held for more than 2 seconds. We do not want to force kill threads in
            + such cases, if it can be avoided. Note that normally, the wait will be
            + much smaller than even 2 seconds, this is only a safety fallback against
            + stuck threads so server shutdown is not held up forever.
            + */
            + for (int i= 0; (volatile int32) &thread_count && i < 1000; i++)
            my_sleep(20000);

          /*

          If I revert the change on the current 10.0 tree, the assertion failure starts happening again.
          The increased sleep time doesn't look like a real fix to me, but I couldn't overcome it with increased rpl_semi_sync_master_timeout and MTR's shutdown-timeout, so maybe it is, after all.

          I've set up the test on perro, just in case. It's under mariadb. 10.0 tree, revno 3850, with the reverted patch quoted above (it is also in mdev201/dif), built with BUILD/compile-pentium-debug-max-no-ndb.
          To reproduce,
          cd mdev201/mysql-test
          perl ./mtr mdev201 -mysqld=plugin-rpl-semi-sync-master --mysqld=-plugin-rpl-semi-sync-slave

          elenst Elena Stepanova added a comment - I found when it stopped happening on 10.0 tree. The following revision did it: ------------------------------------------------------------ revno: 3748 [merge] revision-id: knielsen@knielsen-hq.org-20130429100354-0kujsw18axni9syq timestamp: Mon 2013-04-29 12:03:54 +0200 message: Merge 10.0-base -> 10.0 ------------------------------------------------------------ More particularly, this one: ------------------------------------------------------------ revno: 3427.1.194 revision-id: knielsen@knielsen-hq.org-20130424110540-hf6f77511ahhuios timestamp: Wed 2013-04-24 13:05:40 +0200 message: Add missing check for thd->killed in mysql_binlog_send(). The slave dump thread running on the master only checked thd->killed whenever it reached the end of a binlog file, not between events. This could unnecessarily delay server shutdown. This was found by code inspection while tracking down some occasional "forcing close of thread..." errors in Buildbot. Hopefully this will fix the failures, but the fix is correct in any case. Also increase the wait during server shutdown, 2 seconds is a bit tight in case of heavy I/O stall, and it seems better to delay shutdown a bit than force-kill threads unnecessarily. Also fix some races in test cases that restart the mysqld server. The .expect file should be changed with --append_file, --remove_file + --write_file creates a short window where mysqld can error out due to .expect file missing. ------------------------------------------------------------ Even more particularly, this change: === modified file 'sql/mysqld.cc' — sql/mysqld.cc 2013-04-16 11:43:28 +0000 +++ sql/mysqld.cc 2013-04-24 11:05:40 +0000 @@ -1494,8 +1494,21 @@ Events::deinit(); end_slave(); /* Give threads time to die. */ for (int i= 0; (volatile int32 ) &thread_count && i < 100; i++) + /* + Give threads time to die. + + In 5.5, this was waiting 100 rounds @ 20 milliseconds/round, so as little + as 2 seconds, depending on thread scheduling. + + From 10.0, we increase this to 1000 rounds / 20 seconds. The rationale is + that on a server with heavy I/O load, it is quite possible for eg. an + fsync() of the binlog or whatever to cause something like LOCK_log to be + held for more than 2 seconds. We do not want to force kill threads in + such cases, if it can be avoided. Note that normally, the wait will be + much smaller than even 2 seconds, this is only a safety fallback against + stuck threads so server shutdown is not held up forever. + */ + for (int i= 0; (volatile int32 ) &thread_count && i < 1000; i++) my_sleep(20000); /* If I revert the change on the current 10.0 tree, the assertion failure starts happening again. The increased sleep time doesn't look like a real fix to me, but I couldn't overcome it with increased rpl_semi_sync_master_timeout and MTR's shutdown-timeout, so maybe it is, after all. I've set up the test on perro, just in case. It's under mariadb. 10.0 tree, revno 3850, with the reverted patch quoted above (it is also in mdev201/dif), built with BUILD/compile-pentium-debug-max-no-ndb. To reproduce, cd mdev201/mysql-test perl ./mtr mdev201 - mysqld= plugin-rpl-semi-sync-master --mysqld= -plugin-rpl-semi-sync-slave

          People

            serg Sergei Golubchik
            elenst Elena Stepanova
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.