[MDEV-201] LP:962540 - Assertion `!thd->spcont' failed in net_send_error on server shutdown Created: 2012-03-22  Updated: 2013-11-10  Resolved: 2013-11-10

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.5.22, 10.0.6

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: None


 Description   

https://bugs.launchpad.net/maria/+bug/962540

120323 0:32:52 [Note] debug/sql/mysqld: ready for connections.
Version: '5.5.21-MariaDB-debug-log' socket: 'debug/mysql-test/var/tmp/mysqld.1.sock' port: 16000 Source distribution
120323 0:33:07 [Note] Start binlog_dump to slave_server(2), pos(master-bin.000001, 4)
120323 0:33:08 [Note] Semi-sync replication initialized for transactions.
120323 0:33:08 [Note] Semi-sync replication enabled on the master.
120323 0:33:08 [Note] Stop asynchronous binlog_dump to slave (server_id: 2)
120323 0:33:08 [Note] Start binlog_dump to slave_server(2), pos(master-bin.000001, 245)
120323 0:33:08 [Note] Start semi-sync binlog_dump to slave (server_id: 2), pos(master-bin.000001, 245)
120323 0:33:09 [Note] debug/sql/mysqld: Normal shutdown
 
120323 0:33:10 [Note] Event Scheduler: Purging the queue. 0 events
120323 0:33:10 [Note] Stop semi-sync binlog_dump to slave (server_id: 2)
120323 0:33:12 [Warning] debug/sql/mysqld: Forcing close of thread 7 user: 'root'
 
120323 0:33:12 [Warning] debug/sql/mysqld: Forcing close of thread 6 user: 'root'
 
mysqld: sql/protocol.cc:151: bool net_send_error(THD*, uint, const char*, const char*): Assertion `!thd->spcont' failed.
120323 0:33:12 [ERROR] mysqld got signal 6 ;
 
#7 0xb74f50f0 in abort () from /lib/libc.so.6
#8 0xb74ec014 in __assert_fail () from /lib/libc.so.6
#9 0x081a7bef in net_send_error (thd=0x97760e8, sql_errno=1053,
    err=0x95b0242 "Server shutdown in progress", sqlstate=0x0)
    at sql/protocol.cc:151
#10 0x08197d83 in close_connection (thd=0x97760e8, sql_errno=1053)
    at sql/mysqld.cc:2358
#11 0x081963bc in close_connections () at sql/mysqld.cc:1476
#12 0x0819672f in kill_server (sig_ptr=0x0) at sql/mysqld.cc:1640
#13 0x0819676f in kill_server_thread (arg=0xad1a4328) at sql/mysqld.cc:1668
#14 0x0854933b in pfs_spawn_thread (arg=0x965d9c0)
    at storage/perfschema/pfs.cc:1015
#15 0xb77cbb25 in start_thread () from /lib/libpthread.so.0

bzr version-info

revision-id: wlad@montyprogram.com-20120322192114-4hh3m2imbsx7r2vw
date: 2012-03-22 20:21:14 +0100
build-date: 2012-03-23 00:35:31 +0400
revno: 3345

Could not reproduce on MySQL 5.5 or MySQL trunk.

Please note that the test case requires semisync plugins, so they should be built and placed in the plugin folder.
On the same reason the test case is not applicable to versions below 5.5.

The test case works pretty reliably for me, but a couple of times (out of dozens) it didn't cause the assertion, so I needed to re-run it.

  1. Test case:

--source include/master-slave.inc
--source include/have_binlog_format_mixed.inc
--source include/have_semisync_plugin.inc
 
eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_SO';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
 
--connection slave
eval INSTALL PLUGIN rpl_semi_sync_slave SONAME '$SEMISYNC_SLAVE_SO';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE;
START SLAVE;
 
--connection master
CREATE TABLE t1 (
  pk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a DATETIME
) ENGINE = MyISAM;
 
--delimiter |
CREATE PROCEDURE pr ()
BEGIN
  DECLARE done INT DEFAULT 10000;
  wl: WHILE done DO
    INSERT INTO t1 (a) VALUES ( NOW() );
    SET done = done - 1;
  END WHILE wl;
END|
--delimiter ;
 
--send
CALL pr();
--sleep 1
--let $rpl_server_number= 1
--source include/rpl_stop_server.inc



 Comments   
Comment by Elena Stepanova [ 2012-03-26 ]

Modified test:

--source include/master-slave.inc
--source include/have_binlog_format_mixed.inc
--source include/have_semisync_plugin.inc

eval INSTALL PLUGIN rpl_semi_sync_master SONAME '$SEMISYNC_MASTER_SO';
SET GLOBAL rpl_semi_sync_master_enabled = 1;

--connection slave
eval INSTALL PLUGIN rpl_semi_sync_slave SONAME '$SEMISYNC_SLAVE_SO';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;
STOP SLAVE;
START SLAVE;

--connection master
CREATE TABLE t1 (
pk INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
a TEXT
) ENGINE = MyISAM;

--delimiter |
CREATE PROCEDURE pr ()
BEGIN
DECLARE done INT DEFAULT 10000;
wl: WHILE done DO
INSERT INTO t1 (a) VALUES ( REPEAT('a',65536) ), ( REPEAT('b',65536) ), ( REPEAT('c',65536) ), ( REPEAT('d',65536) ), ( REPEAT('e',65536) );
SET done = done - 1;
END WHILE wl;
END|
--delimiter ;

--send
CALL pr();

--let $rpl_server_number= 1
--source include/rpl_stop_server.inc

Comment by Elena Stepanova [ 2012-03-26 ]

Re-assigning back since it seems to be reproducible on perro at the moment.

Comment by Sergei Golubchik [ 2013-10-12 ]

The fix for this bug was removed from 10.0 (probably accidentally in 5.6 merge). Elena, could you please test this again in 10.0 tree to see whether the bug reappears?

Comment by Elena Stepanova [ 2013-10-14 ]

I couldn't reproduce the failure on 10.0 tree so far (still trying).
But I'm also not getting "Forcing close of thread ..." warnings, so maybe something changed that was a prerequisite for this use case.

Comment by Elena Stepanova [ 2013-10-14 ]

I found when it stopped happening on 10.0 tree. The following revision did it:

------------------------------------------------------------
revno: 3748 [merge]
revision-id: knielsen@knielsen-hq.org-20130429100354-0kujsw18axni9syq
timestamp: Mon 2013-04-29 12:03:54 +0200
message:
Merge 10.0-base -> 10.0
------------------------------------------------------------

More particularly, this one:

------------------------------------------------------------
revno: 3427.1.194
revision-id: knielsen@knielsen-hq.org-20130424110540-hf6f77511ahhuios
timestamp: Wed 2013-04-24 13:05:40 +0200
message:
Add missing check for thd->killed in mysql_binlog_send().

The slave dump thread running on the master only checked thd->killed whenever
it reached the end of a binlog file, not between events. This could
unnecessarily delay server shutdown.

This was found by code inspection while tracking down some occasional "forcing
close of thread..." errors in Buildbot. Hopefully this will fix the failures,
but the fix is correct in any case.

Also increase the wait during server shutdown, 2 seconds is a bit tight in
case of heavy I/O stall, and it seems better to delay shutdown a bit than
force-kill threads unnecessarily.

Also fix some races in test cases that restart the mysqld server. The .expect
file should be changed with --append_file, --remove_file + --write_file
creates a short window where mysqld can error out due to .expect file missing.
------------------------------------------------------------

Even more particularly, this change:

=== modified file 'sql/mysqld.cc'
— sql/mysqld.cc 2013-04-16 11:43:28 +0000
+++ sql/mysqld.cc 2013-04-24 11:05:40 +0000
@@ -1494,8 +1494,21 @@
Events::deinit();
end_slave();

  • /* Give threads time to die. */
  • for (int i= 0; (volatile int32) &thread_count && i < 100; i++)
    + /*
    + Give threads time to die.
    +
    + In 5.5, this was waiting 100 rounds @ 20 milliseconds/round, so as little
    + as 2 seconds, depending on thread scheduling.
    +
    + From 10.0, we increase this to 1000 rounds / 20 seconds. The rationale is
    + that on a server with heavy I/O load, it is quite possible for eg. an
    + fsync() of the binlog or whatever to cause something like LOCK_log to be
    + held for more than 2 seconds. We do not want to force kill threads in
    + such cases, if it can be avoided. Note that normally, the wait will be
    + much smaller than even 2 seconds, this is only a safety fallback against
    + stuck threads so server shutdown is not held up forever.
    + */
    + for (int i= 0; (volatile int32) &thread_count && i < 1000; i++)
    my_sleep(20000);

/*

If I revert the change on the current 10.0 tree, the assertion failure starts happening again.
The increased sleep time doesn't look like a real fix to me, but I couldn't overcome it with increased rpl_semi_sync_master_timeout and MTR's shutdown-timeout, so maybe it is, after all.

I've set up the test on perro, just in case. It's under mariadb. 10.0 tree, revno 3850, with the reverted patch quoted above (it is also in mdev201/dif), built with BUILD/compile-pentium-debug-max-no-ndb.
To reproduce,
cd mdev201/mysql-test
perl ./mtr mdev201 -mysqld=plugin-rpl-semi-sync-master --mysqld=-plugin-rpl-semi-sync-slave

Generated at Thu Feb 08 06:27:00 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.