Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.0.15, 10.0(EOL), 10.1(EOL)
-
None
-
Linux
Description
This has so far only been observed once and was not reproducible so far
(even with several clients doing transactions in parallel for days while the plugin was unloaded and then installed again every second):
150331 9:19:22 [Note] Semi-sync replication switched OFF.
|
150331 9:19:22 [Note] Semi-sync replication disabled on the master.
|
150331 9:19:22 [ERROR] mysqld got signal 11 ;
|
[...]
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xb70d4b]
|
/usr/sbin/mysqld(handle_fatal_signal+0x398)[0x7257b8]
|
/lib64/libpthread.so.0[0x377040f710]
|
/usr/lib64/mysql/plugin/semisync_master.so(ActiveTranx::is_tranx_end_pos(char const*, unsigned long long)+0x24)[0x7fe91a5f9fe4]
|
/usr/lib64/mysql/plugin/semisync_master.so(ReplSemiSyncMaster::commitTrx(char const*, unsigned long long)+0x19e)[0x7fe91a5fab3e]
|
/usr/sbin/mysqld(Trans_delegate::after_commit(THD*, bool)+0xa2)[0x69b612]
|
/usr/sbin/mysqld(ha_commit_trans(THD*, bool)+0x222)[0x728872]
|
/usr/sbin/mysqld(trans_commit_stmt(THD*)+0x1b)[0x6a350b]
|
/usr/sbin/mysqld(mysql_execute_command(THD*)+0x514)[0x5d16d4]
|
/usr/sbin/mysqld[0x5d79d2]
|
/usr/sbin/mysqld(dispatch_command(enum_server_command, THD*, char*, unsigned int)+0x1b20)[0x5d9b90]
|
/usr/sbin/mysqld(do_handle_one_connection(THD*)+0x453)[0x6956a3]
|
/usr/sbin/mysqld(handle_one_connection+0x42)[0x695772]
|
/lib64/libpthread.so.0[0x37704079d1]
|
/lib64/libc.so.6(clone+0x6d)[0x37700e8b6d]
|
My blind educated guess is a race condition between plugin callbacks and plugin teardown code. More specific: I think that ReplSemiSyncMaster::commitTrx() is still registered as an after_commit callback, but either at the time it gets called the plugins transaction hash table has just been freed, or it got freed just at the "right" time between invoking the callback and actually processing it?
Attachments
Issue Links
- blocks
-
MDEV-7096 In THD::cleanup(), apc_target.destroy() has assertion failure.
-
- Closed
-
Activity
knielsen fixed a similar issue once long time ago (MDEV-359), so maybe the above information will be enough to fix it again?
Forget my note regarding assert(), i somehow thought it was off by default and only enabled when compiling with DEBUG defined, but it is actually the opposite: enabled by default unless NDEBUG is defined to disable it ...
We are having this problem sporadically in biuldbot on rpl.rpl_semi_sync_event test. Example:
http://buildbot.askmonty.org/buildbot/builders/kvm-bintar-centos5-x86/builds/3528/steps/test/logs/stdio
rpl.rpl_semi_sync_event 'stmt,xtradb' w3 [ fail ]
|
Test ended at 2015-11-06 19:51:42
|
|
CURRENT_TEST: rpl.rpl_semi_sync_event
|
mysqltest: At line 87: query 'DROP TABLE t1' failed: 2013: Lost connection to MySQL server during query
|
|
The result from queries just before the failure was:
|
< snip >
|
UPDATE t1 SET f = CONCAT('up_',CONNECTION_ID()) WHERE i = 12;
|
UPDATE t1 SET f = CONCAT('up_',CONNECTION_ID()) WHERE i = 11;
|
UPDATE t1 SET f = CONCAT('up_',CONNECTION_ID()) WHERE i = 10;
|
UPDATE t1 SET f = CONCAT('up_',CONNECTION_ID()) WHERE i = 9;
|
UPDATE t1 SET f = CONCAT('up_',CONNECTION_ID()) WHERE i = 8;
|
UPDATE t1 SET f = CONCAT('up_',CONNECTION_ID()) WHERE i = 7;
|
UPDATE t1 SET f = CONCAT('up_',CONNECTION_ID()) WHERE i = 6;
|
UPDATE t1 SET f = CONCAT('up_',CONNECTION_ID()) WHERE i = 5;
|
UPDATE t1 SET f = CONCAT('up_',CONNECTION_ID()) WHERE i = 4;
|
UPDATE t1 SET f = CONCAT('up_',CONNECTION_ID()) WHERE i = 3;
|
UPDATE t1 SET f = CONCAT('up_',CONNECTION_ID()) WHERE i = 2;
|
UPDATE t1 SET f = CONCAT('up_',CONNECTION_ID()) WHERE i = 1;
|
SET GLOBAL event_scheduler = OFF;
|
include/stop_slave.inc
|
set global rpl_semi_sync_slave_enabled = 0;
|
set global rpl_semi_sync_master_enabled = 0;
|
include/start_slave.inc
|
DROP EVENT ev1;
|
DROP EVENT ev2;
|
DROP TABLE t1;
|
|
More results from queries before failure can be found in /usr/local/mariadb-10.1.9-linux-i686/mysql-test/var/3/log/rpl_semi_sync_event.log
|
|
|
Server [mysqld.1 - pid: 30241, winpid: 30241, exit: 256] failed during test run
|
Server log from this test:
|
----------SERVER LOG START-----------
|
2015-11-06 19:51:32 3086317296 [Note] /usr/local/mariadb-10.1.9-linux-i686/bin/mysqld (mysqld 10.1.9-MariaDB-log) starting as process 30242 ...
|
2015-11-06 19:51:32 3086317296 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4051)
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'partition' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'SEQUENCE' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: Using mutexes to ref count buffer pool pages
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: The InnoDB memory heap is disabled
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: Memory barrier is not used
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: Compressed tables use zlib 1.2.3
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: Using Linux native AIO
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: Not using CPU crc32 instructions
|
2015-11-06 19:51:32 3086317296 [ERROR] InnoDB: Linux Native AIO not supported. You can either move tmpdir to a file system that supports native AIO or you can set innodb_use_native_aio to FALSE to avoid this message.
|
2015-11-06 19:51:32 3086317296 [ERROR] InnoDB: Linux Native AIO check on tmpdir returned error[22]
|
2015-11-06 19:51:32 3086317296 [Warning] InnoDB: Linux Native AIO disabled.
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: Initializing buffer pool, size = 8.0M
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: Completed initialization of buffer pool
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: Highest supported file format is Barracuda.
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: 128 rollback segment(s) are active.
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: Waiting for purge to start
|
2015-11-06 19:51:32 3086317296 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.26-74.0 started; log sequence number 1629517
|
2015-11-06 19:51:32 2780822416 [Note] InnoDB: Dumping buffer pool(s) not yet started
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_SYS_DATAFILES' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_SYS_TABLESTATS' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_SYS_INDEXES' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'XTRADB_RSEG' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'CHANGED_PAGE_BITMAPS' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_FT_BEING_DELETED' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_LOCK_WAITS' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_TABLESPACES_ENCRYPTION' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'XTRADB_INTERNAL_HASH_TABLES' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_SYS_FIELDS' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_CMPMEM_RESET' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'FEEDBACK' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_CMP' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_FT_INDEX_TABLE' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_SYS_TABLESPACES' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_MUTEXES' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_CMP_RESET' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_FT_INDEX_CACHE' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_FT_DEFAULT_STOPWORD' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_SYS_COLUMNS' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_FT_CONFIG' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_CMP_PER_INDEX_RESET' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'XTRADB_READ_VIEW' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_SYS_SEMAPHORE_WAITS' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_CHANGED_PAGES' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_FT_DELETED' is disabled.
|
2015-11-06 19:51:32 3086317296 [Note] Plugin 'INNODB_TABLESPACES_SCRUBBING' is disabled.
|
2015-11-06 19:51:32 3086317296 [Warning] /usr/local/mariadb-10.1.9-linux-i686/bin/mysqld: unknown variable 'loose-debug-sync-timeout=300'
|
2015-11-06 19:51:32 3086317296 [Note] Server socket created on IP: '127.0.0.1'.
|
2015-11-06 19:51:32 3086317296 [Note] Event Scheduler: Loaded 0 events
|
2015-11-06 19:51:32 3086317296 [Note] /usr/local/mariadb-10.1.9-linux-i686/bin/mysqld: ready for connections.
|
Version: '10.1.9-MariaDB-log' socket: '/usr/local/mariadb-10.1.9-linux-i686/mysql-test/var/tmp/3/mysqld.1.sock' port: 16020 MariaDB Server
|
2015-11-06 19:51:32 3084819344 [Note] Start asynchronous binlog_dump to slave (server_id: 2), pos(master-bin.000001, 4)
|
2015-11-06 19:51:32 2701130640 [Note] Semi-sync replication initialized for transactions.
|
2015-11-06 19:51:32 2701130640 [Note] Semi-sync replication enabled on the master.
|
2015-11-06 19:51:32 2688547728 [Note] Start semi-sync binlog_dump to slave (server_id: 2), pos(master-bin.000001, 1618)
|
2015-11-06 19:51:32 3084819344 [Note] Stop asynchronous binlog_dump to slave (server_id: 2)
|
2015-11-06 19:51:32 3084819344 [Note] Event Scheduler: scheduler thread started with id 11
|
2015-11-06 19:51:32 2701130640 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-increment column after selecting from another table are unsafe because the order in which rows are retrieved determines what (if any) rows will be written. This order cannot be predicted and may differ on master and the slave. Statement: INSERT INTO t1 SELECT i+5, f FROM t1
|
2015-11-06 19:51:32 2701130640 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-increment column after selecting from another table are unsafe because the order in which rows are retrieved determines what (if any) rows will be written. This order cannot be predicted and may differ on master and the slave. Statement: INSERT INTO t1 SELECT i+10, f FROM t1
|
2015-11-06 19:51:33 2688547728 [ERROR] Read semi-sync reply network error: (errno: 1158)
|
2015-11-06 19:51:33 2688547728 [Note] Stop semi-sync binlog_dump to slave (server_id: 2)
|
2015-11-06 19:51:33 2701130640 [Note] Event Scheduler: Killing the scheduler thread, thread id 11
|
2015-11-06 19:51:33 2701130640 [Note] Event Scheduler: Waiting for the scheduler thread to reply
|
2015-11-06 19:51:33 2701130640 [Note] Event Scheduler: Stopped
|
2015-11-06 19:51:41 3083623312 [Warning] Timeout waiting for reply of binlog (file: master-bin.000001, pos: 3066), semi-sync up to file master-bin.000001, position 2878.
|
2015-11-06 19:51:41 3083623312 [Note] Semi-sync replication switched OFF.
|
2015-11-06 19:51:41 3084520336 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statement is unsafe because it uses a system function that may return a different value on the slave. Statement: CREATE PROCEDURE `ev1`() SQL SECURITY INVOKER INSERT INTO t1 VALUES (SLEEP(5),CONCAT('ev1_',CONNECTION_ID()))
|
2015-11-06 19:51:41 3084520336 [Note] Event Scheduler: [root@localhost][test.ev1] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statement is unsafe because it uses a system function that may return a different value on the slave.
|
2015-11-06 19:51:41 3083922320 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statement is unsafe because it uses a system function that may return a different value on the slave. Statement: CREATE PROCEDURE `ev2`() SQL SECURITY INVOKER INSERT INTO t1 VALUES (SLEEP(5),CONCAT('ev2_',CONNECTION_ID()))
|
2015-11-06 19:51:41 3083922320 [Note] Event Scheduler: [root@localhost][test.ev2] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statement is unsafe because it uses a system function that may return a different value on the slave.
|
2015-11-06 19:51:41 2701130640 [Note] Semi-sync replication switched OFF.
|
2015-11-06 19:51:41 2701130640 [Note] Semi-sync replication disabled on the master.
|
151106 19:51:41 [ERROR] mysqld got signal 11 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.1.9-MariaDB-log
|
key_buffer_size=1048576
|
read_buffer_size=131072
|
max_used_connections=25
|
max_threads=42
|
thread_count=8
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 17686 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x0xa103f008
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0xb7d5732b thread_stack 0x48000
|
2015-11-06 19:51:41 2987539344 [Note] Start asynchronous binlog_dump to slave (server_id: 2), pos(master-bin.000001, 2878)
|
mysys/stacktrace.c:246(my_print_stacktrace)[0x8873b3d]
|
sql/signal_handler.cc:160(handle_fatal_signal)[0x839d91a]
|
[0xcf7420]
|
/usr/local/mariadb-10.1.9-linux-i686/lib/plugin/semisync_master.so(_ZN18ReplSemiSyncMaster9commitTrxEPKcy+0x37c)[0x7037fc]
|
/usr/local/mariadb-10.1.9-linux-i686/lib/plugin/semisync_master.so(repl_semi_report_commit+0x60)[0x704cb0]
|
sql/rpl_handler.cc:201(Trans_delegate::after_commit(THD*, bool))[0x82c4c5a]
|
sql/handler.cc:1475(ha_commit_trans(THD*, bool))[0x83a5921]
|
sql/transaction.cc:435(trans_commit_stmt(THD*))[0x82cf212]
|
sql/sql_error.h:668(Diagnostics_area::set_overwrite_status(bool))[0x81dafb1]
|
sql/sp_head.cc:3152(sp_instr_stmt::exec_core(THD*, unsigned int*))[0x84cfbc1]
|
sql/sp_head.cc:2916(sp_lex_keeper::reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*))[0x84cfdc7]
|
sql/sp_head.cc:3066(sp_instr_stmt::execute(THD*, unsigned int*))[0x84d4013]
|
sql/sp_head.cc:1317(sp_head::execute(THD*, bool))[0x84d4c85]
|
sql/sp_head.cc:2108(sp_head::execute_procedure(THD*, List<Item>*))[0x84d64a5]
|
sql/event_data_objects.cc:1433(Event_job_data::execute(THD*, bool))[0x82a2714]
|
sql/event_scheduler.cc:327(Event_worker_thread::run(THD*, Event_queue_element_for_exec*))[0x850e55f]
|
sql/event_scheduler.cc:281(event_worker_thread)[0x850e5fd]
|
perfschema/pfs.cc:1863(pfs_spawn_thread)[0x8558838]
|
/lib/libpthread.so.0[0x1c449b]
|
/lib/libc.so.6(clone+0x5e)[0x2a742e]
|
|
Trying to get some variables.
|
Some pointers may be invalid and cause the dump to abort.
|
Query (0xb1833018): INSERT INTO t1 VALUES (SLEEP(5),CONCAT('ev1_',CONNECTION_ID()))
|
Connection ID (thread ID): 13
|
Status: NOT_KILLED
|
|
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
|
|
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
|
information that should help you find out what is causing the crash.
|
Writing a core file
|
----------SERVER LOG END-------------
|
|
|
- saving '/usr/local/mariadb-10.1.9-linux-i686/mysql-test/var/3/log/rpl.rpl_semi_sync_event-stmt,xtradb/' to '/usr/local/mariadb-10.1.9-linux-i686/mysql-test/var/log/rpl.rpl_semi_sync_event-stmt,xtradb/'
|
Most often it happens on centos5-x86 builder. I am also getting it while running the test repeatedly many (hundreds) times with the same buildbot binaries on my local CentOS 5 x86 VM. It seems to happen faster if it's run with --force-restart, but it can be a false observation.
Bug was that ReplSemiSyncMaster::commitTrx() was waiting on a condition for state to change, but didn't take into account that one could have disabled semi-sync during the wait.
Looking at the semisync code again the "funny" part is that the above stack trace should never be seen with a production aka. non-debug build of mysqld?
The last two strack trace lines, before the signal handler kicks in, were:
.../semisync_master.so(ActiveTranx::is_tranx_end_pos(...)
.../semisync_master.so(ReplSemiSyncMaster::commitTrx(...)
The only place where commitTrx() calls is_tranx_end_pos is the following assert() though:
At this point, the binlog file and position of this transaction
must have been removed from ActiveTranx.
*/
!active_tranxs_->is_tranx_end_pos(trx_wait_binlog_name,
trx_wait_binlog_pos));
So is_tranx_end_pos() should never be called in a non-debug build
where assert() is just defined as empty?