[MDEV-517] Galera: Server crashes with "no such a transition REPLICATING -> REPLICATING" on concurrent CREATE TABLE .. AS Created: 2012-09-09  Updated: 2012-09-18  Resolved: 2012-09-18

Status: Closed
Project: MariaDB Server
Component/s: None
Affects Version/s: 5.5.25-galera
Fix Version/s: 5.5.28a-galera

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Seppo Jaakola
Resolution: Fixed Votes: 0
Labels: None

Attachments: File log.err     File threads1     Text File ubuntu11.log    
Issue Links:
Relates

 Description   

WSREP: FSM: no such a transition REPLICATING -> REPLICATING
mysqld got signal 6 ;

#4  0x00007f870932ab0b in __GI_abort () at abort.c:92
#5  0x00007f8708673b21 in galera::FSM<galera::TrxHandle::State, galera::TrxHandle::Transition, galera::EmptyGuard, galera::EmptyAction>::shift_to (this=0x7f86d0018ca0, state=galera::TrxHandle::S_REPLICATING) at galera/src/fsm.hpp:81
#6  0x00007f87086697f6 in set_state (state=galera::TrxHandle::S_REPLICATING, this=0x7f86d0018c10) at galera/src/trx_handle.hpp:196
#7  galera::ReplicatorSMM::replicate (this=0x3276fc0, trx=0x7f86d0018c10) at galera/src/replicator_smm.cpp:513
#8  0x00007f870868090f in galera_pre_commit (gh=<optimized out>, conn_id=<optimized out>, trx_handle=<optimized out>, rbr_data=0x42669e0, rbr_data_len=172, flags=0, global_seqno=0x7f86d0012e40) at galera/src/wsrep_provider.cpp:386
#9  0x00000000006659bc in wsrep_run_wsrep_commit (thd=0x7f86d0010d10, hton=<optimized out>, all=<optimized out>) at sql/wsrep_hton.cc:332
#10 0x00000000006665a3 in wsrep_prepare (hton=0x330df60, thd=0x7f86d0010d10, all=false) at sql/wsrep_hton.cc:118
#11 0x00000000006b0a40 in ha_commit_trans (thd=0x7f86d0010d10, all=<optimized out>) at sql/handler.cc:1276
#12 0x00000000006443f4 in trans_commit_stmt (thd=0x7f86d0010d10) at sql/transaction.cc:325
#13 0x0000000000579a59 in select_create::send_eof (this=0x7f86c8002268) at sql/sql_insert.cc:4157
#14 0x00000000005d3195 in return_zero_rows (all_fields=..., having=0x0, info=<optimized out>, select_options=<optimized out>, send_row=false, fields=..., tables=<optimized out>, result=0x7f86c8002268, join=0x7f86c8002340) at sql/sql_select.cc:11036
#15 JOIN::exec (this=0x7f86c8002340) at sql/sql_select.cc:2277
#16 0x00000000005d53b4 in mysql_select (thd=0x7f86d0010d10, rref_pointer_array=<optimized out>, tables=0x7f86c8001190, wild_num=0, fields=<optimized out>, conds=<optimized out>, og_num=0, order=0x0, group=0x0, having=0x0, proc_param=0x0, select_options=2953054976, result=0x7f86c8002268, unit=0x7f86d0013080, select_lex=0x7f86d0013758) at sql/sql_select.cc:3044
#17 0x00000000005d9534 in handle_select (thd=0x7f86d0010d10, lex=0x7f86d0012fd0, result=0x7f86c8002268, setup_tables_done_option=0) at sql/sql_select.cc:313
#18 0x0000000000598b14 in mysql_execute_command (thd=0x7f86d0010d10) at sql/sql_parse.cc:2972
#19 0x0000000000599db9 in mysql_parse (parser_state=0x7f86fc30d340, thd=0x7f86d0010d10, rawbuf=<optimized out>, length=<optimized out>) at sql/sql_parse.cc:6235
#20 mysql_parse (thd=0x7f86d0010d10, rawbuf=<optimized out>, length=103, parser_state=0x7f86fc30d340) at sql/sql_parse.cc:6160
#21 0x000000000059bcfc in dispatch_command (command=COM_QUERY, thd=0x7f86d0010d10, packet=0x7f86d002fec1 "CREATE TABLE IF NOT EXISTS test.ts_3 AS SELECT SCHEMA_NAME FROM information_schema.schemata WHERE 1 = 0", packet_length=<optimized out>) at sql/sql_parse.cc:1240
#22 0x000000000059c412 in do_command (thd=0x7f86d0010d10) at sql/sql_parse.cc:899
#23 0x0000000000638394 in do_handle_one_connection (thd_arg=<optimized out>) at sql/sql_connect.cc:1278
#24 0x000000000063843a in handle_one_connection (arg=0x7f86d0010d10) at sql/sql_connect.cc:1186
#25 0x00007f8709c24efc in start_thread (arg=0x7f86fc30e700) at pthread_create.c:304
#26 0x00007f87093d459d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112

All threads stack trace is attached.

bzr version-info

revision-id: sergii@pisem.net-20120908075116-u2v35aaixa3nnhrl
date: 2012-09-08 09:51:16 +0200
build-date: 2012-09-09 02:36:37 +0400
revno: 3346

Also reproducible on 5.5.25-galera release binaries (x64_64 binary tarball).

To reproduce, start the cluster and run the following script on one of nodes. Please note that you need either to have mysqlslap 5.5 on the path, or provide full path to it in slap option. If you have a non-default server port (not 3306), provide it in port option. Default number of concurrent instances is 10; if you are getting the crash fast, you can try to decrease it via clients option. If you are not getting the crash, try to increase it and maybe the number of cycles (cycles, default 50). On my machine, 2 clients is enough to produce the crash, but with 10 it happens almost momentarily.

 
#!/bin/bash
 
slap=`which mysqlslap`
port=3306
num_of_clients=10
num_of_cycles=50
options="--create-schema=test --concurrency=1 --detach=4 --protocol=tcp"
 
for arg do
  val=`echo "$arg" | sed -e 's;^--[^=]*=;;'`
  case "$arg" in
    --port=*) port="$val" ;;
    --slap=*) slap="$val" ;;
    --clients=*) num_of_clients="$val" ;;
    --cycles=*) num_of_cycles="$val" ;;
    *) options="$options $val" ;;
  esac
done
 
 
if [ -z "$slap" ]
then
  echo ERROR: mysqlslap is not found.
  echo Either provide it as --slap option,
  echo or make sure it is on the path
  exit 1
fi
 
c=1
while [ $c -le $num_of_clients ] ; do
  echo CREATE TABLE IF NOT EXISTS test.ts_$c AS SELECT SCHEMA_NAME FROM information_schema.schemata WHERE 1 = 0 > slap_client_$c
  echo CREATE TABLE IF NOT EXISTS test.tr_$c  AS SELECT ROUTINE_SCHEMA FROM information_schema.routines WHERE 1 = 0 >> slap_client_$c
  echo DROP TABLE IF EXISTS test.ts_$c >> slap_client_$c
  echo DROP TABLE IF EXISTS test.tr_$c >> slap_client_$c
 
  (( c += 1 ))
done
 
c=1
(( num_of_queries = num_of_cycles * 4 ))
while [ $c -lt $num_of_clients ] ; do
  $slap --silent --query=slap_client_$c --number_of_queries=$num_of_queries --port=$port $options &
  (( c += 1 ))
done
$slap --silent --query=slap_client_$c --number_of_queries=$num_of_queries --port=$port $options
 

My server command line:

maria-5.5-galera/sql/mysqld --no-defaults --datadir=maria-5.5-galera/data1 --wsrep_provider=galera-23.2.1-src/libgalera_smm.so --wsrep_sst_method=rsync --core --binlog-format=row --default-storage-engine=InnoDB --innodb_autoinc_lock_mode=2 --innodb_locks_unsafe_for_binlog=1 --innodb_flush_log_at_trx_commit=0 --log-error=log.err --wsrep_sst_auth=galera:galera --basedir=maria-5.5-galera/ --port=8306 --loose-lc-messages-dir=maria-5.5-galera/sql/share --socket=/tmp/galera-1.sock --tmpdir=maria-5.5-galera/data1/tmp --general-log=1 --wsrep_cluster_address=gcomm:// --core



 Comments   
Comment by Seppo Jaakola [ 2012-09-09 ]

The problem has happened in the crashing thread during the processing of previous SQL statement. Do you have mysql error log saved from the test run? It would be good to analyze.
If you will reproduce, set wsrep_debug=1 for the test run. It may log something useful for the troubleshooting.

Comment by Elena Stepanova [ 2012-09-09 ]

I've put the test script in the bug description, does it not work for you?

Comment by Elena Stepanova [ 2012-09-09 ]

I ran the server with wsrep_debug=1, but I don't see and debug records in there. Anyway, I've attached the error and general log (log.err and ubuntu11.log, correspondingly); but please do try the script from the description, it will be much more efficient if you're able to reproduce it locally.

Comment by Seppo Jaakola [ 2012-09-18 ]

CREATE TABLE AS SELECT.. is not processed as atomic operation, but is cut in two halves:
1. The table create is replicated and applied as TO isolated operation, up front in all nodes
2. Inserts are replicated at commit time through write set replication

Comment by Seppo Jaakola [ 2012-09-18 ]

fix merged in from upstream

Generated at Thu Feb 08 06:29:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.