[MDEV-14069] galera_sst_mysqldump.test fails with: 'show status' failed with wrong errno 1205: 'Lock wait timeout exceeded; try restarting transaction', instead of 0... Created: 2017-10-14  Updated: 2018-04-27  Resolved: 2018-04-27

Status: Closed
Project: MariaDB Server
Component/s: Galera SST, Tests
Affects Version/s: 10.1, 10.2, 10.3
Fix Version/s: 10.1.33, 10.2.15, 10.3.7

Type: Bug Priority: Critical
Reporter: Andrii Nikitin (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-15409 make sure every sst script is tested ... Closed
is blocked by MDEV-13329 node cannot join cluster if being mon... Closed
is blocked by MDEV-14070 remove check: WSREP: wsrep_sst_receiv... Closed
Relates
relates to MDEV-13549 Galera 3 test failures Closed

 Description   

This test fails always:

 bld/mysql-test/mtr galera_sst_mysqldump --mem
Logging: /home/a/e1/m1-10.1/src/mysql-test/mysql-test-run.pl  galera_sst_mysqldump --mem
vardir: /home/a/e1/_depot/m-branch/m1-10.1-bld/mysql-test/var
Checking leftover processes...
Removing old var directory...
Creating var directory '/home/a/e1/_depot/m-branch/m1-10.1-bld/mysql-test/var'...
 - symlinking 'var' to '/dev/shm/var_auto_tjWY'
Checking supported features...
MariaDB Version 10.1.29-MariaDB-debug
 - SSL connections supported
 - binaries are debug compiled
Sphinx 'indexer' binary not found, sphinx suite will be skipped
Collecting tests...
Installing system database...
 
==============================================================================
 
TEST                                      RESULT   TIME (ms) or COMMENT
--------------------------------------------------------------------------
 
worker[1] Using MTR_BUILD_THREAD 300, with reserved ports 16000..16019
galera.galera_sst_mysqldump 'innodb_plugin' [ fail ]
        Test ended at 2017-10-14 14:11:25
 
CURRENT_TEST: galera.galera_sst_mysqldump
mysqltest: In included file "./include/wait_until_connected_again.inc": 
included from ./suite/galera/include/galera_load_provider.inc at line 10:
included from ./suite/galera/include/galera_st_disconnect_slave.inc at line 57:
included from /home/a/e1/_depot/m-branch/m1-10.1-src/mysql-test/suite/galera/t/galera_sst_mysqldump.test at line 9:
At line 26: query 'show status' failed with wrong errno 1205: 'Lock wait timeout exceeded; try restarting transaction', instead of 0...
 
The result from queries just before the failure was:
< snip >
INSERT INTO t1 VALUES ('node1_committed_during');
INSERT INTO t1 VALUES ('node1_committed_during');
INSERT INTO t1 VALUES ('node1_committed_during');
INSERT INTO t1 VALUES ('node1_committed_during');
INSERT INTO t1 VALUES ('node1_committed_during');
COMMIT;
START TRANSACTION;
INSERT INTO t1 VALUES ('node1_to_be_committed_after');
INSERT INTO t1 VALUES ('node1_to_be_committed_after');
INSERT INTO t1 VALUES ('node1_to_be_committed_after');
INSERT INTO t1 VALUES ('node1_to_be_committed_after');
INSERT INTO t1 VALUES ('node1_to_be_committed_after');
SET AUTOCOMMIT=OFF;
START TRANSACTION;
INSERT INTO t1 VALUES ('node1_to_be_rollbacked_after');
INSERT INTO t1 VALUES ('node1_to_be_rollbacked_after');
INSERT INTO t1 VALUES ('node1_to_be_rollbacked_after');
INSERT INTO t1 VALUES ('node1_to_be_rollbacked_after');
INSERT INTO t1 VALUES ('node1_to_be_rollbacked_after');
Loading wsrep provider ...
 
More results from queries before failure can be found in /home/a/e1/_depot/m-branch/m1-10.1-bld/mysql-test/var/log/galera_sst_mysqldump.log
 
 - saving '/home/a/e1/_depot/m-branch/m1-10.1-bld/mysql-test/var/log/galera.galera_sst_mysqldump-innodb_plugin/' to '/home/a/e1/_depot/m-branch/m1-10.1-bld/mysql-test/var/log/galera.galera_sst_mysqldump-innodb_plugin/'
 
Only  1  of 2 completed.
--------------------------------------------------------------------------
The servers were restarted 0 times
Spent 0.000 of 59 seconds executing testcases
 
Failure: Failed 1/1 tests, 0.00% were successful.
 
Failing test(s): galera.galera_sst_mysqldump
 
The log files in var/log may give you some hint of what went wrong.
 
If you want to report this error, please read first the documentation
at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html
 
mysql-test-run: *** ERROR: there were failing test cases



 Comments   
Comment by Andrii Nikitin (Inactive) [ 2017-10-31 ]

After changing some include files the joiner seem to hanging after first rsync sst and these two threads are probably to blame:

Thread 24 (Thread 0x7fd279095700 (LWP 10989)):
#0  0x00007fd2772c1d8d in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x000055d05a4b9fc4 in poll (__timeout=28800000, __nfds=1, __fds=0x7fd279094758) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#2  vio_io_wait (vio=vio@entry=0x7fd276441c08, event=event@entry=VIO_IO_EVENT_READ, timeout=28800000) at /home/a/env1/m3-10.0-galera/src/vio/viosocket.c:992
#3  0x000055d05a4ba068 in vio_socket_io_wait (vio=vio@entry=0x7fd276441c08, event=event@entry=VIO_IO_EVENT_READ) at /home/a/env1/m3-10.0-galera/src/vio/viosocket.c:108
#4  0x000055d05a4ba171 in vio_read (vio=0x7fd276441c08, buf=0x7fd268fca008 "<", size=4) at /home/a/env1/m3-10.0-galera/src/vio/viosocket.c:184
#5  0x000055d059ebffda in my_real_read (net=0x7fd267b23238, complen=complen@entry=0x7fd279094940, header=header@entry=1 '\001') at /home/a/env1/m3-10.0-galera/src/sql/net_serv.cc:874
#6  0x000055d059ec0df1 in my_net_read_packet (net=net@entry=0x7fd267b23238, read_from_server=read_from_server@entry=1 '\001') at /home/a/env1/m3-10.0-galera/src/sql/net_serv.cc:1137
#7  0x000055d059f46ff0 in do_command (thd=0x7fd267b23008) at /home/a/env1/m3-10.0-galera/src/sql/sql_parse.cc:953
#8  0x000055d059ffe98c in do_handle_one_connection (thd_arg=thd_arg@entry=0x7fd267b23008) at /home/a/env1/m3-10.0-galera/src/sql/sql_connect.cc:1394
#9  0x000055d059ffe9f8 in handle_one_connection (arg=arg@entry=0x7fd267b23008) at /home/a/env1/m3-10.0-galera/src/sql/sql_connect.cc:1306
#10 0x000055d05a2b3b14 in pfs_spawn_thread (arg=0x7fd276441e08) at /home/a/env1/m3-10.0-galera/src/storage/perfschema/pfs.cc:1860
#11 0x00007fd2784d56da in start_thread (arg=0x7fd279095700) at pthread_create.c:456
#12 0x00007fd2772cdd7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105
 
Thread 23 (Thread 0x7fd279127700 (LWP 10986)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000055d059eb6b3f in inline_mysql_cond_wait (src_file=0x55d05a4dfb88 "/home/a/env1/m3-10.0-galera/src/sql/mysqld.cc", src_line=5658, mutex=0x55d05ac84540 <LOCK_thread_count>, 
    that=0x55d05ac83e80 <COND_thread_count>) at /home/a/env1/m3-10.0-galera/src/include/mysql/psi/mysql_thread.h:1154
#2  wsrep_close_client_connections (wait_to_end=wait_to_end@entry=1 '\001') at /home/a/env1/m3-10.0-galera/src/sql/mysqld.cc:5658
#3  0x000055d05a044f18 in wsrep_view_handler_cb (app_ctx=<optimized out>, recv_ctx=<optimized out>, view=0x7fd26a85a400, state=<optimized out>, state_len=<optimized out>, sst_req=0x7fd279125ca0, 
    sst_req_len=0x7fd279125ca8) at /home/a/env1/m3-10.0-galera/src/sql/wsrep_mysqld.cc:288
#4  0x00007fd271979550 in galera::ReplicatorSMM::process_conf_change (this=0x7fd2727f5000, recv_ctx=0x7fd26a812008, view_info=..., repl_proto=7, next_state=galera::Replicator::S_CONNECTED, seqno_l=<optimized out>)
    at galera/src/replicator_smm.cpp:1414
#5  0x00007fd27194f19b in galera::GcsActionSource::dispatch (this=this@entry=0x7fd2727f5730, recv_ctx=recv_ctx@entry=0x7fd26a812008, act=..., exit_loop=@0x7fd2791261ff: false)
    at galera/src/gcs_action_source.cpp:139
#6  0x00007fd271950af6 in galera::GcsActionSource::process (this=0x7fd2727f5730, recv_ctx=0x7fd26a812008, exit_loop=@0x7fd2791261ff: false) at galera/src/gcs_action_source.cpp:181
#7  0x00007fd271978e4b in galera::ReplicatorSMM::async_recv (this=0x7fd2727f5000, recv_ctx=0x7fd26a812008) at galera/src/replicator_smm.cpp:371
#8  0x00007fd27199332b in galera_recv (gh=<optimized out>, recv_ctx=<optimized out>) at galera/src/wsrep_provider.cpp:244
#9  0x000055d05a0502ec in wsrep_replication_process (thd=0x7fd26a812008) at /home/a/env1/m3-10.0-galera/src/sql/wsrep_thd.cc:359
#10 0x000055d059ebacce in start_wsrep_THD (arg=0x55d05a050290 <wsrep_replication_process(THD*)>) at /home/a/env1/m3-10.0-galera/src/sql/mysqld.cc:5417
#11 0x00007fd2784d56da in start_thread (arg=0x7fd279127700) at pthread_create.c:456
#12 0x00007fd2772cdd7f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:105

It looks Galera is waiting mtr thread to disconnect gracefully, which is just looping in wait_until_connected_again.inc

Comment by Andrii Nikitin (Inactive) [ 2017-11-02 ]

I can make test pass with following changes in 10.0-galera:

index 6f64ef45440..8cbdcea09f0 100644
--- a/mysql-test/include/wait_until_connected_again.inc
+++ b/mysql-test/include/wait_until_connected_again.inc
@@ -14,8 +14,15 @@ while ($mysql_errno)
   # Strangely enough, the server might return "Too many connections"
   # while being shutdown, thus 1040 is an "allowed" error
   # See BUG#36228
-  --error 0,1040,1053,2002,2003,2005,2006,2013,1927
-  show status;
+  --error 0,1053,2002,2003,2005,2006,2013,1927,1047
+  select 1;
 
   dec $counter;
   if (!$counter)
diff --git a/mysql-test/suite/galera/t/galera_sst_mysqldump.cnf b/mysql-test/suite/galera/t/galera_sst_mysqldump.cnf
index 574ae28b54a..4208637b69d 100644
--- a/mysql-test/suite/galera/t/galera_sst_mysqldump.cnf
+++ b/mysql-test/suite/galera/t/galera_sst_mysqldump.cnf
@@ -5,7 +5,11 @@
 
 [mysqld.1]
 wsrep_provider_options='base_port=@mysqld.1.#galera_port;gcache.size=1;pc.ignore_sb=true'
+# must present because test uses both 192.0.0.1 and 192.0.0.2
+bind-address=0.0.0.0
 
 [mysqld.2]
 wsrep_provider_options='base_port=@mysqld.2.#galera_port;gcache.size=1;pc.ignore_sb=true'
+# must present because test uses both 192.0.0.1 and 192.0.0.2
+bind-address=0.0.0.0
diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc
index 49988287933..3a162f88e2f 100644
--- a/sql/wsrep_mysqld.cc
+++ b/sql/wsrep_mysqld.cc
 @@ -285,7 +285,7 @@ wsrep_view_handler_cb (void*                    app_ctx,
     if (!wsrep_before_SE())
     {
         WSREP_DEBUG("[debug]: closing client connections for PRIM");
-        wsrep_close_client_connections(TRUE);
+        wsrep_close_client_connections(FALSE);
     }
 
     ssize_t const req_len= wsrep_sst_prepare (sst_req);

(Change in wsrep_mysqld.cc is related to MDEV-13329 )

Now need to find why 'show status' returns 1205

Comment by Andrii Nikitin (Inactive) [ 2017-11-02 ]

MDEV-14070 must be fixed because mtr adds bind-address=127.0.0.1 to all tests by default and there is really no reasons to have it 0.0.0.0 for sst mysqldump method

Comment by Andrii Nikitin (Inactive) [ 2017-11-02 ]

Ok, actually error 1205 happens with wsrep_sync_wait=15 , so final patch to make test pass with dependence on MDEV-13329 :

--- a/mysql-test/suite/galera/t/galera_sst_mysqldump.cnf
+++ b/mysql-test/suite/galera/t/galera_sst_mysqldump.cnf
@@ -5,7 +5,13 @@
 
 [mysqld.1]
 wsrep_provider_options='base_port=@mysqld.1.#galera_port;gcache.size=1;pc.ignore_sb=true'
+# must present because test uses both 192.0.0.1 and 192.0.0.2
+bind-address=0.0.0.0
+wsrep_sync_wait=0
 
 [mysqld.2]
 wsrep_provider_options='base_port=@mysqld.2.#galera_port;gcache.size=1;pc.ignore_sb=true'
+# must present because test uses both 192.0.0.1 and 192.0.0.2
+bind-address=0.0.0.0
+wsrep_sync_wait=0

Comment by Andrii Nikitin (Inactive) [ 2017-11-02 ]

Please review the patch in the last comment , which makes the test pass in 10.0-galera debug build with patch from MDEV-13329

Generated at Thu Feb 08 08:10:41 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.