[MDEV-23612] galera_sr.galera_sr_shutdown_master MTR failed: WSREP_SST: [ERROR] Possible timeout in receving first data from donor in gtid stage
Created: 2020-08-27  Updated: 2022-05-24  Resolved: 2022-05-24

Status: Closed
Project: MariaDB Server
Component/s: Galera, Tests
Affects Version/s: 10.5.6
Fix Version/s: 10.5.16, 10.6.8, 10.7.4, 10.8.3

Type: Bug
Priority: Major
Reporter: Stepan Patryshev (Inactive)
Assignee: Jan Lindström (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Environment: kvm-zyp-sles123-amd64


Attachments: Zip Archive MDEV-23612_104_crash_logs_201214.zip    
Issue Links:
Blocks: MDEV-22122 Galera test failures on 10.5 (Open)

 Description   

The MTR test galera_sr.galera_sr_shutdown_master failed on BB 10.5 because the server error log contained "WSREP_SST: [ERROR] Possible timeout in receving first data from donor in gtid stage".
The failure appears to be sporadic.

stdio.log:

10.5.6 8f8f2aea93835899345454f87768fd649749e29c

galera_sr.galera_sr_shutdown_master 'innodb' w2 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2020-08-26 07:20:57
line
WSREP_SST: [ERROR] Possible timeout in receving first data from donor in gtid stage (20200826 07:20:53.339)
WSREP_SST: [ERROR] Cleanup after exit with status:32 (20200826 07:20:53.343)
^ Found warnings in /dev/shm/var/2/log/mysqld.2.err
ok
 
worker[2] > Restart  - not started
worker[2] > Restart  - not started
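
When triaging a failure like the one above, it can help to pull the WSREP_SST error lines and their timestamps out of mysqld.2.err. The following is a minimal sketch, not part of MTR: it assumes every such line has the shape seen in the excerpt (message followed by a parenthesised "YYYYMMDD HH:MM:SS.mmm" timestamp), and the name find_sst_errors is hypothetical.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the log excerpt in this report:
#   WSREP_SST: [ERROR] <message> (YYYYMMDD HH:MM:SS.mmm)
SST_ERROR = re.compile(
    r"^WSREP_SST: \[ERROR\] (?P<msg>.*?) "
    r"\((?P<ts>\d{8} \d{2}:\d{2}:\d{2}\.\d{3})\)$"
)


def find_sst_errors(lines):
    """Return (timestamp, message) pairs for WSREP_SST error lines.

    Hypothetical helper for log triage; lines that do not match the
    assumed shape are skipped silently.
    """
    hits = []
    for line in lines:
        match = SST_ERROR.match(line.strip())
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y%m%d %H:%M:%S.%f")
            hits.append((ts, match.group("msg")))
    return hits


if __name__ == "__main__":
    # Lines copied from the stdio.log excerpt in this report.
    sample = [
        "WSREP_SST: [ERROR] Possible timeout in receving first data from "
        "donor in gtid stage (20200826 07:20:53.339)",
        "WSREP_SST: [ERROR] Cleanup after exit with status:32 "
        "(20200826 07:20:53.343)",
        "^ Found warnings in /dev/shm/var/2/log/mysqld.2.err",
    ]
    for ts, msg in find_sst_errors(sample):
        print(ts.isoformat(), msg)
```

Comparing the extracted timestamps with the "Test ended at" time is one way to see how close to the end of the test the SST error was logged.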



 Comments   
Comment by Stepan Patryshev (Inactive) [ 2020-12-23 ]

The test failed again on BB 10.5, this time with mysqld.1 crashing on signal 11.
stdio.log:

10.5.9, e8217d070fc3e60870131615a48515836c773b07, kvm-deb-xenial-amd64

galera_sr.galera_sr_shutdown_master 'innodb' w1 [ fail ]
        Test ended at 2020-12-14 14:47:50
 
CURRENT_TEST: galera_sr.galera_sr_shutdown_master
mysqltest: In included file "./include/galera_init.inc": 
included from ./include/galera_cluster.inc at line 16:
included from /usr/share/mysql/mysql-test/suite/galera_sr/t/galera_sr_shutdown_master.test at line 6:
At line 25: query 'connect $galera_connection_name,127.0.0.1,root,,test,$_galera_port,' failed: 2013: Lost connection to MySQL server at 'handshake: reading initial communication packet', system error: 11
 
 
Server [mysqld.1 - pid: 5735, winpid: 5735, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
201214 14:47:36 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.5.9-MariaDB-1:10.5.9+maria~xenial-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=64
max_threads=153
thread_count=67
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63638 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f87bcb3d778
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f87d0357cb8 thread_stack 0x49000
??:0(my_print_stacktrace)[0x55d6e0acae2e]
??:0(handle_fatal_signal)[0x55d6e04ff1bf]
??:0(__restore_rt)[0x7f8821e8d390]
??:0(thd_clear_errors(THD*))[0x55d6e02aaa6e]
??:0(THD::change_user())[0x55d6e02af941]
??:0(THD::reset_for_reuse())[0x55d6e02afb29]
??:0(CONNECT::create_thd(THD*))[0x55d6e03f0a24]
??:0(do_handle_one_connection(CONNECT*, bool))[0x55d6e03f103b]
??:0(handle_one_connection)[0x55d6e03f1454]
??:0(MyCTX_nopad::finish(unsigned char*, unsigned int*))[0x55d6e07358f1]
??:0(start_thread)[0x7f8821e836ba]
x86_64/clone.S:111(clone)[0x7f882132a4dd]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 319
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway. 
Query: 
 
Writing a core file...
Working directory at /dev/shm/var/1/mysqld.1/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             23720                23720                processes 
Max open files            1024                 1024                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       23720                23720                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: |/usr/share/apport/apport %p %s %c %P
 
----------SERVER LOG END-------------
 
 
 - found 'core' (0/0)
 
Trying 'dbx' to get a backtrace
 
Trying 'gdb' to get a backtrace from coredump /dev/shm/var/1/log/galera_sr.galera_sr_shutdown_master-innodb/mysqld.1/data/core
 
Trying 'lldb' to get a backtrace from coredump /dev/shm/var/1/log/galera_sr.galera_sr_shutdown_master-innodb/mysqld.1/data/core
 - deleting it, already saved 0
 - saving '/dev/shm/var/1/log/galera_sr.galera_sr_shutdown_master-innodb/' to '/dev/shm/var/log/galera_sr.galera_sr_shutdown_master-innodb/'
 
Retrying test galera_sr.galera_sr_shutdown_master, attempt(2/3)...
 
worker[1] > Restart  - not started
worker[1] > Restart  - not started
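
The backtrace in the server log above can be reduced to a list of symbols to make crashes easier to compare across runs. This is a rough sketch under stated assumptions: frames are taken to look like "location(symbol)[0xaddress]" as in this log, the helper names parse_frames and crash_site are hypothetical, and the set of frames treated as signal-handling overhead is a guess based on this one trace.

```python
import re

# Assumed frame shape, as printed in this report:
#   ??:0(thd_clear_errors(THD*))[0x55d6e02aaa6e]
FRAME = re.compile(
    r"^(?P<loc>\S+?)\((?P<sym>.*)\)\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)

# Frames that belong to the crash reporting path rather than the fault
# itself. This list is an assumption drawn from the trace in this report.
HANDLER_FRAMES = {"my_print_stacktrace", "handle_fatal_signal", "__restore_rt"}


def parse_frames(lines):
    """Return (symbol, address) pairs for lines that look like frames."""
    frames = []
    for line in lines:
        match = FRAME.match(line.strip())
        if match:
            frames.append((match.group("sym"), int(match.group("addr"), 16)))
    return frames


def crash_site(frames):
    """Return the first symbol that is not signal-handling overhead."""
    for sym, _addr in frames:
        if sym not in HANDLER_FRAMES:
            return sym
    return None


if __name__ == "__main__":
    # Lines copied from the server log in this report.
    trace = [
        "??:0(my_print_stacktrace)[0x55d6e0acae2e]",
        "??:0(handle_fatal_signal)[0x55d6e04ff1bf]",
        "??:0(__restore_rt)[0x7f8821e8d390]",
        "??:0(thd_clear_errors(THD*))[0x55d6e02aaa6e]",
        "??:0(THD::change_user())[0x55d6e02af941]",
        "x86_64/clone.S:111(clone)[0x7f882132a4dd]",
    ]
    print(crash_site(parse_frames(trace)))
```

Because the binary in this run was not symbolized with file and line information (most locations are "??:0"), the symbol names are only a starting point; the core file would be needed for a full stack trace.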

10.5.9 Server crash logs
