[MDEV-10854] MariaDB 10.1.17 crashes when SST fails Created: 2016-09-21  Updated: 2022-05-24  Resolved: 2022-05-16

Status: Closed
Project: MariaDB Server
Component/s: Galera SST, wsrep
Affects Version/s: 10.1.16, 10.1.17
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Marc A. Mueller Assignee: Jan Lindström (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

CENTOS 7.2 docker container



 Description   

1.) Start MariaDB to join existing cluster of SYNCED nodes
2.) SST from selected donor DBPROD12 fails (due to misconfiguration: no write permission to /var/lib/mysql on DBPROD12)
3.) mysqld should terminate, but it crashes with SIG11

Log output:

exec /usr/bin/mysqld --user=mysql --console --wsrep-node-name PROD_DB_CPY3 --wsrep-cluster-address=gcomm://192.168.5.0 --wsrep-sst-donor=DBPROD12
2016-09-21  9:47:43 140115759253632 [Note] mysqld (mysqld 10.1.17-MariaDB) starting as process 1 ...
2016-09-21  9:47:43 140115759253632 [Note] WSREP: Read nil XID from storage engines, skipping position init
2016-09-21  9:47:43 140115759253632 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
2016-09-21  9:47:43 140115759253632 [Note] WSREP: wsrep_load(): Galera 25.3.17(r3619) by Codership Oy <info@codership.com> loaded successfully.
2016-09-21  9:47:43 140115759253632 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2016-09-21  9:47:43 140115759253632 [Warning] WSREP: Could not open state file for reading: '/var/lib/mysql//grastate.dat'
2016-09-21  9:47:43 140115759253632 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1
2016-09-21  9:47:43 140115759253632 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.3.5; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 1G; gcache.size = 300M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = false; pc.ignore_quorum = false; pc
2016-09-21  9:47:43 140115317462784 [Note] WSREP: Service thread queue flushed.
2016-09-21  9:47:43 140115759253632 [Note] WSREP: Assign initial position for certification: -1, protocol version: -1
2016-09-21  9:47:43 140115759253632 [Note] WSREP: wsrep_sst_grab()
2016-09-21  9:47:43 140115759253632 [Note] WSREP: Start replication
2016-09-21  9:47:43 140115759253632 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
2016-09-21  9:47:43 140115759253632 [Note] WSREP: protonet asio version 0
2016-09-21  9:47:43 140115759253632 [Note] WSREP: Using CRC-32C for message checksums.
2016-09-21  9:47:43 140115759253632 [Note] WSREP: backend: asio
2016-09-21  9:47:43 140115759253632 [Note] WSREP: gcomm thread scheduling priority set to other:0 
2016-09-21  9:47:43 140115759253632 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2016-09-21  9:47:43 140115759253632 [Note] WSREP: restore pc from disk failed
2016-09-21  9:47:43 140115759253632 [Note] WSREP: GMCast version 0
2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2016-09-21  9:47:43 140115759253632 [Note] WSREP: EVS version 0
2016-09-21  9:47:43 140115759253632 [Note] WSREP: gcomm: connecting to group 'db_prod', peer '192.168.5.0:'
2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') connection established to bab149a0 tcp://192.168.5.0:4567
2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://192.168.1.0:4567 tcp://192.168.3.0:4567 tcp://192.168.3.4:4567 
2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') connection established to e20d6c1d tcp://192.168.1.0:4567
2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') connection established to 34c496e7 tcp://192.168.3.0:4567
2016-09-21  9:47:43 140115759253632 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') connection established to 379ead20 tcp://192.168.3.4:4567
2016-09-21  9:47:43 140115759253632 [Note] WSREP: declaring 34c496e7 at tcp://192.168.3.0:4567 stable
2016-09-21  9:47:43 140115759253632 [Note] WSREP: declaring 379ead20 at tcp://192.168.3.4:4567 stable
2016-09-21  9:47:43 140115759253632 [Note] WSREP: declaring bab149a0 at tcp://192.168.5.0:4567 stable
2016-09-21  9:47:43 140115759253632 [Note] WSREP: declaring e20d6c1d at tcp://192.168.1.0:4567 stable
2016-09-21  9:47:44 140115759253632 [Note] WSREP: Node 34c496e7 state prim
2016-09-21  9:47:44 140115759253632 [Note] WSREP: view(view_id(PRIM,34c496e7,19) memb {
	34c496e7,0
	379ead20,0
	719a33a6,0
	bab149a0,0
	e20d6c1d,0
} joined {
} left {
} partitioned {
})
2016-09-21  9:47:44 140115759253632 [Note] WSREP: save pc into disk
2016-09-21  9:47:45 140115759253632 [Note] WSREP: gcomm: connected
2016-09-21  9:47:45 140115759253632 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
2016-09-21  9:47:45 140115759253632 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
2016-09-21  9:47:45 140115759253632 [Note] WSREP: Opened channel 'db_prod'
2016-09-21  9:47:45 140115759253632 [Note] WSREP: Waiting for SST to complete.
2016-09-21  9:47:45 140115246577408 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 2, memb_num = 5
2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: sent state msg: 723eb4c1-7fe0-11e6-8c54-fe7027df9419
2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: got state msg: 723eb4c1-7fe0-11e6-8c54-fe7027df9419 from 0 (DBPROD12)
2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: got state msg: 723eb4c1-7fe0-11e6-8c54-fe7027df9419 from 1 (PROD_DB_CPY2)
2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: got state msg: 723eb4c1-7fe0-11e6-8c54-fe7027df9419 from 3 (DBPROD13)
2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: got state msg: 723eb4c1-7fe0-11e6-8c54-fe7027df9419 from 4 (DBPROD11)
2016-09-21  9:47:45 140115246577408 [Note] WSREP: STATE EXCHANGE: got state msg: 723eb4c1-7fe0-11e6-8c54-fe7027df9419 from 2 (PROD_DB_CPY3)
2016-09-21  9:47:45 140115246577408 [Note] WSREP: Quorum results:
	version    = 3,
	component  = PRIMARY,
	conf_id    = 18,
	members    = 4/5 (joined/total),
	act_id     = 124760,
	last_appl. = -1,
	protocols  = 0/7/3 (gcs/repl/appl),
	group UUID = bab21960-6ea4-11e6-ae86-86fdc580dcae
2016-09-21  9:47:45 140115246577408 [Note] WSREP: Flow-control interval: [36, 36]
2016-09-21  9:47:45 140115246577408 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 124760)
2016-09-21  9:47:45 140115758938880 [Note] WSREP: State transfer required: 
	Group state: bab21960-6ea4-11e6-ae86-86fdc580dcae:124760
	Local state: 00000000-0000-0000-0000-000000000000:-1
2016-09-21  9:47:45 140115758938880 [Note] WSREP: New cluster view: global state: bab21960-6ea4-11e6-ae86-86fdc580dcae:124760, view# 19: Primary, number of nodes: 5, my index: 2, protocol version 3
2016-09-21  9:47:45 140115758938880 [Warning] WSREP: Gap in state sequence. Need state transfer.
2016-09-21  9:47:45 140115217217280 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.3.5' --datadir '/var/lib/mysql/'   --parent '1' --binlog 'proddbcpy-bin' '
2016-09-21  9:47:45 140115758938880 [Note] WSREP: Prepared SST request: rsync|192.168.3.5:4444/rsync_sst
2016-09-21  9:47:45 140115758938880 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2016-09-21  9:47:45 140115758938880 [Note] WSREP: REPL Protocols: 7 (3, 2)
2016-09-21  9:47:45 140115317462784 [Note] WSREP: Service thread queue flushed.
2016-09-21  9:47:45 140115758938880 [Note] WSREP: Assign initial position for certification: 124760, protocol version: 3
2016-09-21  9:47:45 140115317462784 [Note] WSREP: Service thread queue flushed.
2016-09-21  9:47:45 140115758938880 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (bab21960-6ea4-11e6-ae86-86fdc580dcae): 1 (Operation not permitted)
	 at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
2016-09-21  9:47:45 140115246577408 [Note] WSREP: Member 2.0 (PROD_DB_CPY3) requested state transfer from 'DBPROD12'. Selected 0.0 (DBPROD12)(SYNCED) as donor.
2016-09-21  9:47:45 140115246577408 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 124760)
2016-09-21  9:47:45 140115758938880 [Note] WSREP: Requesting state transfer: success, donor: 0
2016-09-21  9:47:47 140115246577408 [Warning] WSREP: 0.0 (DBPROD12): State transfer to 2.0 (PROD_DB_CPY3) failed: -255 (Unknown error 255)

2016-09-21  9:47:47 140115246577408 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():736: Will never receive state. Need to abort.
2016-09-21  9:47:47 140115246577408 [Note] WSREP: gcomm: terminating thread
2016-09-21  9:47:47 140115246577408 [Note] WSREP: gcomm: joining thread
2016-09-21  9:47:47 140115246577408 [Note] WSREP: gcomm: closing backend
2016-09-21  9:47:47 140115246577408 [Note] WSREP: view(view_id(NON_PRIM,34c496e7,19) memb {
	719a33a6,0
} joined {
} left {
} partitioned {
	34c496e7,0
	379ead20,0
	bab149a0,0
	e20d6c1d,0
})
2016-09-21  9:47:47 140115246577408 [Note] WSREP: (719a33a6, 'tcp://0.0.0.0:4567') turning message relay requesting off
2016-09-21  9:47:47 140115246577408 [Note] WSREP: view((empty))
2016-09-21  9:47:47 140115246577408 [Note] WSREP: gcomm: closed
2016-09-21  9:47:47 140115246577408 [Note] WSREP: mysqld: Terminated.
160921  9:47:47 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
Server version: 10.1.17-MariaDB
key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=153
thread_count=2
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 336049 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48400
mysqld(my_print_stacktrace+0x2e)[0x7f6f3eb95ebe]
mysqld(handle_fatal_signal+0x2d5)[0x7f6f3e6bc915]
/lib64/libpthread.so.0(+0xf100)[0x7f6f3dcda100]
/lib64/libc.so.6(abort+0x297)[0x7f6f3c034e37]
/usr/lib64/galera/libgalera_smm.so(+0x803c7)[0x7f6f3681b3c7]
addr2line: 'mysqld': No such file
/usr/lib64/galera/libgalera_smm.so(_Z13gcs_core_recvP8gcs_coreP12gcs_act_rcvdx+0x1065)[0x7f6f36929855]
/usr/lib64/galera/libgalera_smm.so(+0x191a24)[0x7f6f3692ca24]
/lib64/libpthread.so.0(+0x7dc5)[0x7f6f3dcd2dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f6f3c0f4ced]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.



 Comments   
Comment by Jan Lindström (Inactive) [ 2022-05-16 ]

10.1 has been EOL.

Generated at Thu Feb 08 07:45:26 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.