Details
-
Bug
-
Status: Closed (View Workflow)
-
Minor
-
Resolution: Won't Fix
-
10.1.28
Description
A joiner node performed an rsync SST from another node in the cluster, and when the SST finished, the node saw the following crash:
2017-11-03 16:44:46 140515631684352 [Note] WSREP: State transfer required:
|
Group state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977
|
Local state: 00000000-0000-0000-0000-000000000000:-1
|
2017-11-03 16:44:46 140515631684352 [Note] WSREP: New cluster view: global state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977, view# 102: Primary, number of nodes: 4, my index: 3, protocol version 3
|
2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Gap in state sequence. Need state transfer.
|
2017-11-03 16:44:46 140504717063936 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.119.17.237' --datadir '/app/mysql/galera/' --defaults-file '/app/mysql/config/galera.cnf' --parent '8214' --binlog '/app/mysql/galera/db-node1-binlog' '
|
2017-11-03 16:44:46 140515631684352 [Note] WSREP: Prepared SST request: rsync|10.119.17.237:4444/rsync_sst
|
2017-11-03 16:44:46 140515631684352 [Note] WSREP: REPL Protocols: 7 (3, 2)
|
2017-11-03 16:44:46 140515631684352 [Note] WSREP: Assign initial position for certification: 230930977, protocol version: 3
|
2017-11-03 16:44:46 140504792139520 [Note] WSREP: Service thread queue flushed.
|
2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (b1c4fe17-b433-11e7-bd44-52c2748b6170): 1 (Operation not permitted)
|
at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
|
2017-11-03 16:44:46 140504733837056 [Note] WSREP: Member 3.0 (10-119-17-237.domain.com) requested state transfer from '*any*'. Selected 0.0 (10-119-17-69.domain.com)(SYNCED) as donor.
|
2017-11-03 16:44:46 140504733837056 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 230930977)
|
2017-11-03 16:44:46 140515631684352 [Note] WSREP: Requesting state transfer: success, donor: 0
|
2017-11-03 16:44:46 140515631684352 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977)
|
2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') connection to peer d3790118 with addr tcp://10.119.17.237:4567 timed out, no messages seen in PT3S
|
2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') turning message relay requesting off
|
2017-11-03 17:06:34 140504733837056 [Note] WSREP: 0.0 (10-119-17-69.domain.com): State transfer to 3.0 (10-119-17-237.domain.com) complete.
|
WSREP_SST: [INFO] Extracting binlog files: (20171103 17:06:34.145)
|
db-g3-bin.000042
|
ls: cannot access db-node1-binlog.*: No such file or directory
|
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 8293 (20171103 17:06:34.156)
|
WSREP_SST: [INFO] Joiner cleanup done. (20171103 17:06:34.662)
|
2017-11-03 17:06:34 140515632003328 [Note] WSREP: SST complete, seqno: 230930977
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using mutexes to ref count buffer pool pages
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: The InnoDB memory heap is disabled
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Compressed tables use zlib 1.2.7
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using Linux native AIO
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using SSE crc32 instructions
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Initializing buffer pool, size = 512.0M
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Completed initialization of buffer pool
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Highest supported file format is Barracuda.
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1244892544453
|
2017-11-03 17:06:36 140504733837056 [Note] WSREP: Member 0.0 (10-119-17-69.domain.com) synced with group.
|
2017-11-03 17:06:42 140515632003328 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
|
2017-11-03 17:06:49 140515632003328 [Note] InnoDB: To recover: 2702 pages from log
|
2017-11-03 17:06:53 140515632003328 [Note] InnoDB: Starting final batch to recover 2585 pages from redo log
|
InnoDB: Last MySQL binlog file position 0 786976686, file name /app/mysql/galera/db-g3-bin.000041
|
2017-11-03 17:07:15 140515632003328 [Note] InnoDB: 128 rollback segment(s) are active.
|
2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Waiting for purge to start
|
2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.2 started; log sequence number 1244894457235
|
2017-11-03 17:07:23 140515632003328 [Note] Plugin 'FEEDBACK' is disabled.
|
2017-11-03 17:07:23 140503555213056 [Note] InnoDB: Dumping buffer pool(s) not yet started
|
2017-11-03 17:07:23 140515632003328 [Note] Server socket created on IP: '10.119.17.237'.
|
 |
Status information:
|
 |
Current dir: /app/mysql/galera/
|
Running threads: 2 Stack size: 295936
|
Current locks:
|
lock: 0x7fc98da48788:
|
 |
lock: 0x7fc9cdc6a508:
|
 |
lock: 0x7fcc598b5520:
|
 |
lock: 0x7fcc598b5440:
|
 |
lock: 0x7fcc598b4480:
|
 |
lock: 0x7fcc598b43a0:
|
 |
lock: 0x7fcc598b42c0:
|
 |
lock: 0x7fcc598b4640:
|
 |
lock: 0x7fcc598b2880:
|
 |
lock: 0x7fcc598b5360:
|
 |
lock: 0x7fcc598b2f80:
|
 |
lock: 0x7fcc598b3220:
|
 |
lock: 0x7fcc598b2ea0:
|
 |
lock: 0x7fcc598b3140:
|
 |
lock: 0x7fcc598b2dc0:
|
 |
lock: 0x7fcc598b3060:
|
 |
lock: 0x7fcc598b35a0:
|
 |
lock: 0x7fcc598b3680:
|
 |
lock: 0x7fcc598b3760:
|
 |
lock: 0x7fcc598b2ce0:
|
 |
lock: 0x7fcc598b2a40:
|
 |
lock: 0x7fcc598b2c00:
|
 |
lock: 0x7fcc598b2960:
|
 |
lock: 0x7fcc598b2b20:
|
 |
lock: 0x7fcc598b3300:
|
 |
lock: 0x7fcc598b33e0:
|
 |
lock: 0x7fcc598b34c0:
|
 |
lock: 0x7fcc598b4fe0:
|
 |
lock: 0x7fcc598b5280:
|
 |
lock: 0x7fcc598b51a0:
|
 |
lock: 0x7fcc598b50c0:
|
 |
lock: 0x7fcc598b4c60:
|
 |
lock: 0x7fcc598b4b80:
|
 |
lock: 0x7fcc598b4aa0:
|
 |
lock: 0x7fcc598b49c0:
|
 |
lock: 0x7fcc598b48e0:
|
 |
lock: 0x7fcc598b4e20:
|
 |
lock: 0x7fcc598b4800:
|
 |
lock: 0x7fcc598b4720:
|
 |
lock: 0x7fcc598b4f00:
|
 |
lock: 0x7fcc598b4560:
|
 |
lock: 0x7fcc598b4100:
|
 |
lock: 0x7fcc598b41e0:
|
 |
lock: 0x7fcc598b4020:
|
 |
lock: 0x7fcc598b3f40:
|
 |
lock: 0x7fcc598b3bc0:
|
 |
lock: 0x7fcc598b3e60:
|
 |
lock: 0x7fcc598b3d80:
|
 |
lock: 0x7fcc598b3ae0:
|
 |
lock: 0x7fcc598b3ca0:
|
 |
lock: 0x7fcc598b3840:
|
 |
lock: 0x7fcc598b3920:
|
 |
lock: 0x7fcc598b3a00:
|
 |
lock: 0x7fcc598b4d40:
|
 |
 |
Key caches:
|
default
|
Buffer_size: 134217728
|
Block_size: 1024
|
Division_limit: 100
|
Age_threshold: 300
|
Partitions: 0
|
blocks used: 0
|
not flushed: 0
|
w_requests: 0
|
writes: 0
|
r_requests: 0
|
reads: 0
|
 |
 |
handler status:
|
read_key: 0
|
read_next: 0
|
read_rnd 0
|
read_first: 0
|
write: 0
|
delete 0
|
update: 0
|
 |
Table status:
|
Opened tables: 2
|
Open tables: 2
|
Open files: 8
|
Open streams: 0
|
 |
Alarm status:
|
Active alarms: 0
|
Max used alarms: 0
|
Next alarm time: 0
|
 |
Memory status:
|
Non-mmapped space allocated from system: 0
|
Number of free chunks: 1
|
Number of fastbin blocks: 0
|
Number of mmapped regions: 0
|
Space in mmapped regions: 0
|
Maximum total allocated space: 0
|
Space available in freed fastbin blocks: 0
|
Total allocated space: 0
|
Total free space: 0
|
Top-most, releasable space: 0
|
Estimated memory (with thread stack): 591872
|
 |
 |
 |
Events status:
|
LLA = Last Locked At LUA = Last Unlocked At
|
WOC = Waiting On Condition DL = Data Locked
|
The Event Scheduler is disabled
|
 |
2017-11-03 17:07:23 140515632003328 [Note] WSREP: Signalling provider to continue.
|
2017-11-03 17:07:23 140515632003328 [Note] WSREP: SST received: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977
|
2017-11-03 17:07:23 140515630676736 [Note] Event Scheduler: scheduler thread started with id 4
|
171103 17:07:23 [ERROR] mysqld got signal 11 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
 |
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
 |
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
 |
Server version: 10.1.28-MariaDB
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
max_used_connections=0
|
max_threads=1002
|
thread_count=10
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2332069 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
 |
Thread pointer: 0x0
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x0 thread_stack 0x48400
|
mysqld(my_print_stacktrace+0x2e)[0x7fcc5902d92e]
|
2017-11-03 17:07:23 140515632003328 [Note] Reading of all Master_info entries succeded
|
2017-11-03 17:07:23 140515632003328 [Note] Added new Master_info '' to hash table
|
2017-11-03 17:07:23 140515632003328 [Note] mysqld: ready for connections.
|
Version: '10.1.28-MariaDB' socket: '/app/mysql/temporaries/galera/mysqld.sock' port: 3306 MariaDB Server
|
2017-11-03 17:07:23 140504733837056 [Note] WSREP: 3.0 (10-119-17-237.domain.com): State transfer from 0.0 (10-119-17-69.domain.com) complete.
|
2017-11-03 17:07:23 140504733837056 [Note] WSREP: Shifting JOINER -> JOINED (TO: 230971731)
|
mysqld(handle_fatal_signal+0x305)[0x7fcc58b50ce5]
|
/lib64/libpthread.so.0(+0xf370)[0x7fcc58161370]
|
addr2line: 'mysqld': No such file
|
mysqld(my_hash_element+0x13)[0x7fcc5900c023]
|
mysqld(_ZN17Master_info_index20flush_all_relay_logsEv+0x93)[0x7fcc58a98c83]
|
mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x5d2)[0x7fcc58ab8672]
|
mysqld(signal_hand+0x382)[0x7fcc5892c812]
|
/lib64/libpthread.so.0(+0x7dc5)[0x7fcc58159dc5]
|
/lib64/libc.so.6(clone+0x6d)[0x7fcc5657976d]
|
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
|
information that should help you find out what is causing the crash.
|
I'm not sure if it could provide any hint as to the cause of the crash, but the donor's error log contained a lot of messages about long semaphore waits involving log_sys->checkpoint_lock during the time period in which it acted as donor:
InnoDB: ###### Diagnostic info printed to the standard error stream
|
WSREP: avoiding InnoDB self crash due to long semaphore wait of > 600 seconds
|
Server is processing SST donor operation, fatal_cnt now: 0InnoDB: Warning: a long semaphore wait:
|
--Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore:
|
S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock'
|
a writer (thread id 140439973771008) has reserved it in mode exclusive
|
number of readers 0, waiters flag 1, lock_word: 0
|
Last time read locked in file log0log.cc line 2352
|
Last time write locked in file log0log.cc line 2109
|
Holder thread 0 file not yet reserved line 0
|
InnoDB: Warning: semaphore wait:
|
--Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore:
|
S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock'
|
a writer (thread id 140439973771008) has reserved it in mode exclusive
|
number of readers 0, waiters flag 1, lock_word: 0
|
Last time read locked in file log0log.cc line 2352
|
Last time write locked in file log0log.cc line 2109
|
Holder thread 0 file not yet reserved line 0
|
InnoDB: Warning: Writer thread is waiting this semaphore:
|
--Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore:
|
S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock'
|
a writer (thread id 140439973771008) has reserved it in mode exclusive
|
number of readers 0, waiters flag 1, lock_word: 0
|
Last time read locked in file log0log.cc line 2352
|
Last time write locked in file log0log.cc line 2109
|
Holder thread 0 file not yet reserved line 0
|
In case it is relevant, the donor node was a replication master, but not a slave.
After the first SST led to a crash, another SST was attempted, and that one was successful without a crash.
Attachments
Issue Links
- relates to
-
MDEV-15891 SIGHUP during rsync SST causes SST to fail
- Closed