Details
-
Bug
-
Status: Closed (View Workflow)
-
Minor
-
Resolution: Won't Fix
-
10.1.28
Description
A joiner node performed an rsync SST from another node in the cluster, and when the SST finished, the node saw the following crash:
2017-11-03 16:44:46 140515631684352 [Note] WSREP: State transfer required:
|
Group state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977
|
Local state: 00000000-0000-0000-0000-000000000000:-1
|
2017-11-03 16:44:46 140515631684352 [Note] WSREP: New cluster view: global state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977, view# 102: Primary, number of nodes: 4, my index: 3, protocol version 3
|
2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Gap in state sequence. Need state transfer.
|
2017-11-03 16:44:46 140504717063936 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.119.17.237' --datadir '/app/mysql/galera/' --defaults-file '/app/mysql/config/galera.cnf' --parent '8214' --binlog '/app/mysql/galera/db-node1-binlog' '
|
2017-11-03 16:44:46 140515631684352 [Note] WSREP: Prepared SST request: rsync|10.119.17.237:4444/rsync_sst
|
2017-11-03 16:44:46 140515631684352 [Note] WSREP: REPL Protocols: 7 (3, 2)
|
2017-11-03 16:44:46 140515631684352 [Note] WSREP: Assign initial position for certification: 230930977, protocol version: 3
|
2017-11-03 16:44:46 140504792139520 [Note] WSREP: Service thread queue flushed.
|
2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (b1c4fe17-b433-11e7-bd44-52c2748b6170): 1 (Operation not permitted)
|
at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.
|
2017-11-03 16:44:46 140504733837056 [Note] WSREP: Member 3.0 (10-119-17-237.domain.com) requested state transfer from '*any*'. Selected 0.0 (10-119-17-69.domain.com)(SYNCED) as donor.
|
2017-11-03 16:44:46 140504733837056 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 230930977)
|
2017-11-03 16:44:46 140515631684352 [Note] WSREP: Requesting state transfer: success, donor: 0
|
2017-11-03 16:44:46 140515631684352 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977)
|
2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') connection to peer d3790118 with addr tcp://10.119.17.237:4567 timed out, no messages seen in PT3S
|
2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') turning message relay requesting off
|
2017-11-03 17:06:34 140504733837056 [Note] WSREP: 0.0 (10-119-17-69.domain.com): State transfer to 3.0 (10-119-17-237.domain.com) complete.
|
WSREP_SST: [INFO] Extracting binlog files: (20171103 17:06:34.145)
|
db-g3-bin.000042
|
ls: cannot access db-node1-binlog.*: No such file or directory
|
WSREP_SST: [INFO] Joiner cleanup. rsync PID: 8293 (20171103 17:06:34.156)
|
WSREP_SST: [INFO] Joiner cleanup done. (20171103 17:06:34.662)
|
2017-11-03 17:06:34 140515632003328 [Note] WSREP: SST complete, seqno: 230930977
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using mutexes to ref count buffer pool pages
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: The InnoDB memory heap is disabled
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Compressed tables use zlib 1.2.7
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using Linux native AIO
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using SSE crc32 instructions
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Initializing buffer pool, size = 512.0M
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Completed initialization of buffer pool
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Highest supported file format is Barracuda.
|
2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1244892544453
|
2017-11-03 17:06:36 140504733837056 [Note] WSREP: Member 0.0 (10-119-17-69.domain.com) synced with group.
|
2017-11-03 17:06:42 140515632003328 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
|
2017-11-03 17:06:49 140515632003328 [Note] InnoDB: To recover: 2702 pages from log
|
2017-11-03 17:06:53 140515632003328 [Note] InnoDB: Starting final batch to recover 2585 pages from redo log
|
InnoDB: Last MySQL binlog file position 0 786976686, file name /app/mysql/galera/db-g3-bin.000041
|
2017-11-03 17:07:15 140515632003328 [Note] InnoDB: 128 rollback segment(s) are active.
|
2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Waiting for purge to start
|
2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.2 started; log sequence number 1244894457235
|
2017-11-03 17:07:23 140515632003328 [Note] Plugin 'FEEDBACK' is disabled.
|
2017-11-03 17:07:23 140503555213056 [Note] InnoDB: Dumping buffer pool(s) not yet started
|
2017-11-03 17:07:23 140515632003328 [Note] Server socket created on IP: '10.119.17.237'.
|
|
Status information:
|
|
Current dir: /app/mysql/galera/
|
Running threads: 2 Stack size: 295936
|
Current locks:
|
lock: 0x7fc98da48788:
|
|
lock: 0x7fc9cdc6a508:
|
|
lock: 0x7fcc598b5520:
|
|
lock: 0x7fcc598b5440:
|
|
lock: 0x7fcc598b4480:
|
|
lock: 0x7fcc598b43a0:
|
|
lock: 0x7fcc598b42c0:
|
|
lock: 0x7fcc598b4640:
|
|
lock: 0x7fcc598b2880:
|
|
lock: 0x7fcc598b5360:
|
|
lock: 0x7fcc598b2f80:
|
|
lock: 0x7fcc598b3220:
|
|
lock: 0x7fcc598b2ea0:
|
|
lock: 0x7fcc598b3140:
|
|
lock: 0x7fcc598b2dc0:
|
|
lock: 0x7fcc598b3060:
|
|
lock: 0x7fcc598b35a0:
|
|
lock: 0x7fcc598b3680:
|
|
lock: 0x7fcc598b3760:
|
|
lock: 0x7fcc598b2ce0:
|
|
lock: 0x7fcc598b2a40:
|
|
lock: 0x7fcc598b2c00:
|
|
lock: 0x7fcc598b2960:
|
|
lock: 0x7fcc598b2b20:
|
|
lock: 0x7fcc598b3300:
|
|
lock: 0x7fcc598b33e0:
|
|
lock: 0x7fcc598b34c0:
|
|
lock: 0x7fcc598b4fe0:
|
|
lock: 0x7fcc598b5280:
|
|
lock: 0x7fcc598b51a0:
|
|
lock: 0x7fcc598b50c0:
|
|
lock: 0x7fcc598b4c60:
|
|
lock: 0x7fcc598b4b80:
|
|
lock: 0x7fcc598b4aa0:
|
|
lock: 0x7fcc598b49c0:
|
|
lock: 0x7fcc598b48e0:
|
|
lock: 0x7fcc598b4e20:
|
|
lock: 0x7fcc598b4800:
|
|
lock: 0x7fcc598b4720:
|
|
lock: 0x7fcc598b4f00:
|
|
lock: 0x7fcc598b4560:
|
|
lock: 0x7fcc598b4100:
|
|
lock: 0x7fcc598b41e0:
|
|
lock: 0x7fcc598b4020:
|
|
lock: 0x7fcc598b3f40:
|
|
lock: 0x7fcc598b3bc0:
|
|
lock: 0x7fcc598b3e60:
|
|
lock: 0x7fcc598b3d80:
|
|
lock: 0x7fcc598b3ae0:
|
|
lock: 0x7fcc598b3ca0:
|
|
lock: 0x7fcc598b3840:
|
|
lock: 0x7fcc598b3920:
|
|
lock: 0x7fcc598b3a00:
|
|
lock: 0x7fcc598b4d40:
|
|
|
Key caches:
|
default
|
Buffer_size: 134217728
|
Block_size: 1024
|
Division_limit: 100
|
Age_threshold: 300
|
Partitions: 0
|
blocks used: 0
|
not flushed: 0
|
w_requests: 0
|
writes: 0
|
r_requests: 0
|
reads: 0
|
|
|
handler status:
|
read_key: 0
|
read_next: 0
|
read_rnd 0
|
read_first: 0
|
write: 0
|
delete 0
|
update: 0
|
|
Table status:
|
Opened tables: 2
|
Open tables: 2
|
Open files: 8
|
Open streams: 0
|
|
Alarm status:
|
Active alarms: 0
|
Max used alarms: 0
|
Next alarm time: 0
|
|
Memory status:
|
Non-mmapped space allocated from system: 0
|
Number of free chunks: 1
|
Number of fastbin blocks: 0
|
Number of mmapped regions: 0
|
Space in mmapped regions: 0
|
Maximum total allocated space: 0
|
Space available in freed fastbin blocks: 0
|
Total allocated space: 0
|
Total free space: 0
|
Top-most, releasable space: 0
|
Estimated memory (with thread stack): 591872
|
|
|
|
Events status:
|
LLA = Last Locked At LUA = Last Unlocked At
|
WOC = Waiting On Condition DL = Data Locked
|
The Event Scheduler is disabled
|
|
2017-11-03 17:07:23 140515632003328 [Note] WSREP: Signalling provider to continue.
|
2017-11-03 17:07:23 140515632003328 [Note] WSREP: SST received: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977
|
2017-11-03 17:07:23 140515630676736 [Note] Event Scheduler: scheduler thread started with id 4
|
171103 17:07:23 [ERROR] mysqld got signal 11 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
|
Server version: 10.1.28-MariaDB
|
key_buffer_size=134217728
|
read_buffer_size=131072
|
max_used_connections=0
|
max_threads=1002
|
thread_count=10
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2332069 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x0
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x0 thread_stack 0x48400
|
mysqld(my_print_stacktrace+0x2e)[0x7fcc5902d92e]
|
2017-11-03 17:07:23 140515632003328 [Note] Reading of all Master_info entries succeded
|
2017-11-03 17:07:23 140515632003328 [Note] Added new Master_info '' to hash table
|
2017-11-03 17:07:23 140515632003328 [Note] mysqld: ready for connections.
|
Version: '10.1.28-MariaDB' socket: '/app/mysql/temporaries/galera/mysqld.sock' port: 3306 MariaDB Server
|
2017-11-03 17:07:23 140504733837056 [Note] WSREP: 3.0 (10-119-17-237.domain.com): State transfer from 0.0 (10-119-17-69.domain.com) complete.
|
2017-11-03 17:07:23 140504733837056 [Note] WSREP: Shifting JOINER -> JOINED (TO: 230971731)
|
mysqld(handle_fatal_signal+0x305)[0x7fcc58b50ce5]
|
/lib64/libpthread.so.0(+0xf370)[0x7fcc58161370]
|
addr2line: 'mysqld': No such file
|
mysqld(my_hash_element+0x13)[0x7fcc5900c023]
|
mysqld(_ZN17Master_info_index20flush_all_relay_logsEv+0x93)[0x7fcc58a98c83]
|
mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x5d2)[0x7fcc58ab8672]
|
mysqld(signal_hand+0x382)[0x7fcc5892c812]
|
/lib64/libpthread.so.0(+0x7dc5)[0x7fcc58159dc5]
|
/lib64/libc.so.6(clone+0x6d)[0x7fcc5657976d]
|
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
|
information that should help you find out what is causing the crash.
|
I'm not sure if it could provide any hint as to the cause of the crash, but the donor's error log contained a lot of messages about long semaphore waits involving log_sys->checkpoint_lock during the time period in which it acted as donor:
InnoDB: ###### Diagnostic info printed to the standard error stream
|
WSREP: avoiding InnoDB self crash due to long semaphore wait of > 600 seconds
|
Server is processing SST donor operation, fatal_cnt now: 0InnoDB: Warning: a long semaphore wait:
|
--Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore:
|
S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock'
|
a writer (thread id 140439973771008) has reserved it in mode exclusive
|
number of readers 0, waiters flag 1, lock_word: 0
|
Last time read locked in file log0log.cc line 2352
|
Last time write locked in file log0log.cc line 2109
|
Holder thread 0 file not yet reserved line 0
|
InnoDB: Warning: semaphore wait:
|
--Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore:
|
S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock'
|
a writer (thread id 140439973771008) has reserved it in mode exclusive
|
number of readers 0, waiters flag 1, lock_word: 0
|
Last time read locked in file log0log.cc line 2352
|
Last time write locked in file log0log.cc line 2109
|
Holder thread 0 file not yet reserved line 0
|
InnoDB: Warning: Writer thread is waiting this semaphore:
|
--Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore:
|
S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock'
|
a writer (thread id 140439973771008) has reserved it in mode exclusive
|
number of readers 0, waiters flag 1, lock_word: 0
|
Last time read locked in file log0log.cc line 2352
|
Last time write locked in file log0log.cc line 2109
|
Holder thread 0 file not yet reserved line 0
|
In case it is relevant, the donor node was a replication master, but not a slave.
After the first SST led to a crash, another SST was attempted, and that one was successful without a crash.
Attachments
Issue Links
- relates to
-
MDEV-15891 SIGHUP during rsync SST causes SST to fail
-
- Closed
-
Activity
Field | Original Value | New Value |
---|---|---|
Assignee | Andrii Nikitin [ anikitin ] |
Description |
A joiner node performed an rsync SST from another node in the cluster, and when the SST finished, the node saw the following crash:
{noformat} 2017-11-03 17:06:34 140504733837056 [Note] WSREP: 0.0 (10-119-17-69.domain.com): State transfer to 3.0 (10-119-17-237.domain.com) complete. WSREP_SST: [INFO] Extracting binlog files: (20171103 17:06:34.145)$ db-g3-bin.000042$ ls: cannot access db-devg1-binlog.*: No such file or directory$ WSREP_SST: [INFO] Joiner cleanup. rsync PID: 8293 (20171103 17:06:34.156)$ WSREP_SST: [INFO] Joiner cleanup done. (20171103 17:06:34.662)$ 2017-11-03 17:06:34 140515632003328 [Note] WSREP: SST complete, seqno: 230930977$ 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using mutexes to ref count buffer pool pages 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: The InnoDB memory heap is disabled$ 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used or memory barrier$ 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Compressed tables use zlib 1.2.7$ 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using Linux native AIO$ 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using SSE crc32 instructions$ 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Initializing buffer pool, size = 512.0M$ 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Completed initialization of buffer pool$ 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Highest supported file format is Barracuda.$ 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Starting crash recovery from checkpoint LSN= 244892544453$ 2017-11-03 17:06:36 140504733837056 [Note] WSREP: Member 0.0 (10-119-17-69.domain.com) sy ced with group.$ 2017-11-03 17:06:42 140515632003328 [Note] InnoDB: Restoring possible half-written data pages f om the doublewrite buffer...$ 2017-11-03 17:06:49 140515632003328 [Note] InnoDB: To recover: 2702 pages from log$ 2017-11-03 17:06:53 140515632003328 [Note] InnoDB: Starting final batch to recover 2585 pages f om redo log$ InnoDB: Last MySQL binlog file position 0 786976686, file name /app/mysql/galera/db-g3-bin.0000 1$ 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: 128 rollback segment(s) are active.$ 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Waiting for purge to start$ 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6 36-82.2 started; log sequence number 1244894457235$ 2017-11-03 17:07:23 140515632003328 [Note] Plugin 'FEEDBACK' is disabled.$ 2017-11-03 17:07:23 140503555213056 [Note] InnoDB: Dumping buffer pool(s) not yet started$ 2017-11-03 17:07:23 140515632003328 [Note] Server socket created on IP: '10.119.17.237'.$ $ Status information:$ $ Current dir: /app/mysql/galera/$ Running threads: 2 Stack size: 295936$ Current locks:$ lock: 0x7fc98da48788:$ $ lock: 0x7fc9cdc6a508:$ $ lock: 0x7fcc598b5520:$ $ lock: 0x7fcc598b5440:$ $ lock: 0x7fcc598b4480:$ $ lock: 0x7fcc598b43a0:$ $ lock: 0x7fcc598b42c0:$ $ lock: 0x7fcc598b4640:$ $ lock: 0x7fcc598b2880:$ $ lock: 0x7fcc598b5360:$ $ lock: 0x7fcc598b2f80:$ $ lock: 0x7fcc598b3220:$ $ lock: 0x7fcc598b2ea0:$ $ lock: 0x7fcc598b3140:$ $ lock: 0x7fcc598b2dc0:$ $ lock: 0x7fcc598b3060:$ $ lock: 0x7fcc598b35a0:$ $ lock: 0x7fcc598b3680:$ $ lock: 0x7fcc598b3760:$ $ lock: 0x7fcc598b2ce0:$ $ lock: 0x7fcc598b2a40:$ $ lock: 0x7fcc598b2c00:$ $ lock: 0x7fcc598b2960:$ $ lock: 0x7fcc598b2b20:$ $ lock: 0x7fcc598b3300:$ $ lock: 0x7fcc598b33e0:$ $ lock: 0x7fcc598b34c0:$ $ lock: 0x7fcc598b4fe0:$ $ lock: 0x7fcc598b5280:$ $ lock: 0x7fcc598b51a0:$ $ lock: 0x7fcc598b50c0:$ $ lock: 0x7fcc598b4c60:$ $ lock: 0x7fcc598b4b80:$ $ lock: 0x7fcc598b4aa0:$ $ lock: 0x7fcc598b49c0:$ $ lock: 0x7fcc598b48e0:$ $ lock: 0x7fcc598b4e20:$ $ lock: 0x7fcc598b4800:$ $ lock: 0x7fcc598b4720:$ $ lock: 0x7fcc598b4f00:$ $ lock: 0x7fcc598b4560:$ $ lock: 0x7fcc598b4100:$ $ lock: 0x7fcc598b41e0:$ $ lock: 0x7fcc598b4020:$ $ lock: 0x7fcc598b3f40:$ $ lock: 0x7fcc598b3bc0:$ $ lock: 0x7fcc598b3e60:$ $ lock: 0x7fcc598b3d80:$ $ lock: 0x7fcc598b3ae0:$ $ lock: 0x7fcc598b3ca0:$ $ lock: 0x7fcc598b3840:$ $ lock: 0x7fcc598b3920:$ $ lock: 0x7fcc598b3a00:$ $ lock: 0x7fcc598b4d40:$ $ $ Key caches:$ default$ Buffer_size: 134217728$ Block_size: 1024$ Division_limit: 100$ Age_threshold: 300$ Partitions: 0$ blocks used: 0$ not flushed: 0$ w_requests: 0$ writes: 0$ r_requests: 0$ reads: 0$ $ $ handler status:$ read_key: 0$ read_next: 0$ read_rnd 0$ read_first: 0$ write: 0$ delete 0$ update: 0$ $ Table status:$ Opened tables: 2$ Open tables: 2$ Open files: 8$ Open streams: 0$ $ Alarm status:$ Active alarms: 0$ Max used alarms: 0$ Next alarm time: 0$ $ Memory status:$ Non-mmapped space allocated from system: 0$ Number of free chunks: 1$ Number of fastbin blocks: 0$ Number of mmapped regions: 0$ Space in mmapped regions: 0$ Maximum total allocated space: 0$ Space available in freed fastbin blocks: 0$ Total allocated space: 0$ Total free space: 0$ Top-most, releasable space: 0$ Estimated memory (with thread stack): 591872$ $ $ $ Events status:$ LLA = Last Locked At LUA = Last Unlocked At$ WOC = Waiting On Condition DL = Data Locked$ The Event Scheduler is disabled$ $ 2017-11-03 17:07:23 140515632003328 [Note] WSREP: Signalling provider to continue.$ 2017-11-03 17:07:23 140515632003328 [Note] WSREP: SST received: b1c4fe17-b433-11e7-bd44-52c2748 6170:230930977$ 2017-11-03 17:07:23 140515630676736 [Note] Event Scheduler: scheduler thread started with id 4$ 171103 17:07:23 [ERROR] mysqld got signal 11 ;$ This could be because you hit a bug. It is also possible that this binary$ or one of the libraries it was linked against is corrupt, improperly built,$ or misconfigured. This error can also be caused by malfunctioning hardware.$ $ To report this bug, see https://mariadb.com/kb/en/reporting-bugs$ $ We will try our best to scrape up some info that will hopefully help$ diagnose the problem, but since we have already crashed, $ something is definitely wrong and this may fail.$ $ Server version: 10.1.28-MariaDB$ key_buffer_size=134217728$ read_buffer_size=131072$ max_used_connections=0$ max_threads=1002$ thread_count=10$ It is possible that mysqld could use up to $ key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2332069 K bytes of memor $ Hope that's ok; if not, decrease some variables in the equation.$ $ Thread pointer: 0x0$ Attempting backtrace. You can use the following information to find out$ where mysqld died. If you see no messages after this, something went$ terribly wrong...$ stack_bottom = 0x0 thread_stack 0x48400$ mysqld(my_print_stacktrace+0x2e)[0x7fcc5902d92e]$ 2017-11-03 17:07:23 140515632003328 [Note] Reading of all Master_info entries succeded$ 2017-11-03 17:07:23 140515632003328 [Note] Added new Master_info '' to hash table$ 2017-11-03 17:07:23 140515632003328 [Note] mysqld: ready for connections.$ Version: '10.1.28-MariaDB' socket: '/app/mysql/temporaries/galera/mysqld.sock' port: 3306 MariaDB Server$ 2017-11-03 17:07:23 140504733837056 [Note] WSREP: 3.0 (10-119-17-237.domain.com): State transfer from 0.0 (10-119-17-69.domain.com) complete.$ 2017-11-03 17:07:23 140504733837056 [Note] WSREP: Shifting JOINER -> JOINED (TO: 230971731)$ mysqld(handle_fatal_signal+0x305)[0x7fcc58b50ce5]$ /lib64/libpthread.so.0(+0xf370)[0x7fcc58161370]$ addr2line: 'mysqld': No such file$ mysqld(my_hash_element+0x13)[0x7fcc5900c023]$ mysqld(_ZN17Master_info_index20flush_all_relay_logsEv+0x93)[0x7fcc58a98c83]$ mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x5d2)[0x7fcc58ab8672]$ mysqld(signal_hand+0x382)[0x7fcc5892c812]$ /lib64/libpthread.so.0(+0x7dc5)[0x7fcc58159dc5]$ /lib64/libc.so.6(clone+0x6d)[0x7fcc5657976d]$ The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains$ information that should help you find out what is causing the crash.$ {noformat} In case it is relevant, the donor node was a replication master, but not a slave. After the first SST led to a crash, another SST was attempted, and that one was successful without a crash. |
A joiner node performed an rsync SST from another node in the cluster, and when the SST finished, the node saw the following crash:
{noformat} 2017-11-03 16:44:46 140515631684352 [Note] WSREP: State transfer required: Group state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977 Local state: 00000000-0000-0000-0000-000000000000:-1 2017-11-03 16:44:46 140515631684352 [Note] WSREP: New cluster view: global state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977, view# 102: Primary, number of nodes: 4, my index: 3, protocol version 3 2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Gap in state sequence. Need state transfer. 2017-11-03 16:44:46 140504717063936 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.119.17.237' --datadir '/app/mysql/galera/' --defaults-file '/app/mysql/config/galera.cnf' --parent '8214' --binlog '/app/mysql/galera/db-node1-binlog' ' 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Prepared SST request: rsync|10.119.17.237:4444/rsync_sst 2017-11-03 16:44:46 140515631684352 [Note] WSREP: REPL Protocols: 7 (3, 2) 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Assign initial position for certification: 230930977, protocol version: 3 2017-11-03 16:44:46 140504792139520 [Note] WSREP: Service thread queue flushed. 2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (b1c4fe17-b433-11e7-bd44-52c2748b6170): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable. 2017-11-03 16:44:46 140504733837056 [Note] WSREP: Member 3.0 (10-119-17-237.domain.com) requested state transfer from '*any*'. Selected 0.0 (10-119-17-69.domain.com)(SYNCED) as donor. 2017-11-03 16:44:46 140504733837056 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 230930977) 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Requesting state transfer: success, donor: 0 2017-11-03 16:44:46 140515631684352 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977) 2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') connection to peer d3790118 with addr tcp://10.119.17.237:4567 timed out, no messages seen in PT3S 2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') turning message relay requesting off 2017-11-03 17:06:34 140504733837056 [Note] WSREP: 0.0 (10-119-17-69.domain.com): State transfer to 3.0 (10-119-17-237.domain.com) complete. WSREP_SST: [INFO] Extracting binlog files: (20171103 17:06:34.145) db-g3-bin.000042 ls: cannot access db-node1-binlog.*: No such file or directory WSREP_SST: [INFO] Joiner cleanup. rsync PID: 8293 (20171103 17:06:34.156) WSREP_SST: [INFO] Joiner cleanup done. (20171103 17:06:34.662) 2017-11-03 17:06:34 140515632003328 [Note] WSREP: SST complete, seqno: 230930977 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using mutexes to ref count buffer pool pages 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: The InnoDB memory heap is disabled 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Compressed tables use zlib 1.2.7 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using Linux native AIO 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using SSE crc32 instructions 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Initializing buffer pool, size = 512.0M 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Completed initialization of buffer pool 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Highest supported file format is Barracuda. 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1244892544453 2017-11-03 17:06:36 140504733837056 [Note] WSREP: Member 0.0 (10-119-17-69.domain.com) synced with group. 2017-11-03 17:06:42 140515632003328 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer... 2017-11-03 17:06:49 140515632003328 [Note] InnoDB: To recover: 2702 pages from log 2017-11-03 17:06:53 140515632003328 [Note] InnoDB: Starting final batch to recover 2585 pages from redo log InnoDB: Last MySQL binlog file position 0 786976686, file name /app/mysql/galera/db-g3-bin.000041 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: 128 rollback segment(s) are active. 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Waiting for purge to start 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.2 started; log sequence number 1244894457235 2017-11-03 17:07:23 140515632003328 [Note] Plugin 'FEEDBACK' is disabled. 2017-11-03 17:07:23 140503555213056 [Note] InnoDB: Dumping buffer pool(s) not yet started 2017-11-03 17:07:23 140515632003328 [Note] Server socket created on IP: '10.119.17.237'. Status information: Current dir: /app/mysql/galera/ Running threads: 2 Stack size: 295936 Current locks: lock: 0x7fc98da48788: lock: 0x7fc9cdc6a508: lock: 0x7fcc598b5520: lock: 0x7fcc598b5440: lock: 0x7fcc598b4480: lock: 0x7fcc598b43a0: lock: 0x7fcc598b42c0: lock: 0x7fcc598b4640: lock: 0x7fcc598b2880: lock: 0x7fcc598b5360: lock: 0x7fcc598b2f80: lock: 0x7fcc598b3220: lock: 0x7fcc598b2ea0: lock: 0x7fcc598b3140: lock: 0x7fcc598b2dc0: lock: 0x7fcc598b3060: lock: 0x7fcc598b35a0: lock: 0x7fcc598b3680: lock: 0x7fcc598b3760: lock: 0x7fcc598b2ce0: lock: 0x7fcc598b2a40: lock: 0x7fcc598b2c00: lock: 0x7fcc598b2960: lock: 0x7fcc598b2b20: lock: 0x7fcc598b3300: lock: 0x7fcc598b33e0: lock: 0x7fcc598b34c0: lock: 0x7fcc598b4fe0: lock: 0x7fcc598b5280: lock: 0x7fcc598b51a0: lock: 0x7fcc598b50c0: lock: 0x7fcc598b4c60: lock: 0x7fcc598b4b80: lock: 0x7fcc598b4aa0: lock: 0x7fcc598b49c0: lock: 0x7fcc598b48e0: lock: 0x7fcc598b4e20: lock: 0x7fcc598b4800: lock: 0x7fcc598b4720: lock: 0x7fcc598b4f00: lock: 0x7fcc598b4560: lock: 0x7fcc598b4100: lock: 0x7fcc598b41e0: lock: 0x7fcc598b4020: lock: 0x7fcc598b3f40: lock: 0x7fcc598b3bc0: lock: 0x7fcc598b3e60: lock: 0x7fcc598b3d80: lock: 0x7fcc598b3ae0: lock: 0x7fcc598b3ca0: lock: 0x7fcc598b3840: lock: 0x7fcc598b3920: lock: 0x7fcc598b3a00: lock: 0x7fcc598b4d40: Key caches: default Buffer_size: 134217728 Block_size: 1024 Division_limit: 100 Age_threshold: 300 Partitions: 0 blocks used: 0 not flushed: 0 w_requests: 0 writes: 0 r_requests: 0 reads: 0 handler status: read_key: 0 read_next: 0 read_rnd 0 read_first: 0 write: 0 delete 0 update: 0 Table status: Opened tables: 2 Open tables: 2 Open files: 8 Open streams: 0 Alarm status: Active alarms: 0 Max used alarms: 0 Next alarm time: 0 Memory status: Non-mmapped space allocated from system: 0 Number of free chunks: 1 Number of fastbin blocks: 0 Number of mmapped regions: 0 Space in mmapped regions: 0 Maximum total allocated space: 0 Space available in freed fastbin blocks: 0 Total allocated space: 0 Total free space: 0 Top-most, releasable space: 0 Estimated memory (with thread stack): 591872 Events status: LLA = Last Locked At LUA = Last Unlocked At WOC = Waiting On Condition DL = Data Locked The Event Scheduler is disabled 2017-11-03 17:07:23 140515632003328 [Note] WSREP: Signalling provider to continue. 2017-11-03 17:07:23 140515632003328 [Note] WSREP: SST received: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977 2017-11-03 17:07:23 140515630676736 [Note] Event Scheduler: scheduler thread started with id 4 171103 17:07:23 [ERROR] mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.1.28-MariaDB key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=1002 thread_count=10 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2332069 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x48400 mysqld(my_print_stacktrace+0x2e)[0x7fcc5902d92e] 2017-11-03 17:07:23 140515632003328 [Note] Reading of all Master_info entries succeded 2017-11-03 17:07:23 140515632003328 [Note] Added new Master_info '' to hash table 2017-11-03 17:07:23 140515632003328 [Note] mysqld: ready for connections. Version: '10.1.28-MariaDB' socket: '/app/mysql/temporaries/galera/mysqld.sock' port: 3306 MariaDB Server 2017-11-03 17:07:23 140504733837056 [Note] WSREP: 3.0 (10-119-17-237.domain.com): State transfer from 0.0 (10-119-17-69.domain.com) complete. 2017-11-03 17:07:23 140504733837056 [Note] WSREP: Shifting JOINER -> JOINED (TO: 230971731) mysqld(handle_fatal_signal+0x305)[0x7fcc58b50ce5] /lib64/libpthread.so.0(+0xf370)[0x7fcc58161370] addr2line: 'mysqld': No such file mysqld(my_hash_element+0x13)[0x7fcc5900c023] mysqld(_ZN17Master_info_index20flush_all_relay_logsEv+0x93)[0x7fcc58a98c83] mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x5d2)[0x7fcc58ab8672] mysqld(signal_hand+0x382)[0x7fcc5892c812] /lib64/libpthread.so.0(+0x7dc5)[0x7fcc58159dc5] /lib64/libc.so.6(clone+0x6d)[0x7fcc5657976d] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. {noformat} In case it is relevant, the donor node was a replication master, but not a slave. After the first SST led to a crash, another SST was attempted, and that one was successful without a crash. |
Description |
A joiner node performed an rsync SST from another node in the cluster, and when the SST finished, the node saw the following crash:
{noformat} 2017-11-03 16:44:46 140515631684352 [Note] WSREP: State transfer required: Group state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977 Local state: 00000000-0000-0000-0000-000000000000:-1 2017-11-03 16:44:46 140515631684352 [Note] WSREP: New cluster view: global state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977, view# 102: Primary, number of nodes: 4, my index: 3, protocol version 3 2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Gap in state sequence. Need state transfer. 2017-11-03 16:44:46 140504717063936 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.119.17.237' --datadir '/app/mysql/galera/' --defaults-file '/app/mysql/config/galera.cnf' --parent '8214' --binlog '/app/mysql/galera/db-node1-binlog' ' 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Prepared SST request: rsync|10.119.17.237:4444/rsync_sst 2017-11-03 16:44:46 140515631684352 [Note] WSREP: REPL Protocols: 7 (3, 2) 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Assign initial position for certification: 230930977, protocol version: 3 2017-11-03 16:44:46 140504792139520 [Note] WSREP: Service thread queue flushed. 2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (b1c4fe17-b433-11e7-bd44-52c2748b6170): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable. 2017-11-03 16:44:46 140504733837056 [Note] WSREP: Member 3.0 (10-119-17-237.domain.com) requested state transfer from '*any*'. Selected 0.0 (10-119-17-69.domain.com)(SYNCED) as donor. 2017-11-03 16:44:46 140504733837056 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 230930977) 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Requesting state transfer: success, donor: 0 2017-11-03 16:44:46 140515631684352 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977) 2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') connection to peer d3790118 with addr tcp://10.119.17.237:4567 timed out, no messages seen in PT3S 2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') turning message relay requesting off 2017-11-03 17:06:34 140504733837056 [Note] WSREP: 0.0 (10-119-17-69.domain.com): State transfer to 3.0 (10-119-17-237.domain.com) complete. WSREP_SST: [INFO] Extracting binlog files: (20171103 17:06:34.145) db-g3-bin.000042 ls: cannot access db-node1-binlog.*: No such file or directory WSREP_SST: [INFO] Joiner cleanup. rsync PID: 8293 (20171103 17:06:34.156) WSREP_SST: [INFO] Joiner cleanup done. (20171103 17:06:34.662) 2017-11-03 17:06:34 140515632003328 [Note] WSREP: SST complete, seqno: 230930977 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using mutexes to ref count buffer pool pages 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: The InnoDB memory heap is disabled 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Compressed tables use zlib 1.2.7 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using Linux native AIO 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using SSE crc32 instructions 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Initializing buffer pool, size = 512.0M 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Completed initialization of buffer pool 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Highest supported file format is Barracuda. 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1244892544453 2017-11-03 17:06:36 140504733837056 [Note] WSREP: Member 0.0 (10-119-17-69.domain.com) synced with group. 2017-11-03 17:06:42 140515632003328 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer... 2017-11-03 17:06:49 140515632003328 [Note] InnoDB: To recover: 2702 pages from log 2017-11-03 17:06:53 140515632003328 [Note] InnoDB: Starting final batch to recover 2585 pages from redo log InnoDB: Last MySQL binlog file position 0 786976686, file name /app/mysql/galera/db-g3-bin.000041 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: 128 rollback segment(s) are active. 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Waiting for purge to start 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.2 started; log sequence number 1244894457235 2017-11-03 17:07:23 140515632003328 [Note] Plugin 'FEEDBACK' is disabled. 2017-11-03 17:07:23 140503555213056 [Note] InnoDB: Dumping buffer pool(s) not yet started 2017-11-03 17:07:23 140515632003328 [Note] Server socket created on IP: '10.119.17.237'. Status information: Current dir: /app/mysql/galera/ Running threads: 2 Stack size: 295936 Current locks: lock: 0x7fc98da48788: lock: 0x7fc9cdc6a508: lock: 0x7fcc598b5520: lock: 0x7fcc598b5440: lock: 0x7fcc598b4480: lock: 0x7fcc598b43a0: lock: 0x7fcc598b42c0: lock: 0x7fcc598b4640: lock: 0x7fcc598b2880: lock: 0x7fcc598b5360: lock: 0x7fcc598b2f80: lock: 0x7fcc598b3220: lock: 0x7fcc598b2ea0: lock: 0x7fcc598b3140: lock: 0x7fcc598b2dc0: lock: 0x7fcc598b3060: lock: 0x7fcc598b35a0: lock: 0x7fcc598b3680: lock: 0x7fcc598b3760: lock: 0x7fcc598b2ce0: lock: 0x7fcc598b2a40: lock: 0x7fcc598b2c00: lock: 0x7fcc598b2960: lock: 0x7fcc598b2b20: lock: 0x7fcc598b3300: lock: 0x7fcc598b33e0: lock: 0x7fcc598b34c0: lock: 0x7fcc598b4fe0: lock: 0x7fcc598b5280: lock: 0x7fcc598b51a0: lock: 0x7fcc598b50c0: lock: 0x7fcc598b4c60: lock: 0x7fcc598b4b80: lock: 0x7fcc598b4aa0: lock: 0x7fcc598b49c0: lock: 0x7fcc598b48e0: lock: 0x7fcc598b4e20: lock: 0x7fcc598b4800: lock: 0x7fcc598b4720: lock: 0x7fcc598b4f00: lock: 0x7fcc598b4560: lock: 0x7fcc598b4100: lock: 0x7fcc598b41e0: lock: 0x7fcc598b4020: lock: 0x7fcc598b3f40: lock: 0x7fcc598b3bc0: lock: 0x7fcc598b3e60: lock: 0x7fcc598b3d80: lock: 0x7fcc598b3ae0: lock: 0x7fcc598b3ca0: lock: 0x7fcc598b3840: lock: 0x7fcc598b3920: lock: 0x7fcc598b3a00: lock: 0x7fcc598b4d40: Key caches: default Buffer_size: 134217728 Block_size: 1024 Division_limit: 100 Age_threshold: 300 Partitions: 0 blocks used: 0 not flushed: 0 w_requests: 0 writes: 0 r_requests: 0 reads: 0 handler status: read_key: 0 read_next: 0 read_rnd 0 read_first: 0 write: 0 delete 0 update: 0 Table status: Opened tables: 2 Open tables: 2 Open files: 8 Open streams: 0 Alarm status: Active alarms: 0 Max used alarms: 0 Next alarm time: 0 Memory status: Non-mmapped space allocated from system: 0 Number of free chunks: 1 Number of fastbin blocks: 0 Number of mmapped regions: 0 Space in mmapped regions: 0 Maximum total allocated space: 0 Space available in freed fastbin blocks: 0 Total allocated space: 0 Total free space: 0 Top-most, releasable space: 0 Estimated memory (with thread stack): 591872 Events status: LLA = Last Locked At LUA = Last Unlocked At WOC = Waiting On Condition DL = Data Locked The Event Scheduler is disabled 2017-11-03 17:07:23 140515632003328 [Note] WSREP: Signalling provider to continue. 2017-11-03 17:07:23 140515632003328 [Note] WSREP: SST received: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977 2017-11-03 17:07:23 140515630676736 [Note] Event Scheduler: scheduler thread started with id 4 171103 17:07:23 [ERROR] mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.1.28-MariaDB key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=1002 thread_count=10 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2332069 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x48400 mysqld(my_print_stacktrace+0x2e)[0x7fcc5902d92e] 2017-11-03 17:07:23 140515632003328 [Note] Reading of all Master_info entries succeded 2017-11-03 17:07:23 140515632003328 [Note] Added new Master_info '' to hash table 2017-11-03 17:07:23 140515632003328 [Note] mysqld: ready for connections. Version: '10.1.28-MariaDB' socket: '/app/mysql/temporaries/galera/mysqld.sock' port: 3306 MariaDB Server 2017-11-03 17:07:23 140504733837056 [Note] WSREP: 3.0 (10-119-17-237.domain.com): State transfer from 0.0 (10-119-17-69.domain.com) complete. 2017-11-03 17:07:23 140504733837056 [Note] WSREP: Shifting JOINER -> JOINED (TO: 230971731) mysqld(handle_fatal_signal+0x305)[0x7fcc58b50ce5] /lib64/libpthread.so.0(+0xf370)[0x7fcc58161370] addr2line: 'mysqld': No such file mysqld(my_hash_element+0x13)[0x7fcc5900c023] mysqld(_ZN17Master_info_index20flush_all_relay_logsEv+0x93)[0x7fcc58a98c83] mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x5d2)[0x7fcc58ab8672] mysqld(signal_hand+0x382)[0x7fcc5892c812] /lib64/libpthread.so.0(+0x7dc5)[0x7fcc58159dc5] /lib64/libc.so.6(clone+0x6d)[0x7fcc5657976d] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. {noformat} In case it is relevant, the donor node was a replication master, but not a slave. After the first SST led to a crash, another SST was attempted, and that one was successful without a crash. |
A joiner node performed an rsync SST from another node in the cluster, and when the SST finished, the node saw the following crash:
{noformat} 2017-11-03 16:44:46 140515631684352 [Note] WSREP: State transfer required: Group state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977 Local state: 00000000-0000-0000-0000-000000000000:-1 2017-11-03 16:44:46 140515631684352 [Note] WSREP: New cluster view: global state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977, view# 102: Primary, number of nodes: 4, my index: 3, protocol version 3 2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Gap in state sequence. Need state transfer. 2017-11-03 16:44:46 140504717063936 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.119.17.237' --datadir '/app/mysql/galera/' --defaults-file '/app/mysql/config/galera.cnf' --parent '8214' --binlog '/app/mysql/galera/db-node1-binlog' ' 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Prepared SST request: rsync|10.119.17.237:4444/rsync_sst 2017-11-03 16:44:46 140515631684352 [Note] WSREP: REPL Protocols: 7 (3, 2) 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Assign initial position for certification: 230930977, protocol version: 3 2017-11-03 16:44:46 140504792139520 [Note] WSREP: Service thread queue flushed. 2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (b1c4fe17-b433-11e7-bd44-52c2748b6170): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable. 2017-11-03 16:44:46 140504733837056 [Note] WSREP: Member 3.0 (10-119-17-237.domain.com) requested state transfer from '*any*'. Selected 0.0 (10-119-17-69.domain.com)(SYNCED) as donor. 2017-11-03 16:44:46 140504733837056 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 230930977) 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Requesting state transfer: success, donor: 0 2017-11-03 16:44:46 140515631684352 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977) 2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') connection to peer d3790118 with addr tcp://10.119.17.237:4567 timed out, no messages seen in PT3S 2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') turning message relay requesting off 2017-11-03 17:06:34 140504733837056 [Note] WSREP: 0.0 (10-119-17-69.domain.com): State transfer to 3.0 (10-119-17-237.domain.com) complete. WSREP_SST: [INFO] Extracting binlog files: (20171103 17:06:34.145) db-g3-bin.000042 ls: cannot access db-node1-binlog.*: No such file or directory WSREP_SST: [INFO] Joiner cleanup. rsync PID: 8293 (20171103 17:06:34.156) WSREP_SST: [INFO] Joiner cleanup done. (20171103 17:06:34.662) 2017-11-03 17:06:34 140515632003328 [Note] WSREP: SST complete, seqno: 230930977 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using mutexes to ref count buffer pool pages 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: The InnoDB memory heap is disabled 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Compressed tables use zlib 1.2.7 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using Linux native AIO 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using SSE crc32 instructions 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Initializing buffer pool, size = 512.0M 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Completed initialization of buffer pool 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Highest supported file format is Barracuda. 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1244892544453 2017-11-03 17:06:36 140504733837056 [Note] WSREP: Member 0.0 (10-119-17-69.domain.com) synced with group. 2017-11-03 17:06:42 140515632003328 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer... 2017-11-03 17:06:49 140515632003328 [Note] InnoDB: To recover: 2702 pages from log 2017-11-03 17:06:53 140515632003328 [Note] InnoDB: Starting final batch to recover 2585 pages from redo log InnoDB: Last MySQL binlog file position 0 786976686, file name /app/mysql/galera/db-g3-bin.000041 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: 128 rollback segment(s) are active. 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Waiting for purge to start 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.2 started; log sequence number 1244894457235 2017-11-03 17:07:23 140515632003328 [Note] Plugin 'FEEDBACK' is disabled. 2017-11-03 17:07:23 140503555213056 [Note] InnoDB: Dumping buffer pool(s) not yet started 2017-11-03 17:07:23 140515632003328 [Note] Server socket created on IP: '10.119.17.237'. Status information: Current dir: /app/mysql/galera/ Running threads: 2 Stack size: 295936 Current locks: lock: 0x7fc98da48788: lock: 0x7fc9cdc6a508: lock: 0x7fcc598b5520: lock: 0x7fcc598b5440: lock: 0x7fcc598b4480: lock: 0x7fcc598b43a0: lock: 0x7fcc598b42c0: lock: 0x7fcc598b4640: lock: 0x7fcc598b2880: lock: 0x7fcc598b5360: lock: 0x7fcc598b2f80: lock: 0x7fcc598b3220: lock: 0x7fcc598b2ea0: lock: 0x7fcc598b3140: lock: 0x7fcc598b2dc0: lock: 0x7fcc598b3060: lock: 0x7fcc598b35a0: lock: 0x7fcc598b3680: lock: 0x7fcc598b3760: lock: 0x7fcc598b2ce0: lock: 0x7fcc598b2a40: lock: 0x7fcc598b2c00: lock: 0x7fcc598b2960: lock: 0x7fcc598b2b20: lock: 0x7fcc598b3300: lock: 0x7fcc598b33e0: lock: 0x7fcc598b34c0: lock: 0x7fcc598b4fe0: lock: 0x7fcc598b5280: lock: 0x7fcc598b51a0: lock: 0x7fcc598b50c0: lock: 0x7fcc598b4c60: lock: 0x7fcc598b4b80: lock: 0x7fcc598b4aa0: lock: 0x7fcc598b49c0: lock: 0x7fcc598b48e0: lock: 0x7fcc598b4e20: lock: 0x7fcc598b4800: lock: 0x7fcc598b4720: lock: 0x7fcc598b4f00: lock: 0x7fcc598b4560: lock: 0x7fcc598b4100: lock: 0x7fcc598b41e0: lock: 0x7fcc598b4020: lock: 0x7fcc598b3f40: lock: 0x7fcc598b3bc0: lock: 0x7fcc598b3e60: lock: 0x7fcc598b3d80: lock: 0x7fcc598b3ae0: lock: 0x7fcc598b3ca0: lock: 0x7fcc598b3840: lock: 0x7fcc598b3920: lock: 0x7fcc598b3a00: lock: 0x7fcc598b4d40: Key caches: default Buffer_size: 134217728 Block_size: 1024 Division_limit: 100 Age_threshold: 300 Partitions: 0 blocks used: 0 not flushed: 0 w_requests: 0 writes: 0 r_requests: 0 reads: 0 handler status: read_key: 0 read_next: 0 read_rnd 0 read_first: 0 write: 0 delete 0 update: 0 Table status: Opened tables: 2 Open tables: 2 Open files: 8 Open streams: 0 Alarm status: Active alarms: 0 Max used alarms: 0 Next alarm time: 0 Memory status: Non-mmapped space allocated from system: 0 Number of free chunks: 1 Number of fastbin blocks: 0 Number of mmapped regions: 0 Space in mmapped regions: 0 Maximum total allocated space: 0 Space available in freed fastbin blocks: 0 Total allocated space: 0 Total free space: 0 Top-most, releasable space: 0 Estimated memory (with thread stack): 591872 Events status: LLA = Last Locked At LUA = Last Unlocked At WOC = Waiting On Condition DL = Data Locked The Event Scheduler is disabled 2017-11-03 17:07:23 140515632003328 [Note] WSREP: Signalling provider to continue. 2017-11-03 17:07:23 140515632003328 [Note] WSREP: SST received: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977 2017-11-03 17:07:23 140515630676736 [Note] Event Scheduler: scheduler thread started with id 4 171103 17:07:23 [ERROR] mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.1.28-MariaDB key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=1002 thread_count=10 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2332069 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x48400 mysqld(my_print_stacktrace+0x2e)[0x7fcc5902d92e] 2017-11-03 17:07:23 140515632003328 [Note] Reading of all Master_info entries succeded 2017-11-03 17:07:23 140515632003328 [Note] Added new Master_info '' to hash table 2017-11-03 17:07:23 140515632003328 [Note] mysqld: ready for connections. Version: '10.1.28-MariaDB' socket: '/app/mysql/temporaries/galera/mysqld.sock' port: 3306 MariaDB Server 2017-11-03 17:07:23 140504733837056 [Note] WSREP: 3.0 (10-119-17-237.domain.com): State transfer from 0.0 (10-119-17-69.domain.com) complete. 2017-11-03 17:07:23 140504733837056 [Note] WSREP: Shifting JOINER -> JOINED (TO: 230971731) mysqld(handle_fatal_signal+0x305)[0x7fcc58b50ce5] /lib64/libpthread.so.0(+0xf370)[0x7fcc58161370] addr2line: 'mysqld': No such file mysqld(my_hash_element+0x13)[0x7fcc5900c023] mysqld(_ZN17Master_info_index20flush_all_relay_logsEv+0x93)[0x7fcc58a98c83] mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x5d2)[0x7fcc58ab8672] mysqld(signal_hand+0x382)[0x7fcc5892c812] /lib64/libpthread.so.0(+0x7dc5)[0x7fcc58159dc5] /lib64/libc.so.6(clone+0x6d)[0x7fcc5657976d] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. {noformat} I'm not sure if it could provide any hint as to the cause of the crash, but the donor's error log contained a lot of messages about long semaphore waits during the time period in which it acted as donor: {noformat} InnoDB: ###### Diagnostic info printed to the standard error stream WSREP: avoiding InnoDB self crash due to long semaphore wait of > 600 seconds Server is processing SST donor operation, fatal_cnt now: 0InnoDB: Warning: a long semaphore wait: --Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore: S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock' a writer (thread id 140439973771008) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file log0log.cc line 2352 Last time write locked in file log0log.cc line 2109 Holder thread 0 file not yet reserved line 0 InnoDB: Warning: semaphore wait: --Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore: S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock' a writer (thread id 140439973771008) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file log0log.cc line 2352 Last time write locked in file log0log.cc line 2109 Holder thread 0 file not yet reserved line 0 InnoDB: Warning: Writer thread is waiting this semaphore: --Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore: S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock' a writer (thread id 140439973771008) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file log0log.cc line 2352 Last time write locked in file log0log.cc line 2109 Holder thread 0 file not yet reserved line 0 {noformat} In case it is relevant, the donor node was a replication master, but not a slave. After the first SST led to a crash, another SST was attempted, and that one was successful without a crash. |
Description |
A joiner node performed an rsync SST from another node in the cluster, and when the SST finished, the node saw the following crash:
{noformat} 2017-11-03 16:44:46 140515631684352 [Note] WSREP: State transfer required: Group state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977 Local state: 00000000-0000-0000-0000-000000000000:-1 2017-11-03 16:44:46 140515631684352 [Note] WSREP: New cluster view: global state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977, view# 102: Primary, number of nodes: 4, my index: 3, protocol version 3 2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Gap in state sequence. Need state transfer. 2017-11-03 16:44:46 140504717063936 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.119.17.237' --datadir '/app/mysql/galera/' --defaults-file '/app/mysql/config/galera.cnf' --parent '8214' --binlog '/app/mysql/galera/db-node1-binlog' ' 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Prepared SST request: rsync|10.119.17.237:4444/rsync_sst 2017-11-03 16:44:46 140515631684352 [Note] WSREP: REPL Protocols: 7 (3, 2) 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Assign initial position for certification: 230930977, protocol version: 3 2017-11-03 16:44:46 140504792139520 [Note] WSREP: Service thread queue flushed. 2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (b1c4fe17-b433-11e7-bd44-52c2748b6170): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable. 2017-11-03 16:44:46 140504733837056 [Note] WSREP: Member 3.0 (10-119-17-237.domain.com) requested state transfer from '*any*'. Selected 0.0 (10-119-17-69.domain.com)(SYNCED) as donor. 2017-11-03 16:44:46 140504733837056 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 230930977) 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Requesting state transfer: success, donor: 0 2017-11-03 16:44:46 140515631684352 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977) 2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') connection to peer d3790118 with addr tcp://10.119.17.237:4567 timed out, no messages seen in PT3S 2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') turning message relay requesting off 2017-11-03 17:06:34 140504733837056 [Note] WSREP: 0.0 (10-119-17-69.domain.com): State transfer to 3.0 (10-119-17-237.domain.com) complete. WSREP_SST: [INFO] Extracting binlog files: (20171103 17:06:34.145) db-g3-bin.000042 ls: cannot access db-node1-binlog.*: No such file or directory WSREP_SST: [INFO] Joiner cleanup. rsync PID: 8293 (20171103 17:06:34.156) WSREP_SST: [INFO] Joiner cleanup done. (20171103 17:06:34.662) 2017-11-03 17:06:34 140515632003328 [Note] WSREP: SST complete, seqno: 230930977 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using mutexes to ref count buffer pool pages 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: The InnoDB memory heap is disabled 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Compressed tables use zlib 1.2.7 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using Linux native AIO 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using SSE crc32 instructions 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Initializing buffer pool, size = 512.0M 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Completed initialization of buffer pool 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Highest supported file format is Barracuda. 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1244892544453 2017-11-03 17:06:36 140504733837056 [Note] WSREP: Member 0.0 (10-119-17-69.domain.com) synced with group. 2017-11-03 17:06:42 140515632003328 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer... 2017-11-03 17:06:49 140515632003328 [Note] InnoDB: To recover: 2702 pages from log 2017-11-03 17:06:53 140515632003328 [Note] InnoDB: Starting final batch to recover 2585 pages from redo log InnoDB: Last MySQL binlog file position 0 786976686, file name /app/mysql/galera/db-g3-bin.000041 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: 128 rollback segment(s) are active. 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Waiting for purge to start 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.2 started; log sequence number 1244894457235 2017-11-03 17:07:23 140515632003328 [Note] Plugin 'FEEDBACK' is disabled. 2017-11-03 17:07:23 140503555213056 [Note] InnoDB: Dumping buffer pool(s) not yet started 2017-11-03 17:07:23 140515632003328 [Note] Server socket created on IP: '10.119.17.237'. Status information: Current dir: /app/mysql/galera/ Running threads: 2 Stack size: 295936 Current locks: lock: 0x7fc98da48788: lock: 0x7fc9cdc6a508: lock: 0x7fcc598b5520: lock: 0x7fcc598b5440: lock: 0x7fcc598b4480: lock: 0x7fcc598b43a0: lock: 0x7fcc598b42c0: lock: 0x7fcc598b4640: lock: 0x7fcc598b2880: lock: 0x7fcc598b5360: lock: 0x7fcc598b2f80: lock: 0x7fcc598b3220: lock: 0x7fcc598b2ea0: lock: 0x7fcc598b3140: lock: 0x7fcc598b2dc0: lock: 0x7fcc598b3060: lock: 0x7fcc598b35a0: lock: 0x7fcc598b3680: lock: 0x7fcc598b3760: lock: 0x7fcc598b2ce0: lock: 0x7fcc598b2a40: lock: 0x7fcc598b2c00: lock: 0x7fcc598b2960: lock: 0x7fcc598b2b20: lock: 0x7fcc598b3300: lock: 0x7fcc598b33e0: lock: 0x7fcc598b34c0: lock: 0x7fcc598b4fe0: lock: 0x7fcc598b5280: lock: 0x7fcc598b51a0: lock: 0x7fcc598b50c0: lock: 0x7fcc598b4c60: lock: 0x7fcc598b4b80: lock: 0x7fcc598b4aa0: lock: 0x7fcc598b49c0: lock: 0x7fcc598b48e0: lock: 0x7fcc598b4e20: lock: 0x7fcc598b4800: lock: 0x7fcc598b4720: lock: 0x7fcc598b4f00: lock: 0x7fcc598b4560: lock: 0x7fcc598b4100: lock: 0x7fcc598b41e0: lock: 0x7fcc598b4020: lock: 0x7fcc598b3f40: lock: 0x7fcc598b3bc0: lock: 0x7fcc598b3e60: lock: 0x7fcc598b3d80: lock: 0x7fcc598b3ae0: lock: 0x7fcc598b3ca0: lock: 0x7fcc598b3840: lock: 0x7fcc598b3920: lock: 0x7fcc598b3a00: lock: 0x7fcc598b4d40: Key caches: default Buffer_size: 134217728 Block_size: 1024 Division_limit: 100 Age_threshold: 300 Partitions: 0 blocks used: 0 not flushed: 0 w_requests: 0 writes: 0 r_requests: 0 reads: 0 handler status: read_key: 0 read_next: 0 read_rnd 0 read_first: 0 write: 0 delete 0 update: 0 Table status: Opened tables: 2 Open tables: 2 Open files: 8 Open streams: 0 Alarm status: Active alarms: 0 Max used alarms: 0 Next alarm time: 0 Memory status: Non-mmapped space allocated from system: 0 Number of free chunks: 1 Number of fastbin blocks: 0 Number of mmapped regions: 0 Space in mmapped regions: 0 Maximum total allocated space: 0 Space available in freed fastbin blocks: 0 Total allocated space: 0 Total free space: 0 Top-most, releasable space: 0 Estimated memory (with thread stack): 591872 Events status: LLA = Last Locked At LUA = Last Unlocked At WOC = Waiting On Condition DL = Data Locked The Event Scheduler is disabled 2017-11-03 17:07:23 140515632003328 [Note] WSREP: Signalling provider to continue. 2017-11-03 17:07:23 140515632003328 [Note] WSREP: SST received: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977 2017-11-03 17:07:23 140515630676736 [Note] Event Scheduler: scheduler thread started with id 4 171103 17:07:23 [ERROR] mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.1.28-MariaDB key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=1002 thread_count=10 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2332069 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x48400 mysqld(my_print_stacktrace+0x2e)[0x7fcc5902d92e] 2017-11-03 17:07:23 140515632003328 [Note] Reading of all Master_info entries succeded 2017-11-03 17:07:23 140515632003328 [Note] Added new Master_info '' to hash table 2017-11-03 17:07:23 140515632003328 [Note] mysqld: ready for connections. Version: '10.1.28-MariaDB' socket: '/app/mysql/temporaries/galera/mysqld.sock' port: 3306 MariaDB Server 2017-11-03 17:07:23 140504733837056 [Note] WSREP: 3.0 (10-119-17-237.domain.com): State transfer from 0.0 (10-119-17-69.domain.com) complete. 2017-11-03 17:07:23 140504733837056 [Note] WSREP: Shifting JOINER -> JOINED (TO: 230971731) mysqld(handle_fatal_signal+0x305)[0x7fcc58b50ce5] /lib64/libpthread.so.0(+0xf370)[0x7fcc58161370] addr2line: 'mysqld': No such file mysqld(my_hash_element+0x13)[0x7fcc5900c023] mysqld(_ZN17Master_info_index20flush_all_relay_logsEv+0x93)[0x7fcc58a98c83] mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x5d2)[0x7fcc58ab8672] mysqld(signal_hand+0x382)[0x7fcc5892c812] /lib64/libpthread.so.0(+0x7dc5)[0x7fcc58159dc5] /lib64/libc.so.6(clone+0x6d)[0x7fcc5657976d] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. {noformat} I'm not sure if it could provide any hint as to the cause of the crash, but the donor's error log contained a lot of messages about long semaphore waits during the time period in which it acted as donor: {noformat} InnoDB: ###### Diagnostic info printed to the standard error stream WSREP: avoiding InnoDB self crash due to long semaphore wait of > 600 seconds Server is processing SST donor operation, fatal_cnt now: 0InnoDB: Warning: a long semaphore wait: --Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore: S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock' a writer (thread id 140439973771008) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file log0log.cc line 2352 Last time write locked in file log0log.cc line 2109 Holder thread 0 file not yet reserved line 0 InnoDB: Warning: semaphore wait: --Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore: S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock' a writer (thread id 140439973771008) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file log0log.cc line 2352 Last time write locked in file log0log.cc line 2109 Holder thread 0 file not yet reserved line 0 InnoDB: Warning: Writer thread is waiting this semaphore: --Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore: S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock' a writer (thread id 140439973771008) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file log0log.cc line 2352 Last time write locked in file log0log.cc line 2109 Holder thread 0 file not yet reserved line 0 {noformat} In case it is relevant, the donor node was a replication master, but not a slave. After the first SST led to a crash, another SST was attempted, and that one was successful without a crash. |
A joiner node performed an rsync SST from another node in the cluster, and when the SST finished, the node saw the following crash:
{noformat} 2017-11-03 16:44:46 140515631684352 [Note] WSREP: State transfer required: Group state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977 Local state: 00000000-0000-0000-0000-000000000000:-1 2017-11-03 16:44:46 140515631684352 [Note] WSREP: New cluster view: global state: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977, view# 102: Primary, number of nodes: 4, my index: 3, protocol version 3 2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Gap in state sequence. Need state transfer. 2017-11-03 16:44:46 140504717063936 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.119.17.237' --datadir '/app/mysql/galera/' --defaults-file '/app/mysql/config/galera.cnf' --parent '8214' --binlog '/app/mysql/galera/db-node1-binlog' ' 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Prepared SST request: rsync|10.119.17.237:4444/rsync_sst 2017-11-03 16:44:46 140515631684352 [Note] WSREP: REPL Protocols: 7 (3, 2) 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Assign initial position for certification: 230930977, protocol version: 3 2017-11-03 16:44:46 140504792139520 [Note] WSREP: Service thread queue flushed. 2017-11-03 16:44:46 140515631684352 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (b1c4fe17-b433-11e7-bd44-52c2748b6170): 1 (Operation not permitted) at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable. 2017-11-03 16:44:46 140504733837056 [Note] WSREP: Member 3.0 (10-119-17-237.domain.com) requested state transfer from '*any*'. Selected 0.0 (10-119-17-69.domain.com)(SYNCED) as donor. 2017-11-03 16:44:46 140504733837056 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 230930977) 2017-11-03 16:44:46 140515631684352 [Note] WSREP: Requesting state transfer: success, donor: 0 2017-11-03 16:44:46 140515631684352 [Note] WSREP: GCache history reset: old(00000000-0000-0000-0000-000000000000:0) -> new(b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977) 2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') connection to peer d3790118 with addr tcp://10.119.17.237:4567 timed out, no messages seen in PT3S 2017-11-03 16:44:49 140504742229760 [Note] WSREP: (d3790118, 'tcp://0.0.0.0:4567') turning message relay requesting off 2017-11-03 17:06:34 140504733837056 [Note] WSREP: 0.0 (10-119-17-69.domain.com): State transfer to 3.0 (10-119-17-237.domain.com) complete. WSREP_SST: [INFO] Extracting binlog files: (20171103 17:06:34.145) db-g3-bin.000042 ls: cannot access db-node1-binlog.*: No such file or directory WSREP_SST: [INFO] Joiner cleanup. rsync PID: 8293 (20171103 17:06:34.156) WSREP_SST: [INFO] Joiner cleanup done. (20171103 17:06:34.662) 2017-11-03 17:06:34 140515632003328 [Note] WSREP: SST complete, seqno: 230930977 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using mutexes to ref count buffer pool pages 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: The InnoDB memory heap is disabled 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Compressed tables use zlib 1.2.7 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using Linux native AIO 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Using SSE crc32 instructions 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Initializing buffer pool, size = 512.0M 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Completed initialization of buffer pool 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Highest supported file format is Barracuda. 2017-11-03 17:06:34 140515632003328 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1244892544453 2017-11-03 17:06:36 140504733837056 [Note] WSREP: Member 0.0 (10-119-17-69.domain.com) synced with group. 2017-11-03 17:06:42 140515632003328 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer... 2017-11-03 17:06:49 140515632003328 [Note] InnoDB: To recover: 2702 pages from log 2017-11-03 17:06:53 140515632003328 [Note] InnoDB: Starting final batch to recover 2585 pages from redo log InnoDB: Last MySQL binlog file position 0 786976686, file name /app/mysql/galera/db-g3-bin.000041 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: 128 rollback segment(s) are active. 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Waiting for purge to start 2017-11-03 17:07:15 140515632003328 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.2 started; log sequence number 1244894457235 2017-11-03 17:07:23 140515632003328 [Note] Plugin 'FEEDBACK' is disabled. 2017-11-03 17:07:23 140503555213056 [Note] InnoDB: Dumping buffer pool(s) not yet started 2017-11-03 17:07:23 140515632003328 [Note] Server socket created on IP: '10.119.17.237'. Status information: Current dir: /app/mysql/galera/ Running threads: 2 Stack size: 295936 Current locks: lock: 0x7fc98da48788: lock: 0x7fc9cdc6a508: lock: 0x7fcc598b5520: lock: 0x7fcc598b5440: lock: 0x7fcc598b4480: lock: 0x7fcc598b43a0: lock: 0x7fcc598b42c0: lock: 0x7fcc598b4640: lock: 0x7fcc598b2880: lock: 0x7fcc598b5360: lock: 0x7fcc598b2f80: lock: 0x7fcc598b3220: lock: 0x7fcc598b2ea0: lock: 0x7fcc598b3140: lock: 0x7fcc598b2dc0: lock: 0x7fcc598b3060: lock: 0x7fcc598b35a0: lock: 0x7fcc598b3680: lock: 0x7fcc598b3760: lock: 0x7fcc598b2ce0: lock: 0x7fcc598b2a40: lock: 0x7fcc598b2c00: lock: 0x7fcc598b2960: lock: 0x7fcc598b2b20: lock: 0x7fcc598b3300: lock: 0x7fcc598b33e0: lock: 0x7fcc598b34c0: lock: 0x7fcc598b4fe0: lock: 0x7fcc598b5280: lock: 0x7fcc598b51a0: lock: 0x7fcc598b50c0: lock: 0x7fcc598b4c60: lock: 0x7fcc598b4b80: lock: 0x7fcc598b4aa0: lock: 0x7fcc598b49c0: lock: 0x7fcc598b48e0: lock: 0x7fcc598b4e20: lock: 0x7fcc598b4800: lock: 0x7fcc598b4720: lock: 0x7fcc598b4f00: lock: 0x7fcc598b4560: lock: 0x7fcc598b4100: lock: 0x7fcc598b41e0: lock: 0x7fcc598b4020: lock: 0x7fcc598b3f40: lock: 0x7fcc598b3bc0: lock: 0x7fcc598b3e60: lock: 0x7fcc598b3d80: lock: 0x7fcc598b3ae0: lock: 0x7fcc598b3ca0: lock: 0x7fcc598b3840: lock: 0x7fcc598b3920: lock: 0x7fcc598b3a00: lock: 0x7fcc598b4d40: Key caches: default Buffer_size: 134217728 Block_size: 1024 Division_limit: 100 Age_threshold: 300 Partitions: 0 blocks used: 0 not flushed: 0 w_requests: 0 writes: 0 r_requests: 0 reads: 0 handler status: read_key: 0 read_next: 0 read_rnd 0 read_first: 0 write: 0 delete 0 update: 0 Table status: Opened tables: 2 Open tables: 2 Open files: 8 Open streams: 0 Alarm status: Active alarms: 0 Max used alarms: 0 Next alarm time: 0 Memory status: Non-mmapped space allocated from system: 0 Number of free chunks: 1 Number of fastbin blocks: 0 Number of mmapped regions: 0 Space in mmapped regions: 0 Maximum total allocated space: 0 Space available in freed fastbin blocks: 0 Total allocated space: 0 Total free space: 0 Top-most, releasable space: 0 Estimated memory (with thread stack): 591872 Events status: LLA = Last Locked At LUA = Last Unlocked At WOC = Waiting On Condition DL = Data Locked The Event Scheduler is disabled 2017-11-03 17:07:23 140515632003328 [Note] WSREP: Signalling provider to continue. 2017-11-03 17:07:23 140515632003328 [Note] WSREP: SST received: b1c4fe17-b433-11e7-bd44-52c2748b6170:230930977 2017-11-03 17:07:23 140515630676736 [Note] Event Scheduler: scheduler thread started with id 4 171103 17:07:23 [ERROR] mysqld got signal 11 ; This could be because you hit a bug. It is also possible that this binary or one of the libraries it was linked against is corrupt, improperly built, or misconfigured. This error can also be caused by malfunctioning hardware. To report this bug, see https://mariadb.com/kb/en/reporting-bugs We will try our best to scrape up some info that will hopefully help diagnose the problem, but since we have already crashed, something is definitely wrong and this may fail. Server version: 10.1.28-MariaDB key_buffer_size=134217728 read_buffer_size=131072 max_used_connections=0 max_threads=1002 thread_count=10 It is possible that mysqld could use up to key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2332069 K bytes of memory Hope that's ok; if not, decrease some variables in the equation. Thread pointer: 0x0 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 0x0 thread_stack 0x48400 mysqld(my_print_stacktrace+0x2e)[0x7fcc5902d92e] 2017-11-03 17:07:23 140515632003328 [Note] Reading of all Master_info entries succeded 2017-11-03 17:07:23 140515632003328 [Note] Added new Master_info '' to hash table 2017-11-03 17:07:23 140515632003328 [Note] mysqld: ready for connections. Version: '10.1.28-MariaDB' socket: '/app/mysql/temporaries/galera/mysqld.sock' port: 3306 MariaDB Server 2017-11-03 17:07:23 140504733837056 [Note] WSREP: 3.0 (10-119-17-237.domain.com): State transfer from 0.0 (10-119-17-69.domain.com) complete. 2017-11-03 17:07:23 140504733837056 [Note] WSREP: Shifting JOINER -> JOINED (TO: 230971731) mysqld(handle_fatal_signal+0x305)[0x7fcc58b50ce5] /lib64/libpthread.so.0(+0xf370)[0x7fcc58161370] addr2line: 'mysqld': No such file mysqld(my_hash_element+0x13)[0x7fcc5900c023] mysqld(_ZN17Master_info_index20flush_all_relay_logsEv+0x93)[0x7fcc58a98c83] mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x5d2)[0x7fcc58ab8672] mysqld(signal_hand+0x382)[0x7fcc5892c812] /lib64/libpthread.so.0(+0x7dc5)[0x7fcc58159dc5] /lib64/libc.so.6(clone+0x6d)[0x7fcc5657976d] The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains information that should help you find out what is causing the crash. {noformat} I'm not sure if it could provide any hint as to the cause of the crash, but the donor's error log contained a lot of messages about long semaphore waits involving log_sys->checkpoint_lock during the time period in which it acted as donor: {noformat} InnoDB: ###### Diagnostic info printed to the standard error stream WSREP: avoiding InnoDB self crash due to long semaphore wait of > 600 seconds Server is processing SST donor operation, fatal_cnt now: 0InnoDB: Warning: a long semaphore wait: --Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore: S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock' a writer (thread id 140439973771008) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file log0log.cc line 2352 Last time write locked in file log0log.cc line 2109 Holder thread 0 file not yet reserved line 0 InnoDB: Warning: semaphore wait: --Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore: S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock' a writer (thread id 140439973771008) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file log0log.cc line 2352 Last time write locked in file log0log.cc line 2109 Holder thread 0 file not yet reserved line 0 InnoDB: Warning: Writer thread is waiting this semaphore: --Thread 140439973771008 has waited at log0log.cc line 2352 for 1265.0 seconds the semaphore: S-lock on RW-latch at 0x7fbd82c5d3b8 '&log_sys->checkpoint_lock' a writer (thread id 140439973771008) has reserved it in mode exclusive number of readers 0, waiters flag 1, lock_word: 0 Last time read locked in file log0log.cc line 2352 Last time write locked in file log0log.cc line 2109 Holder thread 0 file not yet reserved line 0 {noformat} In case it is relevant, the donor node was a replication master, but not a slave. After the first SST led to a crash, another SST was attempted, and that one was successful without a crash. |
Fix Version/s | 10.1 [ 16100 ] |
Summary | Galera joiner node crashes in my_hash_element after rsync SST completes | SIGHUP after rsync SST causes crashe in my_hash_element |
Summary | SIGHUP after rsync SST causes crashe in my_hash_element | SIGHUP after rsync SST causes crash in my_hash_element |
Assignee | Andrii Nikitin [ anikitin ] |
Priority | Major [ 3 ] | Minor [ 4 ] |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Link |
This issue relates to |
Assignee | Andrei Elkin [ elkin ] |
Workflow | MariaDB v3 [ 83622 ] | MariaDB v4 [ 143985 ] |
Fix Version/s | N/A [ 14700 ] | |
Fix Version/s | 10.1 [ 16100 ] | |
Resolution | Won't Fix [ 2 ] | |
Status | Confirmed [ 10101 ] | Closed [ 6 ] |
Zendesk Related Tickets | 171572 |
Geoff, I am not sure if checkoint on master is related, but the crash happened during something like kill -HUP $(pidof mysqld) (which follows something like mysqladmin debug) on problem node.
So, the most we can do is to try find out if user was using some custom commands/scripts during/after sst and try to write test case which replicates that behavior.