Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.4.5
-
None
-
ec2, ubuntu
Description
Hi,
I have 3 node MariDB Galera 4 cluster for testing.
I was running sysbench trough ProxySQL.
All 3 nodes were in the same hostgroup to loadbalance the queries on all the node. When a deadlock appears 2 nodes were crashing out of tree and the cluster goes into a non-Primary sate.
The whole point of the test was to see what happens if we are writing all the nodes, looks like the cluster dies.
2019-06-18 6:01:01 0 [Note] WSREP: Writing down CC checksum: 09e5753c 59fa600d 2e63f8a0 053de98b at offset 176 |
2019-06-18 6:01:01 0 [Note] WSREP: Flow-control interval: [23, 23] |
2019-06-18 6:01:01 0 [Note] WSREP: Trying to continue unpaused monitor |
2019-06-18 6:01:01 12 [Note] WSREP: ####### processing CC 187746, local, ordered |
2019-06-18 6:01:01 12 [Note] WSREP: ####### drain monitors upto 187745 |
2019-06-18 6:01:01 12 [Note] WSREP: REPL Protocols: 10 (5, 3) |
2019-06-18 6:01:01 12 [Note] WSREP: ####### My UUID: 269e2eb0-918e-11e9-8882-cb66bb34dda1 |
2019-06-18 6:01:01 12 [Note] WSREP: ####### ST not required |
2019-06-18 6:01:01 12 [Note] WSREP: Skipping cert index reset |
2019-06-18 6:01:01 12 [Note] WSREP: ####### Adjusting cert position: 187745 -> 187746 |
2019-06-18 6:01:01 0 [Note] WSREP: Service thread queue flushed. |
2019-06-18 6:01:01 12 [Note] WSREP: ####### Setting monitor position to 187746 |
2019-06-18 6:01:01 12 [Note] WSREP: Lowest cert indnex boundary for CC from group: 187726 |
2019-06-18 6:01:01 12 [Note] WSREP: Min available from gcache for CC from group: 187593 |
2019-06-18 6:01:01 12 [Note] WSREP: ================================================ |
View:
|
id: 7fdaf3de-9147-11e9-84de-22f8161d0a7c:187746 |
status: primary
|
protocol_version: 4 |
capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
|
final: no |
own_index: 0 |
members(2): |
0: 269e2eb0-918e-11e9-8882-cb66bb34dda1, Node2 |
1: 96783e50-9154-11e9-ab2b-560064419d5a, Node3 |
=================================================
|
2019-06-18 6:01:01 12 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. |
terminate called after throwing an instance of 'wsrep::runtime_error' |
what(): client_state: Unallowed state transition: 2 -> 0 |
190618 6:01:01 [ERROR] mysqld got signal 6 ; |
This could be because you hit a bug. It is also possible that this binary |
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
 |
To report this bug, see https://mariadb.com/kb/en/reporting-bugs |
 |
We will try our best to scrape up some info that will hopefully help |
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail. |
 |
Server version: 10.4.5-MariaDB-1:10.4.5+maria~bionic-log |
key_buffer_size=134217728 |
read_buffer_size=2097152 |
max_used_connections=7 |
max_threads=102 |
thread_count=20 |
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 760240 K bytes of memory |
Hope that's ok; if not, decrease some variables in the equation. |
 |
Thread pointer: 0x7f804c01d018 |
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went |
terribly wrong...
|
stack_bottom = 0x7f8378214dd8 thread_stack 0x49000 |
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x557c294232ce] |
/usr/sbin/mysqld(handle_fatal_signal+0x515)[0x557c28e9e275] |
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f838f283890] |
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f838e593e97] |
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f838e595801] |
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8c957)[0x7f838ed70957] |
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92ab6)[0x7f838ed76ab6] |
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92af1)[0x7f838ed76af1] |
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92d24)[0x7f838ed76d24] |
/usr/sbin/mysqld(_ZN5wsrep12client_state5stateERNS_11unique_lockINS_5mutexEEENS0_5stateE+0x1f6)[0x557c29490046] |
/usr/sbin/mysqld(_ZN5wsrep12client_state7cleanupEv+0x60)[0x557c294904e0] |
/usr/sbin/mysqld(_ZN3THD7cleanupEv+0x204)[0x557c28c467d4] |
/usr/sbin/mysqld(_ZN3THD11change_userEv+0x17d)[0x557c28c472cd] |
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1260)[0x557c28c9b080] |
/usr/sbin/mysqld(_Z10do_commandP3THD+0x148)[0x557c28c9c8f8] |
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x2c2)[0x557c28d764c2] |
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x557c28d7658d] |
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f838f2786db] |
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f838e67688f] |
 |
Trying to get some variables.
|
Some pointers may be invalid and cause the dump to abort.
|
Query (0x0): |
Connection ID (thread ID): 69 |
Status: KILL_CONNECTION
|
 |
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on |
 |
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains |
information that should help you find out what is causing the crash.
|
Writing a core file...
|
Working directory at /var/lib/mysql
|
Resource Limits:
|
Limit Soft Limit Hard Limit Units
|
Max cpu time unlimited unlimited seconds
|
Max file size unlimited unlimited bytes
|
Max data size unlimited unlimited bytes
|
Max stack size 8388608 unlimited bytes |
Max core file size 0 unlimited bytes |
Max resident set unlimited unlimited bytes
|
Max processes 63501 63501 processes |
Max open files 16364 16364 files |
Max locked memory 16777216 16777216 bytes |
Max address space unlimited unlimited bytes
|
Max file locks unlimited unlimited locks
|
Max pending signals 63501 63501 signals |
Max msgqueue size 819200 819200 bytes |
Max nice priority 0 0 |
Max realtime priority 0 0 |
Max realtime timeout unlimited unlimited us
|
Core pattern: |/usr/share/apport/apport %p %s %c %d %P
|
 |
2019-06-18 6:01:10 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 7fdaf3de-9147-11e9-84de-22f8161d0a7c:187784 |
wsrep loader: [INFO] wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so' |
wsrep loader: [INFO] wsrep_load(): Galera 26.4.2(r4498) by Codership Oy <info@codership.com> loaded successfully. |
2019-06-18 6:01:10 0 [Note] WSREP: CRC-32C: using hardware acceleration. |
2019-06-18 6:01:10 0 [Note] WSREP: Found saved state: 7fdaf3de-9147-11e9-84de-22f8161d0a7c:-1, safe_to_bootstrap: 0 |
2019-06-18 6:01:10 0 [Note] WSREP: GCache DEBUG: opened preamble: |
Version: 2 |
UUID: 7fdaf3de-9147-11e9-84de-22f8161d0a7c |
Seqno: -1 - -1 |
Offset: -1 |
Synced: 0 |
2019-06-18 6:01:10 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 7fdaf3de-9147-11e9-84de-22f8161d0a7c, offset: -1 |
2019-06-18 6:01:10 0 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/134217752 bytes) complete. |
2019-06-18 6:01:10 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete. |
2019-06-18 6:01:10 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 187593-187785 |
2019-06-18 6:01:10 0 [Note] WSREP: GCache::RingBuffer unused buffers scan... 0.0% ( 0/408960 bytes) complete. |
2019-06-18 6:01:10 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (408960/408960 bytes) complete. |
2019-06-18 6:01:10 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 0/193 locked buffers |
2019-06-18 6:01:10 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): used space: 408960/134217728 |
2019-06-18 6:01:10 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.30.2.150; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum = |
2019-06-18 6:01:10 0 [Note] WSREP: Start replication |
2019-06-18 6:01:10 0 [Note] WSREP: Connecting with bootstrap option: 0 |
2019-06-18 6:01:10 0 [Note] WSREP: Setting GCS initial position to 7fdaf3de-9147-11e9-84de-22f8161d0a7c:187784 |
2019-06-18 6:01:10 0 [Note] WSREP: protonet asio version 0 |
2019-06-18 6:01:10 0 [Note] WSREP: Using CRC-32C for message checksums. |
2019-06-18 6:01:10 0 [Note] WSREP: backend: asio |
2019-06-18 6:01:10 0 [Note] WSREP: gcomm thread scheduling priority set to other:0 |
2019-06-18 6:01:10 0 [Note] WSREP: restore pc from disk successfully |
2019-06-18 6:01:10 0 [Note] WSREP: GMCast version 0 |
2019-06-18 6:01:10 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567 |
2019-06-18 6:01:10 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') multicast: , ttl: 1 |
2019-06-18 6:01:10 0 [Note] WSREP: EVS version 1 |
2019-06-18 6:01:10 0 [Note] WSREP: gcomm: connecting to group 'galera_cluster', peer '172.30.2.167:,172.30.2.150:,172.30.2.68:' |
2019-06-18 6:01:10 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') connection established to 269e2eb0 tcp://172.30.2.150:4567 |
2019-06-18 6:01:10 0 [Warning] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') address 'tcp://172.30.2.150:4567' points to own listening address, blacklisting |
2019-06-18 6:01:10 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') connection established to 646e290e tcp://172.30.2.167:4567 |
2019-06-18 6:01:10 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: |
2019-06-18 6:01:10 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') connection established to 96783e50 tcp://172.30.2.68:4567 |
2019-06-18 6:01:11 0 [Note] WSREP: EVS version upgrade 0 -> 1 |
2019-06-18 6:01:11 0 [Note] WSREP: declaring 646e290e at tcp://172.30.2.167:4567 stable |
2019-06-18 6:01:11 0 [Note] WSREP: declaring 96783e50 at tcp://172.30.2.68:4567 stable |
2019-06-18 6:01:11 0 [Note] WSREP: PC protocol upgrade 0 -> 1 |
2019-06-18 6:01:11 0 [Note] WSREP: view(view_id(NON_PRIM,269e2eb0,34) memb { |
269e2eb0,0 |
646e290e,0 |
96783e50,0 |
} joined {
|
} left {
|
} partitioned {
|
269e2eb0,0 |
})
|
2019-06-18 6:01:11 0 [Warning] WSREP: node uuid: 269e2eb0 last_prim(type: 3, uuid: 269e2eb0) is inconsistent to restored view(type: V_NON_PRIM, uuid: 269e2eb0 |
2019-06-18 6:01:11 0 [Note] WSREP: gcomm: connected |
2019-06-18 6:01:11 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636 |
2019-06-18 6:01:11 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0) |
2019-06-18 6:01:11 0 [Note] WSREP: Opened channel 'galera_cluster' |
2019-06-18 6:01:11 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 3 |
2019-06-18 6:01:11 0 [Note] WSREP: Writing down CC checksum: 90b061aa c20b88bc 88b0f096 8dd8ac7b at offset 256 |
2019-06-18 6:01:11 0 [Note] WSREP: Flow-control interval: [28, 28] |
2019-06-18 6:01:11 0 [Note] WSREP: Trying to continue unpaused monitor |
2019-06-18 6:01:11 0 [Note] WSREP: Received NON-PRIMARY. |
2019-06-18 6:01:11 1 [Note] WSREP: Starting rollbacker thread 1 |
2019-06-18 6:01:11 2 [Note] WSREP: Starting post rollbacker thread 2 |
2019-06-18 6:01:11 3 [Note] WSREP: Starting applier thread 3 |
2019-06-18 6:01:11 3 [Note] WSREP: ####### processing CC -1, local, unordered |
2019-06-18 6:01:11 3 [Note] WSREP: ####### drain monitors upto -1 |
2019-06-18 6:01:11 3 [Note] WSREP: ####### My UUID: 269e2eb0-918e-11e9-8883-cb66bb34dda1 |
2019-06-18 6:01:11 3 [Note] WSREP: Server Node2 connected to cluster at position 7fdaf3de-9147-11e9-84de-22f8161d0a7c:-1 with ID 269e2eb0-918e-11e9-8883-cb66bb34dda1 |
2019-06-18 6:01:11 3 [Note] WSREP: Server status change disconnected -> connected |
2019-06-18 6:01:11 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. |
2019-06-18 6:01:11 3 [Note] WSREP: ####### ST not required |
2019-06-18 6:01:11 3 [Note] WSREP: ================================================ |
View:
|
id: 7fdaf3de-9147-11e9-84de-22f8161d0a7c:-1 |
status: non-primary
|
protocol_version: -1 |
capabilities:
|
final: no |
own_index: 0 |
members(3): |
0: 269e2eb0-918e-11e9-8883-cb66bb34dda1, Node2 |
1: 646e290e-915a-11e9-a9fa-ca11152ebb52, unspecified |
2: 96783e50-9154-11e9-ab2b-560064419d5a, unspecified |
=================================================
|
2019-06-18 6:01:11 3 [Note] WSREP: Non-primary view |
2019-06-18 6:01:11 3 [Note] WSREP: Server status change connected -> connected |
2019-06-18 6:01:11 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. |
2019-06-18 6:01:11 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification. |
2019-06-18 6:01:13 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') connection to peer 269e2eb0 with addr tcp://172.30.2.150:4567 timed out, no messages seen in PT3S |
2019-06-18 6:01:13 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') turning message relay requesting off |
Sysbench command:
sysbench \
|
/usr/share/sysbench/oltp_write_only.lua \
|
--db-driver=mysql \
|
--db-ps-mode=disable \
|
--skip_trx=off \
|
--mysql-user=sysbuser \
|
--mysql-password=xxxxx \
|
--mysql-db=sysbench \
|
--mysql-ignore-errors \
|
--mysql-port=6033 \ |
--mysql-host=127.0.0.1 \ |
--tables=1 \ |
--table_size=100 \ |
--report-interval=1 \ |
--threads=3 \ |
--time=60 \ |
--forced-shutdown=1 \ |
--events=0 \ |
--debug \
|
--db-debug \
|
run
|