Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-19791

MariaDB Galera4 Crashes

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.4.5
    • Fix Version/s: 10.3.17, 10.4.7
    • Component/s: Galera
    • Labels:
      None
    • Environment:
      ec2, ubuntu

      Description

      Hi,

      I have 3 node MariDB Galera 4 cluster for testing.
      I was running sysbench trough ProxySQL.
      All 3 nodes were in the same hostgroup to loadbalance the queries on all the node. When a deadlock appears 2 nodes were crashing out of tree and the cluster goes into a non-Primary sate.

      The whole point of the test was to see what happens if we are writing all the nodes, looks like the cluster dies.

      2019-06-18  6:01:01 0 [Note] WSREP: Writing down CC checksum: 09e5753c 59fa600d 2e63f8a0 053de98b at offset 176
      2019-06-18  6:01:01 0 [Note] WSREP: Flow-control interval: [23, 23]
      2019-06-18  6:01:01 0 [Note] WSREP: Trying to continue unpaused monitor
      2019-06-18  6:01:01 12 [Note] WSREP: ####### processing CC 187746, local, ordered
      2019-06-18  6:01:01 12 [Note] WSREP: ####### drain monitors upto 187745
      2019-06-18  6:01:01 12 [Note] WSREP: REPL Protocols: 10 (5, 3)
      2019-06-18  6:01:01 12 [Note] WSREP: ####### My UUID: 269e2eb0-918e-11e9-8882-cb66bb34dda1
      2019-06-18  6:01:01 12 [Note] WSREP: ####### ST not required
      2019-06-18  6:01:01 12 [Note] WSREP: Skipping cert index reset
      2019-06-18  6:01:01 12 [Note] WSREP: ####### Adjusting cert position: 187745 -> 187746
      2019-06-18  6:01:01 0 [Note] WSREP: Service thread queue flushed.
      2019-06-18  6:01:01 12 [Note] WSREP: ####### Setting monitor position to 187746
      2019-06-18  6:01:01 12 [Note] WSREP: Lowest cert indnex boundary for CC from group: 187726
      2019-06-18  6:01:01 12 [Note] WSREP: Min available from gcache for CC from group: 187593
      2019-06-18  6:01:01 12 [Note] WSREP: ================================================
      View:
        id: 7fdaf3de-9147-11e9-84de-22f8161d0a7c:187746
        status: primary
        protocol_version: 4
        capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
        final: no
        own_index: 0
        members(2):
      	0: 269e2eb0-918e-11e9-8882-cb66bb34dda1, Node2
      	1: 96783e50-9154-11e9-ab2b-560064419d5a, Node3
      =================================================
      2019-06-18  6:01:01 12 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      terminate called after throwing an instance of 'wsrep::runtime_error'
        what():  client_state: Unallowed state transition: 2 -> 0
      190618  6:01:01 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.4.5-MariaDB-1:10.4.5+maria~bionic-log
      key_buffer_size=134217728
      read_buffer_size=2097152
      max_used_connections=7
      max_threads=102
      thread_count=20
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 760240 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7f804c01d018
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f8378214dd8 thread_stack 0x49000
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x557c294232ce]
      /usr/sbin/mysqld(handle_fatal_signal+0x515)[0x557c28e9e275]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f838f283890]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f838e593e97]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f838e595801]
      /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x8c957)[0x7f838ed70957]
      /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92ab6)[0x7f838ed76ab6]
      /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92af1)[0x7f838ed76af1]
      /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0x92d24)[0x7f838ed76d24]
      /usr/sbin/mysqld(_ZN5wsrep12client_state5stateERNS_11unique_lockINS_5mutexEEENS0_5stateE+0x1f6)[0x557c29490046]
      /usr/sbin/mysqld(_ZN5wsrep12client_state7cleanupEv+0x60)[0x557c294904e0]
      /usr/sbin/mysqld(_ZN3THD7cleanupEv+0x204)[0x557c28c467d4]
      /usr/sbin/mysqld(_ZN3THD11change_userEv+0x17d)[0x557c28c472cd]
      /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1260)[0x557c28c9b080]
      /usr/sbin/mysqld(_Z10do_commandP3THD+0x148)[0x557c28c9c8f8]
      /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x2c2)[0x557c28d764c2]
      /usr/sbin/mysqld(handle_one_connection+0x3d)[0x557c28d7658d]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f838f2786db]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f838e67688f]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0):
      Connection ID (thread ID): 69
      Status: KILL_CONNECTION
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on
       
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /var/lib/mysql
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units
      Max cpu time              unlimited            unlimited            seconds
      Max file size             unlimited            unlimited            bytes
      Max data size             unlimited            unlimited            bytes
      Max stack size            8388608              unlimited            bytes
      Max core file size        0                    unlimited            bytes
      Max resident set          unlimited            unlimited            bytes
      Max processes             63501                63501                processes
      Max open files            16364                16364                files
      Max locked memory         16777216             16777216             bytes
      Max address space         unlimited            unlimited            bytes
      Max file locks            unlimited            unlimited            locks
      Max pending signals       63501                63501                signals
      Max msgqueue size         819200               819200               bytes
      Max nice priority         0                    0
      Max realtime priority     0                    0
      Max realtime timeout      unlimited            unlimited            us
      Core pattern: |/usr/share/apport/apport %p %s %c %d %P
       
      2019-06-18  6:01:10 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 7fdaf3de-9147-11e9-84de-22f8161d0a7c:187784
      wsrep loader: [INFO] wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
      wsrep loader: [INFO] wsrep_load(): Galera 26.4.2(r4498) by Codership Oy <info@codership.com> loaded successfully.
      2019-06-18  6:01:10 0 [Note] WSREP: CRC-32C: using hardware acceleration.
      2019-06-18  6:01:10 0 [Note] WSREP: Found saved state: 7fdaf3de-9147-11e9-84de-22f8161d0a7c:-1, safe_to_bootstrap: 0
      2019-06-18  6:01:10 0 [Note] WSREP: GCache DEBUG: opened preamble:
      Version: 2
      UUID: 7fdaf3de-9147-11e9-84de-22f8161d0a7c
      Seqno: -1 - -1
      Offset: -1
      Synced: 0
      2019-06-18  6:01:10 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 7fdaf3de-9147-11e9-84de-22f8161d0a7c, offset: -1
      2019-06-18  6:01:10 0 [Note] WSREP: GCache::RingBuffer initial scan...  0.0% (        0/134217752 bytes) complete.
      2019-06-18  6:01:10 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
      2019-06-18  6:01:10 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 187593-187785
      2019-06-18  6:01:10 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...  0.0% (     0/408960 bytes) complete.
      2019-06-18  6:01:10 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (408960/408960 bytes) complete.
      2019-06-18  6:01:10 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 0/193 locked buffers
      2019-06-18  6:01:10 0 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): used space: 408960/134217728
      2019-06-18  6:01:10 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.30.2.150; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.version = 0; pc.announce_timeout = PT3S; pc.checksum =
      2019-06-18  6:01:10 0 [Note] WSREP: Start replication
      2019-06-18  6:01:10 0 [Note] WSREP: Connecting with bootstrap option: 0
      2019-06-18  6:01:10 0 [Note] WSREP: Setting GCS initial position to 7fdaf3de-9147-11e9-84de-22f8161d0a7c:187784
      2019-06-18  6:01:10 0 [Note] WSREP: protonet asio version 0
      2019-06-18  6:01:10 0 [Note] WSREP: Using CRC-32C for message checksums.
      2019-06-18  6:01:10 0 [Note] WSREP: backend: asio
      2019-06-18  6:01:10 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
      2019-06-18  6:01:10 0 [Note] WSREP: restore pc from disk successfully
      2019-06-18  6:01:10 0 [Note] WSREP: GMCast version 0
      2019-06-18  6:01:10 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      2019-06-18  6:01:10 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      2019-06-18  6:01:10 0 [Note] WSREP: EVS version 1
      2019-06-18  6:01:10 0 [Note] WSREP: gcomm: connecting to group 'galera_cluster', peer '172.30.2.167:,172.30.2.150:,172.30.2.68:'
      2019-06-18  6:01:10 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') connection established to 269e2eb0 tcp://172.30.2.150:4567
      2019-06-18  6:01:10 0 [Warning] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') address 'tcp://172.30.2.150:4567' points to own listening address, blacklisting
      2019-06-18  6:01:10 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') connection established to 646e290e tcp://172.30.2.167:4567
      2019-06-18  6:01:10 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
      2019-06-18  6:01:10 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') connection established to 96783e50 tcp://172.30.2.68:4567
      2019-06-18  6:01:11 0 [Note] WSREP: EVS version upgrade 0 -> 1
      2019-06-18  6:01:11 0 [Note] WSREP: declaring 646e290e at tcp://172.30.2.167:4567 stable
      2019-06-18  6:01:11 0 [Note] WSREP: declaring 96783e50 at tcp://172.30.2.68:4567 stable
      2019-06-18  6:01:11 0 [Note] WSREP: PC protocol upgrade 0 -> 1
      2019-06-18  6:01:11 0 [Note] WSREP: view(view_id(NON_PRIM,269e2eb0,34) memb {
      	269e2eb0,0
      	646e290e,0
      	96783e50,0
      } joined {
      } left {
      } partitioned {
      	269e2eb0,0
      })
      2019-06-18  6:01:11 0 [Warning] WSREP: node uuid: 269e2eb0 last_prim(type: 3, uuid: 269e2eb0) is inconsistent to restored view(type: V_NON_PRIM, uuid: 269e2eb0
      2019-06-18  6:01:11 0 [Note] WSREP: gcomm: connected
      2019-06-18  6:01:11 0 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
      2019-06-18  6:01:11 0 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
      2019-06-18  6:01:11 0 [Note] WSREP: Opened channel 'galera_cluster'
      2019-06-18  6:01:11 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 3
      2019-06-18  6:01:11 0 [Note] WSREP: Writing down CC checksum: 90b061aa c20b88bc 88b0f096 8dd8ac7b at offset 256
      2019-06-18  6:01:11 0 [Note] WSREP: Flow-control interval: [28, 28]
      2019-06-18  6:01:11 0 [Note] WSREP: Trying to continue unpaused monitor
      2019-06-18  6:01:11 0 [Note] WSREP: Received NON-PRIMARY.
      2019-06-18  6:01:11 1 [Note] WSREP: Starting rollbacker thread 1
      2019-06-18  6:01:11 2 [Note] WSREP: Starting post rollbacker thread 2
      2019-06-18  6:01:11 3 [Note] WSREP: Starting applier thread 3
      2019-06-18  6:01:11 3 [Note] WSREP: ####### processing CC -1, local, unordered
      2019-06-18  6:01:11 3 [Note] WSREP: ####### drain monitors upto -1
      2019-06-18  6:01:11 3 [Note] WSREP: ####### My UUID: 269e2eb0-918e-11e9-8883-cb66bb34dda1
      2019-06-18  6:01:11 3 [Note] WSREP: Server Node2 connected to cluster at position 7fdaf3de-9147-11e9-84de-22f8161d0a7c:-1 with ID 269e2eb0-918e-11e9-8883-cb66bb34dda1
      2019-06-18  6:01:11 3 [Note] WSREP: Server status change disconnected -> connected
      2019-06-18  6:01:11 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-06-18  6:01:11 3 [Note] WSREP: ####### ST not required
      2019-06-18  6:01:11 3 [Note] WSREP: ================================================
      View:
        id: 7fdaf3de-9147-11e9-84de-22f8161d0a7c:-1
        status: non-primary
        protocol_version: -1
        capabilities:
        final: no
        own_index: 0
        members(3):
      	0: 269e2eb0-918e-11e9-8883-cb66bb34dda1, Node2
      	1: 646e290e-915a-11e9-a9fa-ca11152ebb52, unspecified
      	2: 96783e50-9154-11e9-ab2b-560064419d5a, unspecified
      =================================================
      2019-06-18  6:01:11 3 [Note] WSREP: Non-primary view
      2019-06-18  6:01:11 3 [Note] WSREP: Server status change connected -> connected
      2019-06-18  6:01:11 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-06-18  6:01:11 3 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2019-06-18  6:01:13 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') connection to peer 269e2eb0 with addr tcp://172.30.2.150:4567 timed out, no messages seen in PT3S
      2019-06-18  6:01:13 0 [Note] WSREP: (269e2eb0, 'tcp://0.0.0.0:4567') turning message relay requesting off
      

      Sysbench command:

      sysbench \
                      /usr/share/sysbench/oltp_write_only.lua \
                      --db-driver=mysql \
                      --db-ps-mode=disable \
                      --skip_trx=off \
                      --mysql-user=sysbuser \
                      --mysql-password=xxxxx \
                      --mysql-db=sysbench \
                      --mysql-ignore-errors \
                      --mysql-port=6033 \
                      --mysql-host=127.0.0.1 \
                      --tables=1 \
                      --table_size=100 \
                      --report-interval=1 \
                      --threads=3 \
                      --time=60 \
                      --forced-shutdown=1 \
                      --events=0 \
                      --debug \
                      --db-debug \
                      run
      

        Attachments

          Activity

            People

            Assignee:
            stepan.patryshev Stepan Patryshev (Inactive)
            Reporter:
            Tibi Tibor Korocz
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Git Integration