Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6175

MariaDB Galera 5.5.36 Signal 6, Entire Cluster Dies

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Duplicate
    • 5.5.36-galera
    • None
    • None

    Description

      Galera died in a ball of fire this morning with all 3 Cluster nodes showing something like the follow (from the error log)

      140425  6:52:27 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000000 of size 134217728 bytes
      140425  6:53:02 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000000
      140425  7:22:34 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000001 of size 191230321 bytes
      140425  7:22:54 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000001
      140425  7:52:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000002 of size 189538622 bytes
      140425  7:52:55 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000002
      140425  8:22:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000003 of size 191615069 bytes
      140425  8:22:54 [Note] WSREP: Deleted page /var/lib/mysql/data/gcache.page.000003
      140425  8:52:33 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000004 of size 182155485 bytes
      140425  9:02:49 [Note] WSREP: Created page /var/lib/mysql/data/gcache.page.000005 of size 134217728 bytes
      140425  9:03:48 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see http://kb.askmonty.org/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 5.5.36-MariaDB-wsrep-log
      key_buffer_size=67108864
      read_buffer_size=1048576
      max_used_connections=558
      max_threads=302
      thread_count=356
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 702370 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0x7f17e9012000
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      140425  9:03:48 [Note] WSREP: declaring 62a2e8b4-c73f-11e3-bda3-daee80deac33 stable
      stack_bottom = 0x7f17d18d3940 thread_stack 0x48000
      140425  9:03:48 [Note] WSREP: (3ac797f0-c73f-11e3-a58e-37c0ddd67eee, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://172.29.33.169:4567 
      140425  9:03:48 [Note] WSREP: Node 3ac797f0-c73f-11e3-a58e-37c0ddd67eee state prim
      140425  9:03:49 [Note] WSREP: (3ac797f0-c73f-11e3-a58e-37c0ddd67eee, 'tcp://0.0.0.0:4567') reconnecting to 5efac803-cae4-11e3-bb7b-b67ea5991bad (tcp://172.29.33.169:4567), attempt 0
      140425  9:03:50 [Note] WSREP: (3ac797f0-c73f-11e3-a58e-37c0ddd67eee, 'tcp://0.0.0.0:4567') reconnecting to 62a2e8b4-c73f-11e3-bda3-daee80deac33 (tcp://172.29.33.167:4567), attempt 0
      140425  9:03:53 [Note] WSREP: evs::proto(3ac797f0-c73f-11e3-a58e-37c0ddd67eee, OPERATIONAL, view_id(REG,3ac797f0-c73f-11e3-a58e-37c0ddd67eee,12)) suspecting node: 62a2e8b4-c73f-11e3-bda3-daee80deac33
      140425  9:03:54 [Note] WSREP: evs::proto(3ac797f0-c73f-11e3-a58e-37c0ddd67eee, GATHER, view_id(REG,3ac797f0-c73f-11e3-a58e-37c0ddd67eee,12)) suspecting node: 62a2e8b4-c73f-11e3-bda3-daee80deac33
      /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xa9127b]
      /usr/sbin/mysqld(handle_fatal_signal+0x398)[0x6ea128]
      /lib64/libpthread.so.0[0x3d5c60f710]
      /lib64/libc.so.6(gsignal+0x35)[0x3d5c232925]
      /lib64/libc.so.6(abort+0x175)[0x3d5c234105]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera13Certification16purge_for_trx_v3EPNS_9TrxHandleE+0x265)[0x7f20b495c9a5]
      /usr/lib64/galera/libgalera_smm.so(_ZNK6galera13Certification15PurgeAndDiscardclERSt4pairIKlPNS_9TrxHandleEE+0xd2)[0x7f20b4965112]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera13Certification16purge_trxs_upto_Elb+0x91)[0x7f20b4960631]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM18process_commit_cutEll+0x88)[0x7f20b4987e98]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource8dispatchEPvRK10gcs_actionRb+0x315)[0x7f20b4969055]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera15GcsActionSource7processEPvRb+0x63)[0x7f20b49694e3]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10async_recvEPv+0x93)[0x7f20b4985a23]
      /usr/lib64/galera/libgalera_smm.so(galera_recv+0x23)[0x7f20b499f993]
      /usr/sbin/mysqld[0x69e83f]
      /usr/sbin/mysqld(start_wsrep_THD+0x385)[0x5270f5]
      /lib64/libpthread.so.0[0x3d5c6079d1]
      /lib64/libc.so.6(clone+0x6d)[0x3d5c2e8b6d]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): is an invalid pointer
      Connection ID (thread ID): 7
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=off
       
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      140425 09:03:56 mysqld_safe Number of processes running now: 0
      140425 09:03:56 mysqld_safe WSREP: not restarting wsrep node automatically
      140425 09:03:56 mysqld_safe mysqld from pid file /var/lib/mysql/data/redacted.pid ended
      140425 09:15:24 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/data
      140425 09:15:24 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/data/wsrep_recovery.wXdkzc' --pid-file='/var/lib/mysql/data/redacted.redacted.com-recover.pid'
      140425 09:15:34 mysqld_safe WSREP: Recovered position 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203
      140425  9:15:34 [Note] WSREP: wsrep_start_position var submitted: '0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203'
      140425  9:15:34 [Note] WSREP: Read nil XID from storage engines, skipping position init
      140425  9:15:34 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
      140425  9:15:34 [Note] WSREP: wsrep_load(): Galera 25.3.2(r170) by Codership Oy <info@codership.com> loaded successfully.
      140425  9:15:34 [Note] WSREP: CRC-32C: using hardware acceleration.
      140425  9:15:34 [Note] WSREP: Found saved state: 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:-1
      140425  9:15:34 [Note] WSREP: Passing config to GCS: base_host = 172.29.33.165; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql/data//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.proto_max = 5
      140425  9:15:34 [Note] WSREP: Assign initial position for certification: 2509203, protocol version: -1
      140425  9:15:34 [Note] WSREP: wsrep_sst_grab()
      140425  9:15:34 [Note] WSREP: Start replication
      140425  9:15:34 [Note] WSREP: Setting initial position to 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203
      140425  9:15:34 [Note] WSREP: protonet asio version 0
      140425  9:15:34 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
      140425  9:15:34 [Note] WSREP: backend: asio
      140425  9:15:34 [Note] WSREP: GMCast version 0
      140425  9:15:34 [Note] WSREP: (11119c10-cc84-11e3-95fc-669db67d5915, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      140425  9:15:34 [Note] WSREP: (11119c10-cc84-11e3-95fc-669db67d5915, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      140425  9:15:34 [Note] WSREP: EVS version 0
      140425  9:15:34 [Note] WSREP: PC version 0
      140425  9:15:34 [Note] WSREP: gcomm: connecting to group 'redacted', peer '172.29.33.165:,172.29.33.167:,172.29.33.169:'
      140425  9:15:34 [Warning] WSREP: (11119c10-cc84-11e3-95fc-669db67d5915, 'tcp://0.0.0.0:4567') address 'tcp://172.29.33.165:4567' points to own listening address, blacklisting
      140425  9:15:37 [Warning] WSREP: no nodes coming from prim view, prim not possible
      140425  9:15:37 [Note] WSREP: view(view_id(NON_PRIM,11119c10-cc84-11e3-95fc-669db67d5915,1) memb {
      	11119c10-cc84-11e3-95fc-669db67d5915,0
      } joined {
      } left {
      } partitioned {
      })
      140425  9:15:38 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50214S), skipping check
      140425  9:16:07 [Note] WSREP: view((empty))
      140425  9:16:07 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
      	 at gcomm/src/pc.cpp:connect():141
      140425  9:16:07 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():196: Failed to open backend connection: -110 (Connection timed out)
      140425  9:16:07 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'redacted' at 'gcomm://172.29.33.165,172.29.33.167,172.29.33.169': -110 (Connection timed out)
      140425  9:16:07 [ERROR] WSREP: gcs connect failed: Connection timed out
      140425  9:16:07 [ERROR] WSREP: wsrep::connect() failed: 7
      140425  9:16:07 [ERROR] Aborting
       
      140425  9:16:07 [Note] WSREP: Service disconnected.
      140425  9:16:08 [Note] WSREP: Some threads may fail to exit.
      140425  9:16:08 [Note] /usr/sbin/mysqld: Shutdown complete
       
      140425 09:16:08 mysqld_safe mysqld from pid file /var/lib/mysql/data/redacted.redacted.com.pid ended
      140425 09:16:46 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql/data
      140425 09:16:46 mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/mysql/data/wsrep_recovery.ODDV5Q' --pid-file='/var/lib/mysql/data/redacted.redacted.com-recover.pid'
      140425 09:16:52 mysqld_safe WSREP: Recovered position 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203
      140425  9:16:52 [Note] WSREP: wsrep_start_position var submitted: '0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203'
      140425  9:16:52 [Note] WSREP: Read nil XID from storage engines, skipping position init
      140425  9:16:52 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
      140425  9:16:52 [Note] WSREP: wsrep_load(): Galera 25.3.2(r170) by Codership Oy <info@codership.com> loaded successfully.
      140425  9:16:52 [Note] WSREP: CRC-32C: using hardware acceleration.
      140425  9:16:52 [Note] WSREP: Found saved state: 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:-1
      140425  9:16:52 [Note] WSREP: Passing config to GCS: base_host = 172.29.33.165; base_port = 4567; cert.log_conflicts = no; gcache.dir = /var/lib/mysql/data/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql/data//galera.cache; gcache.page_size = 128M; gcache.size = 128M; gcs.fc_debug = 0; gcs.fc_factor = 1; gcs.fc_limit = 16; gcs.fc_master_slave = NO; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = NO; repl.causal_read_timeout = PT30S; repl.commit_order = 3; repl.key_format = FLAT8; repl.proto_max = 5
      140425  9:16:52 [Note] WSREP: Assign initial position for certification: 2509203, protocol version: -1
      140425  9:16:52 [Note] WSREP: wsrep_sst_grab()
      140425  9:16:52 [Note] WSREP: Start replication
      140425  9:16:52 [Note] WSREP: Setting initial position to 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203
      140425  9:16:52 [Note] WSREP: protonet asio version 0
      140425  9:16:52 [Note] WSREP: Using CRC-32C (optimized) for message checksums.
      140425  9:16:52 [Note] WSREP: backend: asio
      140425  9:16:52 [Note] WSREP: GMCast version 0
      140425  9:16:52 [Note] WSREP: (3f5f5d45-cc84-11e3-b5b7-f26a4b642ec0, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
      140425  9:16:52 [Note] WSREP: (3f5f5d45-cc84-11e3-b5b7-f26a4b642ec0, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
      140425  9:16:52 [Note] WSREP: EVS version 0
      140425  9:16:52 [Note] WSREP: PC version 0
      140425  9:16:52 [Note] WSREP: gcomm: connecting to group 'redacted', peer ''
      140425  9:16:52 [Note] WSREP: Node 3f5f5d45-cc84-11e3-b5b7-f26a4b642ec0 state prim
      140425  9:16:52 [Note] WSREP: view(view_id(PRIM,3f5f5d45-cc84-11e3-b5b7-f26a4b642ec0,1) memb {
      	3f5f5d45-cc84-11e3-b5b7-f26a4b642ec0,0
      } joined {
      } left {
      } partitioned {
      })
      140425  9:16:52 [Note] WSREP: gcomm: connected
      140425  9:16:52 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
      140425  9:16:52 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
      140425  9:16:52 [Note] WSREP: Opened channel 'redacted'
      140425  9:16:52 [Note] WSREP: Waiting for SST to complete.
      140425  9:16:52 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
      140425  9:16:52 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 3f601dc1-cc84-11e3-ae8c-a266f08afc7d
      140425  9:16:52 [Note] WSREP: STATE EXCHANGE: sent state msg: 3f601dc1-cc84-11e3-ae8c-a266f08afc7d
      140425  9:16:52 [Note] WSREP: STATE EXCHANGE: got state msg: 3f601dc1-cc84-11e3-ae8c-a266f08afc7d from 0 (redacted)
      140425  9:16:52 [Note] WSREP: Quorum results:
      	version    = 3,
      	component  = PRIMARY,
      	conf_id    = 0,
      	members    = 1/1 (joined/total),
      	act_id     = 2509203,
      	last_appl. = -1,
      	protocols  = 0/5/2 (gcs/repl/appl),
      	group UUID = 0c4d2c70-c73f-11e3-8556-d312fc5c48bb
      140425  9:16:52 [Note] WSREP: Flow-control interval: [16, 16]
      140425  9:16:52 [Note] WSREP: Restored state OPEN -> JOINED (2509203)
      140425  9:16:52 [Note] WSREP: Member 0 (redacted) synced with group.
      140425  9:16:52 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 2509203)
      140425  9:16:52 [Note] WSREP: New cluster view: global state: 0c4d2c70-c73f-11e3-8556-d312fc5c48bb:2509203, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 2
      140425  9:16:52 [Note] WSREP: SST complete, seqno: 2509203
      140425  9:16:52 InnoDB: The InnoDB memory heap is disabled
      140425  9:16:52 InnoDB: Mutexes and rw_locks use GCC atomic builtins
      140425  9:16:52 InnoDB: Compressed tables use zlib 1.2.3
      140425  9:16:52 InnoDB: Using Linux native AIO
      140425  9:16:52 InnoDB: Initializing buffer pool, size = 32.0G
      140425  9:16:53 InnoDB: Completed initialization of buffer pool
      140425  9:16:53 InnoDB: highest supported file format is Barracuda.
      140425  9:16:55  InnoDB: Waiting for the background threads to start
      140425  9:16:56 Percona XtraDB (http://www.percona.com) 5.5.36-MariaDB-33.0 started; log sequence number 7825046431393
      140425  9:16:56 [Note] Plugin 'FEEDBACK' is disabled.
      140425  9:16:56 [Note] Server socket created on IP: '0.0.0.0'.
      140425  9:16:56 [Note] Event Scheduler: Loaded 1 event
      140425  9:16:56 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      140425  9:16:56 [Note] WSREP: REPL Protocols: 5 (3, 1)
      140425  9:16:56 [Note] WSREP: Assign initial position for certification: 2509203, protocol version: 3
      140425  9:16:56 [Note] WSREP: Service thread queue flushed.
      140425  9:16:56 [Note] WSREP: Synchronized with group, ready for connections
      140425  9:16:56 [Note] /usr/sbin/mysqld: ready for connections.
      Version: '5.5.36-MariaDB-wsrep-log'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server, wsrep_25.9.r3961
      140425  9:16:56 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

      The servers largely handle our ColdFusion ressions which can involve large TEXT fields:

      cvardb1 mysql> show create table CDATA\G
      *************************** 1. row ***************************
             Table: CDATA
      Create Table: CREATE TABLE `CDATA` (
        `cfid` varchar(64) NOT NULL DEFAULT '',
        `app` varchar(64) NOT NULL DEFAULT '',
        `data` longtext NOT NULL,
        PRIMARY KEY (`cfid`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8 KEY_BLOCK_SIZE=16
      1 row in set (0.00 sec)
       
      cvardb1 mysql> show create table CGLOBAL\G
      *************************** 1. row ***************************
             Table: CGLOBAL
      Create Table: CREATE TABLE `CGLOBAL` (
        `cfid` varchar(64) NOT NULL DEFAULT '',
        `data` longtext NOT NULL,
        `lvisit` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
        PRIMARY KEY (`cfid`),
        KEY `lvisit_idx` (`lvisit`)
      ) ENGINE=InnoDB DEFAULT CHARSET=utf8 KEY_BLOCK_SIZE=16
      1 row in set (0.00 sec)

      It also serves as a copy of some caching tables we have setup on this, and our other, Galera cluster running the same version. I suspect that it has to do with the large TEXT fields, but that's just hazarding a guess.

      Attachments

        Activity

          People

            nirbhay_c Nirbhay Choubey (Inactive)
            txtim Tim Soderstrom
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.