Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9772

mysqld crash [Failed to JOIN the cluster after SST]

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 10.1.12, 10.1.13
    • Fix Version/s: 10.1.15
    • Component/s: Galera SST, Replication
    • Labels:
      None
    • Environment:
      Debian Jessie (with systemd)
      Linux mariadb-02 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.5-1~bpo8+1 (2016-02-23) x86_64 GNU/Linux

      Description

      Sometimes with restart any node after replication daemon mysqld is crashed.

      2016-03-21 23:48:37 140077095184128 [Warning] WSREP: Member 1.2 (mariadb-02) requested state transfer from '*any*', but it is impossible to select State Transfer donor: Resource temporarily unavailable
      2016-03-21 23:48:38 140077095184128 [Warning] WSREP: Member 1.2 (mariadb-02) requested state transfer from '*any*', but it is impossible to select State Transfer donor: Resource temporarily unavailable
      2016-03-21 23:48:38 140077095184128 [Note] WSREP: 2.2 (mariadb-01): State transfer to 0.2 (mariadb-03) complete.
      2016-03-21 23:48:38 140077095184128 [Note] WSREP: Member 2.2 (mariadb-01) synced with group.
      2016-03-21 23:48:39 140077095184128 [Note] WSREP: Member 1.2 (mariadb-02) requested state transfer from '*any*'. Selected 2.2 (mariadb-01)(SYNCED) as donor.
      2016-03-21 23:48:39 140077095184128 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 45386)
      2016-03-21 23:48:39 140082886372096 [Note] WSREP: Requesting state transfer: success after 80 tries, donor: 2
      2016-03-21 23:49:02 140077103576832 [Note] WSREP: (18e4fe19, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.100.22.53:4567 
      2016-03-21 23:49:02 140077103576832 [Note] WSREP: declaring b0cafdc9 at tcp://10.100.22.51:4567 stable
      2016-03-21 23:49:02 140077103576832 [Note] WSREP: declaring d4ba184b at tcp://10.100.22.54:4567 stable
      2016-03-21 23:49:02 140077103576832 [Note] WSREP: forgetting 01c16e15 (tcp://10.100.22.53:4567)
      2016-03-21 23:49:02 140077103576832 [Note] WSREP: (18e4fe19, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2016-03-21 23:49:03 140077103576832 [Note] WSREP: Node 18e4fe19 state prim
      2016-03-21 23:49:03 140077103576832 [Note] WSREP: view(view_id(PRIM,18e4fe19,123) memb {
              18e4fe19,2
              b0cafdc9,2
              d4ba184b,0
      } joined {
      } left {
      } partitioned {
              01c16e15,2
      })
      2016-03-21 23:49:03 140077103576832 [Note] WSREP: save pc into disk
      2016-03-21 23:49:03 140077103576832 [Note] WSREP: forgetting 01c16e15 (tcp://10.100.22.53:4567)
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 58c1a508-efa6-11e5-af34-e3ae0409c024
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE EXCHANGE: sent state msg: 58c1a508-efa6-11e5-af34-e3ae0409c024
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE EXCHANGE: got state msg: 58c1a508-efa6-11e5-af34-e3ae0409c024 from 0 (mariadb-02)
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE EXCHANGE: got state msg: 58c1a508-efa6-11e5-af34-e3ae0409c024 from 2 (garb)
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE EXCHANGE: got state msg: 58c1a508-efa6-11e5-af34-e3ae0409c024 from 1 (mariadb-01)
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: Quorum results:
              version    = 3,
              component  = PRIMARY,
              conf_id    = 71,
              members    = 2/3 (joined/total),
              act_id     = 45815,
              last_appl. = 45683,
              protocols  = 0/7/3 (gcs/repl/appl),
              group UUID = 3f92b3c2-ef9c-11e5-ad47-93c5357ece23
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: Flow-control interval: [28, 28]
      2016-03-21 23:49:08 140077103576832 [Note] WSREP:  cleaning up 01c16e15 (tcp://10.100.22.53:4567)
      2016-03-21 23:49:56 140077095184128 [Note] WSREP: 1.2 (mariadb-01): State transfer to 0.2 (mariadb-02) complete.
      2016-03-21 23:49:56 140077095184128 [Note] WSREP: Member 1.2 (mariadb-01) synced with group.
      2016-03-21 23:50:15 140082886686656 [Note] WSREP: SST complete, seqno: 45838
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Using mutexes to ref count buffer pool pages
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: The InnoDB memory heap is disabled
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Memory barrier is not used
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Compressed tables use zlib 1.2.8
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Using Linux native AIO
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Using SSE crc32 instructions
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Initializing buffer pool, size = 512.0M
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Completed initialization of buffer pool
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Highest supported file format is Barracuda.
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: 128 rollback segment(s) are active.
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Waiting for purge to start
      2016-03-21 23:50:17 140082886686656 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.28-76.1 started; log sequence number 74594461331
      2016-03-21 23:50:17 140076138858240 [Note] InnoDB: Dumping buffer pool(s) not yet started
      2016-03-21 23:50:17 140082886686656 [Note] Plugin 'FEEDBACK' is disabled.
      2016-03-21 23:50:17 140082886686656 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
      2016-03-21 23:50:17 140082886686656 [Note] Starting crash recovery...
      2016-03-21 23:50:17 140082886686656 [Note] Crash recovery finished.
      2016-03-21 23:50:17 140082886686656 [Note] Server socket created on IP: '0.0.0.0'.
      2016-03-21 23:50:17 140082885253888 [Note] /usr/sbin/mysqld: Normal shutdown
       
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: Stop replication
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: Closing send monitor...
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: Closed send monitor.
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: gcomm: terminating thread
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: gcomm: joining thread
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: gcomm: closing backend
      2016-03-21 23:50:17 140082886686656 [Warning] 'user' entry 'root@mariadb-01' ignored in --skip-name-resolve mode.
      2016-03-21 23:50:17 140082886686656 [Warning] 'proxies_priv' entry '@% root@mariadb-01' ignored in --skip-name-resolve mode.
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: view(view_id(NON_PRIM,18e4fe19,123) memb {
              18e4fe19,2
      } joined {
      } left {
      } partitioned {
              b0cafdc9,2
              d4ba184b,0
      })
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: view((empty))
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: gcomm: closed
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Flow-control interval: [16, 16]
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Received NON-PRIMARY.
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Shifting JOINER -> OPEN (TO: 45889)
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Received self-leave message.
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Flow-control interval: [0, 0]
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 45889)
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: RECV thread exiting 0: Success
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: recv_thread() joined.
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: Closing replication queue.
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: Closing slave action queue.
      2016-03-21 23:50:17 140082886686656 [Note] WSREP: Signalling provider to continue.
      2016-03-21 23:50:17 140082833533696 [Note] Event Scheduler: scheduler thread started with id 4
      2016-03-21 23:50:17 140082886686656 [Note] WSREP: SST received: 3f92b3c2-ef9c-11e5-ad47-93c5357ece23:45838
      2016-03-21 23:50:17 140082886686656 [ERROR] WSREP: Trying to launch slave threads before creating connection at 'gcomm://10.100.22.54,10.100.22.53,10.100.22.51'
      mysqld: /home/buildbot/buildbot/build/mariadb-10.1.12/sql/wsrep_thd.cc:401: void wsrep_create_appliers(long int): Assertion `0' failed.
      160321 23:50:17 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see http://kb.askmonty.org/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.1.12-MariaDB-1~jessie
      key_buffer_size=67108864
      read_buffer_size=33554432
      max_used_connections=0
      max_threads=1026
      thread_count=3
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 67326468 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      2016-03-21 23:50:17 140082886372096 [ERROR] WSREP: gcs/src/gcs.cpp:_join():810: Sending JOIN failed: -103 (Software caused connection abort).
      stack_bottom = 0x0 thread_stack 0x48400
      2016-03-21 23:50:17 140082886372096 [ERROR] WSREP: Failed to JOIN the cluster after SST
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43139, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43266, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43395, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43521, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43650, -77 (File descriptor in bad state)
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5630f4313a3e]
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43778, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43904, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44036, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44160, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44290, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44418, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44547, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44674, -77 (File descriptor in bad state)
      /usr/sbin/mysqld(handle_fatal_signal+0x34d)[0x5630f3e55cbd]
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44801, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44931, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45058, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45185, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45312, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45440, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45522, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45650, -77 (File descriptor in bad state)
      /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7f679671c8d0]
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45779, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140082886372096 [Note] WSREP: New cluster view: global state: 3f92b3c2-ef9c-11e5-ad47-93c5357ece23:45815, view# 72: Primary, number of nodes: 3, my index: 0, protocol version 3
      2016-03-21 23:50:17 140082886372096 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2016-03-21 23:50:17 140082886372096 [Note] WSREP: REPL Protocols: 7 (3, 2)
      2016-03-21 23:50:17 140077157435136 [Note] WSREP: Service thread queue flushed.
      2016-03-21 23:50:17 140082886372096 [Note] WSREP: Assign initial position for certification: 45815, protocol version: 3
      2016-03-21 23:50:17 140077157435136 [Note] WSREP: Service thread queue flushed.
      2016-03-21 23:50:17 140082886372096 [ERROR] WSREP: gcs/src/gcs.cpp:_join():810: Sending JOIN failed: -103 (Software caused connection abort).
      2016-03-21 23:50:17 140082886372096 [ERROR] WSREP: Failed to JOIN the cluster after SST
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f679480d067]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f679480e448]
      /lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7f6794806266]
      /lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7f6794806312]
      /usr/sbin/mysqld(+0x552592)[0x5630f3e06592]
      /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x1484)[0x5630f3c41044]
      /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f67947f9b45]
      /usr/sbin/mysqld(+0x38007d)[0x5630f3c3407d]
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              nirbhay_c Nirbhay Choubey (Inactive)
              Reporter:
              sys.admin Matyashov Andrey
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: