Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10021

Joiner node crashes badly after successful SST

    XMLWordPrintable

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Won't Fix
    • 10.1.13
    • 10.1.15
    • wsrep
    • None
    • Debian 8 Jessie

    Description

      Hi everyone,

      I have an issue with the latest stable MariaDB version 10.1.13.

      On a fresh install on a newly created Debian 8 machine, the new node that joins an existing cluster of 2 nodes crashes badly after a successful SST.

      I'm using the following mirror: http://ftp.igh.cnrs.fr/pub/mariadb/repo/10.1/debian/dists/jessie/

      The main cause seems to be:

      {{ WSREP: Failed to report last committed 262549048, -77 (File descriptor in bad state)}}

      Here are the logs from the SST start to the crash:

      {{WSREP_SST: [INFO] Evaluating socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20160504 11:40:01.828)
      WSREP_SST: [INFO] Cleaning the existing datadir and innodb-data/log directories (20160504 11:40:01.829)
      WSREP_SST: [INFO] Waiting for SST streaming to complete! (20160504 11:40:01.840)
      2016-05-04 11:40:02 140497859376896 [Note] WSREP: (2a79cb83, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2016-05-04 11:42:13 140497846789888 [Note] WSREP: 2.0 (mariadb3.pci): State transfer to 1.0 (mariadb2.pci) complete.
      2016-05-04 11:42:13 140497846789888 [Note] WSREP: Member 2.0 (mariadb3.pci) synced with group.
      WSREP_SST: [INFO] Preparing the backup at /home/mysql//.sst (20160504 11:42:16.246)
      WSREP_SST: [INFO] Evaluating innobackupex --no-version-check --apply-log $rebuildcmd ${DATA} &>${DATA}/innobackup.prepare.log (20160504 11:42:16.249)
      rm: impossible de supprimer « /home/mysql//innobackup.prepare.log »: Aucun fichier ou dossier de ce type
      rm: impossible de supprimer « /home/mysql//innobackup.move.log »: Aucun fichier ou dossier de ce type
      WSREP_SST: [INFO] Moving the backup to /home/mysql/ (20160504 11:42:43.527)
      WSREP_SST: [INFO] Evaluating innobackupex --no-version-check --move-back --force-non-empty-directories ${DATA} &>${DATA}/innobackup.move.log (20160504 11:42:43.530)
      WSREP_SST: [INFO] Move successful, removing /home/mysql//.sst (20160504 11:42:43.790)
      WSREP_SST: [INFO] Galera co-ords from recovery: 4dc8842b-fa29-11e4-a83d-e3e89ebeeaee:262549840 (20160504 11:42:43.800)
      WSREP_SST: [INFO] Total time on joiner: 0 seconds (20160504 11:42:43.805)
      WSREP_SST: [INFO] Removing the sst_in_progress file (20160504 11:42:43.808)
      2016-05-04 11:42:43 140499059210176 [Note] WSREP: SST complete, seqno: 262549840
      2016-05-04 11:42:43 140499059210176 [Warning] option 'innodb-buffer-pool-instances': signed value -1 adjusted to 0
      2016-05-04 11:42:43 140499059210176 [Note] InnoDB: Using mutexes to ref count buffer pool pages
      2016-05-04 11:42:43 140499059210176 [Note] InnoDB: The InnoDB memory heap is disabled
      2016-05-04 11:42:43 140499059210176 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2016-05-04 11:42:43 140499059210176 [Note] InnoDB: Memory barrier is not used
      2016-05-04 11:42:43 140499059210176 [Note] InnoDB: Compressed tables use zlib 1.2.8
      2016-05-04 11:42:43 140499059210176 [Note] InnoDB: Using Linux native AIO
      2016-05-04 11:42:43 140499059210176 [Note] InnoDB: Using SSE crc32 instructions
      2016-05-04 11:42:43 140499059210176 [Note] InnoDB: Initializing buffer pool, size = 15.7G
      2016-05-04 11:42:45 140499059210176 [Note] InnoDB: Completed initialization of buffer pool
      2016-05-04 11:42:45 140499059210176 [Note] InnoDB: Highest supported file format is Barracuda.
      2016-05-04 11:42:45 140499059210176 [Note] InnoDB: 128 rollback segment(s) are active.
      2016-05-04 11:42:45 140499059210176 [Note] InnoDB: Waiting for purge to start
      2016-05-04 11:42:46 140499059210176 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.28-76.1 started; log sequence number 406579686001
      2016-05-04 11:42:46 140478871734016 [Note] InnoDB: Dumping buffer pool(s) not yet started
      2016-05-04 11:42:46 140499059210176 [Note] Plugin 'FEEDBACK' is disabled.
      160504 11:42:46 server_audit: MariaDB Audit Plugin version 1.4.0 STARTED.
      2016-05-04 11:42:46 140499059210176 [Note] Server socket created on IP: '::'.
      2016-05-04 11:42:46 140499057888000 [Note] /usr/sbin/mysqld: Normal shutdown

      2016-05-04 11:42:46 140499057888000 [Note] WSREP: Stop replication
      2016-05-04 11:42:46 140499057888000 [Note] WSREP: Closing send monitor...
      2016-05-04 11:42:46 140499057888000 [Note] WSREP: Closed send monitor.
      2016-05-04 11:42:46 140499057888000 [Note] WSREP: gcomm: terminating thread
      2016-05-04 11:42:46 140499057888000 [Note] WSREP: gcomm: joining thread
      2016-05-04 11:42:46 140499057888000 [Note] WSREP: gcomm: closing backend
      2016-05-04 11:42:46 140499057888000 [Note] WSREP: view(view_id(NON_PRIM,136622bc,690) memb

      { 2a79cb83,0 }

      joined {
      } left {
      } partitioned

      { 136622bc,0 56f77fea,0 }

      )
      2016-05-04 11:42:46 140499057888000 [Note] WSREP: view((empty))
      2016-05-04 11:42:46 140497846789888 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2016-05-04 11:42:46 140499057888000 [Note] WSREP: gcomm: closed
      2016-05-04 11:42:46 140497846789888 [Note] WSREP: Flow-control interval: [16, 16]
      2016-05-04 11:42:46 140497846789888 [Note] WSREP: Received NON-PRIMARY.
      2016-05-04 11:42:46 140497846789888 [Note] WSREP: Shifting JOINER -> OPEN (TO: 262550092)
      2016-05-04 11:42:46 140497846789888 [Note] WSREP: Received self-leave message.
      2016-05-04 11:42:46 140497846789888 [Note] WSREP: Flow-control interval: [0, 0]
      2016-05-04 11:42:46 140497846789888 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      2016-05-04 11:42:46 140497846789888 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 262550092)
      2016-05-04 11:42:46 140497846789888 [Note] WSREP: RECV thread exiting 0: Success
      2016-05-04 11:42:46 140499059210176 [Warning] 'user' entry 'root@mariadb-rbx' ignored in --skip-name-resolve mode.
      2016-05-04 11:42:46 140499057888000 [Note] WSREP: recv_thread() joined.
      2016-05-04 11:42:46 140499057888000 [Note] WSREP: Closing replication queue.
      2016-05-04 11:42:46 140499057888000 [Note] WSREP: Closing slave action queue.
      2016-05-04 11:42:46 140499059210176 [Warning] 'user' entry 'cmon@nas-movidone' ignored in --skip-name-resolve mode.
      2016-05-04 11:42:46 140499059210176 [Warning] 'user' entry 'maxscale@%' has both a password and an authentication plugin specified. The password will be ignored.
      2016-05-04 11:42:46 140499059210176 [Warning] 'proxies_priv' entry '@% root@mariadb-rbx' ignored in --skip-name-resolve mode.
      2016-05-04 11:42:46 140499059210176 [Note] WSREP: Signalling provider to continue.
      2016-05-04 11:42:46 140499059210176 [Note] WSREP: SST received: 4dc8842b-fa29-11e4-a83d-e3e89ebeeaee:262549840
      2016-05-04 11:42:46 140499059210176 [ERROR] WSREP: Trying to launch slave threads before creating connection at 'gcomm://mariadb3.pci,mariadb1.pci,mariadb2.pci'
      mysqld: /home/buildbot/buildbot/build/mariadb-10.1.13/sql/wsrep_thd.cc:401: void wsrep_create_appliers(long int): Assertion `0' failed.
      2016-05-04 11:42:46 140499058895616 [ERROR] WSREP: gcs/src/gcs.cpp:_join():810: Sending JOIN failed: -103 (Software caused connection abort).
      160504 11:42:46 2016-05-04 11:42:46 140499058895616 [ERROR] WSREP: Failed to JOIN the cluster after SST
      [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.1.13-MariaDB-1~jessie
      key_buffer_size=25165824
      read_buffer_size=131072
      max_used_connections=0
      max_threads=502
      thread_count=2
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1127190 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x48400
      2016-05-04 11:42:46 140497913902848 [Warning] WSREP: Failed to report last committed 262549048, -77 (File descriptor in bad state)
      2016-05-04 11:42:46 140497913902848 [Warning] WSREP: Failed to report last committed 262549176, -77 (File descriptor in bad state)
      2016-05-04 11:42:46 140497913902848 [Warning] WSREP: Failed to report last committed 262549304, -77 (File descriptor in bad state)
      2016-05-04 11:42:46 140497913902848 [Warning] WSREP: Failed to report last committed 262549432, -77 (File descriptor in bad state)
      2016-05-04 11:42:46 140497913902848 [Warning] WSREP: Failed to report last committed 262549560, -77 (File descriptor in bad state)
      2016-05-04 11:42:46 140497913902848 [Warning] WSREP: Failed to report last committed 262549688, -77 (File descriptor in bad state)
      2016-05-04 11:42:46 140497913902848 [Warning] WSREP: Failed to report last committed 262549816, -77 (File descriptor in bad state)
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7fc87c1bfe3e]
      /usr/sbin/mysqld(handle_fatal_signal+0x34d)[0x7fc87bd0170d]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7fc87b3278d0]
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fc8793ed067]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fc8793ee448]
      /lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7fc8793e6266]
      /lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7fc8793e6312]
      /usr/sbin/mysqld(+0x551d72)[0x7fc87bcb1d72]
      /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x1484)[0x7fc87baed014]
      /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fc8793d9b45]
      /usr/sbin/mysqld(+0x38004d)[0x7fc87bae004d]}}

      Attachments

        Issue Links

          Activity

            People

              nirbhay_c Nirbhay Choubey (Inactive)
              jom Maiavria
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.