Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9772

mysqld crash [Failed to JOIN the cluster after SST]

Details

    • Bug
    • Status: Closed (View Workflow)
    • Major
    • Resolution: Won't Fix
    • 10.1.12, 10.1.13
    • 10.1.15
    • Galera SST, Replication
    • None
    • Debian Jessie (with systemd)
      Linux mariadb-02 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.5-1~bpo8+1 (2016-02-23) x86_64 GNU/Linux

    Description

      Sometimes with restart any node after replication daemon mysqld is crashed.

      2016-03-21 23:48:37 140077095184128 [Warning] WSREP: Member 1.2 (mariadb-02) requested state transfer from '*any*', but it is impossible to select State Transfer donor: Resource temporarily unavailable
      2016-03-21 23:48:38 140077095184128 [Warning] WSREP: Member 1.2 (mariadb-02) requested state transfer from '*any*', but it is impossible to select State Transfer donor: Resource temporarily unavailable
      2016-03-21 23:48:38 140077095184128 [Note] WSREP: 2.2 (mariadb-01): State transfer to 0.2 (mariadb-03) complete.
      2016-03-21 23:48:38 140077095184128 [Note] WSREP: Member 2.2 (mariadb-01) synced with group.
      2016-03-21 23:48:39 140077095184128 [Note] WSREP: Member 1.2 (mariadb-02) requested state transfer from '*any*'. Selected 2.2 (mariadb-01)(SYNCED) as donor.
      2016-03-21 23:48:39 140077095184128 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 45386)
      2016-03-21 23:48:39 140082886372096 [Note] WSREP: Requesting state transfer: success after 80 tries, donor: 2
      2016-03-21 23:49:02 140077103576832 [Note] WSREP: (18e4fe19, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.100.22.53:4567 
      2016-03-21 23:49:02 140077103576832 [Note] WSREP: declaring b0cafdc9 at tcp://10.100.22.51:4567 stable
      2016-03-21 23:49:02 140077103576832 [Note] WSREP: declaring d4ba184b at tcp://10.100.22.54:4567 stable
      2016-03-21 23:49:02 140077103576832 [Note] WSREP: forgetting 01c16e15 (tcp://10.100.22.53:4567)
      2016-03-21 23:49:02 140077103576832 [Note] WSREP: (18e4fe19, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2016-03-21 23:49:03 140077103576832 [Note] WSREP: Node 18e4fe19 state prim
      2016-03-21 23:49:03 140077103576832 [Note] WSREP: view(view_id(PRIM,18e4fe19,123) memb {
              18e4fe19,2
              b0cafdc9,2
              d4ba184b,0
      } joined {
      } left {
      } partitioned {
              01c16e15,2
      })
      2016-03-21 23:49:03 140077103576832 [Note] WSREP: save pc into disk
      2016-03-21 23:49:03 140077103576832 [Note] WSREP: forgetting 01c16e15 (tcp://10.100.22.53:4567)
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 58c1a508-efa6-11e5-af34-e3ae0409c024
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE EXCHANGE: sent state msg: 58c1a508-efa6-11e5-af34-e3ae0409c024
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE EXCHANGE: got state msg: 58c1a508-efa6-11e5-af34-e3ae0409c024 from 0 (mariadb-02)
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE EXCHANGE: got state msg: 58c1a508-efa6-11e5-af34-e3ae0409c024 from 2 (garb)
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE EXCHANGE: got state msg: 58c1a508-efa6-11e5-af34-e3ae0409c024 from 1 (mariadb-01)
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: Quorum results:
              version    = 3,
              component  = PRIMARY,
              conf_id    = 71,
              members    = 2/3 (joined/total),
              act_id     = 45815,
              last_appl. = 45683,
              protocols  = 0/7/3 (gcs/repl/appl),
              group UUID = 3f92b3c2-ef9c-11e5-ad47-93c5357ece23
      2016-03-21 23:49:03 140077095184128 [Note] WSREP: Flow-control interval: [28, 28]
      2016-03-21 23:49:08 140077103576832 [Note] WSREP:  cleaning up 01c16e15 (tcp://10.100.22.53:4567)
      2016-03-21 23:49:56 140077095184128 [Note] WSREP: 1.2 (mariadb-01): State transfer to 0.2 (mariadb-02) complete.
      2016-03-21 23:49:56 140077095184128 [Note] WSREP: Member 1.2 (mariadb-01) synced with group.
      2016-03-21 23:50:15 140082886686656 [Note] WSREP: SST complete, seqno: 45838
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Using mutexes to ref count buffer pool pages
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: The InnoDB memory heap is disabled
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Memory barrier is not used
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Compressed tables use zlib 1.2.8
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Using Linux native AIO
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Using SSE crc32 instructions
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Initializing buffer pool, size = 512.0M
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Completed initialization of buffer pool
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Highest supported file format is Barracuda.
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: 128 rollback segment(s) are active.
      2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Waiting for purge to start
      2016-03-21 23:50:17 140082886686656 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.28-76.1 started; log sequence number 74594461331
      2016-03-21 23:50:17 140076138858240 [Note] InnoDB: Dumping buffer pool(s) not yet started
      2016-03-21 23:50:17 140082886686656 [Note] Plugin 'FEEDBACK' is disabled.
      2016-03-21 23:50:17 140082886686656 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
      2016-03-21 23:50:17 140082886686656 [Note] Starting crash recovery...
      2016-03-21 23:50:17 140082886686656 [Note] Crash recovery finished.
      2016-03-21 23:50:17 140082886686656 [Note] Server socket created on IP: '0.0.0.0'.
      2016-03-21 23:50:17 140082885253888 [Note] /usr/sbin/mysqld: Normal shutdown
       
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: Stop replication
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: Closing send monitor...
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: Closed send monitor.
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: gcomm: terminating thread
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: gcomm: joining thread
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: gcomm: closing backend
      2016-03-21 23:50:17 140082886686656 [Warning] 'user' entry 'root@mariadb-01' ignored in --skip-name-resolve mode.
      2016-03-21 23:50:17 140082886686656 [Warning] 'proxies_priv' entry '@% root@mariadb-01' ignored in --skip-name-resolve mode.
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: view(view_id(NON_PRIM,18e4fe19,123) memb {
              18e4fe19,2
      } joined {
      } left {
      } partitioned {
              b0cafdc9,2
              d4ba184b,0
      })
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: view((empty))
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: gcomm: closed
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Flow-control interval: [16, 16]
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Received NON-PRIMARY.
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Shifting JOINER -> OPEN (TO: 45889)
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Received self-leave message.
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Flow-control interval: [0, 0]
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 45889)
      2016-03-21 23:50:17 140077095184128 [Note] WSREP: RECV thread exiting 0: Success
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: recv_thread() joined.
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: Closing replication queue.
      2016-03-21 23:50:17 140082885253888 [Note] WSREP: Closing slave action queue.
      2016-03-21 23:50:17 140082886686656 [Note] WSREP: Signalling provider to continue.
      2016-03-21 23:50:17 140082833533696 [Note] Event Scheduler: scheduler thread started with id 4
      2016-03-21 23:50:17 140082886686656 [Note] WSREP: SST received: 3f92b3c2-ef9c-11e5-ad47-93c5357ece23:45838
      2016-03-21 23:50:17 140082886686656 [ERROR] WSREP: Trying to launch slave threads before creating connection at 'gcomm://10.100.22.54,10.100.22.53,10.100.22.51'
      mysqld: /home/buildbot/buildbot/build/mariadb-10.1.12/sql/wsrep_thd.cc:401: void wsrep_create_appliers(long int): Assertion `0' failed.
      160321 23:50:17 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see http://kb.askmonty.org/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.1.12-MariaDB-1~jessie
      key_buffer_size=67108864
      read_buffer_size=33554432
      max_used_connections=0
      max_threads=1026
      thread_count=3
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 67326468 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      2016-03-21 23:50:17 140082886372096 [ERROR] WSREP: gcs/src/gcs.cpp:_join():810: Sending JOIN failed: -103 (Software caused connection abort).
      stack_bottom = 0x0 thread_stack 0x48400
      2016-03-21 23:50:17 140082886372096 [ERROR] WSREP: Failed to JOIN the cluster after SST
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43139, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43266, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43395, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43521, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43650, -77 (File descriptor in bad state)
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5630f4313a3e]
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43778, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43904, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44036, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44160, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44290, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44418, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44547, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44674, -77 (File descriptor in bad state)
      /usr/sbin/mysqld(handle_fatal_signal+0x34d)[0x5630f3e55cbd]
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44801, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44931, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45058, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45185, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45312, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45440, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45522, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45650, -77 (File descriptor in bad state)
      /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7f679671c8d0]
      2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45779, -77 (File descriptor in bad state)
      2016-03-21 23:50:17 140082886372096 [Note] WSREP: New cluster view: global state: 3f92b3c2-ef9c-11e5-ad47-93c5357ece23:45815, view# 72: Primary, number of nodes: 3, my index: 0, protocol version 3
      2016-03-21 23:50:17 140082886372096 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2016-03-21 23:50:17 140082886372096 [Note] WSREP: REPL Protocols: 7 (3, 2)
      2016-03-21 23:50:17 140077157435136 [Note] WSREP: Service thread queue flushed.
      2016-03-21 23:50:17 140082886372096 [Note] WSREP: Assign initial position for certification: 45815, protocol version: 3
      2016-03-21 23:50:17 140077157435136 [Note] WSREP: Service thread queue flushed.
      2016-03-21 23:50:17 140082886372096 [ERROR] WSREP: gcs/src/gcs.cpp:_join():810: Sending JOIN failed: -103 (Software caused connection abort).
      2016-03-21 23:50:17 140082886372096 [ERROR] WSREP: Failed to JOIN the cluster after SST
      /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f679480d067]
      /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f679480e448]
      /lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7f6794806266]
      /lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7f6794806312]
      /usr/sbin/mysqld(+0x552592)[0x5630f3e06592]
      /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x1484)[0x5630f3c41044]
      /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f67947f9b45]
      /usr/sbin/mysqld(+0x38007d)[0x5630f3c3407d]
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      

      Attachments

        Issue Links

          Activity

            sys.admin Does it happen in a scenario where multiple joiner are contenting to join at the same time? Could you share the configurations?

            nirbhay_c Nirbhay Choubey (Inactive) added a comment - sys.admin Does it happen in a scenario where multiple joiner are contenting to join at the same time? Could you share the configurations?

            My config:

            [client]
            max_allowed_packet      = 128M
            port                    = 3306
            socket                  = /var/run/mysqld/mysqld.sock
             
            [mysqld_safe]
            socket                  = /var/run/mysqld/mysqld.sock
            nice                    = 0
             
            [mysqld]
            user                    = mysql
            pid-file                = /var/run/mysqld/mysqld.pid
            socket                  = /var/run/mysqld/mysqld.sock
            port                    = 3306
            basedir                 = /usr
            datadir                 = /var/lib/mysql
            tmpdir                  = /tmp
            lc_messages_dir         = /usr/share/mysql
            lc_messages             = en_US
            skip-external-locking   = 1
            collation-server        = utf8_general_ci
            init-connect            = 'SET NAMES utf8'
            character-set-server    = utf8
            bind-address            = 0.0.0.0
            net_write_timeout       = 300
            skip-name-resolve       = yes
            max_connections         = 128
            connect_timeout         = 5
            wait_timeout            = 600
            max_allowed_packet      = 128M
            thread_cache_size       = 128
            sort_buffer_size        = 16M
            bulk_insert_buffer_size = 64M
            tmp_table_size          = 128M
            max_heap_table_size     = 128M
            myisam_recover_options  = BACKUP
            key_buffer_size         = 32M
            table_open_cache        = 400
            myisam_sort_buffer_size = 512M
            concurrent_insert       = 2
            read_buffer_size        = 32M
            read_rnd_buffer_size    = 16M
            query_cache_limit       = 32M
            query_cache_size        = 256M
            general_log_file        = /var/log/mysql/mysql.log
            general_log             = 0
            log-error               = /var/log/mysql/mysql_err.log
            log_warnings            = 2
            slow_query_log=0
            slow_query_log_file     = /var/log/mysql/mariadb-slow.log
            long_query_time         = 10
            log_slow_verbosity      = query_plan
            server-id               = 1
            report_host             = mariadb-01
            log_bin                 = /var/log/mysql/mariadb-bin
            log_bin_index           = /var/log/mysql/mariadb-bin.index
            event_scheduler         = on
            sync_binlog             = 0
            expire_logs_days        = 32
            max_binlog_size         = 1024M
            relay_log              = /var/log/mysql/relay-bin
            relay_log_index        = /var/log/mysql/relay-bin.index
            relay_log_info_file    = /var/log/mysql/relay-bin.info
            log_slave_updates       = 1
            slave_type_conversions  = ALL_NON_LOSSY
            default_storage_engine  = InnoDB
            innodb_log_file_size    = 512M
            innodb_buffer_pool_size = 512M
            innodb_log_buffer_size  = 64M
            innodb_file_per_table   = 1
            innodb_doublewrite      = 1
            innodb_open_files       = 2400
            innodb_io_capacity      = 2400
            innodb_flush_method     = O_DIRECT
            innodb_file_format      = barracuda
            innodb-data-file-path   = ibdata1:10M:autoextend
             
            [galera]
            wsrep_on                = ON
            wsrep_provider          = /usr/lib/libgalera_smm.so
            wsrep_node_address      = 10.100.22.51
            wsrep_cluster_address   = gcomm://10.100.22.54,10.100.22.53,10.100.22.52
            wsrep_sst_auth          = root:MY_SECRET
            wsrep_cluster_name      = mariadb-cluster
            wsrep_node_name         = mariadb-01
            binlog_format           = row
            default_storage_engine  = InnoDB
            innodb_autoinc_lock_mode= 2
            wsrep_sst_method        = xtrabackup-v2
            wsrep_retry_autocommit  = 3
            wsrep_restart_slave     = On
            wsrep_debug             = Off
            wsrep_auto_increment_control = Off
            wsrep_provider_options  = "gcache.size=5G;repl.commit_order=1;gmcast.segment=2;pc.recovery=TRUE;debug=no;gcs.fc_debug=0;pc.wait_prim=true"
            wsrep_replicate_myisam  = On
            max_allowed_packet      = 128M
            bind-address            = 0.0.0.0
            wsrep_slave_threads     = 32
            innodb_flush_log_at_trx_commit = 1
             
            [mysqldump]
            quick
            quote-names
            max_allowed_packet      = 128M
             
            [mysql]
             
            [isamchk]
            key_buffer              = 128M
            !includedir /etc/mysql/conf.d/
            

            sys.admin Matyashov Andrey added a comment - My config: [client] max_allowed_packet = 128M port = 3306 socket = /var/run/mysqld/mysqld.sock   [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0   [mysqld] user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp lc_messages_dir = /usr/share/mysql lc_messages = en_US skip-external-locking = 1 collation-server = utf8_general_ci init-connect = 'SET NAMES utf8' character-set-server = utf8 bind-address = 0.0 . 0.0 net_write_timeout = 300 skip-name-resolve = yes max_connections = 128 connect_timeout = 5 wait_timeout = 600 max_allowed_packet = 128M thread_cache_size = 128 sort_buffer_size = 16M bulk_insert_buffer_size = 64M tmp_table_size = 128M max_heap_table_size = 128M myisam_recover_options = BACKUP key_buffer_size = 32M table_open_cache = 400 myisam_sort_buffer_size = 512M concurrent_insert = 2 read_buffer_size = 32M read_rnd_buffer_size = 16M query_cache_limit = 32M query_cache_size = 256M general_log_file = /var/log/mysql/mysql.log general_log = 0 log-error = /var/log/mysql/mysql_err.log log_warnings = 2 slow_query_log= 0 slow_query_log_file = /var/log/mysql/mariadb-slow.log long_query_time = 10 log_slow_verbosity = query_plan server-id = 1 report_host = mariadb- 01 log_bin = /var/log/mysql/mariadb-bin log_bin_index = /var/log/mysql/mariadb-bin.index event_scheduler = on sync_binlog = 0 expire_logs_days = 32 max_binlog_size = 1024M relay_log = /var/log/mysql/relay-bin relay_log_index = /var/log/mysql/relay-bin.index relay_log_info_file = /var/log/mysql/relay-bin.info log_slave_updates = 1 slave_type_conversions = ALL_NON_LOSSY default_storage_engine = InnoDB innodb_log_file_size = 512M innodb_buffer_pool_size = 512M innodb_log_buffer_size = 64M innodb_file_per_table = 1 innodb_doublewrite = 1 innodb_open_files = 2400 innodb_io_capacity = 2400 innodb_flush_method = O_DIRECT innodb_file_format = barracuda innodb-data-file-path = ibdata1:10M:autoextend   [galera] wsrep_on = ON wsrep_provider = /usr/lib/libgalera_smm.so wsrep_node_address = 10.100 . 22.51 wsrep_cluster_address = gcomm: //10.100.22.54,10.100.22.53,10.100.22.52 wsrep_sst_auth = root:MY_SECRET wsrep_cluster_name = mariadb-cluster wsrep_node_name = mariadb- 01 binlog_format = row default_storage_engine = InnoDB innodb_autoinc_lock_mode= 2 wsrep_sst_method = xtrabackup-v2 wsrep_retry_autocommit = 3 wsrep_restart_slave = On wsrep_debug = Off wsrep_auto_increment_control = Off wsrep_provider_options = "gcache.size=5G;repl.commit_order=1;gmcast.segment=2;pc.recovery=TRUE;debug=no;gcs.fc_debug=0;pc.wait_prim=true" wsrep_replicate_myisam = On max_allowed_packet = 128M bind-address = 0.0 . 0.0 wsrep_slave_threads = 32 innodb_flush_log_at_trx_commit = 1   [mysqldump] quick quote-names max_allowed_packet = 128M   [mysql]   [isamchk] key_buffer = 128M !includedir /etc/mysql/conf.d/
            andrew.garner Andrew Garner added a comment -

            I just ran into this by accident last night in a test environment. This seemed to be caused by the service timing out during startup and systemd sends a SIGTERM. SST finished up but mysqld immediately starts the shutdown process afterwards. This seems to shutdown the wsrep connections (I suppose in the shutdown thread) and when the startup bits finish trickling through to wsrep_create_appliers we hit this assertion. This seems to line up with the logs in this issue where we see an immediate shutdown starting after the initial startup.

            Avoiding was fairly easy by setting a sufficiently high TimeoutStartSec for the mariadb service. I was able to reproduce this multiple times this morning by forcing SST and setting TimeoutStartSec rather low. If it is useful I can provide logs or other debugging info from my environment.

            andrew.garner Andrew Garner added a comment - I just ran into this by accident last night in a test environment. This seemed to be caused by the service timing out during startup and systemd sends a SIGTERM. SST finished up but mysqld immediately starts the shutdown process afterwards. This seems to shutdown the wsrep connections (I suppose in the shutdown thread) and when the startup bits finish trickling through to wsrep_create_appliers we hit this assertion. This seems to line up with the logs in this issue where we see an immediate shutdown starting after the initial startup. Avoiding was fairly easy by setting a sufficiently high TimeoutStartSec for the mariadb service. I was able to reproduce this multiple times this morning by forcing SST and setting TimeoutStartSec rather low. If it is useful I can provide logs or other debugging info from my environment.
            jom Maiavria added a comment -

            Thanks Andrew Garner, this fixed my problem too.

            I think this should be fixed in the next release, because when starting a node of a cluster it may always need to synchronize with other nodes and thus take from 1 to 20 or even 40 minutes depending of the amount of data.

            jom Maiavria added a comment - Thanks Andrew Garner, this fixed my problem too. I think this should be fixed in the next release, because when starting a node of a cluster it may always need to synchronize with other nodes and thus take from 1 to 20 or even 40 minutes depending of the amount of data.

            This could happen when systemd service timeout for mariadb is not sufficiently large.

            nirbhay_c Nirbhay Choubey (Inactive) added a comment - This could happen when systemd service timeout for mariadb is not sufficiently large.

            Just a heads up I am still seeing this behavior on 10.1.18-MariaDB-1~jessie (ticket is reporting 10.1.15 as fixed)

            alekc Alexander Chernov added a comment - Just a heads up I am still seeing this behavior on 10.1.18-MariaDB-1~jessie (ticket is reporting 10.1.15 as fixed)

            alekc This was closed as Won't Fix. You'll need to increase the TimeoutStartSec.

            nirbhay_c Nirbhay Choubey (Inactive) added a comment - alekc This was closed as Won't Fix. You'll need to increase the TimeoutStartSec.

            People

              nirbhay_c Nirbhay Choubey (Inactive)
              sys.admin Matyashov Andrey
              Votes:
              1 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.