[MDEV-9772] mysqld crash [Failed to JOIN the cluster after SST] - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Won't Fix
Affects Version/s: 10.1.12, 10.1.13
Fix Version/s: 10.1.15
Component/s: Galera SST, Replication
Labels:
None
Environment:
Debian Jessie (with systemd)
Linux mariadb-02 4.3.0-0.bpo.1-amd64 #1 SMP Debian 4.3.5-1~bpo8+1 (2016-02-23) x86_64 GNU/Linux

Description

Sometimes with restart any node after replication daemon mysqld is crashed.

2016-03-21 23:48:37 140077095184128 [Warning] WSREP: Member 1.2 (mariadb-02) requested state transfer from '*any*', but it is impossible to select State Transfer donor: Resource temporarily unavailable

2016-03-21 23:48:38 140077095184128 [Warning] WSREP: Member 1.2 (mariadb-02) requested state transfer from '*any*', but it is impossible to select State Transfer donor: Resource temporarily unavailable

2016-03-21 23:48:38 140077095184128 [Note] WSREP: 2.2 (mariadb-01): State transfer to 0.2 (mariadb-03) complete.

2016-03-21 23:48:38 140077095184128 [Note] WSREP: Member 2.2 (mariadb-01) synced with group.

2016-03-21 23:48:39 140077095184128 [Note] WSREP: Member 1.2 (mariadb-02) requested state transfer from '*any*'. Selected 2.2 (mariadb-01)(SYNCED) as donor.

2016-03-21 23:48:39 140077095184128 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 45386)

2016-03-21 23:48:39 140082886372096 [Note] WSREP: Requesting state transfer: success after 80 tries, donor: 2

2016-03-21 23:49:02 140077103576832 [Note] WSREP: (18e4fe19, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.100.22.53:4567

2016-03-21 23:49:02 140077103576832 [Note] WSREP: declaring b0cafdc9 at tcp://10.100.22.51:4567 stable

2016-03-21 23:49:02 140077103576832 [Note] WSREP: declaring d4ba184b at tcp://10.100.22.54:4567 stable

2016-03-21 23:49:02 140077103576832 [Note] WSREP: forgetting 01c16e15 (tcp://10.100.22.53:4567)

2016-03-21 23:49:02 140077103576832 [Note] WSREP: (18e4fe19, 'tcp://0.0.0.0:4567') turning message relay requesting off

2016-03-21 23:49:03 140077103576832 [Note] WSREP: Node 18e4fe19 state prim

2016-03-21 23:49:03 140077103576832 [Note] WSREP: view(view_id(PRIM,18e4fe19,123) memb {

        18e4fe19,2

        b0cafdc9,2

        d4ba184b,0

} joined {

} left {

} partitioned {

        01c16e15,2

})

2016-03-21 23:49:03 140077103576832 [Note] WSREP: save pc into disk

2016-03-21 23:49:03 140077103576832 [Note] WSREP: forgetting 01c16e15 (tcp://10.100.22.53:4567)

2016-03-21 23:49:03 140077095184128 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 3

2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 58c1a508-efa6-11e5-af34-e3ae0409c024

2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE EXCHANGE: sent state msg: 58c1a508-efa6-11e5-af34-e3ae0409c024

2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE EXCHANGE: got state msg: 58c1a508-efa6-11e5-af34-e3ae0409c024 from 0 (mariadb-02)

2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE EXCHANGE: got state msg: 58c1a508-efa6-11e5-af34-e3ae0409c024 from 2 (garb)

2016-03-21 23:49:03 140077095184128 [Note] WSREP: STATE EXCHANGE: got state msg: 58c1a508-efa6-11e5-af34-e3ae0409c024 from 1 (mariadb-01)

2016-03-21 23:49:03 140077095184128 [Note] WSREP: Quorum results:

        version    = 3,

        component  = PRIMARY,

        conf_id    = 71,

        members    = 2/3 (joined/total),

        act_id     = 45815,

        last_appl. = 45683,

        protocols  = 0/7/3 (gcs/repl/appl),

        group UUID = 3f92b3c2-ef9c-11e5-ad47-93c5357ece23

2016-03-21 23:49:03 140077095184128 [Note] WSREP: Flow-control interval: [28, 28]

2016-03-21 23:49:08 140077103576832 [Note] WSREP:  cleaning up 01c16e15 (tcp://10.100.22.53:4567)

2016-03-21 23:49:56 140077095184128 [Note] WSREP: 1.2 (mariadb-01): State transfer to 0.2 (mariadb-02) complete.

2016-03-21 23:49:56 140077095184128 [Note] WSREP: Member 1.2 (mariadb-01) synced with group.

2016-03-21 23:50:15 140082886686656 [Note] WSREP: SST complete, seqno: 45838

2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Using mutexes to ref count buffer pool pages

2016-03-21 23:50:16 140082886686656 [Note] InnoDB: The InnoDB memory heap is disabled

2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Memory barrier is not used

2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Compressed tables use zlib 1.2.8

2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Using Linux native AIO

2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Using SSE crc32 instructions

2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Initializing buffer pool, size = 512.0M

2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Completed initialization of buffer pool

2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Highest supported file format is Barracuda.

2016-03-21 23:50:16 140082886686656 [Note] InnoDB: 128 rollback segment(s) are active.

2016-03-21 23:50:16 140082886686656 [Note] InnoDB: Waiting for purge to start

2016-03-21 23:50:17 140082886686656 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.28-76.1 started; log sequence number 74594461331

2016-03-21 23:50:17 140076138858240 [Note] InnoDB: Dumping buffer pool(s) not yet started

2016-03-21 23:50:17 140082886686656 [Note] Plugin 'FEEDBACK' is disabled.

2016-03-21 23:50:17 140082886686656 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin

2016-03-21 23:50:17 140082886686656 [Note] Starting crash recovery...

2016-03-21 23:50:17 140082886686656 [Note] Crash recovery finished.

2016-03-21 23:50:17 140082886686656 [Note] Server socket created on IP: '0.0.0.0'.

2016-03-21 23:50:17 140082885253888 [Note] /usr/sbin/mysqld: Normal shutdown

2016-03-21 23:50:17 140082885253888 [Note] WSREP: Stop replication

2016-03-21 23:50:17 140082885253888 [Note] WSREP: Closing send monitor...

2016-03-21 23:50:17 140082885253888 [Note] WSREP: Closed send monitor.

2016-03-21 23:50:17 140082885253888 [Note] WSREP: gcomm: terminating thread

2016-03-21 23:50:17 140082885253888 [Note] WSREP: gcomm: joining thread

2016-03-21 23:50:17 140082885253888 [Note] WSREP: gcomm: closing backend

2016-03-21 23:50:17 140082886686656 [Warning] 'user' entry 'root@mariadb-01' ignored in --skip-name-resolve mode.

2016-03-21 23:50:17 140082886686656 [Warning] 'proxies_priv' entry '@% root@mariadb-01' ignored in --skip-name-resolve mode.

2016-03-21 23:50:17 140082885253888 [Note] WSREP: view(view_id(NON_PRIM,18e4fe19,123) memb {

        18e4fe19,2

} joined {

} left {

} partitioned {

        b0cafdc9,2

        d4ba184b,0

})

2016-03-21 23:50:17 140082885253888 [Note] WSREP: view((empty))

2016-03-21 23:50:17 140077095184128 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1

2016-03-21 23:50:17 140082885253888 [Note] WSREP: gcomm: closed

2016-03-21 23:50:17 140077095184128 [Note] WSREP: Flow-control interval: [16, 16]

2016-03-21 23:50:17 140077095184128 [Note] WSREP: Received NON-PRIMARY.

2016-03-21 23:50:17 140077095184128 [Note] WSREP: Shifting JOINER -> OPEN (TO: 45889)

2016-03-21 23:50:17 140077095184128 [Note] WSREP: Received self-leave message.

2016-03-21 23:50:17 140077095184128 [Note] WSREP: Flow-control interval: [0, 0]

2016-03-21 23:50:17 140077095184128 [Note] WSREP: Received SELF-LEAVE. Closing connection.

2016-03-21 23:50:17 140077095184128 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 45889)

2016-03-21 23:50:17 140077095184128 [Note] WSREP: RECV thread exiting 0: Success

2016-03-21 23:50:17 140082885253888 [Note] WSREP: recv_thread() joined.

2016-03-21 23:50:17 140082885253888 [Note] WSREP: Closing replication queue.

2016-03-21 23:50:17 140082885253888 [Note] WSREP: Closing slave action queue.

2016-03-21 23:50:17 140082886686656 [Note] WSREP: Signalling provider to continue.

2016-03-21 23:50:17 140082833533696 [Note] Event Scheduler: scheduler thread started with id 4

2016-03-21 23:50:17 140082886686656 [Note] WSREP: SST received: 3f92b3c2-ef9c-11e5-ad47-93c5357ece23:45838

2016-03-21 23:50:17 140082886686656 [ERROR] WSREP: Trying to launch slave threads before creating connection at 'gcomm://10.100.22.54,10.100.22.53,10.100.22.51'

mysqld: /home/buildbot/buildbot/build/mariadb-10.1.12/sql/wsrep_thd.cc:401: void wsrep_create_appliers(long int): Assertion `0' failed.

160321 23:50:17 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.1.12-MariaDB-1~jessie

key_buffer_size=67108864

read_buffer_size=33554432

max_used_connections=0

max_threads=1026

thread_count=3

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 67326468 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

2016-03-21 23:50:17 140082886372096 [ERROR] WSREP: gcs/src/gcs.cpp:_join():810: Sending JOIN failed: -103 (Software caused connection abort).

stack_bottom = 0x0 thread_stack 0x48400

2016-03-21 23:50:17 140082886372096 [ERROR] WSREP: Failed to JOIN the cluster after SST

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43139, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43266, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43395, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43521, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43650, -77 (File descriptor in bad state)

/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5630f4313a3e]

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43778, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 43904, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44036, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44160, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44290, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44418, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44547, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44674, -77 (File descriptor in bad state)

/usr/sbin/mysqld(handle_fatal_signal+0x34d)[0x5630f3e55cbd]

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44801, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 44931, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45058, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45185, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45312, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45440, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45522, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45650, -77 (File descriptor in bad state)

/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7f679671c8d0]

2016-03-21 23:50:17 140077157435136 [Warning] WSREP: Failed to report last committed 45779, -77 (File descriptor in bad state)

2016-03-21 23:50:17 140082886372096 [Note] WSREP: New cluster view: global state: 3f92b3c2-ef9c-11e5-ad47-93c5357ece23:45815, view# 72: Primary, number of nodes: 3, my index: 0, protocol version 3

2016-03-21 23:50:17 140082886372096 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.

2016-03-21 23:50:17 140082886372096 [Note] WSREP: REPL Protocols: 7 (3, 2)

2016-03-21 23:50:17 140077157435136 [Note] WSREP: Service thread queue flushed.

2016-03-21 23:50:17 140082886372096 [Note] WSREP: Assign initial position for certification: 45815, protocol version: 3

2016-03-21 23:50:17 140077157435136 [Note] WSREP: Service thread queue flushed.

2016-03-21 23:50:17 140082886372096 [ERROR] WSREP: gcs/src/gcs.cpp:_join():810: Sending JOIN failed: -103 (Software caused connection abort).

2016-03-21 23:50:17 140082886372096 [ERROR] WSREP: Failed to JOIN the cluster after SST

/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f679480d067]

/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f679480e448]

/lib/x86_64-linux-gnu/libc.so.6(+0x2e266)[0x7f6794806266]

/lib/x86_64-linux-gnu/libc.so.6(+0x2e312)[0x7f6794806312]

/usr/sbin/mysqld(+0x552592)[0x5630f3e06592]

/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x1484)[0x5630f3c41044]

/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f67947f9b45]

/usr/sbin/mysqld(+0x38007d)[0x5630f3c3407d]

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains

information that should help you find out what is causing the crash.

Attachments

Issue Links

is duplicated by

MDEV-10021 Joiner node crashes badly after successful SST

Closed

Activity

Ascending order - Click to sort in descending order

Nirbhay Choubey (Inactive) added a comment - 2016-04-04 20:12

sys.admin Does it happen in a scenario where multiple joiner are contenting to join at the same time? Could you share the configurations?

Nirbhay Choubey (Inactive) added a comment - 2016-04-04 20:12 sys.admin Does it happen in a scenario where multiple joiner are contenting to join at the same time? Could you share the configurations?

Matyashov Andrey added a comment - 2016-04-04 20:27

My config:

[client]

max_allowed_packet      = 128M

port                    = 3306

socket                  = /var/run/mysqld/mysqld.sock

[mysqld_safe]

socket                  = /var/run/mysqld/mysqld.sock

nice                    = 0

[mysqld]

user                    = mysql

pid-file                = /var/run/mysqld/mysqld.pid

socket                  = /var/run/mysqld/mysqld.sock

port                    = 3306

basedir                 = /usr

datadir                 = /var/lib/mysql

tmpdir                  = /tmp

lc_messages_dir         = /usr/share/mysql

lc_messages             = en_US

skip-external-locking   = 1

collation-server        = utf8_general_ci

init-connect            = 'SET NAMES utf8'

character-set-server    = utf8

bind-address            = 0.0.0.0

net_write_timeout       = 300

skip-name-resolve       = yes

max_connections         = 128

connect_timeout         = 5

wait_timeout            = 600

max_allowed_packet      = 128M

thread_cache_size       = 128

sort_buffer_size        = 16M

bulk_insert_buffer_size = 64M

tmp_table_size          = 128M

max_heap_table_size     = 128M

myisam_recover_options  = BACKUP

key_buffer_size         = 32M

table_open_cache        = 400

myisam_sort_buffer_size = 512M

concurrent_insert       = 2

read_buffer_size        = 32M

read_rnd_buffer_size    = 16M

query_cache_limit       = 32M

query_cache_size        = 256M

general_log_file        = /var/log/mysql/mysql.log

general_log             = 0

log-error               = /var/log/mysql/mysql_err.log

log_warnings            = 2

slow_query_log=0

slow_query_log_file     = /var/log/mysql/mariadb-slow.log

long_query_time         = 10

log_slow_verbosity      = query_plan

server-id               = 1

report_host             = mariadb-01

log_bin                 = /var/log/mysql/mariadb-bin

log_bin_index           = /var/log/mysql/mariadb-bin.index

event_scheduler         = on

sync_binlog             = 0

expire_logs_days        = 32

max_binlog_size         = 1024M

relay_log              = /var/log/mysql/relay-bin

relay_log_index        = /var/log/mysql/relay-bin.index

relay_log_info_file    = /var/log/mysql/relay-bin.info

log_slave_updates       = 1

slave_type_conversions  = ALL_NON_LOSSY

default_storage_engine  = InnoDB

innodb_log_file_size    = 512M

innodb_buffer_pool_size = 512M

innodb_log_buffer_size  = 64M

innodb_file_per_table   = 1

innodb_doublewrite      = 1

innodb_open_files       = 2400

innodb_io_capacity      = 2400

innodb_flush_method     = O_DIRECT

innodb_file_format      = barracuda

innodb-data-file-path   = ibdata1:10M:autoextend

[galera]

wsrep_on                = ON

wsrep_provider          = /usr/lib/libgalera_smm.so

wsrep_node_address      = 10.100.22.51

wsrep_cluster_address   = gcomm://10.100.22.54,10.100.22.53,10.100.22.52

wsrep_sst_auth          = root:MY_SECRET

wsrep_cluster_name      = mariadb-cluster

wsrep_node_name         = mariadb-01

binlog_format           = row

default_storage_engine  = InnoDB

innodb_autoinc_lock_mode= 2

wsrep_sst_method        = xtrabackup-v2

wsrep_retry_autocommit  = 3

wsrep_restart_slave     = On

wsrep_debug             = Off

wsrep_auto_increment_control = Off

wsrep_provider_options  = "gcache.size=5G;repl.commit_order=1;gmcast.segment=2;pc.recovery=TRUE;debug=no;gcs.fc_debug=0;pc.wait_prim=true"

wsrep_replicate_myisam  = On

max_allowed_packet      = 128M

bind-address            = 0.0.0.0

wsrep_slave_threads     = 32

innodb_flush_log_at_trx_commit = 1

[mysqldump]

quick

quote-names

max_allowed_packet      = 128M

[mysql]

[isamchk]

key_buffer              = 128M

!includedir /etc/mysql/conf.d/

Matyashov Andrey added a comment - 2016-04-04 20:27 My config: [client] max_allowed_packet = 128M port = 3306 socket = /var/run/mysqld/mysqld.sock [mysqld_safe] socket = /var/run/mysqld/mysqld.sock nice = 0 [mysqld] user = mysql pid-file = /var/run/mysqld/mysqld.pid socket = /var/run/mysqld/mysqld.sock port = 3306 basedir = /usr datadir = /var/lib/mysql tmpdir = /tmp lc_messages_dir = /usr/share/mysql lc_messages = en_US skip-external-locking = 1 collation-server = utf8_general_ci init-connect = 'SET NAMES utf8' character-set-server = utf8 bind-address = 0.0 . 0.0 net_write_timeout = 300 skip-name-resolve = yes max_connections = 128 connect_timeout = 5 wait_timeout = 600 max_allowed_packet = 128M thread_cache_size = 128 sort_buffer_size = 16M bulk_insert_buffer_size = 64M tmp_table_size = 128M max_heap_table_size = 128M myisam_recover_options = BACKUP key_buffer_size = 32M table_open_cache = 400 myisam_sort_buffer_size = 512M concurrent_insert = 2 read_buffer_size = 32M read_rnd_buffer_size = 16M query_cache_limit = 32M query_cache_size = 256M general_log_file = /var/log/mysql/mysql.log general_log = 0 log-error = /var/log/mysql/mysql_err.log log_warnings = 2 slow_query_log= 0 slow_query_log_file = /var/log/mysql/mariadb-slow.log long_query_time = 10 log_slow_verbosity = query_plan server-id = 1 report_host = mariadb- 01 log_bin = /var/log/mysql/mariadb-bin log_bin_index = /var/log/mysql/mariadb-bin.index event_scheduler = on sync_binlog = 0 expire_logs_days = 32 max_binlog_size = 1024M relay_log = /var/log/mysql/relay-bin relay_log_index = /var/log/mysql/relay-bin.index relay_log_info_file = /var/log/mysql/relay-bin.info log_slave_updates = 1 slave_type_conversions = ALL_NON_LOSSY default_storage_engine = InnoDB innodb_log_file_size = 512M innodb_buffer_pool_size = 512M innodb_log_buffer_size = 64M innodb_file_per_table = 1 innodb_doublewrite = 1 innodb_open_files = 2400 innodb_io_capacity = 2400 innodb_flush_method = O_DIRECT innodb_file_format = barracuda innodb-data-file-path = ibdata1:10M:autoextend [galera] wsrep_on = ON wsrep_provider = /usr/lib/libgalera_smm.so wsrep_node_address = 10.100 . 22.51 wsrep_cluster_address = gcomm: //10.100.22.54,10.100.22.53,10.100.22.52 wsrep_sst_auth = root:MY_SECRET wsrep_cluster_name = mariadb-cluster wsrep_node_name = mariadb- 01 binlog_format = row default_storage_engine = InnoDB innodb_autoinc_lock_mode= 2 wsrep_sst_method = xtrabackup-v2 wsrep_retry_autocommit = 3 wsrep_restart_slave = On wsrep_debug = Off wsrep_auto_increment_control = Off wsrep_provider_options = "gcache.size=5G;repl.commit_order=1;gmcast.segment=2;pc.recovery=TRUE;debug=no;gcs.fc_debug=0;pc.wait_prim=true" wsrep_replicate_myisam = On max_allowed_packet = 128M bind-address = 0.0 . 0.0 wsrep_slave_threads = 32 innodb_flush_log_at_trx_commit = 1 [mysqldump] quick quote-names max_allowed_packet = 128M [mysql] [isamchk] key_buffer = 128M !includedir /etc/mysql/conf.d/

Andrew Garner added a comment - 2016-04-22 16:31

I just ran into this by accident last night in a test environment. This seemed to be caused by the service timing out during startup and systemd sends a SIGTERM. SST finished up but mysqld immediately starts the shutdown process afterwards. This seems to shutdown the wsrep connections (I suppose in the shutdown thread) and when the startup bits finish trickling through to wsrep_create_appliers we hit this assertion. This seems to line up with the logs in this issue where we see an immediate shutdown starting after the initial startup.

Avoiding was fairly easy by setting a sufficiently high TimeoutStartSec for the mariadb service. I was able to reproduce this multiple times this morning by forcing SST and setting TimeoutStartSec rather low. If it is useful I can provide logs or other debugging info from my environment.

Andrew Garner added a comment - 2016-04-22 16:31 I just ran into this by accident last night in a test environment. This seemed to be caused by the service timing out during startup and systemd sends a SIGTERM. SST finished up but mysqld immediately starts the shutdown process afterwards. This seems to shutdown the wsrep connections (I suppose in the shutdown thread) and when the startup bits finish trickling through to wsrep_create_appliers we hit this assertion. This seems to line up with the logs in this issue where we see an immediate shutdown starting after the initial startup. Avoiding was fairly easy by setting a sufficiently high TimeoutStartSec for the mariadb service. I was able to reproduce this multiple times this morning by forcing SST and setting TimeoutStartSec rather low. If it is useful I can provide logs or other debugging info from my environment.

Maiavria added a comment - 2016-05-04 10:36

Thanks Andrew Garner, this fixed my problem too.

I think this should be fixed in the next release, because when starting a node of a cluster it may always need to synchronize with other nodes and thus take from 1 to 20 or even 40 minutes depending of the amount of data.

Maiavria added a comment - 2016-05-04 10:36 Thanks Andrew Garner, this fixed my problem too. I think this should be fixed in the next release, because when starting a node of a cluster it may always need to synchronize with other nodes and thus take from 1 to 20 or even 40 minutes depending of the amount of data.

Nirbhay Choubey (Inactive) added a comment - 2016-06-03 23:13

This could happen when systemd service timeout for mariadb is not sufficiently large.

Nirbhay Choubey (Inactive) added a comment - 2016-06-03 23:13 This could happen when systemd service timeout for mariadb is not sufficiently large.

Alexander Chernov added a comment - 2016-11-10 12:25

Just a heads up I am still seeing this behavior on 10.1.18-MariaDB-1~jessie (ticket is reporting 10.1.15 as fixed)

Alexander Chernov added a comment - 2016-11-10 12:25 Just a heads up I am still seeing this behavior on 10.1.18-MariaDB-1~jessie (ticket is reporting 10.1.15 as fixed)

Nirbhay Choubey (Inactive) added a comment - 2016-11-10 14:58

alekc This was closed as Won't Fix. You'll need to increase the TimeoutStartSec.

Nirbhay Choubey (Inactive) added a comment - 2016-11-10 14:58 alekc This was closed as Won't Fix. You'll need to increase the TimeoutStartSec.

People

Assignee:: Nirbhay Choubey (Inactive)

Reporter:: Matyashov Andrey

Votes:: 1 Vote for this issue

Watchers:: 7 Start watching this issue

Dates

Created:: 2016-03-21 21:04

Updated:: 2016-11-10 20:00

Resolved:: 2016-06-03 23:13

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server