Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Won't Fix
-
10.1.26
-
Debian Jessie
Package Version 10.1.26+maria-1~jessie
Description
Hi there,
we are running a MariaDB Galera Cluster on Debian Jessie and we are sometimes experiencing the issue analyzed here: https://serverfault.com/a/715940/115361
I suspect it happens when systemd-journald is being restarted while everything else is running.
The immediately visible hint to that is when systemctl status mariadb.service on a node does not show any recent logging output anymore (although there should be some).
mysqld is still running normally, accepting queries and replicating to the other nodes, but an SST that tries to use such a defective node as donor will always fail with Will never receive state. Need to abort.
Restarting mariadb.service on the defective donor node fixes the issue until the logger process dies the next time.
mariadb-service-convert created the dropin configuration file to make mysqld log directly to syslog:
StandardOutput=syslog
StandardError=syslog
SyslogFacility=daemon
SyslogLevel=err
SyslogIdentifier=mysqld
I'd be happy to assist with more information to solve this issue soon.
Attachments
Activity
strace shows:
14229 write(2, "2017-08-15 10:47:26 140592914888448 [Note] WSREP: (5c7243d7, 'tcp://0.0.0.0:4567') connection established to a7a45725 tcp://127.0.0.1:4569\n", 139) = -1 EPIPE (Broken pipe)
|
Related external bugs :
https://bugs.freedesktop.org/show_bug.cgi?id=84923
https://github.com/systemd/systemd/issues/6620
https://github.com/codership/galera/issues/478
Workaround is to avoid syslog for MariaDB Server logging :
Make sure that Error log is enabled in configuration files and syslog is disabled, e.g.
[mysqld_safe]
|
skip_syslog
|
[mysqld]
|
log_error=/your/path/error.log
|
With configuration above original error is not shown and journal restart doesn't affect sst
Since the error is caused by external components (systemd and codership/galera), I am going to close this ticket , unless any objections.
Please contribute / comment to provided links above, unless any justification present that the problem may be fixed on Server part.
I am not sure at the moment why the problem reproduces only in Debian (including buster with the most recent systemd 234), while e.g. on Fedora 25/26 with systemd 233/231 and Centos 7 with older systemd 219-30 the worst behavior is just mariadb service becomes silent.
Anyway - it looks that nothing can be done on MariaDB Server side besides suggested workaround.
Hey @Andrii Nikitin,
sorry that I can only answer now!
Thanks for investigating the issue and even properly reporting it to upstream.
So as a followup to your last comment, we should add that systemctl restart systemd-jounald.service is different from systemctl stop systemd-jounald.service; systemctl start systemd-jounald.service (at least for Systemd versions >= 219), right?
Then I would suspect that the other possible workaround for Debian Jessie systems would be to install
systemd from jessie-backports (the fix that Lennart mentions in the freedesktop.org bug, where journald "stores" its file descriptors in PID 1 on restart is available since version 219, I think):
Installed: 215-17+deb8u7
|
Candidate: 215-17+deb8u7
|
Version table:
|
230-7~bpo8+2 0
|
600 http://ftp.uni-mainz.de/debian/ jessie-backports/main amd64 Packages
|
600 http://ftp.de.debian.org/debian/ jessie-backports/main amd64 Packages
|
*** 215-17+deb8u7 0
|
700 http://ftp.uni-mainz.de/debian/ jessie/main amd64 Packages
|
700 http://ftp.de.debian.org/debian/ jessie/main amd64 Packages
|
100 /var/lib/dpkg/status
|
I understand that there is not all too much that you can fix there from MariaDB Server PoV, but since the configuration is currently the default on a new installation of MariaDB on Debian, I think it would be nice if you could at least add a prominent warning to README or the config file (/etc/mysql/conf.d/mysqld_safe_syslog.cnf maybe) that this can happen.
Regards,
Moritz
Yes in my understanding 'restart' should be safer than 'stop&start' , at least in recent versions. (because https://github.com/systemd/systemd/issues/6620 should not trigger after 'restart' ).
But on practice I see that mariadb service becomes silent after restart even on systemd (230-7~bpo8+2) on jessie .
But good news is that I can confirm that the problem doesn't occur with my steps when using both 'stop' or 'restart' with your suggested workaround (i.e. SST is performed properly, just service becomes silent anyway):
sudo apt-get -t jessie-backports install systemd |
$ apt-cache policy systemd
|
systemd:
|
Installed: 215-17
|
Candidate: 215-17+deb8u7
|
Version table:
|
230-7~bpo8+2 0
|
100 http://ftp.debian.org/debian/ jessie-backports/main amd64 Packages
|
215-17+deb8u7 0
|
500 http://debian.osuosl.org/debian/ jessie/main amd64 Packages
|
*** 215-17 0
|
100 /var/lib/dpkg/status
|
$ sudo apt-get -t jessie-backports install systemd
|
...
|
$ git clone http://github.com/AndriiNikitin/mariadb-environs
|
...
|
$ ./replant.sh m0-system
|
$ sudo m0-system/install.sh 10.1
|
...
|
$ sudo vim /etc/mysql/my.cnf
|
$ ls /usr/lib/galera/
|
libgalera_smm.so
|
$ sudo systemctl stop mariadb
|
$ sudo galera_new_cluster
|
$ mysql -uroot -e 'show variables like "wsrep_on"'
|
+---------------+-------+
|
| Variable_name | Value |
|
+---------------+-------+
|
| wsrep_on | ON |
|
+---------------+-------+
|
$ ./replant.sh m2-system2
|
$ m2*/gen_cnf.sh
|
$ m2*/install_db.sh
|
$ cat << EOF >> m2*/mysqldextra.cnf
|
> binlog_format=ROW
|
> default-storage-engine=innodb
|
> innodb_autoinc_lock_mode=2
|
> bind-address=0.0.0.0
|
> wsrep_on=ON
|
> wsrep_provider=/usr/lib/galera/libgalera_smm.so
|
> wsrep_cluster_name=t1
|
> wsrep_cluster_address=gcomm://127.0.0.1:4567
|
> wsrep_sst_method=rsync
|
> wsrep_node_address=127.0.0.1:4569
|
> wsrep_node_name=127.0.0.1_4569
|
> EOF
|
$ mysqld_safe --defaults-file=m2-system2/my.cnf --user=$(whoami) &
|
[1] 27407
|
$ mysql --defaults-file=m2-system2/my.cnf -e 'show status like "wsrep_cluster_size"'
|
+--------------------+-------+
|
| Variable_name | Value |
|
+--------------------+-------+
|
| wsrep_cluster_size | 2 |
|
+--------------------+-------+
|
$ m2*/shutdown.sh
|
[1]+ Done mysqld_safe --defaults-file=m2-system2/my.cnf --user=$(whoami)
|
$ rm -rf m2*/dt/*
|
$ m2*/install_db.sh
|
$ sudo systemctl restart systemd-journald
|
$ sudo systemctl status mariadb
|
● mariadb.service - MariaDB database server
|
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
|
Drop-In: /etc/systemd/system/mariadb.service.d
|
└─migrated-from-my.cnf-settings.conf
|
Active: active (running) since Fri 2017-08-18 10:26:46 EDT; 6min ago
|
Process: 28088 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
|
Process: 28086 ExecStartPost=/etc/mysql/debian-start (code=exited, status=0/SUCCESS)
|
Process: 27909 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-env
|
Process: 27905 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
|
Process: 27902 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
|
Main PID: 28053 (mysqld)
|
Status: "Taking your SQL requests now..."
|
CGroup: /system.slice/mariadb.service
|
└─28053 /usr/sbin/mysqld --wsrep-new-cluster --wsrep_start_position=afe4dfc2-8420-11e7-b8c2-aa002665b1b9:0
|
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: group UUID = afe4dfc2-8420-11e7-b8c2-aa002665b1b9
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203415037696 [Note] WSREP: Flow-control interval: [16, 16]
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203716564736 [Note] WSREP: New cluster view: global state: afe4dfc2-8420-11e7-b8c
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203716564736 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203716564736 [Note] WSREP: REPL Protocols: 7 (3, 2)
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203716564736 [Note] WSREP: Assign initial position for certification: 0, protocol
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203473073920 [Note] WSREP: Service thread queue flushed.
|
Aug 18 10:31:27 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:27 140203423430400 [Note] WSREP: (436428c9, 'tcp://0.0.0.0:4567') connection establishe
|
Aug 18 10:31:27 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:27 140203423430400 [Warning] WSREP: discarding established (time wait) 4e987697 (tcp://
|
Aug 18 10:31:28 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:28 140203423430400 [Note] WSREP: cleaning up 4e987697 (tcp://127.0.0.1:4569)
|
$ mysqld_safe --defaults-file=m2-system2/my.cnf --user=$(whoami) &
|
[1] 28557
|
$ mysql --defaults-file=m2-system2/my.cnf -e 'show status like "wsrep_cluster_size"'
|
+--------------------+-------+
|
| Variable_name | Value |
|
+--------------------+-------+
|
| wsrep_cluster_size | 2 |
|
+--------------------+-------+
|
$ sudo systemctl status mariadb
|
● mariadb.service - MariaDB database server
|
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
|
Drop-In: /etc/systemd/system/mariadb.service.d
|
└─migrated-from-my.cnf-settings.conf
|
Active: active (running) since Fri 2017-08-18 10:26:46 EDT; 8min ago
|
Process: 28088 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
|
Process: 28086 ExecStartPost=/etc/mysql/debian-start (code=exited, status=0/SUCCESS)
|
Process: 27909 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-env
|
Process: 27905 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
|
Process: 27902 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
|
Main PID: 28053 (mysqld)
|
Status: "Taking your SQL requests now..."
|
CGroup: /system.slice/mariadb.service
|
└─28053 /usr/sbin/mysqld --wsrep-new-cluster --wsrep_start_position=afe4dfc2-8420-11e7-b8c2-aa002665b1b9:0
|
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: group UUID = afe4dfc2-8420-11e7-b8c2-aa002665b1b9
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203415037696 [Note] WSREP: Flow-control interval: [16, 16]
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203716564736 [Note] WSREP: New cluster view: global state: afe4dfc2-8420-11e7-b8c
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203716564736 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203716564736 [Note] WSREP: REPL Protocols: 7 (3, 2)
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203716564736 [Note] WSREP: Assign initial position for certification: 0, protocol
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203473073920 [Note] WSREP: Service thread queue flushed.
|
Aug 18 10:31:27 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:27 140203423430400 [Note] WSREP: (436428c9, 'tcp://0.0.0.0:4567') connection establishe
|
Aug 18 10:31:27 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:27 140203423430400 [Warning] WSREP: discarding established (time wait) 4e987697 (tcp://
|
Aug 18 10:31:28 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:28 140203423430400 [Note] WSREP: cleaning up 4e987697 (tcp://127.0.0.1:4569)
|
$ m2*/shutdown.sh
|
[1]+ Done mysqld_safe --defaults-file=m2-system2/my.cnf --user=$(whoami)
|
$ rm -rf m2*/dt/*
|
$ m2*/install_db.sh
|
$ sudo systemctl stop systemd-journald
|
$ mysqld_safe --defaults-file=m2-system2/my.cnf --user=$(whoami) &
|
[1] 29020
|
$ mysql --defaults-file=m2-system2/my.cnf -e 'show status like "wsrep_cluster_size"'
|
+--------------------+-------+
|
| Variable_name | Value |
|
+--------------------+-------+
|
| wsrep_cluster_size | 2 |
|
+--------------------+-------+
|
$ sudo systemctl status mariadb
|
● mariadb.service - MariaDB database server
|
Loaded: loaded (/lib/systemd/system/mariadb.service; enabled; vendor preset: enabled)
|
Drop-In: /etc/systemd/system/mariadb.service.d
|
└─migrated-from-my.cnf-settings.conf
|
Active: active (running) since Fri 2017-08-18 10:26:46 EDT; 11min ago
|
Process: 28088 ExecStartPost=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
|
Process: 28086 ExecStartPost=/etc/mysql/debian-start (code=exited, status=0/SUCCESS)
|
Process: 27909 ExecStartPre=/bin/sh -c [ ! -e /usr/bin/galera_recovery ] && VAR= || VAR=`/usr/bin/galera_recovery`; [ $? -eq 0 ] && systemctl set-envi
|
Process: 27905 ExecStartPre=/bin/sh -c systemctl unset-environment _WSREP_START_POSITION (code=exited, status=0/SUCCESS)
|
Process: 27902 ExecStartPre=/usr/bin/install -m 755 -o mysql -g root -d /var/run/mysqld (code=exited, status=0/SUCCESS)
|
Main PID: 28053 (mysqld)
|
Status: "Taking your SQL requests now..."
|
CGroup: /system.slice/mariadb.service
|
└─28053 /usr/sbin/mysqld --wsrep-new-cluster --wsrep_start_position=afe4dfc2-8420-11e7-b8c2-aa002665b1b9:0
|
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: group UUID = afe4dfc2-8420-11e7-b8c2-aa002665b1b9
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203415037696 [Note] WSREP: Flow-control interval: [16, 16]
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203716564736 [Note] WSREP: New cluster view: global state: afe4dfc2-8420-11e7-b8c2
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203716564736 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203716564736 [Note] WSREP: REPL Protocols: 7 (3, 2)
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203716564736 [Note] WSREP: Assign initial position for certification: 0, protocol
|
Aug 18 10:31:23 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:23 140203473073920 [Note] WSREP: Service thread queue flushed.
|
Aug 18 10:31:27 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:27 140203423430400 [Note] WSREP: (436428c9, 'tcp://0.0.0.0:4567') connection established
|
Aug 18 10:31:27 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:27 140203423430400 [Warning] WSREP: discarding established (time wait) 4e987697 (tcp://1
|
Aug 18 10:31:28 debian-jessie-amd64 mysqld[28053]: 2017-08-18 10:31:28 140203423430400 [Note] WSREP: cleaning up 4e987697 (tcp://127.0.0.1:4569)
|
The problem has been verified in 10.1 on Jessie and Stretch using steps below:
1. Setup Environs framework for easy local cluster management
2. (optional) Install 10.1, unless already installed
# danger !!! will wipe out datadir
# run these commands with sudo to wipe out current installation:
3. Configure new cluster
Make sure that following galera configuration is in my.cnf
[galera]
wsrep_on=ON
bind-address=0.0.0.0
wsrep_cluster_name=t1
binlog_format=row
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
wsrep_node_address=127.0.0.1
wsrep_node_name=n0
4. Start new cluster and make sure it is started properly:
# this must show ON
5. configure another local instance on 3308
binlog_format=ROW
default-storage-engine=innodb
innodb_autoinc_lock_mode=2
bind-address=0.0.0.0
wsrep_on=ON
wsrep_cluster_name=t1
wsrep_node_address=127.0.0.1:4569
wsrep_node_name=127.0.0.1_4569
EOF
6. Join new instance to the cluster, observe that it joined properly
# must show ON
# must show 2
7. Shutdown joiner, cleanup-reset it
8. (optional) repeat step 6 and 7 many times to confirm it joins every time without problem
9. restart journal service
10. try step 6 again, observe join failure and following error in m2-system2/dt/error.log