[MDEV-23981] Persistent connections prevent Galera node from stopping/restarting Created: 2020-10-19  Updated: 2021-12-23

Status: Open
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.4.15
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: acsfer Assignee: Ramesh Sivaraman
Resolution: Unresolved Votes: 0
Labels: None


Description

MaxScale 2.4.13 (readwritesplit)
Nodes: MariaDB 10.4.15

Enable persistent connections to the backend servers in MaxScale:

persistpoolmax=16
persistmaxtime=30s

Then shut down a node:

# mysqladmin shutdown

The server error log shows:

[Note] /usr/sbin/mysqld (initiated by: root[root] @ localhost []): Normal shutdown
[Note] WSREP: Shutdown replication
[Note] WSREP: Server status change synced -> disconnecting
[Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
[Note] WSREP: Closing send monitor...
[Note] WSREP: Closed send monitor.
[Note] WSREP: gcomm: terminating thread
[Note] WSREP: gcomm: joining thread
[Note] WSREP: gcomm: closing backend
[Note] WSREP: view(view_id(NON_PRIM,7176b6f5-97a8,171) memb {
        bd53962a-b45c,0
} joined {
} left {
} partitioned {
        7176b6f5-97a8,0
        a6560dab-8685,0
})
[Note] WSREP: PC protocol downgrade 1 -> 0
[Note] WSREP: view((empty))
[Note] WSREP: gcomm: closed
[Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
[Note] WSREP: Flow-control interval: [253, 256]
[Note] WSREP: Received NON-PRIMARY.
[Note] WSREP: Shifting SYNCED -> OPEN (TO: 344783)
[Note] WSREP: New SELF-LEAVE.
[Note] WSREP: Flow-control interval: [253, 256]
[Note] WSREP: Received SELF-LEAVE. Closing connection.
[Note] WSREP: Shifting OPEN -> CLOSED (TO: 344783)
[Note] WSREP: RECV thread exiting 0: Success
[Note] WSREP: ================================================
View:
  id: 692f8bd3-faca-11ea-ae47-93bd97be4ac8:344783
  status: non-primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(1):
        0: bd53962a-fc37-11ea-b45c-cad97f2906c2, sql3
=================================================
[Note] WSREP: Non-primary view
[Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
[Note] WSREP: recv_thread() joined.
[Note] WSREP: Closing replication queue.
[Note] WSREP: Closing slave action queue.
[Note] WSREP: ================================================
View:
  id: 692f8bd3-faca-11ea-ae47-93bd97be4ac8:344783
  status: non-primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: yes
  own_index: -1
  members(0):
=================================================
[Note] WSREP: Non-primary view
[Note] WSREP: Server status change disconnecting -> disconnected
[Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
[Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
[Note] WSREP: Applier thread exiting ret: 0 thd: 2
[Warning] Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)
[Note] WSREP: killing local connection: 239119
[Note] WSREP: killing local connection: 142392
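
To see at a glance where the shutdown stalls, the log excerpt above can be scanned for the WSREP state transitions and the connection IDs the server tried to kill. The following is a minimal Python sketch; the function name summarize_shutdown_log and the regular expressions are illustrative assumptions written against the message formats quoted in this report, not part of MariaDB or Galera.

```python
import re

# Patterns written against the log lines quoted in this report (assumption:
# other MariaDB/Galera versions may word these messages differently).
STATE_RE = re.compile(r"WSREP: Shifting (\w+) -> (\w+)")
KILL_RE = re.compile(r"WSREP: killing local connection: (\d+)")


def summarize_shutdown_log(lines):
    """Return (state_transitions, killed_connection_ids) found in log lines."""
    transitions = []
    killed = []
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            transitions.append((m.group(1), m.group(2)))
            continue
        m = KILL_RE.search(line)
        if m:
            killed.append(int(m.group(1)))
    return transitions, killed


# Sample lines copied from the log excerpt in this report.
sample = [
    "[Note] WSREP: Shifting SYNCED -> OPEN (TO: 344783)",
    "[Note] WSREP: Shifting OPEN -> CLOSED (TO: 344783)",
    "[Note] WSREP: killing local connection: 239119",
    "[Note] WSREP: killing local connection: 142392",
]
transitions, killed = summarize_shutdown_log(sample)
print(transitions)
print(killed)
```

In this report the "killing local connection" entries are the last lines logged, so the returned IDs are the connections worth comparing against the sessions MaxScale holds in its persistent pool.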

15 minutes later, systemctl status mariadb shows:

systemd: mariadb.service: State 'stop-sigterm' timed out. Skipping SIGKILL.

The mysqladmin shutdown command also hung, with no further output in the logs. After I interrupted it with CTRL+C, it printed:

Warning;  Aborted waiting on pid file: '/var/run/mysqld/mysqld.pid' after 1617 seconds

Permissions check:

ls -l /var/run/mysqld
total 4
-rw-rw---- 1 mysql mysql 5 Sep 21 20:24 mysqld.pid
srwxrwxrwx 1 mysql mysql 0 Sep 21 20:24 mysqld.sock

And finally:

systemd: mariadb.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
systemd: mariadb.service: Failed with result 'timeout'.

Any attempt to start the service fails until the process is killed with kill -9 PID:

systemd: mariadb.service: Found left-over process 7463 (mysqld) in control group while starting unit. Ignoring.
systemd: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
systemd: mariadb.service: Will not start SendSIGKILL=no service of type KillMode=control-group or mixed while processes exist
systemd: mariadb.service: Failed to run 'start-pre' task: Device or resource busy
systemd: mariadb.service: Failed with result 'resources'.
systemd: Failed to start MariaDB 10.4.15 database server.
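
The systemd messages above name the process that blocks the restart. A small Python sketch for pulling that PID out of journal text and checking whether the process still exists follows; leftover_pids and is_alive are hypothetical helper names, and the pattern is based only on the message format quoted here.

```python
import os
import re

# Assumption: the message format matches the systemd line quoted in this report.
LEFTOVER_RE = re.compile(r"Found left-over process (\d+) \((\w+)\)")


def leftover_pids(lines):
    """Return (pid, process_name) pairs reported as left over by systemd."""
    found = []
    for line in lines:
        m = LEFTOVER_RE.search(line)
        if m:
            found.append((int(m.group(1)), m.group(2)))
    return found


def is_alive(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


# Sample line copied from the systemd output in this report.
sample = [
    "systemd: mariadb.service: Found left-over process 7463 (mysqld) "
    "in control group while starting unit. Ignoring.",
]
print(leftover_pids(sample))
```

If is_alive reports the PID as still present, the service cannot start until that process exits, which in this report only happened after kill -9.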

The behavior is the same on all 5 nodes of our cluster, regardless of whether MaxScale has assigned them as master or slave.

