[MXS-3203] Persistent connections prevents Galera node to stop/restart Created: 2020-09-21  Updated: 2020-10-07  Resolved: 2020-10-07

Status: Closed
Project: MariaDB MaxScale
Component/s: galeramon, mariadbbackend
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: acsfer Assignee: markus makela
Resolution: Not a Bug Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-23605 galera_3nodes.galera_vote_rejoin_mysq... Closed
Sprint: MXS-SPRINT-116

 Description   
  • Maxscale 2.4.14
  • Backend: MariaDB 10.4.14

Enable persistent connections to backend server:

persistpoolmax=32
persistmaxtime=3600s

Issue a systemctl restart mariadb on the server. It hangs at "Stopping MariaDB server" (possibly until 'persistmaxtime' expires)?

Disable persistence (comment out parameters), server restart takes just a few seconds.



 Comments   
Comment by markus makela [ 2020-09-22 ]

Does the MariaDB server wait for open connections to close? If it does, this seems like something you have to configure on the MariaDB side of things or, as you said, reduce persistmaxtime to a more acceptable level.

Does the SHUTDOWN SQL command cause the same behavior to occur?

Comment by acsfer [ 2020-09-23 ]

# mysqladmin shutdown

 
2020-09-23 15:24:05 0 [Note] /usr/sbin/mysqld (initiated by: root[root] @ localhost []): Normal shutdown
2020-09-23 15:24:05 0 [Note] WSREP: Shutdown replication
2020-09-23 15:24:05 0 [Note] WSREP: Server status change synced -> disconnecting
2020-09-23 15:24:05 0 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-09-23 15:24:05 0 [Note] WSREP: Closing send monitor...
2020-09-23 15:24:05 0 [Note] WSREP: Closed send monitor.
2020-09-23 15:24:05 0 [Note] WSREP: gcomm: terminating thread
2020-09-23 15:24:05 0 [Note] WSREP: gcomm: joining thread
2020-09-23 15:24:05 0 [Note] WSREP: gcomm: closing backend
2020-09-23 15:24:05 0 [Note] WSREP: view(view_id(NON_PRIM,7176b6f5-97a8,171) memb {
        bd53962a-b45c,0
} joined {
} left {
} partitioned {
        7176b6f5-97a8,0
        a6560dab-8685,0
})
2020-09-23 15:24:05 0 [Note] WSREP: PC protocol downgrade 1 -> 0
2020-09-23 15:24:05 0 [Note] WSREP: view((empty))
2020-09-23 15:24:05 0 [Note] WSREP: gcomm: closed
2020-09-23 15:24:05 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-09-23 15:24:05 0 [Note] WSREP: Flow-control interval: [253, 256]
2020-09-23 15:24:05 0 [Note] WSREP: Received NON-PRIMARY.
2020-09-23 15:24:05 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 344783)
2020-09-23 15:24:05 0 [Note] WSREP: New SELF-LEAVE.
2020-09-23 15:24:05 0 [Note] WSREP: Flow-control interval: [253, 256]
2020-09-23 15:24:05 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2020-09-23 15:24:05 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 344783)
2020-09-23 15:24:05 0 [Note] WSREP: RECV thread exiting 0: Success
2020-09-23 15:24:05 2 [Note] WSREP: ================================================
View:
  id: 692f8bd3-faca-11ea-ae47-93bd97be4ac8:344783
  status: non-primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(1):
        0: bd53962a-fc37-11ea-b45c-cad97f2906c2, sql3
=================================================
2020-09-23 15:24:05 2 [Note] WSREP: Non-primary view
2020-09-23 15:24:05 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-09-23 15:24:05 0 [Note] WSREP: recv_thread() joined.
2020-09-23 15:24:05 0 [Note] WSREP: Closing replication queue.
2020-09-23 15:24:05 0 [Note] WSREP: Closing slave action queue.
2020-09-23 15:24:05 2 [Note] WSREP: ================================================
View:
  id: 692f8bd3-faca-11ea-ae47-93bd97be4ac8:344783
  status: non-primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: yes
  own_index: -1
  members(0):
=================================================
2020-09-23 15:24:05 2 [Note] WSREP: Non-primary view
2020-09-23 15:24:05 2 [Note] WSREP: Server status change disconnecting -> disconnected
2020-09-23 15:24:05 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-09-23 15:24:05 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-09-23 15:24:05 2 [Note] WSREP: Applier thread exiting ret: 0 thd: 2
2020-09-23 15:24:05 2 [Warning] Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)
2020-09-23 15:24:05 0 [Note] WSREP: killing local connection: 239119
2020-09-23 15:24:05 0 [Note] WSREP: killing local connection: 142392

15 minutes after, systemctl status mariadb outputs this

Sep 23 15:39:05 mysql3 systemd: mariadb.service: State 'stop-sigterm' timed out. Skipping SIGKILL.

and mysqladmin shutdown command got stuck, no more output at the logs, so issued a CTRL+C, it outputs this:

Warning;  Aborted waiting on pid file: '/var/run/mysqld/mysqld.pid' after 1617 seconds

Persmissions check:

ls -l /var/run/mysqld
total 4
-rw-rw---- 1 mysql mysql 5 Sep 21 20:24 mysqld.pid
srwxrwxrwx 1 mysql mysql 0 Sep 21 20:24 mysqld.sock

and finaly:

Sep 23 15:54:05 mysql3 systemd: mariadb.service: State 'stop-final-sigterm' timed out. Skipping SIGKILL. Entering failed mode.
Sep 23 15:54:05 mysql3 systemd: mariadb.service: Failed with result 'timeout'.

Any attempt to start it will fail( before issue a kill -9 PID):

Sep 23 15:58:01 mysql3 systemd: mariadb.service: Found left-over process 7463 (mysqld) in control group while starting unit. Ignoring.
Sep 23 15:58:01 mysql3 systemd: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 23 15:58:01 mysql3 systemd: mariadb.service: Will not start SendSIGKILL=no service of type KillMode=control-group or mixed while processes exist
Sep 23 15:58:01 mysql3 systemd: mariadb.service: Failed to run 'start-pre' task: Device or resource busy
Sep 23 15:58:01 mysql3 systemd: mariadb.service: Failed with result 'resources'.
Sep 23 15:58:01 mysql3 systemd: Failed to start MariaDB 10.4.14 database server.

Comment by markus makela [ 2020-09-28 ]

Does this only happen with a Galera cluster? Can you reproduce it with a single server?

Comment by markus makela [ 2020-09-29 ]

Tested with Galera from 10.4 and it worked:

2020-09-30  7:59:50 0 [Note] mysqld (initiated by: maxuser[maxuser] @  [172.19.0.1]): Normal shutdown
2020-09-30  7:59:50 0 [Note] WSREP: Shutdown replication
2020-09-30  7:59:50 0 [Note] WSREP: Server status change synced -> disconnecting
2020-09-30  7:59:50 0 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-09-30  7:59:50 0 [Note] WSREP: Closing send monitor...
2020-09-30  7:59:50 0 [Note] WSREP: Closed send monitor.
2020-09-30  7:59:50 0 [Note] WSREP: gcomm: terminating thread
2020-09-30  7:59:50 0 [Note] WSREP: gcomm: joining thread
2020-09-30  7:59:50 0 [Note] WSREP: gcomm: closing backend
2020-09-30  7:59:50 0 [Note] WSREP: view(view_id(NON_PRIM,272c5f27-b1cd,8) memb {
	3bfe6ae2-9651,0
} joined {
} left {
} partitioned {
	272c5f27-b1cd,0
	4ce07ee2-b57f,0
	5b5db3eb-acc2,0
})
2020-09-30  7:59:50 0 [Note] WSREP: PC protocol downgrade 1 -> 0
2020-09-30  7:59:50 0 [Note] WSREP: view((empty))
2020-09-30  7:59:50 0 [Note] WSREP: gcomm: closed
2020-09-30  7:59:50 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2020-09-30  7:59:50 0 [Note] WSREP: Flow-control interval: [16, 16]
2020-09-30  7:59:50 0 [Note] WSREP: Received NON-PRIMARY.
2020-09-30  7:59:50 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 33)
2020-09-30  7:59:50 0 [Note] WSREP: New SELF-LEAVE.
2020-09-30  7:59:50 2 [Note] WSREP: ================================================
View:
  id: 255e8e5e-02f1-11eb-b437-0ba5243c6607:33
  status: non-primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(1):
	0: 3bfe6ae2-02f1-11eb-9651-ae29b226513a, galera1
=================================================
2020-09-30  7:59:50 0 [Note] WSREP: Flow-control interval: [0, 0]
2020-09-30  7:59:50 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2020-09-30  7:59:50 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 33)
2020-09-30  7:59:50 2 [Note] WSREP: Non-primary view
2020-09-30  7:59:50 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-09-30  7:59:50 0 [Note] WSREP: RECV thread exiting 0: Success
2020-09-30  7:59:50 0 [Note] WSREP: recv_thread() joined.
2020-09-30  7:59:50 0 [Note] WSREP: Closing replication queue.
2020-09-30  7:59:50 0 [Note] WSREP: Closing slave action queue.
2020-09-30  7:59:50 2 [Note] WSREP: ================================================
View:
  id: 255e8e5e-02f1-11eb-b437-0ba5243c6607:33
  status: non-primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: yes
  own_index: -1
  members(0):
=================================================
2020-09-30  7:59:50 2 [Note] WSREP: Non-primary view
2020-09-30  7:59:50 2 [Note] WSREP: Server status change disconnecting -> disconnected
2020-09-30  7:59:50 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-09-30  7:59:50 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2020-09-30  7:59:50 2 [Note] WSREP: Applier thread exiting ret: 0 thd: 2
2020-09-30  7:59:50 2 [Warning] Aborted connection 2 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)
2020-09-30  7:59:50 0 [Note] WSREP: killing local connection: 429
2020-09-30  7:59:50 0 [Note] WSREP: killing local connection: 427
2020-09-30  7:59:50 0 [Note] WSREP: killing local connection: 424
2020-09-30  7:59:50 0 [Note] WSREP: killing local connection: 421
2020-09-30  7:59:50 0 [Note] WSREP: killing local connection: 415
2020-09-30  7:59:50 0 [Note] WSREP: killing local connection: 414
2020-09-30  7:59:50 0 [Note] WSREP: killing local connection: 401
2020-09-30  7:59:50 0 [Note] WSREP: killing local connection: 400
2020-09-30  7:59:50 0 [Note] WSREP: killing local connection: 333
2020-09-30  7:59:51 1 [Note] WSREP: rollbacker thread exiting 1
2020-09-30  7:59:51 1 [Warning] Aborted connection 1 to db: 'unconnected' user: 'unauthenticated' host: '' (This connection closed normally without authentication)
2020-09-30  7:59:51 0 [Note] Event Scheduler: Purging the queue. 0 events
2020-09-30  7:59:51 0 [Note] InnoDB: FTS optimize thread exiting.
2020-09-30  7:59:51 0 [Note] WSREP: dtor state: CLOSED
2020-09-30  7:59:51 0 [Note] WSREP: MemPool(TrxHandleSlave): hit ratio: 0, misses: 18, in use: 18, in pool: 0
2020-09-30  7:59:51 0 [Note] WSREP: mon: entered 14 oooe fraction 0 oool fraction 0
2020-09-30  7:59:52 0 [Note] WSREP: mon: entered 14 oooe fraction 0 oool fraction 0
2020-09-30  7:59:52 0 [Note] WSREP: mon: entered 27 oooe fraction 0 oool fraction 0.037037
2020-09-30  7:59:52 0 [Note] WSREP: cert index usage at exit 2
2020-09-30  7:59:52 0 [Note] WSREP: cert trx map usage at exit 18
2020-09-30  7:59:52 0 [Note] WSREP: deps set usage at exit 0
2020-09-30  7:59:52 0 [Note] WSREP: avg deps dist 1
2020-09-30  7:59:52 0 [Note] WSREP: avg cert interval 4.61111
2020-09-30  7:59:52 0 [Note] WSREP: cert index size 2
2020-09-30  7:59:52 0 [Note] WSREP: Service thread queue flushed.
2020-09-30  7:59:52 0 [Note] WSREP: wsdb trx map usage 0 conn query map usage 0
2020-09-30  7:59:52 0 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0.5, misses: 1, in use: 0, in pool: 1
2020-09-30  7:59:52 0 [Note] WSREP: Shifting CLOSED -> DESTROYED (TO: 33)
2020-09-30  7:59:52 0 [Note] WSREP: Flushing memory map to disk...
2020-09-30  7:59:52 0 [Note] InnoDB: Starting shutdown...
2020-09-30  7:59:52 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool
2020-09-30  7:59:52 0 [Note] InnoDB: Buffer pool(s) dump completed at 200930  7:59:52
2020-09-30  7:59:53 0 [Note] InnoDB: Shutdown completed; log sequence number 127561; transaction id 132
2020-09-30  7:59:53 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2020-09-30  7:59:53 0 [Note] mysqld: Shutdown complete

Comment by markus makela [ 2020-09-30 ]

This seems like a problem in the MariaDB shutdown. I'll leave this open until MDEV-23605 is resolved as it might be related.

acs-ferreira Meanwhile, please try to reproduce this with a non-Galera setup and see if the behavior changes. If you cannot reproduce this without Galera, the problem is likely inside Galera and not MaxScale.

Comment by acsfer [ 2020-09-30 ]

Will do @markus since i have some time to spin up a single instance.

Comment by acsfer [ 2020-10-07 ]

I can only reproduce it in a Galera cluster. mysqladmin shutdown works well on a single instance.

Comment by markus makela [ 2020-10-07 ]

OK, I'll close this issue then. I'd recommend searching for a Galera issue and reporting this there as well. If you can't find a MDEV issue for it, file a new one.

Generated at Thu Feb 08 04:19:42 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.