Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Incomplete
-
22.08.8
-
None
-
None
-
- 2 Maxscale 22.08.8 servers, `dbproxy1` and `dbproxy2` sharing a VIP using `keepalived` with binlog router enabled for external replication
- 3 MariaDB 10.6.12-7 servers, `db1`, `db2`, and `db3` with replication managed by Maxscale
-
MXS-SPRINT-219
Description
After placing a third server in a cluster into maintenance mode, the Maxscale process is terminated with signal 6 (Abort). This is not guaranteed to happen every time, and generally happens on systems with more connections.
This is a critical bug that the customer's management is well aware of; it has been happening frequently during patching for some time. They would like a custom build of Maxscale with more debugging for abort signals.
Configuration and log files from the time of the incidents for Maxscale, MariaDB, and keepalived for all systems are attached in a hidden comment. The Maxscale crashes happen around 2024-09-24 22:48:20.
Timeline of issue on 2024-09-24 (UTC):
1. db2 is put into maintenance mode, wait for connections to drain - 21:42:36
2. db2 has its OS patched and is rebooted - 22:02:03
3. db2 taken out of maintenance mode - 22:19:54
4. dbproxy1 has its OS patched and is rebooted - 22:23:39
5. db3 is put into maintenance mode in preparation for patching - 22:30:20
6. db1 and db2 start reporting errors reading communication packets - 22:36:40
7. dbproxy1 aborts with the log entries below; keepalived does not switch over - 22:48:20
2024-09-24 22:47:50 warning: Thread 'Worker-08' has not reported back in 30 seconds.
2024-09-24 22:48:20 warning: Thread 'Worker-10' has not reported back in 30 seconds.
2024-09-24 22:48:20 warning: Thread 'Worker-08' has not reported back in 30 seconds.
2024-09-24 22:48:20 warning: Thread 'Worker-26' has not reported back in 30 seconds.
2024-09-24 22:48:20 warning: Thread 'Worker-27' has not reported back in 30 seconds.
2024-09-24 22:48:20 warning: Thread 'Worker-28' has not reported back in 30 seconds.
8165099:alert : MaxScale 22.08.8 received fatal signal 6. Commit ID: 2f16a515391ac530a7280334dff5334f489d884e System name: Linux Release string: Ubuntu 20.04.6 LTS
26 ../sysdeps/unix/sysv/linux/read.c: No such file or directory.
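To make it easier to see which worker threads stalled before the abort, the warnings above can be grouped per thread. The following is only a rough sketch: the regular expression is inferred from the six sample lines quoted in this report, not from any documented Maxscale log format, and the names `STALL_RE`, `summarize_stalls`, and `SAMPLE` are hypothetical helpers made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed format, inferred only from the warning lines quoted in this report.
STALL_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) +warning: "
    r"Thread '([^']+)' has not reported back in (\d+) seconds\."
)


def summarize_stalls(lines):
    """Return {thread_name: [timestamps]} for every stall warning found."""
    stalls = defaultdict(list)
    for line in lines:
        match = STALL_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            stalls[match.group(2)].append(stamp)
    return dict(stalls)


# The warning lines from this report, used as sample input.
SAMPLE = """\
2024-09-24 22:47:50 warning: Thread 'Worker-08' has not reported back in 30 seconds.
2024-09-24 22:48:20 warning: Thread 'Worker-10' has not reported back in 30 seconds.
2024-09-24 22:48:20 warning: Thread 'Worker-08' has not reported back in 30 seconds.
2024-09-24 22:48:20 warning: Thread 'Worker-26' has not reported back in 30 seconds.
2024-09-24 22:48:20 warning: Thread 'Worker-27' has not reported back in 30 seconds.
2024-09-24 22:48:20 warning: Thread 'Worker-28' has not reported back in 30 seconds.
"""

if __name__ == "__main__":
    result = summarize_stalls(SAMPLE.splitlines())
    for thread in sorted(result):
        stamps = result[thread]
        # Print thread name, number of warnings, first and last warning time.
        print(thread, len(stamps), stamps[0].time(), stamps[-1].time())
```

On the sample, Worker-08 is the only thread reported twice (22:47:50 and 22:48:20), which suggests it was the first to stop responding; the other four workers are each reported once at 22:48:20.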
Running `systemctl restart maxscale` brings Maxscale back up without issue.
Issue Links
- is blocked by MXS-5363 GDB stacktraces may hang (Closed)