[MXS-1705] Maxscale 2.2.2 crashes on startup with CentOS 7 Created: 2018-03-06  Updated: 2018-03-08  Resolved: 2018-03-08

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 2.2.2
Fix Version/s: 2.2.4

Type: Bug Priority: Major
Reporter: jerry wiersma Assignee: Johan Wikman
Resolution: Fixed Votes: 0
Labels: None


 Description   

The Maxscale service crashes on start-up on a fresh CentOS 7 host.

The latest version, MaxScale 2.2.2, was downloaded as an RPM and installed through Yum on a freshly provisioned CentOS 7 host. Starting it fails with the following errors:
2018-03-06 14:35:55 error : Could not create pipe for worker: Invalid argument
2018-03-06 14:35:55 error : Could not create message queue for worker.
2018-03-06 14:35:55 error : Failed to initialize workers.
2018-03-06 14:35:55 info : Starting log flushing to disk.
2018-03-06 14:35:56 alert : Fatal: MaxScale 2.2.2 received fatal signal 11. Attempting backtrace.
2018-03-06 14:35:56 alert : Commit ID: d1465e03c3eef6c8758a80ac30d84f75307a6cbd System name: Linux Release string: CentOS Linux release 7.3.1611 (Core)
2018-03-06 14:35:56 alert : /usr/bin/maxscale() [0x4079d1]: ??:0
2018-03-06 14:35:56 alert : /lib64/libpthread.so.0(+0xf370) [0x7f2c3fd99370]: sigaction.c:?
2018-03-06 14:35:56 alert : /usr/lib64/maxscale/libmaxscale-common.so.1.0.0(_ZN8maxscale6Worker12shutdown_allEv+0x22) [0x7f2c402aa822]: /home/ec2-user/MaxScale/server/core/worker.cc:939 (discriminator 2)
2018-03-06 14:35:56 alert : /usr/bin/maxscale(maxscale_shutdown+0x22) [0x407e42]: /home/ec2-user/MaxScale/server/core/gateway.cc:2354
2018-03-06 14:35:56 alert : /usr/bin/maxscale() [0x407eef]: ??:0
2018-03-06 14:35:56 alert : /lib64/libpthread.so.0(+0xf370) [0x7f2c3fd99370]: sigaction.c:?
2018-03-06 14:35:56 alert : /lib64/libpthread.so.0(write+0x2d) [0x7f2c3fd9843d]: :?
2018-03-06 14:35:56 alert : /usr/bin/maxscale(_Z21write_child_exit_codeii+0x1a) [0x408ada]: /home/ec2-user/MaxScale/server/core/gateway.cc:2970
2018-03-06 14:35:56 alert : /usr/bin/maxscale(main+0xadc) [0x404e7c]: /home/ec2-user/MaxScale/server/core/gateway.cc:2328
2018-03-06 14:35:56 alert : /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f2c3dc9fb35]: ??:?
2018-03-06 14:35:56 alert : /usr/bin/maxscale() [0x4069f5]: ??:0
2018-03-06 14:35:56 info : Starting log flushing to disk.

This happens both with the default config template and with our custom settings.



 Comments   
Comment by Johan Wikman [ 2018-03-06 ]

Please ensure that warnings are enabled:

log_warnings=1

Then rerun MaxScale and provide the complete log.

Comment by jerry wiersma [ 2018-03-06 ]

To add to this issue:
The problem occurs when MaxScale is run within a CentOS 7 container. We just tested it inside VirtualBox: the service does run when started within a virtual machine rather than a container.

Our config already includes the log_warning=1 value. See:
[maxscale]
log_debug=1
log_info=1
threads=auto

[MySQL1]
type=server
address=x.x.x.177
port=3306
protocol=MySQLBackend

[MySQL2]
type=server
address=x.x.x.178
port=3306
protocol=MySQLBackend

[Sharded-Service]
type=service
router=schemarouter
servers=MySQL1,MySQL2
user=proxy-router
passwd=password

[Sharded-Service-Listener]
type=listener
service=Sharded-Service
protocol=MySQLClient
port=4006
address=0.0.0.0

[MySQL-Monitor]
type=monitor
module=mmmon
servers=MySQL1,MySQL2
user=proxy-monitor
passwd=password
monitor_interval=2000

[MaxAdmin-Service]
type=service
router=cli

[MaxAdmin-Listener]
type=listener
service=MaxAdmin-Service
protocol=maxscaled
socket=default

Comment by Johan Wikman [ 2018-03-06 ]

OK, a CentOS 7 container is quite different from a CentOS 7 machine.

But could you provide the full log? The one you initially provided seems to be just the end of the log.

Comment by markus makela [ 2018-03-07 ]

What sort of a container is it?

Comment by Johan Wikman [ 2018-03-07 ]

jerry-wiersma
The reason for this error

2018-03-06 14:35:55 error : Could not create pipe for worker: Invalid argument

is known and can easily be dealt with. But please provide the full log nonetheless.

Comment by Johan Wikman [ 2018-03-07 ]

There were two problems. We would like to use O_DIRECT with pipes that are used internally, but that flag is supported in conjunction with pipes only from kernel version 3.4 onward. We dealt with this by checking the kernel version and using O_DIRECT only if the version indicated that it is supported. Apparently, in that CentOS 7 container the check indicates that O_DIRECT is supported, while in reality it is not. That caused the worker creation to fail.

In addition, there was a bug that led to a crash if a signal was received after worker creation had failed. The code recorded the number of workers before they had actually been created, so the shutdown processing triggered by the signal accessed workers that did not exist.
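To make the ordering issue concrete, here is a minimal C++ sketch under stated assumptions: `Worker`, `init_workers()`, `shutdown_all()`, the `WorkerFactory` type and the globals are hypothetical stand-ins, not MaxScale's actual worker.cc code. The count that shutdown processing iterates over is published only after every worker exists, so a signal arriving after a failed initialization finds zero workers to shut down.

```cpp
// Hypothetical sketch; none of these names come from MaxScale itself.
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Worker {
    bool stopped = false;              // set by shutdown(), lets callers verify it ran
    void shutdown() { stopped = true; }
};

// Hypothetical factory; returning nullptr simulates a failed creation
// (for example, a pipe that could not be created).
using WorkerFactory = std::unique_ptr<Worker> (*)(std::size_t index);

static std::vector<std::unique_ptr<Worker>> g_workers;
static std::size_t g_n_workers = 0;    // how many workers shutdown_all() may touch

bool init_workers(std::size_t n, WorkerFactory create)
{
    std::vector<std::unique_ptr<Worker>> created;
    for (std::size_t i = 0; i < n; ++i) {
        std::unique_ptr<Worker> w = create(i);
        if (!w) {
            // Nothing has been published yet, so shutdown_all() stays a no-op.
            return false;
        }
        created.push_back(std::move(w));
    }
    g_workers = std::move(created);
    g_n_workers = n;                   // recorded only after every worker exists
    return true;
}

// May be called from shutdown processing at any time, even after a failed init.
void shutdown_all()
{
    for (std::size_t i = 0; i < g_n_workers; ++i) {
        assert(g_workers[i] != nullptr); // every counted slot holds a real worker
        g_workers[i]->shutdown();
    }
}
```

The buggy variant would assign `g_n_workers = n` before the loop; a failure part-way through would then leave `shutdown_all()` indexing slots that were never filled.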

The first problem has been fixed by no longer trying to determine beforehand whether O_DIRECT is supported. Instead, the code simply uses it, and if pipe creation fails with EINVAL, it makes a second attempt without it. The second problem has been solved by recording the number of workers only after they have actually been created.
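The O_DIRECT fallback can be sketched as follows. This is a minimal illustration, not MaxScale's implementation: the helper name `create_worker_pipe()` and the choice of passing no flags other than O_DIRECT are assumptions. It relies on the Linux-specific `pipe2()` call, which fails with EINVAL when the kernel rejects one of the requested flags.

```cpp
// Linux-only sketch; pipe2() and O_DIRECT on pipes need _GNU_SOURCE.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: creates a pipe, preferring packet mode (O_DIRECT).
// On success returns 0 and stores in *packet_mode whether O_DIRECT is in
// effect; on failure returns -1 with errno set by pipe2().
int create_worker_pipe(int fds[2], bool* packet_mode)
{
    // First attempt: simply ask for O_DIRECT instead of guessing from the
    // kernel version whether it is supported.
    if (pipe2(fds, O_DIRECT) == 0)
    {
        *packet_mode = true;
        return 0;
    }

    // Only EINVAL means the flag itself was rejected; any other error
    // (e.g. EMFILE) is a genuine failure and is reported as-is.
    if (errno != EINVAL)
    {
        return -1;
    }

    // Second attempt without O_DIRECT.
    if (pipe2(fds, 0) == 0)
    {
        *packet_mode = false;
        return 0;
    }
    return -1;
}
```

Trying the flag and checking for EINVAL avoids relying on a kernel version check, which in the reporter's container indicated support that was not actually there.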

Generated at Thu Feb 08 04:08:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.