[MXS-2864] Maxctrl Not Responding CentOS 7 Created: 2020-02-03  Updated: 2020-02-10  Resolved: 2020-02-10

Status: Closed
Project: MariaDB MaxScale
Component/s: maxctrl
Affects Version/s: 2.4.5
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Aaron Chamberlain Assignee: markus makela
Resolution: Not a Bug Votes: 0
Labels: network
Environment:

CentOS 7.7 x86_64 in VMWare
MariaDB 10.3.21


Attachments: File mariadb_server.cnf     File maxscale_server.cnf    

 Description   

I'm setting up a three-node MariaDB cluster with MaxScale as the proxy. A practice config on some local KVM machines worked without a hitch, so I went to spin up the production servers, and there I'm getting an error I can't make sense of. Any maxctrl command at all throws the same error:

ERROR
The requested URL could not be retrieved
The following error was encountered while trying to retrieve the URL: http://localhost:8989/v1/maxscale/modules/mariadbmon/
Connection to ::1 failed.
The system returned: (99) Cannot assign requested address
The remote host or network may be down. Please try the request again.

OK, so it sounds like something was using port 8989 before MaxScale. Let's check with lsof -i -P -n | grep 89:

maxscale 1117 maxscale   23u  IPv4  19765      0t0  TCP 127.0.0.1:8989 (LISTEN)

SELinux is set to Permissive and firewalld is off, both for testing.

Someone suggested it might be an IPv6 issue since it says connection to ::1, but I can't see what the difference would be between my test and production machines: both have the same default loopback adapter settings in `lo` and the same aliases in `/etc/hosts`.

Suggestions for debugging?

Some other steps I performed:
1) Maxscale logs, here's everything up until the listener claims to be started:

MariaDB MaxScale  /var/log/maxscale/maxscale.log  Sun Feb  2 21:31:23 2020
----------------------------------------------------------------------------
2020-02-02 21:31:23   notice : syslog logging is enabled.
2020-02-02 21:31:23   notice : maxlog logging is enabled.
2020-02-02 21:31:23   notice : Using up to 3.51GiB of memory for query classifier cache
2020-02-02 21:31:23   notice : Working directory: /var/log/maxscale
2020-02-02 21:31:23   notice : The collection of SQLite memory allocation statistics turned off.
2020-02-02 21:31:23   notice : Threading mode of SQLite set to Multi-thread.
2020-02-02 21:31:23   notice : MariaDB MaxScale 2.4.5 started (Commit: 61b8bbf7f63c38ca9c408674e66f3627a0b2192e)
2020-02-02 21:31:23   notice : MaxScale is running in process 8036
2020-02-02 21:31:23   notice : Configuration file: /etc/maxscale.cnf
2020-02-02 21:31:23   notice : Log directory: /var/log/maxscale
2020-02-02 21:31:23   notice : Data directory: /var/lib/maxscale
2020-02-02 21:31:23   notice : Module directory: /usr/lib64/maxscale
2020-02-02 21:31:23   notice : Service cache: /var/cache/maxscale
2020-02-02 21:31:23   notice : Worker message queue size: 1.00MiB
2020-02-02 21:31:23   notice : No query classifier specified, using default 'qc_sqlite'.
2020-02-02 21:31:23   notice : Loaded module qc_sqlite: V1.0.0 from /usr/lib64/maxscale/libqc_sqlite.so
2020-02-02 21:31:23   notice : Query classification results are cached and reused. Memory used per thread: 449.02MiB
2020-02-02 21:31:23   notice : The systemd watchdog is Enabled. Internal timeout = 30s
2020-02-02 21:31:23   notice : Loading /etc/maxscale.cnf.
2020-02-02 21:31:23   notice : /etc/maxscale.cnf.d does not exist, not reading.
2020-02-02 21:31:23   notice : Loaded module MariaDBClient: V1.1.0 from /usr/lib64/maxscale/libmariadbclient.so
2020-02-02 21:31:23   notice : [readwritesplit] Initializing statement-based read/write split router module.
2020-02-02 21:31:23   notice : Loaded module readwritesplit: V1.1.0 from /usr/lib64/maxscale/libreadwritesplit.so
2020-02-02 21:31:23   notice : [readconnroute] Initialise readconnroute router module.
2020-02-02 21:31:23   notice : Loaded module readconnroute: V2.0.0 from /usr/lib64/maxscale/libreadconnroute.so
2020-02-02 21:31:23   notice : [mariadbmon] Initialise the MariaDB Monitor module.
2020-02-02 21:31:23   notice : Loaded module mariadbmon: V1.5.0 from /usr/lib64/maxscale/libmariadbmon.so
2020-02-02 21:31:23   notice : Loaded module MariaDBBackend: V2.0.0 from /usr/lib64/maxscale/libmariadbbackend.so
2020-02-02 21:31:23   notice : Loaded module mariadbbackendauth: V1.0.0 from /usr/lib64/maxscale/libmariadbbackendauth.so
2020-02-02 21:31:23   notice : Using encrypted passwords. Encryption key: '/var/lib/maxscale/.secrets'.
2020-02-02 21:31:23   notice : Loaded module mariadbauth: V1.1.0 from /usr/lib64/maxscale/libmariadbauth.so
2020-02-02 21:31:23   notice : Started REST API on [127.0.0.1]:8989
2020-02-02 21:31:23   notice : MaxScale started with 8 worker threads, each with a stack size of 8388608 bytes.
2020-02-02 21:31:23   notice : Starting a total of 2 services...
2020-02-02 21:31:23   notice : Server 'server1' version: 10.3.21-MariaDB-log
2020-02-02 21:31:23   notice : Server 'server2' version: 10.3.21-MariaDB-log

2) curl localhost:8989/v1/maxscale returns the 99 error code as above. If I do curl 127.0.0.1:8989/v1/maxscale it returns a different 111 error.

<blockquote id="error">
<p><b>Connection to 127.0.0.1 failed.</b></p>
</blockquote>
 
<p id="sysmsg">The system returned: <i>(111) Connection refused</i></p>

3) tcpdump shows that absolutely nothing is coming across the wire, which is really weird. I tried `tcpdump -v -i ens192 'port 8989'` and `tcpdump -v -i lo 'port 8989'` and then both curl methods as above, and get the same result:

tcpdump: listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
0 packets captured
0 packets received by filter
0 packets dropped by kernel

Finally I have attached the MariaDB and Maxscale config files (with obfuscation) to aid in finding out what the issue is.



 Comments   
Comment by markus makela [ 2020-02-03 ]

Can you try adding admin_host=127.0.0.1 in the [maxscale] section to see if that resolves it? Maybe for some reason it fails to bind to the IPv6 loopback address, and setting it explicitly to the IPv4 one would solve it.

Comment by Aaron Chamberlain [ 2020-02-03 ]

Doesn't appear to have worked. Steps taken:

1) In /etc/maxscale.cnf set admin_host=127.0.0.1
2) systemctl restart maxscale
3) as root user maxctrl list servers
4) Receive the same error.

Since the error mentions it trying to query http://localhost:8989, I tried the steps above with admin_host=localhost. I received the same error but did get an interesting result from the lsof command above: the listener bound only to IPv6.

maxscale  22500 maxscale   51u  IPv6 9515525      0t0  TCP [::1]:8989 (LISTEN)

With admin_host=127.0.0.1 lsof returns

maxscale  23456 maxscale   51u  IPv4 9520707      0t0  TCP 127.0.0.1:8989 (LISTEN)

So is that the issue? MaxScale is trying to run on IPv4, but the query defaults to "localhost", which on my system is actually the IPv6 address? It's weird that curl also returns these same errors.
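
The two lsof results above differ only in the TYPE and NAME columns, and those are what show where the listener is bound. As a small sketch, assuming the `lsof -i -P -n` column layout quoted in this ticket (the function name `listeners` is made up for illustration), the two columns can be pulled out with awk:

```shell
# Print the address family and bound address of every listening socket
# found in `lsof -i -P -n` output read from standard input.
# Assumes the column layout seen in this ticket: field 5 is TYPE
# (IPv4/IPv6), field 9 is NAME (address:port), field 10 is "(LISTEN)".
listeners() {
    awk '$10 == "(LISTEN)" { print $5, $9 }'
}

# Example using the two lines quoted in this ticket.
printf '%s\n' \
    'maxscale  22500 maxscale   51u  IPv6 9515525      0t0  TCP [::1]:8989 (LISTEN)' \
    'maxscale  23456 maxscale   51u  IPv4 9520707      0t0  TCP 127.0.0.1:8989 (LISTEN)' \
    | listeners
```

On a live system the input would come from `lsof -i -P -n | listeners` instead of the quoted lines.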

Comment by Aaron Chamberlain [ 2020-02-03 ]

Following that logic, I checked to see if there was anything in /etc/hosts to arouse suspicion:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
 
127.0.1.1       <host_names>

So I commented out that second line and it didn't change anything either.
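
The file above maps the name localhost to both 127.0.0.1 and ::1, which is why a client asking for localhost may end up on either address. As a rough sketch, the addresses a hosts-format file assigns to a name can be listed as below. The helper name `addresses_for` is hypothetical, and it only reads the file contents; it does not reflect the rest of the system resolver.

```shell
# Print every address that hosts-format input (on standard input)
# maps to the name given as the first argument.
addresses_for() {
    awk -v name="$1" '
        {
            sub(/#.*/, "")                 # drop comments
            for (i = 2; i <= NF; i++)      # field 1 is the address
                if ($i == name) print $1
        }
    '
}

# Example using the two loopback lines quoted in this ticket.
printf '%s\n' \
    '127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4' \
    '::1         localhost localhost.localdomain localhost6 localhost6.localdomain6' \
    | addresses_for localhost
```

On a live system, `addresses_for localhost < /etc/hosts` gives the same view for the real file.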

Comment by markus makela [ 2020-02-03 ]

Does curl 127.0.0.1:8989/v1/maxscale with admin_host=127.0.0.1 work? That should explicitly use the IPv4 stack and if that doesn't work, something else is going on.

Comment by Aaron Chamberlain [ 2020-02-03 ]

With admin_host=127.0.0.1 I tried curl http://127.0.0.1:8989/v1/maxscale and this was the error (same as in the original post):

</head><body id=ERR_CONNECT_FAIL>
<div id="titles">
<h1>ERROR</h1>
<h2>The requested URL could not be retrieved</h2>
</div>
<hr>
 
<div id="content">
<p>The following error was encountered while trying to retrieve the URL: <a href="http://127.0.0.1:8989/v1/maxscale">http://127.0.0.1:8989/v1/maxscale</a></p>
 
<blockquote id="error">
<p><b>Connection to 127.0.0.1 failed.</b></p>
</blockquote>
 
<p id="sysmsg">The system returned: <i>(111) Connection refused</i></p>
 
<p>The remote host or network may be down. Please try the request again.</p>
 
<p>Your cache administrator is <a href="mailto:webmaster?subject=CacheErrorInfo%20-%20ERR_CONNECT_FAIL&amp;body=CacheHost%3A%204226df9a2fae%0D%0AErrPage%3A%20ERR_CONNECT_FAIL%0D%0AErr%3A%20(111)%20Connection%20refused%0D%0ATimeStamp%3A%20Mon,%2003%20Feb%202020%2012%3A45%3A17%20GMT%0D%0A%0D%0AClientIP%3A%2010.19.142.75%0D%0AServerIP%3A%20127.0.0.1%0D%0A%0D%0AHTTP%20Request%3A%0D%0AGET%20%2Fv1%2Fmaxscale%20HTTP%2F1.1%0AUser-Agent%3A%20curl%2F7.29.0%0D%0AAccept%3A%20*%2F*%0D%0AProxy-Connection%3A%20Keep-Alive%0D%0AHost%3A%20127.0.0.1%3A8989%0D%0A%0D%0A%0D%0A">webmaster</a>.</p>
 
<br>
</div>
 
<hr>
<div id="footer">
<p>Generated Mon, 03 Feb 2020 12:45:17 GMT by 4226df9a2fae (squid/3.5.27)</p>
<!-- ERR_CONNECT_FAIL -->
</div>
</body></html>

Comment by markus makela [ 2020-02-03 ]

That's definitely not MaxScale that created the error. Are you using some sort of HTTP proxy on port 8989?
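
The footer of the error page quoted earlier names squid, and its body carries the ERR_CONNECT_FAIL marker, so the response came from a proxy, not from the REST API. A minimal sketch of checking for this follows. The function names `is_squid_error_page` and `proxy_vars` are made up for illustration, and the markers are only the ones visible in this ticket.

```shell
# Succeeds (exit status 0) when the text on standard input looks like a
# squid-generated error page, judged by the markers seen in this ticket.
is_squid_error_page() {
    grep -Eq 'squid/[0-9]|ERR_CONNECT_FAIL'
}

# List proxy-related variables in the current environment, if any.
proxy_vars() {
    env | grep -iE '^(https?|ftp|all|no)_proxy='
}

# Example using the footer line from the page quoted in this ticket.
printf '%s\n' '<p>Generated Mon, 03 Feb 2020 12:45:17 GMT by 4226df9a2fae (squid/3.5.27)</p>' \
    | is_squid_error_page && echo "response came from a squid proxy"
```

On a live system, `curl -s http://127.0.0.1:8989/v1/maxscale | is_squid_error_page` applies the same check to a real response, and `proxy_vars` shows whether the shell has a proxy configured.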

Comment by Aaron Chamberlain [ 2020-02-03 ]

Bingo. I had the following in /etc/environment:

http_proxy=http://<ip>:3128
https_proxy=http://<ip>:3128

I removed that, shut down my ssh session and jumped back in (so the env variables could be loaded again) and now maxctrl list servers shows the expected output.

The question though is: how should I set up my proxy then? I do need those to be able to reach the internet, perform OS updates, etc.

Comment by markus makela [ 2020-02-03 ]

Sadly, that is not my area of expertise.
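
For the open question above, one common approach, which this ticket does not confirm, is to keep the proxy variables and add a `no_proxy` list so that loopback addresses are reached directly. The sketch below assumes the clients involved honor `no_proxy`/`NO_PROXY` (curl does; whether maxctrl's HTTP client does is an assumption). The host `proxy.example` is a placeholder for the address elided as <ip>, and `direct_get` is a made-up helper name.

```shell
# Keep the proxy for outbound traffic. proxy.example is a placeholder
# for the address elided as <ip> in this ticket.
export http_proxy="http://proxy.example:3128"
export https_proxy="http://proxy.example:3128"

# Hosts that should be reached directly instead of through the proxy.
export no_proxy="localhost,127.0.0.1,::1"
export NO_PROXY="$no_proxy"

# For a one-off check, curl can be told to skip the proxy entirely.
direct_get() {
    curl --noproxy '*' "$@"
}
```

In /etc/environment the same settings would be plain NAME=value lines without `export`. With MaxScale running, `direct_get http://127.0.0.1:8989/v1/maxscale` is a quick way to confirm that the REST API answers once the proxy is out of the path.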
