[MXS-3317] Nginx errors while streaming to Maxscale Created: 2020-12-03  Updated: 2021-08-25  Resolved: 2021-08-25

Status: Closed
Project: MariaDB MaxScale
Component/s: N/A
Affects Version/s: 2.4.14
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: acsfer Assignee: markus makela
Resolution: Cannot Reproduce Votes: 0
Labels: None


 Description   

I have a simple stream block proxying MySQL TCP traffic to MaxScale instances. The second instance acts as a failover only, with a configuration as simple as:

stream {    
    upstream maxscale {
        zone upstream_maxscale 64k;
        server 10.1.0.11:3307;
        server 10.1.0.12:3307 backup;
    }
 
    server {
        listen 3307;
        proxy_pass maxscale;
    }
}

When connections are low (<30), everything goes fine. But when connections exceed 30, the nginx error log keeps complaining about something I don't know how to debug, as only nginx complains about it, not MaxScale:

recv() failed (104: Connection reset by peer) while proxying and reading from upstream,
bytes from/to client:15738/64316, bytes from/to upstream:64316/15738

I've tried playing with options like `reuseport`, `worker_connections`, and `so_keepalive` in the nginx configuration, but with no luck.
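Roughly what I experimented with looks like this (values are illustrative, not my exact production settings; `proxy_timeout` is one more stream-module directive I considered, added here as a guess):

```nginx
# Sketch of the nginx tuning experiments; values are illustrative.
events {
    worker_connections 4096;   # raised from the default
}

stream {
    upstream maxscale {
        zone upstream_maxscale 64k;
        server 10.1.0.11:3307;
        server 10.1.0.12:3307 backup;
    }

    server {
        # reuseport and so_keepalive are listen-time options
        listen 3307 reuseport so_keepalive=on;
        proxy_pass maxscale;
        # proxy_timeout closes idle sessions (default 10m);
        # worth raising when client connections are long-lived
        proxy_timeout 30m;
    }
}
```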

Here is the MaxScale 2.4 listener:

# Listener
[listener-rw]
type=listener
service=readwritesplit
protocol=MariaDBClient
address=10.1.0.11
port=3307
ssl=required
ssl_ca_cert=/var/lib/maxscale/ssl/ca-cert.pem
ssl_cert=/var/lib/maxscale/ssl/server.pem
ssl_key=/var/lib/maxscale/ssl/server.key
ssl_version=MAX
 
# Service
 
[readwritesplit]
type=service
router=readwritesplit
servers=sql1,sql2,sql3
user=maxscale
password=324F74A347291B3BE79956A
enable_root_user=1
max_sescmd_history=150
max_slave_connections=100%
lazy_connect=true
slave_selection_criteria=LEAST_CURRENT_OPERATIONS
optimistic_trx=true
connection_keepalive=300
master_failure_mode=fail_on_write

The MaxScale log remains empty (other than startup notice messages), so only nginx is complaining about this.



 Comments   
Comment by markus makela [ 2020-12-03 ]

Have you seen any errors in your application when this happens?

Comment by acsfer [ 2020-12-03 ]

No, but I'm not writing anything, only reading (GET requests).
The same nginx instance is proxying other services (Redis and Memcached), and only the MaxScale proxy is producing these errors.

Comment by markus makela [ 2020-12-03 ]

OK, this definitely suggests that this is caused by something in MaxScale.

If you can reproduce this problem in a non-production setup, you could turn on log_info to see if there's anything logged there.
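For MaxScale 2.4 that would be a global setting along these lines (a minimal sketch of the relevant `maxscale.cnf` section, not your full configuration):

```ini
# Minimal sketch: enable info-level logging in maxscale.cnf
# (MaxScale 2.4); restart MaxScale after changing it.
[maxscale]
log_info=1
```

If I remember right, it can also be toggled at runtime with `maxctrl alter maxscale log_info true`, which avoids a restart.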

Comment by acsfer [ 2020-12-04 ]

The log can be found here: https://hiberfile.com/d/uNLzJPRk?p=Bt2OGG@Khx=CmdyD
(I was unable to attach it here, sorry.)

Comment by markus makela [ 2020-12-08 ]

A first look at the file didn't reveal anything; a more thorough investigation needs to be done.

Comment by acsfer [ 2020-12-11 ]

We've switched it to a RW test application.

Direct connection to MaxScale: no errors for 72h.

Proxying through nginx:

SQLSTATE[HY000] [2002] Connection refused
PDOStatement::execute(): SSL: Connection reset by peer
SQLSTATE[HY000]: General error: 2006 MySQL server has gone away

Random errors at random times were produced within 24h.

Comment by markus makela [ 2021-08-02 ]

Have you tried reproducing this using Nginx with only one MaxScale server? This would rule out any potential load balancing problems caused by Nginx itself.
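Something like this on the nginx side would take the upstream load-balancing logic out of the picture entirely (a sketch based on your earlier configuration):

```nginx
# Sketch: skip the upstream group and proxy straight
# to a single MaxScale instance.
stream {
    server {
        listen 3307;
        proxy_pass 10.1.0.11:3307;
    }
}
```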

Comment by markus makela [ 2021-08-25 ]

I tested this locally with sysbench and the same configuration for both MaxScale and Nginx. I wasn't able to reproduce it with 100 threads or by any other means.
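For the record, the reproduction attempt was along these lines (a sketch only: the proxy address, credentials, and table sizing are placeholders, and `--mysql-ssl` matches the `ssl=required` listener):

```shell
# Sketch of the sysbench load test aimed at the nginx proxy port;
# host, user, password and table sizing are placeholder values.
sysbench oltp_read_write \
    --mysql-host=10.1.0.10 --mysql-port=3307 \
    --mysql-user=sbtest --mysql-password=secret \
    --mysql-ssl=on --tables=4 --table-size=100000 \
    --threads=100 --time=300 prepare

sysbench oltp_read_write \
    --mysql-host=10.1.0.10 --mysql-port=3307 \
    --mysql-user=sbtest --mysql-password=secret \
    --mysql-ssl=on --tables=4 --table-size=100000 \
    --threads=100 --time=300 run
```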

Comment by markus makela [ 2021-08-25 ]

I'll close this as Cannot Reproduce since there have been no updates and our attempts to reproduce it haven't revealed anything.

Generated at Thu Feb 08 04:20:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.