[MDEV-31089] Connections counter increases with replication Created: 2023-04-19 Updated: 2023-06-27 Resolved: 2023-06-27 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.5.18, 10.5.19 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Nicolas | Assignee: | Unassigned |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | 10.5, connection, replication, thread |
| Environment: | Debian 11 |
| Attachments: |
|
| Description |
|
Hello everyone,

My team and I spent a lot of time investigating before posting this bug. Maybe it's not really a bug, but we can't figure out why our server behaves this way.

We had 5 servers running Debian 10 + MariaDB 10.3.38, and everything was fine. We recently upgraded to Debian 11 + MariaDB 10.5.18 and started to see a high connect rate. We upgraded on 23 March (week 12). The chart below clearly shows that connect_rate increased about fourfold. connect_rate is a probe from CheckMK monitoring, which reads the Connections variable from SHOW GLOBAL STATUS. When we display this variable every 10 seconds, the counter increases a lot:

483 new connections in about 10 seconds. But when we list the processlist, there aren't that many running sessions. Our traffic hasn't increased either.

All our servers are in ring replication, so each one is a master and also a slave of the previous one.

sql1 <=> sql2 <=> sql3 <=> sql4 <=> sql5 =>

So we tried with another slave connected to sql1, with replication as its only client (no reads from any application or other client): same problem.

sql1 <=> sql2 <=> sql3 <=> sql4 <=> sql5 => sql8

But when we stop replication (STOP SLAVE), the number of new connections stops increasing!

I checked the source code: the "Connections" global status variable seems to be incremented on every new connection, which should also have created a new thread. We tried enabling the audit plugin, but we can't see all the intermediate connection_id values. We also enabled the general_log for a few minutes: we can see the replication queries, with the same connection_id as in the processlist, but we can't find any other. The connection rate increases when there are many queries in the replication stream.

So our main problems are:

We searched the Internet extensively, but we can't find any post, or any bug in this tracker, describing this behaviour. What seems weird is that neither the general_log nor the audit log shows queries or connections with new IDs. All queries made by replication have the same connection_id. But when there are many queries, the connection rate increases, which suggests a link between the two. I assume that replicated queries shouldn't create new connections or increase the Connections counter? The binlog format on master sql1 is MIXED.

Is this behaviour normal? Thanks a lot! |
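The figure quoted in the description (483 new connections in about 10 seconds) is the difference between two readings of the cumulative Connections counter. A minimal Python sketch of that arithmetic follows. The function name `connect_rate` and the absolute counter values are hypothetical; only the difference of 483 and the 10-second interval come from the report. The handling of a counter that goes backwards assumes the counter restarted from zero, for instance after a server restart.

```python
def connect_rate(previous: int, current: int, interval_seconds: float) -> float:
    """Return new connections per second between two readings of the
    cumulative Connections status counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous
    if delta < 0:
        # Assumption: the counter went backwards because it restarted from
        # zero, so the current reading is the count since that restart.
        delta = current
    return delta / interval_seconds


# Hypothetical absolute readings; only the difference (483) and the
# interval (10 s) are taken from the report.
rate = connect_rate(1_000_000, 1_000_483, 10)
print(f"{rate:.1f} connections/s")  # prints "48.3 connections/s"
```

This is roughly the "50 times a second" order of magnitude, which is far more than a replication connection would be expected to reconnect.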
| Comments |
| Comment by Nicolas [ 2023-04-19 ] |
|
here is an example: 21:19:55
21:19:58
from audit log:
|
| Comment by Sergei Golubchik [ 2023-05-26 ] |
|
Replication connections are still connections and are counted as such; this is normal. But, of course, a replication slave should not reconnect 50 times a second, so this cannot explain why you got 483 new connections in 10 seconds. The queries from the audit log look like monitoring queries. Could you have some kind of monitoring tool (e.g. your CheckMK) that starts doing a lot more work when replication is enabled? For example (just a guess), maybe it tries to walk the replication topology from the master to the slaves and collect some stats, but because you have a ring, it just runs forever in circles? Try to break the ring and see if Connections stops increasing. Or disable your monitoring and see if Connections stops increasing. Either way, it's likely not the replication itself. |
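One way to run the suggested experiment is to sample Connections during each phase (everything running, monitoring disabled, ring broken) and compare the per-second rates. The sketch below is a hypothetical helper, not part of MariaDB or CheckMK. The phase names and every sample value are invented purely to show the calculation and are not measurements from this report.

```python
from typing import Dict, List, Tuple

# (time in seconds, value of the cumulative Connections counter)
Sample = Tuple[float, int]


def phase_rates(phases: Dict[str, List[Sample]]) -> Dict[str, float]:
    """Return connections per second for each named phase, using the
    first and last sample recorded during that phase."""
    rates = {}
    for name, samples in phases.items():
        if len(samples) < 2:
            raise ValueError(f"phase {name!r} needs at least two samples")
        (t0, c0), (t1, c1) = samples[0], samples[-1]
        if t1 <= t0:
            raise ValueError(f"phase {name!r} has non-increasing timestamps")
        rates[name] = (c1 - c0) / (t1 - t0)
    return rates


# Invented numbers for illustration only, NOT results from the report.
example = {
    "baseline": [(0.0, 10_000), (10.0, 10_483)],
    "monitoring_disabled": [(0.0, 20_000), (10.0, 20_005)],
}
for name, rate in phase_rates(example).items():
    print(f"{name}: {rate:.1f} connections/s")
```

If the rate drops sharply in one phase only, that phase points at the component opening the connections.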