[MDEV-32960] Semi-sync ACKed Transaction can Timeout and Switch Off Semi-sync with Multiple Replicas
| Created: | 2023-12-06 | Updated: | 2024-01-23 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.6 |
| Fix Version/s: | 10.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Brandon Nesterenko | Assignee: | Michael Widenius |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | MDEV-32551-test |
| Issue Links: |
|
| Description |
|
If a semi-sync primary has multiple semi-sync replicas, the primary seems to listen for ACKs from only one replica at a time. If that replica fails to reply but another one does, the transaction still times out and semi-sync switches off. See the following MTR snippets:
When server_2 delays its ACK (using debug with "+d,simulate_delay_semisync_slave_reply") but server_3 sends its ACK, the transaction times out and semi-sync turns off:
And result:
Conversely, if server_2 ACKs but server_3 times out, the primary sees the ACK and no timeout occurs:
With result:
|
| Comments |
| Comment by Brandon Nesterenko [ 2023-12-07 ] |
|
I think I see the issue. If a semi-sync connection has already been established on a primary, any new semi-sync replicas that join won't be listened to until a transaction has been ACKed after the new connection is added to the Ack_Receiver. The problem shows up on the next transaction after a new semi-sync replica is added: if the existing connection (the one being listened on for ACKs) fails to send an ACK for that transaction, the Ack receiver thread can't read the ACKs from the newly added replicas. This results in a timeout and semi-sync falling back to async mode altogether. |
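The behaviour described in this comment can be illustrated with a small toy model. This is only a sketch of the reasoning, not the server's Ack_Receiver code: the class `ToyAckReceiver`, its `listening` and `pending` sets, and the `commit` method are all hypothetical names, and the model assumes that server_2 connected before server_3.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAckReceiver:
    """Toy model of the reported behaviour; NOT the MariaDB implementation."""
    listening: set = field(default_factory=set)  # replicas currently polled for ACKs
    pending: set = field(default_factory=set)    # replicas added but not yet polled
    semi_sync_on: bool = True

    def add_replica(self, name):
        # Assumption taken from the comment above: once one connection is
        # being listened on, a newly added one is not polled right away.
        if self.listening:
            self.pending.add(name)
        else:
            self.listening.add(name)

    def commit(self, ackers):
        """Simulate one transaction; `ackers` are the replicas that reply in time."""
        heard = self.listening & set(ackers)
        if heard:
            # Only after an ACK is processed are the pending connections picked up.
            self.listening |= self.pending
            self.pending.clear()
            return "acked"
        # Nobody in the listened set replied: time out and fall back to async.
        self.semi_sync_on = False
        return "timeout"


# Scenario 1: server_2 delays its ACK, server_3 replies.
r1 = ToyAckReceiver()
r1.add_replica("server_2")
r1.add_replica("server_3")
print(r1.commit({"server_3"}), r1.semi_sync_on)  # timeout False

# Scenario 2: server_2 replies, server_3 does not.
r2 = ToyAckReceiver()
r2.add_replica("server_2")
r2.add_replica("server_3")
print(r2.commit({"server_2"}), r2.semi_sync_on)  # acked True
```

In this model the ACK from server_3 in scenario 1 is never read because server_3 is still in `pending`, which matches the asymmetry between the two MTR cases in the description.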