[MDEV-25392] IO thread reporting yes despite failing to fetch GTID Created: 2021-04-12 Updated: 2024-01-30 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major |
| Reporter: | VAROQUI Stephane | Assignee: | Andrei Elkin |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
||||||||
| Description |
|
About once a month, on a cheap cloud server with limited disk I/O, GTID replication freezes forever. It is triggered by a short network glitch: the IO thread starts reconnecting to the leader, and the reconnection fails in an infinite loop. During investigation, the messages in the leader's error log are very confusing and do not help to find the cause of the issue:

2021-04-12 12:26:55 1951557 [Note] Start binlog_dump to slave_server(2), pos(, 4), using_gtid(1), gtid('0-12599180-3481944')

The cause is the slow disk: sending the 0-12599180-3481944 position takes longer than slave_net_timeout, so the IO thread cancels the event reception and retries. Raising the timeout works around it:

set global slave_net_timeout=1200;

In this infinite-loop scenario, caused by the lack of binlog indexing on GTID, the IO thread has always been reporting Yes, which makes monitoring proxies send traffic to severely delayed slaves. Introducing a connecting state would be more appropriate. One can also point out the lack of a GTID function that, given a GTID, returns the binlog file and position. |
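Until such a state exists, a monitor can avoid trusting Slave_IO_Running alone by also checking whether the replica's GTID position advances. The following is only a minimal sketch of that idea, not part of the server or of any proxy: the function names are hypothetical, the status rows are assumed to be dictionaries built from two successive SHOW SLAVE STATUS samples, and the leader's position is assumed to come from @@gtid_binlog_pos. It also assumes that sequence numbers increase within a replication domain.

```python
from typing import Dict, Mapping


def parse_gtid_pos(pos: str) -> Dict[int, int]:
    """Map each replication domain in a GTID position to its sequence number."""
    result: Dict[int, int] = {}
    for gtid in filter(None, (part.strip() for part in pos.split(","))):
        domain, _server_id, seq = gtid.split("-")
        result[int(domain)] = int(seq)
    return result


def io_thread_looks_stalled(
    prev_status: Mapping[str, str],
    curr_status: Mapping[str, str],
    master_gtid_pos: str,
) -> bool:
    """Hypothetical helper: True when the IO thread reports Yes but makes no progress.

    prev_status and curr_status are two SHOW SLAVE STATUS samples taken some
    time apart; master_gtid_pos is the leader's current GTID position.
    """
    if curr_status.get("Slave_IO_Running") != "Yes":
        # The server already reports a non-running state; nothing is hidden.
        return False

    prev = parse_gtid_pos(prev_status.get("Gtid_IO_Pos", ""))
    curr = parse_gtid_pos(curr_status.get("Gtid_IO_Pos", ""))
    master = parse_gtid_pos(master_gtid_pos)

    for domain, master_seq in master.items():
        behind = curr.get(domain, 0) < master_seq
        no_progress = curr.get(domain, 0) <= prev.get(domain, 0)
        if behind and no_progress:
            return True
    return False
```

A proxy could call io_thread_looks_stalled on each polling interval and stop routing reads to a replica for which it returns True. The polling interval should be longer than slave_net_timeout so that one slow reconnection is not mistaken for a stall.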
| Comments |
| Comment by Andrei Elkin [ 2021-04-17 ] |
|
stephane@skysql.com, thanks for the report! Firstly, I agree the master should be faster to respond, and there seems to be nothing but the indexing to help out. The new in-CONNECTing state also makes sense. |
| Comment by Kristian Nielsen [ 2024-01-30 ] |
|
Nice analysis Stephane! GTID indexes are now finally done, in 11.4 (MDEV-4991). This should fix the problem with slow connect to master with slow disks. (The other points mentioned in the description remain valid, of course). |
| Comment by VAROQUI Stephane [ 2024-01-30 ] |
|
Let's celebrate MDEV-4991 at FOSDEM