[MDEV-29981] Replica stops with "Found invalid event in binary log" Created: 2022-11-08 Updated: 2023-03-16 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.4.25, 10.6.9 |
| Fix Version/s: | 10.4 |
| Type: | Bug | Priority: | Major |
| Reporter: | Anton | Assignee: | Andrei Elkin |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | None | ||
| Attachments: |
|
| Description |
|
I am seeing strange behaviour with replication.
The errors always appear at position 4, at the start of the binlog. Replication starts working again after restarting it with `stop slave; start slave`. I've attached the config file. |
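As a rough illustration of what "position 4" refers to: a binlog file starts with a 4-byte magic number, so the first event (normally the format description event, type code 15) begins at offset 4. The Python sketch below reads that first event's 19-byte common header. It assumes the version 4 binlog layout; the function name `read_first_event_header` and the keys of the returned dict are made up for this example and are not part of any MariaDB tool.

```python
import struct

BINLOG_MAGIC = b"\xfe" b"bin"   # 4-byte file magic; the first event follows at offset 4
HEADER_LEN = 19                 # common event header size in the v4 binlog format


def read_first_event_header(data: bytes) -> dict:
    """Parse the common header of the event that starts at position 4.

    Hypothetical helper for illustration only.
    """
    if data[:4] != BINLOG_MAGIC:
        raise ValueError("bad magic: not a binlog file")
    if len(data) < 4 + HEADER_LEN:
        raise ValueError("truncated: no complete event header at position 4")
    # Little-endian fields: timestamp, type code, server id,
    # event length, next event position, flags.
    ts, type_code, server_id, event_len, next_pos, flags = struct.unpack_from(
        "<IBIIIH", data, 4
    )
    return {
        "timestamp": ts,
        "type_code": type_code,
        "server_id": server_id,
        "event_length": event_len,
        "next_position": next_pos,
        "flags": flags,
    }
```

Feeding it the first few dozen bytes of a binlog file from disk would show whether the file itself begins with a sane header; `mysqlbinlog` remains the authoritative tool for inspecting binlogs.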
| Comments |
| Comment by Angelique Sklavounos (Inactive) [ 2022-11-09 ] | |||||||
|
Hi R, Could you please upload to the private FTP server:
Thank you | |||||||
| Comment by Luigi [ 2022-11-09 ] | |||||||
|
adding a case with the same issue on 10.4.24-15-MariaDB-enterprise-log | |||||||
| Comment by Andrei Elkin [ 2022-11-10 ] | |||||||
|
R, it'd be very helpful to see the SHOW SLAVE STATUS output if you still have one to share. Thank you. | |||||||
| Comment by Luigi [ 2022-11-10 ] | |||||||
|
uploaded 'MDEV-29981_logs.tgz' to the ftp server with the error logs of both primary and replica, the show global variables of both and the 2 binlogs involved in the issue | |||||||
| Comment by Anton [ 2022-11-10 ] | |||||||
|
Hello Angelique, Andrei. I've uploaded the files to your private FTP server, ftp://ftp.mariadb.org/private/
| |||||||
| Comment by Anton [ 2022-11-10 ] | |||||||
|
I've also uploaded everything as a single archive file | |||||||
| Comment by Andrei Elkin [ 2022-11-11 ] | |||||||
|
R, thanks for the uploads! The SHOW SLAVE STATUS output is not from the time of the failure. We still need the one that displays the error from the error log. | |||||||
| Comment by Andrei Elkin [ 2022-11-11 ] | |||||||
|
Having found no way to get to the slave error, we need an MXS analysis of these two events:
R, your MXS log might have a similar failover message just before the slave error one. I am looking for confirmation or denial. I can't yet rule out that something is being inserted into the slave. | |||||||
| Comment by Anton [ 2022-11-11 ] | |||||||
|
Hi Elkin. On the evening of 11.11 I got the same error again. But I noticed that the problem always appears with a heavy transaction (I mean a long text transaction) at the end of the binlog. The binlog with the heavy transaction at the end is a little bigger than the others (mysqld-bin.204621)
| |||||||
| Comment by Anton [ 2022-11-11 ] | |||||||
|
I also checked on version 10.4.26: the same problem | |||||||
| Comment by Andrei Elkin [ 2022-11-14 ] | |||||||
|
R, thank you for the reply! I am analyzing it more deeply now. In this regard, could you please also look into the master and slave servers' syslogs to see whether anything unusual shows up at the failure time? I also notice your log_warnings is at the default value of 2, and since you're "lucky" enough to summon the failure, may I ask you to raise it to at least 3 on both master and slave? It would be useful to try reproducing the failure in a pure master-slave setup without MXS. That only makes sense if you can afford it and the failure occurs fairly reliably. If it reproduces reliably but MXS can't be touched, I'd love to see a network dump. | |||||||
| Comment by Andrei Elkin [ 2022-11-14 ] | |||||||
|
R, one more request for you (I am copying it from markus makela, who asked it elsewhere),
| |||||||
| Comment by Andrei Elkin [ 2022-11-14 ] | |||||||
|
R, could you please also consider setting master_verify_checksum = 1, which activates checksum verification on the master and will of course slow it down a bit. This may help clear up doubts about what the master actually sends. | |||||||
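To make the idea of checksum verification concrete, here is a minimal Python sketch of checking one binlog event offline. It assumes `binlog_checksum=CRC32` and that each event ends with a 4-byte little-endian CRC32 computed over all preceding bytes of that event. The helper name `event_checksum_ok` is hypothetical, and this is not the server's own implementation of `master_verify_checksum`.

```python
import struct
import zlib

HEADER_LEN = 19      # common event header size (v4 binlog format)
CHECKSUM_LEN = 4     # trailing CRC32 when binlog_checksum=CRC32 (assumed)


def event_checksum_ok(event: bytes) -> bool:
    """Return True if the trailing CRC32 matches the rest of the event.

    `event` must be exactly one complete event, header included.
    Hypothetical helper for illustration only.
    """
    if len(event) < HEADER_LEN + CHECKSUM_LEN:
        raise ValueError("event too short to carry a header and a checksum")
    payload = event[:-CHECKSUM_LEN]
    (stored,) = struct.unpack("<I", event[-CHECKSUM_LEN:])
    # zlib.crc32 returns an unsigned 32-bit value in Python 3.
    return zlib.crc32(payload) == stored
```

Running such a check over the events in the two binlogs involved would indicate whether the data on the master's disk is already damaged or whether the corruption happens later, in transit or on the replica.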
| Comment by Anton [ 2022-11-14 ] | |||||||
|
Hello Elkin. Over the weekend replication stopped again with the same error, with the same events in the logs. I've uploaded MDEV-29981-mxs-report.txt to the FTP server. I've changed log_warnings to 3; with 4, a lot of "Aborted connection" warnings are generated (This connection closed normally) | |||||||
| Comment by Andrei Elkin [ 2022-11-23 ] | |||||||
|
R, hello. Did you observe by any chance this one?
| |||||||
| Comment by Anton [ 2022-11-24 ] | |||||||
|
Hi Elkin, I've never seen the error `bogus data in log event` in the logs. But I often see the following error in the MaxScale logs
I'm not sure, but I assume this could be related to
I'm waiting for approval to update the master to version 4.26 | |||||||
| Comment by Andrei Elkin [ 2022-11-25 ] | |||||||
|
Thanks for the reply, R. I am all ears for how it goes once you have upgraded. | |||||||
| Comment by Anton [ 2022-12-19 ] | |||||||
|
Hello Elkin. I made the following changes a few weeks ago
These actions reduced the number of times the replica stopped. I got this error only once last week, but the error still exists. Yes, I had swap on the master, and I'm going to turn it off. On the replicas I've already switched it off | |||||||
| Comment by Andrei Elkin [ 2022-12-19 ] | |||||||
|
Thank you R! As for my swap question, it was about whether you might observe a correlation between very active swap usage and the appearance of the error. We've been investigating potentially related
All the best to you for the upcoming holidays | |||||||
| Comment by Anton [ 2023-02-20 ] | |||||||
|
Hi Elkin. Sorry for the long silence. I switched the master only last week (I waited a long time for approval). I'll report back if the replication error appears again | |||||||
| Comment by Anton [ 2023-03-15 ] | |||||||
|
Hello Elkin. The problem hasn't reproduced for more than 3 weeks. I think this ticket can be closed | |||||||
| Comment by Andrei Elkin [ 2023-03-16 ] | |||||||
|
Thank you R! My teammate bnestere and I have a plan to print replication event dumps into the server error log near binlog rotation when log_warnings > 2. I think we'd close the ticket when that's done. |