[MDEV-21087] ER_SLAVE_INCIDENT arrives at slave without failure specifics Created: 2019-11-19  Updated: 2022-08-16  Resolved: 2022-07-26

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.3.36, 10.4.26, 10.5.17, 10.6.9, 10.7.5, 10.8.4, 10.9.2

Type: Bug Priority: Critical
Reporter: Andrei Elkin Assignee: Brandon Nesterenko
Resolution: Fixed Votes: 1
Labels: None

Issue Links:
Relates
relates to MDEV-21443 Donot report INCIDENT_EVENT in binary... Closed

 Description   

When a MariaDB slave stops with an error on receiving the incident event, there is no
description of what led to it, neither in the event itself nor in the master's error log.

That issue was partly covered by an upstream patch:

@commit 68cbdee45628349c82cbcf3f530b8859615a3f38
Author: Bill Qu <bill.qu@Oracle.com>
Date: Wed Sep 18 10:27:41 2013 +0800

Bug #17258782 MASTER DIDN'T WRITE ERROR MESSAGE IN THE LOG FILE FOR INCIDENT

Master is not generating error message to log file
when writing an incident event to binlog. So user
is not alerted that slave servers will be stopped
by the incident event later.
...



 Comments   
Comment by Sujatha Sivakumar (Inactive) [ 2019-12-17 ]

Hello Andrei,

Please review the fix for MDEV-21087.

Build Bot: http://buildbot.askmonty.org/buildbot/grid?category=main&branch=bb-10.2-sujatha
Patch: https://github.com/MariaDB/server/commit/8576ea11f195dc2f76cd36a064712c40470a7afa

Thank you.

Comment by Sujatha Sivakumar (Inactive) [ 2020-01-08 ]

The current issue is specifically meant to include the upstream fix for
Bug #17258782 MASTER DIDN'T WRITE ERROR MESSAGE IN THE LOG FILE FOR INCIDENT.

Although Arista Networks experienced a similar problem, with lost events and
no error message logged in the server log, the main issue is that an INCIDENT
EVENT should not be written to the binary log when a safe rollback is possible,
i.e. when DMLs are performed in the InnoDB engine and the transaction cache
becomes full. In that case only an error should be reported to the client,
rather than writing an INCIDENT EVENT.

For transactional changes that do not fit into the cache, the following actions
need to be taken:
a) the statement is not logged
b) the statement returns an error to the client

For non-transactional changes that do not fit into the cache, where a safe
rollback is not possible because a multi-statement transaction changed both
transactional and non-transactional tables, an INCIDENT EVENT should be written.
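
The two cases above can be sketched as a small decision function. This is an
illustrative Python model only, not the server's implementation: the names
Statement, Action and decide_on_cache_write are hypothetical, and the two
boolean fields are simplifying assumptions standing in for the server's
internal state.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Possible outcomes when a statement's changes are written to the binlog cache."""
    LOG_NORMALLY = "log the statement as usual"
    ERROR_ONLY = "do not log; return an error to the client"
    WRITE_INCIDENT = "write an INCIDENT EVENT to the binary log"


@dataclass
class Statement:
    # Hypothetical flags; the real server tracks this state differently.
    fits_in_cache: bool              # changes fit into the binlog cache
    changed_non_transactional: bool  # transaction also modified non-transactional tables


def decide_on_cache_write(stmt: Statement) -> Action:
    """Model of the rules described in the comment above."""
    if stmt.fits_in_cache:
        return Action.LOG_NORMALLY
    if not stmt.changed_non_transactional:
        # Purely transactional: a safe rollback is possible,
        # so nothing is logged and the client gets an error.
        return Action.ERROR_ONLY
    # Non-transactional changes cannot be rolled back, so the
    # slave has to be told that events were lost.
    return Action.WRITE_INCIDENT
```

For example, decide_on_cache_write(Statement(fits_in_cache=False,
changed_non_transactional=False)) yields Action.ERROR_ONLY, which is the
InnoDB-only case discussed above.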

Hence a new issue, https://jira.mariadb.org/browse/MDEV-21443, has been reported
to track this.

Comment by Andrei Elkin [ 2022-07-07 ]

Taking over the old commit myself to investigate and elaborate.

Comment by Brandon Nesterenko [ 2022-07-15 ]

Hi Andrei!

I've extended Sujatha's original work and it is
ready for review: PR-2190

Comment by Andrei Elkin [ 2022-07-25 ]

[ Approved ] on GH.

Comment by Brandon Nesterenko [ 2022-07-26 ]

Pushed into 10.3 as 555c12a.

There will be merge conflicts in 10.4 (git), 10.5 (git), and 10.7 (compilation only).

A branch exists for each conflict:
bb-10.4-21087-merge
bb-10.5-21087-merge
bb-10.7-21087-merge
