[MDEV-14014] Multi-Slave Replication Fail: bogus data in log event Created: 2017-10-06  Updated: 2020-12-08  Resolved: 2018-07-01

Status: Closed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1.20
Fix Version/s: 10.1.35, 10.2.17, 10.3.8

Type: Bug Priority: Critical
Reporter: Nilnandan Joshi Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: upstream

Issue Links:
Relates
relates to MDEV-19689 Replication I/O thread stops daily: "... Closed
Sprint: 10.2.14, 10.1.32

 Description   

After upgrading to 10.1.20, replication now intermittently stops (randomly and not all slaves of a master at the same time) with the following errors:

2017-10-01 20:38:20 140079966804736 [ERROR] Master 'trades': Error reading packet from server: bogus data in log event; the first event 'mysql-bin.038184' at 156161357, the last event read from 'mysql-bin.038185' at 60018003, the last byte read from 'mysql-bin.038185' at 60018022. (server_errno=1236)
2017-10-01 20:38:20 140079966804736 [ERROR] Master 'trades': Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'bogus data in log event; the first event 'mysql-bin.038184' at 156161357, the last event read from 'mysql-bin.038185' at 60018003, the last byte read from 'mysql-bin.038185' at 60018022.', Internal MariaDB error code: 1236

We can fix this by issuing a START SLAVE, and everything then works without issue, but the replication failures keep recurring.

It looks like this upstream bug: https://bugs.mysql.com/bug.php?id=84752

After setting slave_compressed_protocol=1 and turning sync_binlog off, it works fine. This is not easily reproducible. As per the comments in the upstream bug, it can be reproduced only with high load on the master server and multiple slaves (5 or 6).



 Comments   
Comment by Andrei Elkin [ 2018-03-19 ]

@serg: Hello. I have a patch for MDEV-14014 which adds a feature to IO_CACHE that fixes the bug. It would be good for someone outside the replication team to look at it. I have yet to publish it.

serg 4:07 PM
sure, I can do it

Comment by Andrei Elkin [ 2018-04-11 ]

Sergei,

I need to speak with you again about MDEV-14014, as I have made up my mind about the case.

You might have perceived the issue as a mere partial read by the dump thread of an event group that is being appended to the binlog by a concurrent writer (user) thread. Actually, I did not confront that view with a clear pointer at a possible dirty read. 'Dirty' here stands for data that may not exist at all. E.g.:

2017-10-02 10:48:35 [ERROR] Error in Log_event::read_log_event(): 'Event truncated', data_len: 201326604, event_type: 1

I believe the dirty read can be explained by the docs and POSIX. I dug out some relevant references. I also have something to say about your tail --follow counter-example.

My understanding is as follows.

Suppose we have concurrent Reader (R) and Writer threads (W). R may access data being written by W.
Let the following be a sequence of bytes being appended to the file:

   b1_written b2 ... b_last_written | b1_dirty... EOF

Here the b_i_written subrange reflects the partially written segment of data. The rest of

   b1_dirty till EOF

are bytes that the reader may (this is my claim) access. And if this possibility is real, it answers
your remark

I don't understand why reading till EOF is wrong...

from our slack chat.
And the standard http://www.opengroup.org/onlinepubs/009695399/functions/pwrite.html seems to confirm it:

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.

The EOF being set ahead of the bytes being recorded does explain the above customer error: the dump thread caught the EOF already advanced while the data had not yet been written, and that dirty data included the length of the event.

The following link could give you some thoughts:
https://stackoverflow.com/questions/1154446/is-file-append-atomic-in-unix

However concurrent reads to atomic appends may see torn writes
depending on OS, filing system, and what flags you opened the file
with - the increment of the maximum file extent is atomic, but the
visibility of the writes with respect to reads may or may not be
atomic.

and rather confirms my (very early) guess that 'visibility of the writes
with respect to reads may or may not be atomic'. The question naturally
arises: atomic relative to what? And I read that as atomic relative to the
file-offset setting.

Regarding your puzzling 'tail -f' counter-example to my theory, notice that according to
https://en.wikipedia.org/wiki/Tail_(Unix) it actually relies on inotify before attempting
to read. It would be reasonable to expect that inotify respects the atomicity (of the EOF setting and the last bytes written).

Could you please consider whether these arguments are reasonable.

Cheers,

Andrei

Comment by Andrei Elkin [ 2018-04-11 ]

Sergei, please read my clarification to the patch. I think we can suspect a dirty read and
explain its mechanism via the docs I refer to. Thank you. Andrei

Comment by Sergei Golubchik [ 2018-04-11 ]

1. I think you misinterpret the standard. What it says is that "the file offset shall be set to the end of the file prior to each write", that is, before every write it does (in a sense) lseek(fd, 0, SEEK_END). It does not say that prior to each write the file length should be set to what it will be when the write completes.

2. "torn writes" mean, exactly, that one can read partial data. Like, a writer writing "123456789" and a reader reading only "1234<EOF>", so, yes, writes are not (necessarily) atomic.

3. inotify is much newer than tail. Even your linked Wikipedia article says that tail used to probe the file, repeatedly reading till EOF.

Comment by Andrei Elkin [ 2018-04-11 ]

>1. I think you misinterpret the standard.

Sorry, I must admit that. We cannot use the standard as confirmation of the dirty-read possibility.
I still think it's fair to ask why the reader can't see a data state like

1234[non-written-byte]EOF

I could not find any guarantee that the 'non-written-byte' cannot be seen by the reader.
And the bug case all but exposes it, as an incorrect value of 'data_len'.

Yes, I was too content with the current implementation of tail -f and neglected what came before. In particular, it could use some other methods/APIs to achieve atomicity of the EOF setting and the data written.

Comment by Andrei Elkin [ 2018-06-12 ]

Sergei, could you please check out [Commits] c7a2d42c30b: MDEV-14014
which implements your simplification.

Cheers,

Andrei

Generated at Thu Feb 08 08:10:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.