[MDEV-27803] MariaDB binlog corruption when "No space left on device" and stuck session killed by client Created: 2022-02-11 Updated: 2024-01-11 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Replication |
| Affects Version/s: | 10.2, 10.6 |
| Fix Version/s: | 10.6 |
| Type: | Bug | Priority: | Major |
| Reporter: | Hugo Wen | Assignee: | Andrei Elkin |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | None | ||
| Issue Links: |
|
||||||||||||
| Description |
|
The MariaDB server should be recoverable from a storage-full condition, but under the following conditions the binlog becomes corrupted while the disk is full (0 bytes free on the device). This can cause binlog replay failures and replication failures. The issue is reproducible in 10.6.5 and was also seen in 10.2.40. Issue description: When the MariaDB server runs out of storage, it fails to write the binlog file because of "No space left on device". At this point, the server is still running.
Since it keeps retrying the binlog write, it is supposed to recover once some storage is released or more storage is added. However, if the stuck session is killed by the client in this state, the binlog write breaks and cannot recover. After the binlog is corrupted:
The only way to recover in this scenario is to restart the MariaDB server. However, that leaves the problematic binlog behind, and it cannot be replayed. If there are replicas, replication will also fail because of it:
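The retry-then-break behaviour described above can be sketched as follows. This is a minimal illustration, not the server's actual code: it uses Linux's /dev/full device to simulate a persistent "No space left on device" error, and `append_with_retry` is a hypothetical helper.

```python
import errno
import os
import time

def append_with_retry(fd, payload, retries=3, delay=0.1):
    """Retry a write that fails with ENOSPC, the way the server keeps
    retrying the binlog write while waiting for space to be freed.
    Returns True once the write succeeds, False if all retries fail."""
    for _ in range(retries):
        try:
            os.write(fd, payload)
            return True
        except OSError as e:
            if e.errno != errno.ENOSPC:
                raise
            time.sleep(delay)  # wait and hope some space was freed
    return False

# On Linux, /dev/full always fails writes with ENOSPC, simulating a full disk.
fd = os.open("/dev/full", os.O_WRONLY)
print(append_with_retry(fd, b"binlog event", retries=2, delay=0))  # False: the "disk" stays full
os.close(fd)
```

If the writing session is killed while stuck inside such a retry loop after a partial write has already reached the file, the binlog is left with an incomplete trailing event, which matches the corruption reported here.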
How to reproduce: The issue can be reproduced with the following steps, using the source code at https://github.com/MariaDB/server/tree/mariadb-10.6.5/, built and installed on an AWS EC2 instance.
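To see where a binlog left behind by a partial write actually breaks, one can walk the v4 event headers and find the first event whose declared length runs past the end of the file. This is a hedged sketch based on the documented binlog v4 on-disk format (4-byte magic `\xfe bin`, then a 19-byte common header per event); `check_binlog` is a hypothetical helper, not a MariaDB tool.

```python
import struct

BINLOG_MAGIC = b"\xfebin"
HEADER_LEN = 19  # v4 common header: timestamp(4) type(1) server_id(4) event_len(4) next_pos(4) flags(2)

def check_binlog(path):
    """Scan a binlog file event by event and return (ok, offset):
    ok is False if the file is truncated mid-event (e.g. after a
    partial write on ENOSPC); offset is where scanning stopped."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != BINLOG_MAGIC:
        return False, 0
    pos = 4
    while pos < len(data):
        header = data[pos:pos + HEADER_LEN]
        if len(header) < HEADER_LEN:
            return False, pos  # the common header itself is cut off
        (event_len,) = struct.unpack("<I", header[9:13])  # total event length, incl. header
        if event_len < HEADER_LEN or pos + event_len > len(data):
            return False, pos  # event body extends past end of file
        pos += event_len
    return True, pos
```

A file that fails this scan is exactly the kind of binlog that can neither be replayed locally nor served to replicas.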
|
| Comments |
| Comment by Daniel Black [ 2022-02-11 ] |
|
Thank you for the detailed bug report. |
| Comment by Richard DEMONGEOT [ 2022-02-11 ] |
|
Hello, a month ago I opened a bug that seems closely related ( MDEV-27436 ), so I've linked both tickets as related. Regards, |
| Comment by Otto Kekäläinen [ 2022-02-18 ] |
|
marko What are your thoughts on the expected behavior of InnoDB when the data directory runs out of disk space (or, in general, if the filesystem suddenly goes into read-only mode for whatever reason)? Should mariadbd shut down automatically in such a case? Or should the database stay up but yield errors, and continue once the disk is writeable again? Should SELECT queries and database connections still work, with only write operations yielding errors while the filesystem does not accept writes? |
| Comment by Daniel Black [ 2022-02-18 ] |
|
As for general InnoDB IO error handling: do you have a user preference? |
| Comment by Marko Mäkelä [ 2022-02-18 ] |
|
otto, InnoDB normally attempts to allocate space upfront. The InnoDB redo log should never run out of space, because it is circular. If an InnoDB data file needs to be extended, then I believe that a failure to extend the file currently results in the server being killed. More robust error handling would be to refuse the write operation that caused the need to extend the data file. In any case, no data should be lost in the InnoDB layer due to running out of space. To my understanding, both the log_bin and the Aria storage engine recovery log are append-only. As for the Aria storage engine, I believe that it cannot be changed to use a circular log file without restricting the maximum size of a transaction. Since InnoDB writes undo log into data pages that are covered by the circular redo log, open transactions do not prevent redo log checkpoints. I do not know much about the binlog, but I would not be surprised if the maximum transaction size determines the minimum binlog file size. |
| Comment by Otto Kekäläinen [ 2022-02-18 ] |
|
Thanks for the InnoDB description Marko! Actually we should perhaps ask Elkin to chime in on what his thoughts are about the expected behavior of binlogs when disk is full (or filesystem goes into read-only mode for some other reason)? |
| Comment by Andrei Elkin [ 2022-02-18 ] |
|
otto, in case the binlog file system gets full and the server crashes (in my testing the server could only be killed), the following will (or should) happen at restart: Regarding WL#5493 trimming, I actually could not confirm that in my testing (on 10.6); it needs |
| Comment by Otto Kekäläinen [ 2022-02-19 ] |
|
Thanks Elkin for the comments. One problem here is that the binlogs are written by the primary DB and used for replication, to be applied by the replicas. Thus, in theory, the primary DB could continue working even when the disk is full, and only replication would fail because binlogs are no longer written. If the replicas exist for fail-over and high-availability purposes, it would be a bit counterproductive to shut down the primary DB and make the whole application fail. Or is the assumption that if replication is on, the primary DB can be shut down and the app should fail over to one of the replicas? And hopefully the replicas don't have their disks in read-only mode. Or is the assumption that if the filesystem goes into read-only mode, the primary DB would continue running but emit alerts, the replicas would shut down, and recovery after the disk is writeable again would happen by creating new replicas from the primary DB? Or is there some middle ground: the primary DB might close all connections and refuse new writes, but still keep flushing the binlogs to the replicas, allowing one of the replicas to be promoted to primary as soon as it has caught up? And only after that fully shut down the primary DB? What if MariaDB had some code that would trigger a safe shutdown and flush before it runs out of disk space? |
| Comment by Andrei Elkin [ 2022-02-21 ] |
|
otto, Yw.
must be viable. From the server side, though, we need to ensure a smooth shutdown (I did have a hang in my testing; to be explored and fixed if that's the case). It would then be the application's burden to find the most up-to-date slave to fail the master role over to. (An automatic fail-over, announced as part of MDEV-19140, is not yet in the plans.) I did not get your idea, sorry, in
So could you please explain? Cheers. Andrei |
| Comment by Otto Kekäläinen [ 2022-02-22 ] |
Or is the assumption that if filesystem goes into read-only mode, the primary DB would continue running but emit alerts, and then the replicas would shut down
I meant that if the filesystem holding the binlogs goes into read-only mode (disk full, filesystem corrupted so the kernel remounts it read-only, a network filesystem hiccup, or whatever reason) and new binlog entries cannot be written, then the primary database could still continue to serve both writes and reads if the InnoDB data tables are on a different filesystem that is still writeable. However, since no new binlogs are written, replication would be broken, and the primary DB should tell the replicas that they are no longer up to date and thus inconsistent. Alternatively, if neither the binlog nor the data tables can be written, the primary database could still continue to run but serve only SELECT queries, issuing warnings both to client connections and in the server logs that it does not accept write operations. My purpose was just to list different scenarios so you can consider what the designed behavior should be in them. |
| Comment by Andrei Elkin [ 2022-02-22 ] |
|
> the primary DB should tell the replicas that they are no longer up to date and thus inconsistent. This is a good idea and can be implemented separately or along with cherry-picking We can also help the out-of-disk binlog primary to continue replication, again by sending out replicated events directly (not touching the binlog). |
| Comment by Michael Widenius [ 2023-10-24 ] |
|
Things to do on the MariaDB server side to make it easier if something like this happens again:
Fix the following messages to make it clearer what is going on:
2023-10-16 11:54:18 14 [Note] Slave I/O thread exiting, read up to log 'bin_log.003418', position 10664265; GTID position 1-2-8425025170, master xxxx
2023-10-16 12:28:02 142367 [Note] Slave SQL thread exiting, replication stopped in log 'bin_log.003418' at position 10664156; GTID position '1-2-8425025170', master: xxxx
2023-10-16 13:02:07 491 [Note] Slave I/O thread: connected to master 'xxxx',replication starts at GTID position xxx |