[MDEV-33877] Disk full with transactional Aria table can lead to a hang

Details

Type: Bug
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: None
Fix Version/s: None
Component/s: Storage Engine - Aria
Labels: None

Description

If the disk fills up while a transactional Aria table is in use, the server may hang during an Aria checkpoint. This can happen when the checkpoint tries to flush the page cache and a page write starts waiting for someone to free up space. The waiting write can block other threads from doing anything with the table that the checkpoint is trying to flush. One cannot even drop the problematic table.
The only way to resolve this is to kill the query and free some space so that the write can succeed. If this does not work, one has to kill the server and delete or move the problematic table.

This does not happen with non-transactional Aria tables or temporary tables, as these do not wait for the user to free space but return a write error instead.
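
The difference between the two behaviours can be illustrated with a small sketch. This is not Aria code: FullDisk, write_page and the max_retries bound are hypothetical names introduced only for illustration, and the bound exists solely so that the example terminates (the description above says the real write keeps waiting until space is freed).

```python
import errno


class FullDisk:
    """Hypothetical stand-in for a file system with a fixed number of free pages."""

    def __init__(self, free_pages=0):
        self.free_pages = free_pages
        self.written = []

    def write(self, page):
        if self.free_pages <= 0:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.free_pages -= 1
        self.written.append(page)


def write_page(disk, page, wait_for_space, max_retries=3):
    """Write one page and return the number of attempts made.

    wait_for_space=True models the transactional case: on disk full the
    write is retried. The retry bound is an assumption of this sketch.
    wait_for_space=False models the non-transactional or temporary table
    case: the disk-full error is reported to the caller at once.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            disk.write(page)
            return attempts
        except OSError as exc:
            if exc.errno != errno.ENOSPC or not wait_for_space:
                raise  # report the write error immediately
            if attempts >= max_retries:
                # Stand-in for "still blocked": the caller never gets control
                # back in the unbounded case, which is the reported hang.
                raise TimeoutError("still waiting for free space") from exc
```

In the waiting case, any lock held around the write stays held for as long as the disk is full, which is how one blocked page write can stall every other thread that needs the same table.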

How to repeat:
sudo mount -t tmpfs -o size=500M tmpfs /mariadb/temp
create table t1 (a int primary key auto_increment, v varchar(1000), key (v));
insert into t1 select seq, repeat('a', 1000) from seq_100001_to_1000000;

Suggested fix to solve the hang between checkpoint and other threads:

If there is a 'disk full' error when flushing a page as part of a checkpoint, ignore the error and try to do the checkpoint later. Note that pages are flushed both as part of _ma_bitmap_flush_all() and flush_pagecache_blocks_with_filter(). _ma_bitmap_flush_all() will be more problematic to fix, as it tries to write a new page to the page cache, which can cause the page cache to write another page to disk, which may in turn fail.

If there is a 'disk full' error when trying to flush an old page to disk, mark the current table as corrupted and keep the old page in the page cache. This will ensure that other tables are not corrupted if someone frees up space on the disk.
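
The two suggested rules can be sketched as a toy model. This is a simplified illustration, not the Aria page cache: the PageCache class, its fields and its method names are all hypothetical, and every write is assumed to consume one free disk page.

```python
import errno


class PageCache:
    """Toy page cache (hypothetical): dirty pages keyed by (table, page_no)."""

    def __init__(self, disk_free_pages):
        self.dirty = {}                  # pages not yet written to disk
        self.disk = {}                   # pages that reached the disk
        self.disk_free_pages = disk_free_pages
        self.corrupted_tables = set()

    def _write(self, key, data):
        # Assumption of this sketch: each write needs one free disk page.
        if self.disk_free_pages <= 0:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.disk_free_pages -= 1
        self.disk[key] = data

    def checkpoint(self):
        """Flush dirty pages. Returns True when done, False when deferred."""
        for key in list(self.dirty):
            try:
                self._write(key, self.dirty[key])
            except OSError as exc:
                if exc.errno != errno.ENOSPC:
                    raise
                # Rule 1: on disk full, do not wait. Ignore the error and
                # let a later checkpoint retry the remaining pages.
                return False
            del self.dirty[key]
        return True

    def evict(self, key):
        """Flush one old page to free a cache slot. Returns True if evicted."""
        try:
            self._write(key, self.dirty[key])
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise
            # Rule 2: mark only this table as corrupted and keep the page
            # in the cache, so other tables stay usable.
            self.corrupted_tables.add(key[0])
            return False
        del self.dirty[key]
        return True
```

Neither path blocks while holding the page, so other threads, including one that wants to drop the table, are not stuck behind the failed write. The sketch does not model the harder case noted above, where the flush itself has to put a new page into the cache.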

Issue Links

relates to

MDEV-33813 ERROR 1021 (HY000): Disk full (./org/test1.MAI); waiting for someone to free some space... (errno: 28 "No space left on device") [Closed]

MDEV-34642 Shutdown take indefinitely when /tmp is full. [Stalled]

People

Assignee: Unassigned

Reporter: Michael Widenius

Votes: 0

Watchers: 3

Dates

Created: 2024-04-10 14:26

Updated: 2024-07-24 07:28
