[MDEV-29255] SQL waits forever when tmpdir full. Expect getting disk space full error. Created: 2022-08-05  Updated: 2022-11-03

Status: Open
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.5, 10.6
Fix Version/s: None

Type: Bug Priority: Major
Reporter: William Wong Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None
Environment:

redhat on VM



 Description   

Hi all,

Our production DB version is MariaDB 10.6.7 with a dedicate 10GB mount point for tmpdir. We got a few incidents that - when a SQL uses up tmpdir disk space for sorting, that SQL will keep waiting forever. All new SQL needs tmpdir will hang. Needs manual stop DB and remove files in tmpdir. This affects the availability a lot.

Checked the DB log saying waiting someone to free disk space. However, it is tmpdir and no one is touching it manually until service impact.

We increased disk space as workaround. However, we never know when a even bigger tmpdir usage SQL will come. We have 200 more mariadb and this is no way to estimate the optimal tmpdir for each DB.

Move tmpdir to same mount point as datadir cannot solve the problem. Bad SQL can still use up the disk space and introuduces availability issue.

The same problem also found in MariaDB 10.5.12.

While using MySQL, the SQL got disk full error immediately and, then, tmpdir disk space released to free. Other SQL statements are running fine and NO avaiability issue.

Kindly advice how can we have the behavior same as MySQL in this situation ?

Below is the step to repeat the issue

# at OS level, setup mariadb 10.6.7 and run it

# at VM level, create new disk with a few GB

# back to OS level

# rescan disk
for i in $(ls /sys/class/scsi_host/)
do
echo "- - -" > /sys/class/scsi_host/${i}/scan
done

# check rescan result
for i in $(ls /sys/class/scsi_device/)
do
echo 1 > /sys/class/scsi_device/${i}/device/rescan
done

# find the dev name (below example use /dev/sdb)
fdisk -l

# create partition
fdisk /dev/sdb
# p -> n -> p -> w

# mount the tmpdir with 2MB only
partprobe
pvcreate /dev/sdb1
vgcreate vgtmpdir /dev/sdb1
lvcreate -n lvtmpdir -L 2M vgtmpdir
mkfs.ext4 /dev/vgtmpdir/lvtmpdir
mkdir -p /var/tmp
mount /dev/vgtmpdir/lvtmpdir /var/tmp
chmod 4777 /var/tmp

# set tmpdir DB parameter
rm -f /etc/my.cnf.d/tmpdir.cnf
echo "[mysqld]" >> /etc/my.cnf.d/tmpdir.cnf
echo "tmpdir=/var/tmp" >> /etc/my.cnf.d/tmpdir.cnf

# restart DB
systemctl stop mariadb
systemctl start mariadb

# run large sorting SQL

show variables like 'tmpdir' ;

select * from information_schema.COLUMNS union all
select * from information_schema.COLUMNS union all
select * from information_schema.COLUMNS union all
select * from information_schema.COLUMNS union all
select * from information_schema.COLUMNS union all
select * from information_schema.COLUMNS union all
select * from information_schema.COLUMNS union all
select * from information_schema.COLUMNS union all
select * from information_schema.COLUMNS
order by TABLE_NAME ;

#### SQL keeps waiting at this point but expect it returns error

MariaDB [(none)]> show processlist ;
--------------------------------------------------------------------------------------------------------------------------------------------------------------

Id User Host db Command Time State Info Progress

--------------------------------------------------------------------------------------------------------------------------------------------------------------

3 root localhost NULL Query 109 Removing tmp table select * from information_schema.COLUMNS union all
select * from information_schema.COLUMNS union a
0.000
5 root localhost NULL Query 0 starting show processlist 0.000

--------------------------------------------------------------------------------------------------------------------------------------------------------------
2 rows in set (0.000 sec)

#### tmpdir keeps 100%

Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vgtmpdir-lvtmpdir 2.9M 2.6M 3.0K 100% /var/tmp

#### DB log
[Warning] mariadbd: Disk is full writing '/var/tmp/#sql-temptable-d1d7-3-9.MAD' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)



 Comments   
Comment by Aldo Gibbs [ 2022-11-03 ]

In quick summary; if a query takes up all available disk space, mariadb should immediately free up all resources used by the query.

As for next steps there are two options:

  • end the query with an error message
  • free up all resources used by query and then try again with exponential backoff up to a certain limit

The approach used now guarantees zero database availability under high attempted utilization.

Comment by Aldo Gibbs [ 2022-11-03 ]

to clarify the "zero database availability":

if 20 queries happen at nearly the same time and each requires 6% of available disk space, they will all be stuck waiting for just a little bit more disk space. Meanwhile NO other queries can run.

With exponential backoff this would not be a problem, even if a query required 200% disk space. It would be a problem for that one query, sure, but ONLY that one query would be hindered.

Generated at Thu Feb 08 10:07:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.