Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
10.2(EOL), 10.6
-
None
Description
MariaDB server should be recoverable during storage-full condition, but in following condition the binlog corrupted during storage-full (0 space on disk). Could cause binlog replay failure, and replication failure.
The issue is reproducible in 10.6.5 and was also seen in 10.2.40.
Issue description:
When MariaDB server runs out of storage, it fails to write binlog file because of "No space left on device". At this time, the server is still running.
2022-02-10 23:51:28 6 [Warning] mysqld: Disk is full writing '/data/log/binlog/mysql-bin-changelog.000001' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space) |
2022-02-10 23:51:28 6 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs |
As it will keep retrying the binlog writing, it suppose to recover after releasing some storage or adding more storage.
However in this condition, if the stuck session is killed by the client, the binlog writing will break and couldn't recover.
After the binlog corrupted:
- new insert query will fail with errors of errno: 11 "Resource temporarily unavailable")
MariaDB [(none)]> INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
2022-02-10 23:55:46 11 [ERROR] mysqld: Error writing file '/data/log/binlog/mysql-bin-changelog' (errno: 11 "Resource temporarily unavailable")
- Using mysqlbinlog to parse the problematic binlog file will fail with following error:
# /usr/local/mysql/bin/mysqlbinlog /data/log/binlog/mysql-bin-changelog.000001 > /tmp/0001
ERROR: Error in Log_event::read_log_event(): 'Event truncated', data_len: 673207109, event_type: 32
The only way to recovery in this scenario is to restart MariaDB server. However it left the problematic binlog and it can't be replayed. If there're replicas, replication will also fail because of it:
Last_IO_Error: Relay log write failure: could not queue event from master
|
How to reproduce:
The issue can be reproduced with following steps, using source code https://github.com/MariaDB/server/tree/mariadb-10.6.5/, and building/installing on AWS EC2 instance.
- Create an EC2 instance of Amazon Linux 2, add a EBS volume storage of 1 GB. Mount the volume to `/data` on instance:
sudo su -
fdisk -l
mkdir /data
mkfs.ext4 /dev/nvme1n1
mount /dev/nvme1n1 /data
- Checkout MariaDB 10.6.5, build and install. Here's the parameters and commands I used:
yum-builddep -y mariadb-server
yum install -y git gcc gcc-c++ bison libxml2-devel libevent-devel rpm-build
git clone https://github.com/MariaDB/server.git --branch mariadb-10.6.5 --depth 1
cd server && cmake . && make -j `nproc` &&make install
- Prepare
pkill mysqld && sleep 1
sudo rm -rf /data/*
sudo mkdir -p /data/log/binlog /data/log/error /data/log/innodb/ /data/db/innodb/ /data/tmp/
sudo chown `whoami`:`whoami` /data -R
- Init the DB and start with `--log-bin` in background
sudo /usr/local/mysql/scripts/mysql_install_db \
--no-defaults --user=`whoami` --datadir=/data/db --basedir=/usr/local/mysql/ --innodb-log-group-home-dir /data/log/innodb --innodb-log-file-size 134217728 \
--auth-root-authentication-method=normal --force --skip-name-resolve --skip-test-db --cross-bootstrap --innodb-data-home-dir /data/db/innodb
Â
/usr/local/mysql/bin/mysqld --no-defaults --user=`whoami` --datadir=/data/db --basedir=/usr/local/mysql/ --innodb-log-group-home-dir /data/log/innodb --innodb-log-file-size 134217728 --innodb-data-home-dir /data/db/innodb \
--binlog_cache_size 32768 --binlog_format MIXED --max_binlog_size 134217728 --sync-binlog 1 --log-bin='/data/log/binlog/mysql-bin-changelog' --skip-grant-tables --server_id=2 &
- Connect to the database and create test db/table:
/usr/local/mysql/bin/mysql -e "\
create database t; \
CREATE TABLE t.t1 (a INT, b MEDIUMTEXT) ENGINE=Innodb; \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');"
- Fill up the storage
let size1=`df /dev/nvme1n1 | sed -n '2p' | awk '{print $4}'`*1024
fallocate -l $size1 /data/1
cat /data/1 > /data/2
- Keep inserting a few times until query stuck
/usr/local/mysql/bin/mysql -e " \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');"
- At this time, check the mysql-error log should contain following lines
2022-02-09 21:22:30 31 [Warning] mysqld: Disk is full writing '/data/log/binlog/mysql-bin-changelog.000011' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
2022-02-09 21:22:30 31 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
- Show processlist shows query state is "Commit":
MariaDB [(none)]> show full processlist;
+----+------+-----------+------+---------+------+----------+----------------------------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+------+-----------+------+---------+------+----------+----------------------------------------------------------------------------------------------------------------------------+----------+
| 11 | root | localhost | NULL | Query | 48 | Commit | INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa') | 0.000 |
| 12 | root | localhost | NULL | Query | 0 | starting | show full processlist | 0.000 |
+----+------+-----------+------+---------+------+----------+----------------------------------------------------------------------------------------------------------------------------+----------+
2 rows in set (0.000 sec)
- Kill the stuck client using ctrl+c
MariaDB [(none)]> INSERT INTO t.t1 VALUES (1, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
MariaDB [(none)]> Ctrl-C -- exit!
Aborted
- At this point `show processlist`shows the query "Command=Killed" and "State=Commit"
#/usr/local/mysql/bin/mysql -e "show processlist"
+----+------+-----------+------+---------+------+----------+------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+------+-----------+------+---------+------+----------+------------------------------------------------------------------------------------------------------+----------+
| 5 | root | localhost | NULL | Killed | 22 | Commit | INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | 0.000 |
| 8 | root | localhost | NULL | Query | 0 | starting | show processlist | 0.000 |
+----+------+-----------+------+---------+------+----------+------------------------------------------------------------------------------------------------------+----------+
- Wait for 1 min (Time in processlist becomes 59)
Then should see this error:[ERROR] mysqld: Error writing file '/data/log/binlog/mysql-bin-changelog' (errno: 28 "No space left on device")
- release storage by rm /data/1
[21:24:54][root][~]$ rm /data/1
rm: remove regular file '/data/1'? y
- Ideally at this time since there's enough storage, the binlog should be recovered.
- However, the binlog is already corrupted at this point:
use mysqlbinlog to parse the binlog will see errors like:[root@ip-172-31-41-130 tmp]# /usr/local/mysql/bin/mysqlbinlog /data/log/binlog/mysql-bin-changelog.000001 > /tmp/0001
ERROR: Error in Log_event::read_log_event(): 'Event truncated', data_len: 673207109, event_type: 32
- Retry the inserting it will show errors about the binlog writing.
/usr/local/mysql/bin/mysql -e "INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')"
ERROR 1026 (HY000) at line 1: Error writing file '/data/log/binlog/mysql-bin-changelog' (errno: 11 "Resource temporarily unavailable")
Error log:
2022-02-10 23:18:17 11 [ERROR] mysqld: Error writing file '/data/log/binlog/mysql-bin-changelog' (errno: 11 "Resource temporarily unavailable")
Attachments
Issue Links
- relates to
-
MDEV-14462 Confusing error message: ib_logfiles are too small for innodb_thread_concurrency=0
- Closed
-
MDEV-27436 binlog corruption (/tmp no space left on device at the same moment)
- Closed