[MDEV-27803] MariaDB binlog corruption when "No space left on device" and stuck session killed by client - Jira

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.2(EOL), 10.6
Fix Version/s: 10.6
Component/s: Replication
Labels:
None

Description

MariaDB server should be recoverable from a storage-full condition, but in the following scenario the binlog becomes corrupted while storage is full (no free space on disk). This can cause binlog replay failure and replication failure.

The issue is reproducible in 10.6.5 and was also seen in 10.2.40.

Issue description:

When the MariaDB server runs out of storage, it fails to write the binlog file because of "No space left on device". At this point the server is still running.

2022-02-10 23:51:28 6 [Warning] mysqld: Disk is full writing '/data/log/binlog/mysql-bin-changelog.000001' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)

2022-02-10 23:51:28 6 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs

Because the server keeps retrying the binlog write, it is supposed to recover once some storage is released or more storage is added.

However, if the stuck session is killed by the client while the server is in this state, binlog writing breaks and cannot recover.

After the binlog is corrupted:

New INSERT queries fail with errno: 11 "Resource temporarily unavailable":

MariaDB [(none)]> INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');

2022-02-10 23:55:46 11 [ERROR] mysqld: Error writing file '/data/log/binlog/mysql-bin-changelog' (errno: 11 "Resource temporarily unavailable")

Parsing the problematic binlog file with mysqlbinlog fails with the following error:

# /usr/local/mysql/bin/mysqlbinlog /data/log/binlog/mysql-bin-changelog.000001 > /tmp/0001

ERROR: Error in Log_event::read_log_event(): 'Event truncated', data_len: 673207109, event_type: 32
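The same check can be run over several binlog files at once. The following is a minimal sketch, not part of the original report: the function name `find_unreadable_binlogs` and the `MYSQLBINLOG` override variable are hypothetical, and it assumes that mysqlbinlog exits with a non-zero status when it hits a parse error such as the one above.

```shell
#!/usr/bin/env bash
# Hypothetical override variable; the default is the install path used in this report.
MYSQLBINLOG="${MYSQLBINLOG:-/usr/local/mysql/bin/mysqlbinlog}"

# Print the name of every file that mysqlbinlog fails to parse.
# Returns 1 if at least one file failed, 0 otherwise.
# Assumption: mysqlbinlog exits non-zero on errors such as 'Event truncated'.
find_unreadable_binlogs() {
    local rc=0 f
    for f in "$@"; do
        # The decoded events are discarded; only the exit status is used.
        if ! "$MYSQLBINLOG" "$f" > /dev/null 2>&1; then
            echo "$f"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `find_unreadable_binlogs /data/log/binlog/mysql-bin-changelog.0*` would list the files that need attention.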

The only way to recover in this scenario is to restart the MariaDB server. However, the restart leaves the problematic binlog behind, and it cannot be replayed. If there are replicas, replication also fails because of it:

Last_IO_Error: Relay log write failure: could not queue event from master
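On a replica, this field can be pulled out of the vertical `SHOW SLAVE STATUS\G` output. The helper below is only an illustrative sketch; the function name `last_io_error` is hypothetical, and it reads the status text from standard input.

```shell
#!/usr/bin/env bash
# Read the vertical (\G) output of SHOW SLAVE STATUS on stdin and print
# only the text of the Last_IO_Error field, without the field name.
last_io_error() {
    sed -n 's/^ *Last_IO_Error: *//p'
}
```

A possible invocation, assuming the same client path as in this report, is `/usr/local/mysql/bin/mysql -e 'SHOW SLAVE STATUS\G' | last_io_error`.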

How to reproduce:

The issue can be reproduced with the following steps, building and installing from the source code at https://github.com/MariaDB/server/tree/mariadb-10.6.5/ on an AWS EC2 instance.

Create an EC2 instance running Amazon Linux 2 and attach a 1 GB EBS volume. Mount the volume at `/data` on the instance:

sudo su -

fdisk -l

mkdir /data

mkfs.ext4 /dev/nvme1n1

mount /dev/nvme1n1 /data

Check out MariaDB 10.6.5, then build and install it. Here are the parameters and commands I used:

yum-builddep -y mariadb-server

yum install -y git gcc gcc-c++ bison libxml2-devel libevent-devel rpm-build

git clone https://github.com/MariaDB/server.git --branch mariadb-10.6.5 --depth 1

cd server && cmake . && make -j `nproc` && make install

Prepare a clean directory layout under `/data`:

pkill mysqld && sleep 1

sudo rm -rf /data/*

sudo mkdir -p /data/log/binlog /data/log/error /data/log/innodb/ /data/db/innodb/ /data/tmp/

sudo chown `whoami`:`whoami` /data -R

Initialize the database and start the server in the background with `--log-bin`:

sudo /usr/local/mysql/scripts/mysql_install_db \
 --no-defaults --user=`whoami` --datadir=/data/db --basedir=/usr/local/mysql/ --innodb-log-group-home-dir /data/log/innodb --innodb-log-file-size 134217728 \
 --auth-root-authentication-method=normal --force --skip-name-resolve --skip-test-db --cross-bootstrap --innodb-data-home-dir /data/db/innodb

/usr/local/mysql/bin/mysqld --no-defaults --user=`whoami` --datadir=/data/db --basedir=/usr/local/mysql/ --innodb-log-group-home-dir /data/log/innodb --innodb-log-file-size 134217728 --innodb-data-home-dir /data/db/innodb  \
--binlog_cache_size 32768  --binlog_format MIXED   --max_binlog_size 134217728   --sync-binlog 1 --log-bin='/data/log/binlog/mysql-bin-changelog'  --skip-grant-tables --server_id=2 &

Connect to the database and create test db/table:

/usr/local/mysql/bin/mysql -e "\

create database t; \

CREATE TABLE t.t1 (a INT, b MEDIUMTEXT) ENGINE=Innodb; \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');"

Fill up the storage

let size1=`df /dev/nvme1n1 | sed -n '2p' | awk '{print $4}'`*1024

fallocate -l $size1 /data/1

cat /data/1 > /data/2
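The size calculation above relies on `df` reporting 1 KiB blocks by default. As a rough sketch of the same arithmetic in a more explicit form, the helpers below (hypothetical names `kib_to_bytes` and `avail_bytes`) request the POSIX output format with 1024-byte units and read the "Available" column for a given path.

```shell
#!/usr/bin/env bash
# Convert a count of 1 KiB blocks to bytes.
kib_to_bytes() {
    echo $(( $1 * 1024 ))
}

# Print the number of bytes available on the filesystem that holds the given path.
# -P selects the POSIX one-line-per-filesystem format, -k forces 1024-byte units;
# column 4 of the second output line is the "Available" value.
avail_bytes() {
    kib_to_bytes "$(df -Pk "$1" | awk 'NR == 2 { print $4 }')"
}
```

With these helpers, the first fill step could be written as `fallocate -l "$(avail_bytes /data)" /data/1`. The follow-up `cat /data/1 > /data/2` from the original steps is still what consumes any remaining space.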

Keep inserting rows until the query gets stuck:

/usr/local/mysql/bin/mysql -e " \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \

INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');"

At this point, the MariaDB error log should contain the following lines:

2022-02-09 21:22:30 31 [Warning] mysqld: Disk is full writing '/data/log/binlog/mysql-bin-changelog.000011' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)

2022-02-09 21:22:30 31 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
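To confirm that the server has reached this state without reading the log by hand, the warning can be counted with `grep`. This is only a sketch: the function name `count_disk_full_warnings` is hypothetical, and the reproduction steps above do not set an error log file, so the log path passed to it is an assumption (for example, a file that the server's stderr was redirected to).

```shell
#!/usr/bin/env bash
# Print how many "Disk is full" warnings with Errcode 28 appear in the given log file.
# Note: grep -c prints 0 and exits with status 1 when nothing matches.
count_disk_full_warnings() {
    grep -c 'Disk is full writing .*Errcode: 28' "$1"
}
```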

`show full processlist` shows the query in state "Commit":

MariaDB [(none)]> show full processlist;
+----+------+-----------+------+---------+------+----------+----------------------------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host      | db   | Command | Time | State    | Info                                                                                                                       | Progress |
+----+------+-----------+------+---------+------+----------+----------------------------------------------------------------------------------------------------------------------------+----------+
| 11 | root | localhost | NULL | Query   |   48 | Commit   | INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa') |    0.000 |
| 12 | root | localhost | NULL | Query   |    0 | starting | show full processlist                                                                                                      |    0.000 |
+----+------+-----------+------+---------+------+----------+----------------------------------------------------------------------------------------------------------------------------+----------+
2 rows in set (0.000 sec)

Kill the stuck client using Ctrl+C:

MariaDB [(none)]> INSERT INTO t.t1 VALUES (1, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");

^CCtrl-C -- query killed. Continuing normally.

^CCtrl-C -- query killed. Continuing normally.

ERROR 2013 (HY000): Lost connection to MySQL server during query

MariaDB [(none)]> Ctrl-C -- exit!

Aborted

At this point `show processlist` shows the query with "Command=Killed" and "State=Commit":

#/usr/local/mysql/bin/mysql -e "show processlist"
+----+------+-----------+------+---------+------+----------+------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host      | db   | Command | Time | State    | Info                                                                                                 | Progress |
+----+------+-----------+------+---------+------+----------+------------------------------------------------------------------------------------------------------+----------+
|  5 | root | localhost | NULL | Killed  |   22 | Commit   | INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa |    0.000 |
|  8 | root | localhost | NULL | Query   |    0 | starting | show processlist                                                                                     |    0.000 |
+----+------+-----------+------+---------+------+----------+------------------------------------------------------------------------------------------------------+----------+
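The "Killed" plus "Commit" combination can also be picked out mechanically. The sketch below is an illustration rather than part of the original reproduction: the function name `killed_in_commit` is hypothetical, and it assumes tab-separated rows in the column order shown above (Id, User, Host, db, Command, Time, State, Info, Progress), as the client prints them in batch mode.

```shell
#!/usr/bin/env bash
# Read tab-separated SHOW PROCESSLIST rows on stdin and print the Id of every
# session whose Command (column 5) is "Killed" and whose State (column 7) is "Commit".
killed_in_commit() {
    awk -F '\t' '$5 == "Killed" && $7 == "Commit" { print $1 }'
}
```

One way to feed it is `/usr/local/mysql/bin/mysql -B -N -e "show processlist" | killed_in_commit`, where `-B` selects tab-separated batch output and `-N` suppresses the header row.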

Wait for 1 minute (until Time in the processlist reaches 59). This error should then appear:

[ERROR] mysqld: Error writing file '/data/log/binlog/mysql-bin-changelog' (errno: 28 "No space left on device")

Release storage by removing /data/1:

[21:24:54][root][~]$ rm /data/1

rm: remove regular file '/data/1'? y

Ideally, now that there is enough storage, binlog writing should recover.

However, the binlog is already corrupted at this point. Parsing it with mysqlbinlog produces errors like:

[root@ip-172-31-41-130 tmp]# /usr/local/mysql/bin/mysqlbinlog /data/log/binlog/mysql-bin-changelog.000001 > /tmp/0001

ERROR: Error in Log_event::read_log_event(): 'Event truncated', data_len: 673207109, event_type: 32

Retrying the INSERT still fails with a binlog write error:

/usr/local/mysql/bin/mysql -e "INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')"

ERROR 1026 (HY000) at line 1: Error writing file '/data/log/binlog/mysql-bin-changelog' (errno: 11 "Resource temporarily unavailable")

Error log:

2022-02-10 23:18:17 11 [ERROR] mysqld: Error writing file '/data/log/binlog/mysql-bin-changelog' (errno: 11 "Resource temporarily unavailable")

Issue Links

relates to

MDEV-14462 Confusing error message: ib_logfiles are too small for innodb_thread_concurrency=0 (Closed)

MDEV-27436 binlog corruption (/tmp no space left on device at the same moment) (Closed)

Activity

Hugo Wen created issue - 2022-02-11 01:24

Hugo Wen made changes - 2022-02-11 01:27

Hugo Wen made changes - 2022-02-11 01:30

Description

*MariaDB server should be recoverable from a storage-full condition, but in the following scenario the binlog becomes corrupted while storage is full (0 bytes free on disk). This can cause binlog replay failure and replication failure.*

The issue is reproducible in 10.6.5 and was also seen in 10.2.40.

h3. *Issue description*:

When the MariaDB server runs out of storage, it fails to write the binlog file because of "No space left on device". At this point the server is still running.

{code:java}
2022-02-10 23:51:28 6 [Warning] mysqld: Disk is full writing '/data/log/binlog/mysql-bin-changelog.000001' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
2022-02-10 23:51:28 6 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
{code}
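
One way to spot this state from the outside is to look for the warning in the error log. The following is a minimal sketch, assuming the error log is a plain text file whose path you pass in; the function name `count_disk_full_warnings` is hypothetical and not part of MariaDB.

```shell
# Count "Disk is full writing" warnings (Errcode: 28) in an error log.
# The function name is illustrative; the log path is supplied by the caller.
count_disk_full_warnings() {
    local log_file="$1"
    # grep -c prints the number of matching lines; it exits 1 when there are
    # none, so "|| true" keeps the function usable under "set -e".
    grep -c 'Disk is full writing .*Errcode: 28' "$log_file" || true
}
```

A non-zero count only says the server hit the disk-full retry loop at some point; it does not by itself show that the binlog is corrupted.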

Because it keeps retrying the binlog write, it is supposed to recover once some storage is released or more storage is added.

However, in this condition, *if the stuck session is killed by the client, the binlog write breaks and cannot recover.*

After the binlog is corrupted:
# New INSERT queries fail with {{errno: 11 "Resource temporarily unavailable"}}
{code:java}
MariaDB [(none)]> INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
2022-02-10 23:55:46 11 [ERROR] mysqld: Error writing file '/data/log/binlog/mysql-bin-changelog' (errno: 11 "Resource temporarily unavailable")
{code}
# Parsing the problematic binlog file with mysqlbinlog fails with the following error:
{code}
# /usr/local/mysql/bin/mysqlbinlog /data/log/binlog/mysql-bin-changelog.000001 > /tmp/0001
ERROR: Error in Log_event::read_log_event(): 'Event truncated', data_len: 673207109, event_type: 32
{code}
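
The mysqlbinlog check above can be wrapped in a small helper. This is only a sketch: it assumes that mysqlbinlog either exits non-zero or prints a line starting with `ERROR:` on stderr when it cannot parse the file, which should be confirmed on the version in use. The `MYSQLBINLOG` variable and the function name `binlog_is_readable` are hypothetical.

```shell
# Path to mysqlbinlog; the variable is an illustrative override, the default
# is the install path used in this report.
MYSQLBINLOG=${MYSQLBINLOG:-/usr/local/mysql/bin/mysqlbinlog}

# Return 0 if the binlog file parses cleanly, 1 otherwise.
binlog_is_readable() {
    local binlog_file="$1"
    local errors status
    # Discard the decoded events and keep only what is written to stderr.
    errors=$("$MYSQLBINLOG" "$binlog_file" 2>&1 >/dev/null) && status=0 || status=$?
    if [ "$status" -ne 0 ] || printf '%s\n' "$errors" | grep -q '^ERROR:'; then
        printf '%s\n' "$errors" >&2
        return 1
    fi
    return 0
}
```

For example, `binlog_is_readable /data/log/binlog/mysql-bin-changelog.000001 || echo "binlog unreadable"` would report the file shown above.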

The only way to recover in this scenario is to restart the MariaDB server. *However, the problematic binlog is left behind and cannot be replayed. If there are replicas, replication also fails because of it:*
{code}
Last_IO_Error: Relay log write failure: could not queue event from master
{code}
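
On a replica, the error above appears in the `Last_IO_Error` field of `SHOW SLAVE STATUS\G`. A minimal sketch for pulling that field out of the vertical output follows; the function name `last_io_error` is hypothetical, and it simply filters whatever text is piped into it.

```shell
# Print the value of Last_IO_Error from "SHOW SLAVE STATUS\G" output on stdin.
# Lines for other fields (including Last_IO_Errno) do not match the pattern.
last_io_error() {
    sed -n 's/^[[:space:]]*Last_IO_Error:[[:space:]]*//p'
}
```

Usage, assuming the client path from this report: `/usr/local/mysql/bin/mysql -e 'SHOW SLAVE STATUS\G' | last_io_error`.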

h3. *How to reproduce:*
The issue can be reproduced with the following steps, using the source code at https://github.com/MariaDB/server/tree/mariadb-10.6.5/, built and installed on an AWS EC2 instance.

# Create an Amazon Linux 2 EC2 instance and add a 1 GB EBS volume. Mount the volume at `/data` on the instance:
{code}
sudo su -
fdisk -l
mkdir /data
mkfs.ext4 /dev/nvme1n1
mount /dev/nvme1n1 /data
{code}
# Check out MariaDB 10.6.5, then build and install it. Here are the parameters and commands I used:
{code}
yum-builddep -y mariadb-server
yum install -y git gcc gcc-c++ bison libxml2-devel libevent-devel rpm-build
git clone https://github.com/MariaDB/server.git --branch mariadb-10.6.5 --depth 1
cd server && cmake . && make -j "$(nproc)" && make install
{code}
# Prepare the data directories:
{code}
pkill mysqld && sleep 1
sudo rm -rf /data/*
sudo mkdir -p /data/log/binlog /data/log/error /data/log/innodb/ /data/db/innodb/ /data/tmp/
sudo chown `whoami`:`whoami` /data -R
{code}
# Initialize the DB and start the server with `--log-bin` in the background:
{code}
sudo /usr/local/mysql/scripts/mysql_install_db \
--no-defaults --user=`whoami` --datadir=/data/db --basedir=/usr/local/mysql/ --innodb-log-group-home-dir /data/log/innodb --innodb-log-file-size 134217728 \
--auth-root-authentication-method=normal --force --skip-name-resolve --skip-test-db --cross-bootstrap --innodb-data-home-dir /data/db/innodb

/usr/local/mysql/bin/mysqld --no-defaults --user=`whoami` --datadir=/data/db --basedir=/usr/local/mysql/ --innodb-log-group-home-dir /data/log/innodb --innodb-log-file-size 134217728 --innodb-data-home-dir /data/db/innodb \
--binlog_cache_size 32768 --binlog_format MIXED --max_binlog_size 134217728 --sync-binlog 1 --log-bin='/data/log/binlog/mysql-bin-changelog' --skip-grant-tables --server_id=2 &
{code}
# Connect to the database and create test db/table:
{code}
/usr/local/mysql/bin/mysql -e "\
create database t; \
CREATE TABLE t.t1 (a INT, b MEDIUMTEXT) ENGINE=Innodb; \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');"
{code}
# Fill up the storage
{code}
let size1=`df /dev/nvme1n1 | sed -n '2p' | awk '{print $4}'`*1024
fallocate -l $size1 /data/1
cat /data/1 > /data/2
{code}
# Keep inserting until a query gets stuck:
{code}
/usr/local/mysql/bin/mysql -e " \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'); \
INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');"
{code}
# At this point, the MySQL error log should contain the following lines:
{code}
2022-02-09 21:22:30 31 [Warning] mysqld: Disk is full writing '/data/log/binlog/mysql-bin-changelog.000011' (Errcode: 28 "No space left on device"). Waiting for someone to free space... (Expect up to 60 secs delay for server to continue after freeing disk space)
2022-02-09 21:22:30 31 [Warning] mysqld: Retry in 60 secs. Message reprinted in 600 secs
{code}
# `show full processlist` shows the query in state "Commit":
{code}
MariaDB [(none)]> show full processlist;
+----+------+-----------+------+---------+------+----------+----------------------------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+------+-----------+------+---------+------+----------+----------------------------------------------------------------------------------------------------------------------------+----------+
| 11 | root | localhost | NULL | Query | 48 | Commit | INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa') | 0.000 |
| 12 | root | localhost | NULL | Query | 0 | starting | show full processlist | 0.000 |
+----+------+-----------+------+---------+------+----------+----------------------------------------------------------------------------------------------------------------------------+----------+
2 rows in set (0.000 sec)
{code}
# Kill the stuck client using Ctrl+C:
{code}
MariaDB [(none)]> INSERT INTO t.t1 VALUES (1, "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
^CCtrl-C -- query killed. Continuing normally.
^CCtrl-C -- query killed. Continuing normally.
ERROR 2013 (HY000): Lost connection to MySQL server during query
MariaDB [(none)]> Ctrl-C -- exit!
Aborted
{code}
# At this point `show processlist` shows the query with "Command=Killed" and "State=Commit":
{code}
#/usr/local/mysql/bin/mysql -e "show processlist"
+----+------+-----------+------+---------+------+----------+------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+------+-----------+------+---------+------+----------+------------------------------------------------------------------------------------------------------+----------+
| 5 | root | localhost | NULL | Killed | 22 | Commit | INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa | 0.000 |
| 8 | root | localhost | NULL | Query | 0 | starting | show processlist | 0.000 |
+----+------+-----------+------+---------+------+----------+------------------------------------------------------------------------------------------------------+----------+
{code}
# Wait for 1 minute (until Time in the processlist reaches 59).
The error log should then show this error:
{code}
[ERROR] mysqld: Error writing file '/data/log/binlog/mysql-bin-changelog' (errno: 28 "No space left on device")
{code}
# Release storage by removing `/data/1`:
{code}
[21:24:54][root][~]$ rm /data/1
rm: remove regular file '/data/1'? y
{code}
# *Ideally, since there is now enough storage, the binlog should recover at this point.*
# However, the binlog is already corrupted at this point:
parsing the binlog with mysqlbinlog shows errors like:
{code}
[root@ip-172-31-41-130 tmp]# /usr/local/mysql/bin/mysqlbinlog /data/log/binlog/mysql-bin-changelog.000001 > /tmp/0001
ERROR: Error in Log_event::read_log_event(): 'Event truncated', data_len: 673207109, event_type: 32
{code}
# Retrying the insert shows errors about the binlog write:
{code}
/usr/local/mysql/bin/mysql -e "INSERT INTO t.t1 VALUES (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa')"
ERROR 1026 (HY000) at line 1: Error writing file '/data/log/binlog/mysql-bin-changelog' (errno: 11 "Resource temporarily unavailable")
{code}
Error log:
{code}
2022-02-10 23:18:17 11 [ERROR] mysqld: Error writing file '/data/log/binlog/mysql-bin-changelog' (errno: 11 "Resource temporarily unavailable")
{code}
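
Some of the reproduction steps above can be scripted. The helpers below are a rough sketch, not part of the original report: the function names are hypothetical, `avail_bytes_from_df` expects `df -Pk` output (available space in column 4, in 1024-byte blocks, as in the fill-up step), and `killed_in_commit` expects the tab-separated processlist that the `mysql` client prints in batch mode.

```shell
# Convert the "Available" column of `df -Pk <mount>` (read from stdin)
# from 1024-byte blocks to bytes, as the fill-up step does with `let`.
avail_bytes_from_df() {
    local kb
    kb=$(awk 'NR == 2 { print $4 }')
    echo $((kb * 1024))
}

# Print COUNT copies of the INSERT statement used in the report,
# with PAYLOAD as the text value.
gen_inserts() {
    local count="$1" payload="$2" i=0
    while [ "$i" -lt "$count" ]; do
        printf "INSERT INTO t.t1 VALUES (1, '%s');\n" "$payload"
        i=$((i + 1))
    done
}

# Read a tab-separated processlist (Id, User, Host, db, Command, Time,
# State, Info, Progress) from stdin and print the Id of every session
# that is Killed but still in the Commit state.
killed_in_commit() {
    awk -F '\t' '$5 == "Killed" && $7 == "Commit" { print $1 }'
}
```

Possible usage, with the paths from this report: `df -Pk /data | avail_bytes_from_df` to size the filler file, `gen_inserts 18 "$payload" | /usr/local/mysql/bin/mysql` to issue the inserts, and `/usr/local/mysql/bin/mysql -B -N -e "show processlist" | killed_in_commit` to find the stuck session after the client has been killed.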

Hugo Wen made changes - 2022-02-11 01:30

Summary

MariaDB binlog corruption when "No space left on device" and stuck session killed from client

MariaDB binlog corruption when "No space left on device" and stuck session killed by client

Daniel Black added a comment - 2022-02-11 01:50

Thank you for the detailed bug report.

Richard DEMONGEOT made changes - 2022-02-11 08:08

Link

This issue relates to ~~MDEV-27436~~ [ ~~MDEV-27436~~ ]

Richard DEMONGEOT added a comment - 2022-02-11 08:31

Hello,

I opened a bug that seems closely related a month ago (MDEV-27436), so I've linked the two tickets as related.

Regards,

Otto Kekäläinen added a comment - 2022-02-18 06:33

marko, what are your thoughts on the expected behavior of InnoDB when the data directory runs out of disk space (or, more generally, when the filesystem suddenly goes into read-only mode for whatever reason)? Should mariadbd shut down automatically in such a case? Or should the database stay up but yield errors, and continue once the disk is writeable again? Should SELECT queries and database connections still work, with only write operations yielding errors while the filesystem does not accept writes?

Daniel Black added a comment - 2022-02-18 07:27

For general InnoDB I/O error handling, ~~MDEV-27593~~ is currently looking at how to handle these. It would be good if the binlog were handled the same way.

Do you have a user preference?

Marko Mäkelä added a comment - 2022-02-18 12:46

otto, InnoDB normally attempts to allocate space upfront. The InnoDB redo log should never run out of space, because it is circular. If an InnoDB data file needs to be extended, then I believe that a failure to extend the file currently results in the server being killed. More robust error handling would be to refuse the write operation that resulted in the need to extend the data file. In any case, no data should be lost in the InnoDB layer due to running out of space.

To my understanding, both log_bin and the Aria storage engine recovery log are appended on write.

When it comes to the Aria storage engine, I believe that it cannot be changed to use a circular log file without restricting the maximum size of a transaction. Since InnoDB writes undo log into data pages that are covered by the circular redo log, open transactions do not prevent any redo log checkpoints. I do not know anything about the binlog, but I would not be surprised if the maximum transaction size is the minimum binlog file size.

Otto Kekäläinen added a comment - 2022-02-18 16:27

Thanks for the InnoDB description Marko!

Actually we should perhaps ask Elkin to chime in on what his thoughts are about the expected behavior of binlogs when disk is full (or filesystem goes into read-only mode for some other reason)?

Andrei Elkin added a comment - 2022-02-18 18:31 - edited

otto, in case the binlog file system gets full and the server crashes (in my testing the server could only be killed), the following will (or should) happen at restart:
1. The last written transaction may be incomplete.
2. It *should* be trimmed from the binlog according to WL#5493: Binlog crash-safe when master crashed.
3. The trimmed transaction won't be committed at recovery either.

Regarding the WL#5493 trimming, I could not actually confirm it in my testing (on 10.6); it needs investigating. I am not sure, though, whether the mysqlbinlog read failures in the report occurred after the server was restarted. Probably not, and if so it makes sense to restart the server and then check the old binlog file with mysqlbinlog.

Otto Kekäläinen added a comment - 2022-02-19 03:32

Thanks Elkin for the comments. One problem here is that the binlogs are written by the primary DB and used for replication, to be applied by replicas. Thus, in theory, the primary DB could continue working even when the disk is full, and only replication would fail because binlogs are no longer written. If the replicas exist for fail-over and high-availability purposes, it would be a bit counterproductive to shut down the primary DB and make the whole application fail. Or is the assumption that if replication is on, the primary DB can be shut down and the app should fail-over to one of the replicas? And hopefully the replicas don't have their disk in read-only mode. Or is the assumption that if filesystem goes into read-only mode, the primary DB would continue running but emit alerts, and then the replicas would shut down, and recovery once the disk is writeable again would happen by creating new replicas from the primary DB? Or is there some middle ground, where the primary DB closes all connections and refuses new writes but keeps flushing the binlogs to the replicas, allowing one of the replicas to be promoted to primary as soon as it has caught up with the primary DB? And only after that is the primary DB fully shut down?

What if MariaDB had some code that would trigger a safe shutdown and flush before it runs out of disk space?
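
As a rough illustration of that idea, here is a sketch of an external watchdog pass rather than code inside MariaDB. The function names, the threshold argument, and the `mysqladmin` path are all assumptions made for the example; `mysqladmin shutdown` is used as the clean-shutdown command.

```shell
# Return 0 when the free space (in KB) has dropped below the threshold (in KB).
below_threshold() {
    [ "$1" -lt "$2" ]
}

# One watchdog pass over a mount point. The mysqladmin path is an assumed
# install location, matching the client path used elsewhere in this report.
check_once() {
    local mount_point="$1" min_free_kb="$2" free_kb
    free_kb=$(df -Pk "$mount_point" | awk 'NR == 2 { print $4 }')
    if below_threshold "$free_kb" "$min_free_kb"; then
        echo "low disk space on $mount_point: ${free_kb} KB free" >&2
        /usr/local/mysql/bin/mysqladmin shutdown
    fi
}
```

Something like `check_once /data 51200` run periodically would shut the server down while about 50 MB is still free; whether shutting down is the right reaction is exactly the open question in this thread.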

Andrei Elkin added a comment - 2022-02-21 20:25

otto, Yw.
From my capsule review of your questions and proposals (I did not have much time today for a deeper look), the following approach

is the assumption that if replication is on, the primary DB can be shut down and the app should fail-over to one of the replicas?

must be viable. From the server side though we need to ensure smooth shutdown (I did have a hang at my testing, to explore and fix if that's the case). It then would be the application burden to find the most updated slave to fail over the master role onto. (An automatic fail-over announced as a part of MDEV-19140 is not yet in the plans).

I did not get your idea, sorry, in

Or is the assumption that if filesystem goes into read-only mode, the primary DB would continue running but emit alerts, and then the replicas would shut down

So could you please explain?

Cheers.

Andrei


Otto Kekäläinen added a comment - 2022-02-22 06:35

> Or is the assumption that if filesystem goes into read-only mode, the primary DB would continue running but emit alerts, and then the replicas would shut down
>
> So could you please explain?

I meant that if the filesystem where the binlogs are stored goes into read-only mode (disk full, filesystem corrupted so that the kernel remounts it as read-only, network filesystem with a hiccup, or whatever reason) and new binlog entries cannot be written, then the primary database could still continue to serve both writes and reads if the InnoDB data tables are on a different filesystem that is still writeable. However, since no new binlogs are written, replication would be broken, and the primary DB should tell the replicas that they are no longer up to date and thus inconsistent.

Alternatively, if neither the binlog nor the data tables can be written, the primary database could still continue to run but only serve SELECT queries, and issue warnings both to client connections and in server logs that it does not accept write operations.

My purpose was just to list different scenarios so you can consider what the designed behavior should be in them.
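The scenarios above can be summarized as a small decision function. This is only an illustrative sketch: the `Mode` names and `choose_mode` are hypothetical, not existing server states, and the case "binlog writable but data tables not writable" is not discussed above, so mapping it to read-only is an assumption.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "serve reads and writes, replicate as usual"
    WRITES_WITHOUT_BINLOG = "serve reads and writes, tell replicas they are stale"
    READ_ONLY = "serve SELECT only, warn clients and the server log"


def choose_mode(binlog_writable: bool, data_writable: bool) -> Mode:
    """Map filesystem state to the behavior described in the comment above."""
    if binlog_writable and data_writable:
        return Mode.NORMAL
    if data_writable:
        # Binlog filesystem is read-only or full, data filesystem still works.
        return Mode.WRITES_WITHOUT_BINLOG
    # Data tables cannot be written. The binlog-writable variant of this case
    # is an assumption; the comment only covers "neither can be written".
    return Mode.READ_ONLY


if __name__ == "__main__":
    for binlog_ok in (True, False):
        for data_ok in (True, False):
            print(binlog_ok, data_ok, choose_mode(binlog_ok, data_ok).name)
```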


Andrei Elkin added a comment - 2022-02-22 11:06 - edited

> the primary DB should tell the replicas that they are no longer up to date and thus inconsistent.

This is a good idea and can be implemented separately or along with cherry-picking --binlog-error-action to extend the upstream's set of policies.
We already have the INCIDENT event to notify replicas of certain abnormalities on the primary; in your proposal, INCIDENT would just be sent directly to the replicas, bypassing the binlog.

We can also help a primary whose binlog is out of disk space to continue replication, again by sending replicated events out directly (not touching the binlog).
In this scenario, the primary would have to be demoted to a slave at its first restart (see ~~MDEV-21117~~, semisync slave recovery, which has already proven this approach), so that it receives its own events and merely binlogs them rather than executing them.


Andrei Elkin made changes - 2022-02-22 14:16

Assignee

Andrei Elkin [ elkin ]

Marko Mäkelä made changes - 2022-03-16 08:00

Link

This issue relates to ~~MDEV-14462~~ [ ~~MDEV-14462~~ ]

Elena Stepanova made changes - 2022-04-17 16:46

Fix Version/s		10.6 [ 24028 ]
Affects Version/s		10.2 [ 14601 ]
Affects Version/s		10.6 [ 24028 ]

Michael Widenius added a comment - 2023-10-24 18:04 - edited

Things to do on the MariaDB server side to make things easier if something like this happens again:

- On the master, if the thread-local binlog cached event causes /tmp to be full during binlog-commit, we should skip the event and instead write an incident event to the binary log to mark the binlog as corrupted on the slave (as the transaction is already committed in the engine but the binary log will not contain it).
- Better error message when we get a "half event" from the master.
- When we get a 'half event', wait until the SQL slave threads have executed all events found so far before stopping replication (this may be the case already, but it has to be checked).
- From the above it looks like we are starting applying from 1-2-8425025170 over and over again when we should be applying starting from 1-2-8425025171.
- If this is the case, this is a bug, as in the case of 'half events' we will also miss any events that we read before the 'half event', because we abort replication before we have applied the previous events.
- If the user in this case tries to use "SQL_SLAVE_SKIP_COUNTER" to skip some events, the events not yet applied will also be ignored.
- SQL_SLAVE_SKIP_COUNTER should also be able to skip a 'half event' if this is the last event in a log.
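The 'half event' handling asked for above can be illustrated with a simplified sketch: split a received buffer into complete events, which can still be applied, and a truncated tail. It assumes only the 19-byte common event header (timestamp, type code, server id, event length, next position, flags, all little-endian) and ignores the file magic, checksums and network packet framing. `split_events` and `make_event` are hypothetical helpers, not server functions.

```python
import struct
from typing import List, Tuple

# Common event header: timestamp(4) type(1) server_id(4) event_len(4)
# next_pos(4) flags(2), little-endian, 19 bytes in total.
HEADER = struct.Struct("<IBIIIH")


def make_event(type_code: int, payload: bytes) -> bytes:
    """Build a toy event for the demo: header followed by an opaque payload."""
    return HEADER.pack(0, type_code, 1, HEADER.size + len(payload), 0, 0) + payload


def split_events(buf: bytes) -> Tuple[List[bytes], bytes]:
    """Return (complete_events, truncated_tail) for a received buffer."""
    events: List[bytes] = []
    pos = 0
    while len(buf) - pos >= HEADER.size:
        event_len = HEADER.unpack_from(buf, pos)[3]
        if event_len < HEADER.size:
            raise ValueError(f"corrupt event length {event_len} at offset {pos}")
        if pos + event_len > len(buf):
            break  # half event: header is present but the body is cut short
        events.append(buf[pos:pos + event_len])
        pos += event_len
    return events, buf[pos:]


if __name__ == "__main__":
    first = make_event(2, b"abc")
    second = make_event(2, b"defgh")
    complete, tail = split_events(first + second[:-2])
    print(len(complete), "complete event(s),", len(tail), "byte(s) of half event")
```

In this sketch everything in `complete` would be handed to the applier before replication stops, so only the truncated tail is lost.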

Fix the following messages to make it clearer what is going on:

2023-10-16 11:54:18 14 [Note] Slave I/O thread exiting, read up to log 'bin_log.003418', position 10664265; GTID position 1-2-8425025170, master xxxx
->
2023-10-16 11:54:18 14 [Note] Slave I/O thread exiting, read up to log 'bin_log.003418', position 10664265; Last applied GTID 1-2-8425025170, master xxxx

2023-10-16 12:28:02 142367 [Note] Slave SQL thread exiting, replication stopped in log 'bin_log.003418' at position 10664156; GTID position '1-2-8425025170', master: xxxx
->
2023-10-16 12:28:02 142367 [Note] Slave SQL thread exiting, replication stopped in log 'bin_log.003418' at position 10664156; Last applied GTID 1-2-8425025170, master: xxxx

2023-10-16 13:02:07 491 [Note] Slave I/O thread: connected to master 'xxxx',replication starts at GTID position xxx
->
2023-10-16 13:02:07 491 [Note] Slave I/O thread: connected to master 'xxxx',replication starting on next event after GTID xxx
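The ambiguity these wording changes address is whether the GTID in the message is the last one applied or the one to start from. A small sketch of that distinction, using the domain-server-sequence GTID form seen in the messages above; `Gtid`, `parse_gtid` and `resume_message` are hypothetical names for illustration only:

```python
from typing import NamedTuple


class Gtid(NamedTuple):
    domain_id: int
    server_id: int
    seq_no: int


def parse_gtid(text: str) -> Gtid:
    """Parse a GTID such as 1-2-8425025170, with or without single quotes."""
    domain, server, seq = text.strip().strip("'").split("-")
    return Gtid(int(domain), int(server), int(seq))


def resume_message(last_applied: str) -> str:
    """State explicitly that the GTID is already applied, not the start point."""
    gtid = parse_gtid(last_applied)
    return (
        f"last applied GTID {gtid.domain_id}-{gtid.server_id}-{gtid.seq_no}; "
        "replication starts on the next event after it"
    )


if __name__ == "__main__":
    print(resume_message("'1-2-8425025170'"))
```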


Michael Widenius made changes - 2023-10-24 19:30

Labels

CS0665999

Julien Fritsch made changes - 2023-10-31 19:01

Labels

CS0665999

Julien Fritsch made changes - 2023-10-31 19:01

Priority

Major [ 3 ]

Critical [ 2 ]

Julien Fritsch made changes - 2023-10-31 19:03

Labels

triage

Roel Van de Paar made changes - 2024-01-05 23:54

Link

This issue relates to ~~MDEV-9101~~ [ ~~MDEV-9101~~ ]

Roel Van de Paar made changes - 2024-01-05 23:55

Link

This issue relates to ~~MDEV-9101~~ [ ~~MDEV-9101~~ ]

Julien Fritsch made changes - 2024-01-11 16:36

Priority

Critical [ 2 ]

Major [ 3 ]

Julien Fritsch made changes - 2024-01-11 16:39

Labels

triage

People

Assignee: Andrei Elkin

Reporter: Hugo Wen

Votes: 2

Watchers: 15

Dates

Created: 2022-02-11 01:24

Updated: 2024-01-11 16:39
