[MDEV-10852] Server hangs after running xtrabackup with MariaDB 10.0 and innodb_flush_method = O_DIRECT Created: 2016-09-21  Updated: 2022-11-10  Resolved: 2022-11-10

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - XtraDB
Affects Version/s: 10.0.26, 10.0.27, 10.0
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Tomas Mozes Assignee: Jan Lindström (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: None
Environment:

Server virtualized in Xen domU, 230GB RAM, Linux kernel 4.4.21, hardware raid 10 on hdd, Xtrabackup 2.4.4.


Attachments: File my.cnf     File sysrq-l.log     File sysrq-m.log     File sysrq-w.log    

 Description   

After backing up roughly 15-25 GB of data with xtrabackup:
$ xtrabackup --backup --stream=xbstream --parallel=4 --compress --compress-threads=12

the server starts misbehaving. The xtrabackup process only prints "log scanned up to" messages with the same LSN:

...
160921 09:09:54 >> log scanned up to (7313323134056)
160921 09:09:55 >> log scanned up to (7313323134056)
160921 09:09:56 >> log scanned up to (7313323134056)
160921 09:09:57 >> log scanned up to (7313323134056)
160921 09:09:58 >> log scanned up to (7313323134056)
160921 09:09:59 >> log scanned up to (7313323134056)
...
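
A stalled backup like the one above can be spotted mechanically by watching for an LSN that stops advancing. The following is only a minimal sketch: `lsn_stall_check` is a hypothetical helper name, the threshold is arbitrary, and a repeated LSN alone is not proof of a hang, since the LSN also stays constant on a server that receives no writes.

```shell
# Hypothetical helper (not part of xtrabackup): reads xtrabackup progress
# lines on stdin and prints "stalled <lsn>" once the same LSN has been seen
# `threshold` times in a row, or "ok" if that never happens.
lsn_stall_check() {
    threshold="${1:-5}"
    awk -v threshold="$threshold" '
        /log scanned up to/ {
            lsn = $NF                 # last field, e.g. "(7313323134056)"
            gsub(/[()]/, "", lsn)     # strip the parentheses
            if (lsn == last) { repeats++ } else { repeats = 1; last = lsn }
            if (repeats == threshold) { print "stalled " lsn; found = 1; exit }
        }
        END { if (!found) print "ok" }
    '
}

# Example (assumes the xtrabackup progress output was saved to backup.log):
#   lsn_stall_check 5 < backup.log
```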

MariaDB cannot be stopped and xtrabackup cannot be killed. Some files cannot be read:
$ cat /var/lib/mysql/backup-my.cnf -> hangs.

During this time, the server's iowait and load are high, and some files cannot even be read. Iotop shows no activity for xtrabackup (normally it reads around 200-300MB/s). There is nothing in the system logs or in the MariaDB logs. This machine runs virtualized in Xen, and there is nothing in the dom0 logs either.

A normal shutdown fails; only a forced machine shutdown works. After reboot, the files are readable again.
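
Processes that cannot be killed and reads that never return are typical of tasks stuck in uninterruptible sleep (state D). As a rough sketch of how to list them, assuming a Linux `ps` that supports `-eo pid,stat,comm`; `d_state_filter` is a hypothetical helper name, and only the first word of the command name is printed.

```shell
# Hypothetical helper: filters `ps -eo pid,stat,comm` output on stdin and
# prints "<pid> <command>" for every process whose state starts with D
# (uninterruptible sleep). The first line is treated as the ps header.
d_state_filter() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Example:
#   ps -eo pid,stat,comm | d_state_filter
```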

Under normal circumstances, all MariaDB data files can be read:
$ find /var/lib/mysql -type f -exec cat {} \; > /dev/null

After removing the innodb_flush_method setting from the MariaDB my.cnf, the backup completes and the system keeps working normally.
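
To check whether a given option file still sets the flush method, something like the following sketch could be used. `flush_method_from_cnf` is a hypothetical helper; it assumes a simple `key = value` file, ignores sections and `!include` directives, and reports the last uncommented occurrence. The attached my.cnf is not reproduced here, so the path in the example is an assumption.

```shell
# Hypothetical helper: prints the value of innodb_flush_method found in the
# option file given as $1, or "unset" if no uncommented line sets it.
# Dashes and underscores in the option name are treated as equivalent.
flush_method_from_cnf() {
    awk -F= '
        /^[ \t]*[#;]/ { next }                    # skip comment lines
        {
            key = $1
            gsub(/[ \t]/, "", key)
            gsub(/-/, "_", key)
            if (key == "innodb_flush_method") {
                val = $2
                gsub(/[ \t]/, "", val)
                result = val                      # last occurrence wins
            }
        }
        END { print (result == "" ? "unset" : result) }
    ' "$1"
}

# Example (path is an assumption):
#   flush_method_from_cnf /etc/mysql/my.cnf
```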

$ xfs_info /var/lib/mysql/
meta-data=/dev/xvda7             isize=512    agcount=23, agsize=7864256 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0
data     =                       bsize=4096   blocks=178257920, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=61440, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

$ find /var/lib/mysql -type f | wc -l
624

$ du -hs /var/lib/mysql/
454G /var/lib/mysql/



 Comments   
Comment by Tomas Mozes [ 2016-10-28 ]

I've been doing some tests to discover which part is having problems. Seems like it's xfs on kernel > 4.1:

kernel 4.1 + xfs = pass

kernel 4.4 + ext4 = pass
kernel 4.4 + xfs without O_DIRECT = pass
kernel 4.4 + xfs with O_DIRECT = FAIL

Comment by Tomas Mozes [ 2017-03-21 ]

It also happens with ext4, but it took longer to show up. So kernel 4.4 with O_DIRECT has this problem on both xfs and ext4. On kernel 4.1 it cannot be reproduced.

Comment by Tomas Mozes [ 2017-10-27 ]

Cannot reproduce any more on MariaDB 10.1 and Linux Kernel 4.12+.
