MDEV-10852 (MariaDB Server)

Server hangs after running xtrabackup with MariaDB 10.0 and innodb_flush_method = O_DIRECT

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 10.0.26, 10.0.27, 10.0 (EOL)
    • Fix Version/s: N/A
    • None
    • Environment: Server virtualized in Xen domU, 230GB RAM, Linux kernel 4.4.21, hardware RAID 10 on HDD, Xtrabackup 2.4.4.

    Description

      After backing up roughly 15-25GB of data with xtrabackup:
      $ xtrabackup --backup --stream=xbstream --parallel=4 --compress --compress-threads=12

      the server starts acting strangely. The xtrabackup process only prints "log scanned up to" messages with the same LSN:

      ...
      160921 09:09:54 >> log scanned up to (7313323134056)
      160921 09:09:55 >> log scanned up to (7313323134056)
      160921 09:09:56 >> log scanned up to (7313323134056)
      160921 09:09:57 >> log scanned up to (7313323134056)
      160921 09:09:58 >> log scanned up to (7313323134056)
      160921 09:09:59 >> log scanned up to (7313323134056)
      ...

      MariaDB cannot be stopped, nor can xtrabackup be killed. Some files cannot be read:
      $ cat /var/lib/mysql/backup-my.cnf -> hangs.

      During this time, the server's iowait is high (as is the load), since some files cannot even be read. iotop shows no activity for xtrabackup (normally it reads around 200-300MB/s). There is nothing in the system logs or in the MariaDB logs. This machine runs virtualized in Xen, and there is nothing in the dom0 logs either.

      A normal shutdown fails; only a forced machine shutdown works. After a reboot, the files are readable again.
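
      A process that cannot be killed and a cat that never returns are consistent with tasks stuck in uninterruptible sleep (state D). The report does not state the process state explicitly, so treat that as an assumption. A minimal sketch for listing such tasks, assuming a procps-style ps; the helper names filter_d_state and list_blocked_tasks are made up for this example:

```shell
#!/bin/sh
# Sketch only: list tasks whose state starts with D (uninterruptible sleep).
# Such tasks do not act on signals, including SIGKILL, until their I/O returns.

# Reads "PID STAT WCHAN COMMAND" rows on stdin, skips the header line,
# and prints only the rows whose STAT column starts with D.
filter_d_state() {
    awk 'NR > 1 && $2 ~ /^D/'
}

# Assumes procps ps; wchan shows the kernel function the task is waiting in.
list_blocked_tasks() {
    ps -eo pid,stat,wchan:32,comm | filter_d_state
}

# Usage (while the hang is in progress):
#   list_blocked_tasks
```

      If mysqld, xtrabackup and the hanging cat all show up here, the problem is more likely below the database (filesystem, block layer or hypervisor) than in MariaDB itself.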

      Under normal circumstances, all MariaDB data files can be read:
      $ find /var/lib/mysql -type f -exec cat {} \; > /dev/null

      After unsetting innodb_flush_method in the MariaDB my.cnf, the backup completes and the system continues to work normally.
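
      To confirm that an option file no longer sets the option, something like the following could be used. It is a sketch under stated assumptions: the function name flush_method_in_cnf is hypothetical, it inspects a single file only (no !include handling, no quoting support), and it reports the last assignment it finds.

```shell
#!/bin/sh
# Sketch only: print the value assigned to innodb_flush_method in one
# my.cnf-style file, or print nothing if the option is not set there.
# Lines starting with # or ; are comments and are not matched.
# Option files accept both innodb_flush_method and innodb-flush-method.
flush_method_in_cnf() {
    sed -n 's/^[[:space:]]*innodb[_-]flush[_-]method[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$1" | tail -n 1
}

# Usage (the path is an assumption, adjust to the actual option file):
#   flush_method_in_cnf /etc/mysql/my.cnf
# The value the running server actually uses can be read with:
#   mysql -NBe 'SELECT @@innodb_flush_method'
```

      Empty output from the function means the file does not set the option, which corresponds to the configuration under which the backup completed.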

      $ xfs_info /var/lib/mysql/
      meta-data=/dev/xvda7             isize=512    agcount=23, agsize=7864256 blks
               =                       sectsz=512   attr=2, projid32bit=1
               =                       crc=1        finobt=1 spinodes=0
      data     =                       bsize=4096   blocks=178257920, imaxpct=25
               =                       sunit=64     swidth=64 blks
      naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
      log      =internal               bsize=4096   blocks=61440, version=2
               =                       sectsz=512   sunit=64 blks, lazy-count=1
      realtime =none                   extsz=4096   blocks=0, rtextents=0

      $ find /var/lib/mysql -type f | wc -l
      624

      $ du -hs /var/lib/mysql/
      454G /var/lib/mysql/

      Attachments

        1. my.cnf
          2 kB
        2. sysrq-l.log
          451 kB
        3. sysrq-m.log
          4 kB
        4. sysrq-w.log
          2 kB
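
      The attachment names suggest kernel dumps taken with magic SysRq: key l prints backtraces of active CPUs, m prints memory information, and w prints blocked (uninterruptible) tasks. How the reporter actually captured them is not stated, so the following is only a sketch of one way to produce such files as root; sysrq_log_name, capture_sysrq and the SYSRQ_TRIGGER variable are names invented here.

```shell
#!/bin/sh
# Sketch only: trigger a SysRq dump and save the kernel log to a per-key file.
# SYSRQ_TRIGGER can be overridden, e.g. to point at a scratch file for a dry run.
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

# Build the output file name for a SysRq key, e.g. "w" -> sysrq-w.log.
sysrq_log_name() {
    printf 'sysrq-%s.log\n' "$1"
}

# Write the key to the trigger file, wait briefly for the kernel to print,
# then save the whole kernel ring buffer. Needs root on a real system.
capture_sysrq() {
    key=$1
    printf '%s\n' "$key" > "$SYSRQ_TRIGGER" || return 1
    sleep 1
    dmesg > "$(sysrq_log_name "$key")"
}

# Usage (as root, while the hang is in progress):
#   for k in l m w; do capture_sysrq "$k"; done
```

      Each file contains the full ring buffer rather than only the new dump, so the relevant part is at the end.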

        Activity

          hydrapolic Tomáš Mózes created issue -
          hydrapolic Tomáš Mózes made changes -
          Environment Server virtualized in Xen domU, 230GB RAM, Linux kernel 4.4.21, hardware raid 10 on hdd. Server virtualized in Xen domU, 230GB RAM, Linux kernel 4.4.21, hardware raid 10 on hdd, Xtrabackup 2.4.4.
          hydrapolic Tomáš Mózes made changes -
          Attachment sysrq-l.log [ 42800 ]
          Attachment sysrq-m.log [ 42801 ]
          Attachment sysrq-w.log [ 42802 ]
          elenst Elena Stepanova made changes -
          Fix Version/s 10.0 [ 16000 ]
          Assignee Jan Lindström [ jplindst ]

          hydrapolic Tomáš Mózes added a comment -

          I've been running tests to find out which component is at fault. It seems to be xfs on kernels newer than 4.1:

          kernel 4.1 + xfs = pass
          kernel 4.4 + ext4 = pass
          kernel 4.4 + xfs, without O_DIRECT = pass
          kernel 4.4 + xfs, with O_DIRECT = FAIL
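
          The test matrix varies three things: kernel version, filesystem of the data directory, and the flush method. A sketch for recording those three values on a host, so that results from different machines can be compared; the function names are hypothetical, and it assumes util-linux findmnt and a mysql client that can log in without prompting:

```shell
#!/bin/sh
# Sketch only: collect the three variables from the test matrix.

# Reduce a kernel release string (e.g. 4.4.21) to "major.minor".
kernel_major_minor() {
    printf '%s\n' "$1" | sed -n 's/^\([0-9][0-9]*\.[0-9][0-9]*\).*$/\1/p'
}

# Print kernel series, filesystem type of the data directory, and the
# flush method the running server reports.
describe_io_setup() {
    datadir=${1:-/var/lib/mysql}
    printf 'kernel: %s\n' "$(kernel_major_minor "$(uname -r)")"
    printf 'fstype: %s\n' "$(findmnt -n -o FSTYPE -T "$datadir")"
    printf 'innodb_flush_method: %s\n' "$(mysql -NBe 'SELECT @@innodb_flush_method')"
}

# Usage:
#   describe_io_setup /var/lib/mysql
```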

          hydrapolic Tomáš Mózes added a comment -

          It also happens with ext4, but it took longer to show up. So on kernel 4.4, both xfs and ext4 have this problem with O_DIRECT. On kernel 4.1 it cannot be reproduced.

          hydrapolic Tomáš Mózes added a comment -

          I can no longer reproduce this on MariaDB 10.1 and Linux kernel 4.12+.
          serg Sergei Golubchik made changes -
          Fix Version/s 10.0 [ 16000 ]
          elenst Elena Stepanova made changes -
          Fix Version/s N/A [ 14700 ]
          Resolution Cannot Reproduce [ 5 ]
          Status Open [ 1 ] Closed [ 6 ]

          People

            Assignee: jplindst Jan Lindström (Inactive)
            Reporter: hydrapolic Tomáš Mózes
            Votes: 0
            Watchers: 1