MariaDB Server / MDEV-35926

mariadb-backup getting stuck preventing Galera SST from finishing

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.11.10
    • Fix Version/s: 10.11
    • Component/s: Galera, Galera SST, mariabackup
    • Labels: None
    • Environment: Gentoo Linux
      mariadb-10.11.10
      galera-26.4.21

    Description

      Hello,

      We've encountered a possible bug in the mariadb-backup utility while attempting an SST of one of our Galera Cluster nodes.

      The SST is able to transfer the data to the new node. However, when mariadb-backup --prepare is called, it gets stuck and (seemingly) never finishes. During our testing this state persisted for hours. It was only interrupted when the InnoDB log file filled up, which caused the SST and MariaDB to shut down.

      However, when we run the same command manually on the same data, it finishes without a problem. We are then able to start the node manually and get it to connect to the cluster.

      This is the exact mariadb-backup command that was issued:

      WSREP_SST: [INFO] Evaluating /usr//bin/mariadb-backup --prepare --no-version-check --use-memory=128G --target-dir='/data/mysql/.sst' --datadir='/data/mysql/.sst' > '/data/mysql/mariabackup.prepare.log' 2>&1 (20250102 04:54:11.307)
      

      During this stuck phase, mariabackup.prepare.log contains many errors like these:

      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 1536 bytes, for file app_user_664783/generic_collection_item_v001#P#5652409f5ee81cfc8f270185cabaab1b.ibd(644), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 2560 bytes, for file app_user_664783/generic_collection_item_v001#P#5652409f5ee81cfc8f270185cabaab1b.ibd(644), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 2560 bytes, for file app_user_664783/generic_collection_item_v001#P#5652409f5ee81cfc8f270185cabaab1b.ibd(644), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 3072 bytes, for file app_user_664783/generic_collection_item_v001#P#5652409f5ee81cfc8f270185cabaab1b.ibd(644), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 1536 bytes, for file app_user_664783/generic_collection_item_v001#P#5652409f5ee81cfc8f270185cabaab1b.ibd(644), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 1024 bytes, for file app_user_664783/generic_collection_item_v001#P#5652409f5ee81cfc8f270185cabaab1b.ibd(644), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 1536 bytes, for file app_user_499733/generic_collection_item_v001#P#d60680e60bd5c8c08f7c7e5f6a5e3fb0.ibd(643), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 1024 bytes, for file app_user_499733/generic_collection_item_v001#P#d60680e60bd5c8c08f7c7e5f6a5e3fb0.ibd(643), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 1024 bytes, for file app_user_499733/generic_collection_item_v001#P#ecd3e991611e4490e6c0d24be5acd153.ibd(642), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 1024 bytes, for file app_user_499733/generic_collection_item_v001#P#ecd3e991611e4490e6c0d24be5acd153.ibd(642), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 1024 bytes, for file app_user_383118/generic_collection_item_v001#P#46afa26fdeeb6032382866849abdc2a9.ibd(641), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 1024 bytes, for file app_user_383118/generic_collection_item_v001#P#46afa26fdeeb6032382866849abdc2a9.ibd(641), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 2560 bytes, for file app_user_383118/generic_collection_item_v001#P#d5cf2b1863d4bf45f03a38e706ee7058.ibd(640), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 1024 bytes, for file app_user_383118/generic_collection_item_v001#P#d5cf2b1863d4bf45f03a38e706ee7058.ibd(640), returned 0
      2025-01-14  8:11:37 0 [Note] InnoDB: IO Error: 22 during write of 1024 bytes, for file app_user_383118/generic_collection_item_v001#P#d5cf2b1863d4bf45f03a38e706ee7058.ibd(640), returned 0
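The write sizes in this excerpt (1024, 1536, 2560 and 3072 bytes) are all multiples of 512 and all smaller than 4096. A minimal C sketch for extracting the errno value and the write size from such lines, for example to summarize a long log, is below. It assumes the message format shown above; the struct and function names are hypothetical and not part of MariaDB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Fields extracted from one InnoDB "IO Error" message (hypothetical type). */
struct io_error_line {
    int err;       /* errno value reported in the message, e.g. 22 */
    size_t bytes;  /* size of the write that failed */
};

/* Returns 1 and fills *out if `line` contains
 * "IO Error: <err> during write of <bytes> bytes", otherwise 0.
 * The format string mirrors the log lines quoted in this issue. */
int parse_io_error(const char *line, struct io_error_line *out)
{
    assert(line != NULL && out != NULL);
    const char *p = strstr(line, "IO Error: ");
    if (p == NULL)
        return 0;  /* not an IO Error message */
    return sscanf(p, "IO Error: %d during write of %zu bytes",
                  &out->err, &out->bytes) == 2;
}
```

Feeding each line of mariabackup.prepare.log through parse_io_error() and then checking `bytes % 512` and `bytes % 4096` gives a quick overview of which write sizes failed.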
      

      Do you have any idea what could cause this?


        Activity


          Marko Mäkelä added a comment:

          The error messages should be output by write_io_callback(), which wraps either the pread/pwrite-based "simulated asynchronous I/O" interface or one of the Linux-specific interfaces, libaio or io_uring. The Linux errno 22 would be EINVAL. Based on the error messages, I assume that generic_collection_item_v001 is a partitioned table that makes use of page_compressed, that is, sparse and heavily fragmented files. Could it be that the physical block size of the underlying storage is 4096 bytes, and that writes smaller than that (multiples of 512 bytes) fail when O_DIRECT is used? Would the errors go away if you set innodb_flush_method=fsync to disable O_DIRECT? What are the Linux kernel version, the file system type and the type of storage?
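The alignment rule behind this question can be sketched as follows. The Linux open(2) manual page states that O_DIRECT transfers generally need the buffer address, file offset and length to be aligned to the block size of the underlying storage (on Linux 2.6.0 and later, the logical block size suffices), and a misaligned transfer typically fails with EINVAL, which is errno 22 on Linux. This is a simplified, hypothetical checker, not InnoDB code; the block size is a parameter supplied by the caller, and real requirements can vary with the file system and kernel version.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Returns 1 if a write of `len` bytes at file offset `off` from `buf`
 * meets an O_DIRECT-style alignment rule for `block` bytes, else 0. */
int direct_io_aligned(const void *buf, off_t off, size_t len, size_t block)
{
    /* The check assumes a power-of-two block size such as 512 or 4096. */
    assert(block > 0 && (block & (block - 1)) == 0);
    return ((uintptr_t)buf % block == 0) &&
           (off % (off_t)block == 0) &&
           (len % block == 0);
}

/* The errno a kernel would be expected to report for such a write:
 * 0 if aligned, EINVAL otherwise (a model, not a kernel guarantee). */
int expected_direct_io_errno(const void *buf, off_t off, size_t len,
                             size_t block)
{
    return direct_io_aligned(buf, off, len, block) ? 0 : EINVAL;
}
```

With block = 4096, every failed write size in the log (1024 to 3072 bytes) is rejected by direct_io_aligned(); with block = 512 those lengths would pass, provided the offsets and buffers are aligned as well.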
          Jakub Gajdos added a comment (edited):

          Hello, thanks for the reply.

          The storage used for this database is a 7 TB mdraid RAID 1 array on two NVMe SSDs.
          The block size is indeed 4096 bytes.

          The kernel version is 6.12.10, but AFAIK we had this problem even before 6.12.

          Currently we do not specify innodb_flush_method, but according to the documentation fsync is the default, so we already use it.
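Because the O_DIRECT alignment requirement depends on whether 4096 bytes is the logical or only the physical block size, both values may be worth checking on the md device. A minimal sketch using the Linux BLKSSZGET (logical) and BLKPBSZGET (physical) ioctls follows; the device path is an assumption supplied by the caller, and opening a block device usually requires root.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>    /* BLKSSZGET, BLKPBSZGET */
#include <stddef.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Reads the logical and physical block sizes of the block device at `dev`
 * (for example an md array; the path is chosen by the caller).
 * Returns 0 on success, or -1 with errno set on failure. */
int block_sizes(const char *dev, int *logical, unsigned int *physical)
{
    assert(dev != NULL && logical != NULL && physical != NULL);
    int fd = open(dev, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = 0;
    /* BLKSSZGET fills an int, BLKPBSZGET an unsigned int. */
    if (ioctl(fd, BLKSSZGET, logical) != 0 ||
        ioctl(fd, BLKPBSZGET, physical) != 0)
        rc = -1;
    int saved_errno = errno;  /* keep close() from clobbering the ioctl error */
    close(fd);
    errno = saved_errno;
    return rc;
}
```

From a shell, `blockdev --getss --getpbsz <device>` or `lsblk -t` report the same two values.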

          Jakub Gajdos added a comment (edited):

          The weird thing is that if I stop the SST when mariadb-backup gets stuck and then manually run the exact same mariadb-backup --prepare command again, it finishes without any errors.

          Jakub Gajdos added a comment:

          Correction: when checking the actual value in MariaDB, innodb_flush_method appears to be O_DIRECT for some reason. I'll check whether fsync helps.
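The difference between the two settings can be modelled in a simplified way: with innodb_flush_method=fsync, data files are opened without O_DIRECT, so writes go through the page cache and need no block alignment, whereas O_DIRECT bypasses the page cache and is subject to the alignment rule. MariaDB's documentation lists O_DIRECT as the default from 10.6 onward, which would be consistent with the value observed here. The sketch below is hypothetical: the function names are made up, only the values fsync and O_DIRECT are modelled, and it does not reproduce InnoDB's actual file handling.

```c
#define _GNU_SOURCE  /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical mapping from an innodb_flush_method value to open(2) flags
 * for a data file; only "fsync" and "O_DIRECT" are modelled. */
int datafile_open_flags(const char *flush_method)
{
    assert(flush_method != NULL);
    int flags = O_RDWR | O_CREAT;
    if (strcmp(flush_method, "O_DIRECT") == 0)
        flags |= O_DIRECT;   /* bypass the page cache: alignment rules apply */
    return flags;            /* "fsync": buffered I/O through the page cache */
}

/* Writes `len` bytes at offset `off`, then calls fsync().
 * Returns 0 on success or the errno value of the failing step. */
int write_and_sync(const char *path, const char *flush_method,
                   const void *buf, size_t len, off_t off)
{
    int fd = open(path, datafile_open_flags(flush_method), 0600);
    if (fd < 0)
        return errno;        /* may be EINVAL if O_DIRECT is unsupported */
    int err = 0;
    ssize_t n = pwrite(fd, buf, len, off);
    if (n < 0)
        err = errno;         /* EINVAL for a misaligned O_DIRECT write */
    else if ((size_t)n != len)
        err = EIO;           /* this sketch treats a short write as an error */
    else if (fsync(fd) != 0)
        err = errno;
    close(fd);
    return err;
}
```

For example, write_and_sync(path, "fsync", buf, 1536, 0) would be expected to succeed on any writable file system. The same call with "O_DIRECT" may fail with EINVAL when 1536 is not a multiple of the device's logical block size or the buffer is misaligned, and on file systems without O_DIRECT support open() itself may fail.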


          People

            Marko Mäkelä
            Jakub Gajdos
            Votes: 0
            Watchers: 4
