Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-30179

mariabackup --backup fails with FATAL ERROR: ... failed to copy datafile.

Details

    Description

      origin/bb-10.11-new-innodb-defaults be6a3d9a01c4357fbb52b4eefcb91bf84a8d0d4c 2022-12-09T13:51:55+02:00
      Marko assumes that the trouble is linked to MDEV-29986.
       
      Scenario:
      1. The server is under concurrent DDL/DML load.
      2. During 1. is ongoing runs a mariabackup --backup to some other directory.
       
      Snip of the error log:
      ```# 2022-12-09T03:58:05 [234171] | [rr 332149 3776][00] 2022-12-09 03:58:04 >> log scanned up to (315140211)
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3801][00] 2022-12-09 03:58:05 >> log scanned up to (315141264)
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3808]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3814]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3820]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3826]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3832]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3838]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3846]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3856]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3864][00] 2022-12-09 03:58:05 >> log scanned up to (315165453)
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3871]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3877]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3881]2022-12-09  3:58:05 0 [Warning] InnoDB: Retry attempts for reading partial data failed.
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3890][01] 2022-12-09 03:58:05 mariabackup: xtrabackup_copy_datafile() failed.
      # 2022-12-09T03:58:05 [234171] | [rr 332149 3898][00] FATAL ERROR: 2022-12-09 03:58:05 failed to copy datafile.```
      

      Hitting this bug is very rare.

      RQG:
      ===

      1. git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
        #
      2. GIT_SHOW: HEAD -> master, origin/master, origin/HEAD 1abbb033ff1d798d9d0a9bacdf072fe8fd4f5deb 2022-12-09T12:56:59+01:00
      3. rqg.pl : Version 4.2.1 (2022-12)
        #
      4. $RQG_HOME/rqg.pl \
      5. --grammar=conf/engines/many_indexes.yy \
      6. --gendata=conf/engines/many_indexes.zz \
      7. --reporters=Mariabackup_linux \
      8. --redefine=conf/mariadb/redefine_file_buffering_write_through.yy \
      9. -mysqld=-loose-innodb_lock_schedule_algorithm=fcfs \
      10. -mysqld=-loose-idle_write_transaction_timeout=0 \
      11. -mysqld=-loose-idle_transaction_timeout=0 \
      12. -mysqld=-loose-idle_readonly_transaction_timeout=0 \
      13. -mysqld=-connect_timeout=60 \
      14. -mysqld=-interactive_timeout=28800 \
      15. -mysqld=-slave_net_timeout=60 \
      16. -mysqld=-net_read_timeout=30 \
      17. -mysqld=-net_write_timeout=60 \
      18. -mysqld=-loose-table_lock_wait_timeout=50 \
      19. -mysqld=-wait_timeout=28800 \
      20. -mysqld=-lock-wait-timeout=86400 \
      21. -mysqld=-innodb-lock-wait-timeout=50 \
      22. --no-mask \
      23. --queries=10000000 \
      24. --seed=random \
      25. --reporters=Backtrace \
      26. --reporters=ErrorLog \
      27. --reporters=Deadlock \
      28. --validators=None \
      29. -mysqld=-log_output=none \
      30. -mysqld=-log_bin_trust_function_creators=1 \
      31. -mysqld=-loose-debug_assert_on_not_freed_memory=0 \
      32. --engine=InnoDB \
      33. --restart_timeout=240 \
      34. -mysqld=-plugin-load-add=file_key_management.so \
      35. -mysqld=-loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
      36. -mysqld=-plugin-load-add=provider_lzo.so \
      37. -mysqld=-plugin-load-add=provider_bzip2.so \
      38. -mysqld=-plugin-load-add=provider_lzma.so \
      39. -mysqld=-plugin-load-add=provider_snappy.so \
      40. -mysqld=-plugin-load-add=provider_lz4.so \
      41. --duration=300 \
      42. -mysqld=-loose-innodb_fatal_semaphore_wait_threshold=300 \
      43. -mysqld=-innodb_file_per_table=0 \
      44. -mysqld=-loose-innodb_read_only_compressed=OFF \
      45. -mysqld=-innodb_stats_persistent=on \
      46. -mysqld=-innodb_adaptive_hash_index=off \
      47. -mysqld=-log-bin \
      48. -mysqld=-sync-binlog=1 \
      49. -mysqld=-loose-innodb_evict_tables_on_commit_debug=on \
      50. -mysqld=-loose-max-statement-time=30 \
      51. --threads=33 \
      52. -mysqld=-innodb_use_native_aio=1 \
      53. -mysqld=-loose_innodb_change_buffering=changes \
      54. -mysqld=-innodb_rollback_on_timeout=OFF \
      55. -mysqld=-innodb_page_size=16K \
      56. -mysqld=-innodb-buffer-pool-size=8M \
      57. --vardir_type=slow \
      58. <local settings>

      sdp:/data/results/1670586562/Backup$ _RR_TRACE_DIR=./1_clone/rr rr replay --mark-stdio

      Attachments

        Issue Links

          Activity

            I see two problems here. First, the error message fails to mention the file name (undo002) even though it would be readily available a couple stack frames earlier.

            Second, it may be that mariadb-backup assumes that files will not shrink while they are being copied, that is, it is not compatible with innodb_undo_log_truncate=ON.

            marko Marko Mäkelä added a comment - I see two problems here. First, the error message fails to mention the file name ( undo002 ) even though it would be readily available a couple stack frames earlier. Second, it may be that mariadb-backup assumes that files will not shrink while they are being copied, that is, it is not compatible with innodb_undo_log_truncate=ON .

            Patch is in bb-10.6-MDEV-30179

            thiru Thirunarayanan Balathandayuthapani added a comment - Patch is in bb-10.6- MDEV-30179

            Thank you, the high level algorithm looks OK to me. I posted some suggestions.

            I think that we should target already 10.5 with this fix, because that is when we last changed the format of the redo log records that are being written for undo tablespace truncation.

            marko Marko Mäkelä added a comment - Thank you, the high level algorithm looks OK to me. I posted some suggestions. I think that we should target already 10.5 with this fix, because that is when we last changed the format of the redo log records that are being written for undo tablespace truncation.

            Patch is in bb-10.5-MDEV-30179

            thiru Thirunarayanan Balathandayuthapani added a comment - Patch is in bb-10.5- MDEV-30179
            marko Marko Mäkelä added a comment - I posted some review comments .

            People

              thiru Thirunarayanan Balathandayuthapani
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.