[MDEV-30179] mariabackup --backup fails with FATAL ERROR: ... failed to copy datafile. Created: 2022-12-09  Updated: 2023-03-03  Resolved: 2023-01-10

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.11.2
Fix Version/s: 10.11.2, 11.0.1, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: rr-profile

Issue Links:
Relates
relates to MDEV-29986 Set innodb_undo_tablespaces=3 by default Closed

 Description   

origin/bb-10.11-new-innodb-defaults be6a3d9a01c4357fbb52b4eefcb91bf84a8d0d4c 2022-12-09T13:51:55+02:00
Marko assumes that the trouble is linked to MDEV-29986.
 
Scenario:
1. The server is under concurrent DDL/DML load.
2. During 1. is ongoing runs a mariabackup --backup to some other directory.
 
Snip of the error log:
```# 2022-12-09T03:58:05 [234171] | [rr 332149 3776][00] 2022-12-09 03:58:04 >> log scanned up to (315140211)
# 2022-12-09T03:58:05 [234171] | [rr 332149 3801][00] 2022-12-09 03:58:05 >> log scanned up to (315141264)
# 2022-12-09T03:58:05 [234171] | [rr 332149 3808]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
# 2022-12-09T03:58:05 [234171] | [rr 332149 3814]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
# 2022-12-09T03:58:05 [234171] | [rr 332149 3820]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
# 2022-12-09T03:58:05 [234171] | [rr 332149 3826]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
# 2022-12-09T03:58:05 [234171] | [rr 332149 3832]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
# 2022-12-09T03:58:05 [234171] | [rr 332149 3838]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
# 2022-12-09T03:58:05 [234171] | [rr 332149 3846]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
# 2022-12-09T03:58:05 [234171] | [rr 332149 3856]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
# 2022-12-09T03:58:05 [234171] | [rr 332149 3864][00] 2022-12-09 03:58:05 >> log scanned up to (315165453)
# 2022-12-09T03:58:05 [234171] | [rr 332149 3871]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
# 2022-12-09T03:58:05 [234171] | [rr 332149 3877]2022-12-09  3:58:05 0 [Warning] InnoDB: 6291456 bytes should have been read at 10485760 from (unknown file), but got only 0. Retrying.
# 2022-12-09T03:58:05 [234171] | [rr 332149 3881]2022-12-09  3:58:05 0 [Warning] InnoDB: Retry attempts for reading partial data failed.
# 2022-12-09T03:58:05 [234171] | [rr 332149 3890][01] 2022-12-09 03:58:05 mariabackup: xtrabackup_copy_datafile() failed.
# 2022-12-09T03:58:05 [234171] | [rr 332149 3898][00] FATAL ERROR: 2022-12-09 03:58:05 failed to copy datafile.```

Hitting this bug is very rare.

RQG:
===

  1. git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
    #
  2. GIT_SHOW: HEAD -> master, origin/master, origin/HEAD 1abbb033ff1d798d9d0a9bacdf072fe8fd4f5deb 2022-12-09T12:56:59+01:00
  3. rqg.pl : Version 4.2.1 (2022-12)
    #
  4. $RQG_HOME/rqg.pl \
  5. --grammar=conf/engines/many_indexes.yy \
  6. --gendata=conf/engines/many_indexes.zz \
  7. --reporters=Mariabackup_linux \
  8. --redefine=conf/mariadb/redefine_file_buffering_write_through.yy \
  9. -mysqld=-loose-innodb_lock_schedule_algorithm=fcfs \
  10. -mysqld=-loose-idle_write_transaction_timeout=0 \
  11. -mysqld=-loose-idle_transaction_timeout=0 \
  12. -mysqld=-loose-idle_readonly_transaction_timeout=0 \
  13. -mysqld=-connect_timeout=60 \
  14. -mysqld=-interactive_timeout=28800 \
  15. -mysqld=-slave_net_timeout=60 \
  16. -mysqld=-net_read_timeout=30 \
  17. -mysqld=-net_write_timeout=60 \
  18. -mysqld=-loose-table_lock_wait_timeout=50 \
  19. -mysqld=-wait_timeout=28800 \
  20. -mysqld=-lock-wait-timeout=86400 \
  21. -mysqld=-innodb-lock-wait-timeout=50 \
  22. --no-mask \
  23. --queries=10000000 \
  24. --seed=random \
  25. --reporters=Backtrace \
  26. --reporters=ErrorLog \
  27. --reporters=Deadlock \
  28. --validators=None \
  29. -mysqld=-log_output=none \
  30. -mysqld=-log_bin_trust_function_creators=1 \
  31. -mysqld=-loose-debug_assert_on_not_freed_memory=0 \
  32. --engine=InnoDB \
  33. --restart_timeout=240 \
  34. -mysqld=-plugin-load-add=file_key_management.so \
  35. -mysqld=-loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
  36. -mysqld=-plugin-load-add=provider_lzo.so \
  37. -mysqld=-plugin-load-add=provider_bzip2.so \
  38. -mysqld=-plugin-load-add=provider_lzma.so \
  39. -mysqld=-plugin-load-add=provider_snappy.so \
  40. -mysqld=-plugin-load-add=provider_lz4.so \
  41. --duration=300 \
  42. -mysqld=-loose-innodb_fatal_semaphore_wait_threshold=300 \
  43. -mysqld=-innodb_file_per_table=0 \
  44. -mysqld=-loose-innodb_read_only_compressed=OFF \
  45. -mysqld=-innodb_stats_persistent=on \
  46. -mysqld=-innodb_adaptive_hash_index=off \
  47. -mysqld=-log-bin \
  48. -mysqld=-sync-binlog=1 \
  49. -mysqld=-loose-innodb_evict_tables_on_commit_debug=on \
  50. -mysqld=-loose-max-statement-time=30 \
  51. --threads=33 \
  52. -mysqld=-innodb_use_native_aio=1 \
  53. -mysqld=-loose_innodb_change_buffering=changes \
  54. -mysqld=-innodb_rollback_on_timeout=OFF \
  55. -mysqld=-innodb_page_size=16K \
  56. -mysqld=-innodb-buffer-pool-size=8M \
  57. --vardir_type=slow \
  58. <local settings>

sdp:/data/results/1670586562/Backup$ _RR_TRACE_DIR=./1_clone/rr rr replay --mark-stdio



 Comments   
Comment by Marko Mäkelä [ 2022-12-09 ]

I see two problems here. First, the error message fails to mention the file name (undo002) even though it would be readily available a couple stack frames earlier.

Second, it may be that mariadb-backup assumes that files will not shrink while they are being copied, that is, it is not compatible with innodb_undo_log_truncate=ON.

Comment by Thirunarayanan Balathandayuthapani [ 2022-12-20 ]

Patch is in bb-10.6-MDEV-30179

Comment by Marko Mäkelä [ 2022-12-20 ]

Thank you, the high level algorithm looks OK to me. I posted some suggestions.

I think that we should target already 10.5 with this fix, because that is when we last changed the format of the redo log records that are being written for undo tablespace truncation.

Comment by Thirunarayanan Balathandayuthapani [ 2023-01-02 ]

Patch is in bb-10.5-MDEV-30179

Comment by Marko Mäkelä [ 2023-01-09 ]

I posted some review comments.

Generated at Thu Feb 08 10:14:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.