[MDEV-24044] mariabackup --backup: [ERROR] InnoDB: File <file>: 'opendir' returned OS error 71 Created: 2020-10-28  Updated: 2023-09-26  Resolved: 2023-09-26

Status: Closed
Project: MariaDB Server
Component/s: mariabackup, Storage Engine - InnoDB
Affects Version/s: 10.5.7
Fix Version/s: 10.6.1, 10.4.20, 10.5.11

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: rr-profile

Issue Links:
Blocks
blocks MDEV-18200 MariaBackup full backup failed with I... Closed
Relates
relates to MDEV-25710 Dead code os_file_opendir() in the se... Closed

 Description   

Workflow:
1. Start the DB Server + generate some initial data
2. 9 sessions run concurrent a DDL/DML mix
     One of these sessions runs all 60s
         DROP SCHEMA cooldown; CREATE SCHEMA cooldown;
         RENAME test.<table> TO cooldown.<table>; CREATE test.<table>
3. During 2. is ongoing
    mariabackup --backup ....
 
And 3. fails on the n'th ttempt with
# 2020-10-27T15:01:38 [4083556] | # 2020-10-27T14:59:45 [4083556] Executing backup: _RR_TRACE_DIR=/dev/shm/vardir/1603810119/39/1/rr rr record "--chaos" /home/mleich/Server_bin/10.5B_asan_Og//bin/mariabackup --host=127.0.0.1 --user=root --password=''  --innodb-use-native-aio=0  --port=25360 --backup --datadir=/dev/shm/vardir/1603810119/39/1/data --target-dir=/dev/shm/vardir/1603810119/39/1_clone/data
# 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:47 Connecting to MySQL server host: 127.0.0.1, user: root, password: set, port: 25360, socket: not set
# 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 Using server version 10.5.7-MariaDB-debug-log
# 2020-10-27T15:01:38 [4083556] | /home/mleich/Server_bin/10.5B_asan_Og//bin/mariabackup based on MariaDB server 10.5.7-MariaDB Linux (x86_64)
# 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 uses posix_fadvise().
# 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 cd to /dev/shm/vardir/1603810119/39/1/data/
# 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 open files limit requested 0, set to 1048576
# 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 InnoDB: The universal page size of the database is set to 8192.
# 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 mariabackup: using the following InnoDB configuration:
# 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 innodb_data_home_dir = 
# 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 innodb_data_file_path = ibdata1:12M:autoextend
# 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 innodb_log_group_home_dir = ./
# 2020-10-27T15:01:38 [4083556] | 2020-10-27 14:59:48 0 [Note] InnoDB: Number of pools: 1
# 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 mariabackup: Generating a list of tablespaces
# 2020-10-27T15:01:38 [4083556] | 2020-10-27 14:59:48 0 [ERROR] InnoDB: Operating system error number 2 in a file operation.
# 2020-10-27T15:01:38 [4083556] | 2020-10-27 14:59:48 0 [ERROR] InnoDB: The error means the system cannot find the path specified.
# 2020-10-27T15:01:38 [4083556] | 2020-10-27 14:59:48 0 [ERROR] InnoDB: If you are installing InnoDB, remember that you must create directories yourself, InnoDB does not create them.
# 2020-10-27T15:01:38 [4083556] | 2020-10-27 14:59:48 0 [ERROR] InnoDB: File ./cool_down: 'opendir' returned OS error 71. Cannot continue operation
# 2020-10-27T15:01:38 [4083556] | 201027 14:59:48 [ERROR] mysqld got signal 6 ;
 
The corresponding activity from the running server is
     rr:/data/mleich/RQG_mleich/storage/1603810119/BACKUP/dev/shm/vardir/1603810119/39/1/rr
     _RR_TRACE_DIR="." rr replay --mark-stdio mysqld-0
Thread 2 hit Breakpoint 1, rmdir () at ../sysdeps/unix/syscall-template.S:78
78	../sysdeps/unix/syscall-template.S: No such file or directory.
(rr) bt
#0  rmdir () at ../sysdeps/unix/syscall-template.S:78
#1  0x00005608eaff5e73 in rm_dir_w_symlink (org_path=org_path@entry=0x79ca315d5510 "./cool_down/", send_error=send_error@entry=1 '\001') at /home/mleich/Server/10.5B/sql/sql_db.cc:1255
#2  0x00005608eaffc985 in mysql_rm_db_internal (thd=0x62b0000d9218, db=0x62b0000de060, if_exists=<optimized out>, silent=silent@entry=false) at /home/mleich/Server/10.5B/sql/sql_db.cc:986
#3  0x00005608eaffdcb0 in mysql_rm_db (thd=thd@entry=0x62b0000d9218, db=db@entry=0x62b0000de060, if_exists=<optimized out>) at /home/mleich/Server/10.5B/sql/sql_db.cc:1103
#4  0x00005608eb14be92 in mysql_execute_command (thd=thd@entry=0x62b0000d9218) at /home/mleich/Server/10.5B/sql/structs.h:559
#5  0x00005608eb0fec7c in mysql_parse (thd=thd@entry=0x62b0000d9218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x79ca315d6e50, is_com_multi=is_com_multi@entry=false, 
    is_next_command=is_next_command@entry=false) at /home/mleich/Server/10.5B/sql/sql_parse.cc:8010
#6  0x00005608eb1302c6 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000d9218, packet=packet@entry=0x629000b09219 " DROP SCHEMA cool_down  /* E_R Thread1 QNO 1482 CON_ID 15 */ ", 
    packet_length=packet_length@entry=61, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/mleich/Server/10.5B/sql/sql_class.h:1252
#7  0x00005608eb13b76d in do_command (thd=0x62b0000d9218) at /home/mleich/Server/10.5B/sql/sql_parse.cc:1352
#8  0x00005608eb5d0a0c in do_handle_one_connection (connect=<optimized out>, connect@entry=0x6080000088b8, put_in_cache=put_in_cache@entry=true) at /home/mleich/Server/10.5B/sql/sql_connect.cc:1410
#9  0x00005608eb5d1c42 in handle_one_connection (arg=arg@entry=0x6080000088b8) at /home/mleich/Server/10.5B/sql/sql_connect.cc:1312
#10 0x00005608ec3f2011 in pfs_spawn_thread (arg=0x61500000c398) at /home/mleich/Server/10.5B/storage/perfschema/pfs.cc:2201
#11 0x00002d967df7d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x000035456250a103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) when
Current event: 484003
 
origin/10.5 00ddea4f2f7d20e09a28655b2f30a4c014e9344a 2020-10-27T09:52:42+02:00
 
RQG
====
git clone https://github.com/mleich1/rqg --branch experimental RQG
 
perl rqg.pl \
--grammar=conf/mariadb/table_stress_innodb_nocopy1.yy \
--gendata=conf/mariadb/table_stress.zz \
--gendata_sql=conf/mariadb/table_stress.sql \
--reporters=Mariabackup_linux \
--mysqld=--innodb_use_native_aio=1 \
--mysqld=--innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--net_read_timeout=30 \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--wait_timeout=28800 \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--innodb-lock-wait-timeout=50 \
--no-mask \
--queries=10000000 \
--seed=random \
--reporters=Backtrace \
--reporters=ErrorLog \
--reporters=Deadlock1 \
--validators=None \
--mysqld=--log_output=none \
--mysqld=--log-bin \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--engine=InnoDB \
--restart_timeout=120 \
--duration=400 \
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=400 \
--mysqld=--loose-innodb-sync-debug \
--mysqld=--innodb_stats_persistent=off \
--mysqld=--loose-max-statement-time=30 \
--threads=9 \
--mysqld=--innodb_page_size=8K \
--mysqld=--innodb-buffer-pool-size=256M \
--duration=400 \
--no_mask \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings> \
--script_debug=_nix_ \
--rr=Extended \
--rr_options=--chaos
 
Maybe this problem is a sibling of
https://jira.mariadb.org/browse/MDEV-18200
          



 Comments   
Comment by Matthias Leich [ 2020-10-28 ]

rr:/data/mleich/RQG_mleich/storage/1603810119/BACKUP/dev/shm/vardir/1603810119/39/1/rr
_RR_TRACE_DIR="." rr replay --mark-stdio mariabackup-6

(rr) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x000016be3c224859 in __GI_abort () at abort.c:79
#2 0x0000555ef1d9ade1 in os_file_handle_error_cond_exit (name=name@entry=0x60c000001018 "./cool_down", operation=operation@entry=0x555ef2e99160 "opendir", should_abort=should_abort@entry=true, on_error_silent=on_error_silent@entry=false)
at /home/mleich/Server/10.5B/storage/innobase/os/os0file.cc:3453
#3 0x0000555ef1d9af21 in os_file_handle_error (name=name@entry=0x60c000001018 "./cool_down", operation=operation@entry=0x555ef2e99160 "opendir") at /home/mleich/Server/10.5B/storage/innobase/os/os0file.cc:210
#4 0x0000555ef1d9b2e2 in os_file_opendir (dirname=0x60c000001018 "./cool_down", error_is_fatal=<optimized out>) at /home/mleich/Server/10.5B/storage/innobase/os/os0file.cc:1196
#5 0x0000555eeff76b88 in enumerate_ibd_files (callback=<optimized out>) at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:3193
#6 0x0000555eeff77192 in xb_load_tablespaces () at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:3392
#7 0x0000555eeff7be57 in xtrabackup_backup_func () at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:4134
#8 0x0000555eeff8a690 in main_low (argv=<optimized out>) at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:6307
#9 0x0000555eeff8aafc in main (argc=9, argv=0x7fffeb9e31f8) at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:6117
(rr)

Archive:
rr:/data/mleich/RQG_mleich/storage/1603810119/000080.tgz

Comment by Marko Mäkelä [ 2021-09-08 ]

I am assigning this to the current assignee of MDEV-18200.

This part of mariabackup --backup has really nothing to do with InnoDB itself. The only code that mariabackup --backup needs to share with InnoDB is the redo log and data page validation. It is also unnecessarily using some InnoDB code for opening files.

The root cause of this bug should be some kind of race condition, which should be very easy to figure out with rr replay of the recorded traces.

Comment by Daniel Black [ 2022-08-29 ]

This look like a failure of BACKUP STAGES to stall on one of the DLL changes during backup.

Like MDEV-18200 do we want mariabackup doing assertions on filesystem issues? If what are the merits/antimerits of:

diff --git a/storage/innobase/os/os0file.cc b/storage/innobase/os/os0file.cc
index 4d6a82d0fe8..a3dfef5aa63 100644
--- a/storage/innobase/os/os0file.cc
+++ b/storage/innobase/os/os0file.cc
@@ -5185,7 +5185,11 @@ os_file_handle_error_cond_exit(
                }
 
                if (should_abort) {
+#ifdef MYSQL_SERVER
                        abort();
+#else
+                       exit(1);
+#endif
                }
        }
 

Comment by Daniel Black [ 2023-09-26 ]

The assertions where the original problem occured where resolved in MDEV-25710 when the code was moved out to the server into mariabackup directly.

There are now non-assertion based statements related to opendir failures.

Comment by Daniel Black [ 2023-09-26 ]

closing with same fixed versions as MDEV-25710

Generated at Thu Feb 08 09:27:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.