Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24044

mariabackup --backup: [ERROR] InnoDB: File <file>: 'opendir' returned OS error 71

Details

    Description

      Workflow:
      1. Start the DB Server + generate some initial data
      2. 9 sessions run concurrent a DDL/DML mix
           One of these sessions runs all 60s
               DROP SCHEMA cooldown; CREATE SCHEMA cooldown;
               RENAME test.<table> TO cooldown.<table>; CREATE test.<table>
      3. During 2. is ongoing
          mariabackup --backup ....
       
      And 3. fails on the n'th ttempt with
      # 2020-10-27T15:01:38 [4083556] | # 2020-10-27T14:59:45 [4083556] Executing backup: _RR_TRACE_DIR=/dev/shm/vardir/1603810119/39/1/rr rr record "--chaos" /home/mleich/Server_bin/10.5B_asan_Og//bin/mariabackup --host=127.0.0.1 --user=root --password=''  --innodb-use-native-aio=0  --port=25360 --backup --datadir=/dev/shm/vardir/1603810119/39/1/data --target-dir=/dev/shm/vardir/1603810119/39/1_clone/data
      # 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:47 Connecting to MySQL server host: 127.0.0.1, user: root, password: set, port: 25360, socket: not set
      # 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 Using server version 10.5.7-MariaDB-debug-log
      # 2020-10-27T15:01:38 [4083556] | /home/mleich/Server_bin/10.5B_asan_Og//bin/mariabackup based on MariaDB server 10.5.7-MariaDB Linux (x86_64)
      # 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 uses posix_fadvise().
      # 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 cd to /dev/shm/vardir/1603810119/39/1/data/
      # 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 open files limit requested 0, set to 1048576
      # 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 InnoDB: The universal page size of the database is set to 8192.
      # 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 mariabackup: using the following InnoDB configuration:
      # 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 innodb_data_home_dir = 
      # 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 innodb_data_file_path = ibdata1:12M:autoextend
      # 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 innodb_log_group_home_dir = ./
      # 2020-10-27T15:01:38 [4083556] | 2020-10-27 14:59:48 0 [Note] InnoDB: Number of pools: 1
      # 2020-10-27T15:01:38 [4083556] | [00] 2020-10-27 14:59:48 mariabackup: Generating a list of tablespaces
      # 2020-10-27T15:01:38 [4083556] | 2020-10-27 14:59:48 0 [ERROR] InnoDB: Operating system error number 2 in a file operation.
      # 2020-10-27T15:01:38 [4083556] | 2020-10-27 14:59:48 0 [ERROR] InnoDB: The error means the system cannot find the path specified.
      # 2020-10-27T15:01:38 [4083556] | 2020-10-27 14:59:48 0 [ERROR] InnoDB: If you are installing InnoDB, remember that you must create directories yourself, InnoDB does not create them.
      # 2020-10-27T15:01:38 [4083556] | 2020-10-27 14:59:48 0 [ERROR] InnoDB: File ./cool_down: 'opendir' returned OS error 71. Cannot continue operation
      # 2020-10-27T15:01:38 [4083556] | 201027 14:59:48 [ERROR] mysqld got signal 6 ;
       
      The corresponding activity from the running server is
           rr:/data/mleich/RQG_mleich/storage/1603810119/BACKUP/dev/shm/vardir/1603810119/39/1/rr
           _RR_TRACE_DIR="." rr replay --mark-stdio mysqld-0
      Thread 2 hit Breakpoint 1, rmdir () at ../sysdeps/unix/syscall-template.S:78
      78	../sysdeps/unix/syscall-template.S: No such file or directory.
      (rr) bt
      #0  rmdir () at ../sysdeps/unix/syscall-template.S:78
      #1  0x00005608eaff5e73 in rm_dir_w_symlink (org_path=org_path@entry=0x79ca315d5510 "./cool_down/", send_error=send_error@entry=1 '\001') at /home/mleich/Server/10.5B/sql/sql_db.cc:1255
      #2  0x00005608eaffc985 in mysql_rm_db_internal (thd=0x62b0000d9218, db=0x62b0000de060, if_exists=<optimized out>, silent=silent@entry=false) at /home/mleich/Server/10.5B/sql/sql_db.cc:986
      #3  0x00005608eaffdcb0 in mysql_rm_db (thd=thd@entry=0x62b0000d9218, db=db@entry=0x62b0000de060, if_exists=<optimized out>) at /home/mleich/Server/10.5B/sql/sql_db.cc:1103
      #4  0x00005608eb14be92 in mysql_execute_command (thd=thd@entry=0x62b0000d9218) at /home/mleich/Server/10.5B/sql/structs.h:559
      #5  0x00005608eb0fec7c in mysql_parse (thd=thd@entry=0x62b0000d9218, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x79ca315d6e50, is_com_multi=is_com_multi@entry=false, 
          is_next_command=is_next_command@entry=false) at /home/mleich/Server/10.5B/sql/sql_parse.cc:8010
      #6  0x00005608eb1302c6 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000d9218, packet=packet@entry=0x629000b09219 " DROP SCHEMA cool_down  /* E_R Thread1 QNO 1482 CON_ID 15 */ ", 
          packet_length=packet_length@entry=61, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /home/mleich/Server/10.5B/sql/sql_class.h:1252
      #7  0x00005608eb13b76d in do_command (thd=0x62b0000d9218) at /home/mleich/Server/10.5B/sql/sql_parse.cc:1352
      #8  0x00005608eb5d0a0c in do_handle_one_connection (connect=<optimized out>, connect@entry=0x6080000088b8, put_in_cache=put_in_cache@entry=true) at /home/mleich/Server/10.5B/sql/sql_connect.cc:1410
      #9  0x00005608eb5d1c42 in handle_one_connection (arg=arg@entry=0x6080000088b8) at /home/mleich/Server/10.5B/sql/sql_connect.cc:1312
      #10 0x00005608ec3f2011 in pfs_spawn_thread (arg=0x61500000c398) at /home/mleich/Server/10.5B/storage/perfschema/pfs.cc:2201
      #11 0x00002d967df7d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #12 0x000035456250a103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      (rr) when
      Current event: 484003
       
      origin/10.5 00ddea4f2f7d20e09a28655b2f30a4c014e9344a 2020-10-27T09:52:42+02:00
       
      RQG
      ====
      git clone https://github.com/mleich1/rqg --branch experimental RQG
       
      perl rqg.pl \
      --grammar=conf/mariadb/table_stress_innodb_nocopy1.yy \
      --gendata=conf/mariadb/table_stress.zz \
      --gendata_sql=conf/mariadb/table_stress.sql \
      --reporters=Mariabackup_linux \
      --mysqld=--innodb_use_native_aio=1 \
      --mysqld=--innodb_lock_schedule_algorithm=fcfs \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--net_read_timeout=30 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--lock-wait-timeout=86400 \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --no-mask \
      --queries=10000000 \
      --seed=random \
      --reporters=Backtrace \
      --reporters=ErrorLog \
      --reporters=Deadlock1 \
      --validators=None \
      --mysqld=--log_output=none \
      --mysqld=--log-bin \
      --mysqld=--log_bin_trust_function_creators=1 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --engine=InnoDB \
      --restart_timeout=120 \
      --duration=400 \
      --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=400 \
      --mysqld=--loose-innodb-sync-debug \
      --mysqld=--innodb_stats_persistent=off \
      --mysqld=--loose-max-statement-time=30 \
      --threads=9 \
      --mysqld=--innodb_page_size=8K \
      --mysqld=--innodb-buffer-pool-size=256M \
      --duration=400 \
      --no_mask \
      --workdir=<local settings> \
      --vardir=<local settings> \
      --mtr-build-thread=<local settings> \
      --basedir1=<local settings> \
      --script_debug=_nix_ \
      --rr=Extended \
      --rr_options=--chaos
       
      Maybe this problem is a sibling of
      https://jira.mariadb.org/browse/MDEV-18200
                
      

      Attachments

        Issue Links

          Activity

            mleich Matthias Leich added a comment - - edited

            rr:/data/mleich/RQG_mleich/storage/1603810119/BACKUP/dev/shm/vardir/1603810119/39/1/rr
            _RR_TRACE_DIR="." rr replay --mark-stdio mariabackup-6

            (rr) bt
            #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
            #1 0x000016be3c224859 in __GI_abort () at abort.c:79
            #2 0x0000555ef1d9ade1 in os_file_handle_error_cond_exit (name=name@entry=0x60c000001018 "./cool_down", operation=operation@entry=0x555ef2e99160 "opendir", should_abort=should_abort@entry=true, on_error_silent=on_error_silent@entry=false)
            at /home/mleich/Server/10.5B/storage/innobase/os/os0file.cc:3453
            #3 0x0000555ef1d9af21 in os_file_handle_error (name=name@entry=0x60c000001018 "./cool_down", operation=operation@entry=0x555ef2e99160 "opendir") at /home/mleich/Server/10.5B/storage/innobase/os/os0file.cc:210
            #4 0x0000555ef1d9b2e2 in os_file_opendir (dirname=0x60c000001018 "./cool_down", error_is_fatal=<optimized out>) at /home/mleich/Server/10.5B/storage/innobase/os/os0file.cc:1196
            #5 0x0000555eeff76b88 in enumerate_ibd_files (callback=<optimized out>) at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:3193
            #6 0x0000555eeff77192 in xb_load_tablespaces () at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:3392
            #7 0x0000555eeff7be57 in xtrabackup_backup_func () at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:4134
            #8 0x0000555eeff8a690 in main_low (argv=<optimized out>) at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:6307
            #9 0x0000555eeff8aafc in main (argc=9, argv=0x7fffeb9e31f8) at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:6117
            (rr)

            Archive:
            rr:/data/mleich/RQG_mleich/storage/1603810119/000080.tgz

            mleich Matthias Leich added a comment - - edited rr:/data/mleich/RQG_mleich/storage/1603810119/BACKUP/dev/shm/vardir/1603810119/39/1/rr _RR_TRACE_DIR="." rr replay --mark-stdio mariabackup-6 (rr) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x000016be3c224859 in __GI_abort () at abort.c:79 #2 0x0000555ef1d9ade1 in os_file_handle_error_cond_exit (name=name@entry=0x60c000001018 "./cool_down", operation=operation@entry=0x555ef2e99160 "opendir", should_abort=should_abort@entry=true, on_error_silent=on_error_silent@entry=false) at /home/mleich/Server/10.5B/storage/innobase/os/os0file.cc:3453 #3 0x0000555ef1d9af21 in os_file_handle_error (name=name@entry=0x60c000001018 "./cool_down", operation=operation@entry=0x555ef2e99160 "opendir") at /home/mleich/Server/10.5B/storage/innobase/os/os0file.cc:210 #4 0x0000555ef1d9b2e2 in os_file_opendir (dirname=0x60c000001018 "./cool_down", error_is_fatal=<optimized out>) at /home/mleich/Server/10.5B/storage/innobase/os/os0file.cc:1196 #5 0x0000555eeff76b88 in enumerate_ibd_files (callback=<optimized out>) at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:3193 #6 0x0000555eeff77192 in xb_load_tablespaces () at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:3392 #7 0x0000555eeff7be57 in xtrabackup_backup_func () at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:4134 #8 0x0000555eeff8a690 in main_low (argv=<optimized out>) at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:6307 #9 0x0000555eeff8aafc in main (argc=9, argv=0x7fffeb9e31f8) at /home/mleich/Server/10.5B/extra/mariabackup/xtrabackup.cc:6117 (rr) Archive: rr:/data/mleich/RQG_mleich/storage/1603810119/000080.tgz

            I am assigning this to the current assignee of MDEV-18200.

            This part of mariabackup --backup has really nothing to do with InnoDB itself. The only code that mariabackup --backup needs to share with InnoDB is the redo log and data page validation. It is also unnecessarily using some InnoDB code for opening files.

            The root cause of this bug should be some kind of race condition, which should be very easy to figure out with rr replay of the recorded traces.

            marko Marko Mäkelä added a comment - I am assigning this to the current assignee of MDEV-18200 . This part of mariabackup --backup has really nothing to do with InnoDB itself. The only code that mariabackup --backup needs to share with InnoDB is the redo log and data page validation. It is also unnecessarily using some InnoDB code for opening files. The root cause of this bug should be some kind of race condition, which should be very easy to figure out with rr replay of the recorded traces.
            danblack Daniel Black added a comment -

            This look like a failure of BACKUP STAGES to stall on one of the DLL changes during backup.

            Like MDEV-18200 do we want mariabackup doing assertions on filesystem issues? If what are the merits/antimerits of:

            diff --git a/storage/innobase/os/os0file.cc b/storage/innobase/os/os0file.cc
            index 4d6a82d0fe8..a3dfef5aa63 100644
            --- a/storage/innobase/os/os0file.cc
            +++ b/storage/innobase/os/os0file.cc
            @@ -5185,7 +5185,11 @@ os_file_handle_error_cond_exit(
                            }
             
                            if (should_abort) {
            +#ifdef MYSQL_SERVER
                                    abort();
            +#else
            +                       exit(1);
            +#endif
                            }
                    }
             
            

            danblack Daniel Black added a comment - This look like a failure of BACKUP STAGES to stall on one of the DLL changes during backup. Like MDEV-18200 do we want mariabackup doing assertions on filesystem issues? If what are the merits/antimerits of: diff --git a/storage/innobase/os/os0file.cc b/storage/innobase/os/os0file.cc index 4d6a82d0fe8..a3dfef5aa63 100644 --- a/storage/innobase/os/os0file.cc +++ b/storage/innobase/os/os0file.cc @@ -5185,7 +5185,11 @@ os_file_handle_error_cond_exit( } if (should_abort) { +#ifdef MYSQL_SERVER abort(); +#else + exit(1); +#endif } }
            danblack Daniel Black added a comment -

            The assertions where the original problem occured where resolved in MDEV-25710 when the code was moved out to the server into mariabackup directly.

            There are now non-assertion based statements related to opendir failures.

            danblack Daniel Black added a comment - The assertions where the original problem occured where resolved in MDEV-25710 when the code was moved out to the server into mariabackup directly. There are now non-assertion based statements related to opendir failures.
            danblack Daniel Black added a comment -

            closing with same fixed versions as MDEV-25710

            danblack Daniel Black added a comment - closing with same fixed versions as MDEV-25710

            People

              serg Sergei Golubchik
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.