[MDEV-33011] mariabackup --backup: FATAL ERROR: ... Can't open datafile cool_down/t3 Created: 2023-12-13  Updated: 2024-01-05

Status: Confirmed
Project: MariaDB Server
Component/s: Backup, Storage Engine - InnoDB
Affects Version/s: 10.6.17
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Debarun Banerjee
Resolution: Unresolved Votes: 0
Labels: backup, rr-profile-analyzed

Attachments: File MB-1270.cfg     File MB-1270.yy    
Issue Links:
Relates
relates to MDEV-32120 [ERROR] InnoDB: Expected tablespace i... Open

 Description   

The problem was found on
   origin/10.6-MDEV-32939 028a35b50640fe94e659d58fe8c7241af2696e89
but Marko means its not caused by the modifications for 10.6-MDEV-32939.
 
Scenario:
1. Start the source DB server, generate some unimportant initial data
2. One connection runs a random mix of
     - CREATE TABLE <table> IF NOT EXISTS ...
     - RENAME TABLE <table>  TO cool_down . <table>
     Please note that the schema "cool_down" was never created.
3. During 2. is ongoing some "mariabackup --backup" connects to the source DB server
     and wants to backup data to some other location.
# 2023-12-13T13:53:51 [1677124] | [rr 1712197 19322][00] 2023-12-13 13:53:51 Finished backing up non-InnoDB tables and files
# 2023-12-13T13:53:51 [1677124] | [rr 1712197 19406][01] 2023-12-13 13:53:51 Copying /data/results/1702475558/8/1/data//aria_log.00000001 to /data/results/1702475558/8/1_clone/data/aria_log.00000001
# 2023-12-13T13:53:51 [1677124] | [rr 1712197 19426][01] 2023-12-13 13:53:51         ...done
# 2023-12-13T13:53:51 [1677124] | [rr 1712197 19508][01] 2023-12-13 13:53:51 Copying /data/results/1702475558/8/1/data//aria_log_control to /data/results/1702475558/8/1_clone/data/aria_log_control
# 2023-12-13T13:53:51 [1677124] | [rr 1712197 19528][01] 2023-12-13 13:53:51         ...done
# 2023-12-13T13:53:51 [1677124] | [rr 1712197 19855][00] 2023-12-13 13:53:51 Waiting for log copy thread to read lsn 80003
# 2023-12-13T13:53:51 [1677124] | [rr 1712197 19954][00] FATAL ERROR: 2023-12-13 13:53:51 Can't open datafile cool_down/t3
 
It wonders me that some
     RENAME <existing table> TO <not existing schema> . <tablename> which should fail during parsing/check prerequisites
     gets memorized by the system at all.
     And if at all than why as something which was executed with success/needs attention when making a backup?
 
pluto:/data/results/1702475558/MB-1270$ _RR_TRACE_DIR=./1_clone/rr rr replay --mark-stdio # mariabackup --backup --> Can't open datafile cool_down/t3
pluto:/data/results/1702475558/MB-1270$ _RR_TRACE_DIR=./1/rr rr replay --mark-stdio # source DB server
 
RQG
------
# git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
#
# GIT_SHOW: HEAD -> master, origin/master, origin/HEAD 1ed8eb38d6f08d76cb64aebdec71e668389df20f 2023-12-04T18:55:11+01:00
# rqg.pl  : Version 4.4.0 (2023-08)
#
# $RQG_HOME/rqg.pl \
# --duration=300 \
# --queries=10000000 \
# --no_mask \
# --seed=random \
# --engine=InnoDB \
# --gendata_sql=conf/mariadb/table_stress.sql \
# --gendata=conf/mariadb/table_stress.zz \
# --max_gd_duration=1200 \
# --rpl_mode=none \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--innodb-buffer-pool-size=5M \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--log-output=none \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --mysqld=--innodb_page_size=4K \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--log-bin \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-plugin-load-add=provider_lz4.so \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--loose-max-statement-time=30 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--sql_mode=STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION \
# --mysqld=--loose-innodb-log-file-size=200M \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --reporters=Deadlock,Mariabackup_linux,None \
# --validators=None \
# --grammar=MB-1270.yy \
# --threads=1 \
# <local settings>
 
 



 Comments   
Comment by Marko Mäkelä [ 2023-12-13 ]

As far as I can tell, this only requires the following to be executed on the server:

CREATE DATABASE test;
CREATE TABLE test.t3 …;
-- fails because of non-existing schema cool_down
RENAME TABLE test.t3 TO cool_down.t3;

Exactly at LSN 80003, the table was renamed back to test.t3.

Comment by Marko Mäkelä [ 2023-12-22 ]

There indeed was no mkdir call for the schema cool_down in the server execution trace. I only saw one for CREATE SCHEMA IF NOT EXISTS rqg. Bootstrap invoked mkdir in mysql_create_db() for the schemata test, mysql, and performance_schema, in that order. I assume that an external process had created the directory. MariaDB or InnoDB does not keep much track of schema directories. The only thing I know of is the db.opt file that will be accessed by the SQL layer.

LSN 80003 corresponds to the following (empty after rollback) transaction commit:

#8  0x000055ef8602e6a0 in trx_t::rollback_finish (this=0x22fb58c86940) at /data/Server/10.6-MDEV-32939_3/storage/innobase/trx/trx0roll.cc:66
#9  trx_t::rollback_low (this=this@entry=0x22fb58c86940, savept=savept@entry=0x0) at /data/Server/10.6-MDEV-32939_3/storage/innobase/trx/trx0roll.cc:133
#10 0x000055ef8602512f in trx_t::rollback (this=this@entry=0x22fb58c86940, savept=savept@entry=0x0) at /data/Server/10.6-MDEV-32939_3/storage/innobase/trx/trx0roll.cc:176
#11 0x000055ef85eba90f in row_rename_table_for_mysql (old_name=old_name@entry=0x54162fbd0140 "test/t3", new_name=new_name@entry=0x54162fbcff00 "cool_down/t3", trx=trx@entry=0x22fb58c86940, 
    use_fk=use_fk@entry=true) at /data/Server/10.6-MDEV-32939_3/storage/innobase/row/row0mysql.cc:2846
#12 0x000055ef85aa4c40 in innobase_rename_table (trx=trx@entry=0x22fb58c86940, from=from@entry=0x54162fbd0f80 "./test/t3", to=to@entry=0x54162fbd11c0 "./cool_down/t3", use_fk=use_fk@entry=true)
    at /data/Server/10.6-MDEV-32939_3/storage/innobase/handler/ha_innodb.cc:13824
#13 0x000055ef85aee415 in ha_innobase::rename_table (this=<optimized out>, from=<optimized out>, to=<optimized out>) at /data/Server/10.6-MDEV-32939_3/storage/innobase/handler/ha_innodb.cc:14230
#14 0x000055ef850e3fc5 in handler::ha_rename_table (this=this@entry=0x62b0001275d8, from=from@entry=0x54162fbd0f80 "./test/t3", to=to@entry=0x54162fbd11c0 "./cool_down/t3")
    at /data/Server/10.6-MDEV-32939_3/sql/handler.cc:5393
#15 0x000055ef84abf636 in mysql_rename_table (base=base@entry=0x615000002118, old_db=old_db@entry=0x62b000126408, old_name=old_name@entry=0x54162fbd1f20, new_db=new_db@entry=0x62b000126b38, 
    new_name=new_name@entry=0x54162fbd1f30, id=id@entry=0x54162fbd1f40, flags=<optimized out>) at /data/Server/10.6-MDEV-32939_3/sql/sql_table.cc:5055

So, we indeed failed to rename the file from test/t3.ibd to cool_down/t3.ibd: rename returned -1 and set errno=ENOENT (No such file or directory):

#0  rename (old=0x602000019310 "./test/t3.ibd", new=0x6030001425d0 "./cool_down/t3.ibd") at ../sysdeps/unix/sysv/linux/rename.c:27
#1  0x000055ef85d42081 in os_file_rename_func (oldpath=oldpath@entry=0x602000019310 "./test/t3.ibd", newpath=newpath@entry=0x6030001425d0 "./cool_down/t3.ibd")
    at /data/Server/10.6-MDEV-32939_3/storage/innobase/os/os0file.cc:1432
#2  0x000055ef85d18459 in mtr_t::commit_file (this=this@entry=0x54162fbcd530, space=..., name=name@entry=0x6030001425d0 "./cool_down/t3.ibd")
    at /data/Server/10.6-MDEV-32939_3/storage/innobase/mtr/mtr0mtr.cc:386
#3  0x000055ef862f32f3 in fil_space_t::rename (this=0x6120000e0a40, path=path@entry=0x6030001425d0 "./cool_down/t3.ibd", log=log@entry=true, replace=replace@entry=false)
    at /data/Server/10.6-MDEV-32939_3/storage/innobase/fil/fil0fil.cc:1894
#4  0x000055ef8623d46b in dict_table_t::rename_tablespace (this=this@entry=0x618000070908, new_name=..., replace=<optimized out>) at /data/Server/10.6-MDEV-32939_3/storage/innobase/dict/dict0dict.cc:1458
#5  0x000055ef8625b95b in dict_table_rename_in_cache (table=table@entry=0x618000070908, new_name=..., replace_new_file=replace_new_file@entry=false) at /data/Server/10.6-MDEV-32939_3/include/span.h:121
#6  0x000055ef85ebac34 in row_rename_table_for_mysql (old_name=old_name@entry=0x54162fbd0140 "test/t3", new_name=new_name@entry=0x54162fbcff00 "cool_down/t3", trx=trx@entry=0x22fb58c86940, 
    use_fk=use_fk@entry=true) at /data/Server/10.6-MDEV-32939_3/include/span.h:66
#7  0x000055ef85aa4c40 in innobase_rename_table (trx=trx@entry=0x22fb58c86940, from=from@entry=0x54162fbd0f80 "./test/t3", to=to@entry=0x54162fbd11c0 "./cool_down/t3", use_fk=use_fk@entry=true)
    at /data/Server/10.6-MDEV-32939_3/storage/innobase/handler/ha_innodb.cc:13824
#8  0x000055ef85aee415 in ha_innobase::rename_table (this=<optimized out>, from=<optimized out>, to=<optimized out>) at /data/Server/10.6-MDEV-32939_3/storage/innobase/handler/ha_innodb.cc:14230
#9  0x000055ef850e3fc5 in handler::ha_rename_table (this=this@entry=0x62b0001275d8, from=from@entry=0x54162fbd0f80 "./test/t3", to=to@entry=0x54162fbd11c0 "./cool_down/t3")
    at /data/Server/10.6-MDEV-32939_3/sql/handler.cc:5393
#10 0x000055ef84abf636 in mysql_rename_table (base=base@entry=0x615000002118, old_db=old_db@entry=0x62b000126408, old_name=old_name@entry=0x54162fbd1f20, new_db=new_db@entry=0x62b000126b38, 
    new_name=new_name@entry=0x54162fbd1f30, id=id@entry=0x54162fbd1f40, flags=<optimized out>) at /data/Server/10.6-MDEV-32939_3/sql/sql_table.cc:5055

The actual bug seems to be that we write a FILE_RENAME record for attempting to rename the file (which we must do according to the rules of write-ahead logging), but nothing for "renaming it back" when we roll back the operation.

Generated at Thu Feb 08 10:35:43 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.