[MDEV-22250] InnoDB: Failing assertion: opt_no_lock during mariabackup --backup ... (new) Created: 2020-04-15  Updated: 2023-12-08

Status: Open
Project: MariaDB Server
Component/s: Backup, Full-text Search, Locking, mariabackup
Affects Version/s: 10.4.11, 10.4.13, 10.5.3, 10.6
Fix Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Michael Widenius
Resolution: Unresolved Votes: 1
Labels: rr-profile-analyzed

Attachments: File MDEV-22250.yy    
Issue Links:
Relates
relates to MDEV-13825 mariabackup --lock-ddl-per-table does... Closed
relates to MDEV-28399 Assertion failure in dict_table_check... Closed
relates to MDEV-31904 mariabackup --backup: [ERROR] InnoDB:... Open

 Description   

Problem found during RQG testing invoking Mariabackup on
origin/10.3, 10.3 84db10f27bdb4c8d9edf7f554afdcd2a24e3285a 2020-04-15T09:56:03+03:00 build with debug
 
RQG runs
- a DDL mix in several sessions (here 10)
- from time to time
  1. Make a backup
  2. Create a DB Server based on that backup at some other location than the original one
  3. Start that server and make certain consistency checks
  4. Stop that cloned server and throw all his data away
 
One of the mariabackup ... --backup ... fails with
mariabackup: Stopping log copying thread.[00] 2020-04-15 18:25:53 DDL tracking :  create 1054 "./test/FTS_000000000000042a_00000000000004de_INDEX_3.ibd": 210039
| 2020-04-15 18:25:53 0x7fa057b37700  InnoDB: Assertion failure in file extra/mariabackup/xtrabackup.cc line 661
InnoDB: Failing assertion: opt_no_lock
...
# 2020-04-15T18:25:57 [26096] | Thread pointer: 0x0
# 2020-04-15T18:25:57 [26096] | Attempting backtrace. You can use the following information to find out
# 2020-04-15T18:25:57 [26096] | where mysqld died. If you see no messages after this, something went
# 2020-04-15T18:25:57 [26096] | terribly wrong...
# 2020-04-15T18:25:57 [26096] | stack_bottom = 0x0 thread_stack 0x49000
# 2020-04-15T18:25:57 [26096] | /home/mleich/Server_bin/10.4_debug/bin/mariabackup(my_print_stacktrace+0x40)[0x5633adb21355]
# 2020-04-15T18:25:57 [26096] | ./home/mleich/Server_bin/10.4_debug/bin/mariabackup(handle_fatal_signal+0x3ae)[0x5633ad26fe63]
# 2020-04-15T18:25:57 [26096] | ..../lib/x86_64-linux-gnu/libpthread.so.0(+0x13150)[0x7fa05a776150]
# 2020-04-15T18:25:57 [26096] | /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fa0585820bb]
# 2020-04-15T18:25:57 [26096] | ./lib/x86_64-linux-gnu/libc.so.6(abort+0x16d)[0x7fa058583f5d]
# 2020-04-15T18:25:57 [26096] | bin/mariabackup(+0x11bcc95)[0x5633ad7a8c95]
# 2020-04-15T18:25:57 [26096] |bin/mariabackup(+0x7004e4)[0x5633accec4e4]
# 2020-04-15T18:25:57 [26096] | bin/mariabackup(+0x100a6c9)[0x5633ad5f66c9]
# 2020-04-15T18:25:57 [26096] | bin/mariabackup(+0x100d1e8)[0x5633ad5f91e8]
# 2020-04-15T18:25:57 [26096] | ut/ut0mem.cc:39(ut_str3cat(char const*, char const*, char const*))[0x5633ad5fddd4]
# 2020-04-15T18:25:57 [26096] | mariabackup/xtrabackup.cc:662(backup_file_op_fail(unsigned long, unsigned char const*, unsigned char const*, unsigned long, unsigned char const*, unsigned long))[0x5633ad5feeb1]
# 2020-04-15T18:25:57 [26096] | .log/log0recv.cc:623(fil_name_parse(unsigned char*, unsigned char const*, page_id_t, mlog_id_t, bool))[0x5633accefbe0]
# 2020-04-15T18:25:57 [26096] | log/log0recv.cc:2525(recv_parse_log_rec(mlog_id_t*, unsigned char*, unsigned char*, unsigned long*, unsigned long*, bool, unsigned char**))[0x5633acceff03]
# 2020-04-15T18:25:57 [26096] | mariabackup/xtrabackup.cc:2695(xtrabackup_copy_log(unsigned long, unsigned long, bool))[0x5633accf0107]
# 2020-04-15T18:25:57 [26096] | ./lib/x86_64-linux-gnu/libpthread.so.0(+0x77fc)[0x7fa05a76a7fc]
# 2020-04-15T18:25:57 [26096] | /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fa05865fb5f]
 
rqg.pl \                                                 
--duration=300 \
--queries=10000000 \
--threads=10 \
--no_mask \
--seed=random \
--rpl_mode=none \
--gendata=conf/mariadb/table_stress.zz \
--gendata_sql=conf/mariadb/table_stress.sql \
--engine=InnoDB \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--net_read_timeout=30 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--innodb-lock-wait-timeout=50 \
--mysqld=--connect_timeout=60 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--innodb_stats_persistent=off \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--log-bin \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--wait_timeout=28800 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--mysqld=--log-output=none \ 
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--log_bin_trust_function_creators=1 \
--reporters=Backtrace,Deadlock1,ErrorLog,Mariabackup \
--validators=None \
--grammar=MDEV-22250.yy \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings>
 
You might need several attempts for replaying.
 
Also a replay of that problem on 10.5.3
origin/10.5 2f7d91bb6ce7bb34dd644e30590189bce37fb8f1 2020-04-14T18:43:03+03:00
 
No replay of that problem on 10.3.23
10.3 84db10f27bdb4c8d9edf7f554afdcd2a24e3285a 2020-04-15T09:56:03+03:00



 Comments   
Comment by Marko Mäkelä [ 2022-06-23 ]

According to mleich, the last occurrence of this was

bb-10.6-MDEV-26217 9a60243c742b891699c9e34797b49017b0493fac 2022-04-25T20:56:17+03:00

(We rarely test earlier major versions than 10.6.)

That branch included 10.6 as of the first commit of MDEV-15250 as well as fixes of MDEV-26217 and MDEV-26753.

Since that time, the most likely change that could have fixed this is MDEV-28399, on April 28. That could have fixed this bug (in MariaDB 10.6.8, 10.7.4, 10.8.3, 10.9.1). I will not close this bug yet.

Note: Before 10.6, DDL operations were not crash-safe. That could affect the ability of mariadb-backup to produce correct backups while DDL is being executed. This could be very difficult to fix in older major versions.

Comment by Matthias Leich [ 2023-08-02 ]

pluto:/data/results/1690368524/MDEV-22250$ _RR_TRACE_DIR=./1_clone/rr/ rr replay --mark-stdio

Comment by Marko Mäkelä [ 2023-08-02 ]

Here is the most recent trace:

ssh pluto
rr replay /data/results/1690368524/MDEV-22250/1_clone/rr/mariadb-backup-0

10.6 b102872ad50cce5959ad95369740766d14e9e48c

[00] 2023-07-26 15:06:25 Finished backing up non-InnoDB tables and files
[01] 2023-07-26 15:06:25 Copying /data/results/1690368524/268/1/data//aria_log_control to /data/results/1690368524/268/1_clone/data/aria_log_control
[01] 2023-07-26 15:06:25         ...done
[01] 2023-07-26 15:06:25 Copying /data/results/1690368524/268/1/data//aria_log.00000001 to /data/results/1690368524/268/1_clone/data/aria_log.00000001
[01] 2023-07-26 15:06:25         ...done
[00] 2023-07-26 15:06:25 Waiting for log copy thread to read lsn 15537355
[New Thread 3364349.3370113]
 
Thread 1 hit Hardware watchpoint 2: -location log_file_op
 
Old value = 
    (void (*)(ulint, int, const byte *, ulint, const byte *, ulint)) 0x564cd6b88ad0 <backup_file_op(ulint, int, byte const*, ulint, byte const*, ulint)>
New value = 
    (void (*)(ulint, int, const byte *, ulint, const byte *, ulint)) 0x564cd6b88220 <backup_file_op_fail(ulint, int, byte const*, ulint, byte const*, ulint)>
CorruptedPages::backup_fix_ddl (this=this@entry=0x7ffcee594510, 
    ds_data=ds_data@entry=0x564cd9efb428, ds_meta=ds_meta@entry=0x564cd9efb428)
    at /data/Server/bb-10.6-primary-corruptionA/extra/mariabackup/xtrabackup.cc:5019
5019		pthread_mutex_unlock(&backup_mutex);
(rr) continue
Continuing.
[00] 2023-07-26 15:06:25 Copying ./test/table0_innodb_int.ibd to /data/results/1690368524/268/1_clone/data/test/table0_innodb_int.new
[00] 2023-07-26 15:06:25         ...done
[00] 2023-07-26 15:06:25 Copying ./test/t1.ibd to /data/results/1690368524/268/1_clone/data/test/t1.new
[00] 2023-07-26 15:06:26         ...done
[00] 2023-07-26 15:06:26 Copying ./test/FTS_0000000000000188_00000000000001f4_INDEX_1.ibd to /data/results/1690368524/268/1_clone/data/test/FTS_0000000000000188_00000000000001f4_INDEX_1.new
[00] 2023-07-26 15:06:26         ...done
[00] 2023-07-26 15:06:26 Copying ./test/FTS_0000000000000188_00000000000001f4_INDEX_2.ibd to /data/results/1690368524/268/1_clone/data/test/FTS_0000000000000188_00000000000001f4_INDEX_2.new
[00] 2023-07-26 15:06:26         ...done
[00] 2023-07-26 15:06:26 Copying ./test/FTS_0000000000000188_00000000000001f4_INDEX_3.ibd to /data/results/1690368524/268/1_clone/data/test/FTS_0000000000000188_00000000000001f4_INDEX_3.new
[00] 2023-07-26 15:06:26         ...done
[00] 2023-07-26 15:06:26 Copying ./test/FTS_0000000000000188_00000000000001f4_INDEX_4.ibd to /data/results/1690368524/268/1_clone/data/test/FTS_0000000000000188_00000000000001f4_INDEX_4.new
[00] 2023-07-26 15:06:26         ...done
[00] 2023-07-26 15:06:26 Copying ./test/FTS_0000000000000188_00000000000001f4_INDEX_5.ibd to /data/results/1690368524/268/1_clone/data/test/FTS_0000000000000188_00000000000001f4_INDEX_5.new
[00] 2023-07-26 15:06:26         ...done
[00] 2023-07-26 15:06:26 Copying ./test/FTS_0000000000000188_00000000000001f4_INDEX_6.ibd to /data/results/1690368524/268/1_clone/data/test/FTS_0000000000000188_00000000000001f4_INDEX_6.new
[Switching to Thread 3364349.3370113]
 
Thread 5 hit Breakpoint 1, backup_file_op_fail (space_id=380, type=144, 
    name=0x2c6e66877005 "./test/FTS_", '0' <repeats 13 times>, "188_", '0' <repeats 13 times>, "1f4_INDEX_1.ibd", len=56, new_name=0x0, new_len=0)
    at /data/Server/bb-10.6-primary-corruptionA/extra/mariabackup/xtrabackup.cc:968
968	{
(rr) up
#1  0x0000564cd71c682c in recv_sys_t::parse (this=<optimized out>, 
    checkpoint_lsn=<optimized out>, store=<optimized out>, 
    apply=<optimized out>)
    at /data/Server/bb-10.6-primary-corruptionA/storage/innobase/log/log0recv.cc:2879
2879	          log_file_op(space_id, b & 0xf0,
(rr) backtrace
#0  backup_file_op_fail (space_id=380, type=144, name=0x2c6e66877005 "./test/FTS_", '0' <repeats 13 times>, "188_", '0' <repeats 13 times>, "1f4_INDEX_1.ibd", len=56, new_name=0x0, new_len=0)
    at /data/Server/bb-10.6-primary-corruptionA/extra/mariabackup/xtrabackup.cc:968
#1  0x0000564cd71c682c in recv_sys_t::parse (this=<optimized out>, checkpoint_lsn=<optimized out>, store=<optimized out>, apply=<optimized out>)
    at /data/Server/bb-10.6-primary-corruptionA/storage/innobase/log/log0recv.cc:2879
#2  0x0000564cd6b81fe0 in xtrabackup_copy_log (last=false, end_lsn=15922688, start_lsn=15921664) at /data/Server/bb-10.6-primary-corruptionA/extra/mariabackup/xtrabackup.cc:3097
#3  xtrabackup_copy_logfile (last=false) at /data/Server/bb-10.6-primary-corruptionA/extra/mariabackup/xtrabackup.cc:3169
#4  0x0000564cd6b82353 in log_copying_thread () at /data/Server/bb-10.6-primary-corruptionA/extra/mariabackup/xtrabackup.cc:3218
#5  0x0000419e1df74de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x000065850a5d4609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00006ea553b4f293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The record is FILE_DELETE (0x90), deleting a file that used to belong to the FULLTEXT INDEX of a table.

There is also a trace of the server that was being backed up:

ssh pluto
rr replay /data/results/1690368524/MDEV-22250/1_clone/rr/mariadb-backup-0

break mtr_t::log_file_op
cond 1 type==FILE_DELETE && $_regex(path,".*0001f4_INDEX_1.ibd")
continue
backtrace

The record was written during the rollback of ALTER TABLE t1 ADD FULLTEXT KEY ftidx ( col4 ) /* E_R Thread11 QNO 34 CON_ID 25 */:

10.6 b102872ad50cce5959ad95369740766d14e9e48c

Thread 31 hit Breakpoint 1, mtr_t::log_file_op (this=0x7fb8efdb6470, type=FILE_DELETE, space_id=380, 
    path=0x29016401ba10 "./test/FTS_", '0' <repeats 13 times>, "188_", '0' <repeats 13 times>, "1f4_INDEX_1.ibd", new_path=0x0) at /data/Server/bb-10.6-primary-corruptionA/storage/innobase/fil/fil0fil.cc:1517
1517	{
(rr) bt
#0  mtr_t::log_file_op (this=0x7fb8efdb6470, type=FILE_DELETE, space_id=380, path=0x29016401ba10 "./test/FTS_", '0' <repeats 13 times>, "188_", '0' <repeats 13 times>, "1f4_INDEX_1.ibd", new_path=0x0)
    at /data/Server/bb-10.6-primary-corruptionA/storage/innobase/fil/fil0fil.cc:1517
#1  0x000055e257f70931 in fil_delete_tablespace (id=id@entry=380) at /data/Server/bb-10.6-primary-corruptionA/storage/innobase/fil/fil0fil.cc:1699
#2  0x000055e257f6c36b in trx_t::commit (this=this@entry=0x140b4400a680, deleted=std::vector of length 0, capacity 0) at /data/Server/bb-10.6-primary-corruptionA/storage/innobase/dict/drop.cc:278
#3  0x000055e257e39a82 in commit_unlock_and_unlink (trx=0x140b4400a680) at /data/Server/bb-10.6-primary-corruptionA/storage/innobase/handler/handler0alter.cc:4339
#4  0x000055e257e48210 in rollback_inplace_alter_table (ha_alter_info=ha_alter_info@entry=0x7fb8efdb6c60, table=<optimized out>, prebuilt=<optimized out>)
    at /data/Server/bb-10.6-primary-corruptionA/storage/innobase/handler/handler0alter.cc:9038
#5  0x000055e257e435f2 in ha_innobase::commit_inplace_alter_table (this=0x29016401a730, altered_table=0x7fb8efdb6d20, ha_alter_info=0x7fb8efdb6c60, commit=<optimized out>)
    at /data/Server/bb-10.6-primary-corruptionA/storage/innobase/handler/handler0alter.cc:11015
#6  0x000055e257a27557 in mysql_inplace_alter_table (target_mdl_request=0x7fb8efdb7620, alter_ctx=0x7fb8efdb87a0, trigger_param=0x7fb8efdb7110, ddl_log_state=0x7fb8efdb6c00, ha_alter_info=0x7fb8efdb6c60, 
    altered_table=0x7fb8efdb6d20, table=0x7a624017878, table_list=0x2901640108a0, thd=0x290164000c58) at /data/Server/bb-10.6-primary-corruptionA/sql/sql_table.cc:7583
#7  mysql_alter_table (thd=thd@entry=0x290164000c58, new_db=new_db@entry=0x290164005500, new_name=new_name@entry=0x290164005958, create_info=create_info@entry=0x7fb8efdb95b0, table_list=<optimized out>, 
    table_list@entry=0x2901640108a0, recreate_info=recreate_info@entry=0x7fb8efdb94a0, alter_info=0x7fb8efdb94c0, order_num=0, order=0x0, ignore=false, if_exists=false)
    at /data/Server/bb-10.6-primary-corruptionA/sql/sql_table.cc:10354
#8  0x000055e257a876d2 in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x290164000c58) at /data/Server/bb-10.6-primary-corruptionA/sql/structs.h:568
#9  0x000055e257982aa6 in mysql_execute_command (thd=0x290164000c58, is_called_from_prepared_stmt=<optimized out>) at /data/Server/bb-10.6-primary-corruptionA/sql/sql_parse.cc:6014
#10 0x000055e257972df6 in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x290164000c58) at /data/Server/bb-10.6-primary-corruptionA/sql/sql_parse.cc:8041

I believe that the problem is that backup locks may fail to adequately block DDL operations. Possibly, some or all FILE_DELETE log records may need to be ignored in backup_file_op_fail().

Comment by Marko Mäkelä [ 2023-12-08 ]

Something similar had once been fixed in MDEV-13825.

Generated at Thu Feb 08 09:13:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.