[MDEV-31488] Restart on backupped data fails with InnoDB: Failing assertion: !i || prev_id + 1 == space_id at srv0start.cc line 803 Created: 2023-06-16  Updated: 2023-06-29  Resolved: 2023-06-27

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 11.2.0
Fix Version/s: 10.5.22, 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2, 11.2.1

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Thirunarayanan Balathandayuthapani
Resolution: Duplicate Votes: 2
Labels: None

Issue Links:
Duplicate
duplicates MDEV-31487 Recovery or backup failure after inno... Closed
is duplicated by MDEV-31504 Failing assertion: !i || prev_id + 1 ... Closed
Relates
relates to MDEV-19229 Allow innodb_undo_tablespaces to be c... Closed

 Description   

origin/preview-11.2-preview 79f9e94195dc8a007debd27ec4e8d2c15f094a97 2023-06-13T17:23:05+10:00
Scenario:
1. Start source server, generate some initial data
2. Several sessions run concurrent SQL
3. During 2. is ongoing
    mariabackup --backup data from source server to some other location
    mariabackup --prepare on that other location
    Try to start a DB server on that other location.
    This fails with
    #  2023-06-13 17:32:44 0 [Note] InnoDB: Initializing buffer pool, total size = 8.000MiB, chunk size = 1.000MiB
   #  2023-06-13 17:32:44 0 [Note] InnoDB: Completed initialization of buffer pool
   #  2023-06-13 17:32:44 0 [ERROR] InnoDB: Inconsistent tablespace ID in file .//undo001
   #  2023-06-13 17:32:44 0 [ERROR] InnoDB: Unable to read first page of file .//undo001
   #  2023-06-13 17:32:44 0 [ERROR] InnoDB: Inconsistent tablespace ID in file .//undo002
   #  2023-06-13 17:32:44 0 [ERROR] InnoDB: Unable to read first page of file .//undo002
   #  2023-06-13 17:32:44 0x7f90afc1f7c0  InnoDB: Assertion failure in file /data/Server/preview-11.2-preview/storage/innobase/srv/srv0start.cc line 803
   #  2023-06-13T17:34:05 [3112262] | InnoDB: Failing assertion: !i || prev_id + 1 == space_id
 
# 2023-06-13T17:34:05 [3112262] Thread 1 (Thread 0x7f90afc1f7c0 (LWP 3205380)):
# 2023-06-13T17:34:05 [3112262] #0  __pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
# 2023-06-13T17:34:05 [3112262] #1  0x0000564dc73f8c9f in my_write_core (sig=sig@entry=6) at /data/Server/preview-11.2-preview/mysys/stacktrace.c:424
# 2023-06-13T17:34:05 [3112262] #2  0x0000564dc6fa5220 in handle_fatal_signal (sig=6) at /data/Server/preview-11.2-preview/sql/signal_handler.cc:360
# 2023-06-13T17:34:05 [3112262] #3  <signal handler called>
# 2023-06-13T17:34:05 [3112262] #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
# 2023-06-13T17:34:05 [3112262] #5  0x00007f90afc49859 in __GI_abort () at abort.c:79
# 2023-06-13T17:34:05 [3112262] #6  0x0000564dc6c04db3 in ut_dbg_assertion_failed (expr=expr@entry=0x564dc772be92 "!i || prev_id + 1 == space_id", file=file@entry=0x564dc772b158 "/data/Server/preview-11.2-preview/storage/innobase/srv/srv0start.cc", line=line@entry=803) at /data/Server/preview-11.2-preview/storage/innobase/ut/ut0dbg.cc:60
# 2023-06-13T17:34:05 [3112262] #7  0x0000564dc6bfe915 in srv_all_undo_tablespaces_open (n_undo=<optimized out>, create_new_undo=<optimized out>) at /data/Server/preview-11.2-preview/storage/innobase/srv/srv0start.cc:803
# 2023-06-13T17:34:05 [3112262] #8  srv_undo_tablespaces_init (create_new_undo=<optimized out>, mtr=0x7ffc2ee88650) at /data/Server/preview-11.2-preview/storage/innobase/srv/srv0start.cc:877
# 2023-06-13T17:34:05 [3112262] #9  0x0000564dc6bffaa8 in srv_start (create_new_db=false) at /data/Server/preview-11.2-preview/storage/innobase/srv/srv0start.cc:1422
# 2023-06-13T17:34:05 [3112262] #10 0x0000564dc720434a in innodb_init (p=<optimized out>) at /data/Server/preview-11.2-preview/storage/innobase/handler/ha_innodb.cc:4168
# 2023-06-13T17:34:05 [3112262] #11 0x0000564dc6fa80fa in ha_initialize_handlerton (plugin=0x564dca0c2bf0) at /data/Server/preview-11.2-preview/sql/handler.cc:687
# 2023-06-13T17:34:05 [3112262] #12 0x0000564dc6d7a40e in plugin_initialize (tmp_root=tmp_root@entry=0x7ffc2ee89ab0, plugin=plugin@entry=0x564dca0c2bf0, argc=argc@entry=0x564dc7e759e0 <remaining_argc>, argv=argv@entry=0x564dca00f1c8, options_only=<optimized out>) at /data/Server/preview-11.2-preview/sql/sql_plugin.cc:1465
# 2023-06-13T17:34:05 [3112262] #13 0x0000564dc6d7b313 in plugin_init (argc=argc@entry=0x564dc7e759e0 <remaining_argc>, argv=<optimized out>, flags=0) at /data/Server/preview-11.2-preview/sql/sql_plugin.cc:1758
# 2023-06-13T17:34:05 [3112262] #14 0x0000564dc6c6641d in init_server_components () at /data/Server/preview-11.2-preview/sql/mysqld.cc:5246
# 2023-06-13T17:34:05 [3112262] #15 0x0000564dc6c6b1b7 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /data/Server/preview-11.2-preview/sql/mysqld.cc:5875
# 2023-06-13T17:34:05 [3112262] #16 0x00007f90afc4b0b3 in __libc_start_main (main=0x564dc6c239c0 <main(int, char**)>, argc=37, argv=0x7ffc2ee8c9e8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc2ee8c9d8) at ../csu/libc-start.c:308
# 2023-06-13T17:34:05 [3112262] #17 0x0000564dc6c6208e in _start () at /data/Server/preview-11.2-preview/include/mysql/plugin.h:215
 
pluto:/data/results/1686674959/TBR-1927$ gdb -c ./1_clone/data/core /data/Server_bin/preview-11.2-preview_RelWithDebInfo/bin/mysqld
 
RQG
====
# git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
#
# GIT_SHOW: HEAD -> master, origin/master, origin/HEAD a8c8cd1021480e64ef979d326a2543c577d1068a 2023-06-09T18:22:34+02:00
# rqg.pl  : Version 4.2.1 (2022-12)
#
# $RQG_HOME/rqg.pl \
# --grammar=conf/mariadb/table_stress_innodb_dml.yy \
# --gendata=conf/mariadb/table_stress.zz \
# --gendata_sql=conf/mariadb/table_stress.sql \
# --reporters=Mariabackup_linux \
# --mysqld=--loose-innodb-log-file-size=200M \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --no_mask \
# --queries=10000000 \
# --seed=random \
# --reporters=None \
# --reporters=ErrorLog \
# --reporters=Deadlock \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--plugin-load-add=provider_lzo.so \
# --mysqld=--plugin-load-add=provider_bzip2.so \
# --mysqld=--plugin-load-add=provider_lzma.so \
# --mysqld=--plugin-load-add=provider_snappy.so \
# --mysqld=--plugin-load-add=provider_lz4.so \
# --mysqld=--loose-innodb_compression_level=1 \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--lock-wait-timeout=15 \
# --mysqld=--innodb-lock-wait-timeout=10 \
# --mysqld=--loose-innodb_fast_shutdown=0 \
# --mysqld=--innodb_file_per_table=0 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--innodb_stats_persistent=on \
# --mysqld=--innodb_adaptive_hash_index=on \
# --mysqld=--innodb_random_read_ahead=OFF \
# --mysqld=--log-bin \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=on \
# --mysqld=--loose-max-statement-time=30 \
# --threads=33 \
# --mysqld=--innodb_use_native_aio=1 \
# --mysqld=--innodb_undo_log_truncate=ON \
# --mysqld=--innodb_undo_tablespaces=3 \
# --mysqld=--innodb_rollback_on_timeout=OFF \
# --mysqld=--innodb_page_size=16K \
# --mysqld=--innodb-buffer-pool-size=8M \
# <local settings>



 Comments   
Comment by Alex [ 2023-06-20 ]

10.11.4 same problem

Comment by Marko Mäkelä [ 2023-06-26 ]

MariaDB Server 10.11 was the first version to include MDEV-19229.

Comment by Marko Mäkelä [ 2023-06-26 ]

thiru, can you debug this?

ssh pluto
gdb -c /data/results/1686674959/TBR-1927/1_clone/data/core /data/Server_bin/preview-11.2-preview_RelWithDebInfo/bin/mysqld

(gdb) frame 7
#7  0x0000564dc6bfe915 in srv_all_undo_tablespaces_open (
    n_undo=<optimized out>, create_new_undo=<optimized out>)
    at /data/Server/preview-11.2-preview/storage/innobase/srv/srv0start.cc:803
803	    ut_a(!i || prev_id + 1 == space_id);
(gdb) info locals
space_id = 37
i = 1
prev_id = 37

Comment by Thirunarayanan Balathandayuthapani [ 2023-06-27 ]

PAGE_INIT redo for page0 wasn't applied for undo tablespace. This is because we remove the redo log records during trim().

This problem should be fixable by MDEV-31487 patch

Comment by Matthias Leich [ 2023-06-29 ]

Failing assertion: !i || prev_id + 1 == space_id  was only seen once (> 10000 RQG runs with Mariabackup on the tree).
And that was on origin/preview-11.2-preview 7c7de020b576a42bd381a9060c3076f85da0d9d4 2023-06-18T23:49:17+02:00
That problem was not observed during the excessive testing
- 11.2 2867894ac6ca23fc2f82cfad2dd510351f597325 2023-06-28T09:39:28+03:00
- origin/bb-11.2-MDEV-14795 f4ae37836c3a6632a4284be853675bcbe3f565d5 2023-06-28T15:48:11+05:30
But with the low likelihood to replay it given its impossible to tell its reason is now somehow fixed.

Comment by Marko Mäkelä [ 2023-06-29 ]

thiru was able to successfully recover the backup after applying the fix of MDEV-31487. Therefore, I think that we can conclude that this harder-to-reproduce symptom is actually a duplicate of the easier-to-reproduce bug MDEV-31487.

Generated at Thu Feb 08 10:24:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.