[MDEV-22097] Not applying DELETE_ROW_FORMAT_REDUNDANT due to corruption Created: 2020-03-31  Updated: 2020-04-08  Resolved: 2020-04-08

Status: Closed
Project: MariaDB Server
Component/s: mariabackup, Storage Engine - InnoDB
Affects Version/s: 10.5.2
Fix Version/s: 10.5.3

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: recovery

Issue Links:
Duplicate
duplicates MDEV-22139 fseg_free_page_low() fails to write F... Closed
Problem/Incident
is caused by MDEV-15528 Avoid writing freed InnoDB pages Closed
Relates
relates to MDEV-12353 Efficient InnoDB redo log record format Closed
relates to MDEV-22110 InnoDB unnecessarily writes unmodifie... Closed
relates to MDEV-22096 Mariabackup copied too old page or to... Closed

 Description   

In our test environment, we got a case where mariabackup --prepare would fail to apply the log due to corruption on a ROW_FORMAT=REDUNDANT page.

With a debugger, I collected the following records (start_lsn, lsn, log record data) for the page from log_phys_t::apply():

start_lsn lsn data
1871057 1873747 3509817f0b5a
1883172 1885878 3609817f0e016b
1883172 1885878 3609817f300000
1883172 1885878 3809817f2c03da068eb20702
1883172 1885878 b23603
1883172 1885878 b489770074
1883172 1885878 3609817f801200*
1914429 1917880 3509817f0b3a
1947460 1947479 3609817f869b20
1947460 1947479 ba0d06420f00000379051a
1947597 1947624 3909817f89c2ffffffff
1947597 1947624 b30a0000
1947624 1947631 2509817f0800

The problem was detected at the last record, a DELETE_ROW_FORMAT_REDUNDANT of the first user record on the page. Earlier in the log, the last record listed for start_lsn=1883172 had replaced the link to the page supremum record in the last user record of the page (there were only two records) with a null pointer.

Apparently, some log record was not written.

We had an rr record of the server process, but unfortunately it became unusable due to some change in the environment (probably related to shared libraries being updated during an operating system upgrade). We must reproduce this bug again so that it can be analyzed with rr replay.



 Comments   
Comment by Matthias Leich [ 2020-03-31 ]

Workflow of the test
1. Create a DB server and start it.
2. In one session, run an SQL stream (DML and DDL) against this server.
3. Concurrently with the ongoing step 2, execute
     extra/mariabackup/mariabackup --host=127.0.0.1 --user=root --password=''  --innodb-use-native-aio=0  ... --backup --datadir=<vardir of RQG runner>/1/data --target-dir=<vardir of RQG runner>/1_clone/data
4. After that, still concurrently with step 2 but without touching the server started in step 1, execute
     extra/mariabackup/mariabackup --host=127.0.0.1 --user=root --password=''  --innodb-use-native-aio=0  --port=26704 --prepare --target-dir=<vardir of RQG runner>/1_clone/data
   This prepare step fails.
 
One example from a historic replay (messages written by mariabackup ... prepare):
2020-03-30 13:30:41 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=4711661
2020-03-30 13:30:41 0 [Note] InnoDB: Starting final batch to recover 168 pages from redo log.
2020-03-30 13:30:41 0 [ERROR] InnoDB: Not applying DELETE_ROW_FORMAT_REDUNDANT due to corruption on [page id: space=9, page number=70]
2020-03-30 13:30:41 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
2020-03-30 13:30:41 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1574] with error Data structure corruption
[00] FATAL ERROR: 2020-03-30 13:30:42 mariabackup: innodb_init() returned 39 (Data structure corruption).
 
 
 
 
[ERROR] InnoDB: Not applying DELETE_ROW_FORMAT_REDUNDANT due to corruption on [page id: space=9, page number=70]
perl rqg.pl \ 
--grammar=conf/mariadb/table_stress_innodb_nocopy.yy \
--gendata=conf/mariadb/table_stress.zz \
--gendata_sql=conf/mariadb/table_stress.sql \
--reporters=Mariabackup \
# 2020-03-30T13:29:19 [91427] # --mysqld=--innodb_use_native_aio=1 \
# 2020-03-30T13:29:19 [91427] # --mysqld=--innodb_stats_persistent=off \
--mysqld=--innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
# 2020-03-30T13:29:19 [91427] # --mysqld=--interactive_timeout=28800 \
# 2020-03-30T13:29:19 [91427] # --mysqld=--slave_net_timeout=60 \
# 2020-03-30T13:29:19 [91427] # --mysqld=--net_read_timeout=30 \
# 2020-03-30T13:29:19 [91427] # --mysqld=--net_write_timeout=60 \
# 2020-03-30T13:29:19 [91427] # --mysqld=--loose-table_lock_wait_timeout=50 \
# 2020-03-30T13:29:19 [91427] # --mysqld=--wait_timeout=28800 \
# 2020-03-30T13:29:19 [91427] # --mysqld=--lock-wait-timeout=86400 \
# 2020-03-30T13:29:19 [91427] # --mysqld=--innodb-lock-wait-timeout=50 \
# 2020-03-30T13:29:19 [91427] # --no-mask \
# 2020-03-30T13:29:19 [91427] # --queries=10000000 \
# 2020-03-30T13:29:19 [91427] # --duration=300 \
# 2020-03-30T13:29:19 [91427] # --seed=random \
# 2020-03-30T13:29:19 [91427] # --reporters=Backtrace \
# 2020-03-30T13:29:19 [91427] # --reporters=ErrorLog \
# 2020-03-30T13:29:19 [91427] # --reporters=Deadlock1 \
# 2020-03-30T13:29:19 [91427] # --validators=None \
# 2020-03-30T13:29:19 [91427] # --mysqld=--log_output=none \
# 2020-03-30T13:29:19 [91427] # --mysqld=--log-bin \
# 2020-03-30T13:29:19 [91427] # --mysqld=--log_bin_trust_function_creators=1 \
# 2020-03-30T13:29:19 [91427] # --mysqld=--loose-max-statement-time=30 \
# 2020-03-30T13:29:19 [91427] # --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# 2020-03-30T13:29:19 [91427] # --engine=InnoDB \
# 2020-03-30T13:29:19 [91427] # --mysqld=--innodb-buffer-pool-size=32M \
# 2020-03-30T13:29:19 [91427] # --threads=2 \
# 2020-03-30T13:29:19 [91427] # --mysqld=--innodb_page_size=16K \
# 2020-03-30T13:29:19 [91427] # --no_mask \
# 2020-03-30T13:29:19 [91427] # --workdir=/home/mleich/RQG_mleich/storage/1585573778/106 \
# 2020-03-30T13:29:19 [91427] # --vardir=/dev/shm/vardir/1585573778/106 \
# 2020-03-30T13:29:19 [91427] # --mtr-build-thread=835 \
# 2020-03-30T13:29:19 [91427] # --basedir1=/home/mleich/10.5-enterprise-rebase/bld_asan \
# 2020-03-30T13:29:19 [91427] # --basedir2=/home/mleich/10.5-enterprise-rebase/bld_asan \
# 2020-03-30T13:29:19 [91427] # --script_debug=_nix_
 
Pattern for error tagging in RQG
[ 'MDEV-22097' , 'Starting InnoDB instance for recovery.{1,200}mariabackup:.{1,2000}\[ERROR\] InnoDB: Not applying DELETE_ROW_FORMAT_REDUNDANT due to corruption on' ],
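As a hedged sketch of how such a tagging pattern matches a run log: the sample text below is synthetic, condensed from the messages in this report, and the real RQG matching context may differ (this assumes a plain regex search over the whole log text).

```python
import re

# The RQG error-tagging pattern for MDEV-22097, copied from above.
pattern = (r'Starting InnoDB instance for recovery.{1,200}mariabackup:'
           r'.{1,2000}\[ERROR\] InnoDB: Not applying DELETE_ROW_FORMAT_REDUNDANT'
           r' due to corruption on')

# Synthetic, condensed log text; the real log spreads these over many lines,
# which is why re.DOTALL (making '.' span newlines) is used for the search.
log = ("... Starting InnoDB instance for recovery ... [00] FATAL ERROR: "
       "mariabackup: innodb_init() returned 39 [ERROR] InnoDB: Not applying "
       "DELETE_ROW_FORMAT_REDUNDANT due to corruption on "
       "[page id: space=9, page number=70]")

print("tagged" if re.search(pattern, log, re.DOTALL) else "no match")  # prints "tagged"
```

The bounded gaps (`.{1,200}`, `.{1,2000}`) keep the tag from firing when the three message fragments appear far apart in unrelated parts of a long log.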

Comment by Marko Mäkelä [ 2020-04-01 ]

New day, new trace. This time, it involved a server kill and restart. Page 9:296 fails to recover. It looks like a clustered index leaf page of a ROW_FORMAT=REDUNDANT table, with innodb_page_size=4k. Here are the records that were applied to the page during recovery, up to the error:

start_lsn end_lsn log records
2411639 2411658 360980a8869b20
2411639 2411658 ba0d088f0f0000037a06e0
2414056 2414099 390980a889c2ffffffff
2414056 2414099 b30a0000
2414099 2414106 250980a8082e
2422365 2423226 2082bc0980a805003800004331801d001d001900150011000a00048000002500000000089d990000039c03688000002580000025ffffffff33<repeats 768 times>00<repeats 20 times>
2423226 2424612 370980a889d00501
2423226 2424612 360980a889c109b302016a
2423226 2424612 b20326
2431671 2431687 470980a886a50600b20080
2431671 2431687 c3000600
2537685 2537704 360980a889e220
2537685 2537704 ba0d091d6e0000036b06a2
2542557 2542584 390980a88d09ffffffff
2542557 2542584 b30a0000
2542584 2542608 260980a808863c

There is at least one preceding INSERT operation that inserts a record with an initially null BLOB pointer.

It is the second page_apply_delete_redundant() that fails. The reason is that the logged prev_rec is the very last record in the page, pointing to the page supremum. We must be missing a log record write.
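The hex records listed above can be partially decoded by hand. Below is a minimal header-decoder sketch for the MDEV-12353 mini-transaction log record format; the type/length nibble split, the 0x80 same-page flag, the variable-length integer encoding, and the EXTENDED subtype value 8 for DELETE_ROW_FORMAT_REDUNDANT are assumptions based on the 10.5 sources (mtr0types.h, mtr0log.h), not something confirmed in this report:

```python
# High nibble of the first byte = record type (0x80 flag = "same page id as
# the previous record"); low nibble = number of bytes that follow (0 would
# mean a separately encoded length, not handled in this sketch).
TYPES = {0x00: "FREE_PAGE", 0x10: "INIT_PAGE", 0x20: "EXTENDED",
         0x30: "WRITE", 0x40: "MEMSET", 0x50: "MEMMOVE"}

def decode_varint(b, pos):
    """Decode a 1- or 2-byte mlog variable-length integer (longer forms omitted)."""
    if b[pos] < 0x80:
        return b[pos], pos + 1
    if b[pos] < 0xc0:
        return 0x80 + (((b[pos] & 0x3f) << 8) | b[pos + 1]), pos + 2
    raise NotImplementedError("3..5-byte encodings not handled in this sketch")

def decode_header(hexrec):
    b = bytes.fromhex(hexrec)
    same_page = bool(b[0] & 0x80)
    rec_type = TYPES[b[0] & 0x70]
    length = b[0] & 0x0f  # payload bytes following the first byte
    out = {"type": rec_type, "len": length, "same_page": same_page}
    if not same_page and length:
        space, pos = decode_varint(b, 1)
        page, pos = decode_varint(b, pos)
        out["page_id"] = (space, page)
        if rec_type == "EXTENDED":
            out["subtype"] = b[pos]  # 8 = DELETE_ROW_FORMAT_REDUNDANT (assumed)
    return out

# The record at which recovery failed in this trace, on page 9:296:
print(decode_header("250980a8082e"))
# e.g. type EXTENDED, page_id (9, 296), subtype 8
```

Decoding the failing record this way is consistent with the reported error: an EXTENDED record with subtype 8 on page 9:296.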

I have been debugging this all day. The rr replay is good, and in it, the trace appears to start the server right after bootstrap, so all the necessary information is there. What remains is to find out at which point the page images diverge between recovery and the initial server run. If the data file turns out to be already out of sync at the point where recovery starts, we are out of luck and will need a better trace. I have not yet checked this.

Debugging was greatly slowed down by MDEV-22110, which I initially suspected could occasionally break crash recovery; in fact it only causes unnecessary page writes (a performance degradation).

Comment by Marko Mäkelä [ 2020-04-01 ]

At the start of recovery, the page has FIL_PAGE_LSN=1809537. In rr replay, I extracted the page from the buffer pool at this logical time. The image matches the one seen during recovery, except for the last 8 bytes (the least significant 4 bytes of the LSN and the CRC-32C had not yet been initialized):

10.5 718f18599a9bcf1e7c2d3f18416fca4f7124d00d

Thread 23 hit Hardware watchpoint 3: -location $2.frame[16]@8
 
Old value = "\000\000\000\000\000\033!~"
New value = "\000\000\000\000\000\033\234~"
mach_write_to_4 (b=0x7f0a612c0014 "", n=1809537)
    at /home/mleich/10.5/storage/innobase/include/mach0data.ic:166
166		b[3] = (byte) n;
1: log_sys.lsn._M_i = 1809537
2: /x $2.frame[16]@8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x1b, 0x9c, 0x7e}
(rr) when
Current event: 393198

I will continue debugging from this point tomorrow. The goal is to dump the page image after each recovery apply step, find the corresponding page image in rr replay, and compare them. Once we find a mismatch, we will know the logical time frame in which a redo log record was wrongly omitted. I already found out that the discrepancy is present at LSN=2537685 (the start of the mini-transaction that delete-marks a clustered index leaf page record and updates its DB_TRX_ID,DB_ROLL_PTR). So, I will have to check at most 10 page images.
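The comparison step described above reduces to finding the first diverging byte between two page images. A minimal sketch, assuming 4 KiB pages as in this trace and skipping the 8-byte page trailer that the earlier comparison showed may legitimately differ (the low 32 bits of FIL_PAGE_LSN plus the CRC-32C):

```python
PAGE_SIZE = 4096  # innodb_page_size=4k in this trace
TRAILER = 8       # last 8 bytes: low 32 bits of FIL_PAGE_LSN + CRC-32C

def first_divergence(recovered: bytes, replayed: bytes):
    """Return the offset of the first differing byte between two page images,
    or None if they match everywhere outside the trailer."""
    assert len(recovered) == len(replayed) == PAGE_SIZE
    for i in range(PAGE_SIZE - TRAILER):
        if recovered[i] != replayed[i]:
            return i
    return None
```

Running this after each apply step, against the page image extracted from the buffer pool in rr replay at the corresponding logical time, brackets the time frame in which the missing redo log record should have been written.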

Comment by Marko Mäkelä [ 2020-04-02 ]

The first write that is not found on recovery is the very first one after LSN=1809537:

10.5 718f18599a9bcf1e7c2d3f18416fca4f7124d00d

Thread 34 hit Hardware watchpoint 2: -location $2.frame[8]@8
 
Old value = "\000\000\001'\000\000\001,"
New value = "\000\000\001\234\000\000\001,"
mtr_t::write<4u, (mtr_t::write_type)1, unsigned long> (this=0x7f0a3c0abf90, 
    block=..., ptr=0x7f0a612c0008, val=412)
    at /home/mleich/10.5/storage/innobase/include/mtr0log.h:211
211	             (ut_align_offset(p, srv_page_size)), p, end - p);
1: /x $2.frame[16]@8 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x1b, 0x9c, 0x81}
2: log_sys.lsn._M_i = 1851445
3: /x $2.frame[8]@8 = {0x0, 0x0, 0x1, 0x9c, 0x0, 0x0, 0x1, 0x2c}
(rr) bt
#0  mtr_t::write<4u, (mtr_t::write_type)1, unsigned long> (
    this=0x7f0a3c0abf90, block=..., ptr=0x7f0a612c0008, val=412)
    at /home/mleich/10.5/storage/innobase/include/mtr0log.h:211
#1  0x000055f408dd4813 in btr_page_set_prev (block=0x7f0a600702f8, prev=412, 
    mtr=0x7f0a3c0abf90)
    at /home/mleich/10.5/storage/innobase/include/btr0btr.ic:80
#2  0x000055f408dca765 in btr_attach_half_pages (flags=0, 
    index=0x7f09cc383c28, block=0x7f0a600700c0, 
    split_rec=0x7f0a612bf3da "\200", new_block=0x7f0a6008d4e8, direction=111, 
    mtr=0x7f0a3c0abf90)
    at /home/mleich/10.5/storage/innobase/btr/btr0btr.cc:2571
#3  0x000055f408dcbeab in btr_page_split_and_insert (flags=0, 
    cursor=0x7f0a3c0abc30, offsets=0x7f0a3c0abbc8, heap=0x7f0a3c0abbc0, 
    tuple=0x7f09d0057b18, n_ext=1, mtr=0x7f0a3c0abf90)
    at /home/mleich/10.5/storage/innobase/btr/btr0btr.cc:2937
#4  0x000055f408df86da in btr_cur_pessimistic_insert (flags=0, 
    cursor=0x7f0a3c0abc30, offsets=0x7f0a3c0abbc8, heap=0x7f0a3c0abbc0, 
    entry=0x7f09d0057b18, rec=0x7f0a3c0abbd0, big_rec=0x7f0a3c0abbb8, n_ext=1, 
    thr=0x7f09d005d4d8, mtr=0x7f0a3c0abf90)
    at /home/mleich/10.5/storage/innobase/btr/btr0cur.cc:3753

Redo logging was correctly enabled for the operation; the reason why we did not apply the log is that the latest checkpoint was newer than this mini-transaction:

2020-04-02 13:52:35 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=2336926

Earlier in the rr replay log, I see the following:

# 2020-03-31T19:35:32 [41907] DEBUG: Reporter Mariabackup : Pass

This makes me believe that the broken recovery could somehow involve backup, and that we are witnessing MDEV-22096 instead of a ‘genuine’ crash recovery bug.

mleich, if this run really involved backup, then it should be analyzed in the scope of MDEV-22096, to see what exactly the mariadb-backup was reading and when, compared to the page writes of the mariadbd process. The earlier rr record for that bug only included mariadbd.

Comment by Matthias Leich [ 2020-04-06 ]

The second RQG run with an rr trace invokes mariabackup. There is no KILL of any server process.
/home/mleich/RQG_RR/storage/MDEV-22097/000472.log

Comment by Marko Mäkelä [ 2020-04-08 ]

While the rr replay trace of this failure was ruined by unattended-upgrades and could not be analyzed further, the fact that the failure is no longer reproducible with a more recent 10.5 strongly suggests that its cause was the server failing to write a FREE_PAGE record before omitting a page write. That bug was introduced by MDEV-15528 and fixed in MDEV-22139.

Generated at Thu Feb 08 09:12:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.