[MDEV-24368] Draft: Strange system behaviour around Mariabackup Created: 2020-12-08  Updated: 2022-02-01  Resolved: 2022-02-01

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Matthias Leich
Resolution: Cannot Reproduce Votes: 0
Labels: rr-profile


 Description   

Problems found during RQG testing on
origin/bb-10.5-MDEV-24350 83591a23d6c2a09d9c24cccbc7d6e20a6feffe86 2020-12-04
 
Workflow:
Some "source" DB server is running and under heavy DDL/DML load.
From time to time starts some sequence of actions
      mariabackup --backup  <of the source DB to some location A>
          In case of error abort the test.
      mariabackup --prepare <on the location A>
          In case of error abort the test. 
          If the size of ib_logfile0  != 0  abort the test.
      mariabackup --prepare <on the location A>
          In case of error abort the test. 
          If the size of ib_logfile0  != 0  abort the test.
      remove ib_logfile0
      Attempt to start the server on the data in location A.
           In case of error abort the test. 
      Run SHOW VARIABLES and after that certain checks on the data.
 
What happened:
mariabackup --backup with success
mariabackup --prepare with success and shortened output
           [00] 2020-12-04 20:03:21 This target seems to be not prepared yet.
           ...
           2020-12-04 20:03:22 0 [Note] InnoDB: Starting final batch to recover 776 pages from redo log.
            [00] 2020-12-04 20:03:34 Last binlog file , position 0
            [00] 2020-12-04 20:03:35 completed OK!
mariabackup --prepare with success and shortened output
            [00] 2020-12-04 20:03:36 This target seems to be already prepared.
            ...    (no messages about recovering or log processing)
            [00] 2020-12-04 20:03:37 Last binlog file , position 0
            [00] 2020-12-04 20:03:38 completed OK! 
start of a DB server on the backupped data with success
Run SHOW VARIABLES and some more SQL's with success
CHECK TABLE `test`.`t2` EXTENDED  leads to server crash and shortened content of server error log
            [Note] InnoDB: Setting log file ./ib_logfile101 size to 100663296 bytes
            [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
            [Note] InnoDB: New log file created, LSN=19059883
            [Note] InnoDB: 2 transaction(s) which must be rolled back or cleaned up in total 3 row operations to undo
PROBLEM1:  Why that? Shouldn't the mariabackup --prepare already have done such stuff.
             ...
             [Note] InnoDB: Rolled back recovered transaction 13194
             ... 
             [Note] InnoDB: Buffer pool(s) load completed at 201204 20:03:39
             [Note] Server socket created on IP: '::'.
             mysqld: /Server/bb-10.5-MDEV-24350/storage/innobase/row/row0purge.cc:132: bool row_purge_remove_clust_if_poss_low(purge_node_t*, ulint): Assertion `rec_get_deleted_flag(rec, rec_offs_comp(offsets))' failed.
 
PROBLEM2:
There were nowhere the usual entries like
              [Note] <some path>/mysqld: ready for connections.
               Version: '10.5.9-MariaDB-debug-log'  socket: ...
but some connect had happened and was running SHOW VARIABLES.
This is IMHO quite bad for support and QA because we need entries which serve as
borders like   this set of actions is impossible before that border but are possible
after that border.
PROBLEM3:
Why causes CHECK TABLE `test`.`t2` EXTENDED the assert?
Maybe its caused by a bug behind PROBLEM1.
 
I am not able to exclude 100% that the RQG machinery maybe handles
something around mariabackup wrong. But this is quite unlikely
because some big fraction of such RQG tests invoking mariabackup just pass.



 Comments   
Comment by Matthias Leich [ 2020-12-08 ]

pluto:/RQG2/storage/1607101477/TBR-282
 
cd  /RQG2/storage/1607101477/TBR-282/dev/shm/vardir/1607101477/117/1/rr
_RR_TRACE_DIR="." rr replay --mark-stdio mariabackup-12                  # Call of mariabackup --backup
_RR_TRACE_DIR="." rr replay --mark-stdio mariabackup-13                  # First call of mariabackup --prepare
_RR_TRACE_DIR="." rr replay --mark-stdio mariabackup-14                  # Second call of mariabackup --prepare
 
cd /RQG2/storage/1607101477/TBR-282/dev/shm/vardir/1607101477/117/1_clone/rr
_RR_TRACE_DIR="." rr replay --mark-stdio      # Start of the DB server on backupped data
 
/RQG2/storage/1607101477/TBR-282/dev/shm/vardir/1607101477/117/reporter_tmp.prt
           Output of the 3 mariabackup calls.
 

Comment by Matthias Leich [ 2020-12-08 ]

PROBLEM2 might be caused by the fact that mariabackup runs under "rr --chaos".
In case its impossible to fix that problem inside of the server than some solution
inside of RQG  is thinkable.
Something like wait for the error log entry "[Note] <some path>/mysqld: ready for connections."
before trying to connect first time.

Comment by Matthias Leich [ 2022-02-01 ]

PROBLEM3 was fixed by https://jira.mariadb.org/browse/MDEV-27059.
PROBLEM2 seems to be caused by the fact that certain actions typical for server start or especially recovery are performed asynchronous via several threads.
And every of these threads write its messages when it has reason to do so.
PROBLEM1 seems to be caused by the fact that some actions are postponed to the next server start.

Generated at Thu Feb 08 09:29:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.