[MDEV-14333] Mariabackup --apply-log-only crashes if incomplete transactions with update_undo logs are present Created: 2017-11-09  Updated: 2017-11-09  Resolved: 2017-11-09

Status: Closed
Project: MariaDB Server
Component/s: Backup, Storage Engine - InnoDB
Affects Version/s: 10.1.28
Fix Version/s: 10.1.29

Type: Bug Priority: Major
Reporter: Gokhan Demir Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: innodb, mariabackup
Environment:

CentOS 6, 7


Issue Links:
Problem/Incident
is caused by MDEV-13807 mariabackup --apply-log-only does gen... Closed
Relates
relates to MDEV-11985 Make innodb_read_only shutdown more r... Closed
relates to MDEV-13890 mariabackup.xb_compressed_encrypted f... Closed

 Description   

When I am trying to apply-log-only my Incremental backup, I started to see this error frequently, like 1 or 2 times a week. Could not reach information on the Internet. So I am reporting it as a bug.

This error started to appear after I upgraded MariaDB server and backup from 10.1.26 to 10.1.28. Previously, I did not encounter this error.

CurrentIy I cannot provide a full and an incremental backup because database is nearly 400 GB. However, next time the error happens, I can provide desktop sharing if desired.

Here is the stack trace from the logs:

2017-11-08 1:13:10 139887762020384 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
2017-11-08 1:13:10 139887762020384 [Note] InnoDB: Read redo log up to LSN=63569216412160
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 9 row operations to undo
InnoDB: Trx id counter is 45387091968
2017-11-08 1:13:21 139887762020384 [Note] InnoDB: Starting final batch to recover 5944 pages from redo log
2017-11-08 1:13:25 139878396319488 [Note] InnoDB: To recover: 5269 pages from log
2017-11-08 1:13:40 139878406809344 [Note] InnoDB: To recover: 3861 pages from log
2017-11-08 1:13:55 139887657346816 [Note] InnoDB: To recover: 738 pages from log
InnoDB: Last MySQL binlog file position 0 23440540, file name ./saturn.000033
InnoDB: Last MySQL binlog file position 0 23440540, file name ./saturn.000033

xtrabackup: Recovered WSREP position: 00000000-0000-0000-0000-000000000000:-1
xtrabackup: starting shutdown with innodb_fast_shutdown = 1
2017-11-08 1:14:00 139887762020384 [Note] InnoDB: Starting shutdown...
2017-11-08 01:14:02 7f3a285d1820 InnoDB: Assertion failure in thread 139887762020384 in file trx0undo.cc line 2022
InnoDB: Failing assertion: srv_read_only_mode || srv_force_recovery >= SRV_FORCE_NO_TRX_UNDO
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
171108 1:14:02 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.1.28-MariaDB
key_buffer_size=0
read_buffer_size=131072
max_used_connections=0
max_threads=1
thread_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5311 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x48400
mariabackup(my_print_stacktrace+0x2b)[0x7f3a27ec2f2b]
mariabackup(handle_fatal_signal+0x4d5)[0x7f3a27a847a5]
/lib64/libpthread.so.0(+0xf7e0)[0x7f3a26fea7e0]
/lib64/libc.so.6(gsignal+0x35)[0x7f3a25410495]
/lib64/libc.so.6(abort+0x175)[0x7f3a25411c75]
mariabackup(+0x9d3b58)[0x7f3a27deeb58]
mariabackup(+0x9ca72d)[0x7f3a27de572d]
mariabackup(+0x9c6d85)[0x7f3a27de1d85]
mariabackup(+0x9ad1e5)[0x7f3a27dc81e5]
mariabackup(+0x3d43b5)[0x7f3a277ef3b5]
mariabackup(main+0x122d)[0x7f3a277d71cd]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f3a253fcd1d]
mariabackup(+0x3d3bc9)[0x7f3a277eebc9]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.



 Comments   
Comment by Marko Mäkelä [ 2017-11-09 ]

The assertion, which was added in MDEV-11985 in MariaDB 10.1.22, is too strict when it comes to Mariabackup.

In mariabackup --apply-log-only, nothing will be done to any recovered transactions; only the redo log will be applied to the data pages.
So, if any transactions were recovered in the active state, they will remain so until shutdown.

I am not sure, but it is possible that the backup would be usable even after this assertion failed. Because Mariabackup 10.1 is invoking XtraDB startup and shutdown several times, it is possible that a critical startup/shutdown step was omitted.

This bug does not affect Mariabackup 10.2, because there the assertion is also checking for srv_was_started, which would not be set in Mariabackup outside the --export function. In Mariabackup 10.1 it can be set, so we will have to relax the assertion in some other way.

Comment by Marko Mäkelä [ 2017-11-09 ]

The InnoDB transaction logs are divided into update_undo and insert_undo.

For insert_undo, I relaxed this assertion in MDEV-13807 which made --apply-log-only work in the first place (before that, it broke its contract by generating redo log on its own). But I somehow failed to realize that we can recover transactions that have both insert_undo and update_undo logs.

Comment by Marko Mäkelä [ 2017-11-09 ]

gokhan, it looks like this error indeed ruins the backup, because the xtrabackup_checkpoints file would not be updated to say "log-applied". The fix is this simple:

diff --git a/storage/xtradb/trx/trx0undo.cc b/storage/xtradb/trx/trx0undo.cc
index 0cc3048e624..3259bcb70b1 100644
--- a/storage/xtradb/trx/trx0undo.cc
+++ b/storage/xtradb/trx/trx0undo.cc
@@ -2025,6 +2025,7 @@ trx_undo_free_prepared(
 			/* lock_trx_release_locks() assigns
 			trx->is_recovered=false */
 			ut_a(srv_read_only_mode
+			     || srv_apply_log_only
 			     || srv_force_recovery >= SRV_FORCE_NO_TRX_UNDO);
 			break;
 		default:

The error indeed was only present in Mariabackup 10.1. I merged the test case to 10.2, and it worked just fine, without any code changes.

In Mariabackup 10.2, there is no --apply-log-only option because it is the default. Only during --export, Mariabackup 10.2 can generate redo log records on its own.

Generated at Thu Feb 08 08:12:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.