[MDEV-16367] mariabackup: error: failed to copy enough redo log Created: 2018-06-01 Updated: 2020-08-25 Resolved: 2018-06-07 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | mariabackup |
| Affects Version/s: | 10.2.15 |
| Fix Version/s: | 10.2.16, 10.3.8 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Pavel Znamensky | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None | ||
| Environment: |
CentOS Linux release 7.5.1804 |
||
| Issue Links: |
|
||||||||||||||||||||
| Description |
|
We've switched from xtrabackup to mariabackup in order to get ability enable compression for some tables. But mariabackup failing with:
We're running backup with this command:
The same problem from someone at stackoverflow: |
| Comments |
| Comment by Marko Mäkelä [ 2018-06-04 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Looking at the output:
We copied 63,057,684 bytes of redo log. That does not sound much. And the required amount 754,030,093 does not feel unreasonably large. kompastver, do you have a rough idea how much the InnoDB LSN could have grown during the mariabackup execution? Can you execute
before and after the backup? All these numbers could be interesting:
The following code looks a bit suspicious. I think that we should re-invoke log_group_header_read() if the first checkpoint page of the two blocks contained the latest checkpoint:
But I would find it somewhat hard to believe that there were several hundred megabytes of log records between the two checkpoints. So, we probably need to fix something more. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pavel Znamensky [ 2018-06-05 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
It's strange. But the problem occurs on our backup slave though on a production one it works fine.
mariabackup output:
Immediately after the backup:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-06-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
kompastver, thank you for the additional details. As expected, the backup starts copying log from the latest checkpoint LSN as it was at that time. For some reason, copying the log terminates prematurely. The backup should have copied about 633,078,858 bytes of redo log, but it stopped after 11,964,065 bytes. The missing log_group_header_read() call that I mentioned earlier could explain the slight discrepancy of the end checkpoint LSN (943,828 bytes of redo log between the 9568584074795 and 9568585018623) in these messages:
I wonder if this problem goes away if you set innodb_log_files_in_group=1. The default and originally the minimum value of this parameter is 2. You may want to multiply innodb_log_file_size by the previous value of innodb_log_files_in_group, so that the redo log size remains unchanged. How big is your innodb_log_file_size, and how big are the non-InnoDB tables? What was the timestamp of copying the last .ibd file before this one?
I identified two potential reasons for the early termination. Can you please apply the following patch and check if either message will occur in the output?
Termination due to the first one indicates that the log copying is having trouble to keep up with the writes by the server. Perhaps the circular redo log should be bigger, so that the server will not overwrite the portion of the log that mariabackup has not yet read. Given that you say that xtrabackup does not suffer from this, it is possible that xtrabackup is copying the log faster, possibly because it does not parse the individual log records in that phase, like mariabackup does since version 10.2.14 ( | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pavel Znamensky [ 2018-06-06 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Bad news.
= 1 =
Before the backup:
mariabackup output:
After the backup:
= 2 =
Before the backup:
mariabackup output:
After the backup:
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-06-07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
I got a test case that repeats this:
When the test spits out the exec line, invoke the mysql command manually from another shell. (I did not know how to start a concurrent background process in mysql-test-run.) The dump that I used was 118,938,281 bytes of SQL. While debugging this, I fixed multiple small bugs. The 'final nail in the coffin' was that the log_copying_thread() would only keep invoking xtrabackup_copy_logfile() if it returned a (fatal) error every time. So, it would effectively only copy an integer multiple of RECV_SCAN_SIZE bytes (4·innodb_page_size) of log, until the end of the log is momentarily reached. Instead, it should be copying the log until it runs out. It turns out that we did not test Mariabackup with large enough datasets. With the bugs fixed, but using a too small innodb_log_file_size, the above test would still fail like this:
Over 240 megabytes of log was read; we stopped some 30 megabytes short. With a larger innodb_log_file_size the backup would finish successfully. I will test my fix a little more before pushing it to the 10.2 branch. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Pavel Znamensky [ 2018-06-07 ] | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
marko, thank you! |