Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.1.9, 10.1(EOL)
-
None
-
Linux
-
10.1.11, 10.1.12, 10.1.13
Description
When using encrypted XtraDB tables there is a "good" chance to run into checksum errors in recovery when killing mysqld hard after large inserts on a table (e.g. restoring a mysqldump or doing a large INSERT ... SELECT).
How to reproduce:
- enable encryption by default
- insert a lot of data
- "killall -9 mysqld" right afterwards while changed tablespace data is still written back to disk
- in about 1 of 3 cases I got a page checksum error when restarting the server
e.g.:
2016-01-14 16:04:47 139797272356736 [Note] InnoDB: Starting crash recovery.
|
2016-01-14 16:04:47 139797272356736 [Note] InnoDB: Reading tablespace information from the .ibd files...
|
2016-01-14 16:04:47 139797272356736 [Note] InnoDB: Restoring possible half-written data pages
|
2016-01-14 16:04:47 139797272356736 [Note] InnoDB: from the doublewrite buffer...
|
InnoDB: Doing recovery: scanned up to log sequence number 34316812800
|
InnoDB: Doing recovery: scanned up to log sequence number 34322055680
|
InnoDB: Doing recovery: scanned up to log sequence number 34327298560
|
InnoDB: Doing recovery: scanned up to log sequence number 34332541440
|
InnoDB: Doing recovery: scanned up to log sequence number 34337784320
|
InnoDB: Doing recovery: scanned up to log sequence number 34343027200
|
InnoDB: Doing recovery: scanned up to log sequence number 34348270080
|
InnoDB: Doing recovery: scanned up to log sequence number 34353512960
|
BROKEN: block: 67102352 checkpoint: 1119 4d783007 94202545
|
BROKEN: block: 67102352 checkpoint: 1119 4d783007 94202545
|
InnoDB: Log block no 67102352 at lsn 34356403712 has
|
InnoDB: ok header, but checksum field contains 2485134661, should be 1299722247
|
2016-01-14 16:04:48 139797272356736 [ERROR] InnoDB: Redo log crypto: failed to decrypt log block. Reason could be that requested key version is not found, required encryption key management plugin is not found or configured encryption algorithm and/or method does not match.
|
2016-01-14 16:04:48 139797272356736 [ERROR] Plugin 'InnoDB' init function returned error.
|
2016-01-14 16:04:48 139797272356736 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
|
2016-01-14 16:04:48 139797272356736 [Note] Plugin 'FEEDBACK' is disabled.
|
2016-01-14 16:04:48 139797272356736 [ERROR] Unknown/unsupported storage engine: InnoDB
|
2016-01-14 16:04:48 139797272356736 [ERROR] Aborting
|
Full dump of crashed instance base- and datadir is in http://php-groupies.de/encryption-crash.tar.gz (~8GB)
Attachments
Issue Links
- causes
-
MDEV-12041 Implement key rotation for innodb_encrypt_log
-
- Closed
-
- is part of
-
MDEV-9774 Checksum errors on restart when killing busy instance that uses encrypted XtraDB tables with <10.1.13
-
- Closed
-
- relates to
-
MDEV-13044 Do not create a redo log checkpoint at startup
-
- Closed
-
Activity
It seems to happen only if InnoDB log encryption is enabled. I tried multiple times, but could not get the error without log encryption.
Table encryption does not matter.
Reproducible easily with the test below on a non-debug build – from sources or the binary tarball, all the same. I could not reproduce it on a debug build, maybe it's just too slow, or maybe there is some other difference in InnoDB behavior.
To reproduce manually
Start server with all defaults plus
--innodb_encrypt_log --plugin-load-add=file_key_management.so --loose-file-key-management-filename=$BASEDIR/mysql-test/std_data/keys.txt
Feed the following sql file through MySQL client (be careful – at the last step it kills all mysqld instances, if you run other ones, you might want to change it):
DROP TABLE IF EXISTS t1, t2;
|
CREATE TABLE t1 (
|
pk bigint auto_increment,
|
col_int int,
|
col_int_key int,
|
col_char char(12),
|
col_char_key char(12),
|
primary key (pk),
|
key (`col_int_key` ),
|
key (`col_char_key` )
|
) ENGINE=InnoDB;
|
CREATE TABLE t2 LIKE t1;
|
|
INSERT INTO t1 VALUES (NULL,1,1,'foo','foo'),(NULL,2,2,'bar','bar'),(NULL,3,3,'baz','baz'),(NULL,4,4,'qux','qux');
|
INSERT INTO t2
|
SELECT NULL, a1.col_int, a1.col_int_key, a1.col_char, a1.col_char_key
|
FROM t1 a1, t1 a2, t1 a3, t1 a4, t1 a5, t1 a6, t1 a7, t1 a8, t1 a9, t1 a10;
|
system killall -s 9 mysqld;
|
Try to restart the server with the same options.
BROKEN: block: 121242 checkpoint: 9 0a024082 96376a3e
|
BROKEN: block: 121242 checkpoint: 9 0a024082 96376a3e
|
InnoDB: Log block no 121242 at lsn 62075392 has
|
InnoDB: ok header, but checksum field contains 2520214078, should be 167919746
|
InnoDB: Set innodb_force_recovery to ignore this error.
|
2016-01-16 18:36:02 139887663318880 [ERROR] Plugin 'InnoDB' init function returned error.
|
2016-01-16 18:36:02 139887663318880 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
|
2016-01-16 18:36:02 139887663318880 [Note] Plugin 'FEEDBACK' is disabled.
|
2016-01-16 18:36:02 139887663318880 [ERROR] Unknown/unsupported storage engine: InnoDB
|
2016-01-16 18:36:02 139887663318880 [ERROR] Aborting
|
To reproduce with MTR
Important note: you need to run the MTR test with the following options:
--mysqld=--innodb --mysqld=--innodb_encrypt_log --mysqld=--plugin-load-add=file_key_management.so --mysqld=--file_key_management_filename=`pwd`/std_data/keys.txt --mysqld=--innodb-buffer-pool-size=128M --mysqld=--innodb-log-file-size=50331648 --mysqld=--performance-schema=0 --mysqld=--default-storage-engine=innodb
|
If you don't do that, MTR enables performance schema in its config, and with performance schema it does not seem to be reproducible (again, maybe it's just slow, or timing is otherwise different).
MTR also sets very low values for innodb_buffer_pool_size and innodb_log_file_size, they don't allow the problem to happen.
Finally, setting InnoDB as default storage engine is just to have the test fail more obviously (so that the server cannot start after the error), otherwise it will just complain about failing to start InnoDB, but will survive.
All these settings are defaults in the server started directly, wtihout MTR, that's why you don't need them in the "manual" variant.
MTR test |
DROP TABLE IF EXISTS t1, t2;
|
CREATE TABLE t1 (
|
pk bigint auto_increment,
|
col_int int,
|
col_int_key int,
|
col_char char(12),
|
col_char_key char(12),
|
primary key (pk),
|
key (`col_int_key` ),
|
key (`col_char_key` )
|
) ENGINE=InnoDB;
|
CREATE TABLE t2 LIKE t1;
|
|
INSERT INTO t1 VALUES (NULL,1,1,'foo','foo'),(NULL,2,2,'bar','bar'),(NULL,3,3,'baz','baz'),(NULL,4,4,'qux','qux');
|
INSERT INTO t2
|
SELECT NULL, a1.col_int, a1.col_int_key, a1.col_char, a1.col_char_key
|
FROM t1 a1, t1 a2, t1 a3, t1 a4, t1 a5, t1 a6, t1 a7, t1 a8, t1 a9, t1 a10;
|
|
--exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
|
--shutdown_server 0
|
--source include/wait_until_disconnected.inc
|
|
--exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
|
--enable_reconnect
|
--source include/wait_until_connected_again.inc
|
|
DROP TABLE t1, t2;
|
Result:
bug.t3 [ fail ]
|
Test ended at 2016-01-16 21:37:35
|
|
CURRENT_TEST: bug.t3
|
|
|
Server [mysqld.1 - pid: 17331, winpid: 17331, exit: 256] failed during test run
|
Server log from this test:
|
----------SERVER LOG START-----------
|
2016-01-16 21:37:15 139968639919968 [Note] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld (mysqld 10.1.10-MariaDB-log) starting as process 17300 ...
|
2016-01-16 21:37:15 139968639919968 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162)
|
2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Using mutexes to ref count buffer pool pages
|
2016-01-16 21:37:15 139968639919968 [Note] InnoDB: The InnoDB memory heap is disabled
|
2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Memory barrier is not used
|
2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Compressed tables use zlib 1.2.3
|
2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Using Linux native AIO
|
2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Using generic crc32 instructions
|
2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Initializing buffer pool, size = 128.0M
|
2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Completed initialization of buffer pool
|
2016-01-16 21:37:15 139968639919968 [Note] InnoDB: The first specified data file ./ibdata1 did not exist: a new database to be created!
|
2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Setting file ./ibdata1 size to 12 MB
|
2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Database physically writes the file full: wait...
|
2016-01-16 21:37:16 139968639919968 [Note] InnoDB: Setting log file ./ib_logfile101 size to 48 MB
|
2016-01-16 21:37:17 139968639919968 [Note] InnoDB: Setting log file ./ib_logfile1 size to 48 MB
|
2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
|
2016-01-16 21:37:18 139968639919968 [Warning] InnoDB: New log files created, LSN=45883
|
2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Doublewrite buffer not found: creating new
|
2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Doublewrite buffer created
|
2016-01-16 21:37:18 139968639919968 [Note] InnoDB: 128 rollback segment(s) are active.
|
2016-01-16 21:37:18 139968639919968 [Warning] InnoDB: Creating foreign key constraint system tables.
|
2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Foreign key constraint system tables created
|
2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Creating tablespace and datafile system tables.
|
2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Tablespace and datafile system tables created.
|
2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Waiting for purge to start
|
2016-01-16 21:37:18 139968639919968 [Note] InnoDB: 5.6.27 started; log sequence number 0
|
2016-01-16 21:37:18 139968273970944 [Note] InnoDB: Dumping buffer pool(s) not yet started
|
2016-01-16 21:37:18 139968639919968 [Note] Plugin 'FEEDBACK' is disabled.
|
2016-01-16 21:37:18 139968639919968 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-startup-interval=20'
|
2016-01-16 21:37:18 139968639919968 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-first-interval=60'
|
2016-01-16 21:37:18 139968639919968 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-interval=60'
|
2016-01-16 21:37:18 139968639919968 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-debug-sync-timeout=300'
|
2016-01-16 21:37:18 139968639919968 [Note] Server socket created on IP: '127.0.0.1'.
|
2016-01-16 21:37:18 139968639919968 [Note] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: ready for connections.
|
Version: '10.1.10-MariaDB-log' socket: '/data/releases/mariadb-10.1.10-linux-x86_64/mysql-test/var/tmp/mysqld.1.sock' port: 16000 MariaDB Server
|
2016-01-16 21:37:35 140548601800544 [Note] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld (mysqld 10.1.10-MariaDB-log) starting as process 17332 ...
|
2016-01-16 21:37:35 140548601800544 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162)
|
2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Using mutexes to ref count buffer pool pages
|
2016-01-16 21:37:35 140548601800544 [Note] InnoDB: The InnoDB memory heap is disabled
|
2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Memory barrier is not used
|
2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Compressed tables use zlib 1.2.3
|
2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Using Linux native AIO
|
2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Using generic crc32 instructions
|
2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Initializing buffer pool, size = 128.0M
|
2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Completed initialization of buffer pool
|
2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Highest supported file format is Barracuda.
|
BROKEN: block: 118788 checkpoint: 3 78ed177a cbbf551b
|
BROKEN: block: 118788 checkpoint: 3 78ed177a cbbf551b
|
InnoDB: Log block no 118788 at lsn 60818944 has
|
InnoDB: ok header, but checksum field contains 3418314011, should be 2028803962
|
InnoDB: Set innodb_force_recovery to ignore this error.
|
2016-01-16 21:37:35 140548601800544 [ERROR] Plugin 'InnoDB' init function returned error.
|
2016-01-16 21:37:35 140548601800544 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
|
2016-01-16 21:37:35 140548601800544 [Note] Plugin 'FEEDBACK' is disabled.
|
2016-01-16 21:37:35 140548601800544 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-startup-interval=20'
|
2016-01-16 21:37:35 140548601800544 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-first-interval=60'
|
2016-01-16 21:37:35 140548601800544 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-interval=60'
|
2016-01-16 21:37:35 140548601800544 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-debug-sync-timeout=300'
|
2016-01-16 21:37:35 140548601800544 [ERROR] Unknown/unsupported storage engine: innodb
|
2016-01-16 21:37:35 140548601800544 [ERROR] Aborting
|
|
----------SERVER LOG END-------------
|
mysqltest failed but provided no output
|
Also reproducible with InnoDB plugin.
Yes, I tried with encryption disabled, too, but could not reproduce it there. (The customer who originally reported this could also only reproduce this with encryption enabled).
I also only got it on 10.1.9 so far (about once in every three to five attempts), with 10.1.10 I haven't seen it yet in about 50 attempts, but once I got this instead:
InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
|
2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: Block in space_id 59 in file test/t2 encrypted.
|
2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: However key management plugin or used key_id 1 is not found or used encryption algorithm or method does not match.
|
2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: Marking tablespace as missing. You may drop this table or install correct key management plugin and key file.
|
2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: Block in space_id 59 in file test/t2 encrypted.
|
2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: However key management plugin or used key_id 1 is not found or used encryption algorithm or method does not match.
|
2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: Marking tablespace as missing. You may drop this table or install correct key management plugin and key file.
|
160116 17:46:11 [ERROR] mysqld got signal 11 ;
|
test/t2 was the table I imported data into.
The configuration for my tests was:
[mysqld]
|
user=mysql
|
plugin-load-add=file_key_management.so
|
file-key-management
|
file-key-management-filename = /vagrant/keyfile.txt
|
innodb-encrypt-tables
|
innodb-encrypt-log
|
innodb-encryption-threads=4
|
and the keyfile looked like this:
1;770A8A65DA156D24EE2A093277530142
|
18;F5502320F8429037B8DAEF761B189D12F5502320F8429037B8DAEF761B189D12
|
Firstly, I strongly advice that you should use kill -9 mysqld only as a last resort for production database. Now the assertion reported is really a bug. There is following problem: checkpoint can contain up to 5 log crypt data e.g.
write chk: 4 [ chk key ]: [ 5 1 ] [ 4 1 ] [ 3 1 ] [ 2 1 ] [ 1 1 ]
|
so over time we could have a checkpoint
write chk: 13 [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ]
|
killall -9 mysqld causes crash recovery and on crash recovery we read only few latest checkpoints in this case
read [ chk key ]: [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ]
|
read [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ]
|
This is problematic, as we could still scan log blocks e.g. from checkpoint 4
CRYPT INFO: for checkpoint 14 search 4
|
CRYPT INFO: for checkpoint 13 search 4
|
CRYPT INFO: for checkpoint 12 search 4
|
CRYPT INFO: for checkpoint 11 search 4
|
CRYPT INFO: for checkpoint 10 search 4
|
CRYPT INFO: for checkpoint 9 search 4 (NOTE: NOT FOUND)
|
BROKEN: block: 118788 checkpoint: 4 3e08c592 ad23ce73
|
BROKEN: block: 118788 checkpoint: 4 3e08c592 ad23ce73
|
InnoDB: Log block no 118788 at lsn 60818944 has
|
InnoDB: ok header, but checksum field contains 2904804979, should be 1040762258
|
Calculated checkpoint is incorrect as log block is still encrypted as we did not find log crypt data from read checkpoints. Problem is on function recv_scan_log_recs() (log0recv.cc) because we do log_block_checksum_is_ok_or_old_format(log_block) even to log records that are not really parsed (actual position is found later).
I will fix this by not checking checksums for log blocks that are not really parsed but this is also problematic as there is no way to know is log block encrypted or corrupted for sure.
"Firstly, I strongly advice that you should use kill -9 mysqld only as a last resort for production database"
I think I forgot to mention that the original reporter also mentioned:
"It sometimes can happen after a regular shutdown, but not always."
I assume that this could be related to innodb_fast_shutdown=1?
kill -9 was only used as a way to reproduce this in a rather reliable way, it would obviously not be used in a production environment. Still power failuers or e.g. out-of-memory kills could have the same effect ...
Change that will avoid hitting the issue:
commit f448a800e173980c18023934f662fe7b8f7e2b9f
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date: Thu Mar 17 16:24:49 2016 +0200
MDEV-9422: Checksum errors on restart when killing busy instance that uses encrypted XtraDB tables
Analysis:
– InnoDB has n (>0) redo-log files.
– In the first page of redo-log there is 2 checkpoint records on fixed location (checkpoint is not encrypted)
– On every checkpoint record there is up to 5 crypt_keys containing the keys used for encryption/decryption
– On crash recovery we read all checkpoints on every file
– Recovery starts by reading from the latest checkpoint forward
– Problem is that latest checkpoint might not always contain the key we need to decrypt all the
redo-log blocks (see MDEV-9422 for one example)
– Furthermore, there is no way to identify is the log block corrupted or encrypted
For example checkpoint can contain following keys :
write chk: 4 [ chk key ]: [ 5 1 ] [ 4 1 ] [ 3 1 ] [ 2 1 ] [ 1 1 ]
so over time we could have a checkpoint
write chk: 13 [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ]
killall -9 mysqld causes crash recovery and on crash recovery we read as
many checkpoints as there is log files, e.g.
read [ chk key ]: [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ]
read [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ]
This is problematic, as we could still scan log blocks e.g. from checkpoint 4 and we do
not know anymore the correct key.
CRYPT INFO: for checkpoint 14 search 4
CRYPT INFO: for checkpoint 13 search 4
CRYPT INFO: for checkpoint 12 search 4
CRYPT INFO: for checkpoint 11 search 4
CRYPT INFO: for checkpoint 10 search 4
CRYPT INFO: for checkpoint 9 search 4 (NOTE: NOT FOUND)
For every checkpoint, code generated a new encrypted key based on key
from encryption plugin and random numbers. Only random numbers are
stored on checkpoint.
Fix: Generate only one key for every log file. If checkpoint contains only
one key, use that key to encrypt/decrypt all log blocks. If checkpoint
contains more than one key (this is case for databases created
using MariaDB server version 10.1.0 - 10.1.12 if log encryption was
used). If looked checkpoint_no is found from keys on checkpoint we use
that key to decrypt the log block. For encryption we use always the
first key. If the looked checkpoint_no is not found from keys on checkpoint
we use the first key.
Modified code also so that if log is not encrypted, we do not generate
any empty keys. If we have a log block and no keys is found from
checkpoint we assume that log block is unencrypted. Log corruption or
missing keys is found by comparing log block checksums. If we have
a keys but current log block checksum is correct we again assume
log block to be unencrypted. This is because current implementation
stores checksum only before encryption and new checksum after
encryption but before disk write is not stored anywhere.
As far as I can tell, there was a design problem with the innodb_encrypt_log feature. The log-based recovery in InnoDB or XtraDB works by finding the latest completed redo log checkpoint and then processing the redo log from that point onwards. The decryption key(s) for processing all the redo log up to the last completely written redo log page must be present either in the checkpoint page or in the log pages that are being read.
In the redo log encryption of MariaDB 10.1, only the log checkpoint page is storing decryption key metadata, so all the log records from that point onwards must be encrypted with the available keys.
There was a problem with encryption key rotation, and the solution that was implemented in this work was to disable the key rotation altogether (write the metadata for only one key in the log checkpoint page). I have filed MDEV-12041 for re-enabling the key rotation in a way that works (write a special redo log record and pad the log block on key rotation).
As far as I can tell, the problem with redo log key rotation was that we should have patched the previously written log checkpoint page to contain all the encryption keys. I am not sure if this was actually done, or if we instead would have scheduled the keys to be written to the next checkpoint page (which the crash recovery would never get to see). In any case, because there are only 5 slots in the checkpoint page, it was possible that if key rotation is done more frequently than log checkpoints, we could run out of slots in the checkpoint page. I also feel that this form of redo log key rotation would have required some special logic in XtraBackup to poll the checkpoint pages for new keys.
Did you try what happens with similar setup but using encryption disabled ?