Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-9422

Encrypted redo log checksum errors on restart after killing busy XtraDB instance

Details

    • 10.1.11, 10.1.12, 10.1.13

    Description

      When using encrypted XtraDB tables there is a "good" chance to run into checksum errors in recovery when killing mysqld hard after large inserts on a table (e.g. restoring a mysqldump or doing a large INSERT ... SELECT).

      How to reproduce:

      • enable encryption by default
      • insert a lot of data
      • "killall -9 mysqld" right afterwards while changed tablespace data is still written back to disk
      • in about 1 of 3 cases I got a page checksum error when restarting the server

      e.g.:

      2016-01-14 16:04:47 139797272356736 [Note] InnoDB: Starting crash recovery.
      2016-01-14 16:04:47 139797272356736 [Note] InnoDB: Reading tablespace information from the .ibd files...
      2016-01-14 16:04:47 139797272356736 [Note] InnoDB: Restoring possible half-written data pages
      2016-01-14 16:04:47 139797272356736 [Note] InnoDB: from the doublewrite buffer...
      InnoDB: Doing recovery: scanned up to log sequence number 34316812800
      InnoDB: Doing recovery: scanned up to log sequence number 34322055680
      InnoDB: Doing recovery: scanned up to log sequence number 34327298560
      InnoDB: Doing recovery: scanned up to log sequence number 34332541440
      InnoDB: Doing recovery: scanned up to log sequence number 34337784320
      InnoDB: Doing recovery: scanned up to log sequence number 34343027200
      InnoDB: Doing recovery: scanned up to log sequence number 34348270080
      InnoDB: Doing recovery: scanned up to log sequence number 34353512960
      BROKEN: block: 67102352 checkpoint: 1119 4d783007 94202545
      BROKEN: block: 67102352 checkpoint: 1119 4d783007 94202545
      InnoDB: Log block no 67102352 at lsn 34356403712 has
      InnoDB: ok header, but checksum field contains 2485134661, should be 1299722247
      2016-01-14 16:04:48 139797272356736 [ERROR] InnoDB: Redo log crypto: failed to decrypt log block. Reason could be that requested key version is not found, required encryption key management plugin is not found or configured encryption algorithm and/or method does not match.
      2016-01-14 16:04:48 139797272356736 [ERROR] Plugin 'InnoDB' init function returned error.
      2016-01-14 16:04:48 139797272356736 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
      2016-01-14 16:04:48 139797272356736 [Note] Plugin 'FEEDBACK' is disabled.
      2016-01-14 16:04:48 139797272356736 [ERROR] Unknown/unsupported storage engine: InnoDB
      2016-01-14 16:04:48 139797272356736 [ERROR] Aborting

      Full dump of crashed instance base- and datadir is in http://php-groupies.de/encryption-crash.tar.gz (~8GB)

      Attachments

        Issue Links

          Activity

            Did you try what happens with similar setup but using encryption disabled ?

            jplindst Jan Lindström (Inactive) added a comment - Did you try what happens with similar setup but using encryption disabled ?
            elenst Elena Stepanova added a comment - - edited

            It seems to happen only if InnoDB log encryption is enabled. I tried multiple times, but could not get the error without log encryption.
            Table encryption does not matter.

            Reproducible easily with the test below on a non-debug build – from sources or the binary tarball, all the same. I could not reproduce it on a debug build, maybe it's just too slow, or maybe there is some other difference in InnoDB behavior.

            To reproduce manually

            Start server with all defaults plus
            --innodb_encrypt_log --plugin-load-add=file_key_management.so --loose-file-key-management-filename=$BASEDIR/mysql-test/std_data/keys.txt

            Feed the following sql file through MySQL client (be careful – at the last step it kills all mysqld instances, if you run other ones, you might want to change it):

            DROP TABLE IF EXISTS t1, t2;
            CREATE TABLE t1 (
              pk bigint auto_increment,
              col_int int,
              col_int_key int,
              col_char char(12),
              col_char_key char(12),
              primary key (pk),
              key (`col_int_key` ),
              key (`col_char_key` )
            ) ENGINE=InnoDB;
            CREATE TABLE t2 LIKE t1;
             
            INSERT INTO t1 VALUES (NULL,1,1,'foo','foo'),(NULL,2,2,'bar','bar'),(NULL,3,3,'baz','baz'),(NULL,4,4,'qux','qux');
            INSERT INTO t2 
              SELECT NULL, a1.col_int, a1.col_int_key, a1.col_char, a1.col_char_key 
                FROM t1 a1, t1 a2, t1 a3, t1 a4, t1 a5, t1 a6, t1 a7, t1 a8, t1 a9, t1 a10;
            system killall -s 9 mysqld;

            Try to restart the server with the same options.

            BROKEN: block: 121242 checkpoint: 9 0a024082 96376a3e
            BROKEN: block: 121242 checkpoint: 9 0a024082 96376a3e
            InnoDB: Log block no 121242 at lsn 62075392 has
            InnoDB: ok header, but checksum field contains 2520214078, should be 167919746
            InnoDB: Set innodb_force_recovery to ignore this error.
            2016-01-16 18:36:02 139887663318880 [ERROR] Plugin 'InnoDB' init function returned error.
            2016-01-16 18:36:02 139887663318880 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
            2016-01-16 18:36:02 139887663318880 [Note] Plugin 'FEEDBACK' is disabled.
            2016-01-16 18:36:02 139887663318880 [ERROR] Unknown/unsupported storage engine: InnoDB
            2016-01-16 18:36:02 139887663318880 [ERROR] Aborting

            To reproduce with MTR

            Important note: you need to run the MTR test with the following options:

            --mysqld=--innodb --mysqld=--innodb_encrypt_log --mysqld=--plugin-load-add=file_key_management.so --mysqld=--file_key_management_filename=`pwd`/std_data/keys.txt --mysqld=--innodb-buffer-pool-size=128M  --mysqld=--innodb-log-file-size=50331648  --mysqld=--performance-schema=0 --mysqld=--default-storage-engine=innodb

            If you don't do that, MTR enables performance schema in its config, and with performance schema it does not seem to be reproducible (again, maybe it's just slow, or timing is otherwise different).
            MTR also sets very low values for innodb_buffer_pool_size and innodb_log_file_size, they don't allow the problem to happen.
            Finally, setting InnoDB as default storage engine is just to have the test fail more obviously (so that the server cannot start after the error), otherwise it will just complain about failing to start InnoDB, but will survive.
            All these settings are defaults in the server started directly, wtihout MTR, that's why you don't need them in the "manual" variant.

            MTR test

            DROP TABLE IF EXISTS t1, t2;
            CREATE TABLE t1 (
              pk bigint auto_increment,
              col_int int,
              col_int_key int,
              col_char char(12),
              col_char_key char(12),
              primary key (pk),
              key (`col_int_key` ),
              key (`col_char_key` )
            ) ENGINE=InnoDB;
            CREATE TABLE t2 LIKE t1;
             
            INSERT INTO t1 VALUES (NULL,1,1,'foo','foo'),(NULL,2,2,'bar','bar'),(NULL,3,3,'baz','baz'),(NULL,4,4,'qux','qux');
            INSERT INTO t2 
              SELECT NULL, a1.col_int, a1.col_int_key, a1.col_char, a1.col_char_key 
                FROM t1 a1, t1 a2, t1 a3, t1 a4, t1 a5, t1 a6, t1 a7, t1 a8, t1 a9, t1 a10;
             
            --exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
            --shutdown_server 0
            --source include/wait_until_disconnected.inc
             
            --exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect
            --enable_reconnect
            --source include/wait_until_connected_again.inc
             
            DROP TABLE t1, t2;

            Result:

            bug.t3                                   [ fail ]
                    Test ended at 2016-01-16 21:37:35
             
            CURRENT_TEST: bug.t3
             
             
            Server [mysqld.1 - pid: 17331, winpid: 17331, exit: 256] failed during test run
            Server log from this test:
            ----------SERVER LOG START-----------
            2016-01-16 21:37:15 139968639919968 [Note] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld (mysqld 10.1.10-MariaDB-log) starting as process 17300 ...
            2016-01-16 21:37:15 139968639919968 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162)
            2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Using mutexes to ref count buffer pool pages
            2016-01-16 21:37:15 139968639919968 [Note] InnoDB: The InnoDB memory heap is disabled
            2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
            2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Memory barrier is not used
            2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Compressed tables use zlib 1.2.3
            2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Using Linux native AIO
            2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Using generic crc32 instructions
            2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Initializing buffer pool, size = 128.0M
            2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Completed initialization of buffer pool
            2016-01-16 21:37:15 139968639919968 [Note] InnoDB: The first specified data file ./ibdata1 did not exist: a new database to be created!
            2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Setting file ./ibdata1 size to 12 MB
            2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Database physically writes the file full: wait...
            2016-01-16 21:37:16 139968639919968 [Note] InnoDB: Setting log file ./ib_logfile101 size to 48 MB
            2016-01-16 21:37:17 139968639919968 [Note] InnoDB: Setting log file ./ib_logfile1 size to 48 MB
            2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
            2016-01-16 21:37:18 139968639919968 [Warning] InnoDB: New log files created, LSN=45883
            2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Doublewrite buffer not found: creating new
            2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Doublewrite buffer created
            2016-01-16 21:37:18 139968639919968 [Note] InnoDB: 128 rollback segment(s) are active.
            2016-01-16 21:37:18 139968639919968 [Warning] InnoDB: Creating foreign key constraint system tables.
            2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Foreign key constraint system tables created
            2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Creating tablespace and datafile system tables.
            2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Tablespace and datafile system tables created.
            2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Waiting for purge to start
            2016-01-16 21:37:18 139968639919968 [Note] InnoDB: 5.6.27 started; log sequence number 0
            2016-01-16 21:37:18 139968273970944 [Note] InnoDB: Dumping buffer pool(s) not yet started
            2016-01-16 21:37:18 139968639919968 [Note] Plugin 'FEEDBACK' is disabled.
            2016-01-16 21:37:18 139968639919968 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-startup-interval=20'
            2016-01-16 21:37:18 139968639919968 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-first-interval=60'
            2016-01-16 21:37:18 139968639919968 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-interval=60'
            2016-01-16 21:37:18 139968639919968 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-debug-sync-timeout=300'
            2016-01-16 21:37:18 139968639919968 [Note] Server socket created on IP: '127.0.0.1'.
            2016-01-16 21:37:18 139968639919968 [Note] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: ready for connections.
            Version: '10.1.10-MariaDB-log'  socket: '/data/releases/mariadb-10.1.10-linux-x86_64/mysql-test/var/tmp/mysqld.1.sock'  port: 16000  MariaDB Server
            2016-01-16 21:37:35 140548601800544 [Note] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld (mysqld 10.1.10-MariaDB-log) starting as process 17332 ...
            2016-01-16 21:37:35 140548601800544 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162)
            2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Using mutexes to ref count buffer pool pages
            2016-01-16 21:37:35 140548601800544 [Note] InnoDB: The InnoDB memory heap is disabled
            2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
            2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Memory barrier is not used
            2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Compressed tables use zlib 1.2.3
            2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Using Linux native AIO
            2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Using generic crc32 instructions
            2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Initializing buffer pool, size = 128.0M
            2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Completed initialization of buffer pool
            2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Highest supported file format is Barracuda.
            BROKEN: block: 118788 checkpoint: 3 78ed177a cbbf551b
            BROKEN: block: 118788 checkpoint: 3 78ed177a cbbf551b
            InnoDB: Log block no 118788 at lsn 60818944 has
            InnoDB: ok header, but checksum field contains 3418314011, should be 2028803962
            InnoDB: Set innodb_force_recovery to ignore this error.
            2016-01-16 21:37:35 140548601800544 [ERROR] Plugin 'InnoDB' init function returned error.
            2016-01-16 21:37:35 140548601800544 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
            2016-01-16 21:37:35 140548601800544 [Note] Plugin 'FEEDBACK' is disabled.
            2016-01-16 21:37:35 140548601800544 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-startup-interval=20'
            2016-01-16 21:37:35 140548601800544 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-first-interval=60'
            2016-01-16 21:37:35 140548601800544 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-interval=60'
            2016-01-16 21:37:35 140548601800544 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-debug-sync-timeout=300'
            2016-01-16 21:37:35 140548601800544 [ERROR] Unknown/unsupported storage engine: innodb
            2016-01-16 21:37:35 140548601800544 [ERROR] Aborting
             
            ----------SERVER LOG END-------------
            mysqltest failed but provided no output

            Also reproducible with InnoDB plugin.

            elenst Elena Stepanova added a comment - - edited It seems to happen only if InnoDB log encryption is enabled. I tried multiple times, but could not get the error without log encryption. Table encryption does not matter. Reproducible easily with the test below on a non-debug build – from sources or the binary tarball, all the same. I could not reproduce it on a debug build, maybe it's just too slow, or maybe there is some other difference in InnoDB behavior. To reproduce manually Start server with all defaults plus --innodb_encrypt_log --plugin-load-add=file_key_management.so --loose-file-key-management-filename=$BASEDIR/mysql-test/std_data/keys.txt Feed the following sql file through MySQL client (be careful – at the last step it kills all mysqld instances, if you run other ones, you might want to change it): DROP TABLE IF EXISTS t1, t2; CREATE TABLE t1 ( pk bigint auto_increment, col_int int, col_int_key int, col_char char(12), col_char_key char(12), primary key (pk), key (`col_int_key` ), key (`col_char_key` ) ) ENGINE=InnoDB; CREATE TABLE t2 LIKE t1;   INSERT INTO t1 VALUES (NULL,1,1,'foo','foo'),(NULL,2,2,'bar','bar'),(NULL,3,3,'baz','baz'),(NULL,4,4,'qux','qux'); INSERT INTO t2 SELECT NULL, a1.col_int, a1.col_int_key, a1.col_char, a1.col_char_key FROM t1 a1, t1 a2, t1 a3, t1 a4, t1 a5, t1 a6, t1 a7, t1 a8, t1 a9, t1 a10; system killall -s 9 mysqld; Try to restart the server with the same options. BROKEN: block: 121242 checkpoint: 9 0a024082 96376a3e BROKEN: block: 121242 checkpoint: 9 0a024082 96376a3e InnoDB: Log block no 121242 at lsn 62075392 has InnoDB: ok header, but checksum field contains 2520214078, should be 167919746 InnoDB: Set innodb_force_recovery to ignore this error. 2016-01-16 18:36:02 139887663318880 [ERROR] Plugin 'InnoDB' init function returned error. 2016-01-16 18:36:02 139887663318880 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2016-01-16 18:36:02 139887663318880 [Note] Plugin 'FEEDBACK' is disabled. 2016-01-16 18:36:02 139887663318880 [ERROR] Unknown/unsupported storage engine: InnoDB 2016-01-16 18:36:02 139887663318880 [ERROR] Aborting To reproduce with MTR Important note: you need to run the MTR test with the following options: --mysqld=--innodb --mysqld=--innodb_encrypt_log --mysqld=--plugin-load-add=file_key_management.so --mysqld=--file_key_management_filename=`pwd`/std_data/keys.txt --mysqld=--innodb-buffer-pool-size=128M --mysqld=--innodb-log-file-size=50331648 --mysqld=--performance-schema=0 --mysqld=--default-storage-engine=innodb If you don't do that, MTR enables performance schema in its config, and with performance schema it does not seem to be reproducible (again, maybe it's just slow, or timing is otherwise different). MTR also sets very low values for innodb_buffer_pool_size and innodb_log_file_size , they don't allow the problem to happen. Finally, setting InnoDB as default storage engine is just to have the test fail more obviously (so that the server cannot start after the error), otherwise it will just complain about failing to start InnoDB, but will survive. All these settings are defaults in the server started directly, wtihout MTR, that's why you don't need them in the "manual" variant. MTR test DROP TABLE IF EXISTS t1, t2; CREATE TABLE t1 ( pk bigint auto_increment, col_int int, col_int_key int, col_char char(12), col_char_key char(12), primary key (pk), key (`col_int_key` ), key (`col_char_key` ) ) ENGINE=InnoDB; CREATE TABLE t2 LIKE t1;   INSERT INTO t1 VALUES (NULL,1,1,'foo','foo'),(NULL,2,2,'bar','bar'),(NULL,3,3,'baz','baz'),(NULL,4,4,'qux','qux'); INSERT INTO t2 SELECT NULL, a1.col_int, a1.col_int_key, a1.col_char, a1.col_char_key FROM t1 a1, t1 a2, t1 a3, t1 a4, t1 a5, t1 a6, t1 a7, t1 a8, t1 a9, t1 a10;   --exec echo "wait" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect --shutdown_server 0 --source include/wait_until_disconnected.inc   --exec echo "restart" > $MYSQLTEST_VARDIR/tmp/mysqld.1.expect --enable_reconnect --source include/wait_until_connected_again.inc   DROP TABLE t1, t2; Result: bug.t3 [ fail ] Test ended at 2016-01-16 21:37:35   CURRENT_TEST: bug.t3     Server [mysqld.1 - pid: 17331, winpid: 17331, exit: 256] failed during test run Server log from this test: ----------SERVER LOG START----------- 2016-01-16 21:37:15 139968639919968 [Note] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld (mysqld 10.1.10-MariaDB-log) starting as process 17300 ... 2016-01-16 21:37:15 139968639919968 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162) 2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Using mutexes to ref count buffer pool pages 2016-01-16 21:37:15 139968639919968 [Note] InnoDB: The InnoDB memory heap is disabled 2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Memory barrier is not used 2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Compressed tables use zlib 1.2.3 2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Using Linux native AIO 2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Using generic crc32 instructions 2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Initializing buffer pool, size = 128.0M 2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Completed initialization of buffer pool 2016-01-16 21:37:15 139968639919968 [Note] InnoDB: The first specified data file ./ibdata1 did not exist: a new database to be created! 2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Setting file ./ibdata1 size to 12 MB 2016-01-16 21:37:15 139968639919968 [Note] InnoDB: Database physically writes the file full: wait... 2016-01-16 21:37:16 139968639919968 [Note] InnoDB: Setting log file ./ib_logfile101 size to 48 MB 2016-01-16 21:37:17 139968639919968 [Note] InnoDB: Setting log file ./ib_logfile1 size to 48 MB 2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0 2016-01-16 21:37:18 139968639919968 [Warning] InnoDB: New log files created, LSN=45883 2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Doublewrite buffer not found: creating new 2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Doublewrite buffer created 2016-01-16 21:37:18 139968639919968 [Note] InnoDB: 128 rollback segment(s) are active. 2016-01-16 21:37:18 139968639919968 [Warning] InnoDB: Creating foreign key constraint system tables. 2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Foreign key constraint system tables created 2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Creating tablespace and datafile system tables. 2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Tablespace and datafile system tables created. 2016-01-16 21:37:18 139968639919968 [Note] InnoDB: Waiting for purge to start 2016-01-16 21:37:18 139968639919968 [Note] InnoDB: 5.6.27 started; log sequence number 0 2016-01-16 21:37:18 139968273970944 [Note] InnoDB: Dumping buffer pool(s) not yet started 2016-01-16 21:37:18 139968639919968 [Note] Plugin 'FEEDBACK' is disabled. 2016-01-16 21:37:18 139968639919968 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-startup-interval=20' 2016-01-16 21:37:18 139968639919968 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-first-interval=60' 2016-01-16 21:37:18 139968639919968 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-interval=60' 2016-01-16 21:37:18 139968639919968 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-debug-sync-timeout=300' 2016-01-16 21:37:18 139968639919968 [Note] Server socket created on IP: '127.0.0.1'. 2016-01-16 21:37:18 139968639919968 [Note] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: ready for connections. Version: '10.1.10-MariaDB-log' socket: '/data/releases/mariadb-10.1.10-linux-x86_64/mysql-test/var/tmp/mysqld.1.sock' port: 16000 MariaDB Server 2016-01-16 21:37:35 140548601800544 [Note] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld (mysqld 10.1.10-MariaDB-log) starting as process 17332 ... 2016-01-16 21:37:35 140548601800544 [Warning] Could not increase number of max_open_files to more than 1024 (request: 4162) 2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Using mutexes to ref count buffer pool pages 2016-01-16 21:37:35 140548601800544 [Note] InnoDB: The InnoDB memory heap is disabled 2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Memory barrier is not used 2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Compressed tables use zlib 1.2.3 2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Using Linux native AIO 2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Using generic crc32 instructions 2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Initializing buffer pool, size = 128.0M 2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Completed initialization of buffer pool 2016-01-16 21:37:35 140548601800544 [Note] InnoDB: Highest supported file format is Barracuda. BROKEN: block: 118788 checkpoint: 3 78ed177a cbbf551b BROKEN: block: 118788 checkpoint: 3 78ed177a cbbf551b InnoDB: Log block no 118788 at lsn 60818944 has InnoDB: ok header, but checksum field contains 3418314011, should be 2028803962 InnoDB: Set innodb_force_recovery to ignore this error. 2016-01-16 21:37:35 140548601800544 [ERROR] Plugin 'InnoDB' init function returned error. 2016-01-16 21:37:35 140548601800544 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed. 2016-01-16 21:37:35 140548601800544 [Note] Plugin 'FEEDBACK' is disabled. 2016-01-16 21:37:35 140548601800544 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-startup-interval=20' 2016-01-16 21:37:35 140548601800544 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-first-interval=60' 2016-01-16 21:37:35 140548601800544 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-feedback-debug-interval=60' 2016-01-16 21:37:35 140548601800544 [Warning] /data/releases/mariadb-10.1.10-linux-x86_64/bin/mysqld: unknown variable 'loose-debug-sync-timeout=300' 2016-01-16 21:37:35 140548601800544 [ERROR] Unknown/unsupported storage engine: innodb 2016-01-16 21:37:35 140548601800544 [ERROR] Aborting   ----------SERVER LOG END------------- mysqltest failed but provided no output Also reproducible with InnoDB plugin.
            hholzgra Hartmut Holzgraefe added a comment - - edited

            Yes, I tried with encryption disabled, too, but could not reproduce it there. (The customer who originally reported this could also only reproduce this with encryption enabled).

            I also only got it on 10.1.9 so far (about once in every three to five attempts), with 10.1.10 I haven't seen it yet in about 50 attempts, but once I got this instead:

            InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 
            2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: Block in space_id 59 in file test/t2 encrypted.
            2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: However key management plugin or used key_id 1 is not found or used encryption algorithm or method does not match.
            2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: Marking tablespace as missing. You may drop this table or install correct key management plugin and key file.
            2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: Block in space_id 59 in file test/t2 encrypted.
            2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: However key management plugin or used key_id 1 is not found or used encryption algorithm or method does not match.
            2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: Marking tablespace as missing. You may drop this table or install correct key management plugin and key file.
            160116 17:46:11 [ERROR] mysqld got signal 11 ;

            test/t2 was the table I imported data into.

            The configuration for my tests was:

            [mysqld]
            user=mysql
            plugin-load-add=file_key_management.so
            file-key-management
            file-key-management-filename = /vagrant/keyfile.txt
            innodb-encrypt-tables
            innodb-encrypt-log
            innodb-encryption-threads=4

            and the keyfile looked like this:

            1;770A8A65DA156D24EE2A093277530142
            18;F5502320F8429037B8DAEF761B189D12F5502320F8429037B8DAEF761B189D12

            hholzgra Hartmut Holzgraefe added a comment - - edited Yes, I tried with encryption disabled, too, but could not reproduce it there. (The customer who originally reported this could also only reproduce this with encryption enabled). I also only got it on 10.1.9 so far (about once in every three to five attempts), with 10.1.10 I haven't seen it yet in about 50 attempts, but once I got this instead: InnoDB: Progress in percent: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: Block in space_id 59 in file test/t2 encrypted. 2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: However key management plugin or used key_id 1 is not found or used encryption algorithm or method does not match. 2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: Marking tablespace as missing. You may drop this table or install correct key management plugin and key file. 2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: Block in space_id 59 in file test/t2 encrypted. 2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: However key management plugin or used key_id 1 is not found or used encryption algorithm or method does not match. 2016-01-16 17:46:11 139641931941632 [ERROR] InnoDB: Marking tablespace as missing. You may drop this table or install correct key management plugin and key file. 160116 17:46:11 [ERROR] mysqld got signal 11 ; test/t2 was the table I imported data into. The configuration for my tests was: [mysqld] user=mysql plugin-load-add=file_key_management.so file-key-management file-key-management-filename = /vagrant/keyfile.txt innodb-encrypt-tables innodb-encrypt-log innodb-encryption-threads=4 and the keyfile looked like this: 1;770A8A65DA156D24EE2A093277530142 18;F5502320F8429037B8DAEF761B189D12F5502320F8429037B8DAEF761B189D12

            Firstly, I strongly advice that you should use kill -9 mysqld only as a last resort for production database. Now the assertion reported is really a bug. There is following problem: checkpoint can contain up to 5 log crypt data e.g.

            write chk: 4 [ chk key ]: [ 5 1 ] [ 4 1 ] [ 3 1 ] [ 2 1 ] [ 1 1 ]

            so over time we could have a checkpoint

            write chk: 13 [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ]

            killall -9 mysqld causes crash recovery and on crash recovery we read only few latest checkpoints in this case

            read [ chk key ]: [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ]
            read [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ]

            This is problematic, as we could still scan log blocks e.g. from checkpoint 4

            CRYPT INFO: for checkpoint 14 search 4
            CRYPT INFO: for checkpoint 13 search 4
            CRYPT INFO: for checkpoint 12 search 4
            CRYPT INFO: for checkpoint 11 search 4
            CRYPT INFO: for checkpoint 10 search 4
            CRYPT INFO: for checkpoint 9 search 4 (NOTE: NOT FOUND)
            BROKEN: block: 118788 checkpoint: 4 3e08c592 ad23ce73
            BROKEN: block: 118788 checkpoint: 4 3e08c592 ad23ce73
            InnoDB: Log block no 118788 at lsn 60818944 has
            InnoDB: ok header, but checksum field contains 2904804979, should be 1040762258

            Calculated checkpoint is incorrect as log block is still encrypted as we did not find log crypt data from read checkpoints. Problem is on function recv_scan_log_recs() (log0recv.cc) because we do log_block_checksum_is_ok_or_old_format(log_block) even to log records that are not really parsed (actual position is found later).

            I will fix this by not checking checksums for log blocks that are not really parsed but this is also problematic as there is no way to know is log block encrypted or corrupted for sure.

            jplindst Jan Lindström (Inactive) added a comment - Firstly, I strongly advice that you should use kill -9 mysqld only as a last resort for production database. Now the assertion reported is really a bug. There is following problem: checkpoint can contain up to 5 log crypt data e.g. write chk: 4 [ chk key ]: [ 5 1 ] [ 4 1 ] [ 3 1 ] [ 2 1 ] [ 1 1 ] so over time we could have a checkpoint write chk: 13 [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] killall -9 mysqld causes crash recovery and on crash recovery we read only few latest checkpoints in this case read [ chk key ]: [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ] read [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ] This is problematic, as we could still scan log blocks e.g. from checkpoint 4 CRYPT INFO: for checkpoint 14 search 4 CRYPT INFO: for checkpoint 13 search 4 CRYPT INFO: for checkpoint 12 search 4 CRYPT INFO: for checkpoint 11 search 4 CRYPT INFO: for checkpoint 10 search 4 CRYPT INFO: for checkpoint 9 search 4 (NOTE: NOT FOUND) BROKEN: block: 118788 checkpoint: 4 3e08c592 ad23ce73 BROKEN: block: 118788 checkpoint: 4 3e08c592 ad23ce73 InnoDB: Log block no 118788 at lsn 60818944 has InnoDB: ok header, but checksum field contains 2904804979, should be 1040762258 Calculated checkpoint is incorrect as log block is still encrypted as we did not find log crypt data from read checkpoints. Problem is on function recv_scan_log_recs() (log0recv.cc) because we do log_block_checksum_is_ok_or_old_format(log_block) even to log records that are not really parsed (actual position is found later). I will fix this by not checking checksums for log blocks that are not really parsed but this is also problematic as there is no way to know is log block encrypted or corrupted for sure.

            "Firstly, I strongly advice that you should use kill -9 mysqld only as a last resort for production database"

            I think I forgot to mention that the original reporter also mentioned:

            "It sometimes can happen after a regular shutdown, but not always."

            I assume that this could be related to innodb_fast_shutdown=1?

            kill -9 was only used as a way to reproduce this in a rather reliable way, it would obviously not be used in a production environment. Still power failuers or e.g. out-of-memory kills could have the same effect ...

            hholzgra Hartmut Holzgraefe added a comment - "Firstly, I strongly advice that you should use kill -9 mysqld only as a last resort for production database" I think I forgot to mention that the original reporter also mentioned: "It sometimes can happen after a regular shutdown, but not always." I assume that this could be related to innodb_fast_shutdown=1? kill -9 was only used as a way to reproduce this in a rather reliable way, it would obviously not be used in a production environment. Still power failuers or e.g. out-of-memory kills could have the same effect ...

            Change that will avoid hitting the issue:

            commit f448a800e173980c18023934f662fe7b8f7e2b9f
            Author: Jan Lindström <jan.lindstrom@mariadb.com>
            Date: Thu Mar 17 16:24:49 2016 +0200

            MDEV-9422: Checksum errors on restart when killing busy instance that uses encrypted XtraDB tables

            Analysis:

            – InnoDB has n (>0) redo-log files.
            – In the first page of redo-log there is 2 checkpoint records on fixed location (checkpoint is not encrypted)
            – On every checkpoint record there is up to 5 crypt_keys containing the keys used for encryption/decryption
            – On crash recovery we read all checkpoints on every file
            – Recovery starts by reading from the latest checkpoint forward
            – Problem is that latest checkpoint might not always contain the key we need to decrypt all the
            redo-log blocks (see MDEV-9422 for one example)
            – Furthermore, there is no way to identify is the log block corrupted or encrypted

            For example checkpoint can contain following keys :

            write chk: 4 [ chk key ]: [ 5 1 ] [ 4 1 ] [ 3 1 ] [ 2 1 ] [ 1 1 ]

            so over time we could have a checkpoint

            write chk: 13 [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ]

            killall -9 mysqld causes crash recovery and on crash recovery we read as
            many checkpoints as there is log files, e.g.

            read [ chk key ]: [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ]
            read [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ]

            This is problematic, as we could still scan log blocks e.g. from checkpoint 4 and we do
            not know anymore the correct key.

            CRYPT INFO: for checkpoint 14 search 4
            CRYPT INFO: for checkpoint 13 search 4
            CRYPT INFO: for checkpoint 12 search 4
            CRYPT INFO: for checkpoint 11 search 4
            CRYPT INFO: for checkpoint 10 search 4
            CRYPT INFO: for checkpoint 9 search 4 (NOTE: NOT FOUND)

            For every checkpoint, code generated a new encrypted key based on key
            from encryption plugin and random numbers. Only random numbers are
            stored on checkpoint.

            Fix: Generate only one key for every log file. If checkpoint contains only
            one key, use that key to encrypt/decrypt all log blocks. If checkpoint
            contains more than one key (this is case for databases created
            using MariaDB server version 10.1.0 - 10.1.12 if log encryption was
            used). If looked checkpoint_no is found from keys on checkpoint we use
            that key to decrypt the log block. For encryption we use always the
            first key. If the looked checkpoint_no is not found from keys on checkpoint
            we use the first key.

            Modified code also so that if log is not encrypted, we do not generate
            any empty keys. If we have a log block and no keys is found from
            checkpoint we assume that log block is unencrypted. Log corruption or
            missing keys is found by comparing log block checksums. If we have
            a keys but current log block checksum is correct we again assume
            log block to be unencrypted. This is because current implementation
            stores checksum only before encryption and new checksum after
            encryption but before disk write is not stored anywhere.

            jplindst Jan Lindström (Inactive) added a comment - Change that will avoid hitting the issue: commit f448a800e173980c18023934f662fe7b8f7e2b9f Author: Jan Lindström <jan.lindstrom@mariadb.com> Date: Thu Mar 17 16:24:49 2016 +0200 MDEV-9422 : Checksum errors on restart when killing busy instance that uses encrypted XtraDB tables Analysis: – InnoDB has n (>0) redo-log files. – In the first page of redo-log there is 2 checkpoint records on fixed location (checkpoint is not encrypted) – On every checkpoint record there is up to 5 crypt_keys containing the keys used for encryption/decryption – On crash recovery we read all checkpoints on every file – Recovery starts by reading from the latest checkpoint forward – Problem is that latest checkpoint might not always contain the key we need to decrypt all the redo-log blocks (see MDEV-9422 for one example) – Furthermore, there is no way to identify is the log block corrupted or encrypted For example checkpoint can contain following keys : write chk: 4 [ chk key ]: [ 5 1 ] [ 4 1 ] [ 3 1 ] [ 2 1 ] [ 1 1 ] so over time we could have a checkpoint write chk: 13 [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] killall -9 mysqld causes crash recovery and on crash recovery we read as many checkpoints as there is log files, e.g. read [ chk key ]: [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ] read [ chk key ]: [ 14 1 ] [ 13 1 ] [ 12 1 ] [ 11 1 ] [ 10 1 ] [ 9 1 ] This is problematic, as we could still scan log blocks e.g. from checkpoint 4 and we do not know anymore the correct key. CRYPT INFO: for checkpoint 14 search 4 CRYPT INFO: for checkpoint 13 search 4 CRYPT INFO: for checkpoint 12 search 4 CRYPT INFO: for checkpoint 11 search 4 CRYPT INFO: for checkpoint 10 search 4 CRYPT INFO: for checkpoint 9 search 4 (NOTE: NOT FOUND) For every checkpoint, code generated a new encrypted key based on key from encryption plugin and random numbers. Only random numbers are stored on checkpoint. Fix: Generate only one key for every log file. If checkpoint contains only one key, use that key to encrypt/decrypt all log blocks. If checkpoint contains more than one key (this is case for databases created using MariaDB server version 10.1.0 - 10.1.12 if log encryption was used). If looked checkpoint_no is found from keys on checkpoint we use that key to decrypt the log block. For encryption we use always the first key. If the looked checkpoint_no is not found from keys on checkpoint we use the first key. Modified code also so that if log is not encrypted, we do not generate any empty keys. If we have a log block and no keys is found from checkpoint we assume that log block is unencrypted. Log corruption or missing keys is found by comparing log block checksums. If we have a keys but current log block checksum is correct we again assume log block to be unencrypted. This is because current implementation stores checksum only before encryption and new checksum after encryption but before disk write is not stored anywhere.

            As far as I can tell, there was a design problem with the innodb_encrypt_log feature. The log-based recovery in InnoDB or XtraDB works by finding the latest completed redo log checkpoint and then processing the redo log from that point onwards. The decryption key(s) for processing all the redo log up to the last completely written redo log page must be present either in the checkpoint page or in the log pages that are being read.
            In the redo log encryption of MariaDB 10.1, only the log checkpoint page is storing decryption key metadata, so all the log records from that point onwards must be encrypted with the available keys.

            There was a problem with encryption key rotation, and the solution that was implemented in this work was to disable the key rotation altogether (write the metadata for only one key in the log checkpoint page). I have filed MDEV-12041 for re-enabling the key rotation in a way that works (write a special redo log record and pad the log block on key rotation).

            As far as I can tell, the problem with redo log key rotation was that we should have patched the previously written log checkpoint page to contain all the encryption keys. I am not sure if this was actually done, or if we instead would have scheduled the keys to be written to the next checkpoint page (which the crash recovery would never get to see). In any case, because there are only 5 slots in the checkpoint page, it was possible that if key rotation is done more frequently than log checkpoints, we could run out of slots in the checkpoint page. I also feel that this form of redo log key rotation would have required some special logic in XtraBackup to poll the checkpoint pages for new keys.

            marko Marko Mäkelä added a comment - As far as I can tell, there was a design problem with the innodb_encrypt_log feature. The log-based recovery in InnoDB or XtraDB works by finding the latest completed redo log checkpoint and then processing the redo log from that point onwards. The decryption key(s) for processing all the redo log up to the last completely written redo log page must be present either in the checkpoint page or in the log pages that are being read. In the redo log encryption of MariaDB 10.1, only the log checkpoint page is storing decryption key metadata, so all the log records from that point onwards must be encrypted with the available keys. There was a problem with encryption key rotation, and the solution that was implemented in this work was to disable the key rotation altogether (write the metadata for only one key in the log checkpoint page). I have filed MDEV-12041 for re-enabling the key rotation in a way that works (write a special redo log record and pad the log block on key rotation). As far as I can tell, the problem with redo log key rotation was that we should have patched the previously written log checkpoint page to contain all the encryption keys. I am not sure if this was actually done, or if we instead would have scheduled the keys to be written to the next checkpoint page (which the crash recovery would never get to see). In any case, because there are only 5 slots in the checkpoint page, it was possible that if key rotation is done more frequently than log checkpoints, we could run out of slots in the checkpoint page. I also feel that this form of redo log key rotation would have required some special logic in XtraBackup to poll the checkpoint pages for new keys.

            People

              jplindst Jan Lindström (Inactive)
              hholzgra Hartmut Holzgraefe
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.