[MDEV-25998] InnoDB removes the tablespace from default encrypt list early Created: 2021-06-23  Updated: 2021-07-27  Resolved: 2021-07-26

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.2.40, 10.3.31, 10.4.21, 10.5.12, 10.6.4

Type: Bug
Priority: Critical
Reporter: Thirunarayanan Balathandayuthapani
Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-14398 When innodb_encryption_rotate_key_age... Closed

 Description   

The scenario is as follows:

  • One encryption thread is decrypting a tablespace (the rotated pages are still being flushed).
  • The test case sets INNODB_ENCRYPT_TABLES = 1.
  • Another encryption thread checks whether the space is eligible for key rotation
    and removes it from the default encrypt list.

 
encryption.innodb_encryption_filekeys 'cbc,innodb' w2 [ fail ]
        Test ended at 2021-06-21 18:39:22
 
CURRENT_TEST: encryption.innodb_encryption_filekeys
mysqltest: At line 116: Timeout waiting for encryption threads
 
CREATE TABLE t2 (pk INT PRIMARY KEY AUTO_INCREMENT, c VARCHAR(256)) ENGINE=INNODB ENCRYPTED=YES;
CREATE TABLE t3 (pk INT PRIMARY KEY AUTO_INCREMENT, c VARCHAR(256)) ENGINE=INNODB ENCRYPTED=NO;
CREATE TABLE t4 (pk INT PRIMARY KEY AUTO_INCREMENT, c VARCHAR(256)) ENGINE=INNODB ENCRYPTED=YES ENCRYPTION_KEY_ID=4;
SET GLOBAL innodb_encrypt_tables = on;
# Wait max 10 min for key encryption threads to encrypt required all spaces
SELECT NAME,ENCRYPTION_SCHEME,MIN_KEY_VERSION, ROTATING_OR_FLUSHING FROM INFORMATION_SCHEMA.INNODB_TABLESPACES_ENCRYPTION;
NAME    ENCRYPTION_SCHEME       MIN_KEY_VERSION ROTATING_OR_FLUSHING
innodb_system   0       0       0
mysql/innodb_table_stats        1       1       0
mysql/innodb_index_stats        1       1       0
mysql/transaction_registry      1       1       0
test/t1 1       1       0
test/t2 1       1       0
test/t3 0       0       0
test/t4 1       1       0
SHOW STATUS LIKE 'innodb_encryption%';
Variable_name   Value
Innodb_encryption_rotation_pages_read_from_cache        3065
Innodb_encryption_rotation_pages_read_from_disk 32
Innodb_encryption_rotation_pages_modified       2958
Innodb_encryption_rotation_pages_flushed        3274
Innodb_encryption_rotation_estimated_iops       64159
Innodb_encryption_key_rotation_list_length      0
Innodb_encryption_n_merge_blocks_encrypted      0
Innodb_encryption_n_merge_blocks_decrypted      0
Innodb_encryption_n_rowlog_blocks_encrypted     0
Innodb_encryption_n_rowlog_blocks_decrypted     0

A possible solution: only the last active encryption thread working on the tablespace
should be allowed to remove the tablespace from the default encrypt list.
For now, the workaround is to run SET GLOBAL innodb_encrypt_tables = ON (or OFF) again, as the user wishes.
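
The proposed rule can be illustrated with a minimal, self-contained sketch. It is not the InnoDB patch: Space, DefaultEncryptList, thread_enter() and thread_leave() are hypothetical names invented for this example, and the real code keeps this state in different structures. The sketch only shows the idea that a thread leaving a space may remove it from the list when no other thread is still working on it.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>

// Hypothetical stand-in for a tablespace (not the real fil_space_t).
struct Space {
  explicit Space(int id_) : id(id_), active_crypt_threads(0) {}
  int id;
  std::size_t active_crypt_threads;  // threads currently rotating or flushing
};

// Hypothetical stand-in for the default encrypt list, guarded by one mutex.
class DefaultEncryptList {
 public:
  void add(const Space& s) {
    std::lock_guard<std::mutex> guard(mutex_);
    ids_.insert(s.id);
  }

  bool contains(const Space& s) {
    std::lock_guard<std::mutex> guard(mutex_);
    return ids_.count(s.id) != 0;
  }

  // An encryption thread starts working on the space.
  void thread_enter(Space& s) {
    std::lock_guard<std::mutex> guard(mutex_);
    ++s.active_crypt_threads;
  }

  // An encryption thread stops working on the space.
  // Returns true only if this call removed the space from the list.
  bool thread_leave(Space& s, bool still_needs_work) {
    std::lock_guard<std::mutex> guard(mutex_);
    assert(s.active_crypt_threads > 0);
    if (--s.active_crypt_threads != 0) {
      return false;  // another thread is still rotating or flushing
    }
    if (still_needs_work) {
      return false;  // last thread, but the space must stay on the list
    }
    return ids_.erase(s.id) == 1;
  }

 private:
  std::mutex mutex_;
  std::set<int> ids_;
};
```

With two threads registered on the same space, the first thread_leave() call leaves the list untouched and only the second one removes the entry.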



 Comments   
Comment by Thirunarayanan Balathandayuthapani [ 2021-06-23 ]

#0  __GI___pthread_getspecific (key=1) at pthread_getspecific.c:30
#1  0x000055d3c9a8ab78 in my_get_thread_local (key=1) at /home/thiru/mariarepo/10.6/10.6-sample/storage/perfschema/my_thread.h:41
#2  0x000055d3c9a8ac60 in my_thread_get_THR_PFS () at /home/thiru/mariarepo/10.6/10.6-sample/storage/perfschema/pfs.cc:1371
#3  0x000055d3c9a8ea8b in pfs_start_mutex_wait_v1 (state=0x7f1bba7fb720, mutex=0x7f1bf27655c0, op=PSI_MUTEX_LOCK, 
    src_file=0x55d3ca6208e0 "/home/thiru/mariarepo/10.6/10.6-sample/storage/innobase/buf/buf0flu.cc", src_line=1522)
    at /home/thiru/mariarepo/10.6/10.6-sample/storage/perfschema/pfs.cc:2627
#4  0x000055d3ca005215 in psi_mutex_lock (that=0x55d3cae4fc00 <buf_pool>, 
    file=0x55d3ca6208e0 "/home/thiru/mariarepo/10.6/10.6-sample/storage/innobase/buf/buf0flu.cc", line=1522)
    at /home/thiru/mariarepo/10.6/10.6-sample/mysys/my_thr_init.c:484
#5  0x000055d3c9e641b9 in inline_mysql_mutex_lock (that=0x55d3cae4fc00 <buf_pool>, 
    src_file=0x55d3ca6208e0 "/home/thiru/mariarepo/10.6/10.6-sample/storage/innobase/buf/buf0flu.cc", src_line=1522)
    at /home/thiru/mariarepo/10.6/10.6-sample/include/mysql/psi/mysql_thread.h:746
#6  0x000055d3c9e6ab82 in buf_flush_list_space (space=0x55d3cbe15768, n_flushed=0x7f1bba7fb7f8)
    at /home/thiru/mariarepo/10.6/10.6-sample/storage/innobase/buf/buf0flu.cc:1522
#7  0x000055d3c9ee5ee2 in fil_crypt_flush_space (state=0x7f1bba7fbd60)
    at /home/thiru/mariarepo/10.6/10.6-sample/storage/innobase/fil/fil0crypt.cc:1972
#8  0x000055d3c9ee6386 in fil_crypt_complete_rotate_space (state=0x7f1bba7fbd60)
    at /home/thiru/mariarepo/10.6/10.6-sample/storage/innobase/fil/fil0crypt.cc:2059
#9  0x000055d3c9ee6658 in fil_crypt_thread () at /home/thiru/mariarepo/10.6/10.6-sample/storage/innobase/fil/fil0crypt.cc:2131
#10 0x000055d3c9e702d0 in std::__invoke_impl<void, void (*)()> (__f=@0x7f1bb41f1be8: 0x55d3c9ee6419 <fil_crypt_thread()>)
    at /usr/include/c++/8/bits/invoke.h:60
#11 0x000055d3c9e700b2 in std::__invoke<void (*)()> (__fn=@0x7f1bb41f1be8: 0x55d3c9ee6419 <fil_crypt_thread()>)
    at /usr/include/c++/8/bits/invoke.h:95
#12 0x000055d3c9e70844 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x7f1bb41f1be8)
    at /usr/include/c++/8/thread:244
#13 0x000055d3c9e7081a in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x7f1bb41f1be8)
    at /usr/include/c++/8/thread:253
#14 0x000055d3c9e707fe in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x7f1bb41f1be0)
    at /usr/include/c++/8/thread:196
#15 0x00007f1bf0d3bd80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#16 0x00007f1bf124a6db in start_thread (arg=0x7f1bba7fc700) at pthread_create.c:463
#17 0x00007f1bf03e571f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The stack trace above was taken while innodb_encrypt_tables is 1.
The other encryption thread skips encrypting the tablespace because of the following code:

In fil_crypt_space_needs_rotation():
 
                /* prevent threads from starting to rotate space */
                if (crypt_data->rotate_state.starting) {
                        /* recheck this space later */
                        *recheck = true;
                        break;
                }
 
                /* prevent threads from starting to rotate space */
                if (space->is_stopping()) {
                        break;
                }
 
                if (crypt_data->rotate_state.flushing) {
                        break;
                }

Comment by Elena Stepanova [ 2021-07-06 ]

Raising the priority because the test is failing very frequently in buildbot (10-20 times a day on average); it needs to be fixed.

Comment by Thirunarayanan Balathandayuthapani [ 2021-07-08 ]

Added a patch to avoid the failures in buildbot for now. It is a workaround patch to stop the noise in buildbot. The patch
is in bb-10.2-MDEV-25998 and bb-10.6-MDEV-25998.

Comment by Marko Mäkelä [ 2021-07-20 ]

I see that the 10.2 version is a test-only change, while the 10.6 version is a code-only change.

In the 10.6 code-only change, the new function fil_crypt_space_remove_list() is reading srv_encrypt_tables multiple times, which looks suboptimal to me, but should not be a correctness issue, because both that function and SET GLOBAL innodb_encrypt_tables are protected by fil_system.mutex. I would appreciate an assertion to document the mutex ownership. That change passed 3,400 runs of encryption.innodb_encryption_keys without any problems for me. So, it should be OK to push after some cleanup.
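
The two review points above (read the setting once, and assert that the protecting mutex is held) could look roughly like the sketch below. All names here are hypothetical: OwnedMutex, system_mutex, encrypt_tables and may_remove_from_list() are made up for illustration and are not the code of fil_crypt_space_remove_list() or fil_system.mutex. Since std::mutex has no ownership query, the wrapper records the owning thread itself.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical mutex wrapper that remembers its owner so that
// ownership can be checked in an assertion.
class OwnedMutex {
 public:
  void lock() {
    m_.lock();
    owner_.store(std::this_thread::get_id());
  }
  void unlock() {
    owner_.store(std::thread::id());  // default id means "no owner"
    m_.unlock();
  }
  bool is_owner() const {
    return owner_.load() == std::this_thread::get_id();
  }

 private:
  std::mutex m_;
  std::atomic<std::thread::id> owner_{std::thread::id()};
};

// Hypothetical globals standing in for the protecting mutex and the setting.
OwnedMutex system_mutex;
unsigned long encrypt_tables = 0;  // 0 = off, nonzero = on (assumed encoding)

// Hypothetical decision: a space may leave the list once its encryption
// state matches the current setting. The caller must hold system_mutex.
bool may_remove_from_list(bool space_is_encrypted) {
  assert(system_mutex.is_owner());                  // documents the locking rule
  const bool want_encrypted = encrypt_tables != 0;  // read the setting once
  return space_is_encrypted == want_encrypted;
}
```

Copying the setting into a local constant keeps every later check in the function consistent with a single value, and the assertion makes the locking requirement visible to readers and to debug builds.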

Generated at Thu Feb 08 09:41:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.