[MDEV-11581] Mariadb starts innodb encryption threads when key has not changed or data scrubbing turned off Created: 2016-12-15  Updated: 2020-08-25  Resolved: 2017-03-14

Status: Closed
Project: MariaDB Server
Component/s: Encryption, Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.1.16, 10.1.19
Fix Version/s: 10.1.23

Type: Bug Priority: Critical
Reporter: Kishor Grandhe Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
causes MDEV-12428 SIGSEGV in buf_page_decrypt_after_rea... Closed
causes MDEV-12467 encryption.create_or_replace hangs du... Closed
causes MDEV-13639 Server crashes in prepare_inplace_alt... Closed
causes MDEV-19111 Unused field INFORMATION_SCHEMA.INNOD... Closed
Relates
relates to MDEV-12602 InnoDB: Failing assertion: space->n_p... Closed
relates to MDEV-14398 When innodb_encryption_rotate_key_age... Closed
relates to MDEV-11738 Mariadb uses 100% of several of my 8 ... Closed
relates to MDEV-11929 During delete: InnoDB: Assertion fail... Closed
relates to MDEV-12694 test failure: encryption.create_or_re... Closed
Sprint: 10.1.21

 Description   

We migrated our application from MySQL 5.6.21 to MariaDB 10.1.16 to use Data at Rest Encryption (DARE). This caused major issues: the application started stalling for no apparent reason.

When we use DARE on InnoDB tables with the out-of-the-box file_key_management plugin, configured as described on this page https://mariadb.com/kb/en/mariadb/data-at-rest-encryption/, we see periodic high CPU usage (at regular intervals, i.e. every hour) that stalls the system for any use. The CPU spikes occur regularly even when there are no user or application connections to the server. Nothing in any log indicates what is happening. It was a debugging nightmare.

We were using innodb_encryption_threads = 4 as indicated in the above page.

After extensive analysis, we discovered the following.

MariaDB starts the number of background threads specified in innodb_encryption_threads to do two things: data scrubbing, i.e. removing deleted data, and re-encrypting data pages when the key is changed.

The issue is that even when scrubbing is turned off for both compressed and uncompressed data, and no key has changed that would require re-encryption, the background threads start periodically, at the interval defined by innodb-background-scrub-data-check-interval. They hog the CPU, as high as 200% on a 2-core system, for 20+ minutes (depending on the data volume) doing "NOTHING", or rather with "NOTHING TO BE DONE". This stalls the CPU and makes the system unusable.

We suspect the following critical issues:
1. No check is done to see whether scrubbing is enabled for compressed or uncompressed data before starting the threads.

Below is the out-of-the-box scrubbing configuration:

MariaDB [(none)]> show global variables like '%scrub%';
+---------------------------------------------+--------+
| Variable_name                               | Value  |
+---------------------------------------------+--------+
| innodb_background_scrub_data_check_interval | 3600   |
| innodb_background_scrub_data_compressed     | OFF    |
| innodb_background_scrub_data_interval       | 604800 |
| innodb_background_scrub_data_uncompressed   | OFF    |
| innodb_immediate_scrub_data_uncompressed    | OFF    |
| innodb_scrub_log                            | OFF    |
| innodb_scrub_log_speed                      | 256    |
+---------------------------------------------+--------+ 

2. There is no check to see if the encryption key has changed to start the new threads. Also per the documentation "This plugin does not support key rotation — all keys always have the version 1.", so it gives more reason not to start the encryption threads until a key change is detected.

3. Encryption/scrubbing threads behave like high-priority threads, i.e. they hog the CPU and stall the system. Background processes generally run on low-priority threads so that core DB functionality is not affected.

4. Neither the system tables nor the processlist show that the encryption threads are running, or the status of their processing.

A similar high-CPU issue is noted in ticket MDEV-10368.

We temporarily solved the problem by setting innodb_encryption_threads = 0.



 Comments   
Comment by Jan Lindström (Inactive) [ 2016-12-16 ]

There are two things the user needs to configure carefully if key rotation is needed:

  • --innodb-encryption-rotate-key-age=n, where n is how old a key can be before it is rotated
  • --innodb-encryption-threads=n, the number of threads used for key rotation

If you have innodb-encrypt-tables=OFF, you can set --innodb-encryption-threads=0. If you have innodb-encrypt-tables=ON|FORCE, you may still configure --innodb-encryption-threads=0, but then you cannot dynamically set encryption ON|OFF. These threads are also used to convert encrypted tables to unencrypted and unencrypted tables to encrypted, if needed. Remember also that new tables using default mode i.e. create table t10() engine=innodb;
will be encrypted using these background threads.

Maybe we should have a new configuration variable to disable key rotation in case user does not really need it, scrubbing is already controlled by configuration variables.

Comment by Jan Lindström (Inactive) [ 2016-12-16 ]

As key rotation is based on key ages, the only way to know whether a tablespace needs key rotation is to iterate over all tablespaces at certain intervals. Thus, if key rotation is not needed, the user may set the number of background threads to 0 to disable it. That can't be done by default, as the AWS key management plugin does support key rotation, and in the InnoDB code base we do not know which plugin is used or whether it supports key rotation. In my opinion the current behavior is as designed.
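
The cost of the periodic pass described in this comment can be illustrated with a minimal sketch. It is not the InnoDB implementation: the Tablespace class, the scan_all() helper and the exact comparison in needs_rotation() are assumptions made for illustration only. It shows that every tablespace is visited on each pass even when the key management plugin never changes the key version.

```python
from dataclasses import dataclass


@dataclass
class Tablespace:
    # Hypothetical stand-in for a tablespace and the key version its pages use.
    name: str
    key_version: int


def needs_rotation(key_version: int, latest_key_version: int, rotate_key_age: int) -> bool:
    # Assumed rule: rotate once the key is more than rotate_key_age versions behind.
    return key_version + rotate_key_age < latest_key_version


def scan_all(tablespaces, latest_key_version, rotate_key_age):
    # One periodic pass: every tablespace is visited, even if none needs work.
    visited = 0
    to_rotate = []
    for space in tablespaces:
        visited += 1
        if needs_rotation(space.key_version, latest_key_version, rotate_key_age):
            to_rotate.append(space.name)
    return visited, to_rotate


# With a plugin whose keys always stay at version 1, the pass still
# walks all tablespaces but finds nothing to rotate.
spaces = [Tablespace(f"db/t{i}", 1) for i in range(1000)]
visited, to_rotate = scan_all(spaces, latest_key_version=1, rotate_key_age=1)
```

In this sketch the work per pass grows with the number of tablespaces regardless of whether any key has changed, which matches the CPU pattern reported above.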

Comment by Geoff Montee (Inactive) [ 2016-12-16 ]

Hi jplindst,

"Maybe we should have a new configuration variable to disable key rotation in case user does not really need it, scrubbing is already controlled by configuration variables."

I submitted MDEV-11587 for that feature request, in case you decide to implement that.

Comment by Rasmus Johansson (Inactive) [ 2016-12-20 ]

Correcting resolution

Comment by Rasmus Johansson (Inactive) [ 2016-12-20 ]

By design. A new variable can be implemented as an improvement, as stated in a previous comment.

Comment by Kishor Grandhe [ 2017-01-04 ]

Retested with 10.1.20; CPU usage is still high. The fix for the issue identified in MDEV-10368 has reduced the CPU usage by 50%, but it is still very high and the system stalls.

Comment by Jan Lindström (Inactive) [ 2017-01-05 ]

Why can't you set innodb-encryption-threads=0? This does not mean that your tables are not encrypted.

Comment by Kishor Grandhe [ 2017-01-05 ]

Based on your statement, "Remember also that new tables using default mode i.e. create table t10() engine=innodb;
will be encrypted using these background threads.", if we set innodb-encryption-threads=0, any table created using the default mode will not be encrypted, as the background threads are not present.

Comment by Jan Lindström (Inactive) [ 2017-01-10 ]

If you think too many resources are used, you can use innodb-encryption-threads=[1|2], but that will make the CPU spike last longer.

Fine, encrypting those new default-mode tables does take CPU, and it happens in the background, not immediately. Let me see if we can do something better for the case where nothing is really happening inside InnoDB, i.e. no new tables, no new rows and no selects.

Comment by Jan Lindström (Inactive) [ 2017-01-10 ]

Re-opened because of user request.

Comment by Marko Mäkelä [ 2017-01-31 ]

We are also unnecessarily creating a redo log checkpoint at startup, which could slow down startup after crash recovery.
MDEV-11782 will remove that checkpoint if the innodb_encrypt_log setting is not changing. (If it is changing, the whole redo log would be rebuilt, and as part of that process, we would complete a checkpoint on the old redo log.)
The purpose of the log checkpoint is to guarantee that the redo log records seen by crash recovery will be either all clear or all encrypted, but not a mix of them.

Comment by Jan Lindström (Inactive) [ 2017-03-14 ]

commit 50eb40a2a8aa3af6cc271f6028f4d6d74301d030
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date: Tue Mar 14 12:56:01 2017 +0200

MDEV-11738: Mariadb uses 100% of several of my 8 cpus doing nothing

MDEV-11581: Mariadb starts InnoDB encryption threads
when key has not changed or data scrubbing turned off

Background: Key rotation is based on background threads
(innodb-encryption-threads) periodically going through
all tablespaces on fil_system. For each tablespace
current used key version is compared to max key age
(innodb-encryption-rotate-key-age). This process
naturally takes CPU. Similarly, in same time need for
scrubbing is investigated. Currently, key rotation
is fully supported on Amazon AWS key management plugin
only but InnoDB does not have knowledge what key
management plugin is used.

This patch re-purposes innodb-encryption-rotate-key-age=0
to disable key rotation and background data scrubbing.
All new tables are added to a special list for key rotation,
and key rotation is based on sending an event to
background encryption threads instead of using periodic
checking (i.e. timeout).
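
The difference between periodic checking and event-based wake-up described above can be sketched roughly as follows. This is an illustration only, not the InnoDB code: RotationList, add() and wait_and_drain() are hypothetical names, and Python's threading.Event stands in for whatever event primitive the server uses.

```python
import threading
from collections import deque


class RotationList:
    """Hypothetical stand-in for the list of new tables awaiting encryption."""

    def __init__(self):
        self._pending = deque()
        self._lock = threading.Lock()
        self._event = threading.Event()

    def add(self, table_name):
        # Called when a table is created: queue it and signal the workers.
        with self._lock:
            self._pending.append(table_name)
            self._event.set()

    def wait_and_drain(self, timeout=None):
        # Block without using CPU until add() signals, then take all queued work.
        if not self._event.wait(timeout):
            return []
        with self._lock:
            self._event.clear()
            items = list(self._pending)
            self._pending.clear()
        return items


rotation_list = RotationList()
idle = rotation_list.wait_and_drain(timeout=0.01)  # nothing signalled: []
rotation_list.add("db/t10")
work = rotation_list.wait_and_drain(timeout=0.01)  # signalled: ["db/t10"]
```

A background thread would call wait_and_drain() in a loop. With no new tables it stays blocked instead of rescanning every tablespace on a timer.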

fil0fil.cc: Added functions fil_space_acquire_low()
to acquire a tablespace when it could be dropped concurrently.
This function is used from fil_space_acquire() or
fil_space_acquire_silent() that will not print
any messages if we try to acquire space that does not exist.
fil_space_release() to release an acquired tablespace.
fil_space_next() to iterate tablespaces in fil_system
using fil_space_acquire() and fil_space_release().
Similarly, fil_space_keyrotation_next() to iterate new
list fil_system->rotation_list where new tables
are added if key rotation is disabled.
Removed unnecessary functions fil_get_first_space_safe()
and fil_get_next_space_safe().
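
The acquire/release idea behind these functions can be sketched as a reference count combined with a stopping flag. This is a simplified illustration under assumptions, not the fil0fil.cc code: the Registry and Space classes, the pending_ops counter and the drop() behaviour are hypothetical.

```python
import threading
from contextlib import contextmanager


class Space:
    """Hypothetical tablespace handle with a reference count and a stop flag."""

    def __init__(self, name):
        self.name = name
        self.pending_ops = 0
        self.stopping = False


class Registry:
    def __init__(self):
        self._lock = threading.Lock()
        self._spaces = {}

    def create(self, name):
        with self._lock:
            self._spaces[name] = Space(name)

    def acquire(self, name):
        # Return the space with its count raised, or None if missing or being dropped.
        with self._lock:
            space = self._spaces.get(name)
            if space is None or space.stopping:
                return None
            space.pending_ops += 1
            return space

    def release(self, space):
        with self._lock:
            space.pending_ops -= 1

    def drop(self, name):
        # Mark as stopping so no new user can acquire it; remove only when unused.
        with self._lock:
            space = self._spaces.get(name)
            if space is None:
                return False
            space.stopping = True
            if space.pending_ops > 0:
                return False
            del self._spaces[name]
            return True

    @contextmanager
    def using(self, name):
        # Pair acquire() with release() even if the body raises.
        space = self.acquire(name)
        try:
            yield space
        finally:
            if space is not None:
                self.release(space)


registry = Registry()
registry.create("db/t1")
with registry.using("db/t1") as space:
    dropped_while_in_use = registry.drop("db/t1")  # False: still referenced
    reacquired = registry.acquire("db/t1")         # None: space is stopping
dropped_after_release = registry.drop("db/t1")     # True
```

In this sketch a concurrent drop cannot remove a space while a holder still references it, and holders that arrive after the drop started are turned away.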

fil_node_open_file(): After page 0 is read, also read
crypt_info if it is not yet read.

btr_scrub_lock_dict_func()
buf_page_check_corrupt()
buf_page_encrypt_before_write()
buf_merge_or_delete_for_page()
lock_print_info_all_transactions()
row_fts_psort_info_init()
row_truncate_table_for_mysql()
row_drop_table_for_mysql()
Use fil_space_acquire()/release() to access fil_space_t.

buf_page_decrypt_after_read():
Use fil_space_get_crypt_data() because at this point
we might not yet have read page 0.

fil0crypt.cc/fil0fil.h: Lot of changes. Pass fil_space_t* directly
to functions needing it and store fil_space_t* to rotation state.
Use fil_space_acquire()/release() when iterating tablespaces
and removed unnecessary is_closing from fil_crypt_t. Use
fil_space_t::is_stopping() to detect when access to
tablespace should be stopped. Removed unnecessary
fil_space_get_crypt_data().

fil_space_create(): Inform key rotation that there could
be something to do if key rotation is disabled and new
table with encryption enabled is created.
Remove unnecessary functions fil_get_first_space_safe()
and fil_get_next_space_safe(). fil_space_acquire()
and fil_space_release() are used instead. Moved
fil_space_get_crypt_data() and fil_space_set_crypt_data()
to fil0crypt.cc.

fsp_header_init(): Acquire fil_space_t*, write crypt_data
and release space.

check_table_options()
Renamed FIL_SPACE_ENCRYPTION_* TO FIL_ENCRYPTION_*

i_s.cc: Added ROTATING_OR_FLUSHING field to
information_schema.innodb_tablespace_encryption
to show current status of key rotation.

Generated at Thu Feb 08 07:51:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.