Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36024

performance regression with encrypted InnoDB log

Details

    Description

      MDEV-14425 introduced piecewise encryption of the InnoDB redo log. This causes a performance regression for workloads which create a lot of redo log, i.e. INSERT. The cause of the regression are repeated calls to MyCTX::init(). We need a different implementation that creates the AES context only once.

      Attachments

        Issue Links

          Activity

            axel Axel Schwenke added a comment -

            Attachment MDEV-36024.pdf shows the effect. The red and blue lines show the behavior with encryption turned off. The green and pink line with innodb-encrypt-log=ON. The performance seen in the green line (pre MDEV-14425) is about 7.5% lower at 20 threads (compared to unencrypted). However the performance seen in the pink line (MDEV-14425) is about 26% lower at 20 threads.

            The tested commits are

            • 685d958e38b825ad9829be311f26729cccf37c46, MDEV-14425 from 21 Jan 2022 (red & green)
            • baef53a70c675da6d19ac3c7f23c7b8b4ed8458c, the immediate predecessor (blue & pink)

            The first release of the MariaDB server which includes this regression is 10.8.1.

            The full regression test results, including Flamegraphs are available from http://g5.xentio.lan/benchmark-archive/MDEV-36024/ (Sofia VPN access required)

            axel Axel Schwenke added a comment - Attachment MDEV-36024.pdf shows the effect. The red and blue lines show the behavior with encryption turned off. The green and pink line with innodb-encrypt-log=ON . The performance seen in the green line (pre MDEV-14425 ) is about 7.5% lower at 20 threads (compared to unencrypted). However the performance seen in the pink line ( MDEV-14425 ) is about 26% lower at 20 threads. The tested commits are 685d958e38b825ad9829be311f26729cccf37c46, MDEV-14425 from 21 Jan 2022 (red & green) baef53a70c675da6d19ac3c7f23c7b8b4ed8458c, the immediate predecessor (blue & pink) The first release of the MariaDB server which includes this regression is 10.8.1. The full regression test results, including Flamegraphs are available from http://g5.xentio.lan/benchmark-archive/MDEV-36024/ (Sofia VPN access required)

            While we fix this, we must ensure that it is possible to back up and recover any databases that are running with innodb_encrypt_log=ON and the current implementation. It could be a good idea to run a patched mariadb-backup --backup against an unpatched server, or vice versa. It would also be useful to repeat one of the experiments in MDEV-29911, that is, measure the time to recover a copy of a particular dataset where the difference of the checkpoint LSN and the last recovered LSN is at least a few hundred megabytes.

            marko Marko Mäkelä added a comment - While we fix this, we must ensure that it is possible to back up and recover any databases that are running with innodb_encrypt_log=ON and the current implementation. It could be a good idea to run a patched mariadb-backup --backup against an unpatched server, or vice versa. It would also be useful to repeat one of the experiments in MDEV-29911 , that is, measure the time to recover a copy of a particular dataset where the difference of the checkpoint LSN and the last recovered LSN is at least a few hundred megabytes.

            The CPU flame graph that that I analyzed yesterday does not seem to be available anymore. It looks like mtr_t::encrypt(), log_encrypt_buf() and encryption_crypt() need to be refactored so that MyCTX::init() would be invoked only once. It seems that encryption_crypt() is consuming over 6% of all CPU time, while MyCTX::update() (appending some data to the encrypted buffer) would only use 0.25%.

            Based on the implementation of encryption_crypt(), it would seem that there is no need to refactor any encryption service interface. We could simply move parts of the inline function in such a way that encryption_ctx_init() and encryption_ctx_finish() will be invoked at most once per mtr_t::commit() or recv_sys_t::parse():

            static inline int encryption_crypt(const unsigned char* src, unsigned int slen,
                                               unsigned char* dst, unsigned int* dlen,
                                               const unsigned char* key, unsigned int klen,
                                               const unsigned char* iv, unsigned int ivlen,
                                               int flags, unsigned int key_id, unsigned int key_version)
            {
              void *ctx= alloca(encryption_ctx_size(key_id, key_version));
              int res1, res2;
              unsigned int d1, d2;
              if ((res1= encryption_ctx_init(ctx, key, klen, iv, ivlen, flags, key_id, key_version)))
                return res1;
              res1= encryption_ctx_update(ctx, src, slen, dst, &d1);
              res2= encryption_ctx_finish(ctx, dst + d1, &d2);
              *dlen= d1 + d2;
              return res1 ? res1 : res2;
            }
            

            If we invoked encryption_ctx_update() multiple times using the initialization vector that is derived from the unencrypted part of the log records, the format of the encrypted log would be changed in an incompatible way with MDEV-14425. We could call this new format innodb_encrypt_log=fast and distinguish it with a new redo log format code.

            marko Marko Mäkelä added a comment - The CPU flame graph that that I analyzed yesterday does not seem to be available anymore. It looks like mtr_t::encrypt() , log_encrypt_buf() and encryption_crypt() need to be refactored so that MyCTX::init() would be invoked only once. It seems that encryption_crypt() is consuming over 6% of all CPU time, while MyCTX::update() (appending some data to the encrypted buffer) would only use 0.25%. Based on the implementation of encryption_crypt() , it would seem that there is no need to refactor any encryption service interface. We could simply move parts of the inline function in such a way that encryption_ctx_init() and encryption_ctx_finish() will be invoked at most once per mtr_t::commit() or recv_sys_t::parse() : static inline int encryption_crypt( const unsigned char * src, unsigned int slen, unsigned char * dst, unsigned int * dlen, const unsigned char * key, unsigned int klen, const unsigned char * iv, unsigned int ivlen, int flags, unsigned int key_id, unsigned int key_version) { void *ctx= alloca(encryption_ctx_size(key_id, key_version)); int res1, res2; unsigned int d1, d2; if ((res1= encryption_ctx_init(ctx, key, klen, iv, ivlen, flags, key_id, key_version))) return res1; res1= encryption_ctx_update(ctx, src, slen, dst, &d1); res2= encryption_ctx_finish(ctx, dst + d1, &d2); *dlen= d1 + d2; return res1 ? res1 : res2; } If we invoked encryption_ctx_update() multiple times using the initialization vector that is derived from the unencrypted part of the log records, the format of the encrypted log would be changed in an incompatible way with MDEV-14425 . We could call this new format innodb_encrypt_log=fast and distinguish it with a new redo log format code.

            I didn’t sleep well due to a flu, so I am not able to implement this properly right now. I may have misunderstood something, but it appears that my approach of invoking encryption_ctx_update() several times would add some extra padding of the encrypted data, which we can’t allow, because we assume that all data will be encrypted in place. I disabled some assertions and basically made the code write totally corrupted log. An alternative solution could be to fill a contiguous buffer for encryption (up to innodb_page_size or perhaps 64 KiB), and invoke encryption_crypt() on such larger buffers. We may need multiple calls per mtr_t::encrypt(), because the maximum size of a mini-transaction corresponds to the minimum innodb_log_buffer_size=2m.

            The branch 10.11-MDEV-36024 should be useful for estimating the performance. Note that crash recovery will be broken. My common sense says that the server should be able restart after a clean shutdown, but I may be mistaken about that.

            marko Marko Mäkelä added a comment - I didn’t sleep well due to a flu, so I am not able to implement this properly right now. I may have misunderstood something, but it appears that my approach of invoking encryption_ctx_update() several times would add some extra padding of the encrypted data, which we can’t allow, because we assume that all data will be encrypted in place. I disabled some assertions and basically made the code write totally corrupted log. An alternative solution could be to fill a contiguous buffer for encryption (up to innodb_page_size or perhaps 64 KiB), and invoke encryption_crypt() on such larger buffers. We may need multiple calls per mtr_t::encrypt() , because the maximum size of a mini-transaction corresponds to the minimum innodb_log_buffer_size=2m . The branch 10.11-MDEV-36024 should be useful for estimating the performance. Note that crash recovery will be broken. My common sense says that the server should be able restart after a clean shutdown, but I may be mistaken about that.

            The current design is that each mini-transaction record, other than FILE_ records, and other than page identifiers, are encrypted in place. The improvement that I have in mind would attempt to encrypt all payload in a single request. The record framing as well as the end-of-mini-transaction marker would be stored without any encryption just like until now. If the total encrypted length of a mini-transaction exceeds 64 KiB, it would be encrypted in multiple pieces.

            This format would be incompatible with what we currently have, so we would have to introduce a new ib_logfile0 format tag. As far as I can tell, starting up the server after a clean shutdown will avoid any access to encrypted parts of log records.

            marko Marko Mäkelä added a comment - The current design is that each mini-transaction record, other than FILE_ records, and other than page identifiers, are encrypted in place. The improvement that I have in mind would attempt to encrypt all payload in a single request. The record framing as well as the end-of-mini-transaction marker would be stored without any encryption just like until now. If the total encrypted length of a mini-transaction exceeds 64 KiB, it would be encrypted in multiple pieces. This format would be incompatible with what we currently have, so we would have to introduce a new ib_logfile0 format tag. As far as I can tell, starting up the server after a clean shutdown will avoid any access to encrypted parts of log records.
            axel Axel Schwenke added a comment -

            The branch 10.11-MDEV-36024 should be useful for estimating the performance.

            I have given it a try: MDEV-36024_vs_MDEV-14425.pdf. While we have gained some performance while progressing from 10.8 to 10.11, the difference between the blue lines is still much bigger than between the red lines. A bit smaller though than between the two green lines.

            axel Axel Schwenke added a comment - The branch 10.11- MDEV-36024 should be useful for estimating the performance. I have given it a try: MDEV-36024_vs_MDEV-14425.pdf . While we have gained some performance while progressing from 10.8 to 10.11, the difference between the blue lines is still much bigger than between the red lines. A bit smaller though than between the two green lines.
            axel Axel Schwenke added a comment -

            I have also seen the branch 10.11-MDEV-36024-2. For that one the server crashed after loading the data set and then could not recover (as expected). Error log:

            2025-03-04 13:38:20 0 [Note] /home/axel/builds/mariadb-community-10.11-MDEV-36024-2-7d0a4b066e2/libexec/mariadbd: ready for connections.
            Version: '10.11.12-MariaDB'  socket: '/tmp/mysqld.sock.sysbench'  port: 3306  Source distribution
            250304 13:38:24 mysqld_safe Number of processes running now: 0
            250304 13:38:24 mysqld_safe mysqld restarted
            2025-03-04 13:38:24 0 [Note] Starting MariaDB 10.11.12-MariaDB source revision 7d0a4b066e22d912504bb60b2331335de98d43d8 server_uid pvM4d5sPnFo+IUOY/yoK02SWRu4= as process 3580697
            2025-03-04 13:38:24 0 [Note] mariadbd: Aria engine: starting recovery
            recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.0 seconds); tables to flush: 2 1 0
             (0.0 seconds);
            2025-03-04 13:38:24 0 [Note] mariadbd: Aria engine: recovery done
            2025-03-04 13:38:24 0 [Note] InnoDB: Compressed tables use zlib 1.3
            2025-03-04 13:38:24 0 [Note] InnoDB: Number of transaction pools: 1
            2025-03-04 13:38:24 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
            2025-03-04 13:38:24 0 [Note] InnoDB: Using liburing
            2025-03-04 13:38:24 0 [Note] InnoDB: Initializing buffer pool, total size = 16.000GiB, chunk size = 256.000MiB
            2025-03-04 13:38:24 0 [Note] InnoDB: Completed initialization of buffer pool
            2025-03-04 13:38:24 0 [Note] InnoDB: File system buffers for log disabled (block size=4096 bytes)
            2025-03-04 13:38:24 0 [Note] InnoDB: End of log at LSN=46654
            2025-03-04 13:38:24 0 [ERROR] InnoDB: The log was only scanned up to 46654, while the current LSN at the time of the latest checkpoint 46654 was 48790!
            2025-03-04 13:38:24 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
            

            axel Axel Schwenke added a comment - I have also seen the branch 10.11- MDEV-36024 -2 . For that one the server crashed after loading the data set and then could not recover (as expected). Error log: 2025-03-04 13:38:20 0 [Note] /home/axel/builds/mariadb-community-10.11-MDEV-36024-2-7d0a4b066e2/libexec/mariadbd: ready for connections. Version: '10.11.12-MariaDB' socket: '/tmp/mysqld.sock.sysbench' port: 3306 Source distribution 250304 13:38:24 mysqld_safe Number of processes running now: 0 250304 13:38:24 mysqld_safe mysqld restarted 2025-03-04 13:38:24 0 [Note] Starting MariaDB 10.11.12-MariaDB source revision 7d0a4b066e22d912504bb60b2331335de98d43d8 server_uid pvM4d5sPnFo+IUOY/yoK02SWRu4= as process 3580697 2025-03-04 13:38:24 0 [Note] mariadbd: Aria engine: starting recovery recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (0.0 seconds); tables to flush: 2 1 0 (0.0 seconds); 2025-03-04 13:38:24 0 [Note] mariadbd: Aria engine: recovery done 2025-03-04 13:38:24 0 [Note] InnoDB: Compressed tables use zlib 1.3 2025-03-04 13:38:24 0 [Note] InnoDB: Number of transaction pools: 1 2025-03-04 13:38:24 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions 2025-03-04 13:38:24 0 [Note] InnoDB: Using liburing 2025-03-04 13:38:24 0 [Note] InnoDB: Initializing buffer pool, total size = 16.000GiB, chunk size = 256.000MiB 2025-03-04 13:38:24 0 [Note] InnoDB: Completed initialization of buffer pool 2025-03-04 13:38:24 0 [Note] InnoDB: File system buffers for log disabled (block size=4096 bytes) 2025-03-04 13:38:24 0 [Note] InnoDB: End of log at LSN=46654 2025-03-04 13:38:24 0 [ERROR] InnoDB: The log was only scanned up to 46654, while the current LSN at the time of the latest checkpoint 46654 was 48790! 2025-03-04 13:38:24 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error

            People

              marko Marko Mäkelä
              axel Axel Schwenke
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.