MDEV-39695 (MariaDB Server)

Refactor InnoDB page encryption architecture to eliminate thread contention and consolidate flush logic

Details

    Description

      Current Problem
      =================
      The current background encryption key rotation architecture relies on multiple "encryption threads"
      (fil_crypt_thread) acting as producers: they perform dummy writes that dirty data pages within the
      buffer pool. This design introduces significant inefficiencies and architectural bottlenecks:

      • The actual CPU-intensive operations, buf_page_encrypt() and CRC-32C computation, are bottlenecked
        inside a single consumer thread: buf_flush_page_cleaner(). Having multiple threads dirtying
        pages provides no apparent throughput benefit while heavily taxing the buffer pool.

      • Multiple fil_crypt_thread instances concurrently increase the size of buf_pool.flush_list
        and invoke buf_flush_list_space(). This repeatedly invalidates the hazard pointer
        buf_pool.flush_hp, disrupting normal page flushing activity.

      • When pages must be read into the buffer pool for encryption, the current implementation
        invokes synchronous buf_page_read() via buf_page_get_gen(), blocking progress.

      Proposed Solution
      =================
      Tightly couple the encryption/rotation process directly
      into the buf_flush_page_cleaner() loop:

      Eliminate the independent page-dirtying fil_crypt_thread loops.
      Let buf_flush_page_cleaner() manage key rotation natively during its regular buffer pool scans.

      Embed an "encryption step" processing missed pages from buf_flush_list_holding_mutex() and buf_flush_LRU().
      This allows key rotation to naturally respect the innodb_io_capacity budget and scale
      back during high application workloads (e.g., when threads are blocked in buf_flush_wait()).

      Replace synchronous page reads with asynchronous reads (keeping future MDEV-11378 compliance in mind).
      The read completion callback will write the dummy log record to queue the page for re-encryption.

      Repurpose innodb_encrypt_threads to act as parallel worker tasks managed by buf_flush_page_cleaner().
      Split buf_page_t::flush() so the single cleaner thread handles the initial phase, offloading the
      heavy encryption/CRC-32C computation tasks to these workers via a task queue.

      Move relevant elements of fil_space_rotate_state_t and key_state_t into fil_space_t or fil_space_crypt_t.
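
The proposed split of buf_page_t::flush() can be pictured with a toy model: one "cleaner" thread selects the pages, and a pool of workers does the heavy per-page work. This is only a structural sketch, not InnoDB code. The names encrypt_and_checksum(), cleaner_batch() and n_workers are hypothetical, the XOR step is a placeholder for buf_page_encrypt(), and zlib.crc32 (plain CRC-32) stands in for CRC-32C. CPython threads also do not run CPU-bound work in parallel, so the sketch shows the hand-off pattern rather than any speedup.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 16384  # default InnoDB page size in bytes


def encrypt_and_checksum(page_no, payload, key):
    """Worker phase (hypothetical): the heavy per-page computation."""
    # XOR with a one-byte key is a placeholder, NOT real page encryption.
    encrypted = bytes(b ^ key for b in payload)
    # zlib.crc32 is plain CRC-32, used only as a stand-in for CRC-32C.
    return page_no, encrypted, zlib.crc32(encrypted)


def cleaner_batch(dirty_pages, key, n_workers):
    """Cleaner phase (hypothetical): a single thread queues pages for the workers."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(encrypt_and_checksum, no, data, key)
                   for no, data in dirty_pages]
        # Collect in submission order so the write-out order stays deterministic.
        return [f.result() for f in futures]


# Eight fake pages; n_workers plays the role of innodb_encrypt_threads.
pages = [(n, bytes([n]) * PAGE_SIZE) for n in range(8)]
results = cleaner_batch(pages, key=0x5A, n_workers=4)
```

In this model the cleaner stays the only producer of work, so the worker count can change without touching the page selection logic.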

      A dedicated testing track must focus entirely on isolating the behavior of the innodb_encrypt_threads
      parameter. Because the current architecture splits the workflow between multiple page-dirtying
      producer threads and a single page-cleaning consumer thread, tests should explicitly scale
      innodb_encrypt_threads from 1 up to higher parallel counts (e.g., 4, 8, and 16)
      under identical hardware configurations.

      Updated Encryption Performance Testing Matrix

      To accurately evaluate the system under continuous stress, all tests must be executed with innodb_encrypt_tables
      set to always encrypt / force re-encryption. This ensures that key rotation activity never goes idle,
      keeping the fil_crypt_thread loops continuously active and constantly forcing page-dirtying behavior throughout
      the entire duration of the benchmark. The revised testing suite eliminates standard read-only
      profiles—which mask true engine contention—and instead relies on four distinct write-driven configurations
      mapped across varying thread scales (1, 4, 8, and 16 threads).

      Test TP-01 (Read-Heavy / Light-Write): This configuration pairs a small innodb_buffer_pool_size
      (forcing high page churn from disk) with a large innodb_log_file_size (minimizing log boundary pressure)
      to specifically isolate synchronous read interference. It measures how much the synchronous buf_page_read() calls
      from the encryption threads delay legitimate application reads when the buffer pool is under
      constant replacement pressure.

      Test TP-02 (Balanced Read-Write): Utilizing a standard OLTP workload, this setup leverages a large buffer pool
      (ensuring an in-memory operational fit) and a large log file size to evaluate hazard pointer contention.
      It directly tracks the rate at which buf_pool.flush_hp is invalidated by concurrent buf_flush_list_space()
      calls coming from the active encryption threads against normal background transactional flushes.

      Test TP-03 (Write-Heavy): This profile uses bulk inserts/updates alongside a large
      buffer pool and a small log file size to evaluate I/O capacity and back-off budgeting. It determines
      whether the background rotation loops correctly scale back under aggressive, immediate flushing
      pressure, or if they blindly consume the innodb_io_capacity budget while user threads starve for log space.

      Test TP-04 (High-Saturation Stress): This profile combines a small buffer pool with a small log file,
      putting both under severe pressure at once. By forcing extreme flushing contention from user transactions
      and key rotation threads simultaneously, it creates the precise boundary conditions needed to expose and verify
      the fix for the thread lockup vulnerability.
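
The four profiles crossed with the thread scales give 16 runs. A minimal sketch of enumerating them follows, assuming a hypothetical build_matrix() helper. The "small" and "large" values are relative labels copied from the descriptions above, since the ticket does not give concrete sizes.

```python
from itertools import product

# Relative sizing per profile, as described in TP-01..TP-04.
# "small"/"large" are labels only; concrete sizes are not specified.
PROFILES = {
    "TP-01": {"workload": "read-heavy / light-write",
              "innodb_buffer_pool_size": "small", "innodb_log_file_size": "large"},
    "TP-02": {"workload": "balanced read-write (OLTP)",
              "innodb_buffer_pool_size": "large", "innodb_log_file_size": "large"},
    "TP-03": {"workload": "write-heavy (bulk inserts/updates)",
              "innodb_buffer_pool_size": "large", "innodb_log_file_size": "small"},
    "TP-04": {"workload": "high-saturation stress",
              "innodb_buffer_pool_size": "small", "innodb_log_file_size": "small"},
}
ENCRYPT_THREADS = (1, 4, 8, 16)


def build_matrix():
    """Return one dict per (profile, innodb_encrypt_threads) combination."""
    return [
        {"test": name, "innodb_encrypt_threads": threads, **PROFILES[name]}
        for name, threads in product(PROFILES, ENCRYPT_THREADS)
    ]


matrix = build_matrix()
for run in matrix:
    print(run["test"], run["innodb_encrypt_threads"], run["workload"])
```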

      Measure TPS and also essential metrics for buffer pool contention:

      SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests';
      SHOW STATUS LIKE 'Innodb_buffer_pool_reads';
      SHOW STATUS LIKE 'Innodb_buffer_pool_wait_free';

      From these variables, we can derive the following:

      Hit_rate = (read_requests - reads) / read_requests * 100%
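
As a quick check of the Hit_rate formula, here is a small sketch. The hit_rate() helper and the sample counter values are made up for illustration, and the zero-request guard is an added assumption.

```python
def hit_rate(read_requests: int, reads: int) -> float:
    """Buffer pool hit rate in percent.

    read_requests: Innodb_buffer_pool_read_requests (logical reads)
    reads:         Innodb_buffer_pool_reads (reads that went to disk)
    """
    if read_requests == 0:
        return 0.0  # assumption: report 0 when nothing was requested yet
    return (read_requests - reads) / read_requests * 100.0


# Illustrative numbers only: 1000 logical reads, 250 of them from disk.
example_hit_rate = hit_rate(1000, 250)
print(f"Hit rate: {example_hit_rate:.1f}%")
```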

      SHOW STATUS LIKE 'Innodb_buffer_pool_reads';    (record the value as reads_before)
      ... wait 10 sec ...
      SHOW STATUS LIKE 'Innodb_buffer_pool_reads';    (record the value as reads_now)

      Physical_rd_per_sec = (reads_now - reads_before) / time_interval

      Flush_list_growth_rate = (flush_list_now - flush_list_before) / time_interval
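
Both rates are the same calculation: the difference between two samples divided by the sampling interval. A minimal sketch follows, assuming a hypothetical rate_per_sec() helper. The sample values are invented, and how the flush list length is sampled is not specified in this ticket.

```python
def rate_per_sec(before: float, now: float, interval_sec: float) -> float:
    """Change per second between two samples taken interval_sec apart."""
    if interval_sec <= 0:
        raise ValueError("interval_sec must be positive")
    return (now - before) / interval_sec


# Illustrative samples taken 10 seconds apart (values are made up).
physical_rd_per_sec = rate_per_sec(before=120_000, now=121_500, interval_sec=10)
flush_list_growth_rate = rate_per_sec(before=4_000, now=4_600, interval_sec=10)

print(physical_rd_per_sec, flush_list_growth_rate)
```

A negative result for the flush list simply means the list shrank during the interval.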

      The expectation is that, with the current architecture, more encryption threads lead to more TPS degradation.

          People

            thiru Thirunarayanan Balathandayuthapani