Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36406

Collect Session State Data for Performance Analysis

Details

    • Server 12.1 dev sprint

    Description

      Description:

      As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues.

      Acceptance Criteria:

      For each sampled session, collect the metrics listed in the example table below.
      Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations).

      For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'.

      For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits).

      Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible).

      Example Metrics Table:

      The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events.

      Column Name Data Type Description
      sample_time TIMESTAMP Time when the sample was taken
      session_id INT Unique identifier for the session
      user VARCHAR Username associated with the session
      host VARCHAR Host from which the session is connected
      database VARCHAR Current database of the session
      connection_type VARCHAR Type of connection (e.g., TCP/IP, socket)
      connection_start_time TIMESTAMP Time when the session started
      current_sql TEXT SQL statement being executed
      sql_id VARCHAR Hash or ID of the SQL statement
      query_phase VARCHAR Current phase of query execution (e.g., parsing, executing)
      rows_processed INT Number of rows processed by the query
      temp_tables_used INT Number of temporary tables used
      wait_event VARCHAR Specific wait event (e.g., 'lock wait', 'I/O wait')
      wait_event_group VARCHAR Group of the wait event (e.g., 'CPU', 'IO', 'Lock')
      wait_time FLOAT Time spent waiting on the event (in seconds)
      event_specific_details VARCHAR Additional details about the wait event
      cpu_time FLOAT CPU time used by the session (in seconds)
      io_operations INT Number of I/O operations (reads/writes)
      memory_usage INT Memory used by the session (in bytes)
      network_usage INT Network bytes sent/received
      transaction_id INT Current transaction ID
      transaction_state VARCHAR State of the transaction (e.g., active, committed)
      transaction_start_time TIMESTAMP Start time of the transaction
      isolation_level VARCHAR Isolation level of the transaction
      locks_held INT Number of locks held by the session
      locks_waited_on INT Number of locks the session is waiting on
      blocking_session_id INT ID of the session blocking this one
      lock_wait_time FLOAT Time spent waiting for locks (in seconds)
      client_application VARCHAR Name of the client application
      optimizer_statistics VARCHAR Optimizer decisions (e.g., join type)
      error_flags VARCHAR Any error or warning flags
      thread_state VARCHAR State of the thread (e.g., executing, waiting, sleeping)
      priority INT Priority of the thread
      cpu_core_affinity INT CPU core affinity of the thread

      Notes on Wait Event Grouping:

      'CPU': Assigned when the session is actively executing (e.g., wait_event = 'on CPU').
      'IO': For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context.
      'Lock': For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock').
      *'Network': *For network-related delays.
      'Commit': For waits related to transaction commits (e.g., redo log or I/O during commit).
      'Idle': For sessions in a sleep state (though typically not sampled unless configured otherwise).

      This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category.

      Server Enhancement Requirements:

      Instrumentation: Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments.
      Wait Event Grouping Logic: Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table.
      Optimization: Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary.

      Attachments

        Issue Links

          Activity

            People

              rucha174 Rucha Deodhar
              adamluciano Adam Luciano
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.