Details
-
New Feature
-
Status: Open (View Workflow)
-
Critical
-
Resolution: Unresolved
-
Server 12.1 dev sprint
Description
Description:
As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues.
Acceptance Criteria:
For each sampled session, collect the metrics listed in the example table below.
Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations).
For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'.
For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits).
Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible).
Example Metrics Table:
The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events.
Column Name | Data Type | Description |
---|---|---|
sample_time | TIMESTAMP | Time when the sample was taken |
session_id | INT | Unique identifier for the session |
user | VARCHAR | Username associated with the session |
host | VARCHAR | Host from which the session is connected |
database | VARCHAR | Current database of the session |
connection_type | VARCHAR | Type of connection (e.g., TCP/IP, socket) |
connection_start_time | TIMESTAMP | Time when the session started |
current_sql | TEXT | SQL statement being executed |
sql_id | VARCHAR | Hash or ID of the SQL statement |
query_phase | VARCHAR | Current phase of query execution (e.g., parsing, executing) |
rows_processed | INT | Number of rows processed by the query |
temp_tables_used | INT | Number of temporary tables used |
wait_event | VARCHAR | Specific wait event (e.g., 'lock wait', 'I/O wait') |
wait_event_group | VARCHAR | Group of the wait event (e.g., 'CPU', 'IO', 'Lock') |
wait_time | FLOAT | Time spent waiting on the event (in seconds) |
event_specific_details | VARCHAR | Additional details about the wait event |
cpu_time | FLOAT | CPU time used by the session (in seconds) |
io_operations | INT | Number of I/O operations (reads/writes) |
memory_usage | INT | Memory used by the session (in bytes) |
network_usage | INT | Network bytes sent/received |
transaction_id | INT | Current transaction ID |
transaction_state | VARCHAR | State of the transaction (e.g., active, committed) |
transaction_start_time | TIMESTAMP | Start time of the transaction |
isolation_level | VARCHAR | Isolation level of the transaction |
locks_held | INT | Number of locks held by the session |
locks_waited_on | INT | Number of locks the session is waiting on |
blocking_session_id | INT | ID of the session blocking this one |
lock_wait_time | FLOAT | Time spent waiting for locks (in seconds) |
client_application | VARCHAR | Name of the client application |
optimizer_statistics | VARCHAR | Optimizer decisions (e.g., join type) |
error_flags | VARCHAR | Any error or warning flags |
thread_state | VARCHAR | State of the thread (e.g., executing, waiting, sleeping) |
priority | INT | Priority of the thread |
cpu_core_affinity | INT | CPU core affinity of the thread |
Notes on Wait Event Grouping:
'CPU': Assigned when the session is actively executing (e.g., wait_event = 'on CPU').
'IO': For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context.
'Lock': For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock').
*'Network': *For network-related delays.
'Commit': For waits related to transaction commits (e.g., redo log or I/O during commit).
'Idle': For sessions in a sleep state (though typically not sampled unless configured otherwise).
This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category.
Server Enhancement Requirements:
Instrumentation: Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments.
Wait Event Grouping Logic: Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table.
Optimization: Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary.
Attachments
Issue Links
- blocks
-
MDEV-36407 Store Sampled Performance Data Efficiently for Performance Analysis
-
- Open
-
- is blocked by
-
MDEV-36404 Implement Session Sampling Mechanism for Performance Analysis
-
- Open
-
Activity
Field | Original Value | New Value |
---|---|---|
Link | This issue is blocked by MDEV-36404 [ MDEV-36404 ] |
Labels | Monitoring Performance |
Description |
*Description:*
As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues. *Acceptance Criteria:* For each sampled session, collect the metrics listed in the example table below. Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations). For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'. For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits). Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible). *Example Metrics Table:* The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events. ||Heading 1||Heading 2|| |Column Name Data Type Description sample_time TIMESTAMP Time when the sample was taken session_id INT Unique identifier for the session user VARCHAR Username associated with the session host VARCHAR Host from which the session is connected database VARCHAR Current database of the session connection_type VARCHAR Type of connection (e.g., TCP/IP, socket) connection_start_time TIMESTAMP Time when the session started current_sql TEXT SQL statement being executed sql_id VARCHAR Hash or ID of the SQL statement query_phase VARCHAR Current phase of query execution (e.g., parsing, executing) rows_processed INT Number of rows processed by the query temp_tables_used INT Number of temporary tables used wait_event VARCHAR Specific wait event (e.g., 'lock wait', 'I/O wait') wait_event_group VARCHAR Group of the wait event (e.g., 'CPU', 'IO', 'Lock') wait_time FLOAT Time spent waiting on the event (in seconds) event_specific_details VARCHAR Additional details about the wait event cpu_time FLOAT CPU time used by the session (in seconds) io_operations INT Number of I/O operations (reads/writes) memory_usage INT Memory used by the session (in bytes) network_usage INT Network bytes sent/received transaction_id INT Current transaction ID transaction_state VARCHAR State of the transaction (e.g., active, committed) transaction_start_time TIMESTAMP Start time of the transaction isolation_level VARCHAR Isolation level of the transaction locks_held INT Number of locks held by the session locks_waited_on INT Number of locks the session is waiting on blocking_session_id INT ID of the session blocking this one lock_wait_time FLOAT Time spent waiting for locks (in seconds) client_application VARCHAR Name of the client application optimizer_statistics VARCHAR Optimizer decisions (e.g., join type) error_flags VARCHAR Any error or warning flags thread_state VARCHAR State of the thread (e.g., executing, waiting, sleeping) priority INT Priority of the thread cpu_core_affinity INT CPU core affinity of the thread|Col A2| *Notes on Wait Event Grouping:* 'CPU': Assigned when the session is actively executing (e.g., wait_event = 'on CPU'). 'IO': For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context. 'Lock': For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock'). 'Network': For network-related delays. 'Commit': For waits related to transaction commits (e.g., redo log or I/O during commit). 'Idle': For sessions in a sleep state (though typically not sampled unless configured otherwise). This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category. *Server Enhancement Requirements:* *Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments. *Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table. *Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary. |
*Description:*
As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues. *Acceptance Criteria:* For each sampled session, collect the metrics listed in the example table below. Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations). For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'. For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits). Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible). *Example Metrics Table:* The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events. ||Column Name||Data Type||Description|| |sample_time|TIMESTAMP|Time when the sample was taken| |session_id|INT|Unique identifier for the session| |user|VARCHAR|Username associated with the session| |host|VARCHAR|Host from which the session is connected| |database|VARCHAR|Current database of the session| |connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)| |connection_start_time|TIMESTAMP|Time when the session started| |current_sql|TEXT|SQL statement being executed| |sql_id|VARCHAR|Hash or ID of the SQL statement| |query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)| |rows_processed|INT|Number of rows processed by the query| |temp_tables_used|INT|Number of temporary tables used| |wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')| |wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')| |wait_time|FLOAT|Time spent waiting on the event (in seconds)| |event_specific_details|VARCHAR|Additional details about the wait event| |cpu_time|FLOAT|CPU time used by the session (in seconds)| |io_operations|INT|Number of I/O operations (reads/writes)| |memory_usage|INT|Memory used by the session (in bytes)| |network_usage|INT|Network bytes sent/received| |transaction_id|INT|Current transaction ID| |transaction_state|VARCHAR|State of the transaction (e.g., active, committed)| |transaction_start_time|TIMESTAMP|Start time of the transaction| |isolation_level|VARCHAR|Isolation level of the transaction| |locks_held|INT|Number of locks held by the session| |locks_waited_on|INT|Number of locks the session is waiting on| |blocking_session_id|INT|ID of the session blocking this one| |lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)| |client_application|VARCHAR|Name of the client application| |optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)| |error_flags|VARCHAR|Any error or warning flags| |thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)| |priority|INT|Priority of the thread| |cpu_core_affinity|INT|CPU core affinity of the thread| *Notes on Wait Event Grouping:* 'CPU': Assigned when the session is actively executing (e.g., wait_event = 'on CPU'). 'IO': For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context. 'Lock': For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock'). 'Network': For network-related delays. 'Commit': For waits related to transaction commits (e.g., redo log or I/O during commit). 'Idle': For sessions in a sleep state (though typically not sampled unless configured otherwise). This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category. *Server Enhancement Requirements:* *Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments. *Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table. *Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary. |
Description |
*Description:*
As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues. *Acceptance Criteria:* For each sampled session, collect the metrics listed in the example table below. Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations). For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'. For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits). Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible). *Example Metrics Table:* The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events. ||Column Name||Data Type||Description|| |sample_time|TIMESTAMP|Time when the sample was taken| |session_id|INT|Unique identifier for the session| |user|VARCHAR|Username associated with the session| |host|VARCHAR|Host from which the session is connected| |database|VARCHAR|Current database of the session| |connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)| |connection_start_time|TIMESTAMP|Time when the session started| |current_sql|TEXT|SQL statement being executed| |sql_id|VARCHAR|Hash or ID of the SQL statement| |query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)| |rows_processed|INT|Number of rows processed by the query| |temp_tables_used|INT|Number of temporary tables used| |wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')| |wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')| |wait_time|FLOAT|Time spent waiting on the event (in seconds)| |event_specific_details|VARCHAR|Additional details about the wait event| |cpu_time|FLOAT|CPU time used by the session (in seconds)| |io_operations|INT|Number of I/O operations (reads/writes)| |memory_usage|INT|Memory used by the session (in bytes)| |network_usage|INT|Network bytes sent/received| |transaction_id|INT|Current transaction ID| |transaction_state|VARCHAR|State of the transaction (e.g., active, committed)| |transaction_start_time|TIMESTAMP|Start time of the transaction| |isolation_level|VARCHAR|Isolation level of the transaction| |locks_held|INT|Number of locks held by the session| |locks_waited_on|INT|Number of locks the session is waiting on| |blocking_session_id|INT|ID of the session blocking this one| |lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)| |client_application|VARCHAR|Name of the client application| |optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)| |error_flags|VARCHAR|Any error or warning flags| |thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)| |priority|INT|Priority of the thread| |cpu_core_affinity|INT|CPU core affinity of the thread| *Notes on Wait Event Grouping:* 'CPU': Assigned when the session is actively executing (e.g., wait_event = 'on CPU'). 'IO': For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context. 'Lock': For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock'). 'Network': For network-related delays. 'Commit': For waits related to transaction commits (e.g., redo log or I/O during commit). 'Idle': For sessions in a sleep state (though typically not sampled unless configured otherwise). This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category. *Server Enhancement Requirements:* *Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments. *Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table. *Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary. |
*Description:*
As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues. *Acceptance Criteria:* For each sampled session, collect the metrics listed in the example table below. Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations). For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'. For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits). Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible). *Example Metrics Table:* The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events. ||Column Name||Data Type||Description|| |sample_time|TIMESTAMP|Time when the sample was taken| |session_id|INT|Unique identifier for the session| |user|VARCHAR|Username associated with the session| |host|VARCHAR|Host from which the session is connected| |database|VARCHAR|Current database of the session| |connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)| |connection_start_time|TIMESTAMP|Time when the session started| |current_sql|TEXT|SQL statement being executed| |sql_id|VARCHAR|Hash or ID of the SQL statement| |query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)| |rows_processed|INT|Number of rows processed by the query| |temp_tables_used|INT|Number of temporary tables used| |wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')| |wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')| |wait_time|FLOAT|Time spent waiting on the event (in seconds)| |event_specific_details|VARCHAR|Additional details about the wait event| |cpu_time|FLOAT|CPU time used by the session (in seconds)| |io_operations|INT|Number of I/O operations (reads/writes)| |memory_usage|INT|Memory used by the session (in bytes)| |network_usage|INT|Network bytes sent/received| |transaction_id|INT|Current transaction ID| |transaction_state|VARCHAR|State of the transaction (e.g., active, committed)| |transaction_start_time|TIMESTAMP|Start time of the transaction| |isolation_level|VARCHAR|Isolation level of the transaction| |locks_held|INT|Number of locks held by the session| |locks_waited_on|INT|Number of locks the session is waiting on| |blocking_session_id|INT|ID of the session blocking this one| |lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)| |client_application|VARCHAR|Name of the client application| |optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)| |error_flags|VARCHAR|Any error or warning flags| |thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)| |priority|INT|Priority of the thread| |cpu_core_affinity|INT|CPU core affinity of the thread| *Notes on Wait Event Grouping:* *'CPU': *Assigned when the session is actively executing (e.g., wait_event = 'on CPU'). *'IO': *For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context. *'Lock': *For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock'). *'Network': *For network-related delays. *'Commit':* For waits related to transaction commits (e.g., redo log or I/O during commit). *'Idle':* For sessions in a sleep state (though typically not sampled unless configured otherwise). This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category. *Server Enhancement Requirements:* *Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments. *Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table. *Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary. |
Description |
*Description:*
As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues. *Acceptance Criteria:* For each sampled session, collect the metrics listed in the example table below. Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations). For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'. For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits). Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible). *Example Metrics Table:* The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events. ||Column Name||Data Type||Description|| |sample_time|TIMESTAMP|Time when the sample was taken| |session_id|INT|Unique identifier for the session| |user|VARCHAR|Username associated with the session| |host|VARCHAR|Host from which the session is connected| |database|VARCHAR|Current database of the session| |connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)| |connection_start_time|TIMESTAMP|Time when the session started| |current_sql|TEXT|SQL statement being executed| |sql_id|VARCHAR|Hash or ID of the SQL statement| |query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)| |rows_processed|INT|Number of rows processed by the query| |temp_tables_used|INT|Number of temporary tables used| |wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')| |wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')| |wait_time|FLOAT|Time spent waiting on the event (in seconds)| |event_specific_details|VARCHAR|Additional details about the wait event| |cpu_time|FLOAT|CPU time used by the session (in seconds)| |io_operations|INT|Number of I/O operations (reads/writes)| |memory_usage|INT|Memory used by the session (in bytes)| |network_usage|INT|Network bytes sent/received| |transaction_id|INT|Current transaction ID| |transaction_state|VARCHAR|State of the transaction (e.g., active, committed)| |transaction_start_time|TIMESTAMP|Start time of the transaction| |isolation_level|VARCHAR|Isolation level of the transaction| |locks_held|INT|Number of locks held by the session| |locks_waited_on|INT|Number of locks the session is waiting on| |blocking_session_id|INT|ID of the session blocking this one| |lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)| |client_application|VARCHAR|Name of the client application| |optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)| |error_flags|VARCHAR|Any error or warning flags| |thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)| |priority|INT|Priority of the thread| |cpu_core_affinity|INT|CPU core affinity of the thread| *Notes on Wait Event Grouping:* *'CPU': *Assigned when the session is actively executing (e.g., wait_event = 'on CPU'). *'IO': *For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context. *'Lock': *For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock'). *'Network': *For network-related delays. *'Commit':* For waits related to transaction commits (e.g., redo log or I/O during commit). *'Idle':* For sessions in a sleep state (though typically not sampled unless configured otherwise). This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category. *Server Enhancement Requirements:* *Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments. *Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table. *Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary. |
*Description:*
As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues. *Acceptance Criteria:* For each sampled session, collect the metrics listed in the example table below. Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations). For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'. For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits). Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible). *Example Metrics Table:* The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events. ||Column Name||Data Type||Description|| |sample_time|TIMESTAMP|Time when the sample was taken| |session_id|INT|Unique identifier for the session| |user|VARCHAR|Username associated with the session| |host|VARCHAR|Host from which the session is connected| |database|VARCHAR|Current database of the session| |connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)| |connection_start_time|TIMESTAMP|Time when the session started| |current_sql|TEXT|SQL statement being executed| |sql_id|VARCHAR|Hash or ID of the SQL statement| |query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)| |rows_processed|INT|Number of rows processed by the query| |temp_tables_used|INT|Number of temporary tables used| |wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')| |wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')| |wait_time|FLOAT|Time spent waiting on the event (in seconds)| |event_specific_details|VARCHAR|Additional details about the wait event| |cpu_time|FLOAT|CPU time used by the session (in seconds)| |io_operations|INT|Number of I/O operations (reads/writes)| |memory_usage|INT|Memory used by the session (in bytes)| |network_usage|INT|Network bytes sent/received| |transaction_id|INT|Current transaction ID| |transaction_state|VARCHAR|State of the transaction (e.g., active, committed)| |transaction_start_time|TIMESTAMP|Start time of the transaction| |isolation_level|VARCHAR|Isolation level of the transaction| |locks_held|INT|Number of locks held by the session| |locks_waited_on|INT|Number of locks the session is waiting on| |blocking_session_id|INT|ID of the session blocking this one| |lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)| |client_application|VARCHAR|Name of the client application| |optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)| |error_flags|VARCHAR|Any error or warning flags| |thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)| |priority|INT|Priority of the thread| |cpu_core_affinity|INT|CPU core affinity of the thread| *Notes on Wait Event Grouping:* *'CPU': *Assigned when the session is actively executing (e.g., wait_event = 'on CPU'). *'IO': *For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context. *'Lock': *For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock'). *'Network': *For network-related delays. *'Commit':* For waits related to transaction commits (e.g., redo log or I/O during commit). *'Idle':* For sessions in a sleep state (though typically not sampled unless configured otherwise). This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category. *Server Enhancement Requirements:* *Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments. *Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table. *Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary. |
Summary | Collect Session State Data | Collect Session State Data for Performance Analysis |
Link | This issue blocks MDEV-36407 [ MDEV-36407 ] |
Description |
*Description:*
As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues. *Acceptance Criteria:* For each sampled session, collect the metrics listed in the example table below. Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations). For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'. For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits). Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible). *Example Metrics Table:* The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events. ||Column Name||Data Type||Description|| |sample_time|TIMESTAMP|Time when the sample was taken| |session_id|INT|Unique identifier for the session| |user|VARCHAR|Username associated with the session| |host|VARCHAR|Host from which the session is connected| |database|VARCHAR|Current database of the session| |connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)| |connection_start_time|TIMESTAMP|Time when the session started| |current_sql|TEXT|SQL statement being executed| |sql_id|VARCHAR|Hash or ID of the SQL statement| |query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)| |rows_processed|INT|Number of rows processed by the query| |temp_tables_used|INT|Number of temporary tables used| |wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')| |wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')| |wait_time|FLOAT|Time spent waiting on the event (in seconds)| |event_specific_details|VARCHAR|Additional details about the wait event| |cpu_time|FLOAT|CPU time used by the session (in seconds)| |io_operations|INT|Number of I/O operations (reads/writes)| |memory_usage|INT|Memory used by the session (in bytes)| |network_usage|INT|Network bytes sent/received| |transaction_id|INT|Current transaction ID| |transaction_state|VARCHAR|State of the transaction (e.g., active, committed)| |transaction_start_time|TIMESTAMP|Start time of the transaction| |isolation_level|VARCHAR|Isolation level of the transaction| |locks_held|INT|Number of locks held by the session| |locks_waited_on|INT|Number of locks the session is waiting on| |blocking_session_id|INT|ID of the session blocking this one| |lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)| |client_application|VARCHAR|Name of the client application| |optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)| |error_flags|VARCHAR|Any error or warning flags| |thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)| |priority|INT|Priority of the thread| |cpu_core_affinity|INT|CPU core affinity of the thread| *Notes on Wait Event Grouping:* *'CPU': *Assigned when the session is actively executing (e.g., wait_event = 'on CPU'). *'IO': *For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context. *'Lock': *For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock'). *'Network': *For network-related delays. *'Commit':* For waits related to transaction commits (e.g., redo log or I/O during commit). *'Idle':* For sessions in a sleep state (though typically not sampled unless configured otherwise). This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category. *Server Enhancement Requirements:* *Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments. *Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table. *Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary. |
*Description:*
As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues. *Acceptance Criteria:* For each sampled session, collect the metrics listed in the example table below. Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations). For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'. For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits). Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible). *Example Metrics Table:* The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events. ||Column Name||Data Type||Description|| |sample_time|TIMESTAMP|Time when the sample was taken| |session_id|INT|Unique identifier for the session| |user|VARCHAR|Username associated with the session| |host|VARCHAR|Host from which the session is connected| |database|VARCHAR|Current database of the session| |connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)| |connection_start_time|TIMESTAMP|Time when the session started| |current_sql|TEXT|SQL statement being executed| |sql_id|VARCHAR|Hash or ID of the SQL statement| |query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)| |rows_processed|INT|Number of rows processed by the query| |temp_tables_used|INT|Number of temporary tables used| |wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')| |wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')| |wait_time|FLOAT|Time spent waiting on the event (in seconds)| |event_specific_details|VARCHAR|Additional details about the wait event| |cpu_time|FLOAT|CPU time used by the session (in seconds)| |io_operations|INT|Number of I/O operations (reads/writes)| |memory_usage|INT|Memory used by the session (in bytes)| |network_usage|INT|Network bytes sent/received| |transaction_id|INT|Current transaction ID| |transaction_state|VARCHAR|State of the transaction (e.g., active, committed)| |transaction_start_time|TIMESTAMP|Start time of the transaction| |isolation_level|VARCHAR|Isolation level of the transaction| |locks_held|INT|Number of locks held by the session| |locks_waited_on|INT|Number of locks the session is waiting on| |blocking_session_id|INT|ID of the session blocking this one| |lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)| |client_application|VARCHAR|Name of the client application| |optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)| |error_flags|VARCHAR|Any error or warning flags| |thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)| |priority|INT|Priority of the thread| |cpu_core_affinity|INT|CPU core affinity of the thread| *Notes on Wait Event Grouping:* *'CPU':* Assigned when the session is actively executing (e.g., wait_event = 'on CPU'). *'IO':* For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context. *'Lock':* For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock'). *'Network': *For network-related delays. *'Commit':* For waits related to transaction commits (e.g., redo log or I/O during commit). *'Idle':* For sessions in a sleep state (though typically not sampled unless configured otherwise). This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category. *Server Enhancement Requirements:* *Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments. *Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table. *Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary. |
Sprint | Server 12.1 dev sprint [ 793 ] |
Priority | Major [ 3 ] | Critical [ 2 ] |