[MDEV-36406] Collect Session State Data for Performance Analysis - Jira

Details

Type: New Feature
Status: Open (View Workflow)
Priority: Critical
Resolution: Unresolved
Fix Version/s: 12.1
Component/s: Server
Labels:
- Monitoring
- Performance

Sprint:
Server 12.1 dev sprint

Description

Description:

As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues.

Acceptance Criteria:

For each sampled session, collect the metrics listed in the example table below.
Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations).

For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'.

For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits).

Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible).

Example Metrics Table:

The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events.

Column Name	Data Type	Description
sample_time	TIMESTAMP	Time when the sample was taken
session_id	INT	Unique identifier for the session
user	VARCHAR	Username associated with the session
host	VARCHAR	Host from which the session is connected
database	VARCHAR	Current database of the session
connection_type	VARCHAR	Type of connection (e.g., TCP/IP, socket)
connection_start_time	TIMESTAMP	Time when the session started
current_sql	TEXT	SQL statement being executed
sql_id	VARCHAR	Hash or ID of the SQL statement
query_phase	VARCHAR	Current phase of query execution (e.g., parsing, executing)
rows_processed	INT	Number of rows processed by the query
temp_tables_used	INT	Number of temporary tables used
wait_event	VARCHAR	Specific wait event (e.g., 'lock wait', 'I/O wait')
wait_event_group	VARCHAR	Group of the wait event (e.g., 'CPU', 'IO', 'Lock')
wait_time	FLOAT	Time spent waiting on the event (in seconds)
event_specific_details	VARCHAR	Additional details about the wait event
cpu_time	FLOAT	CPU time used by the session (in seconds)
io_operations	INT	Number of I/O operations (reads/writes)
memory_usage	INT	Memory used by the session (in bytes)
network_usage	INT	Network bytes sent/received
transaction_id	INT	Current transaction ID
transaction_state	VARCHAR	State of the transaction (e.g., active, committed)
transaction_start_time	TIMESTAMP	Start time of the transaction
isolation_level	VARCHAR	Isolation level of the transaction
locks_held	INT	Number of locks held by the session
locks_waited_on	INT	Number of locks the session is waiting on
blocking_session_id	INT	ID of the session blocking this one
lock_wait_time	FLOAT	Time spent waiting for locks (in seconds)
client_application	VARCHAR	Name of the client application
optimizer_statistics	VARCHAR	Optimizer decisions (e.g., join type)
error_flags	VARCHAR	Any error or warning flags
thread_state	VARCHAR	State of the thread (e.g., executing, waiting, sleeping)
priority	INT	Priority of the thread
cpu_core_affinity	INT	CPU core affinity of the thread

Notes on Wait Event Grouping:

'CPU': Assigned when the session is actively executing (e.g., wait_event = 'on CPU').
'IO': For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context.
'Lock': For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock').
*'Network': *For network-related delays.
'Commit': For waits related to transaction commits (e.g., redo log or I/O during commit).
'Idle': For sessions in a sleep state (though typically not sampled unless configured otherwise).

This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category.

Server Enhancement Requirements:

Instrumentation: Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments.
Wait Event Grouping Logic: Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table.
Optimization: Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary.

Attachments

Issue Links

blocks

MDEV-36407 Store Sampled Performance Data Efficiently for Performance Analysis

Open

is blocked by

MDEV-36404 Implement Session Sampling Mechanism for Performance Analysis

Open

Activity

Ascending order - Click to sort in descending order

Adam Luciano created issue - 2025-03-26 21:42

Adam Luciano made changes - 2025-03-26 21:42

Field	Original Value	New Value
Link		This issue is blocked by MDEV-36404 [ MDEV-36404 ]

Adam Luciano made changes - 2025-03-26 21:42

Labels

Monitoring Performance

Adam Luciano made changes - 2025-03-26 21:44

Description

*Description:*

As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues.

*Acceptance Criteria:*

For each sampled session, collect the metrics listed in the example table below.
Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations).

For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'.

For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits).

Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible).

*Example Metrics Table:*

The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events.

||Heading 1||Heading 2||
|Column Name Data Type Description
sample_time TIMESTAMP Time when the sample was taken
session_id INT Unique identifier for the session
user VARCHAR Username associated with the session
host VARCHAR Host from which the session is connected
database VARCHAR Current database of the session
connection_type VARCHAR Type of connection (e.g., TCP/IP, socket)
connection_start_time TIMESTAMP Time when the session started
current_sql TEXT SQL statement being executed
sql_id VARCHAR Hash or ID of the SQL statement
query_phase VARCHAR Current phase of query execution (e.g., parsing, executing)
rows_processed INT Number of rows processed by the query
temp_tables_used INT Number of temporary tables used
wait_event VARCHAR Specific wait event (e.g., 'lock wait', 'I/O wait')
wait_event_group VARCHAR Group of the wait event (e.g., 'CPU', 'IO', 'Lock')
wait_time FLOAT Time spent waiting on the event (in seconds)
event_specific_details VARCHAR Additional details about the wait event
cpu_time FLOAT CPU time used by the session (in seconds)
io_operations INT Number of I/O operations (reads/writes)
memory_usage INT Memory used by the session (in bytes)
network_usage INT Network bytes sent/received
transaction_id INT Current transaction ID
transaction_state VARCHAR State of the transaction (e.g., active, committed)
transaction_start_time TIMESTAMP Start time of the transaction
isolation_level VARCHAR Isolation level of the transaction
locks_held INT Number of locks held by the session
locks_waited_on INT Number of locks the session is waiting on
blocking_session_id INT ID of the session blocking this one
lock_wait_time FLOAT Time spent waiting for locks (in seconds)
client_application VARCHAR Name of the client application
optimizer_statistics VARCHAR Optimizer decisions (e.g., join type)
error_flags VARCHAR Any error or warning flags
thread_state VARCHAR State of the thread (e.g., executing, waiting, sleeping)
priority INT Priority of the thread
cpu_core_affinity INT CPU core affinity of the thread|Col A2|

*Notes on Wait Event Grouping:*

'CPU': Assigned when the session is actively executing (e.g., wait_event = 'on CPU').
'IO': For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context.
'Lock': For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock').
'Network': For network-related delays.
'Commit': For waits related to transaction commits (e.g., redo log or I/O during commit).
'Idle': For sessions in a sleep state (though typically not sampled unless configured otherwise).
This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category.

*Server Enhancement Requirements:*

*Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments.
*Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table.
*Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary.

*Description:*

As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues.

*Acceptance Criteria:*

For each sampled session, collect the metrics listed in the example table below.
Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations).

For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'.

For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits).

Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible).

*Example Metrics Table:*

The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events.
||Column Name||Data Type||Description||
|sample_time|TIMESTAMP|Time when the sample was taken|
|session_id|INT|Unique identifier for the session|
|user|VARCHAR|Username associated with the session|
|host|VARCHAR|Host from which the session is connected|
|database|VARCHAR|Current database of the session|
|connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)|
|connection_start_time|TIMESTAMP|Time when the session started|
|current_sql|TEXT|SQL statement being executed|
|sql_id|VARCHAR|Hash or ID of the SQL statement|
|query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)|
|rows_processed|INT|Number of rows processed by the query|
|temp_tables_used|INT|Number of temporary tables used|
|wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')|
|wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')|
|wait_time|FLOAT|Time spent waiting on the event (in seconds)|
|event_specific_details|VARCHAR|Additional details about the wait event|
|cpu_time|FLOAT|CPU time used by the session (in seconds)|
|io_operations|INT|Number of I/O operations (reads/writes)|
|memory_usage|INT|Memory used by the session (in bytes)|
|network_usage|INT|Network bytes sent/received|
|transaction_id|INT|Current transaction ID|
|transaction_state|VARCHAR|State of the transaction (e.g., active, committed)|
|transaction_start_time|TIMESTAMP|Start time of the transaction|
|isolation_level|VARCHAR|Isolation level of the transaction|
|locks_held|INT|Number of locks held by the session|
|locks_waited_on|INT|Number of locks the session is waiting on|
|blocking_session_id|INT|ID of the session blocking this one|
|lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)|
|client_application|VARCHAR|Name of the client application|
|optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)|
|error_flags|VARCHAR|Any error or warning flags|
|thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)|
|priority|INT|Priority of the thread|
|cpu_core_affinity|INT|CPU core affinity of the thread|

*Notes on Wait Event Grouping:*

'CPU': Assigned when the session is actively executing (e.g., wait_event = 'on CPU').
'IO': For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context.
'Lock': For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock').
'Network': For network-related delays.
'Commit': For waits related to transaction commits (e.g., redo log or I/O during commit).
'Idle': For sessions in a sleep state (though typically not sampled unless configured otherwise).
This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category.

*Server Enhancement Requirements:*

*Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments.
*Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table.
*Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary.

Adam Luciano made changes - 2025-03-26 21:44

Description

*Description:*

As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues.

*Acceptance Criteria:*

For each sampled session, collect the metrics listed in the example table below.
Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations).

For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'.

For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits).

Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible).

*Example Metrics Table:*

The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events.
||Column Name||Data Type||Description||
|sample_time|TIMESTAMP|Time when the sample was taken|
|session_id|INT|Unique identifier for the session|
|user|VARCHAR|Username associated with the session|
|host|VARCHAR|Host from which the session is connected|
|database|VARCHAR|Current database of the session|
|connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)|
|connection_start_time|TIMESTAMP|Time when the session started|
|current_sql|TEXT|SQL statement being executed|
|sql_id|VARCHAR|Hash or ID of the SQL statement|
|query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)|
|rows_processed|INT|Number of rows processed by the query|
|temp_tables_used|INT|Number of temporary tables used|
|wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')|
|wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')|
|wait_time|FLOAT|Time spent waiting on the event (in seconds)|
|event_specific_details|VARCHAR|Additional details about the wait event|
|cpu_time|FLOAT|CPU time used by the session (in seconds)|
|io_operations|INT|Number of I/O operations (reads/writes)|
|memory_usage|INT|Memory used by the session (in bytes)|
|network_usage|INT|Network bytes sent/received|
|transaction_id|INT|Current transaction ID|
|transaction_state|VARCHAR|State of the transaction (e.g., active, committed)|
|transaction_start_time|TIMESTAMP|Start time of the transaction|
|isolation_level|VARCHAR|Isolation level of the transaction|
|locks_held|INT|Number of locks held by the session|
|locks_waited_on|INT|Number of locks the session is waiting on|
|blocking_session_id|INT|ID of the session blocking this one|
|lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)|
|client_application|VARCHAR|Name of the client application|
|optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)|
|error_flags|VARCHAR|Any error or warning flags|
|thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)|
|priority|INT|Priority of the thread|
|cpu_core_affinity|INT|CPU core affinity of the thread|

*Notes on Wait Event Grouping:*

'CPU': Assigned when the session is actively executing (e.g., wait_event = 'on CPU').
'IO': For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context.
'Lock': For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock').
'Network': For network-related delays.
'Commit': For waits related to transaction commits (e.g., redo log or I/O during commit).
'Idle': For sessions in a sleep state (though typically not sampled unless configured otherwise).
This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category.

*Server Enhancement Requirements:*

*Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments.
*Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table.
*Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary.

*Description:*

As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues.

*Acceptance Criteria:*

For each sampled session, collect the metrics listed in the example table below.
Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations).

For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'.

For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits).

Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible).

*Example Metrics Table:*

The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events.
||Column Name||Data Type||Description||
|sample_time|TIMESTAMP|Time when the sample was taken|
|session_id|INT|Unique identifier for the session|
|user|VARCHAR|Username associated with the session|
|host|VARCHAR|Host from which the session is connected|
|database|VARCHAR|Current database of the session|
|connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)|
|connection_start_time|TIMESTAMP|Time when the session started|
|current_sql|TEXT|SQL statement being executed|
|sql_id|VARCHAR|Hash or ID of the SQL statement|
|query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)|
|rows_processed|INT|Number of rows processed by the query|
|temp_tables_used|INT|Number of temporary tables used|
|wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')|
|wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')|
|wait_time|FLOAT|Time spent waiting on the event (in seconds)|
|event_specific_details|VARCHAR|Additional details about the wait event|
|cpu_time|FLOAT|CPU time used by the session (in seconds)|
|io_operations|INT|Number of I/O operations (reads/writes)|
|memory_usage|INT|Memory used by the session (in bytes)|
|network_usage|INT|Network bytes sent/received|
|transaction_id|INT|Current transaction ID|
|transaction_state|VARCHAR|State of the transaction (e.g., active, committed)|
|transaction_start_time|TIMESTAMP|Start time of the transaction|
|isolation_level|VARCHAR|Isolation level of the transaction|
|locks_held|INT|Number of locks held by the session|
|locks_waited_on|INT|Number of locks the session is waiting on|
|blocking_session_id|INT|ID of the session blocking this one|
|lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)|
|client_application|VARCHAR|Name of the client application|
|optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)|
|error_flags|VARCHAR|Any error or warning flags|
|thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)|
|priority|INT|Priority of the thread|
|cpu_core_affinity|INT|CPU core affinity of the thread|

*Notes on Wait Event Grouping:*

*'CPU': *Assigned when the session is actively executing (e.g., wait_event = 'on CPU').
*'IO': *For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context.
*'Lock': *For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock').
*'Network': *For network-related delays.
*'Commit':* For waits related to transaction commits (e.g., redo log or I/O during commit).
*'Idle':* For sessions in a sleep state (though typically not sampled unless configured otherwise).

This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category.

*Server Enhancement Requirements:*

*Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments.
*Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table.
*Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary.

Adam Luciano made changes - 2025-03-26 21:44

Description

*Description:*

As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues.

*Acceptance Criteria:*

For each sampled session, collect the metrics listed in the example table below.
Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations).

For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'.

For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits).

Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible).

*Example Metrics Table:*

The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events.
||Column Name||Data Type||Description||
|sample_time|TIMESTAMP|Time when the sample was taken|
|session_id|INT|Unique identifier for the session|
|user|VARCHAR|Username associated with the session|
|host|VARCHAR|Host from which the session is connected|
|database|VARCHAR|Current database of the session|
|connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)|
|connection_start_time|TIMESTAMP|Time when the session started|
|current_sql|TEXT|SQL statement being executed|
|sql_id|VARCHAR|Hash or ID of the SQL statement|
|query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)|
|rows_processed|INT|Number of rows processed by the query|
|temp_tables_used|INT|Number of temporary tables used|
|wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')|
|wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')|
|wait_time|FLOAT|Time spent waiting on the event (in seconds)|
|event_specific_details|VARCHAR|Additional details about the wait event|
|cpu_time|FLOAT|CPU time used by the session (in seconds)|
|io_operations|INT|Number of I/O operations (reads/writes)|
|memory_usage|INT|Memory used by the session (in bytes)|
|network_usage|INT|Network bytes sent/received|
|transaction_id|INT|Current transaction ID|
|transaction_state|VARCHAR|State of the transaction (e.g., active, committed)|
|transaction_start_time|TIMESTAMP|Start time of the transaction|
|isolation_level|VARCHAR|Isolation level of the transaction|
|locks_held|INT|Number of locks held by the session|
|locks_waited_on|INT|Number of locks the session is waiting on|
|blocking_session_id|INT|ID of the session blocking this one|
|lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)|
|client_application|VARCHAR|Name of the client application|
|optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)|
|error_flags|VARCHAR|Any error or warning flags|
|thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)|
|priority|INT|Priority of the thread|
|cpu_core_affinity|INT|CPU core affinity of the thread|

*Notes on Wait Event Grouping:*

*'CPU': *Assigned when the session is actively executing (e.g., wait_event = 'on CPU').
*'IO': *For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context.
*'Lock': *For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock').
*'Network': *For network-related delays.
*'Commit':* For waits related to transaction commits (e.g., redo log or I/O during commit).
*'Idle':* For sessions in a sleep state (though typically not sampled unless configured otherwise).

This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category.

*Server Enhancement Requirements:*

*Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments.
*Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table.
*Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary.

*Description:*

As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues.

*Acceptance Criteria:*

For each sampled session, collect the metrics listed in the example table below.
Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations).

For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'.

For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits).

Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible).

*Example Metrics Table:*

The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events.

||Column Name||Data Type||Description||
|sample_time|TIMESTAMP|Time when the sample was taken|
|session_id|INT|Unique identifier for the session|
|user|VARCHAR|Username associated with the session|
|host|VARCHAR|Host from which the session is connected|
|database|VARCHAR|Current database of the session|
|connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)|
|connection_start_time|TIMESTAMP|Time when the session started|
|current_sql|TEXT|SQL statement being executed|
|sql_id|VARCHAR|Hash or ID of the SQL statement|
|query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)|
|rows_processed|INT|Number of rows processed by the query|
|temp_tables_used|INT|Number of temporary tables used|
|wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')|
|wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')|
|wait_time|FLOAT|Time spent waiting on the event (in seconds)|
|event_specific_details|VARCHAR|Additional details about the wait event|
|cpu_time|FLOAT|CPU time used by the session (in seconds)|
|io_operations|INT|Number of I/O operations (reads/writes)|
|memory_usage|INT|Memory used by the session (in bytes)|
|network_usage|INT|Network bytes sent/received|
|transaction_id|INT|Current transaction ID|
|transaction_state|VARCHAR|State of the transaction (e.g., active, committed)|
|transaction_start_time|TIMESTAMP|Start time of the transaction|
|isolation_level|VARCHAR|Isolation level of the transaction|
|locks_held|INT|Number of locks held by the session|
|locks_waited_on|INT|Number of locks the session is waiting on|
|blocking_session_id|INT|ID of the session blocking this one|
|lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)|
|client_application|VARCHAR|Name of the client application|
|optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)|
|error_flags|VARCHAR|Any error or warning flags|
|thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)|
|priority|INT|Priority of the thread|
|cpu_core_affinity|INT|CPU core affinity of the thread|

*Notes on Wait Event Grouping:*

*'CPU': *Assigned when the session is actively executing (e.g., wait_event = 'on CPU').
*'IO': *For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context.
*'Lock': *For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock').
*'Network': *For network-related delays.
*'Commit':* For waits related to transaction commits (e.g., redo log or I/O during commit).
*'Idle':* For sessions in a sleep state (though typically not sampled unless configured otherwise).

This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category.

*Server Enhancement Requirements:*

*Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments.
*Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table.
*Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary.

Adam Luciano made changes - 2025-03-26 21:45

Summary

Collect Session State Data

Collect Session State Data for Performance Analysis

Adam Luciano made changes - 2025-03-26 21:50

Link

This issue blocks MDEV-36407 [ MDEV-36407 ]

Adam Luciano made changes - 2025-03-27 13:57

Description

*Description:*

As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues.

*Acceptance Criteria:*

For each sampled session, collect the metrics listed in the example table below.
Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations).

For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'.

For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits).

Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible).

*Example Metrics Table:*

The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events.

||Column Name||Data Type||Description||
|sample_time|TIMESTAMP|Time when the sample was taken|
|session_id|INT|Unique identifier for the session|
|user|VARCHAR|Username associated with the session|
|host|VARCHAR|Host from which the session is connected|
|database|VARCHAR|Current database of the session|
|connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)|
|connection_start_time|TIMESTAMP|Time when the session started|
|current_sql|TEXT|SQL statement being executed|
|sql_id|VARCHAR|Hash or ID of the SQL statement|
|query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)|
|rows_processed|INT|Number of rows processed by the query|
|temp_tables_used|INT|Number of temporary tables used|
|wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')|
|wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')|
|wait_time|FLOAT|Time spent waiting on the event (in seconds)|
|event_specific_details|VARCHAR|Additional details about the wait event|
|cpu_time|FLOAT|CPU time used by the session (in seconds)|
|io_operations|INT|Number of I/O operations (reads/writes)|
|memory_usage|INT|Memory used by the session (in bytes)|
|network_usage|INT|Network bytes sent/received|
|transaction_id|INT|Current transaction ID|
|transaction_state|VARCHAR|State of the transaction (e.g., active, committed)|
|transaction_start_time|TIMESTAMP|Start time of the transaction|
|isolation_level|VARCHAR|Isolation level of the transaction|
|locks_held|INT|Number of locks held by the session|
|locks_waited_on|INT|Number of locks the session is waiting on|
|blocking_session_id|INT|ID of the session blocking this one|
|lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)|
|client_application|VARCHAR|Name of the client application|
|optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)|
|error_flags|VARCHAR|Any error or warning flags|
|thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)|
|priority|INT|Priority of the thread|
|cpu_core_affinity|INT|CPU core affinity of the thread|

*Notes on Wait Event Grouping:*

*'CPU': *Assigned when the session is actively executing (e.g., wait_event = 'on CPU').
*'IO': *For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context.
*'Lock': *For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock').
*'Network': *For network-related delays.
*'Commit':* For waits related to transaction commits (e.g., redo log or I/O during commit).
*'Idle':* For sessions in a sleep state (though typically not sampled unless configured otherwise).

This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category.

*Server Enhancement Requirements:*

*Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments.
*Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table.
*Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary.

*Description:*

As a database administrator, I want the sampling mechanism to capture detailed state information for each active session so that I can diagnose what each session was doing during performance issues.

*Acceptance Criteria:*

For each sampled session, collect the metrics listed in the example table below.
Include a wait_event_group metric derived from wait_event, with predefined groups such as 'CPU', 'IO', 'Lock', 'Network', 'Commit', 'Idle', to categorize waits for easier troubleshooting (e.g., identifying if slow performance is due to CPU usage, I/O delays, or commit operations).

For sessions actively executing (on CPU), set wait_event_group to 'CPU' and wait_event to 'on CPU'.

For sessions waiting, assign wait_event_group based on the type of wait event (e.g., 'IO' for I/O waits, 'Lock' for lock waits).

Ensure data collection is optimized to avoid significant performance overhead (e.g., reuse existing Performance Schema hooks where possible).

*Example Metrics Table:*

The table below outlines the metrics to collect for each sampled session to support ASH-like monitoring in MariaDB. This comprehensive list ensures detailed performance analysis, incorporating feedback about grouping wait events.

||Column Name||Data Type||Description||
|sample_time|TIMESTAMP|Time when the sample was taken|
|session_id|INT|Unique identifier for the session|
|user|VARCHAR|Username associated with the session|
|host|VARCHAR|Host from which the session is connected|
|database|VARCHAR|Current database of the session|
|connection_type|VARCHAR|Type of connection (e.g., TCP/IP, socket)|
|connection_start_time|TIMESTAMP|Time when the session started|
|current_sql|TEXT|SQL statement being executed|
|sql_id|VARCHAR|Hash or ID of the SQL statement|
|query_phase|VARCHAR|Current phase of query execution (e.g., parsing, executing)|
|rows_processed|INT|Number of rows processed by the query|
|temp_tables_used|INT|Number of temporary tables used|
|wait_event|VARCHAR|Specific wait event (e.g., 'lock wait', 'I/O wait')|
|wait_event_group|VARCHAR|Group of the wait event (e.g., 'CPU', 'IO', 'Lock')|
|wait_time|FLOAT|Time spent waiting on the event (in seconds)|
|event_specific_details|VARCHAR|Additional details about the wait event|
|cpu_time|FLOAT|CPU time used by the session (in seconds)|
|io_operations|INT|Number of I/O operations (reads/writes)|
|memory_usage|INT|Memory used by the session (in bytes)|
|network_usage|INT|Network bytes sent/received|
|transaction_id|INT|Current transaction ID|
|transaction_state|VARCHAR|State of the transaction (e.g., active, committed)|
|transaction_start_time|TIMESTAMP|Start time of the transaction|
|isolation_level|VARCHAR|Isolation level of the transaction|
|locks_held|INT|Number of locks held by the session|
|locks_waited_on|INT|Number of locks the session is waiting on|
|blocking_session_id|INT|ID of the session blocking this one|
|lock_wait_time|FLOAT|Time spent waiting for locks (in seconds)|
|client_application|VARCHAR|Name of the client application|
|optimizer_statistics|VARCHAR|Optimizer decisions (e.g., join type)|
|error_flags|VARCHAR|Any error or warning flags|
|thread_state|VARCHAR|State of the thread (e.g., executing, waiting, sleeping)|
|priority|INT|Priority of the thread|
|cpu_core_affinity|INT|CPU core affinity of the thread|

*Notes on Wait Event Grouping:*

*'CPU':* Assigned when the session is actively executing (e.g., wait_event = 'on CPU').
*'IO':* For I/O-related waits (e.g., wait/io/file/sql/binlog → 'IO'). Maps to User I/O or System I/O based on context.
*'Lock':* For lock-related waits (e.g., wait/synch/mutex/sql/LOCK_global_system_variables → 'Lock').
*'Network': *For network-related delays.
*'Commit':* For waits related to transaction commits (e.g., redo log or I/O during commit).
*'Idle':* For sessions in a sleep state (though typically not sampled unless configured otherwise).

This grouping aligns with general user feedback and pgsentinel’s approach, making it easier to troubleshoot performance bottlenecks by category.

*Server Enhancement Requirements:*

*Instrumentation:* Extend the MariaDB server’s instrumentation to collect additional metrics such as cpu_time, network_usage, lock_wait_time, and detailed wait event information. This may require adding new hooks or extending existing Performance Schema instruments.
*Wait Event Grouping Logic:* Implement logic to derive wait_event_group from wait_event. Initially, use a fixed mapping based on the wait event name prefix (e.g., 'wait/io' → 'IO', 'wait/synch/mutex' → 'Lock'). Future enhancements could use a configurable mapping table.
*Optimization:* Ensure the sampling mechanism efficiently gathers metrics by leveraging existing Performance Schema data (e.g., events_waits_current) and only calculates additional metrics when necessary.

Julien Fritsch made changes - 5 days ago

Sprint

Server 12.1 dev sprint [ 793 ]

Julien Fritsch made changes - 4 days ago

Priority

Major [ 3 ]

Critical [ 2 ]

People

Assignee:: Rucha Deodhar

Reporter:: Adam Luciano

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2025-03-26 21:42

Updated:: 5 days ago 15:49

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration