[MDEV-30658] lock_row_lock_current_waits counter in information_schema.innodb_metrics may become negative Created: 2023-02-15 Updated: 2023-11-28 Resolved: 2023-10-05
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Information Schema, Locking, Storage Engine - InnoDB |
| Affects Version/s: | 10.5.17, 10.6.12 |
| Fix Version/s: | 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Valerii Kravchuk | Assignee: | Vladislav Lesin |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | None |
| Description |
Consider the following test case involving 3 sessions. In the first one, run this:
Now, in the second one, update the only row of the InnoDB table just created:
In the third one, run a similar update that will hang, waiting for 50 seconds (with default settings):
Now in the first session we can see the counter increased; reset it and wait a bit:
When we get this error message in the third session:
in the first one we get a negative value of the counter:
This is a bug, and similar bugs have been reported (and some fixed) many times for MySQL: https://bugs.mysql.com/bug.php?id=73114
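The SQL for the three sessions is omitted above, but the intended semantics of the counter can be modelled with a small Python sketch (a toy model, not server code; thread identities, timeouts, and names are illustrative): the counter goes up just before a session suspends on a row-lock wait and comes back down as soon as it wakes up, whether the lock was granted or the wait timed out.

```python
import threading
import time

# Toy model (not InnoDB code) of the "current waits" bookkeeping:
# the counter is incremented just before a thread suspends on a lock
# wait and decremented right after it wakes up, so at any instant it
# should equal the number of sessions currently waiting.
current_waits = 0
counter_mutex = threading.Lock()   # stands in for lock_sys.wait_mutex
row_lock = threading.Lock()        # stands in for the contended row lock

def update_row(wait_timeout):
    """Try to lock the row, like the UPDATE in session 3."""
    global current_waits
    with counter_mutex:
        current_waits += 1         # about to suspend
    got_it = row_lock.acquire(timeout=wait_timeout)
    with counter_mutex:
        current_waits -= 1         # woke up: lock granted or timed out
    if got_it:
        row_lock.release()
    return got_it

row_lock.acquire()                 # "session 2" holds the row lock
waiter = threading.Thread(target=update_row, args=(0.5,))
waiter.start()                     # "session 3" blocks on the same row
time.sleep(0.1)
with counter_mutex:
    during_wait = current_waits    # "session 1" reads the counter
waiter.join()                      # the wait times out, like the lock wait timeout error
row_lock.release()
print(during_wait, current_waits)  # -> 1 0
```

In this model the counter can never legitimately go below zero, because every increment is paired with a decrement in the same code path; the negative value reported here comes from how the monitor layer post-processes the raw counter, as the comments below explain.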
| Comments |
| Comment by Valerii Kravchuk [ 2023-02-16 ] |
This was reported to MySQL too:
| Comment by Marko Mäkelä [ 2023-02-27 ] |
As far as I remember, there are three types of these counters:
The lock_row_lock_current_waits counter in MariaDB 10.6 is MONITOR_OVLD_ROW_LOCK_CURRENT_WAIT. It reflects the following:
Updates of the counter are protected by lock_sys.wait_mutex, but we are performing a dirty read here; after we have read the value, it may be stale anyway. I do not think that a negative return value can be caused by that. In 10.5, the code is a little different:
This is a sharded atomic counter:
In neither case should the negative value be caused by an unsafe read of the counter. Something in the InnoDB transactional lock subsystem is causing the counter value to become corrupted.
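A sharded counter of the kind mentioned above can be sketched as follows (a rough Python illustration only; the real ib_counter_t in InnoDB uses per-CPU slots without per-slot locks, and the shard count and method names here are made up). It shows why an unsafe read alone cannot make the total permanently negative: paired +1/-1 updates always cancel out.

```python
import threading

# Illustrative sharded counter: writers spread updates over several
# slots to reduce contention, and a read sums all slots. The sum can
# be momentarily stale, but every increment is matched by a later
# decrement to the same slot, so the settled total stays consistent.
N_SHARDS = 8

class ShardedCounter:
    def __init__(self):
        self.shards = [0] * N_SHARDS
        self.locks = [threading.Lock() for _ in range(N_SHARDS)]

    def add(self, delta, thread_id):
        i = thread_id % N_SHARDS     # each thread picks "its" slot
        with self.locks[i]:
            self.shards[i] += delta

    def read(self):
        # No global lock: like the server, this is a dirty read that
        # may already be stale by the time the caller looks at it.
        return sum(self.shards)

c = ShardedCounter()
c.add(+1, thread_id=3)    # one waiter registers
c.add(+1, thread_id=11)   # another waiter, hashing to the same slot
c.add(-1, thread_id=3)    # the first waiter wakes up
print(c.read())           # -> 1
```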
| Comment by Vladislav Lesin [ 2023-10-03 ] |
The following analysis is based on 10.5. A stable test case for 10.5 is the following: n_lock_wait_current_count.test
Apart from srv_stats.n_lock_wait_current_count, which Marko noted above, there is also the innodb_counter_value array. It contains the values that are then displayed in the information schema, and srv_mon_process_existing_counter() fills its elements. There are two ways to fill an element of the innodb_counter_value array in srv_mon_process_existing_counter(), depending on the MONITOR_DISPLAY_CURRENT flag in monitor_info->monitor_type. For MONITOR_OVLD_ROW_LOCK_CURRENT_WAIT, MONITOR_DISPLAY_CURRENT is not set for some reason, which is why MONITOR_SET_DIFF() is used. The value is computed by the following formula:
At the moment when the value is set to -1, the innodb_counter_value array element has the following value:
So, according to the formula above, we have:
mon_start_value is set during "set global innodb_monitor_disable='lock_row_lock_current_waits'" execution. The formula above computes a delta, which only makes sense for a counter that accumulates. But the srv_stats.n_lock_wait_current_count counter is not supposed to accumulate: it is incremented just before a thread suspends and decremented just after the thread wakes up. The fix I propose is to set MONITOR_DISPLAY_CURRENT for this monitor.
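The arithmetic can be illustrated with a short Python sketch (illustrative only, not server code, and the reset semantics are simplified): the MONITOR_SET_DIFF path reports the change since the monitor was enabled, which goes negative when a non-accumulating counter is reset mid-wait, while the MONITOR_DISPLAY_CURRENT path of the proposed fix simply reports the raw value.

```python
# Models the two display paths in srv_mon_process_existing_counter()
# (Python illustration; names mirror the ones discussed above).
def set_diff(current, mon_start_value):
    # MONITOR_SET_DIFF path: delta since the monitor was (re-)enabled
    return current - mon_start_value

def display_current(current):
    # MONITOR_DISPLAY_CURRENT path: report the raw value as-is
    return current

n_lock_wait_current_count = 0
n_lock_wait_current_count += 1               # session 3 suspends on the row lock
mon_start_value = n_lock_wait_current_count  # monitor reset/disabled during the wait
n_lock_wait_current_count -= 1               # session 3 times out and wakes up

print(set_diff(n_lock_wait_current_count, mon_start_value))  # -> -1 (the bug)
print(display_current(n_lock_wait_current_count))            # -> 0 (with the fix)
```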
| Comment by Marko Mäkelä [ 2023-10-04 ] |
Thank you, the fix looks good to me.