[MDEV-33046] race condition in InnoDB dict_stats_schedule() Created: 2023-12-17 Updated: 2023-12-21 Resolved: 2023-12-21 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6, 10.11, 11.0, 11.1, 11.2 |
| Fix Version/s: | 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Sergei Golubchik | Assignee: | Vladislav Vaintroub |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Description |
|
dict_stats_schedule() can be invoked with ms=0. In that case dict_stats_func is This causes innodb.innodb_stats_auto_recalc to fail sporadically. The fix could be to revert commit 7c7f9bef28a, use mutex locks, not trylocks, and resolve the deadlock somehow differently. |
| Comments |
| Comment by Marko Mäkelä [ 2023-12-20 ] | ||
|
The test innodb.innodb_stats_auto_recalc is failing very often on mandatory builders of 10.6 and later branches despite the workaround of changing the parameter to ms=10. | ||
| Comment by Vladislav Vaintroub [ 2023-12-21 ] | ||
|
I'm not sure the initial analysis was 100% correct. if timer is scheduled by 2 threads the very same time, and this schedule is protected by try_lock, one try_lock succeeds, another one fails. timer will schedule either now or in 10 seconds from now, and at the given time, it would fire. So "neither now, not later" seems incorrect, it is more "either now, or later". Having said that, the protection does not seem to make much sense anyway, timer::set_time implementations both have internal synchronization. Why is this filed against 10.6? 10.5 has that logic, too. It has the same test innodb.innodb_stats_auto_recalc, which , according to buildbot cross-reference never failed in 10.5 | ||
| Comment by Sergei Golubchik [ 2023-12-21 ] | ||
|
it is "either now or later", and sometimes it happens to be "now". That is dict_stats_process_entry_from_recalc_pool() is called now. But inside it has this condition
and in the test the time(NULL) is equal to table->stats_last_recalc on the first invocation, so update does not happen, it's queued to happen later. That is, it's not queued, because trylock fails. So "now" happens, indeed, but doesn't do anything, and "later" is not queued. | ||
| Comment by Vladislav Vaintroub [ 2023-12-21 ] | ||
|
the thing is this. There was no race condition involving timers. It was like this in 10.5, and it worked in 10.5. The important part in the is : |