[MDEV-24303] information_schema.INNODB_SYS_TABLESTATS stat_modified_counter thread-safety issue Created: 2020-11-27 Updated: 2022-11-06 |
|
| Status: | Open |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.4.17, 10.5.8 |
| Fix Version/s: | 10.4, 10.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Jukka Santala | Assignee: | Marko Mäkelä |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | performance | ||
| Environment: |
CentOS 7 |
||
| Description |
|
I noticed some weird behavior with the InnoDB automatic persistent stats recalculation counter, and after running MariaDB 10.5.8 under TSAN I discovered that stat_modified_counter is accessed in a thread-unsafe manner. The worst case is dict_stats_update_if_needed(_func) in dict0stats_bg.cc, where the counter is incremented and then zeroed if an automatic stats update is triggered, so it could zero the counter while another thread sets it one higher. In row0mysql.cc there are also branches which increase the counter without calling dict_stats_update_if_needed(_func). Putting all of those inside the dict_sys mutex resolves the thread conflicts, but probably has an unacceptable performance impact. Since dict_stats_update_if_needed(_func) compares stat_modified_counter to dict_table_get_n_rows(table), which could also be being updated, the mutex may be the best recourse here. In dict0stats.cc the updates are already wrapped in the dict_sys mutex.

As a note, row0umod.cc also increases stat_modified_counter, but according to the comments that branch is only reached while already holding the dict_sys mutex. That comment appears to be wrong, at least in current versions, because dict_stats_update_if_needed(_func) indeed doesn't acquire the mutex. Since that code undoes a modification, it seems to me it should reduce the modify counter (if greater than zero), not increase it. Because dict_stats_update_if_needed(_func) already increases it, I'm not sure what the best course of action is for that.

A more significant change would be to have the modified counter count downwards. This would be "fairer" than the up-counter, because INSERTs and DELETEs change the threshold being compared against, significantly affecting how often the background analysis is run for smaller tables. It would not solve the counter atomicity, but it should remove the need to compare against dict_table_get_n_rows(table) and allow using std::atomic, or we could perhaps just accept that some decrements may be lost.

Due to the unpredictability of the current counter I assume nobody is using it for anything useful that down-counting might mess with, although it does certainly modify the intended behavior. There is some undocumented behavior in that the UPD_NODE_NO_ORD_CHANGE transaction flag and the innodb_stats_include_delete_marked server variable alter the automatic recalculation: transactions which do not change any index (or, by default, deletes) do not cause dict_stats_update_if_needed(_func) to be called and statistics recalculated, but the modification counter is increased, so the first index write triggers statistics recalculation. This is quite odd behavior, but explains why some tables aren't re-analyzed no matter the number of updates. |
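The down-counting idea described above could be sketched roughly as follows: store the remaining modification budget when stats are (re)computed and decrement it per change, so the row count only needs to be read at recalculation time. This is only an illustrative sketch, not MariaDB code; the type and function names are invented, and the 10%/floor heuristic merely mimics the existing behavior.

```cpp
#include <atomic>

// Illustrative stand-in for per-table state; not MariaDB's dict_table_t.
struct stats_countdown
{
  std::atomic<unsigned long long> updates_until_recalc{0};
};

// Recompute the budget from the row count. In InnoDB this would happen
// under dict_sys.mutex, when the persistent stats are (re)written anyway.
void reset_threshold(stats_countdown &t, unsigned long long n_rows)
{
  // 10% of the rows, with a small floor, mirroring the existing heuristic.
  unsigned long long budget = n_rows / 10;
  if (budget < 16)
    budget = 16;
  t.updates_until_recalc.store(budget, std::memory_order_relaxed);
}

// Called once per modification; returns true when recalculation should
// trigger. A lost or duplicated decrement only shifts the trigger point
// slightly; a real implementation would also guard against wrap-around.
bool note_modification(stats_countdown &t)
{
  return t.updates_until_recalc.fetch_sub(1, std::memory_order_relaxed) == 1;
}
```

With this shape, dict_table_get_n_rows(table) never has to be read on the hot per-DML path at all.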
| Comments |
| Comment by Marko Mäkelä [ 2020-11-28 ] | |||||||||||||
|
Thank you for the observations, which look valid to me. Would you like to contribute the changes in the form of a pull request? I am assigning this to kevg, because he recently made some improvements in this area.

I feel that the dict_sys.mutex is being overused, and I do not want contention on it to increase. Perhaps we could use dict_index_t::lock or std::atomic for additional protection? According to https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html, relaxed loads and stores via std::atomic should incur no additional penalty. Read-modify-write operations are a different matter. | |||||||||||||
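The distinction in that mapping table can be made concrete: on x86-64 a relaxed atomic load or store compiles to a plain MOV (no fence), while a read-modify-write such as fetch_add becomes a LOCK-prefixed instruction. A minimal illustration (the variable name is invented, not an InnoDB member):

```cpp
#include <atomic>

std::atomic<unsigned long long> stat_counter{0}; // illustrative stand-in

// Relaxed store and load: plain MOVs on x86-64, no memory fence emitted.
void publish(unsigned long long v)
{
  stat_counter.store(v, std::memory_order_relaxed);
}

unsigned long long peek()
{
  return stat_counter.load(std::memory_order_relaxed);
}

// Read-modify-write: a LOCK XADD on x86-64, which is the costlier case
// Marko refers to, even with relaxed ordering.
unsigned long long bump()
{
  return stat_counter.fetch_add(1, std::memory_order_relaxed) + 1;
}
```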
| Comment by Jukka Santala [ 2020-11-29 ] | |||||||||||||
|
The code is littered with comments like:
So I get the idea that we would not want additional mutexes on every data-modifying query (it is also good practice to put a big comment wherever thread-unsafe access is intentional). Although std::atomic by default forces sequentially consistent access to the variable, preventing some optimizations, that is generally what we want here; with optimizations enabled it's hard to tell how weird things can get when another thread accesses the variable mid-operation.

The dict_table_get_n_rows(table) may be the biggest problem in the current code, because it's only protected by dict_sys.mutex. If we can make stat_modified_counter count down instead, then stat_n_rows will only have to be read when dict_sys.mutex is acquired for updating InnoDB persistent stats anyway (some refactoring is needed, and the timing of setting stat_modified_counter needs careful thought, but it's probably doable). I can try to write a patch based on that, although after finding out that stats recalculation doesn't get triggered by deletes/non-key-modifying queries (which explains why some of the tables weren't getting re-analyzed), this doesn't look hugely critical. I'm not sure if there's a way to test how often the automatic analysis is mis-triggered due to the data race alone; it can depend on the environment and compiler, of course, so if it can be fixed without performance impact it's worth doing. | |||||||||||||
| Comment by Jukka Santala [ 2020-12-02 ] | |||||||||||||
|
After reviewing comments elsewhere in the code, the data races are indeed intentional, though comparing two live counters in a non-thread-safe manner is certainly asking for trouble. In addition to the obvious instruction-ordering issues, modern CPU architectures of course don't guarantee that changes become visible to all processor threads in the same order. For good measure I tested std::atomic on CentOS 7/gcc 9.3.1.
With std::atomic:
If there is low contention on the data address, the overhead of std::atomic may not be too bad: https://travisdowns.github.io/blog/2020/07/06/concurrency-costs.html However, n_rows and the persistent stats flags would need to be protected likewise, as would the comparison between them all, and the performance cost would start to be felt. But I believe we can still do better without synchronizing data access explicitly, by storing the intended change threshold up front and comparing the live counter against that. At worst we'll hopefully only lose some counter changes, and the critical region stays small.

In the pull request I also addressed the initial issues I had with persistent stats recalc either never being run or being run constantly and unnecessarily, and with MODIFIED_COUNTER being non-obvious. In fact this would resolve at least

Because the reanalysis counter is not changed for DML that doesn't change indexes (taking delete-marked rows into account), tables will get re-analyzed more rarely in general, and the 10/6.25% thresholds might need adjustment. Fix for | |||||||||||||
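The "store the threshold up front" idea can be sketched like this: the threshold is a plain snapshot written only at recalculation time, so the hot path compares one live atomic against a stable value instead of two racing variables. This is an illustrative sketch under assumed names, not code from the actual pull request.

```cpp
#include <atomic>

// Illustrative stand-in; the names do not come from the actual patch.
struct stats_snapshot
{
  std::atomic<unsigned long long> modified_counter{0};
  // Written only under dict_sys.mutex when stats are recalculated, and
  // read-only in between, so the comparison never involves two live values.
  unsigned long long recalc_threshold{16};
};

void on_stats_recalculated(stats_snapshot &t, unsigned long long n_rows)
{
  t.modified_counter.store(0, std::memory_order_relaxed);
  unsigned long long threshold = n_rows / 10;
  t.recalc_threshold = threshold ? threshold : 16;
}

bool needs_recalc(const stats_snapshot &t)
{
  // Only one live atomic is read; the threshold is a stable snapshot.
  return t.modified_counter.load(std::memory_order_relaxed)
         >= t.recalc_threshold;
}
```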
| Comment by Marko Mäkelä [ 2020-12-02 ] | |||||||||||||
|
jsantala, I think that the comment that you found must originally be from 2012. I think that we can and should try to do better than that. Starting with MariaDB 10.4, C++11 and std::atomic are available to all storage engines.

One thing that I have been thinking about recently is: how do we avoid extra overhead when multiple adjacent atomic data fields are being modified? I have not been curious enough to check the generated code. I would be mildly surprised if the compiler were clever enough to omit a memory fence instruction in the middle of multiple atomic operations. I wonder if we could help the situation by using posix_memalign() (via our aligned_malloc() wrapper) and by providing hints with my_assume_aligned.

If there are multiple data members that are to be updated in sync with each other, it is easiest to use an explicit mutex or lock around the operations. What I suggest above could at most be a potential performance optimization, or a technique to avoid some of the overhead of doing things correctly. Sometimes multiple fields can be folded into a single one, like I recently did to some fil_space_t members in order to reduce fil_system.mutex contention in

If the operations require adjustment (say, 'subtract one if the number was not 0'), then I think that a proper implementation with atomic operations would involve a loop with std::atomic::compare_exchange_strong(). I do not know, but it might be useful to 'seed' the initial iteration of the loop with a std::atomic::load(). Such tweaks should ideally be validated with (micro)benchmarks. | |||||||||||||
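The 'subtract one if the number was not 0' loop described above could look roughly like this; it seeds the retry loop with a plain load, and relies on compare_exchange_strong() refreshing the expected value on failure. This is a sketch with invented names, not a committed implementation.

```cpp
#include <atomic>

// Decrement the counter unless it is already 0.
// Returns true if a decrement was performed.
bool decrement_if_positive(std::atomic<unsigned long long> &ctr)
{
  // Seed the loop with an initial load, as suggested above.
  unsigned long long old_val = ctr.load(std::memory_order_relaxed);
  while (old_val != 0)
  {
    // On failure, compare_exchange_strong() stores the current value
    // into old_val, so a concurrent change simply retries afresh.
    if (ctr.compare_exchange_strong(old_val, old_val - 1))
      return true;
  }
  return false; // counter was already 0; nothing to undo
}
```

This is the shape the row0umod.cc "undo a modification" case would need if the counter were ever to be decremented rather than incremented.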
| Comment by Marko Mäkelä [ 2022-02-21 ] | |||||||||||||
|
jsantala, did you have a chance to test 10.6 with ThreadSanitizer? The affected dict_table_t fields seem to fall into a few categories:

- Could use dict_table_t::lock_mutex instead of dict_sys.latch
- Already protected by metadata lock (MDL, table name lock)
- Should be made protected by dict_table_t::lock_mutex

This ticket could be about the last category. | |||||||||||||
| Comment by Jukka Santala [ 2022-02-28 ] | |||||||||||||
|
I noticed there were some changes to the area, but the behavior with regard to this is still the same. Most of the problem is with stat_modified_counter:

There are additional related issues:

Most of those are minor issues, though it's hard to tell whether it works. I had originally used a countdown to avoid comparing two non-thread-safe variables and a few of the corner cases, and added std::atomic, but I'm not sure how best to do the queue+reset and make the counter persist. The mutex would certainly make that easier, at some performance trade-off. |
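One way the queue+reset step could avoid the original increment-versus-zero race, without a mutex, is an atomic exchange: the thread that queues the table for recalculation swaps the counter to 0 and owns whatever value it read, so a concurrent increment lands either before the swap (and is consumed) or after it (and counts toward the next round), never being silently lost. A sketch with an invented name:

```cpp
#include <atomic>

std::atomic<unsigned long long> mod_counter{0}; // illustrative stand-in

// Returns the number of modifications consumed by this recalculation.
// exchange() makes reading and zeroing one indivisible step, so no
// concurrent increment can fall into the gap between them.
unsigned long long consume_for_recalc()
{
  return mod_counter.exchange(0, std::memory_order_relaxed);
}
```

Persisting the counter across restarts is a separate problem this does not address.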