Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24303

information_schema.INNODB_SYS_TABLESTATS stat_modified_counter thread-safety issue




      I noticed some weird behavior with the InnoDB automatic persistent stats recalculation counter, and ran MariaDB 10.5.8 under TSAN and discovered that stat_modified_counter is accessed in thread-unsafe manner. Worst case is dict_stats_update_if_needed(_func) in dict0stats_bg.cc, where the counter is incremented and then zeroed if automatic stats update is triggered so it could zero the counter while another thread re-sets it one higher. In row0mysql.cc there are branches which increase the counter without calling dict_stats_update_if_needed(_func) as well.

      Putting all of those inside dict_sys mutex solves the thread conflicts, but probably has unacceptable performance impact. Since dict_stats_update_if_needed(_func) compares the stat_modified_counter to dict_table_get_n_rows(table) which could also be being updated, the mutex may be best recourse here. In dict0stats.cc the updates are already wrapped in dict_sys mutex.

      As a note row0umod.cc also increases the stat_modified_counter but according to the comments that branch is only called while already holding dict_sys mutex. This comment appears to be wrong at least in current versions, because dict_stats_update_if_needed(_func) indeed doesn't acquire the mutex. Since that code undoes a modify, it seems to me like it should reduce the modify counter (if more than zero), not increase it. Because dict_stats_update_if_needed(_func) already increases it, I'm not sure what's the best course of action is for that.

      More significant change would be to have the modified counter count downwards. This would be more "fair" than the up-counter, because INSERT's and DELETE's change the threshold that's being compared, significantly affecting how often the background analysis is ran for smaller tables. It would not solve the counter atomicity, but it should remove the need of comparison to dict_table_get_n_rows(table) and allow using std::atomic, or perhaps just accept that some decrements could be lost.

      Due to the unpredictability of the current counter I assume nobody is using it for anything useful that down-counting might mess with, although it does certainly modify the intended behavior.

      There's some undocumented behavior in that UPD_NODE_NO_ORD_CHANGE transaction flag and innodb_stats_include_delete_marked server variable alter the automatic recalculation. Transactions which do not change any index, or by default deletes, do not cause dict_stats_update_if_needed(_func) to be called and statistics recalculated, but the modification counter is increased so that first index write triggers statistics recalculation. This is quite odd behavior, but explains why some tables aren't re-analyzed no matter the number of updates.




            kevg Eugene Kosov
            jsantala Jukka Santala
            0 Vote for this issue
            2 Start watching this issue