[MDEV-13918] Race condition between INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS and ALTER/DROP/TRUNCATE TABLE Created: 2017-09-27 Updated: 2018-09-12 Resolved: 2017-10-24 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.2.2 |
| Fix Version/s: | 10.0.33, 10.1.29, 10.2.10, 10.3.3 |
| Type: | Bug | Priority: | Major |
| Reporter: | Marko Mäkelä | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | information_schema, race, upstream | ||
| Issue Links: |
|
| Description |
|
elenst reported this in
From the stack traces of all threads, I concluded that the culprit must be a race condition between a table-rebuilding ALTER TABLE (or OPTIMIZE TABLE) and INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS. The regression was introduced in MySQL 5.7.10 and merged into MariaDB 10.2.2. The bug is that i_s_sys_tables_fill_table_stats() increments the reference count of a table without holding anything that conflicts with a concurrent DDL operation (either a shared dict_operation_lock or something that conflicts with MDL_EXCLUSIVE). In MySQL 5.6 and MariaDB 10.0/10.1, there is a different race condition: the table can be dropped while the function is accessing it. I believe that the correct fix would be to acquire a shared dict_operation_lock before looking up the table, and to release it once the table is no longer being used. There is no need to increment or decrement the reference count. |
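The locking pattern proposed above can be sketched in a few lines of standalone C++. This is only an illustration, not the InnoDB patch: `Dictionary`, `Table`, `operation_lock_` and `table_stats()` are hypothetical names, and a `std::shared_mutex` stands in for dict_operation_lock. The reader holds the lock in shared mode across both the lookup and every access to the table object, so a DDL operation (which would need the lock exclusively) cannot free the object in between, and no reference count is touched.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>

// Hypothetical stand-in for a cached table object (not dict_table_t).
struct Table {
    std::string name;
    unsigned long n_rows;
};

// Hypothetical stand-in for the data dictionary cache.
class Dictionary {
public:
    // DDL path: takes the lock exclusively, as a table-rebuilding
    // ALTER TABLE or DROP TABLE is assumed to do in this sketch.
    void create_table(const std::string& name, unsigned long n_rows) {
        std::unique_lock<std::shared_mutex> x_lock(operation_lock_);
        tables_[name] = std::make_unique<Table>(Table{name, n_rows});
    }

    bool drop_table(const std::string& name) {
        std::unique_lock<std::shared_mutex> x_lock(operation_lock_);
        return tables_.erase(name) > 0;  // frees the Table object
    }

    // Reader path: the shared lock is acquired before the lookup and
    // released only after the last access to the Table object.
    std::optional<unsigned long> table_stats(const std::string& name) const {
        std::shared_lock<std::shared_mutex> s_lock(operation_lock_);
        auto it = tables_.find(name);
        if (it == tables_.end()) {
            return std::nullopt;  // table was dropped or never existed
        }
        return it->second->n_rows;  // value copied out while still locked
    }

private:
    mutable std::shared_mutex operation_lock_;
    std::map<std::string, std::unique_ptr<Table>> tables_;
};
```

Returning a copy of the statistics instead of a pointer to the `Table` is the point of the design: nothing that refers to the cached object outlives the shared lock.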
| Comments |
| Comment by Elena Stepanova [ 2017-10-09 ] |
|
This MTR test seems to reproduce the problem:
If it doesn't, here is also the RQG way:
|
| Comment by Marko Mäkelä [ 2017-10-18 ] |
|
elenst, while you did not repeat this on 10.0, I think a similar bug is possible there. Instead of leading to an assertion failure on the table reference count, we could be accessing a dict_table_t object after it has been removed from the data dictionary cache and freed. I pushed a fix to bb-10.0-marko for validation. It survives the mysql-test-run stress test case (which I did not include). I plan to merge the same code to 10.2, replacing the current improper use of the reference counter. |
| Comment by Marko Mäkelä [ 2017-10-20 ] |
|
I pushed a fix to 10.0. There are no merge conflicts when merging it up to 10.2, but an additional code patch is needed in 10.2 to remove the 10.2-specific assertion failure. I believe that this should be amended to the merge commit:
|
| Comment by Marko Mäkelä [ 2017-10-24 ] |
|
I pushed the post-merge fix to 10.2 separately, because it was not amended as part of the merge of the fix. This also closes |