[MDEV-22180] Planner opens unnecessary tables when updated table is referenced by foreign keys Created: 2020-04-07  Updated: 2022-12-09  Resolved: 2020-05-05

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.1.40, 10.1.44, 10.2.31, 10.3.22, 10.4.12, 10.5.2
Fix Version/s: 10.1.45, 10.2.32, 10.3.23, 10.4.13

Type: Bug Priority: Major
Reporter: Guillermo Schulman Assignee: Sergei Golubchik
Resolution: Fixed Votes: 4
Labels: explain, foreign-keys, performance, plan, planner
Environment:

Ubuntu


Attachments: PNG File Screenshot from 2022-12-02 19-46-42.png     File fktest.sql.gz    
Issue Links:
Duplicate
duplicates MDEV-21226 open_tables count is increasing with ... Closed
is duplicated by MDEV-18259 ASAN heap-use-after-free or server cr... Closed
Problem/Incident
causes MDEV-30103 FK constraint fails upon referenced t... Stalled
Relates
relates to MDEV-30103 FK constraint fails upon referenced t... Stalled
relates to MDEV-30182 Optimize open_tables to take O(N) time Open
relates to MDEV-16417 Store Foreign Key metadata outside of... In Review
relates to MDEV-21052 InnoDB foreign key refactoring for TA... Stalled
relates to MDEV-21175 Remove dict_table_t::n_foreign_key_ch... Closed

 Description   

When performing a writing query on a referenced table, the planner opens all the tables referencing it, even if the modified fields are not the referenced ones and even if no record matches the WHERE clause. This behavior started to happen at some version between 10.1.20 and 10.1.40, as it is reproducible in the later and not in the former.
It is interesting to mention that the problem is triggered even if just the explain is executed which seems to be related to the planner rather than the excution of the query itself.
It takes a lot of time and resources, mainly when several FKs are involved.
To reproduce it just follow these steps:
1 . Create a DB with one table referenced by several tables. Here's a bash example of how to create it (and there is a dump file attached to this issue which you can just restore):

mysql -uroot -e "DROP DATABASE IF EXISTS fktest; CREATE DATABASE fktest"
 
mysql -uroot fktest -e "
  DROP TABLE IF EXISTS fkParent; 
  CREATE TABLE fkParent (
      id int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
      textField varchar(10)
  );"
 
for n in $(seq 12000); do
    [ $(($n % 100)) -eq 0 ] && echo $n
    mysql -uroot fktest -e " 
    CREATE TABLE fkChild$n (
      id int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY, 
      fkField$n int(10) unsigned,
      CONSTRAINT FK_$n FOREIGN KEY (fkField$n) REFERENCES fkParent(id)
    );"
done

Then just access that DB and run this explain statement:

explain UPDATE fkParent SET textField="test";

It will take quite long and it seems that the planner is opnening all those referencing tables.
If you run exactly the same test in version 10.1.20 it just goes instantaneous.

To see it more clearly here are some evident tests:

In version 10.1.40 (or any later version):

MariaDB [fktest]> show status like "opened_table%";
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Opened_table_definitions | 0     |
| Opened_tables            | 0     |
+--------------------------+-------+
2 rows in set (0.00 sec)
 
MariaDB [fktest]> explain UPDATE fkParent SET textField="test";
+------+-------------+----------+-------+---------------+---------+---------+------+------+-------+
| id   | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra |
+------+-------------+----------+-------+---------------+---------+---------+------+------+-------+
|    1 | SIMPLE      | fkParent | index | NULL          | PRIMARY | 4       | NULL |    1 |       |
+------+-------------+----------+-------+---------------+---------+---------+------+------+-------+
1 row in set (16.18 sec)
 
MariaDB [fktest]> show status like "opened_table%";
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Opened_table_definitions | 3934  |
| Opened_tables            | 3934  |
+--------------------------+-------+
2 rows in set (0.01 sec)

In version 10.1.20 looks quite better and does it in 0 miliseconds

MariaDB [siemens]> show status like "opened_table%";
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Opened_table_definitions | 0     |
| Opened_tables            | 0     |
+--------------------------+-------+
2 rows in set (0.01 sec)
 
MariaDB [fktest]> explain UPDATE fkParent SET textField="test";
+------+-------------+----------+-------+---------------+---------+---------+------+------+-------+
| id   | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra |
+------+-------------+----------+-------+---------------+---------+---------+------+------+-------+
|    1 | SIMPLE      | fkParent | index | NULL          | PRIMARY | 4       | NULL |    1 |       |
+------+-------------+----------+-------+---------------+---------+---------+------+------+-------+
1 row in set (0.00 sec)
 
MariaDB [siemens]> show status like "opened_table%";
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Opened_table_definitions | 0     |
| Opened_tables            | 0     |
+--------------------------+-------+
2 rows in set (0.00 sec)

Additionally, you can see it in the profiling detail for the failing version:

MariaDB [fktest]> set profiling =1;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [fktest]> explain UPDATE fkParent SET textField="test";
+------+-------------+----------+-------+---------------+---------+---------+------+------+-------+
| id   | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra |
+------+-------------+----------+-------+---------------+---------+---------+------+------+-------+
|    1 | SIMPLE      | fkParent | index | NULL          | PRIMARY | 4       | NULL |    1 |       |
+------+-------------+----------+-------+---------------+---------+---------+------+------+-------+
1 row in set (12.49 sec)
 
MariaDB [fktest]> show profile for query 1;
+----------------------+-----------+
| Status               | Duration  |
+----------------------+-----------+
| starting             |  0.000259 |
| checking permissions |  0.000055 |
| Opening tables       | 12.409309 |
| After opening tables |  0.006020 |
| System lock          |  0.005769 |
| Table lock           |  0.007041 |
| init                 |  0.001493 |
| query end            |  0.000017 |
| closing tables       |  0.003398 |
| Unlocking tables     |  0.060393 |
| freeing items        |  0.000021 |
| updating status      |  0.000045 |
| cleaning up          |  0.002870 |
+----------------------+-----------+
13 rows in set (0.01 sec)

As you can see, all the time is spent opening tables.



 Comments   
Comment by Guillermo Schulman [ 2020-04-07 ]

The behavior is reproducible regardless file-per-table is enabled or not.

Comment by Guillermo Schulman [ 2020-04-07 ]

Just wanted to add that, of course, running the UPDATE without the explain takes the same resources and time which seems to be obvious as it runs the planner before. I did not mention it as I considered it obvious.
It must be said, also, that having 12k tables with FKs to one table could be a very weird and complex scenario, but the performance is also hit with lower numbers, the effect appears to be proportional to the number of FKs involved. I provided a critical example to make it more clear and descriptive. Anyway, I think that a lower impact is still unacceptable if it is, as it seems, avoidable.

Comment by Elena Stepanova [ 2020-04-13 ]

Thanks for the report.

The difference was introduced by this commit in 10.2.3

commit 2614a0ab0ffd802b3b9e62795979e83b6a10fcb3
Author: Sergei Golubchik
Date:   Sun Nov 20 11:23:48 2016 +0100
 
    extend prelocking to FK-accessed tables

and later by the backport to 10.0.37 / 10.1.36

commit 64a23c1c8a826a6f58f8a415f60a0e3cc0e0375f
Author: Sergei Golubchik
Date:   Tue Jul 17 16:56:40 2018 +0200
 
    extend prelocking to FK-accessed tables

From the sound of it, it might have been intentional, I'll leave it to serg to determine it.

The MTR version of the test case:

--source include/have_innodb.inc
 
CREATE TABLE fkParent (
      id int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY,
      textField varchar(10)
) ENGINE=InnoDB;
 
CREATE TABLE fkChild1 (
      id int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY, 
      fkField1 int(10) unsigned,
      CONSTRAINT FK_1 FOREIGN KEY (fkField1) REFERENCES fkParent(id)
) ENGINE=InnoDB;
 
CREATE TABLE fkChild2 (
      id int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY, 
      fkField2 int(10) unsigned,
      CONSTRAINT FK_2 FOREIGN KEY (fkField2) REFERENCES fkParent(id)
) ENGINE=InnoDB;
 
CREATE TABLE fkChild3 (
      id int(10) unsigned NOT NULL AUTO_INCREMENT PRIMARY KEY, 
      fkField3 int(10) unsigned,
      CONSTRAINT FK_3 FOREIGN KEY (fkField3) REFERENCES fkParent(id)
) ENGINE=InnoDB;
 
FLUSH STATUS;
 
show status like "opened_table%";
 
explain UPDATE fkParent SET textField="test";
 
show status like "opened_table%";
 
DROP TABLE fkChild1, fkChild2, fkChild3, fkParent;

After the change, the second show status returns

Variable_name	Value
Opened_table_definitions	4
Opened_tables	4

Before the change

show status like "opened_table%";
Variable_name	Value
Opened_table_definitions	1
Opened_tables	1

Comment by Sergei Golubchik [ 2020-04-14 ]

It is intentional, indeed. The server needs to open/lock all tables that participate in a query, at least for

  • if a child table has indexed virtual columns, on a CASCADE operation it might need to recalculate those columns and that absolutely requires the table to be opened in the server
  • if there's a concurrent ALTER TABLE on a child table with a CASCADE, it might break foreign key constraints.

Still, let's keep this issue open — we'll try to reduce the overhead. For example, both issues above only apply to CASCADE, not to RESTRICT. With RESTRICT we, quite possibly, don't need to open child tables. Either we can ignore them completely (but beware MDEV-21175) or only mdl-lock them without opening.

Comment by Guillermo Schulman [ 2020-04-14 ]

Thanks Sergei and Elena.
Saving that time/resources for RESTRICT cases would be great.
Anyway, I wonder why it would still be necessary in CASCADE scenarios when the field being updated is not the referenced by the FKs, as in the example provided. Nothing should be triggered if the referenced column is not affected, should it? Same question for virtual column scenario: if the "parent table" has no virtual columns or none of the referenced columns is used in the virtual column formulas, is it still necessary to open all the children tables? Isn't the parent table information good enough to foresee no need to open additional tables?
And finally, why it was possible to avoid opening all those tables in previous versions (< 10.1.36)? Was there a bug and the bugfix introduced this behavior?
Thanks again.

Comment by Sergei Golubchik [ 2020-04-14 ]

1) yes, if referenced columns aren't touched it looks like FK tables don't have to be opened. But I'm not sure we have this information when tables are being opened.

2) virtual columns: we need to open the child table to know whether it has indexed virtual columns, we cannot know it without opening. Still, we can open only the frm and not open the table in the engine if it doesn't have virtual columns

3) yes, the bug with ALTER TABLE was already in 10.1, I presume. Indexed virtual columns were introduced only in 10.2, so they couldn't be an issue in 10.1.

Comment by Guillermo Schulman [ 2020-04-15 ]

Thanks Sergei. Some comments/questions for each item:
1) What do you mean by "this information"? I mean, do you need to open all the children tables referencing the parent table to check if the updated column is being referenced?
2) Why do you need to know if a referencing table has an indexed virtual column? No children row/column is going to be affected when you change a non-referenced column in a parent table. Am I missing anything?
3) Ok, I see the point. So, would it be possible to avoid this new mechanism in 10.1 at least?

To be honest, this bug is really preventing us to go on with the upgrade from mariaDb 10.1.20. This is really bad news for us if we wanted to keep on using MariaDb.
Do you know any estimation for a fix? Is it in the short term roadmap?
Thanks a lot.

Comment by Sergei Golubchik [ 2020-04-17 ]

1. No, it doesn't need to open children tables for that. Normally MariaDB opens all tables that will be needed for a statement in one step, at the beginning of statement processing. This is done to avoid deadlocks — all tables are locked in the same order, so deadlocks are impossible. Analyzing affected fields and mapping them to tables (e.g. in SELECT a FROM t1, t2 — what table does a belong to?) is done later. That is, first MariaDB opens all [possibly] affected tables, then it finds out whether referenced columns are being updated.

May be it can be fixed, but it won't be a simple change.

2. For CASCADE/SET NULL/SET DEFAULT an UPDATE can cause a child table to be updated. If the referencing column is indexed — the index needs to be updated. If the referencing column is a part of the multi-column index and another column in the index is a virtual one — InnoDB will need to know the value of the virtual column to be able to update the index. To evaluate a virtual column the server will need the table open, otherwise it won't even know what SQL expression to evaluate.

3. No, this was a fix for MDEV-16465. First this behavior was implemented in 10.2, for virtual indexed columns and later backported to 10.1 because of MDEV-16465.


But your test case uses RESTRICT and I suspect it can be fixed easily

Comment by Guillermo Schulman [ 2020-04-17 ]

Thanks Sergei.
When do you think we can have a fix for the RESTRICT scenario? Is it expected to happen soon?

Comment by Greg Smethells [ 2021-02-10 ]

This is still not fixed. Why is it closed?

Comment by Greg Smethells [ 2021-02-10 ]

We generated the sample db from that bug ticket and it is indeed slow, on version 10.2.36:

MariaDB [fktest]> explain UPDATE fkParent SET textField="test";
+------+-------------+----------+-------+---------------+---------+---------+------+------+-------+
| id   | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra |
+------+-------------+----------+-------+---------------+---------+---------+------+------+-------+
|    1 | SIMPLE      | fkParent | index | NULL          | PRIMARY | 4       | NULL |    1 |       |
+------+-------------+----------+-------+---------------+---------+---------+------+------+-------+
1 row in set (3.50 sec)

Comment by Greg Smethells [ 2021-02-10 ]

It is also slow on version 10.5.8

Comment by Sergei Golubchik [ 2021-02-12 ]

gsmethells, that depends on what you mean by "fixed". If you mean that the server still opens referenced tables, and you expected it not to do that just as it was not doing it before — this isn't likely to happen, not doing it was a bug that allowed concurrent DDLs on fk-referenced tables. It could've caused crashes or data corruption.

The fix implemented here was to reduce the overhead and keep the DDL protection without actually opening tables when possible.

What is your use case? Perhaps we've missed it. Or perhaps it is not possible to avoid opening of tables. Can you show your CREATE TABLE statements?

Comment by Dan Sobczak [ 2021-02-12 ]

Hello,

Our use case is similar to the example above. We have a single authentication table as well as ~16k foreign key constraints referencing that table. Our foreign keys appear to be using the RESTRICT scenario, which as you stated above may be able to skip opening tables:

With RESTRICT we, quite possibly, don't need to open child tables. Either we can ignore them completely (but beware MDEV-21175) or only mdl-lock them without opening.

Here are some CREATE TABLE statements. What we are seeing is any UPDATE on our authentication table has slowed down considerably from 10.1.32.

Single authentication table:

CREATE TABLE `authentication` (
  `userID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `customerID` int(10) unsigned DEFAULT NULL,
  `email` varchar(50) NOT NULL,
  `verifier` varchar(64) NOT NULL,
  `name` varchar(100) NOT NULL,
  `accountStatus` tinyint(3) unsigned NOT NULL,
  `emailCode` varchar(64) DEFAULT NULL,
  `authChanged` datetime NOT NULL,
  `failures` tinyint(3) unsigned NOT NULL,
  `statusChanged` datetime NOT NULL,
  `homeServer` int(10) unsigned NOT NULL,
  `isHomeSet` tinyint(1) NOT NULL DEFAULT 0,
  `rpcKey` varchar(50) NOT NULL,
  PRIMARY KEY (`userID`),
  UNIQUE KEY `idx_authentication_email` (`email`),
  KEY `fk_authentication_customerID` (`customerID`),
  KEY `fk_authentication_homeServer` (`homeServer`),
  CONSTRAINT `fk_authentication_customerID` FOREIGN KEY (`customerID`) REFERENCES `customer` (`customerID`),
  CONSTRAINT `fk_authentication_homeServer` FOREIGN KEY (`homeServer`) REFERENCES `server` (`serverID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Access table (one of many):

CREATE TABLE `1_access` (
  `userID` int(10) unsigned NOT NULL,
  `uid` varchar(64) NOT NULL,
  `obj` varchar(8) NOT NULL,
  `granterID` int(10) unsigned NOT NULL,
  `hasEditor` tinyint(1) NOT NULL DEFAULT 0,
  `accessCreated` datetime(6) NOT NULL,
  PRIMARY KEY (`userID`,`uid`,`obj`),
  KEY `fk_1_access_granterID` (`granterID`),
  CONSTRAINT `fk_1_access_granterID` FOREIGN KEY (`granterID`) REFERENCES `authentication` (`userID`),
  CONSTRAINT `fk_1_access_userID` FOREIGN KEY (`userID`) REFERENCES `authentication` (`userID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

Comment by Dan Sobczak [ 2021-02-12 ]

Example of a slow query, "failures" is not referenced by any foreign key so this UPDATE being slow is surprising.

MariaDB [joints]> update authentication set failures=0 where userid=51;
Query OK, 0 rows affected (3.61 sec)
Rows matched: 1  Changed: 0  Warnings: 0
 
MariaDB [joints]> show profile;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000087 |
| checking permissions | 0.000007 |
| Opening tables       | 3.575954 |
| After opening tables | 0.000322 |
| System lock          | 0.000017 |
| Table lock           | 0.000266 |
| init                 | 0.001088 |
| updating             | 0.004479 |
| end                  | 0.000031 |
| query end            | 0.015326 |
| closing tables       | 0.000014 |
| Unlocking tables     | 0.008250 |
| freeing items        | 0.000041 |
| updating status      | 0.000062 |
| cleaning up          | 0.000094 |
+----------------------+----------+

Comment by Nikita Malyavin [ 2022-12-09 ]

Hello dsobczak, gschulman and gsmethells!

You were right, a time to open tables is not just suspicous.

I've been working on a few bugs around foreign keys, namely MDEV-30103, where I've seen one of the possible solutions in making a complete bidirectional prelocking and pre-opening.

Your report became an obstacle for that solution. I decided to find out what is taking so much time, and is it really because of ha_innobase::open taking long.
I have repeated your test setup with 12k tables. Perf reports following:

As you may see, most of the time spent was in MDL_context::find_ticket and prepare_fk_prelocking_list. I don't know how __divtf3 ended up there, it's probably some symbol missing in my debug libc build. Anyway there are two recognized functions taking 60% of time. find_ticket traverses a list of all MDL tickets, which is at least one ticket per table. prepare_fk_prelocking_list does nothing weighty except calling find_fk_prelocked_table, which also traverses a list, of opened tables this time. This means that open_tables took O(tables^2) time!

Sounds nonsense for me, but I guess we just didn't expect so many tables to open in one query, it wasn't supposed as a normal use case.
Making those two functions simply return NULL decreased a time to 0.15 seconds!

It was no big difference with or without fix on the structure of the report. It's only that find_ticket somehow took more time (according to a number of samples reported by perf) in the reverted fix case.

I have opened MDEV-30182 to implement an optimization. Such kind of work is usually only done in the major versions.

Thank you for your alarms!

Generated at Thu Feb 08 09:12:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.