Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34015

Copying condition from WHERE to JOIN makes query 2 times faster

Details

    Description

      Not sure if that is a bug or feature request, leaving it as a bug so far.

      Connectors ODBC, JDBC and probably C++ use the following query for API functions returning foreign keys info. With growing number of schemas this query tends to become noticeably slow.

      MariaDB [xxx127]> SELECT A.REFERENCED_TABLE_SCHEMA PKTABLE_CAT, NULL PKTABLE_SCHEM, A.REFERENCED_TABLE_NAME PKTABLE_NAME, A.REFERENCED_COLUMN_NAME PKCOLUMN_NAME, A.TABLE_SCHEMA FKTABLE_CAT, NULL FKTABLE_SCHEM, A.TABLE_NAME FKTABLE_NAME, A.COLUMN_NAME FKCOLUMN_NAME, A.POSITION_IN_UNIQUE_CONSTRAINT KEY_SEQ, CASE update_rule   WHEN 'RESTRICT' THEN 1  WHEN 'NO ACTION' THEN 3  WHEN 'CASCADE' THEN 0  WHEN 'SET NULL' THEN 2  WHEN 'SET DEFAULT' THEN 4 END UPDATE_RULE, CASE DELETE_RULE  WHEN 'RESTRICT' THEN 1  WHEN 'NO ACTION' THEN 3  WHEN 'CASCADE' THEN 0  WHEN 'SET NULL' THEN 2  WHEN 'SET DEFAULT' THEN 4  END DELETE_RULE,A.CONSTRAINT_NAME FK_NAME, 'PRIMARY' PK_NAME,7 AS DEFERRABILITY
      FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE A JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE B ON (B.TABLE_SCHEMA = A.REFERENCED_TABLE_SCHEMA AND B.TABLE_NAME = A.REFERENCED_TABLE_NAME AND B.COLUMN_NAME = A.REFERENCED_COLUMN_NAME AND B.CONSTRAINT_NAME= 'PRIMARY') JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC ON (RC.CONSTRAINT_NAME = A.CONSTRAINT_NAME AND RC.TABLE_NAME = A.TABLE_NAME AND RC.CONSTRAINT_SCHEMA = A.TABLE_SCHEMA)
      WHERE A.TABLE_SCHEMA=DATABASE()  AND A.TABLE_NAME='Invoice' ORDER BY FKTABLE_CAT, FKTABLE_SCHEM, FKTABLE_NAME, KEY_SEQ, PKTABLE_NAME;
      +-------------+---------------+--------------+---------------+-------------+---------------+--------------+---------------+---------+-------------+-------------+----------------------+---------+---------------+
      | PKTABLE_CAT | PKTABLE_SCHEM | PKTABLE_NAME | PKCOLUMN_NAME | FKTABLE_CAT | FKTABLE_SCHEM | FKTABLE_NAME | FKCOLUMN_NAME | KEY_SEQ | UPDATE_RULE | DELETE_RULE | FK_NAME              | PK_NAME | DEFERRABILITY |
      +-------------+---------------+--------------+---------------+-------------+---------------+--------------+---------------+---------+-------------+-------------+----------------------+---------+---------------+
      | xxx127      |          NULL | Customer     | CustomerId    | xxx127      |          NULL | invoice      | CustomerId    |       1 |           3 |           3 | FK_InvoiceCustomerId | PRIMARY |             7 |
      +-------------+---------------+--------------+---------------+-------------+---------------+--------------+---------------+---------+-------------+-------------+----------------------+---------+---------------+
      1 row in set (4.302 sec)
      

      Depending on function parameter combination we can have values either for A.TABLE_NAME and A.TABLE_SCHEMA or for A.REFERENCED_TABLE_SCHEMA and A.REFERENCED_TABLE_SCHEMA, or for both. In the query above we have it TABLE_NAME and TABLE_SCHEMA. And for schemas it can be either concrete value or current schema - it doesn't matter in our case, but in the example above it is current schema, i.e. WHERE contains "A.TABLE_SCHEMA=DATABASE() AND A.TABLE_NAME='Invoice'". One of tables JOIN condition has "AND RC.TABLE_NAME = A.TABLE_NAME AND RC.CONSTRAINT_SCHEMA = A.TABLE_SCHEMA"
      If we change it as "AND RC.TABLE_NAME = 'Invoice' AND RC.CONSTRAINT_SCHEMA = DATABASE()" as int he example below, the query will run ~2 times faster

      MariaDB [xxx127]> SELECT A.REFERENCED_TABLE_SCHEMA PKTABLE_CAT, NULL PKTABLE_SCHEM, A.REFERENCED_TABLE_NAME PKTABLE_NAME, A.REFERENCED_COLUMN_NAME PKCOLUMN_NAME, A.TABLE_SCHEMA FKTABLE_CAT, NULL FKTABLE_SCHEM, A.TABLE_NAME FKTABLE_NAME, A.COLUMN_NAME FKCOLUMN_NAME, A.POSITION_IN_UNIQUE_CONSTRAINT KEY_SEQ, CASE update_rule   WHEN 'RESTRICT' THEN 1  WHEN 'NO ACTION' THEN 3  WHEN 'CASCADE' THEN 0  WHEN 'SET NULL' THEN 2  WHEN 'SET DEFAULT' THEN 4 END UPDATE_RULE, CASE DELETE_RULE  WHEN 'RESTRICT' THEN 1  WHEN 'NO ACTION' THEN 3  WHEN 'CASCADE' THEN 0  WHEN 'SET NULL' THEN 2  WHEN 'SET DEFAULT' THEN 4  END DELETE_RULE,A.CONSTRAINT_NAME FK_NAME, 'PRIMARY' PK_NAME,7 AS DEFERRABILITY
      FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE A JOIN INFORMATION_SCHEMA.KEY_COLUMN_USAGE B ON (B.TABLE_SCHEMA = A.REFERENCED_TABLE_SCHEMA AND B.TABLE_NAME = A.REFERENCED_TABLE_NAME AND B.COLUMN_NAME = A.REFERENCED_COLUMN_NAME AND B.CONSTRAINT_NAME= 'PRIMARY') JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC ON (RC.CONSTRAINT_NAME = A.CONSTRAINT_NAME AND RC.TABLE_NAME = A.TABLE_NAME AND RC.CONSTRAINT_SCHEMA = DATABASE())
      WHERE A.TABLE_SCHEMA=DATABASE()  AND A.TABLE_NAME='Invoice' ORDER BY FKTABLE_CAT, FKTABLE_SCHEM, FKTABLE_NAME, KEY_SEQ, PKTABLE_NAME;
      +-------------+---------------+--------------+---------------+-------------+---------------+--------------+---------------+---------+-------------+-------------+----------------------+---------+---------------+
      | PKTABLE_CAT | PKTABLE_SCHEM | PKTABLE_NAME | PKCOLUMN_NAME | FKTABLE_CAT | FKTABLE_SCHEM | FKTABLE_NAME | FKCOLUMN_NAME | KEY_SEQ | UPDATE_RULE | DELETE_RULE | FK_NAME              | PK_NAME | DEFERRABILITY |
      +-------------+---------------+--------------+---------------+-------------+---------------+--------------+---------------+---------+-------------+-------------+----------------------+---------+---------------+
      | xxx127      |          NULL | Customer     | CustomerId    | xxx127      |          NULL | invoice      | CustomerId    |       1 |           3 |           3 | FK_InvoiceCustomerId | PRIMARY |             7 |
      +-------------+---------------+--------------+---------------+-------------+---------------+--------------+---------------+---------+-------------+-------------+----------------------+---------+---------------+
      1 row in set (1.921 sec)
      

      The other table is join'ed using "A.REFERENCED_TABLE_SCHEMA AND B.TABLE_NAME = A.REFERENCED_TABLE_NAME". If both parameters are given, and to move the condition in the same way, the query will run in no time.
      My query times here are on the server with ~300 schemas

      Attachments

        Issue Links

          Activity

            I've seen the same effect with only copying of the schema condition to the JOIN(without table name part), and I've seen the same effect with only 2nd parameter, i.e. for referenced_schema/table_name, but it was smaller, but noticeable and l worthy.

            Lawrin Lawrin Novitsky added a comment - I've seen the same effect with only copying of the schema condition to the JOIN(without table name part), and I've seen the same effect with only 2nd parameter, i.e. for referenced_schema/table_name, but it was smaller, but noticeable and l worthy.
            Lawrin Lawrin Novitsky added a comment - - edited

            Maybe even "causes" MDEV-28410. At least partly.

            Lawrin Lawrin Novitsky added a comment - - edited Maybe even "causes" MDEV-28410 . At least partly.

            The important part here is

            SELECT * FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE A JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC ON (RC.CONSTRAINT_NAME = A.CONSTRAINT_NAME AND RC.TABLE_NAME = A.TABLE_NAME AND RC.CONSTRAINT_SCHEMA = A.TABLE_SCHEMA)
            WHERE A.TABLE_SCHEMA=DATABASE()  AND A.TABLE_NAME='Invoice';
            

            vs

            SELECT * FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE A JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC ON (RC.CONSTRAINT_NAME = A.CONSTRAINT_NAME AND RC.TABLE_NAME = A.TABLE_NAME AND RC.CONSTRAINT_SCHEMA = DATABASE())
            WHERE A.TABLE_SCHEMA=DATABASE()  AND A.TABLE_NAME='Invoice';
            

            because apparently both create_schema_table() and get_all_tables() are called before constant propagation, so REFERENTIAL_CONSTRAINTS cannot optimize table access.

            serg Sergei Golubchik added a comment - The important part here is SELECT * FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE A JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC ON (RC.CONSTRAINT_NAME = A.CONSTRAINT_NAME AND RC.TABLE_NAME = A.TABLE_NAME AND RC.CONSTRAINT_SCHEMA = A.TABLE_SCHEMA) WHERE A.TABLE_SCHEMA= DATABASE () AND A.TABLE_NAME= 'Invoice' ; vs SELECT * FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE A JOIN INFORMATION_SCHEMA.REFERENTIAL_CONSTRAINTS RC ON (RC.CONSTRAINT_NAME = A.CONSTRAINT_NAME AND RC.TABLE_NAME = A.TABLE_NAME AND RC.CONSTRAINT_SCHEMA = DATABASE ()) WHERE A.TABLE_SCHEMA= DATABASE () AND A.TABLE_NAME= 'Invoice' ; because apparently both create_schema_table() and get_all_tables() are called before constant propagation, so REFERENTIAL_CONSTRAINTS cannot optimize table access.

            People

              psergei Sergei Petrunia
              Lawrin Lawrin Novitsky
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.