MariaDB Server / MDEV-6612

New CONNECT engine causes some queries to hang

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version: 10.0.13
    • Fix Version: 10.0.14
    • Component: None
    • Environment: Amazon EC2 instance, Linux version 3.10.48-55.140.amzn1.x86_64 (mockbuild@gobi-build-60002) (gcc version 4.8.2 20131212 (Red Hat 4.8.2-7) (GCC) ) #1 SMP Wed Jul 9 23:32:19 UTC 2014

    Description

      The day after upgrading to 10.0.13, we found our daily ETL processes hung (we use MariaDB for a data warehouse). We have CONNECT tables that point to various databases: MySQL, DB2, and Oracle. Not all remote databases are available 24/7--we stop them during off hours to reduce our costs. Some of the remote tables are quite large and exceed 10 million rows.

      We frequently query the INFORMATION_SCHEMA database during our ETL cycle. When querying information_schema.tables in particular, instead of returning instantly as it did on earlier versions of MariaDB, it would appear to hang for 30-45 minutes. I determined that if you query particular columns such as "create_time", the CONNECT engine appears to be reaching out to the remote databases to retrieve information about the underlying tables. If the remote database was unavailable, it would hang and eventually time out, and if it was available, it appeared to be getting a rowcount for the remote table. This causes the query to "hang" if a large number of CONNECT tables refer to databases that are temporarily unavailable or if the tables are very large. This was definitely true of remote MySQL tables--we downgraded back to 10.0.12 before I was able to finish testing with the ODBC connections to DB2 and Oracle.

      To be clear, this behavior is occurring when querying INFORMATION_SCHEMA. I'm not actually trying to pull data from the CONNECT tables via a select.

      After downgrading to 10.0.12, the behavior returned to normal, and querying INFORMATION_SCHEMA only returned whatever information had been stored locally about the remote tables.

      Is there a way to disable this remote querying? Since remote connections/queries can be expensive, I would prefer to only access the underlying tables when querying the CONNECT tables directly, and not when performing simple queries of the INFORMATION_SCHEMA. I'm not sure if this new behavior was intentional, but it is undesirable for us and will limit our ability to use the CONNECT engine going forward when we eventually upgrade MariaDB. For now, we are staying on 10.0.12.

      Attachments

        Activity

          bertrandop Olivier Bertrand added a comment - edited

          Indeed, I recently modified the way some information about tables is returned, in order to provide more precise values. In particular, the number of records may take a long time to retrieve on some large tables. However, this happens only when there is no fast way to get it. Normally, retrieving this value is fast for all fixed record size table types (FIX, BIN, DBF, and VEC).
          It is also fast for variable record length tables (DOS, CSV, FMT) when they are indexed, because the table size can be retrieved from an index. The only slow case is variable record length tables that are not indexed.
          For remote tables, to get the number of records, CONNECT sends

          select count(*) from remtab

          to the remote server, and some ODBC data sources do not have a fast way to return that (previously, I simply returned 10).
          Now, to address your problem, I need some examples of INFORMATION_SCHEMA queries that take a long time, so that I can pinpoint when and where there is a performance issue to fix. I could then, for instance, add a CONNECT global variable telling whether exact values are needed when querying for table info, or whether an estimate is enough.
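          For illustration, this is roughly how the situation arises (the table name, host, and credentials below are invented for the example, not taken from the report):

          ```sql
          -- A CONNECT table of type MYSQL pointing at a remote server
          -- (connection string and all names are illustrative)
          CREATE TABLE remtab_link ENGINE=CONNECT TABLE_TYPE=MYSQL
              CONNECTION='mysql://etl_user:pwd@remotehost:3306/warehouse/remtab';

          -- Under 10.0.13, a metadata query like this one causes CONNECT
          -- to send "select count(*) from remtab" to the remote server:
          SELECT table_rows
            FROM information_schema.tables
           WHERE table_name = 'remtab_link';
          ```

          If remotehost is unreachable, the metadata query blocks until the connection times out.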


          s_weisgarber Scott Weisgarber added a comment

          The first query that burned us looks at information_schema to see whether any of our source or target tables have changed since the previous ETL cycle and therefore require our dynamic stored procedures to be rebuilt. Because of the nature of the join, every row from "tables" must be returned, and I believe MariaDB executes the "select count" for each row.

          select ds.base_cdc_name, ds.cmd_line_args, ds.destination_table
            from util.deploy_specs ds
                 join information_schema.tables inf
                    on (ds.destination_table =
                           concat(inf.table_schema, '.', inf.table_name))
           where     ds.last_change_to_destination_ddl < inf.create_time;

          Although we no longer have 10.0.13 installed, I believe any query that retrieves a row from "tables" will trigger a "select count", even if I'm not interested in the rowcount of the table. I know the query below was extremely slow before we downgraded to 10.0.12.

          select table_schema, table_name, create_time from information_schema.tables;

          I understand the desire to return more accurate results, but indiscriminately running a "select count" seems dangerous from a performance point of view. You don't find many articles on the web that discuss data warehousing and MySQL, so maybe the kinds of large tables I deal with are a rarity in the MySQL community. Still, even with a lot of small CONNECT tables, a bunch of "select count" calls could add up.

          I'd argue that reaching out into a remote database at all for an information_schema query (even when querying the index) may be undesirable. Many companies are moving portions of their infrastructure to the cloud, and the way you make the costs work is by shutting down servers when you don't need them. I know when my remote servers are available and can schedule my ETL processes to query those servers at the appropriate times. But if I look at the information schema when the servers are down, the queries will presumably hang and time out after a while.

          I suppose I'm coming to this from an Oracle background, where similar functionality is achieved using database links, and calls to remote databases are not made unless you specifically query them. Table statistics from the system tables are normally cached from prior analyze statements and are usually only estimates.

          A workaround could be for me to move all CONNECT tables into their own schema(s) and explicitly skip those schemas when querying some of the information_schema tables. I'm not sure whether adding a "where table_schema != 'my_connect_schema'" would alleviate the problem.
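          That filter would look something like the following (the schema name is invented for the example); whether the optimizer skips opening the CONNECT tables when such a predicate is present is exactly the open question:

          ```sql
          -- Hypothetical workaround: keep all CONNECT tables in one schema
          -- and exclude it when scanning the metadata (schema name is illustrative)
          SELECT table_schema, table_name, create_time
            FROM information_schema.tables
           WHERE table_schema <> 'my_connect_schema';
          ```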

          bertrandop Olivier Bertrand added a comment - edited

          I have added a new CONNECT global variable connect_exact_info. One can set it by:

          set @@global.connect_exact_info={ON | OFF};

          It is OFF by default, meaning that the new version behaves like 10.0.12; it can be set to ON to work like 10.0.13.
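          A minimal usage sketch (assuming a build that includes the new variable; the table name is illustrative):

          ```sql
          -- Default: estimates only, no remote "select count(*)" — behaves like 10.0.12
          SET @@global.connect_exact_info = OFF;

          -- Opt in to exact counts when accuracy matters more than speed
          SET @@global.connect_exact_info = ON;
          SELECT table_rows
            FROM information_schema.tables
           WHERE table_name = 'remtab_link';  -- may contact the remote server
          ```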


          s_weisgarber Scott Weisgarber added a comment

          Thank you--this will do the trick.


          People

            Assignee: bertrandop Olivier Bertrand
            Reporter: s_weisgarber Scott Weisgarber

