Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 10.0.13
Labels: None
Environment: Amazon EC2 instance, Linux version 3.10.48-55.140.amzn1.x86_64 (mockbuild@gobi-build-60002) (gcc version 4.8.2 20131212 (Red Hat 4.8.2-7) (GCC) ) #1 SMP Wed Jul 9 23:32:19 UTC 2014
Description
The day after upgrading to 10.0.13, we found our daily ETL processes hung (we use MariaDB for a data warehouse). We have CONNECT tables that point to various databases: MySQL, DB2, and Oracle. Not all remote databases are available 24/7; we stop them during off hours to reduce our costs. Some of the remote tables are quite large and exceed 10 million rows.
We frequently query the INFORMATION_SCHEMA database during our ETL cycle. When querying information_schema.tables in particular, the query no longer returned instantly as it did on earlier versions of MariaDB; instead it appeared to hang for 30-45 minutes. I determined that if you query particular columns such as "create_time", the CONNECT engine appears to reach out to the remote databases to retrieve information about the underlying tables. If the remote database was unavailable, the query would hang and eventually time out; if it was available, the engine appeared to be getting a row count for the remote table. As a result, the query appears to hang whenever a large number of CONNECT tables refer to databases that are temporarily unavailable, or whenever the remote tables are very large. This was definitely true of remote MySQL tables. We downgraded back to 10.0.12 before I was able to finish testing with the ODBC connections to DB2 and Oracle.
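For anyone trying to reproduce this, the comparison can be sketched roughly as below: time the same information_schema.tables query once with name-only columns and once with "create_time" included. This is only a sketch under assumptions. The helper names (`build_tables_query`, `time_query`) are hypothetical, the cursor is assumed to be a Python DB-API cursor from a MySQL/MariaDB driver that uses the `%s` parameter style, and the idea that the name-only columns avoid touching the remote tables is an assumption to verify, not something confirmed in this report.

```python
import time

# Assumption: name-only columns do not require opening the CONNECT tables.
CHEAP_COLUMNS = ("table_schema", "table_name")
# "create_time" is the column this report observed to trigger the slow path.
SLOW_COLUMNS = ("table_schema", "table_name", "create_time")
ALLOWED = set(CHEAP_COLUMNS) | set(SLOW_COLUMNS)


def build_tables_query(columns):
    """Return a parameterized SELECT over information_schema.tables."""
    unknown = [c for c in columns if c not in ALLOWED]
    if unknown:
        raise ValueError(f"unexpected column(s): {unknown}")
    return (
        "SELECT " + ", ".join(columns)
        + " FROM information_schema.tables WHERE table_schema = %s"
    )


def time_query(cursor, columns, schema):
    """Run the query on a DB-API cursor; return (elapsed seconds, row count)."""
    sql = build_tables_query(columns)
    start = time.monotonic()
    cursor.execute(sql, (schema,))
    rows = cursor.fetchall()
    return time.monotonic() - start, len(rows)
```

Calling `time_query(cursor, CHEAP_COLUMNS, schema)` and then `time_query(cursor, SLOW_COLUMNS, schema)` on both 10.0.12 and 10.0.13, against a schema that contains the CONNECT tables, should show whether the slowdown is tied to the selected columns.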
To be clear, this behavior is occurring when querying INFORMATION_SCHEMA. I'm not actually trying to pull data from the CONNECT tables via a select.
After downgrading to 10.0.12, the behavior returned to normal, and querying INFORMATION_SCHEMA only returned whatever information had been stored locally about the remote tables.
Is there a way to disable this remote querying? Since remote connections/queries can be expensive, I would prefer to only access the underlying tables when querying the CONNECT tables directly, and not when performing simple queries of the INFORMATION_SCHEMA. I'm not sure if this new behavior was intentional, but it is undesirable for us and will limit our ability to use the CONNECT engine going forward when we eventually upgrade MariaDB. For now, we are staying on 10.0.12.