[MDEV-6142] CONNECT Engine could eat all the memory Created: 2014-04-21 Updated: 2014-05-13 Resolved: 2014-05-13 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.0.10 |
| Fix Version/s: | 10.0.12 |
| Type: | Bug | Priority: | Major |
| Reporter: | jocelyn fournier | Assignee: | Olivier Bertrand |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | connect-engine | ||
| Description |
|
Hi, I've just tested the CONNECT engine on a "big" table (24GB) on my local server :
The original table is a tokudb table with a tokudb_cache_size of 12Go Then I've tested the following : This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results. "Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records." To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause. Thanks and regards, |
| Comments |
| Comment by Olivier Bertrand [ 2014-04-22 ] | |||||||||||||||||||||||||||||||
|
Normally, CONNECT does not make a full table scan when retrieving query result. However there are exceptions: 1 - XML tables. Because the used libraries (DOMDOC or libxml2) are not used in SAX mode but in DOM mode. This means that they parse the whole document and make a memory image that is later used by CONNECT. 2 - MYSQL tables. What is done for MYSQL tables is to construct a SELECT statement specifying only the columns used in the query and adding a WHERE clause retrieved by cond_push (unlike FEDERATED that retrieve the whole table except when indexing is used) The last version of CONNECT also can use "REMOTE" indexing. Unfortunately, it seems difficult to handle the LIMIT clause. Firstly because engines have no way to now about it (except by re-parsing the original query) but chiefly because of queries like this:
The engine is called by MariaDB to return the table data, the engine not knowing for what purpose. Here CONNECT will construct the query:
(tab being the t1 source table) However, adding the LIMIT clause would be wrong because it applies to the GROUP BY result, not to the table that is grouped. Perhaps, a solution could be to retrieve the table data row by row, or by fixed number of rows as CASSANDRA does. This is a little more complicated to implement but feasible if the mysql API permit doing so. I am not sure whether or not they built a memory result completely in memory in any case. Concerning the ODBC tables, they are retrieved by CONNECT by block of size given by the BLOCK_SIZE option. Meanwhile, you can browse the first rows of a big MYSQL table using a view or a SRCDEF option that includes the LIMIT clause, for instance replacing:
by
| |||||||||||||||||||||||||||||||
| Comment by jocelyn fournier [ 2014-04-27 ] | |||||||||||||||||||||||||||||||
|
Hi Olivier, I think the best solution to avoid the crash would be to implement the same mechanism than CASSANDRA with a rows fetch by blocks. Jocelyn | |||||||||||||||||||||||||||||||
| Comment by Olivier Bertrand [ 2014-04-27 ] | |||||||||||||||||||||||||||||||
|
Hi, Jocelyn, I don't know how CASSANDRA does this. CONNECT is using the MySQL API and there are no such things available (alike the ODBC recordset size and extended fetch) Therefore, what I did is to enable CONNECT MYSQL tables to retrieve the result via mysql_use_result instead of mysql_store_result. Apparently it does the same thing but does not allocate the whole thing in memory. My fix was pushed as revision 4159. To trigger this, use in the CREATE or ALTER statement the option HUGE, for instance:
I cannot test it, not having such big tables; let me know whether it works for you. Note that the way these API functions work, using LIMIT does not reduce the time spent to retrieve the result. It seems that even the result set is not stored, MySQL must internally go to the end of it before the connection is released. Olivier | |||||||||||||||||||||||||||||||
| Comment by jocelyn fournier [ 2014-04-27 ] | |||||||||||||||||||||||||||||||
|
Hi Olivier, I'll take a look ASAP.
Do you want me to open a new bug report for this one ? Thanks, | |||||||||||||||||||||||||||||||
| Comment by Olivier Bertrand [ 2014-04-28 ] | |||||||||||||||||||||||||||||||
|
Hmmm... I did not think plugins had to do something about this. I thought this was handled by MySQL/MariaDB. Are you sure that this is not working only with CONNECT? If so yes, this could be a new bug report. | |||||||||||||||||||||||||||||||
| Comment by jocelyn fournier [ 2014-04-28 ] | |||||||||||||||||||||||||||||||
|
At least it's the first I see this kind of behaviour | |||||||||||||||||||||||||||||||
| Comment by Olivier Bertrand [ 2014-05-13 ] | |||||||||||||||||||||||||||||||
|
The huge=1 option solves the problem. |