[MDEV-6142] CONNECT Engine could eat all the memory - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.0.10
Fix Version/s: 10.0.12
Component/s: None
Labels:
- connect-engine

Description

Hi,

I've just tested the CONNECT engine on a "big" table (24GB) on my local server :

CREATE TABLE `url_list_connect` (

  `id` int(10) unsigned NOT NULL,

  `url` varchar(2048) NOT NULL DEFAULT '',

  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,

  `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0',

  `id_license` int(10) unsigned NOT NULL DEFAULT '0',

  `dom` mediumint(8) unsigned NOT NULL DEFAULT '0',

  `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0',

  `gen` mediumint(8) unsigned NOT NULL DEFAULT '0',

  `script` varchar(2048) NOT NULL DEFAULT '',

  `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0'

) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx'

The original table is a tokudb table with a tokudb_cache_size of 12Go
(I can see the same behaviour with an InnoDB table).

Then I've tested the following :
SELECT * FROM url_list_connect LIMIT 10;

This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results.
Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size).
I don't see any variable like in cassandra SE which allows to control the full scan behaviour :

"Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records."

To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause.

Thanks and regards,
Jocelyn Fournier

Attachments

Activity

Ascending order - Click to sort in descending order

jocelyn fournier created issue - 2014-04-21 10:18

jocelyn fournier made changes - 2014-04-21 11:19

Field	Original Value	New Value
Description	Hi, I've just tested the CONNECT engine on a "big" table (24GB) on my local server : CREATE TABLE `url_list_connect` ( `id` int(10) unsigned NOT NULL, `url` varchar(2048) NOT NULL DEFAULT '', `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0', `id_license` int(10) unsigned NOT NULL DEFAULT '0', `dom` mediumint(8) unsigned NOT NULL DEFAULT '0', `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0', `gen` mediumint(8) unsigned NOT NULL DEFAULT '0', `script` varchar(2048) NOT NULL DEFAULT '', `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0' ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx' The original table is a tokudb table with a tokudb_cache_size of 12Go (I can see the same behaviour with an InnoDB table). Then I've tested the following : SELECT * FROM url_list_connect LIMIT 10; This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results. Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size) To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause. Thanks and regards, Jocelyn Fournier	Hi, I've just tested the CONNECT engine on a "big" table (24GB) on my local server : CREATE TABLE `url_list_connect` ( `id` int(10) unsigned NOT NULL, `url` varchar(2048) NOT NULL DEFAULT '', `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0', `id_license` int(10) unsigned NOT NULL DEFAULT '0', `dom` mediumint(8) unsigned NOT NULL DEFAULT '0', `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0', `gen` mediumint(8) unsigned NOT NULL DEFAULT '0', `script` varchar(2048) NOT NULL DEFAULT '', `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0' ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx' The original table is a tokudb table with a tokudb_cache_size of 12Go (I can see the same behaviour with an InnoDB table). Then I've tested the following : SELECT * FROM url_list_connect LIMIT 10; This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results. Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size). I don't see any variable like in cassandra which allows to control the full scan behaviour : "Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records." To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause. Thanks and regards, Jocelyn Fournier

jocelyn fournier made changes - 2014-04-21 11:19

Description

Hi,

I've just tested the CONNECT engine on a "big" table (24GB) on my local server :

CREATE TABLE `url_list_connect` (
  `id` int(10) unsigned NOT NULL,
  `url` varchar(2048) NOT NULL DEFAULT '',
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0',
  `id_license` int(10) unsigned NOT NULL DEFAULT '0',
  `dom` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `gen` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `script` varchar(2048) NOT NULL DEFAULT '',
  `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0'
) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx'

The original table is a tokudb table with a tokudb_cache_size of 12Go
(I can see the same behaviour with an InnoDB table).

Then I've tested the following :
SELECT * FROM url_list_connect LIMIT 10;

This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results.
Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size).
I don't see any variable like in cassandra which allows to control the full scan behaviour :

"Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records."

To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause.

Thanks and regards,
  Jocelyn Fournier

Hi,

I've just tested the CONNECT engine on a "big" table (24GB) on my local server :

CREATE TABLE `url_list_connect` (
  `id` int(10) unsigned NOT NULL,
  `url` varchar(2048) NOT NULL DEFAULT '',
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0',
  `id_license` int(10) unsigned NOT NULL DEFAULT '0',
  `dom` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `gen` mediumint(8) unsigned NOT NULL DEFAULT '0',
  `script` varchar(2048) NOT NULL DEFAULT '',
  `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0'
) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx'

The original table is a tokudb table with a tokudb_cache_size of 12Go
(I can see the same behaviour with an InnoDB table).

Then I've tested the following :
SELECT * FROM url_list_connect LIMIT 10;

This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results.
Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size).
I don't see any variable like in cassandra SE which allows to control the full scan behaviour :

"Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records."

To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause.

Thanks and regards,
  Jocelyn Fournier

Elena Stepanova made changes - 2014-04-21 21:23

Fix Version/s		10.0.11 [ 15200 ]
Assignee		Olivier Bertrand [ bertrandop ]
Description	Hi, I've just tested the CONNECT engine on a "big" table (24GB) on my local server : CREATE TABLE `url_list_connect` ( `id` int(10) unsigned NOT NULL, `url` varchar(2048) NOT NULL DEFAULT '', `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0', `id_license` int(10) unsigned NOT NULL DEFAULT '0', `dom` mediumint(8) unsigned NOT NULL DEFAULT '0', `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0', `gen` mediumint(8) unsigned NOT NULL DEFAULT '0', `script` varchar(2048) NOT NULL DEFAULT '', `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0' ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx' The original table is a tokudb table with a tokudb_cache_size of 12Go (I can see the same behaviour with an InnoDB table). Then I've tested the following : SELECT * FROM url_list_connect LIMIT 10; This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results. Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size). I don't see any variable like in cassandra SE which allows to control the full scan behaviour : "Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records." To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause. Thanks and regards, Jocelyn Fournier	Hi, I've just tested the CONNECT engine on a "big" table (24GB) on my local server : {code:sql} CREATE TABLE `url_list_connect` ( `id` int(10) unsigned NOT NULL, `url` varchar(2048) NOT NULL DEFAULT '', `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0', `id_license` int(10) unsigned NOT NULL DEFAULT '0', `dom` mediumint(8) unsigned NOT NULL DEFAULT '0', `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0', `gen` mediumint(8) unsigned NOT NULL DEFAULT '0', `script` varchar(2048) NOT NULL DEFAULT '', `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0' ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx' {code} The original table is a tokudb table with a tokudb_cache_size of 12Go (I can see the same behaviour with an InnoDB table). Then I've tested the following : {{SELECT * FROM url_list_connect LIMIT 10;}} This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results. Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size). I don't see any variable like in cassandra SE which allows to control the full scan behaviour : "Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records." To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause. Thanks and regards, Jocelyn Fournier
Labels		connect-engine

Elena Stepanova made changes - 2014-04-29 16:31

Priority

Critical [ 2 ]

Major [ 3 ]

Sergei Golubchik made changes - 2014-05-10 23:12

Fix Version/s		10.0.12 [ 15201 ]
Fix Version/s	10.0.11 [ 15200 ]

Olivier Bertrand made changes - 2014-05-13 00:57

Resolution		Fixed [ 1 ]
Status	Open [ 1 ]	Closed [ 6 ]

Sergei Golubchik made changes - 2014-06-13 15:06

Workflow

defaullt [ 39400 ]

MariaDB v2 [ 43186 ]

Rasmus Johansson (Inactive) made changes - 2015-05-18 17:51

Workflow

MariaDB v2 [ 43186 ]

MariaDB v3 [ 62925 ]

Sergei Golubchik made changes - 2021-12-06 21:40

Workflow

MariaDB v3 [ 62925 ]

MariaDB v4 [ 147818 ]

People

Assignee:: Olivier Bertrand

Reporter:: jocelyn fournier

Votes:: 1 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 2014-04-21 10:18

Updated:: 2014-05-13 00:57

Resolved:: 2014-05-13 00:57

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Activity

People

Dates

Git Integration