Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-6142

CONNECT Engine could eat all the memory

Details

    Description

      Hi,

      I've just tested the CONNECT engine on a "big" table (24GB) on my local server :

      CREATE TABLE `url_list_connect` (
        `id` int(10) unsigned NOT NULL,
        `url` varchar(2048) NOT NULL DEFAULT '',
        `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
        `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0',
        `id_license` int(10) unsigned NOT NULL DEFAULT '0',
        `dom` mediumint(8) unsigned NOT NULL DEFAULT '0',
        `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0',
        `gen` mediumint(8) unsigned NOT NULL DEFAULT '0',
        `script` varchar(2048) NOT NULL DEFAULT '',
        `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0'
      ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx'

      The original table is a tokudb table with a tokudb_cache_size of 12Go
      (I can see the same behaviour with an InnoDB table).

      Then I've tested the following :
      SELECT * FROM url_list_connect LIMIT 10;

      This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results.
      Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size).
      I don't see any variable like in cassandra SE which allows to control the full scan behaviour :

      "Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records."

      To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause.

      Thanks and regards,
      Jocelyn Fournier

      Attachments

        Activity

          jocel1 jocelyn fournier created issue -
          jocel1 jocelyn fournier made changes -
          Field Original Value New Value
          Description Hi,

          I've just tested the CONNECT engine on a "big" table (24GB) on my local server :

          CREATE TABLE `url_list_connect` (
            `id` int(10) unsigned NOT NULL,
            `url` varchar(2048) NOT NULL DEFAULT '',
            `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
            `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0',
            `id_license` int(10) unsigned NOT NULL DEFAULT '0',
            `dom` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `gen` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `script` varchar(2048) NOT NULL DEFAULT '',
            `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0'
          ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx'

          The original table is a tokudb table with a tokudb_cache_size of 12Go
          (I can see the same behaviour with an InnoDB table).

          Then I've tested the following :
          SELECT * FROM url_list_connect LIMIT 10;

          This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results.
          Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size)

          To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause.

          Thanks and regards,
            Jocelyn Fournier
          Hi,

          I've just tested the CONNECT engine on a "big" table (24GB) on my local server :

          CREATE TABLE `url_list_connect` (
            `id` int(10) unsigned NOT NULL,
            `url` varchar(2048) NOT NULL DEFAULT '',
            `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
            `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0',
            `id_license` int(10) unsigned NOT NULL DEFAULT '0',
            `dom` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `gen` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `script` varchar(2048) NOT NULL DEFAULT '',
            `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0'
          ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx'

          The original table is a tokudb table with a tokudb_cache_size of 12Go
          (I can see the same behaviour with an InnoDB table).

          Then I've tested the following :
          SELECT * FROM url_list_connect LIMIT 10;

          This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results.
          Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size).
          I don't see any variable like in cassandra which allows to control the full scan behaviour :

          "Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records."

          To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause.

          Thanks and regards,
            Jocelyn Fournier
          jocel1 jocelyn fournier made changes -
          Description Hi,

          I've just tested the CONNECT engine on a "big" table (24GB) on my local server :

          CREATE TABLE `url_list_connect` (
            `id` int(10) unsigned NOT NULL,
            `url` varchar(2048) NOT NULL DEFAULT '',
            `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
            `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0',
            `id_license` int(10) unsigned NOT NULL DEFAULT '0',
            `dom` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `gen` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `script` varchar(2048) NOT NULL DEFAULT '',
            `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0'
          ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx'

          The original table is a tokudb table with a tokudb_cache_size of 12Go
          (I can see the same behaviour with an InnoDB table).

          Then I've tested the following :
          SELECT * FROM url_list_connect LIMIT 10;

          This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results.
          Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size).
          I don't see any variable like in cassandra which allows to control the full scan behaviour :

          "Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records."

          To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause.

          Thanks and regards,
            Jocelyn Fournier
          Hi,

          I've just tested the CONNECT engine on a "big" table (24GB) on my local server :

          CREATE TABLE `url_list_connect` (
            `id` int(10) unsigned NOT NULL,
            `url` varchar(2048) NOT NULL DEFAULT '',
            `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
            `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0',
            `id_license` int(10) unsigned NOT NULL DEFAULT '0',
            `dom` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `gen` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `script` varchar(2048) NOT NULL DEFAULT '',
            `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0'
          ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx'

          The original table is a tokudb table with a tokudb_cache_size of 12Go
          (I can see the same behaviour with an InnoDB table).

          Then I've tested the following :
          SELECT * FROM url_list_connect LIMIT 10;

          This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results.
          Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size).
          I don't see any variable like in cassandra SE which allows to control the full scan behaviour :

          "Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records."

          To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause.

          Thanks and regards,
            Jocelyn Fournier
          elenst Elena Stepanova made changes -
          Fix Version/s 10.0.11 [ 15200 ]
          Assignee Olivier Bertrand [ bertrandop ]
          Description Hi,

          I've just tested the CONNECT engine on a "big" table (24GB) on my local server :

          CREATE TABLE `url_list_connect` (
            `id` int(10) unsigned NOT NULL,
            `url` varchar(2048) NOT NULL DEFAULT '',
            `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
            `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0',
            `id_license` int(10) unsigned NOT NULL DEFAULT '0',
            `dom` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `gen` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `script` varchar(2048) NOT NULL DEFAULT '',
            `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0'
          ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx'

          The original table is a tokudb table with a tokudb_cache_size of 12Go
          (I can see the same behaviour with an InnoDB table).

          Then I've tested the following :
          SELECT * FROM url_list_connect LIMIT 10;

          This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results.
          Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size).
          I don't see any variable like in cassandra SE which allows to control the full scan behaviour :

          "Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records."

          To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause.

          Thanks and regards,
            Jocelyn Fournier
          Hi,

          I've just tested the CONNECT engine on a "big" table (24GB) on my local server :

          {code:sql}
          CREATE TABLE `url_list_connect` (
            `id` int(10) unsigned NOT NULL,
            `url` varchar(2048) NOT NULL DEFAULT '',
            `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
            `id_license_domain` int(10) unsigned NOT NULL DEFAULT '0',
            `id_license` int(10) unsigned NOT NULL DEFAULT '0',
            `dom` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `loadtime` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `gen` mediumint(8) unsigned NOT NULL DEFAULT '0',
            `script` varchar(2048) NOT NULL DEFAULT '',
            `database_time` mediumint(8) unsigned NOT NULL DEFAULT '0'
          ) ENGINE=CONNECT DEFAULT CHARSET=latin1 `table_type`=MYSQL `dbname`=test `tabname`=url_list `option_list`='user=root,host=localhost,password=xxxx'
          {code}

          The original table is a tokudb table with a tokudb_cache_size of 12Go
          (I can see the same behaviour with an InnoDB table).

          Then I've tested the following :
          {{SELECT * FROM url_list_connect LIMIT 10;}}

          This query seems to execute a full scan of the table and load the whole table in memory, before returning the 10 first results.
          Hence the mysql server finally crashes because out of memory. (even with a small innodb_buffer_pool_size / tokudb_cache_size / tmp_table_size).
          I don't see any variable like in cassandra SE which allows to control the full scan behaviour :

          "Full table scans are performed in a memory-efficient way. Cassandra SE performs a full table scan as a series of batches, each of which reads not more than cassandra_rnd_batch_size records."

          To reproduce : create a table with a size > the total memory on the server, then select without a WHERE clause.

          Thanks and regards,
            Jocelyn Fournier
          Labels connect-engine
          elenst Elena Stepanova made changes -
          Priority Critical [ 2 ] Major [ 3 ]
          serg Sergei Golubchik made changes -
          Fix Version/s 10.0.12 [ 15201 ]
          Fix Version/s 10.0.11 [ 15200 ]
          bertrandop Olivier Bertrand made changes -
          Resolution Fixed [ 1 ]
          Status Open [ 1 ] Closed [ 6 ]
          serg Sergei Golubchik made changes -
          Workflow defaullt [ 39400 ] MariaDB v2 [ 43186 ]
          ratzpo Rasmus Johansson (Inactive) made changes -
          Workflow MariaDB v2 [ 43186 ] MariaDB v3 [ 62925 ]
          serg Sergei Golubchik made changes -
          Workflow MariaDB v3 [ 62925 ] MariaDB v4 [ 147818 ]

          People

            bertrandop Olivier Bertrand
            jocel1 jocelyn fournier
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.