MariaDB Server / MDEV-30508

Can't connect to MySQL server -- server out of memory

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version: 10.6.10
    • Fix Version: None
    • Component: None
    • Environment: Linux/Windows

    Description

      I regularly need to copy "big" tables from a MySQL 5.7 server (Windows) to a MariaDB 10.6.10 server (Linux). I am trying to implement this using CONNECT tables defined on the MariaDB server that reach out to the MySQL server, doing a simple insert into linux_table select * from mysql_table_connect, where mysql_table_connect uses the CONNECT engine to access the table on the MySQL server.
      This generally works fine for tables of up to several million rows, but it fails consistently for tables containing more than roughly 14 million rows.
      First, I get the message Lost connection to MySQL server during query. On trying to establish a new connection to the Linux server, I then get Can't connect to MySQL server on '...', where ... is the name of the Linux server.
      My monitoring system tells me that at about the same time the Linux server is running out of memory.
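
      A minimal sketch of the setup described above (server name, credentials, and table names are placeholders, not the actual objects):

        -- CONNECT table on the MariaDB side pointing at the remote MySQL 5.7 table;
        -- the column definitions are discovered from the remote table.
        CREATE TABLE mysql_table_connect ENGINE=CONNECT TABLE_TYPE=MYSQL
          CONNECTION='mysql://copyuser:copypwd@mysql57host/sourcedb/source_table';

        -- Ordinary local target table, then the copy statement from the description:
        INSERT INTO linux_table SELECT * FROM mysql_table_connect;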

      Attachments

        Activity

          danblack Daniel Black added a comment -

          Can you include a SHOW CREATE TABLE of the source table (feel free to change the field names) to give a rough indication of row size?

          gwselke Gisbert W. Selke added a comment - edited

          Sure – see below!
          (And thank you for the very fast answer, I really appreciate this!)
          I should still explicitly say that the underlying table is, as you can see below, partitioned. The copying process, however, creates one target table per partition and one connection per partition to copy. It then uses WHERE clauses so that only one partition is copied at a time. (This is the result of refinements made when I discovered that copying the whole table would crash. Unfortunately, it still does...) Each of the partitions contains the aforementioned roughly 15 million rows. The table as a whole (all partitions together) contains roughly 4.5 billion rows.

          So the statement I actually use for copying has this structure:
          insert into target_table select * from connected_table where myyear=2020 and period=5

          (For the case described above where the copying process works, those tables contain roughly a third as many rows.)
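
          A rough sketch of one such per-partition copy, as described above (the target and CONNECT table names are placeholders invented for illustration):

            -- One CONNECT table and one local target table per partition (placeholder names).
            CREATE TABLE atcdetail_connect ENGINE=CONNECT TABLE_TYPE=MYSQL
              CONNECTION='mysql://copyuser:copypwd@mysql57host/sourcedb/atcdetail';

            -- Copy a single partition's worth of rows via the WHERE clause.
            INSERT INTO atcdetail_2020_05
              SELECT * FROM atcdetail_connect
              WHERE myyear=2020 AND period=5;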

          CREATE TABLE `atcdetail` (
            `myyear` year(4) NOT NULL DEFAULT '0000',
            `period` tinyint(2) unsigned NOT NULL DEFAULT '0',
            `k1` smallint(3) unsigned NOT NULL DEFAULT '0',
            `kv` tinyint(3) unsigned NOT NULL DEFAULT '0',
            `a1` char(7) COLLATE latin1_german1_ci NOT NULL DEFAULT '',
            `a2` tinyint(1) unsigned NOT NULL DEFAULT '0',
            `v2` smallint(5) unsigned NOT NULL DEFAULT '0',
            `a3` tinyint(2) unsigned NOT NULL DEFAULT '0',
            `a4` tinyint(2) unsigned NOT NULL DEFAULT '15',
            `s1` tinyint(1) unsigned NOT NULL DEFAULT '0',
            `vo` double(15,3) NOT NULL DEFAULT '0.000',
            `b1` double(15,2) NOT NULL DEFAULT '0.00',
            `n1` double(15,2) NOT NULL DEFAULT '0.00',
            `z1` double(15,2) NOT NULL DEFAULT '0.00',
            `d1` double(15,3) NOT NULL DEFAULT '0.000',
            `v1` double(15,2) NOT NULL DEFAULT '0.00',
            `h1` double(15,2) NOT NULL DEFAULT '0.00',
            PRIMARY KEY (`myyear`,`period`,`kv`,`a1`,`a2`,`v2`,`a3`,`a4`,`s1`)
          ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_german1_ci MAX_ROWS=150000000
          /*!50100 PARTITION BY RANGE ( myyear)
          SUBPARTITION BY HASH ( period mod 17 + (period div 20 - period div 30)*10)
          SUBPARTITIONS 17
          (PARTITION p2001 VALUES LESS THAN (2002) ENGINE = InnoDB,
           PARTITION p2002 VALUES LESS THAN (2003) ENGINE = InnoDB,
           PARTITION p2003 VALUES LESS THAN (2004) ENGINE = InnoDB,
           PARTITION p2004 VALUES LESS THAN (2005) ENGINE = InnoDB,
           PARTITION p2005 VALUES LESS THAN (2006) ENGINE = InnoDB,
           PARTITION p2006 VALUES LESS THAN (2007) ENGINE = InnoDB,
           PARTITION p2007 VALUES LESS THAN (2008) ENGINE = InnoDB,
           PARTITION p2008 VALUES LESS THAN (2009) ENGINE = InnoDB,
           PARTITION p2009 VALUES LESS THAN (2010) ENGINE = InnoDB,
           PARTITION p2010 VALUES LESS THAN (2011) ENGINE = InnoDB,
           PARTITION p2011 VALUES LESS THAN (2012) ENGINE = InnoDB,
           PARTITION p2012 VALUES LESS THAN (2013) ENGINE = InnoDB,
           PARTITION p2013 VALUES LESS THAN (2014) ENGINE = InnoDB,
           PARTITION p2014 VALUES LESS THAN (2015) ENGINE = InnoDB,
           PARTITION p2015 VALUES LESS THAN (2016) ENGINE = InnoDB,
           PARTITION p2016 VALUES LESS THAN (2017) ENGINE = InnoDB,
           PARTITION p2017 VALUES LESS THAN (2018) ENGINE = InnoDB,
           PARTITION p2018 VALUES LESS THAN (2019) ENGINE = InnoDB,
           PARTITION p2019 VALUES LESS THAN (2020) ENGINE = InnoDB,
           PARTITION p2020 VALUES LESS THAN (2021) ENGINE = InnoDB,
           PARTITION p2021 VALUES LESS THAN (2022) ENGINE = InnoDB,
           PARTITION p2022 VALUES LESS THAN (2023) ENGINE = InnoDB,
           PARTITION p2023 VALUES LESS THAN (2024) ENGINE = InnoDB) */
          

          gwselke Gisbert W. Selke added a comment - edited

          Here is some more detail:
          I have checked memory allocation as per information_schema.processlist. It seems that a huge portion of memory is allocated by the connection doing the copying. For the example given above, which has a record size of roughly 75 bytes, the memory allocated for 12 million rows is 2 GB.
          For a different, wider table with a row size of about 340 bytes and 10 million rows, the memory is 10 GB. Same table with 20 million rows: 20 GB.
          So it seems the memory grows linearly with the number of rows to copy, and probably also depends on row size. (75 bytes × 12 million rows is only about 0.9 GB, and 340 bytes × 10 million rows about 3.4 GB, so the allocation is roughly two to three times the raw row data.)
          It is not clear to me whether the (binary) row size is relevant here, or a textual representation.
          The only reason for this behaviour that I can imagine is that the entire contents of the CONNECTed table are copied into memory before being written to the target table. (I am not sure why this should be so.)
          In any case, it would be good if there were some hint on how to estimate memory usage for a given table structure and a given number of rows, in order to prevent out-of-memory conditions with the server no longer answering.
          In addition, it would be good to have some mechanism that avoids crashing in favour of swapping (which is also undesirable, but still the better alternative).
          I am also not sure whether there is a way of configuring the memory used by CONNECT. There is a variable called connect_work_size, which in my case has its default value (2^26 bytes), but this is far less than the allocated memory quoted above, so this variable seems to have a different purpose.
          There is also a variable called connect_use_tempfile, which in my case has its default value of AUTO, but this also seems to serve a different purpose. (Or does it?)
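
          A minimal sketch for inspecting the per-connection allocation and the CONNECT variables mentioned above (the values set at the end are arbitrary examples, not recommendations):

            -- Per-connection memory as reported by MariaDB:
            SELECT id, user, command, memory_used, max_memory_used
              FROM information_schema.processlist;

            -- The CONNECT engine variables mentioned in the comment:
            SHOW GLOBAL VARIABLES LIKE 'connect_%';

            -- connect_work_size defaults to 2^26 = 67108864 bytes; whether raising it or
            -- forcing temporary files changes this allocation is exactly the open question.
            SET GLOBAL connect_work_size = 134217728;
            SET GLOBAL connect_use_tempfile = 'YES';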


          People

            Assignee: Unassigned
            Reporter: gwselke Gisbert W. Selke
            Votes: 0
            Watchers: 4

