Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14460

Memory leak with only SELECT statements

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.2.9
    • 10.2.15
    • None
    • Debian 8. 16 GB RAM. 8 CPU (Xeon E5-1620 v2).

    Description

      The same IO-bound scenario was run on 2 identical machines, one with RocksDB and the other with TokuDB. In a Zope application (ERP5), using NEO/MariaDB as storage, we filled the disks with a few data streams. The memory leak happened when reading a single whole data stream.

      The result is that ERP5/NEO does 2 queries in loop. Here is an example from general log:

      SELECT tid, compression, data.hash, value, value_tid FROM obj FORCE INDEX(`partition`) LEFT JOIN data ON (obj.data_id = data.id) WHERE `partition` = 6 AND oid = 94294686 AND tid < 271213171275484455 ORDER BY tid DESC LIMIT 1
      SELECT tid FROM obj FORCE INDEX(`partition`) WHERE `partition`=6 AND oid=94294686 AND tid>271177614835290848 ORDER BY tid LIMIT 1
      

      More about NEO's architecture is described at https://www.nexedi.com/blog/NXD-Document.Blog.Optimising.MariaDB.Big.Data

      The data stream is 1.5TB uncompressed, taking about 650MB on-disk. RocksDB got OOM-killed when 1TB was read. TokuDB respected its memory constraints until the end.

      Each machine has 2 disks of 2TB, each one with a MariaDB DB that holds half of the Zope DB. IOW, we have 2 RocksDB databases of about 2TB. 8GB of RAM for the 2 mysqld (rocksdb_block_cache_size = 2G).

      Attachments

        Issue Links

          Activity

            Sorry, for the confusion. I edited my first reply to remove parts that aren't pertinent anymore. Can you please check the svg I reuploaded ? All addresses are resolved.

            jmuchemb Julien Muchembled added a comment - Sorry, for the confusion. I edited my first reply to remove parts that aren't pertinent anymore. Can you please check the svg I reuploaded ? All addresses are resolved.

            (jmuchemb, thanks for the re-uploaded file. It helped)

            Discussed this with Herman Lee.

            Discussion results:

            • The memory is consumed by rocksdb::TransactionBaseImpl::SetSavePoint
            • Debugging, we can see that this scenario

            begin;
            select #1;
            select #2;
            ... 
            select #n;
            

            will create a new savepoint for each select. A transaction doing lots of select will end up using a lot of memory.

            This can be fixed, for example, by not creating a savepoint when the statement is a SELECT statement. We can see this in ha_rocksdb::external_lock() call.

            That call is done for each table, for statements that do both reads and writes (e.g. INSERT ... SELECT), the first call is the external_lock(F_WRLCK) for the table that we're going to write to. Note: if we haven't allocated a savepoint at statement start, we should not try to remove it at statement end.

            psergei Sergei Petrunia added a comment - ( jmuchemb , thanks for the re-uploaded file. It helped) Discussed this with Herman Lee. Discussion results: The memory is consumed by rocksdb::TransactionBaseImpl::SetSavePoint Debugging, we can see that this scenario begin; select #1; select #2; ... select #n; will create a new savepoint for each select. A transaction doing lots of select will end up using a lot of memory. This can be fixed, for example, by not creating a savepoint when the statement is a SELECT statement. We can see this in ha_rocksdb::external_lock() call. That call is done for each table, for statements that do both reads and writes (e.g. INSERT ... SELECT), the first call is the external_lock(F_WRLCK) for the table that we're going to write to. Note: if we haven't allocated a savepoint at statement start, we should not try to remove it at statement end.

            Started attempt to implement it and encountered this: https://github.com/facebook/mysql-5.6/issues/802 (not related to memory consumption as such but shows that both us and upstream have RollbackToSavepoint() call in the wrong place).

            psergei Sergei Petrunia added a comment - Started attempt to implement it and encountered this: https://github.com/facebook/mysql-5.6/issues/802 (not related to memory consumption as such but shows that both us and upstream have RollbackToSavepoint() call in the wrong place).
            psergei Sergei Petrunia added a comment - - edited

            The patch for issue #802 here: https://github.com/spetrunia/mysql-5.6/commit/612ec75814f30076f53d294cd0267533c807022f is also a fix for this MDEV.

            That tree is based on facebook/mysql-5.6, but the patch should apply to MariaDB as well.

            Next steps: I'll discuss this the upstream and then cherry-pick into MariaDB (so that we get it in MariaDB sooner than the merge from upstream happens)

            psergei Sergei Petrunia added a comment - - edited The patch for issue #802 here: https://github.com/spetrunia/mysql-5.6/commit/612ec75814f30076f53d294cd0267533c807022f is also a fix for this MDEV. That tree is based on facebook/mysql-5.6, but the patch should apply to MariaDB as well. Next steps: I'll discuss this the upstream and then cherry-pick into MariaDB (so that we get it in MariaDB sooner than the merge from upstream happens)

            Applied the patch for upstream issue #802. It fixes this issue as well.

            psergei Sergei Petrunia added a comment - Applied the patch for upstream issue #802. It fixes this issue as well.

            People

              psergei Sergei Petrunia
              jmuchemb Julien Muchembled
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.