MariaDB Server / MDEV-34912

THD may not be cleaning up all memory between queries

Details

    Description

THD may not be cleaning up all memory between queries

      THD stores allocated memory for a set of different purposes:

      • Vio data (used for reading data from a connection)
        • By default, 16K may be allocated in THD->net.read_buffer
        • While testing this, it was not allocated
      • Connection data
        • Basically the query string plus some overhead
        • Stored in THD->net.buff, of size (THD->net.buff_end - THD->net.buff)
        • A block of 16K is allocated by default
        • Can grow up to max_allowed_packet (16777216 by default) and will
          not decrease; a single long statement therefore grows this buffer
          permanently (see the sketch after this list)
        • A slave connection's buffer may grow up to
          slave_max_allowed_packet = 1073741824
        • We should probably reallocate this at the end of the query if it
          is bigger than max_allowed_packet
      • Memory used to process the current query
        • Stored in THD->main_mem_root
        • At startup, a permanent block of query_prealloc_size (24K) is
          allocated. If needed, new blocks of query_alloc_block_size (16K)
          are allocated
        • This is reset to query_prealloc_size at the end of each statement
      • Prepared statements
        • Allocated in blocks of query_alloc_block_size (16K)
      • Transaction information
        • At startup, a permanent block of transaction_prealloc_size (4K) is
          allocated. If needed, new blocks of transaction_alloc_block_size
          (8K) are allocated
      • Query digest
        • Stored in THD->m_token_array
        • Fixed-size array of max_digest_length (1024) bytes
      • Binary log data, allocated if the binary log is enabled
        • Stored in THD->cache_mngr
        • binlog_stmt_cache_size (32K) + binlog_cache_size (32K) is
          allocated at the first binlog write
      • User variables
        • Stored in THD->user_vars
        • Stores the name and data of each variable
        • The data can be reset to 0, but the name stays allocated until the
          connection is closed
      • User-level locks
        • Stored in THD->ull_hash
        • Memory is reused for new entries
        • Hash memory will not shrink
      • Open handlers (HANDLER OPEN)
        • Stored in THD->handler_tables_hash
        • Memory is reused for new entries
        • Hash memory will not shrink
      • Sequences (to get the previous value of a sequence)
        • Stored in THD->sequences
      • Session tracker; active if session_track_schema = ON
        • Variables are stored in a hash in THD->session_tracker.sysvars
        • One allocation per tracked variable
          (THD->session_track_system_variables)
        • 8K for the hash that tracks variables; could be reduced to 1K
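
      As an illustration of the connection-buffer growth, here is a minimal
      MTR-style sketch (the literal size is arbitrary, and it assumes
      mysqltest accepts a variable and a statement of this length):

      # Baseline for this connection
      SELECT MEMORY_USED FROM information_schema.PROCESSLIST
      WHERE ID = CONNECTION_ID();

      # Build a ~1M literal client-side and send it as part of one
      # statement; reading it grows THD->net.buff, which is never shrunk
      --let $blob= `SELECT REPEAT('x', 1000000)`
      --disable_query_log
      --eval SELECT LENGTH('$blob') AS statement_length
      --enable_query_log

      # MEMORY_USED now includes the grown connection buffer, even when idle
      SELECT MEMORY_USED FROM information_schema.PROCESSLIST
      WHERE ID = CONNECTION_ID();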

      Note that the 'basic' memory used for an idle connection should be
      sizeof(THD) = 27888 + query_prealloc_size (24K = 24576) + connection
      data (16K = 16384), i.e. about 68848 bytes, plus some other small
      objects (see above).

      On my current system, memory_used is 75872 bytes for an idle,
      just-started connection without binary logging.

      Normally, when running queries, the memory_used value from
      SELECT id, command, memory_used, max_memory_used FROM information_schema.processlist
      should not grow 'a lot' over time, except in the following cases:

      • There are active prepared statements (see the sketch below)
        • This memory should go back to 'basic' when all statements are freed
      • Queries with a lot of text, which cause the connection buffer to
        grow permanently
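
      For the prepared-statement case, the expected behaviour can be checked
      with plain SQL (the byte counts will vary; this is just a sketch):

      SELECT MEMORY_USED FROM information_schema.PROCESSLIST
      WHERE ID = CONNECTION_ID();    -- baseline

      PREPARE ps FROM 'SELECT 1';    -- statement mem_root allocated

      SELECT MEMORY_USED FROM information_schema.PROCESSLIST
      WHERE ID = CONNECTION_ID();    -- higher while ps exists

      DEALLOCATE PREPARE ps;         -- statement memory freed

      SELECT MEMORY_USED FROM information_schema.PROCESSLIST
      WHERE ID = CONNECTION_ID();    -- should be back near the baseline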

      While visiting a MariaDB user running 10.6.18 and checking the process
      lists, I noticed a couple of connections that were using some 100M of
      memory even while in 'Sleep' state.

      The task is to find out whether there are common cases where
      'max_memory_used' grows a lot without coming back.

      Known issues where this could happen (probably not related to this case):

      • A lot of prepared statements that are never freed
      • A lot of user variables that are never freed (see the sketch after
        this list)
      • A lot of locks that are never freed
      • Having really long query strings, up to 128M or more
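
      For instance, the user-variable case can be reproduced with a sketch
      like this (the size is arbitrary):

      SET @big = REPEAT('x', 8 * 1024 * 1024);   -- ~8M held in THD->user_vars

      SELECT MEMORY_USED FROM information_schema.PROCESSLIST
      WHERE ID = CONNECTION_ID();                -- includes the variable data

      SET @big = NULL;  -- releases the data; the hash entry and the name
                        -- remain until the connection is closed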

      Some variables from the user (a query to collect them follows):

      query_alloc_block_size        16384 (default)
      query_prealloc_size           24576 (default)
      max_allowed_packet            268435456
      slave_max_allowed_packet      1073741824 (default)
      binlog_cache_size             1048576
      binlog_stmt_cache_size        131072
      transaction_alloc_block_size  8192 (default)
      transaction_prealloc_size     4096 (default)
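
      For reference, the same settings can be collected on a running server
      with (values will of course differ per installation):

      SHOW GLOBAL VARIABLES WHERE Variable_name IN
        ('query_alloc_block_size', 'query_prealloc_size',
         'max_allowed_packet', 'slave_max_allowed_packet',
         'binlog_cache_size', 'binlog_stmt_cache_size',
         'transaction_alloc_block_size', 'transaction_prealloc_size');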

      Note that if the server is compiled with safemalloc, one can list all memory still allocated for a connection from a debugger (e.g. gdb) with:

      p sf_report_leaked_memory(connection_id)


          Activity

            The first result the tests yielded was orphan prepared statements. I know the description mentions it as an unlikely cause, but I think it shouldn't be dismissed so lightly.
            It is very easy to achieve exactly the described result by running PREPARE on new statement handles without DEALLOCATE, which is admittedly not a very realistic scenario in well-written applications.
            However, most users feed their SQL to the server via connectors, and the same problematic outcome can happen with some connectors without any obvious fault on the user's application side.

            Here is an example in Perl DBI (wrapped for running in MTR just for convenience).

            perl;
            # MTR embedded-perl block: connect with server-side prepared
            # statements enabled (mysql_server_prepare=1) and measure
            # MEMORY_USED around a loop of do() calls.
            use DBI;
            my $dbh = DBI->connect("dbi:mysql:host=127.0.0.1:port=$ENV{MASTER_MYPORT}:database=test:mysql_server_prepare=1",'root');
            my $res= $dbh->selectrow_arrayref("select MEMORY_USED from information_schema.processlist where ID = CONNECTION_ID()");
            print "MEMORY_USED before: ",$res->[0],"\n";
            # Each do() prepares a new server-side statement handle that is
            # never closed.
            foreach (1..10000) {
              $dbh->do("SELECT 1");
            }
            $res= $dbh->selectrow_arrayref("select MEMORY_USED from information_schema.processlist where ID = CONNECTION_ID()");
            print "MEMORY_USED after: ",$res->[0],"\n";
            EOF
            

            The result of this test is that the connection grows to ~240M in a matter of seconds:

            MEMORY_USED before: 147480
            MEMORY_USED after: 246387480
            

            Here we enable server-side prepared statements, as many users surely do, and then run queries in the usual way, not worrying about PREPARE/DEALLOCATE at all, as the connector is supposed to deal with it.
            If we look at the general log, we'll see two different patterns.
            When we use the selectrow_arrayref method (or, I assume, other similar fetchers), the statement gets closed after execution:

                                 5 Prepare  select MEMORY_USED as memory_used_before from information_schema.processlist where ID = CONNECTION_ID()
                                 5 Execute  select MEMORY_USED as memory_used_before from information_schema.processlist where ID = CONNECTION_ID()
                                 5 Close stmt       
            

            However, when we use the do method, prepared statements don't get closed and they pile up (if we continued the loop beyond ~16K repetitions, we'd hit the max_prepared_stmt_count limit, 16382 by default, and get an error that the number of prepared statements is exhausted):

                                 5 Prepare  SELECT 1
                                 5 Execute  SELECT 1
                                 5 Prepare  SELECT 1
                                 5 Execute  SELECT 1
            ...
            

            Note that I'm using SELECT here to minimize the test case, but the same happens with DML, for example.
            So, in a real-life application, it would be perfectly normal to have a connection which has server-side prepared statements enabled and mostly does SELECTs (and fetches the results), but every once in a while also does DML/DDL via do or a similar method. The orphan prepared statements would then slowly pile up: not quickly enough to hit the limit, but enough to show 100-200M of memory usage in the processlist at some point.
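
            Without the general log, a quick way to check a live server for this pattern is to watch the global count of open server-side prepared statements against its limit (both are standard status/system variables):

            SHOW GLOBAL STATUS LIKE 'Prepared_stmt_count';         -- statements currently open
            SHOW GLOBAL VARIABLES LIKE 'max_prepared_stmt_count';  -- limit, 16382 by default

            If Prepared_stmt_count keeps creeping up while the application is believed to close its statements, some code path is leaking statement handles.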

            It would be great to have the general log from the user in question; then we could quickly see whether it shows the same pattern of unclosed prepared statements. The next best option would be to find out exactly which connector they are using, so we could try it and see if it has a similar issue.

            elenst Elena Stepanova added a comment
            elenst Elena Stepanova added a comment - edited

            Other than that, we have plenty of open bugs for the debug assertion failure status_var.local_memory_used == 0.
            I didn't go through the whole list to re-check, but at least some of them are still valid, and they can surely cause an effect similar to what's described here.
            Here is one example (I didn't file it separately because I think it is MDEV-32533 or a close sibling of it; I added a comment there instead):

            CREATE TABLE t1 (a INT);
            INSERT INTO t1 VALUES (1);
             
            CREATE TABLE t2 (b CHAR(8));
            INSERT INTO t2 VALUES ('x'),('xx');
             
            CREATE TABLE t3 (c TIMESTAMP);
             
            select memory_used as "memory used before" from information_schema.processlist where id = connection_id();
             
            # Repeat a failing UPDATE; each iteration leaks a little memory
            --let $run=100000
            --disable_query_log
            --disable_result_log
            while ($run)
            {
              --error ER_TRUNCATED_WRONG_VALUE
              UPDATE t1 SET a = 0 WHERE 'xx' IN (SELECT MAX(b) FROM t2 JOIN t3 JOIN t1 t1a ON (t1a.a = t3.c));
              --dec $run
            }
            --enable_query_log
            --enable_result_log
             
            select memory_used as "memory used after" from information_schema.processlist where id = connection_id();
             
            # Cleanup
            DROP TABLE t1, t2, t3;
            

            On a non-debug build the test case takes about 10 seconds and results in

            select memory_used as "memory used before" from information_schema.processlist where id = connection_id();
            memory used before
            139224
            select memory_used as "memory used after" from information_schema.processlist where id = connection_id();
            memory used after
            785739240
            

            That is, the connection grows to ~750M.

            Of course, the scenario above can be considered a "corner case", but for the assertion status_var.local_memory_used == 0 alone JIRA finds at least 19 unresolved bug reports, so I guess some of them can easily affect the user (especially since the leak doesn't happen all the time, only sometimes in some connections, i.e. not in the most common usage).

            There can also be reports of ASAN LeakSanitizer errors, Valgrind warnings, and the generic "Warning: Memory not freed" messages on non-debug builds.


            People

              Assignee: monty Michael Widenius
              Reporter: monty Michael Widenius
