Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34493

memory leak when using galera node with only replication thread

Details

    • Bug
    • Status: Open (View Workflow)
    • Critical
    • Resolution: Unresolved
    • 10.6.16
    • 10.6
    • Galera, Server
    • None
    • redhat 8 on vmware

    Description

      we are getting memory leak in one DB server.

      situation:

      • our architecture is 3 nodes galera cluster - 2 DB nodes + 1 arbitrator in different locations. Plus running haproxy as the front end
      • haproxy redirects traffic to one DB node in normal stage. Another DB node is configured as backup in haproxy
      • the backup DB node used up its operating system available memory in 1 or 2 days. The DB node is running on 64GB memory VM and 8GB buffer pool
      • after available memory used up in backup DB node, OS killed the DB process and available memory got released
      • the backup DB node got 128 galera replication threads

      Since haproxy does not route traffic to backup DB node in normal situation, only replication and Percona PMM are using backup DB node.

      Screenshot:
      uploaded top command output, keeping decreasing available memory and output of memory_summary_global_by_event_name and memory_summary_by_thread_by_event_name

      Question:
      Any method to continue the troubleshooting? Kindly advise "memory/sql/thd::main_mem_root" is related to the memory leak or not.

      Regards,
      William Wong

      Attachments

        1. available-memory-keeps-decreaing.png
          713 kB
          William Wong
        2. Investigation-2025-04-03.txt
          8 kB
          Lawrence Man
        3. main_mem_root-at-session-level.png
          831 kB
          William Wong
        4. node2 memory_usage_pattern.xlsx
          347 kB
          Lawrence Man
        5. top.png
          360 kB
          William Wong
        6. top-memory_summary_global_by_event_name.png
          557 kB
          William Wong

        Activity

          LawrenceMan Lawrence Man added a comment -

          Yes, but we recently we have reduced it match with the # of vCPUs and the situation remains the same.

          LawrenceMan Lawrence Man added a comment - Yes, but we recently we have reduced it match with the # of vCPUs and the situation remains the same.
          LawrenceMan Lawrence Man added a comment -

          I got a small reproducible case as in the attached file:

          1. A table called t2 is created.
          2. An insert w/ 10mil records via the stored procedure InsertIntoLknmT2_bind_savept() caused increase in node 2's memory utilization, whenever the SP was executed for the first & second time.
          3. An SP Update_LknmT2_bind_savept was executed twice. Each time, there was rapid increase in memory utilization at node 2, during commit.
          4. An SP Update_LknmT2_bind was executed twice. Each time, there was rapid increase in memory utilization at node 2, during commit.

          It appears that the committing after running large batch of INSERT or UPDATE can lead to such situation, regardless of the presence of savepoint / release savepoint.

          Thanks and best regards,

          Lawrence

          Investigation-2025-04-03.txt

          LawrenceMan Lawrence Man added a comment - I got a small reproducible case as in the attached file: 1. A table called t2 is created. 2. An insert w/ 10mil records via the stored procedure InsertIntoLknmT2_bind_savept() caused increase in node 2's memory utilization, whenever the SP was executed for the first & second time. 3. An SP Update_LknmT2_bind_savept was executed twice. Each time, there was rapid increase in memory utilization at node 2, during commit. 4. An SP Update_LknmT2_bind was executed twice. Each time, there was rapid increase in memory utilization at node 2, during commit. It appears that the committing after running large batch of INSERT or UPDATE can lead to such situation, regardless of the presence of savepoint / release savepoint. Thanks and best regards, Lawrence Investigation-2025-04-03.txt
          LawrenceMan Lawrence Man added a comment -

          In the previously attached file, the same insert / update is executed 10mil times, in a single transaction. The SQLs are written using bind variable. In this situation, will the memory used to parse the SQL be allocated for the first SQL, and re-used for the rest of SQLs within that same transaction, or will the memory used to parse the SQL be multiplied by 10mil times, since they are in the same transaction?

          I raise the above query because it can be observed that there is a sharp rise in memory utilization, even though only a single INSERT/UPDATE SQL skeleton is used.

          LawrenceMan Lawrence Man added a comment - In the previously attached file, the same insert / update is executed 10mil times, in a single transaction. The SQLs are written using bind variable. In this situation, will the memory used to parse the SQL be allocated for the first SQL, and re-used for the rest of SQLs within that same transaction, or will the memory used to parse the SQL be multiplied by 10mil times, since they are in the same transaction? I raise the above query because it can be observed that there is a sharp rise in memory utilization, even though only a single INSERT/UPDATE SQL skeleton is used.
          janlindstrom Jan Lindström added a comment - - edited

          LawrenceMan I would not recommend inserting 10 million rows in a single transaction. From node2 point of view it will be 10 million events to be executed and every event requires memory. If you send those inserts as a single transaction and of you have more than one applier threads as these inserts do not really conflict more than one applier thread will start allocating memory for events. From node2 point of view it does not matter do you have savepoints or prepared transactions. I do not know about node1 point of view.

          janlindstrom Jan Lindström added a comment - - edited LawrenceMan I would not recommend inserting 10 million rows in a single transaction. From node2 point of view it will be 10 million events to be executed and every event requires memory. If you send those inserts as a single transaction and of you have more than one applier threads as these inserts do not really conflict more than one applier thread will start allocating memory for events. From node2 point of view it does not matter do you have savepoints or prepared transactions. I do not know about node1 point of view.
          LawrenceMan Lawrence Man added a comment -

          Hi Jan,

          Thanks for your reply.

          The program was meant to reproduce the case more obviously for illustration only. If the # of records committed in a batch is reduced to 100K (the actual order of magnitude in my customer's case), the situation was similar but the increase is less significant, and still requires periodic restart of MariaDB at node 2 to release the memory.

          Is MariaDB supposedly able to handle 100K records transaction without continuous memory growth / leak at node 2? Or is it necessary to break down the transaction into still smaller chunks?

          Thanks and best regards,

          Lawrence

          LawrenceMan Lawrence Man added a comment - Hi Jan, Thanks for your reply. The program was meant to reproduce the case more obviously for illustration only. If the # of records committed in a batch is reduced to 100K (the actual order of magnitude in my customer's case), the situation was similar but the increase is less significant, and still requires periodic restart of MariaDB at node 2 to release the memory. Is MariaDB supposedly able to handle 100K records transaction without continuous memory growth / leak at node 2? Or is it necessary to break down the transaction into still smaller chunks? Thanks and best regards, Lawrence

          People

            janlindstrom Jan Lindström
            frelist William Wong
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.