[MCOL-5384] Plugin code leaks network connections towards ExeMgr making EM a threads leak bomb - possible mem leak Created: 2023-01-09  Updated: 2023-02-24  Resolved: 2023-01-27

Status: Closed
Project: MariaDB ColumnStore
Component/s: ExeMgr, MariaDB Server
Affects Version/s: 6.4.6, 22.08.7
Fix Version/s: 22.08.8, 6.4.7 (CS only)

Type: Bug Priority: Critical
Reporter: Karol Roslaniec Assignee: Roman
Resolution: Fixed Votes: 0
Labels: None
Environment:

Mariadb ver. 10.6.11/MCS ver. 6.4.6 (compiled from sources)
CentOS 7 x86_64


Attachments: File mariadb-10.6.11.patch    
Issue Links:
Duplicate
is duplicated by MCOL-4997 too many threads running in ExeMgr? Open
Relates
relates to MCOL-4997 too many threads running in ExeMgr? Open
Sprint: 10.0.20
Assigned for Review: Roman Roman
Assigned for Testing: Kirill Perov Kirill Perov

 Description   

Summary: ExeMgr spawns unlimited number of threads

This problem is not new; it has already been reported as MCOL-4279 and MCOL-4997.
Additionally, I could also find a post on mariadb-columnstore about it.
Unfortunately, none of these cases were solved, because... "not reproducible".

Hopefully now we can fix it, because I'm determined to find the problem, with some help from you guys.

To clarify: I'm very new to ColumnStore. This is really my first try at ColumnStore, so I cannot say whether there are other scenarios or problematic cases. Unfortunately, this issue is also a showstopper for me.

My scenario:
Every 15 minutes I execute a couple of "LOAD DATA LOCAL INFILE ..." queries.
Two things might be important here, although I'm not sure whether they really have an impact on the bug:
1. the file is loaded from a remote host (LOCAL)
2. I have 3 parallel processes for loading data, so it might happen that multiple "LOAD DATA" statements are executed at once, although I guarantee that a single table is LOAD-ed by only one source at a time.

What I have been able to find out so far:
1. Every time I execute LOAD DATA LOCAL, a new thread is created inside mariadbd. I can see this clearly in debug.log (dbcon).
2. This new thread opens a new TCP connection to ExeMgr.
3. ExeMgr creates a new thread to serve the request(s) from mariadbd.
4. When the LOAD DATA is finished, mariadbd destroys the thread it created.
5. But the TCP connection to ExeMgr is still open (it is not closed).
6. Because the connection is still open, the ExeMgr thread keeps waiting in socket read() mode, expecting to get messages from mariadbd.

At least, this is my current understanding of the problem:
the TCP connection is not closed properly before the thread is destroyed.

Could you give me a hint as to which code is responsible for creating the mariadbd thread, creating the TCP connection, cleaning up the connection, and destroying the thread?
Looking at the logs, I concluded that the LOAD DATA statement processing starts in
ha_mcs_impl_start_bulk_insert() and ends in ha_mcs_impl_end_bulk_insert(),
but I still have trouble figuring out the functions responsible for
TCP connection management during the query.

Best Regards,
Karol



 Comments   
Comment by Karol Roslaniec [ 2023-01-10 ]

I have already found out that the ClientRotator class is responsible for the TCP connection.
And indeed: a new ClientRotator object is created with each LOAD DATA, but it is never destroyed.
So what must be done is to destroy this object before the thread is destroyed.

Best Regards,
Karol

Comment by Karol Roslaniec [ 2023-01-11 ]

I think I have found the problem.

It turns out the real problem is actually much more severe than ExeMgr alone.
It seems there is a resource/memory leak in the ColumnStore plugin, and thus in mariadbd.
Namely: cal_connection_info was never destroyed.

Please find the patch to ha_mcs_impl.cpp in the attachment (ha_mcs_impl.patch).
But also note that there are actually more files/places in the code which should be fixed accordingly.

Comment by Roman [ 2023-01-12 ]

Hey Karol,
We appreciate your contribution because it should fix the leaking ClientRotator TCP connections. Someone will take a look at your patch shortly.
Do you use enterprise or community server?

Comment by Karol Roslaniec [ 2023-01-12 ]

I'm using the community server.
The patches are created against mariadb-10.6.11.tar.gz, downloaded from the MariaDB site.

Today I removed the old patch files and created one single patch, because just today I noticed a possible issue in my previous fixes. Note that all my changes are based purely on code analysis rather than on running the code, so many of the changes in the patch fix theoretical issues.

Namely, in the function ha_mcs_impl_close_connection I had put:

set_fe_conn_info_ptr(nullptr);

But the problem is that I actually don't know in which thread context this method is called, and set_fe_conn_info_ptr works only on current_thd, which could be really bad in this scenario.
Therefore, I replaced it with:

set_fe_conn_info_ptr(nullptr, thd);

But then I realized that the implementation of set_fe_conn_info_ptr actually ignores the thd parameter, so I had to fix that as well.

Overall this patch touches many other functionalities, so it would require some testing.
It did work in my testing sandbox, but my tests are really very limited in scope.

Best Regards,
Karol

Comment by Roman [ 2023-01-16 ]

Karol I finished the review and I will hand over the patch to our QAs for additional testing. When they are done I will propagate the patch so the next community release will have it. The patch is much appreciated!

Comment by Gagan Goel (Inactive) [ 2023-01-20 ]

Hi Karol,

Thank you very much for reporting this bug and for submitting the patch. I have confirmed that lingering TCP connections and ExeMgr threads indeed remain when an LDI (LOAD DATA (LOCAL) INFILE) statement is executed and the client connection to the mariadb server is closed.

Your patch does fix this problem. However, I was able to find an alternative with a much smaller code change. See the patch below:

diff --git a/dbcon/mysql/ha_mcs_impl.cpp b/dbcon/mysql/ha_mcs_impl.cpp
index 53ffe0cb2..d5d719ce7 100644
--- a/dbcon/mysql/ha_mcs_impl.cpp
+++ b/dbcon/mysql/ha_mcs_impl.cpp
@@ -3874,6 +3874,10 @@ int ha_mcs_impl_close_connection(handlerton* hton, THD* thd)
     ci->cal_conn_hndl = 0;
   }
 
+  delete ci;
+  set_fe_conn_info_ptr(nullptr, thd);
+  thd_set_ha_data(thd, hton, nullptr);
+
   return rc;
 }
 
@@ -4110,6 +4114,10 @@ int ha_mcs::impl_external_lock(THD* thd, TABLE* table, int lock_type)
       // CS ends up processing query with handlers
       if (iter != ci->physTablesList.end() && ci->physTablesList.empty())
       {
+        // MCOL-3247 Use THD::ha_data as a per-plugin per-session
+        // storage for cal_conn_hndl to use it later in close_connection
+        thd_set_ha_data(thd, mcs_hton, get_fe_conn_info_ptr());
+
         if (!ci->cal_conn_hndl)
           return 0;
 
@@ -4120,9 +4128,6 @@ int ha_mcs::impl_external_lock(THD* thd, TABLE* table, int lock_type)
         ci->extendedStats = ci->cal_conn_hndl->extendedStats;
         ci->miniStats = ci->cal_conn_hndl->miniStats;
         ci->queryState = 0;
-        // MCOL-3247 Use THD::ha_data as a per-plugin per-session
-        // storage for cal_conn_hndl to use it later in close_connection
-        thd_set_ha_data(thd, mcs_hton, get_fe_conn_info_ptr());
         // Clean up all tableMap entries made by cond_push
         for (auto& tme : ci->tableMap)
         {
diff --git a/dbcon/mysql/ha_mcs_sysvars.cpp b/dbcon/mysql/ha_mcs_sysvars.cpp
index 415257c5f..fdd76c829 100644
--- a/dbcon/mysql/ha_mcs_sysvars.cpp
+++ b/dbcon/mysql/ha_mcs_sysvars.cpp
@@ -256,12 +256,10 @@ void* get_fe_conn_info_ptr(THD* thd)
 
 void set_fe_conn_info_ptr(void* ptr, THD* thd)
 {
-  if (current_thd == NULL && thd == NULL)
-  {
-    return;
-  }
+  if (thd == NULL) thd = current_thd;
+  if (thd == NULL) return;
 
-  THDVAR(current_thd, fe_conn_info_ptr) = (uint64_t)(ptr);
+  THDVAR(thd, fe_conn_info_ptr) = (uint64_t)(ptr);
 }
 
 ulonglong get_original_optimizer_flags(THD* thd)

As you can see, I have retained your changes in ha_mcs_impl_close_connection(), as this was indeed a legitimate memory leak. I have also retained your changes in set_fe_conn_info_ptr. The change I have introduced is in impl_external_lock(). This function is called by the server at the start and end of a query to:

  • Retrieve a READ lock (lock_type=0) on a table for SELECT queries.
  • Retrieve a WRITE lock (lock_type=1) on a table for INSERTs/UPDATEs/DELETEs.
  • UNLOCK (lock_type=2) the retrieved READ/WRITE locks after the operation is done.

Now, at the end of an LDI statement, when lock_type=2, we previously did not reach the thd_set_ha_data(thd, mcs_hton, get_fe_conn_info_ptr()); call due to an early return caused by:

         if (!ci->cal_conn_hndl)
           return 0;

This caused the server to not call ha_mcs_impl_close_connection() when the client closed the session. For an LDI, ci->cal_conn_hndl is NULL, because for an LDI statement the mariadb server does not connect to ExeMgr, as no data is read from the table being loaded. To fix the TCP connection/thread leak, I have simply changed the ordering so that the thd_set_ha_data() call executes before the if condition.

Can you please try this patch on your system and let me know your thoughts? If the patch looks OK to you, I would suggest creating a PR in the engine GitHub repo against the branch you are fixing: https://github.com/mariadb-corporation/mariadb-columnstore-engine and assigning me as the reviewer (GitHub user: tntnatbry). It will be much easier to perform the code review there.

Comment by Karol Roslaniec [ 2023-01-22 ]

Hi Gagan,

I'm glad I could help.
I guess your patch could indeed fix the issue for the LDI statement, but I'm afraid the bug is more generic than LDI alone.

Frankly speaking, I don't like your approach at all. It relies on a kind of "side effect" (a function not directly involved); it does not guarantee that all other (theoretically) possible leaks are fixed; and it even looks "visually" weird, because it sets the same value multiple times, wasting CPU cycles.
Ironically, inside the code you pointed out there is a comment referencing MCOL-3247. Not surprisingly, that is exactly the same problem as this one, dated three years back.

My solution is really very simple and, in my opinion, logically correct: every time you create an object

set_fe_conn_info_ptr(new cal_connection_info())

you immediately register it for deletion with:

thd_set_ha_data(thd, mcs_hton, get_fe_conn_info_ptr());

What is really missing in my patch is code cleanup; namely, all these scattered calls to thd_set_ha_data could now be removed. Additionally, this code:

thd_set_ha_data(thd, mcs_hton, reinterpret_cast<void*>(0x42));

is really ugly. I'm not sure what the functions ha_mcs_impl_pushdown_init() or impl_rnd_init() are doing, but it would probably be much better to create a new cal_connection_info() there, even if it is not supposed to be used later, just to ensure that thd_ha_data always points to an object and not to some magic number.
Another good code cleanup would be to create one common function for accessing cal_connection_info, which would create the object if needed and immediately register it with thd_set_ha_data(); as a result, you would have only one place that calls new cal_connection_info().

Best Regards,
Karol

Comment by Roman [ 2023-01-23 ]

Let me add my five cents to this conversation.
I was also concerned about the other cases covered by your patch, Karol, so I tested them (e.g. select mcssettrace(1)) to see whether they share the connection/thread leak or not.

Here is the test case I ran:

run ss -tenp | grep 8601
run "select funcname(args);" in a number of separate MDB client connections

The results divide the so-called client UDFs (mcssettrace, mcssetparams, calshowpartitions, etc.) into two groups:

  • those that don't open a connection to run System Catalog queries (list 1)
  • those that do open a connection to run syscat queries (list 2)

Here is list 1:
  • select calshowpartitions('cs1','i');
  • select caldisablepartitions('cs1','0.0.1');
  • select calenablepartitions('cs1','0.0.1');
  • select caldroppartitions('cs1','0.0.1');
  • cleartablelock
  • viewtablelock
  • calflushcache
  • calgetsqlcount
  • calgetstats
  • calgettrace
  • calgetversion

Here is list 2:
  • select calshowpartitionsbyvalue('cs1','i','0','10000');
  • select caldisablepartitionsbyvalue('cs1','i','0','10000');
  • select calenablepartitionsbyvalue('cs1','i','0','1000000');
  • select caldroppartitionsbyvalue('cs1','0.0.1');
  • select callastinsertid('cs1');

> Frankly speaking, I don't like your approach at all. It's using kind of "side effects" (function not directly involved); it does not guarantee that all other (theoretically) possible leaks are fixed; even "visually" looks weird, because it's setting the same value multiple times, wasting CPU cycles.

The code you are looking into is part of the MariaDB runtime, so we both leverage the infrastructure the server provides us with and, to some degree, depend on this "side effect", namely on relying on external_lock(), because a storage engine (which MCS is, from the point of view of MDB) isn't allowed to have/run hooks early in the processing path.

As for the suggestion to reduce the boilerplate code:

> My solution is really very simple and logically correct in my opinion: every time when you create an object you immediately register it for deletion

The code

thd_set_ha_data(thd, mcs_hton, reinterpret_cast<void*>(0x42));

is my unfortunate invention, forced by the fact that this part of our codebase has to obey the MDB API's implicit and explicit call-sequence conventions: if the ha_data value is 0 after the return from the method that sets ha_data to 0x42, there will be no call to close_connection later. I couldn't blindly put a cal_connection_info() there, because it affected the other engines. We will re-evaluate this on the next refactoring run for the plugin code, though.
JFYI, there is no thd_set_ha_data in ha_mcs_ddl.cpp, ha_mcs_execplan.cpp, or ha_pseudocolumn.cpp, because these code parts can only be reached after cal_connection_info() is created.
If you find it convenient, you can reach us on Zulip.

Comment by Roman [ 2023-01-24 ]

The problem also affects INFORMATION_SCHEMA tables, namely columnstore_columns and columnstore_tables. A client connection that accesses these tables leaves a network connection plus an ExeMgr client read thread behind.
I added an additional commit to address the problem for both UDFs and I_S tables.

Comment by Roman [ 2023-01-25 ]

4QA. The leaking connections can be detected with a simple test:
1. Connect.
2. Run the offending expression.
3. Quit.
4. ss -tenp | grep 8601 must show at least a pair of open sockets.

The list of offending expressions is given in one of my previous comments:
select calshowpartitionsbyvalue('cs1','i','0','10000');
select caldisablepartitionsbyvalue('cs1','i','0','10000');
select calenablepartitionsbyvalue('cs1','i','0','1000000');
select caldroppartitionsbyvalue('cs1','0.0.1');
select callastinsertid('cs1');
LDI (LOAD DATA INFILE)

Comment by Kirill Perov [ 2023-01-26 ]

updated 4QA:

The leaking connections can be detected with a simple test:
1. Connect and run:
CREATE TABLE cs1 (i bigint) engine=columnstore;
INSERT INTO cs1 values (42);
select * from cs1;
2. Run the offending expression.
3. Quit.
4. ss -tenp | grep 8601

Comment by Kirill Perov [ 2023-01-26 ]

verified on RockyLinux 9
for https://buildbot.mariadb.net/archive/pack/bb-10.6-all-builders/build-52847/kvm-rpm-rocky9-amd64/rpms/
MariaDB 10.6.11
MariaDB-columnstore-engine.x86_64 10.6.11_6.4.7-1.el9
MariaDB-columnstore-engine-debuginfo.x86_64 10.6.11_6.4.7-1.el9

ss -tenp | grep 8601 shows nothing after the previous steps from "updated 4QA".

But the result is exactly the same for 6.4.6.

Comment by Kirill Perov [ 2023-01-26 ]

tested 6.4.6 and 6.4.7 via "select * from information_schema.columnstore_columns":
for 6.4.6, connections remain even after mysql is closed
for 6.4.7, no connections remain after the end of the query output

Comment by Kirill Perov [ 2023-01-27 ]

confirmed the fix using the "select * from information_schema.columnstore_columns" behavior

Comment by Daniel Lee (Inactive) [ 2023-02-06 ]

Build verified: 22.08.8 RC2.

Generated at Thu Feb 08 02:57:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.