[MDEV-18353] Shutdown may miss to wait for connection thread Created: 2019-01-23  Updated: 2022-09-07  Resolved: 2021-07-26

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.4.21, 10.5.12, 10.6.4

Type: Bug Priority: Major
Reporter: Sergey Vojtovich Assignee: Sergei Golubchik
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
causes MDEV-26325 mysqld process hangs when stopped aft... Closed

 Description   

Side effects may vary, but I guess a crash is the most probable outcome.

To reproduce, first apply this patch:

diff --git a/sql/mysqld.cc b/sql/mysqld.cc
index e1c9f68..028db04 100644
--- a/sql/mysqld.cc
+++ b/sql/mysqld.cc
@@ -2045,11 +2045,13 @@ static void mysqld_exit(int exit_code)
 #ifdef SAFEMALLOC
     sf_report_leaked_memory(0);
 #endif
-    DBUG_SLOW_ASSERT(global_status_var.global_memory_used == 0);
+//    DBUG_SLOW_ASSERT(global_status_var.global_memory_used == 0);
   }
   cleanup_tls();
   DBUG_LEAVE;
   sd_notify(0, "STATUS=MariaDB server is down");
+  if (delayed_queue_size == 10)
+    sleep(10);
   exit(exit_code); /* purecov: inspected */
 }
 
diff --git a/sql/sql_connect.cc b/sql/sql_connect.cc
index a2f2407..373f9c6 100644
--- a/sql/sql_connect.cc
+++ b/sql/sql_connect.cc
@@ -1303,6 +1303,9 @@ pthread_handler_t handle_one_connection(void *arg)
 {
   CONNECT *connect= (CONNECT*) arg;
 
+  if (delayed_queue_size == 10)
+    sleep(10);
+
   mysql_thread_set_psi_id(connect->thread_id);
 
   do_handle_one_connection(connect);

Note: the delayed_queue_size variable was chosen arbitrarily and makes no difference by itself; it is there only to avoid sleeping in unrelated connections.

Test:

set global delayed_queue_size=10;
connect(con1,localhost,root,,);

Run mtr, and as soon as it shows "set global delayed_queue_size=10;", kill mysqld manually, e.g. with killall mysqld.

You should end up with a crash like:

Thread 2 (Thread 0x7fb03f8678c0 (LWP 16008)):
#0  0x00007fb03d2da30d in nanosleep () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007fb03d2da25a in __sleep (seconds=0) at ../sysdeps/posix/sleep.c:55
#2  0x0000559aa846aa4d in mysqld_exit (exit_code=0) at /home/svoj/devel/maria/mariadb/sql/mysqld.cc:2054
#3  0x0000559aa847308f in mysqld_main (argc=155, argv=0x559aaabc1a20) at /home/svoj/devel/maria/mariadb/sql/mysqld.cc:6034
#4  0x0000559aa8466ee0 in main (argc=32, argv=0x7ffc504365a8) at /home/svoj/devel/maria/mariadb/sql/main.cc:25
 
Thread 1 (Thread 0x7fb0354c8700 (LWP 16040)):
#0  __pthread_kill (threadid=<optimized out>, signo=11) at ../sysdeps/unix/sysv/linux/pthread_kill.c:62
#1  0x0000559aa9071112 in my_write_core (sig=11) at /home/svoj/devel/maria/mariadb/mysys/stacktrace.c:481
#2  0x0000559aa88a92f1 in handle_fatal_signal (sig=11) at /home/svoj/devel/maria/mariadb/sql/signal_handler.cc:305
#3  <signal handler called>
#4  0x0000559aa8ae5813 in set_thread_id_v1 (thread=0x7fb03a8be340, processlist_id=10) at /home/svoj/devel/maria/mariadb/storage/perfschema/pfs.cc:1945
#5  0x0000559aa86fa923 in inline_mysql_thread_set_psi_id (id=10) at /home/svoj/devel/maria/mariadb/include/mysql/psi/mysql_thread.h:1275
#6  0x0000559aa86fdb1d in handle_one_connection (arg=0x559aab131c90) at /home/svoj/devel/maria/mariadb/sql/sql_connect.cc:1309
#7  0x0000559aa8ae55d0 in pfs_spawn_thread (arg=0x559aab0acd30) at /home/svoj/devel/maria/mariadb/storage/perfschema/pfs.cc:1862
#8  0x00007fb03de806ba in start_thread (arg=0x7fb0354c8700) at pthread_create.c:333
#9  0x00007fb03d31541d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Most probably a regression from the connection speed improvements; likely affects 10.0+.



 Comments   
Comment by Sergey Vojtovich [ 2019-02-26 ]

15f6d218600063beddfdd78763d766f48ace6daf provided by wlad

Comment by Sergey Vojtovich [ 2020-05-05 ]

wlad, as discussed before, there are two problems with your approach:
1. it fixes only one side effect of this bug, leaving the destroyed data access in place
2. it doesn't encapsulate nicely into the THD_list class.

I suggest alternative solution implemented in 2 patches in bb-10.5-svoj: https://github.com/MariaDB/server/commits/bb-10.5-svoj

4650899 Thread_cache::unregistred_connections()
ebae2bc Move all thread cache specific code to a new class

How do you feel about this alternative?

Comment by Vladislav Vaintroub [ 2020-05-05 ]

I like the thread cache encapsulation, but I do not find it very intuitive to add yet another endless loop to fix some other endless loop. Is there any certainty that we will not leak some unregistered_connections, now or in the future?

I do not see why the encapsulation is missing in 15f6d218600063beddfdd78763d766f48ace6daf. The THD_list is not bound to be just a container; it can be a container and a state too, if that better describes the server's and connections' lifecycle. If I think about what bad things might happen on a destroyed data access, I do not find anything special. There will be a crash, in an extremely improbable situation, at a time when there is no more need for recovery. This crash is already possible today, right? But it does not happen much.

Comment by Sergey Vojtovich [ 2020-05-05 ]

I like the thread cache encapsulation,

Thanks!

but I do not find it very intuitive to add yet another endless loop to fix some other endless loop. Is there any certainty that we will not leak some unregistered_connections, now or in the future?

There's no certainty, of course. I'm not completely happy about it either. There are better solutions, but they're more intrusive:
1. it should be alright to let the acceptor thread allocate the THD and insert it into server_threads
2. do not remove the THD from server_threads when parked; mark it "parked" instead

I was hoping to implement 2. anyway for performance reasons.

I do not see why the encapsulation is missing in 15f6d218600063beddfdd78763d766f48ace6daf

Well, I think this flag is only needed for the one-thread-per-connection scheduler. Not for the "no-threads" scheduler, not for delayed inserts, not for slave threads or rpl_parallel threads, not for Event scheduler threads, etc.; all of them are handled already. I'm unsure about the threadpool; I couldn't track it down carefully.

This crash is already possible today, right? But it does not happen much

Sure, just as this hang did not happen much.

So you suggest leaving the destroyed data access and only fixing the hang?

Comment by Vladislav Vaintroub [ 2020-05-05 ]

I would leave the improbable destroyed data access (it is likely just PSI and the THD_list's own rwlock) and fix the hang. A crash is bad when it comes after a query or so, but it has no impact on production when the database has already properly shut down and is doing some destruction feast to please valgrind/LSAN. A hang, on the other hand, is always bad: it affects production users. Imagine someone does a shutdown to quickly install a server upgrade, and then the script waits overnight for a server that is doing my_sleep(100), halting production.

Comment by Sergey Vojtovich [ 2020-05-05 ]

...when database has already properly shut down...

No, that's not the case. If we put appropriate sleeps in, this thread can continue executing queries against a partially de-initialised server. Storage engines are still available at this point and about to shut down.

I would leave the improbable destroyed data access

Then we seem to disagree. Shall we seek a third opinion? Can you suggest anybody?

Comment by Vladislav Vaintroub [ 2020-05-05 ]

Which resources are freed while storage engines are not yet freed? I'd like to see the crash callstack(s). The thread did not start executing yet; it did not go through the authentication phase yet. You just have the socket that was connected, not much else.

Comment by Sergey Vojtovich [ 2020-05-05 ]

No, that's not the case. If we put appropriate sleeps in, this thread can continue executing queries against a partially de-initialised server. Storage engines are still available at this point and about to shut down.

Uhm, probably not with your fixes. Although who knows what can happen with a destroyed server_threads?

Anyway, let's seek a third opinion. Maybe somebody can suggest a better solution.

Comment by Vladislav Vaintroub [ 2020-05-05 ]

And you can destroy server_threads as late as you want, probably at the latest just prior to PSI; at that stage nothing is alive. A new thread might still pop up, but if you put some (arguably not 100% clean) check for abort_loop at the start of handle_one_connection, the probability of a relatively harmless crash on shutdown, which was already extremely tiny, becomes a couple of factors smaller.

Comment by Sergey Vojtovich [ 2020-05-06 ]

serg, wlad and I seem to disagree on how this bug should be fixed. There are two competing solutions:
Wlad's https://github.com/MariaDB/server/commit/15f6d218600063beddfdd78763d766f48ace6daf
Mine https://github.com/MariaDB/server/commit/725e43cefc422fe49fdbfe2f9252698b5c4a7175

In a nutshell, when a thread is created around shutdown time, there is a gap between thread creation and the thread becoming killable (registration in server_threads). Thus the shutdown code may fail to handle it properly. As a result, shutdown may hang, or the newly created thread may access destroyed data.

My approach is a complete solution to this problem; unfortunately it adds another wait loop to the shutdown code (to make sure all threads have left that gap), which Wlad didn't like (me neither, but I don't see another bulletproof fix).
Wlad's solution fixes only the shutdown hang. He claims that only the hang is important and that we can disregard these very unlikely late-stage shutdown crashes caused by destroyed memory access.

serg, can we have your opinion on which solution should be preferred?

Comment by Sergei Golubchik [ 2020-10-23 ]

I understand wlad's fix, and I can see its deficiencies.

I don't understand svoj's fix; it seems to revolve around the thread cache. What if the thread doesn't come from the cache?

Comment by Sergey Vojtovich [ 2020-10-23 ]

serg, the problem is specific to the thread cache, or the one-thread-per-connection scheduler. Background threads are started and killed by different means. Though I don't recall whether pool-of-threads was affected.

Comment by Vladislav Vaintroub [ 2020-10-24 ]

I recall that in my patch I just tried to avoid the endless looping in shutdown, which I think is more probable, and whose consequences are far worse than a merely cosmetic "crash in shutdown".

Comment by Sergei Golubchik [ 2021-07-14 ]

wlad and svoj, what do you think about this patch: https://github.com/MariaDB/server/commit/63f3eecaa0f ?

Comment by Sergey Vojtovich [ 2021-07-14 ]

serg, looks like it solves the problem.
1. I can't tell the impact on the thread cache; wlad should know better. It feels like TP is also affected.
2. Not a big deal, but such "half-lost" threads may slow down shutdown by 2 seconds.
3. I'd add a comment around the newly added server_threads.iterate(kill_thread_phase_1): "We may have missed newly spawned thread at the first kill attempt, sigh, try killing it once again."

But from the maintenance point of view the code becomes counter-intuitive: it gives an additional meaning to thread_count, which becomes THD count + CONNECT count. It has to perform extra kills, and technically it waits for such threads anyway. My fix was ugly, but nevertheless cleaner in this regard.

Assuming TP is affected, maybe add a CONNECT_count, make sure delete connect is called after server_threads.insert(), and then add a wait for threads in this gap?

Comment by Vladislav Vaintroub [ 2021-07-14 ]

I only mind the thread_count name and the THD_count class, because it neither counts threads nor counts THDs now. Note that thread_count was not used either, except to wait on shutdown for this count to become 0.

I'm all for that fix, provided that the thread_count variable and the THD_count class get better names.

Comment by Sergei Golubchik [ 2021-07-20 ]

Thanks.

I'll add a comment. I'll rename thread_count.

THD_count? I don't know; THD_and_CONNECT_count is kind of silly, isn't it? The way I looked at it, CONNECT is a precursor of THD. I wanted to increment thread_count in CONNECT and then "pass over" this increment to THD, that is, not increment thread_count in the THD constructor. But the easiest way to do it turned out to be to increment thread_count in THD::THD and immediately decrement it again in ~CONNECT.

That is, in a way THD_count still counts THDs. It's just that when a THD is created from a CONNECT (which is a short-lived object), it starts counting the THD in advance, from the moment the CONNECT is created.

Comment by Vladislav Vaintroub [ 2021-07-20 ]

I dunno, but since there are problems with accurate naming, perhaps there are problems with that code too.
What about not having a single THD_count, but say Object_count&lt;atomic_uint&gt;, with THD deriving from Object_count&lt;thd_count&gt; and CONNECT deriving from Object_count&lt;connect_count&gt;? You can fully stop when thd_count == connect_count == 0, so one can operate with thd_count + connect_count if one needs to, but also use the variables independently if one needs.

Comment by Sergei Golubchik [ 2021-07-21 ]

we cannot use templates like that:

sql/sql_class.h|2165 col 35 error| non-type template parameters of class type only available with ‘-std=c++2a’ or ‘-std=gnu++2a’

I don't think there's a problem with naming, and I commented the usage too. In

Comment by Vladislav Vaintroub [ 2021-07-21 ]

@serg, THD_count is not only about the way to increment and decrement it. There are background THDs in 10.5 that are not counted and are not part of the THD_list either, and there is no need to wait for them during shutdown.

"Not counted" means that thread_count was manually incremented and decremented to fight the magic THD constructor.

But there should be a way (template, reference, or whatever) to have an object bound to an integer variable that increments it in the constructor and decrements it in the destructor.

Comment by Sergei Golubchik [ 2021-07-21 ]

There is a way. It could be

class Object_count
{
  Atomic_counter<uint32_t> *counter;
public:
  Object_count(Atomic_counter<uint32_t> *arg): counter(arg)
  { (*counter)++; }
  ~Object_count() { (*counter)--; }
};

but I don't really want to store the same pointer in every THD.

Comment by Vladislav Vaintroub [ 2021-07-21 ]

I actually find a generic object count a better/clearer idea, and one can't really make THD much worse by adding an 8-byte pointer.
But OK, if you want a combined count of 2 different objects, I will not protest much. Though I still think that "if you can't accurately name it, which is the case here, there is a problem" (likely the problem is the very existence of the temporary CONNECT object, which we won't discuss in this patch).

Comment by Sergei Golubchik [ 2021-07-22 ]

wlad, do you like https://github.com/MariaDB/server/commit/af41a54bf12 more?

Comment by Sergey Vojtovich [ 2021-07-22 ]

serg, af41a54bf12 goes more or less in line with what I had in mind. But you missed moving all delete connect calls after server_threads.insert(); FWICS there are 3 occurrences. I'd also suggest making the counter private.

Comment by Sergey Vojtovich [ 2021-07-22 ]

Hmm... in fact the reordering of delete connect is attributed to 705026b723b4dab01a6637c90138e31e1f38b5dd. However, that commit doesn't really need it (because you issue kills in a loop), whereas it is a must for af41a54bf12.

Comment by Sergei Golubchik [ 2021-07-22 ]

Yes, that's my point. If I add a separate counter and a loop, it'll be like your patch, which wlad didn't like. I just wanted to show where it's going.

If I'm to push this last version, I'll merge the two last commits, so the reordering and the separate loop go in together.

Comment by Sergey Vojtovich [ 2021-07-22 ]

FWIW: I forgot to mention that there's a justified need for THD_count to be a separate class, as described in its comment. However, CONNECT_count does not have to be separate; it can easily go inside CONNECT.

Comment by Sergei Golubchik [ 2021-07-24 ]

https://github.com/MariaDB/server/commit/cb645a373fde0728e8ace64b1b684306c06d9e55

Comment by Vladislav Vaintroub [ 2021-07-24 ]

serg, that needs some polishing, as it is obviously missing the same handling in threadpool_common.cc as there is for one-thread-per-connection.

Maybe you can run that test with thread-handling=pool-of-threads, too. Is it possible to make the test portable (I noticed the _no_windows part)?

Comment by Sergei Golubchik [ 2021-07-24 ]

That test doesn't fail with thread-handling=pool-of-threads. With or without a bug fix.

The test needs a connection made in the background. mysqltest can only send a query in the background; it cannot do

--send --connect

so the only workaround that I came up with was

exec $MYSQL -e 'select sleep(3600)' >/dev/null 2>&1 &;

that's why _no_windows. If you can suggest a portable, or even a Windows-specific, way to open a connection in the background without waiting for a status, I will put it in the test.

Comment by Vladislav Vaintroub [ 2021-07-24 ]

What is missing in pool-of-threads is

  DBUG_EXECUTE_IF("CONNECT_wait",
  {
    extern MYSQL_SOCKET unix_sock;
    DBUG_ASSERT(unix_sock.fd >= 0);
    while (unix_sock.fd >= 0)
      my_sleep(1000);
  });

Maybe this is why the test does not fail with pool-of-threads. What I was talking about is that you changed the order of delete connect and server_threads.insert() only in do_handle_one_connection, not in threadpool_add_connection().

Comment by Sergei Golubchik [ 2021-07-24 ]

Thanks, fixed pool-of-threads too, https://github.com/MariaDB/server/commit/4533e6ef65b

Comment by Vladislav Vaintroub [ 2021-07-24 ]

A Windows-specific way to open a connection in the background seems to be "start":
https://stackoverflow.com/questions/2937569/how-to-start-an-application-without-waiting-in-a-batch-file/2938022 (also look for interpretation of quoted arguments in the answers)

start  /b <path-to-exe> <args>

I did not try it myself yet; it could have problems if the slashes in &lt;path-to-exe&gt; are Unix-style, because cmd itself is picky in this regard.

Generated at Thu Feb 08 08:43:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.