[MDEV-29861] Galera "notify" test cases hang Created: 2022-10-24  Updated: 2023-12-12  Resolved: 2023-09-12

Status: Closed
Project: MariaDB Server
Component/s: Galera, Tests
Affects Version/s: 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3

Type: Bug Priority: Major
Reporter: Jan Lindström (Inactive) Assignee: Julius Goryavsky
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Blocks
blocks MDEV-30172 Galera test case cleanup Stalled

 Description   


 Comments   
Comment by Sergei Golubchik [ 2022-12-24 ]

I find it the easiest to reproduce locally with rr:

./mtr galera.galera_var_notify_cmd --rr

this hits it every time for me, and without --rr the test mostly passes.

With the following change:

diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc
--- a/sql/wsrep_mysqld.cc
+++ b/sql/wsrep_mysqld.cc
@@ -847,7 +847,9 @@ void wsrep_init_globals()
   wsrep_init_schema();
   if (WSREP_ON)
   {
+    WSREP_INFO("before Wsrep_server_state::instance().initialized()");
     Wsrep_server_state::instance().initialized();
+    WSREP_INFO("after Wsrep_server_state::instance().initialized()");
   }
 }
 
diff --git a/sql/wsrep_utils.cc b/sql/wsrep_utils.cc
index 854c11c1eea..56477d662ed 100644
--- a/sql/wsrep_utils.cc
+++ b/sql/wsrep_utils.cc
@@ -377,6 +377,7 @@ process::wait ()
   if (pid_)
   {
       int status;
+      WSREP_INFO("Waiting for process %s", str_);
       if (-1 == waitpid(pid_, &status, 0))
       {
           err_= errno; assert (err_);

one can easily see why it hangs.
This is when the test passed:

2022-12-24 13:32:49 0 [Note] WSREP: before Wsrep_server_state::instance().initialized()
2022-12-24 13:32:49 0 [Note] WSREP: Server initialized
2022-12-24 13:32:49 0 [Note] WSREP: Server status change initializing -> initialized
2022-12-24 13:32:49 0 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status initialized
2022-12-24 13:32:49 2 [Note] WSREP: Bootstrapping a new cluster, setting initial position to 00000000-0000-0000-0000-000000000000:-1
2022-12-24 13:32:49 0 [Note] WSREP: after Wsrep_server_state::instance().initialized()

Here:

  • the server status is changed to "initialized"
  • notification script runs (does nothing for --status initialized)
  • Wsrep_server_state::instance().initialized() returns
  • server initialization continues

And this is from the timed out test:

2022-12-24 13:19:02 0 [Note] WSREP: before Wsrep_server_state::instance().initialized()
2022-12-24 13:19:02 0 [Note] WSREP: Server initialized
2022-12-24 13:19:02 0 [Note] WSREP: Server status change initializing -> initialized
2022-12-24 13:19:02 0 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status initialized
2022-12-24 13:19:02 2 [Note] WSREP: Bootstrapping a new cluster, setting initial position to 00000000-0000-0000-0000-000000000000:-1
2022-12-24 13:19:02 5 [Note] WSREP: Cluster table is empty, not recovering transactions
2022-12-24 13:19:02 2 [Note] WSREP: Server status change initialized -> joined
2022-12-24 13:19:02 2 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status joined

As you see, server status has changed to "joined" before Wsrep_server_state::instance().initialized() returned, so the script is started with a new status, on "joined" it tries to connect to the server to update some table, but the server isn't initialized yet, it's not listening for connections. So the server waits for the script to finish, script waits for mariadb client to connect, and the client cannot connect, because the server isn't listening.

Comment by Jan Lindström [ 2023-08-16 ]

julien.fritsch I do not think this is critical bug as it effects at the moment only tests, and requires wsrep_notify_cmd configuration variable to be used, by default it is not defined. Regardless, I will work on this issue and propose a fix.

Comment by Jan Lindström [ 2023-08-25 ]

https://github.com/MariaDB/server/pull/2738

Comment by Julius Goryavsky [ 2023-09-12 ]

Fixed, https://github.com/MariaDB/server/commit/632a503ce7726547d8d858ae89fc3833ff95fe83

Generated at Thu Feb 08 10:11:51 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.