Details

    Description

      Attachments

        Issue Links

          Activity

            I find it the easiest to reproduce locally with rr:

            ./mtr galera.galera_var_notify_cmd --rr
            

            this hits it every time for me, and without --rr the test mostly passes.

            With the following change:

            diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc
            --- a/sql/wsrep_mysqld.cc
            +++ b/sql/wsrep_mysqld.cc
            @@ -847,7 +847,9 @@ void wsrep_init_globals()
               wsrep_init_schema();
               if (WSREP_ON)
               {
            +    WSREP_INFO("before Wsrep_server_state::instance().initialized()");
                 Wsrep_server_state::instance().initialized();
            +    WSREP_INFO("after Wsrep_server_state::instance().initialized()");
               }
             }
             
            diff --git a/sql/wsrep_utils.cc b/sql/wsrep_utils.cc
            index 854c11c1eea..56477d662ed 100644
            --- a/sql/wsrep_utils.cc
            +++ b/sql/wsrep_utils.cc
            @@ -377,6 +377,7 @@ process::wait ()
               if (pid_)
               {
                   int status;
            +      WSREP_INFO("Waiting for process %s", str_);
                   if (-1 == waitpid(pid_, &status, 0))
                   {
                       err_= errno; assert (err_);
            

            one can easily see why it hangs.
            This is when the test passed:

            2022-12-24 13:32:49 0 [Note] WSREP: before Wsrep_server_state::instance().initialized()
            2022-12-24 13:32:49 0 [Note] WSREP: Server initialized
            2022-12-24 13:32:49 0 [Note] WSREP: Server status change initializing -> initialized
            2022-12-24 13:32:49 0 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status initialized
            2022-12-24 13:32:49 2 [Note] WSREP: Bootstrapping a new cluster, setting initial position to 00000000-0000-0000-0000-000000000000:-1
            2022-12-24 13:32:49 0 [Note] WSREP: after Wsrep_server_state::instance().initialized()
            

            Here:

            • the server status is changed to "initialized"
            • notification script runs (does nothing for --status initialized)
            • Wsrep_server_state::instance().initialized() returns
            • server initialization continues

            And this is from the timed out test:

            2022-12-24 13:19:02 0 [Note] WSREP: before Wsrep_server_state::instance().initialized()
            2022-12-24 13:19:02 0 [Note] WSREP: Server initialized
            2022-12-24 13:19:02 0 [Note] WSREP: Server status change initializing -> initialized
            2022-12-24 13:19:02 0 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status initialized
            2022-12-24 13:19:02 2 [Note] WSREP: Bootstrapping a new cluster, setting initial position to 00000000-0000-0000-0000-000000000000:-1
            2022-12-24 13:19:02 5 [Note] WSREP: Cluster table is empty, not recovering transactions
            2022-12-24 13:19:02 2 [Note] WSREP: Server status change initialized -> joined
            2022-12-24 13:19:02 2 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status joined
            

            As you see, server status has changed to "joined" before Wsrep_server_state::instance().initialized() returned, so the script is started with a new status, on "joined" it tries to connect to the server to update some table, but the server isn't initialized yet, it's not listening for connections. So the server waits for the script to finish, script waits for mariadb client to connect, and the client cannot connect, because the server isn't listening.

            serg Sergei Golubchik added a comment - I find it the easiest to reproduce locally with rr : . /mtr galera.galera_var_notify_cmd --rr this hits it every time for me, and without --rr the test mostly passes. With the following change: diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc --- a/sql/wsrep_mysqld.cc +++ b/sql/wsrep_mysqld.cc @@ -847,7 +847,9 @@ void wsrep_init_globals() wsrep_init_schema(); if (WSREP_ON) { + WSREP_INFO("before Wsrep_server_state::instance().initialized()"); Wsrep_server_state::instance().initialized(); + WSREP_INFO("after Wsrep_server_state::instance().initialized()"); } } diff --git a/sql/wsrep_utils.cc b/sql/wsrep_utils.cc index 854c11c1eea..56477d662ed 100644 --- a/sql/wsrep_utils.cc +++ b/sql/wsrep_utils.cc @@ -377,6 +377,7 @@ process::wait () if (pid_) { int status; + WSREP_INFO("Waiting for process %s", str_); if (-1 == waitpid(pid_, &status, 0)) { err_= errno; assert (err_); one can easily see why it hangs. This is when the test passed: 2022-12-24 13:32:49 0 [Note] WSREP: before Wsrep_server_state::instance().initialized() 2022-12-24 13:32:49 0 [Note] WSREP: Server initialized 2022-12-24 13:32:49 0 [Note] WSREP: Server status change initializing -> initialized 2022-12-24 13:32:49 0 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status initialized 2022-12-24 13:32:49 2 [Note] WSREP: Bootstrapping a new cluster, setting initial position to 00000000-0000-0000-0000-000000000000:-1 2022-12-24 13:32:49 0 [Note] WSREP: after Wsrep_server_state::instance().initialized() Here: the server status is changed to "initialized" notification script runs (does nothing for --status initialized ) Wsrep_server_state::instance().initialized() returns server initialization continues And this is from the timed out test: 2022-12-24 13:19:02 0 [Note] WSREP: before Wsrep_server_state::instance().initialized() 2022-12-24 13:19:02 0 [Note] WSREP: Server initialized 2022-12-24 13:19:02 0 [Note] WSREP: Server status change initializing -> initialized 2022-12-24 13:19:02 0 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status initialized 2022-12-24 13:19:02 2 [Note] WSREP: Bootstrapping a new cluster, setting initial position to 00000000-0000-0000-0000-000000000000:-1 2022-12-24 13:19:02 5 [Note] WSREP: Cluster table is empty, not recovering transactions 2022-12-24 13:19:02 2 [Note] WSREP: Server status change initialized -> joined 2022-12-24 13:19:02 2 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status joined As you see, server status has changed to "joined" before Wsrep_server_state::instance().initialized() returned, so the script is started with a new status, on "joined" it tries to connect to the server to update some table, but the server isn't initialized yet, it's not listening for connections. So the server waits for the script to finish, script waits for mariadb client to connect, and the client cannot connect, because the server isn't listening.

            julien.fritsch I do not think this is critical bug as it effects at the moment only tests, and requires wsrep_notify_cmd configuration variable to be used, by default it is not defined. Regardless, I will work on this issue and propose a fix.

            janlindstrom Jan Lindström added a comment - julien.fritsch I do not think this is critical bug as it effects at the moment only tests, and requires wsrep_notify_cmd configuration variable to be used, by default it is not defined. Regardless, I will work on this issue and propose a fix.
            janlindstrom Jan Lindström added a comment - https://github.com/MariaDB/server/pull/2738
            sysprg Julius Goryavsky added a comment - Fixed, https://github.com/MariaDB/server/commit/632a503ce7726547d8d858ae89fc3833ff95fe83

            People

              sysprg Julius Goryavsky
              jplindst Jan Lindström (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.