Details

    Description

      Attachments

        Issue Links

          Activity

            jplindst Jan Lindström (Inactive) created issue -
            jplindst Jan Lindström (Inactive) made changes -
            Field Original Value New Value
            Status Open [ 1 ] In Progress [ 3 ]
            jplindst Jan Lindström (Inactive) made changes -
            Description * Example: https://buildbot.askmonty.org/buildbot/builders/kvm-deb-buster-amd64/builds/8294/steps/mtr-galera/logs/stdio
            * Suspected test cases: galera.galera_backup_stage
            or galera.MDEV-24143
            * Example: https://buildbot.askmonty.org/buildbot/builders/kvm-deb-buster-amd64/builds/8294/steps/mtr-galera/logs/stdio
            * Disabled :
            ** galera.MW-284
            ** galera.galera_binlog_checksum
            ** galera_var_notify_ssl_ipv6
            jplindst Jan Lindström (Inactive) made changes -
            Priority Blocker [ 1 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Affects Version/s 10.5 [ 23123 ]
            Affects Version/s 10.6 [ 24028 ]
            Affects Version/s 10.7 [ 24805 ]
            Affects Version/s 10.8 [ 26121 ]
            Affects Version/s 10.9 [ 26905 ]
            Affects Version/s 10.10 [ 27530 ]
            Affects Version/s 10.11 [ 27614 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]

            I find it the easiest to reproduce locally with rr:

            ./mtr galera.galera_var_notify_cmd --rr
            

            this hits it every time for me, and without --rr the test mostly passes.

            With the following change:

            diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc
            --- a/sql/wsrep_mysqld.cc
            +++ b/sql/wsrep_mysqld.cc
            @@ -847,7 +847,9 @@ void wsrep_init_globals()
               wsrep_init_schema();
               if (WSREP_ON)
               {
            +    WSREP_INFO("before Wsrep_server_state::instance().initialized()");
                 Wsrep_server_state::instance().initialized();
            +    WSREP_INFO("after Wsrep_server_state::instance().initialized()");
               }
             }
             
            diff --git a/sql/wsrep_utils.cc b/sql/wsrep_utils.cc
            index 854c11c1eea..56477d662ed 100644
            --- a/sql/wsrep_utils.cc
            +++ b/sql/wsrep_utils.cc
            @@ -377,6 +377,7 @@ process::wait ()
               if (pid_)
               {
                   int status;
            +      WSREP_INFO("Waiting for process %s", str_);
                   if (-1 == waitpid(pid_, &status, 0))
                   {
                       err_= errno; assert (err_);
            

            one can easily see why it hangs.
            This is when the test passed:

            2022-12-24 13:32:49 0 [Note] WSREP: before Wsrep_server_state::instance().initialized()
            2022-12-24 13:32:49 0 [Note] WSREP: Server initialized
            2022-12-24 13:32:49 0 [Note] WSREP: Server status change initializing -> initialized
            2022-12-24 13:32:49 0 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status initialized
            2022-12-24 13:32:49 2 [Note] WSREP: Bootstrapping a new cluster, setting initial position to 00000000-0000-0000-0000-000000000000:-1
            2022-12-24 13:32:49 0 [Note] WSREP: after Wsrep_server_state::instance().initialized()
            

            Here:

            • the server status is changed to "initialized"
            • notification script runs (does nothing for --status initialized)
            • Wsrep_server_state::instance().initialized() returns
            • server initialization continues

            And this is from the timed out test:

            2022-12-24 13:19:02 0 [Note] WSREP: before Wsrep_server_state::instance().initialized()
            2022-12-24 13:19:02 0 [Note] WSREP: Server initialized
            2022-12-24 13:19:02 0 [Note] WSREP: Server status change initializing -> initialized
            2022-12-24 13:19:02 0 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status initialized
            2022-12-24 13:19:02 2 [Note] WSREP: Bootstrapping a new cluster, setting initial position to 00000000-0000-0000-0000-000000000000:-1
            2022-12-24 13:19:02 5 [Note] WSREP: Cluster table is empty, not recovering transactions
            2022-12-24 13:19:02 2 [Note] WSREP: Server status change initialized -> joined
            2022-12-24 13:19:02 2 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status joined
            

            As you see, server status has changed to "joined" before Wsrep_server_state::instance().initialized() returned, so the script is started with a new status, on "joined" it tries to connect to the server to update some table, but the server isn't initialized yet, it's not listening for connections. So the server waits for the script to finish, script waits for mariadb client to connect, and the client cannot connect, because the server isn't listening.

            serg Sergei Golubchik added a comment - I find it the easiest to reproduce locally with rr : . /mtr galera.galera_var_notify_cmd --rr this hits it every time for me, and without --rr the test mostly passes. With the following change: diff --git a/sql/wsrep_mysqld.cc b/sql/wsrep_mysqld.cc --- a/sql/wsrep_mysqld.cc +++ b/sql/wsrep_mysqld.cc @@ -847,7 +847,9 @@ void wsrep_init_globals() wsrep_init_schema(); if (WSREP_ON) { + WSREP_INFO("before Wsrep_server_state::instance().initialized()"); Wsrep_server_state::instance().initialized(); + WSREP_INFO("after Wsrep_server_state::instance().initialized()"); } } diff --git a/sql/wsrep_utils.cc b/sql/wsrep_utils.cc index 854c11c1eea..56477d662ed 100644 --- a/sql/wsrep_utils.cc +++ b/sql/wsrep_utils.cc @@ -377,6 +377,7 @@ process::wait () if (pid_) { int status; + WSREP_INFO("Waiting for process %s", str_); if (-1 == waitpid(pid_, &status, 0)) { err_= errno; assert (err_); one can easily see why it hangs. This is when the test passed: 2022-12-24 13:32:49 0 [Note] WSREP: before Wsrep_server_state::instance().initialized() 2022-12-24 13:32:49 0 [Note] WSREP: Server initialized 2022-12-24 13:32:49 0 [Note] WSREP: Server status change initializing -> initialized 2022-12-24 13:32:49 0 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status initialized 2022-12-24 13:32:49 2 [Note] WSREP: Bootstrapping a new cluster, setting initial position to 00000000-0000-0000-0000-000000000000:-1 2022-12-24 13:32:49 0 [Note] WSREP: after Wsrep_server_state::instance().initialized() Here: the server status is changed to "initialized" notification script runs (does nothing for --status initialized ) Wsrep_server_state::instance().initialized() returns server initialization continues And this is from the timed out test: 2022-12-24 13:19:02 0 [Note] WSREP: before Wsrep_server_state::instance().initialized() 2022-12-24 13:19:02 0 [Note] WSREP: Server initialized 2022-12-24 13:19:02 0 [Note] WSREP: Server status change initializing -> initialized 2022-12-24 13:19:02 0 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status initialized 2022-12-24 13:19:02 2 [Note] WSREP: Bootstrapping a new cluster, setting initial position to 00000000-0000-0000-0000-000000000000:-1 2022-12-24 13:19:02 5 [Note] WSREP: Cluster table is empty, not recovering transactions 2022-12-24 13:19:02 2 [Note] WSREP: Server status change initialized -> joined 2022-12-24 13:19:02 2 [Note] WSREP: Waiting for process /home/serg/Abk/maria/mysql-test/std_data/wsrep_notify.sh --status joined As you see, server status has changed to "joined" before Wsrep_server_state::instance().initialized() returned, so the script is started with a new status, on "joined" it tries to connect to the server to update some table, but the server isn't initialized yet, it's not listening for connections. So the server waits for the script to finish, script waits for mariadb client to connect, and the client cannot connect, because the server isn't listening.
            serg Sergei Golubchik made changes -
            Summary Galera test case hangs Galera "notify" test cases hang
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Seppo Jaakola [ seppo ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.7 [ 24805 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.8 [ 26121 ]

            julien.fritsch I do not think this is critical bug as it effects at the moment only tests, and requires wsrep_notify_cmd configuration variable to be used, by default it is not defined. Regardless, I will work on this issue and propose a fix.

            janlindstrom Jan Lindström added a comment - julien.fritsch I do not think this is critical bug as it effects at the moment only tests, and requires wsrep_notify_cmd configuration variable to be used, by default it is not defined. Regardless, I will work on this issue and propose a fix.
            janlindstrom Jan Lindström made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            janlindstrom Jan Lindström made changes -
            Assignee Seppo Jaakola [ seppo ] Jan Lindström [ JIRAUSER53125 ]
            janlindstrom Jan Lindström added a comment - https://github.com/MariaDB/server/pull/2738
            janlindstrom Jan Lindström made changes -
            Assignee Jan Lindström [ JIRAUSER53125 ] Julius Goryavsky [ sysprg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            sysprg Julius Goryavsky made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            sysprg Julius Goryavsky made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            sysprg Julius Goryavsky added a comment - Fixed, https://github.com/MariaDB/server/commit/632a503ce7726547d8d858ae89fc3833ff95fe83
            sysprg Julius Goryavsky made changes -
            issue.field.resolutiondate 2023-09-12 01:46:46.0 2023-09-12 01:46:45.798
            sysprg Julius Goryavsky made changes -
            Fix Version/s 10.4.32 [ 29300 ]
            Fix Version/s 10.5.23 [ 29012 ]
            Fix Version/s 10.6.16 [ 29014 ]
            Fix Version/s 10.10.7 [ 29018 ]
            Fix Version/s 10.11.6 [ 29020 ]
            Fix Version/s 11.1.3 [ 29023 ]
            Fix Version/s 11.2.2 [ 29035 ]
            Fix Version/s 11.3.0 [ 29302 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s 10.11 [ 27614 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            sysprg Julius Goryavsky made changes -
            Fix Version/s 11.0.4 [ 29021 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 11.2.2 [ 29035 ]
            Fix Version/s 11.3.0 [ 29302 ]
            janlindstrom Jan Lindström made changes -

            People

              sysprg Julius Goryavsky
              jplindst Jan Lindström (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.