MDEV-32515: The test spider/bugfix.mdev_30370 fails with "98: Address already in use"

Details

    Description

      The failure only happens on builders at buildbot.mariadb.net. The test passes on builders at buildbot.mariadb.org (see e.g. https://buildbot.mariadb.org/#grid?branch=bb-10.10-mdev-32507)

      For example:

      https://buildbot.mariadb.net/buildbot/builders/kvm-deb-bullseye-amd64/builds/3143
      https://buildbot.mariadb.net/buildbot/builders/kvm-deb-bullseye-amd64/builds/3143/steps/mtr/logs/stdio

        spider/bugfix.mdev_30370                 w4 [ fail ]
                Test ended at 2023-10-18 03:56:23
       
        CURRENT_TEST: spider/bugfix.mdev_30370
        2023-10-18  3:56:09 0 [Warning] Could not increase number of max_open_files to more than 1024 (request: 32186)
        2023-10-18  3:56:09 0 [Warning] Changed limits: max_open_files: 1024  max_connections: 151 (was 151)  table_cache: 421 (was 2000)
        2023-10-18  3:56:09 0 [Note] Starting MariaDB 10.10.7-MariaDB-1:10.10.7+maria~deb11 source revision f63845524aacdd06431399c857c93aa52559b76c as process 5437
        2023-10-18  3:56:09 0 [Note] mariadbd: Aria engine: starting recovery
        recovered pages: 0% 11% 22% 36% 50% 60% 70% 80% 90% 100% (0.0 seconds); tables to flush: 3 2 1 0
         (0.0 seconds);
        #  [... 39 lines elided]
        2023-10-18  3:56:17 0 [Note] Retrying bind on TCP/IP port 3306
        2023-10-18  3:56:23 0 [ERROR] Can't start server: Bind on TCP/IP port. Got error: 98: Address already in use
        2023-10-18  3:56:23 0 [ERROR] Do you already have another server running on port: 3306 ?
        2023-10-18  3:56:23 0 [ERROR] Aborting
        mysqltest: At line 9: exec of '/usr/sbin/mariadbd --defaults-group-suffix=.1 --defaults-file=/dev/shm/var/4/my.cnf  --datadir=/dev/shm/var/4/mysqld.1.1/data/ --wsrep-recover --plugin-dir=/usr/lib/mysql/plugin/ --plugin-load-add=ha_spider' failed, error: 256, status: 1, errno: 32
        Output from before failure:
        # Kill the server
       
       
       
        The result from queries just before the failure was:
        #
        # MDEV-30370 mariadbd hangs when running with --wsrep-recover and --plugin-load-add=ha_spider.so
        #
        # Kill the server
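
For reference, error 98 in the log above is EADDRINUSE on Linux: the server tried to bind a port that another process already holds. A minimal sketch of that condition follows. It is not part of the test suite, the function name `second_bind_errno` is made up for illustration, and it uses an OS-chosen port rather than 3306.

```python
import errno
import socket


def second_bind_errno() -> int:
    """Bind two sockets to the same port; return the errno of the second bind (0 if it succeeds)."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS for any free port, standing in for a server
        # that already occupies the default port.
        first.bind(("127.0.0.1", 0))
        first.listen(1)
        port = first.getsockname()[1]
        try:
            # A second bind to the same address and port is what the
            # failing mariadbd invocation effectively attempted.
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = second_bind_errno()
    print(code, errno.errorcode.get(code))
```

On the affected builders, the process holding the default port would presumably be a system-wide server unrelated to the mtr run, which would explain why the failure is environment-specific.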
      

          Activity

            ycp Yuchen Pei added a comment (edited):

            As some background, this is a peculiar test, because it tests the
            server start with a flag (--wsrep-recover) that "aborts" the server
            start.

            mtr does not like to see its own server dead, so we can't have the
            flags in an .opt file, or in a restart_parameter for
            --source include/restart_mysqld.inc.

            Prior to MDEV-22979, the test used $MYSQLD_BOOTSTRAP_CMD. For whatever
            reason it stopped working with the fix of the spider init bugs, but in
            any case, we should use $MYSQLD_CMD instead.

            ycp Yuchen Pei added a comment (edited):

            Here's an initial attempt to fix this issue:

            upstream/bb-10.10-mdev-32515 upstream/bb-10.10-all-builders 5bd85cb229f187c7c24c69659ff2caedb99f6366
            MDEV-32515 Use the mtr cnf in the spider/bugfix.mdev_30370 mysqld invocation
             
            This makes sure the $MYSQLD_CMD invocation uses the same port, and
            does not use the default port which may already be in use.

            Let's wait and see how it works in the CI.

            Update: that did not work. We get the same failures:
            https://buildbot.mariadb.net/buildbot/grid?category=main&branch=bb-10.10-all-builders
            https://buildbot.mariadb.org/#grid?branch=bb-10.10-mdev-32515

            ycp Yuchen Pei added a comment:

            It is strange that this test fails at --exec $MYSQLD_CMD with "Do you
            already have another server running on socket"[1][2] or "Do you
            already have another server running on port: 3306 ?" [3][4]

            let $MYSQLD_DATADIR= `select @@datadir`;
            let $PLUGIN_DIR=`select @@plugin_dir`;
            --source include/kill_mysqld.inc
            --exec $MYSQLD_CMD --datadir=$MYSQLD_DATADIR --wsrep-recover --plugin-dir=$PLUGIN_DIR --plugin-load-add=ha_spider
            --source include/start_mysqld.inc
            --disable_query_log
            --source ../../include/clean_up_spider.inc

            but this test (spider/bugfix.mdev_22979) passes:

            let $MYSQLD_DATADIR= `select @@datadir`;
            let $PLUGIN_DIR=`select @@plugin_dir`;
            --source include/kill_mysqld.inc
            --write_file $MYSQLTEST_VARDIR/tmp/mdev_22979.sql
            drop table if exists foo.bar;
            EOF
            --exec $MYSQLD_CMD --datadir=$MYSQLD_DATADIR --bootstrap --plugin-dir=$PLUGIN_DIR --plugin-load-add=ha_spider < $MYSQLTEST_VARDIR/tmp/mdev_22979.sql
            --source include/start_mysqld.inc
            --disable_query_log
            --source ../../include/clean_up_spider.inc

            The only difference I can see is the --bootstrap flag. The failure
            only happens on certain old buildbot CI builders [5].

            [1]
            https://buildbot.mariadb.net/buildbot/builders/kvm-zyp-opensuse150-amd64/builds/10857/steps/mtr/logs/stdio
            [2]
            https://buildbot.mariadb.net/buildbot/builders/kvm-zyp-opensuse150-amd64/builds/10857
            [3]
            https://buildbot.mariadb.net/buildbot/builders/kvm-deb-bullseye-aarch64/builds/1585
            [4]
            https://buildbot.mariadb.net/buildbot/builders/kvm-deb-bullseye-aarch64/builds/1585/steps/mtr/logs/stdio
            [5]
            https://buildbot.mariadb.net/buildbot/grid?category=main&branch=bb-10.10-all-builders

            In any case, I created a commit to add --bootstrap. Let's see whether
            that helps.

            upstream/bb-10.10-all-builders c93d8c32c97170d63e968da047927ecf0a3b2001
            MDEV-32515 [experiment] Add --bootstrap to the $MYSQLD_CMD invocation in mdev_30370
             
            After all, spider/bugfix.mdev_22979 passes, and the only difference
            that may matter is the --bootstrap flag

            ycp Yuchen Pei added a comment (edited):

            This simple fix seems to work. See [1] for the commit c9e5d725bb8c,
            which is identical except for the commit message.

            bb-10.10-mdev-32515 85262c138dbdd1e39046571cb87645621fa7baf2
            MDEV-32515 Use $MYSQLD_LAST_CMD in spider/bugfix.mdev_30370
             
            $MYSQLD_CMD uses .1 as the defaults-group-suffix, which could cause
            the use of the default port (3306) or socket. That fails in
            environments where these defaults are already in use by another server.
             
            Adding an extra --defaults-group-suffix=.1.1 does not help, because
            the first flag wins.
             
            So we use $MYSQLD_LAST_CMD instead, which uses the correct suffix.
             
            The extra innodb buffer pool warning is irrelevant to the goal of the
            test (running --wsrep-recover with --plugin-load-add=ha_spider should
            not cause a hang).

            [1] https://buildbot.mariadb.net/buildbot/grid?category=main&branch=bb-10.10-all-builders
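
To illustrate the "first flag wins" point from the commit message above, here is a toy model. It assumes, as the commit message states, that only the first --defaults-group-suffix on the command line is honoured. The helper `effective_suffix` is hypothetical and is not MariaDB's actual option parser.

```python
from typing import Optional, Sequence

FLAG = "--defaults-group-suffix="


def effective_suffix(argv: Sequence[str]) -> Optional[str]:
    """Return the suffix from the first matching flag, ignoring later ones (toy model)."""
    for arg in argv:
        if arg.startswith(FLAG):
            return arg[len(FLAG):]
    return None


if __name__ == "__main__":
    # Shape of the $MYSQLD_CMD expansion seen in the failing log.
    mysqld_cmd = [
        "/usr/sbin/mariadbd",
        "--defaults-group-suffix=.1",
        "--defaults-file=/dev/shm/var/4/my.cnf",
    ]
    # Appending a second suffix does not change the outcome in this model.
    print(effective_suffix(mysqld_cmd + ["--defaults-group-suffix=.1.1"]))  # prints .1
```

Under that assumption, appending a flag to $MYSQLD_CMD cannot override the suffix, which is why the fix switches to $MYSQLD_LAST_CMD instead.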

            ycp Yuchen Pei added a comment:

            danblack said "c9e5d725bb8c0d8eb28caf6bc766e946fc0cf8d7 is fine. Reviewed by me complete." Thanks for the review.

            So I am going to push 85262c138dbdd1e39046571cb87645621fa7baf2, which is the same as c9e5d725bb8c0d8eb28caf6bc766e946fc0cf8d7 except for a more elaborate commit message.

            danblack Daniel Black added a comment:

            Yep. It's good.

            danblack Daniel Black added a comment:

            yep. good fix.

            ycp Yuchen Pei added a comment:

            Pushed 057fd528766eba150b9d7a0de8f95a4094f0e460 to 10.10

