MariaDB Server / MDEV-34788

Tests main.bind_address_resolution and main.bind_multiple_addresses_resolution fail on Debian builders

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Component/s: Tests

    Description

      On the official Debian buildd hosts, rebuilds of MariaDB occasionally fail in MTR tests:

      > main.bind_multiple_addresses_resolution w2 [ fail ]
      > Test ended at 2024-07-28 18:37:49
      >
      > CURRENT_TEST: main.bind_multiple_addresses_resolution
      >
      >
      > Failed to start mysqld.1
      > mysqltest failed but provided no output
      >
      >
      > - saving '/<<PKGBUILDDIR>>/builddir/mysql-test/var/2/log/main.bind_multiple_addresses_resolution/' to '/<<PKGBUILDDIR>>/builddir/mysql-test/var/log/main.bind_multiple_addresses_resolution/'
      >
      > Retrying test main.bind_multiple_addresses_resolution, attempt(2/3)...
      >
      > worker[02] > Restart - not started
      > ***Warnings generated in error logs during shutdown after running tests: main.bind_multiple_addresses_resolution
      >
      > 2024-07-28 18:37:49 0 [ERROR] Can't start server: Bind on TCP/IP port. Got error: 98: Address already in use
      > 2024-07-28 18:37:49 0 [ERROR] Do you already have another server running on port: 16100 ?
      > 2024-07-28 18:37:49 0 [ERROR] Aborting

      > main.bind_address_resolution w1 [ fail ]
      > Test ended at 2024-07-28 18:37:49
      >
      > CURRENT_TEST: main.bind_address_resolution
      >
      >
      > Failed to start mysqld.1
      > mysqltest failed but provided no output
      >
      >
      > - skipping '/<<PKGBUILDDIR>>/builddir/mysql-test/var/1/log/main.bind_address_resolution/'
      >
      > Retrying test main.bind_address_resolution, attempt(2/3)...
      >
      > worker[01] > Restart - not started
      > ***Warnings generated in error logs during shutdown after running tests: main.bind_address_resolution
      >
      > 2024-07-28 18:37:49 0 [ERROR] Can't start server: Bind on TCP/IP port. Got error: 98: Address already in use
      > 2024-07-28 18:37:49 0 [ERROR] Do you already have another server running on port: 16000 ?
      > 2024-07-28 18:37:49 0 [ERROR] Aborting

      This still happens with MariaDB 11.4, so I am filing the bug upstream to ask for help.

      Details are in Debian bug https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1052838
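
      For context, "Got error: 98" in the server logs is EADDRINUSE on Linux. bind() returns it when another socket is already listening on the same TCP address and port. The following is a minimal Python sketch, not MTR or server code, that reproduces the same errno with two sockets in one process:

```python
import errno
import socket


def bind_twice_on_same_port(host: str = "127.0.0.1") -> int:
    """Listen on a kernel-chosen port, then bind a second socket to it.

    Returns the errno raised by the second bind(), or 0 if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0: let the kernel pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port, no SO_REUSEADDR
        except OSError as exc:
            return exc.errno
        return 0                       # unexpected: the second bind succeeded
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    err = bind_twice_on_same_port()
    # On Linux errno.EADDRINUSE is 98, matching "Got error: 98" above.
    print(err, errno.errorcode.get(err))
```

      In the failing runs, the "first socket" is whatever other process already holds port 16000 or 16100 when mysqld.1 starts.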

          Activity

            danblack Daniel Black added a comment -

            There is at least one log https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=ppc64&ver=1%3A10.11.5-2&stamp=1696735349&raw=0 that begins its test run as:

            MariaDB Version 10.11.5-MariaDB-2
             - SSL connections supported
            Using suites: main
            Collecting tests...
            Installing system database...
            worker[1]  - 'localhost:16000' was not free
            worker[1]  - 'localhost:16020' was not free
            worker[1]  - 'localhost:16040' was not free
            worker[1]  - 'localhost:16060' was not free
            worker[1]  - 'localhost:16080' was not free
            worker[1] Using MTR_BUILD_THREAD 305, with reserved ports 16100..16119
            worker[5]  - 'localhost:16000' was not free
            worker[5]  - 'localhost:16020' was not free
            worker[5]  - 'localhost:16040' was not free
            worker[5]  - 'localhost:16060' was not free
            worker[5]  - 'localhost:16080' was not free
            worker[5] Using MTR_BUILD_THREAD 306, with reserved ports 16120..16139
            worker[3]  - 'localhost:16000' was not free
            worker[3]  - 'localhost:16020' was not free
            worker[3]  - 'localhost:16040' was not free
            worker[3]  - 'localhost:16060' was not free
            worker[3]  - 'localhost:16080' was not free
            worker[4]  - 'localhost:16000' was not free
            worker[9]  - 'localhost:16020' was not free
            worker[8]  - 'localhost:16040' was not free
            worker[9]  - 'localhost:16060' was not free
            worker[4]  - 'localhost:16020' was not free
            worker[8]  - 'localhost:16080' was not free
            worker[4]  - 'localhost:16040' was not free
            worker[4]  - 'localhost:16060' was not free
            worker[9]  - 'localhost:16160' was not free
            worker[8]  - 'localhost:16180' was not free
            worker[8]  - 'localhost:16220' was not free
            worker[4]  - 'localhost:16080' was not free
            worker[9]  - 'localhost:16200' was not free
            worker[4]  - 'localhost:16160' was not free
            worker[9]  - 'localhost:16220' was not free
            worker[2]  - 'localhost:16000' was not free
            worker[4]  - 'localhost:16180' was not free
            worker[4]  - 'localhost:16200' was not free
            worker[4]  - 'localhost:16220' was not free
            worker[4]  - 'localhost:16280' was not free
            worker[4]  - 'localhost:16300' was not free
            worker[2]  - 'localhost:16020' was not free
            worker[2]  - 'localhost:16040' was not free
            worker[2]  - 'localhost:16060' was not free
            worker[8]  - 'localhost:16240' was not free
            worker[9] Using MTR_BUILD_THREAD 313, with reserved ports 16260..16279
             
            worker[2]  - 'localhost:16080' was not free
            ==============================================================================
             
            TEST                                  WORKER RESULT   TIME (ms) or COMMENT
            worker[4] Using MTR_BUILD_THREAD 316, with reserved ports 16320..16339
            --------------------------------------------------------------------------
             
            worker[8]  - 'localhost:16280' was not free
            worker[2]  - 'localhost:16160' was not free
            worker[8]  - 'localhost:16300' was not free
            worker[2]  - 'localhost:16180' was not free
            worker[2]  - 'localhost:16200' was not free
            worker[2]  - 'localhost:16220' was not free
            worker[2]  - 'localhost:16240' was not free
            worker[2]  - 'localhost:16280' was not free
            worker[2]  - 'localhost:16300' was not free
            worker[3] Using MTR_BUILD_THREAD 307, with reserved ports 16140..16159
            worker[7]  - 'localhost:16000' was not free
            worker[7]  - 'localhost:16020' was not free
            worker[7]  - 'localhost:16040' was not free
            worker[7]  - 'localhost:16060' was not free
            worker[2] Using MTR_BUILD_THREAD 318, with reserved ports 16360..16379
            worker[7]  - 'localhost:16080' was not free
            worker[8] Using MTR_BUILD_THREAD 317, with reserved ports 16340..16359
            worker[7]  - 'localhost:16160' was not free
            worker[7]  - 'localhost:16180' was not free
            worker[7]  - 'localhost:16200' was not free
            worker[7]  - 'localhost:16220' was not free
            worker[7]  - 'localhost:16240' was not free
            worker[7]  - 'localhost:16280' was not free
            worker[7]  - 'localhost:16300' was not free
            worker[11]  - 'localhost:16000' was not free
            worker[11]  - 'localhost:16020' was not free
            worker[11]  - 'localhost:16040' was not free
            worker[11]  - 'localhost:16060' was not free
            worker[11]  - 'localhost:16080' was not free
            worker[10]  - 'localhost:16000' was not free
            worker[11]  - 'localhost:16160' was not free
            worker[15]  - 'localhost:16020' was not free
            worker[10]  - 'localhost:16040' was not free
            worker[11]  - 'localhost:16180' was not free
            worker[15]  - 'localhost:16060' was not free
            worker[11]  - 'localhost:16200' was not free
            worker[10]  - 'localhost:16080' was not free
            worker[11]  - 'localhost:16220' was not free
            worker[15]  - 'localhost:16160' was not free
            worker[11]  - 'localhost:16240' was not free
            worker[15]  - 'localhost:16180' was not free
            worker[10]  - 'localhost:16160' was not free
            worker[11]  - 'localhost:16280' was not free
            worker[15]  - 'localhost:16200' was not free
            worker[15]  - 'localhost:16220' was not free
            worker[11]  - 'localhost:16300' was not free
            worker[15]  - 'localhost:16240' was not free
            worker[15]  - 'localhost:16280' was not free
            worker[15]  - 'localhost:16300' was not free
            worker[6]  - 'localhost:16000' was not free
            worker[6]  - 'localhost:16020' was not free
            worker[11] Using MTR_BUILD_THREAD 320, with reserved ports 16400..16419
            worker[6]  - 'localhost:16040' was not free
            worker[6]  - 'localhost:16060' was not free
            worker[15] Using MTR_BUILD_THREAD 321, with reserved ports 16420..16439
            worker[6]  - 'localhost:16080' was not free
            worker[12]  - 'localhost:16020' was not free
            worker[12]  - 'localhost:16040' was not free
            worker[12]  - 'localhost:16060' was not free
            worker[12]  - 'localhost:16080' was not free
            worker[10]  - 'localhost:16180' was not free
            worker[12]  - 'localhost:16200' was not free
            worker[10]  - 'localhost:16200' was not free
            worker[10]  - 'localhost:16240' was not free
            worker[12]  - 'localhost:16220' was not free
            worker[10]  - 'localhost:16280' was not free
            worker[6]  - 'localhost:16160' was not free
            worker[12]  - 'localhost:16240' was not free
            worker[10]  - 'localhost:16300' was not free
            worker[6]  - 'localhost:16180' was not free
            worker[6]  - 'localhost:16200' was not free
            worker[12]  - 'localhost:16280' was not free
            worker[6]  - 'localhost:16220' was not free
            worker[6]  - 'localhost:16240' was not free
            worker[16]  - 'localhost:16000' was not free
            worker[6]  - 'localhost:16280' was not free
            worker[12]  - 'localhost:16300' was not free
            worker[16]  - 'localhost:16020' was not free
            worker[16]  - 'localhost:16040' was not free
            worker[16]  - 'localhost:16060' was not free
            worker[16]  - 'localhost:16080' was not free
            worker[10] Using MTR_BUILD_THREAD 322, with reserved ports 16440..16459
            worker[16]  - 'localhost:16160' was not free
            worker[16]  - 'localhost:16180' was not free
            worker[16]  - 'localhost:16200' was not free
            worker[16]  - 'localhost:16220' was not free
            worker[16]  - 'localhost:16240' was not free
            worker[16]  - 'localhost:16280' was not free
            worker[16]  - 'localhost:16300' was not free
            worker[12] Using MTR_BUILD_THREAD 324, with reserved ports 16480..16499
            worker[6] Using MTR_BUILD_THREAD 323, with reserved ports 16460..16479
            worker[14]  - 'localhost:16000' was not free
            worker[7] Using MTR_BUILD_THREAD 319, with reserved ports 16380..16399
            worker[14]  - 'localhost:16020' was not free
            worker[14]  - 'localhost:16040' was not free
            worker[14]  - 'localhost:16060' was not free
            worker[14]  - 'localhost:16080' was not free
            worker[14]  - 'localhost:16160' was not free
            worker[14]  - 'localhost:16180' was not free
            worker[13]  - 'localhost:16000' was not free
            worker[14]  - 'localhost:16200' was not free
            worker[13]  - 'localhost:16020' was not free
            worker[14]  - 'localhost:16220' was not free
            worker[13]  - 'localhost:16040' was not free
            worker[13]  - 'localhost:16060' was not free
            worker[14]  - 'localhost:16240' was not free
            worker[13]  - 'localhost:16080' was not free
            worker[14]  - 'localhost:16280' was not free
            worker[14]  - 'localhost:16300' was not free
            worker[13]  - 'localhost:16160' was not free
            worker[16] Using MTR_BUILD_THREAD 325, with reserved ports 16500..16519
            worker[13]  - 'localhost:16180' was not free
            worker[13]  - 'localhost:16200' was not free
            worker[13]  - 'localhost:16220' was not free
            worker[13]  - 'localhost:16240' was not free
            worker[13]  - 'localhost:16280' was not free
            worker[13]  - 'localhost:16300' was not free
            worker[14] Using MTR_BUILD_THREAD 326, with reserved ports 16520..16539
            worker[13] Using MTR_BUILD_THREAD 327, with reserved ports 16540..16559
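
            The reserved ranges in this log follow a regular pattern. MTR_BUILD_THREAD 305 gets 16100..16119, 306 gets 16120..16139, and so on. In other words, each thread gets 20 ports starting at 10000 + 20 * thread. This formula is an assumption read off this log, not taken from MTR's source. The sketch below checks it against a log line and maps a port back to its thread. Under it, ports 16000 and 16100 from the failures in the description would belong to build threads 300 and 305.

```python
import re
from typing import Tuple

# Inferred from the log above, not from MTR source:
# each MTR_BUILD_THREAD owns 20 consecutive ports starting at 10000 + 20 * thread.
PORTS_PER_THREAD = 20
BASE_PORT = 10000

RESERVATION_RE = re.compile(
    r"Using MTR_BUILD_THREAD (\d+), with reserved ports (\d+)\.\.(\d+)")


def reserved_ports(build_thread: int) -> Tuple[int, int]:
    """First and last port of a build thread's range (inclusive)."""
    first = BASE_PORT + PORTS_PER_THREAD * build_thread
    return first, first + PORTS_PER_THREAD - 1


def build_thread_for_port(port: int) -> int:
    """Inverse mapping: which build thread's range contains this port."""
    return (port - BASE_PORT) // PORTS_PER_THREAD


def check_log_line(line: str) -> bool:
    """True if a 'Using MTR_BUILD_THREAD' line agrees with the inferred formula."""
    m = RESERVATION_RE.search(line)
    if m is None:
        raise ValueError("not a port reservation line")
    thread, lo, hi = map(int, m.groups())
    return reserved_ports(thread) == (lo, hi)


if __name__ == "__main__":
    print(check_log_line(
        "worker[1] Using MTR_BUILD_THREAD 305, with reserved ports 16100..16119"))
    print(build_thread_for_port(16000), build_thread_for_port(16100))
```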
            

            Port reservation is done by MTR itself, based on filesystem locks. This could be triggered by a different mtr instance running in the same network namespace at the same time.
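
            The point about filesystem locks can be illustrated with a simplified, hypothetical model of lock-based port reservation. MTR itself is Perl and its real logic differs. In this model, a worker takes an exclusive flock() on a per-thread file in a lock directory, then probes that thread's base port. Locks only coordinate processes that see the same lock directory. Two builds in separate chroots, assumed here to have their own /tmp, would therefore not see each other's locks, even though they share one network stack. Only the port probe notices the other build. A port that is free at probe time can still be taken before mysqld binds it.

```python
import errno
import fcntl
import os
import socket
import tempfile
from typing import Optional, Tuple

# Hypothetical model, not MTR's actual implementation.
PORTS_PER_THREAD = 20   # inferred from the log: 305 -> 16100..16119
BASE_PORT = 10000


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Probe a port by binding it briefly; the probe socket is closed again."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise
    return True


def reserve_build_thread(lock_dir: str, first: int = 300,
                         last: int = 349) -> Optional[Tuple[int, int]]:
    """Return (build_thread, lock_fd) for the first thread whose lock file can
    be locked and whose base port is free, or None if none qualifies.
    The caller keeps lock_fd open while using the ports; closing it
    releases the flock().
    """
    for thread in range(first, last + 1):
        path = os.path.join(lock_dir, f"{thread}.lock")
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            # Held by another worker that uses the SAME lock directory.
            os.close(fd)
            continue
        port = BASE_PORT + PORTS_PER_THREAD * thread
        # Only the base port is probed here, for brevity.
        if not port_is_free(port):
            # Busy but not locked by anyone we can see, e.g. a build in
            # another chroot whose lock directory is invisible to us.
            print(f" - 'localhost:{port}' was not free")
            os.close(fd)
            continue
        # Race window: the port is free now, but nothing stops another process
        # from binding it before mysqld does ("Address already in use").
        return thread, fd
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lock_dir:
        a = reserve_build_thread(lock_dir)
        b = reserve_build_thread(lock_dir)  # a's lock is still held
        print(a[0] if a else None, b[0] if b else None)
        for r in (a, b):
            if r:
                os.close(r[1])
```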

            I couldn't find a reference to the above failure on https://buildd.debian.org/status/package.php?p=mariadb&suite=sid

            If these builds run in a chroot, the network is shared. From the above, it's obvious that some architectures share builders, resulting in collisions.
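
            A chroot only changes the filesystem root. It does not give the process its own network stack. On Linux, you can check whether two build processes share one by comparing their /proc/<pid>/ns/net links. The sketch below assumes shell access to the builder and the PIDs of two concurrent mtr or mariadbd processes:

```python
import os
import sys


def net_namespace(pid: int) -> str:
    """Return the /proc/<pid>/ns/net link target, of the form 'net:[<inode>]'.

    Linux-only; inspecting another user's process may need extra privileges.
    """
    return os.readlink(f"/proc/{pid}/ns/net")


def same_network_namespace(pid_a: int, pid_b: int) -> bool:
    """True if both processes share one network stack, and thus one port space."""
    return net_namespace(pid_a) == net_namespace(pid_b)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print(f"usage: {sys.argv[0]} PID_A PID_B")
    else:
        a, b = int(sys.argv[1]), int(sys.argv[2])
        print("shared" if same_network_namespace(a, b) else "separate")
```

            If two builds report "shared", their test servers compete for the same TCP ports regardless of which chroot they run in.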

            Lucas, in the Debian bug, seems to have hit an isolated case. It's hard to say without full logs. It might still be a particular port and how it's bound; I can't easily tell.

            otto Otto Kekäläinen added a comment -

            For reference, this is tracked in Debian as:

              https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1052838
              https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1077524
              https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1091557

            danblack Daniel Black added a comment -

            We had some "address in use" errors in Buildbot, which marko commented on as follows; this might be related:

            mtr is notoriously bad at handling process crashes. Often they are noticed, sometimes not. There was some significant improvement a couple of years ago, but it is not stable yet. It’s easy to test this by introducing some code that crashes in most tests outside bootstrap, and then running ./mtr --force --max-test-fail=0. Just today I noticed that if mariadbd would crash during bootstrap, MDEV-21010 could kick in and cause everything to hang instead of crashing.
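
            One plausible link to "Address already in use": mtr may not notice that a mariadbd hung instead of exiting. Such a process stays alive and keeps its listening socket, so the next server started on that port fails with errno 98. To look for such leftovers on a builder, you can list LISTEN sockets in the MTR port range. The following Linux-only Python sketch reads /proc/net/tcp and /proc/net/tcp6. The 16000..16559 range is taken from the log above. `ss -ltnp` additionally shows the owning process.

```python
import os
from typing import List

TCP_LISTEN = "0A"  # hex state code for LISTEN in /proc/net/tcp{,6}


def listening_ports(lo: int, hi: int) -> List[int]:
    """Sorted TCP ports in [lo, hi] with a LISTEN socket in this network namespace."""
    found = set()
    for table in ("/proc/net/tcp", "/proc/net/tcp6"):
        if not os.path.exists(table):
            continue
        with open(table) as f:
            next(f)  # skip the header line
            for line in f:
                fields = line.split()
                # fields[1] is "HEXADDR:HEXPORT", fields[3] is the socket state
                local_address, state = fields[1], fields[3]
                port = int(local_address.rsplit(":", 1)[1], 16)
                if state == TCP_LISTEN and lo <= port <= hi:
                    found.add(port)
    return sorted(found)


if __name__ == "__main__":
    # Ports still listening here between test runs are candidates for leftovers.
    print(listening_ports(16000, 16559))
```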

            otto Otto Kekäläinen added a comment -

            In https://buildd.debian.org/status/fetch.php?pkg=mariadb&arch=amd64&ver=1%3A11.4.5-2%7Eexp1&stamp=1740281411&raw=0 these were passing:

            main.bind_address_resolution             w3 [ pass ]     25
            main.bind_multiple_addresses_resolution  w3 [ pass ]     34

            I will re-enable these and continue to monitor how sporadic this is with MariaDB 11.8.
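
            To see how sporadic the failures are across builds, you can tally per-test results from downloaded buildd logs. The small Python sketch below assumes the logs are saved locally as text files. Its regular expression is written against the result lines quoted in this issue; it is not an official MTR output specification.

```python
import re
import sys
from collections import Counter
from typing import Dict, Iterable

# Matches result lines as quoted in this issue, e.g.
#   main.bind_address_resolution             w3 [ pass ]     25
#   > main.bind_multiple_addresses_resolution w2 [ fail ]
RESULT_RE = re.compile(r"^[>\s]*(main\.bind_\w+)\s+w\d+\s+\[\s*([\w-]+)\s*\]")


def tally(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count results (pass, fail, ...) per bind test across MTR log lines."""
    counts: Dict[str, Counter] = {}
    for line in lines:
        m = RESULT_RE.match(line)
        if m:
            test, result = m.groups()
            counts.setdefault(test, Counter())[result] += 1
    return counts


if __name__ == "__main__":
    # Usage: python3 tally.py build-log-1.txt build-log-2.txt ...
    for path in sys.argv[1:]:
        with open(path, errors="replace") as f:
            print(path, tally(f))
```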

            People

              Assignee: Unassigned
              Reporter: otto Otto Kekäläinen
              Votes: 0
              Watchers: 2
