MariaDB Server / MDEV-32046

Spider test instability caused by ER_NET_READ_ERROR

Details

    Description

      Setting this to critical because it pollutes CI and local mtr output. It might be similar to MDEV-31586.

      For example: https://buildbot.mariadb.org/#/builders/35/builds/27272

      ^ This failure is on a custom branch (bb-11.2-ycp-mdev-28856). First we need to verify that it reproduces on a main branch.

      spider/bugfix.mdev_27240                 w6 [ fail ]
              Test ended at 2023-08-30 08:48:28
       
      CURRENT_TEST: spider/bugfix.mdev_27240
      mysqltest: At line 18: query 'LOCK TABLE tbl_a READ' failed with wrong errno ER_NET_READ_ERROR (1158): 'Got an error reading communication packets', instead of ER_CONNECT_TO_FOREIGN_DATA_SOURCE (1429)...
       
      The result from queries just before the failure was:
      for master_1
      for child2
      for child3
      CREATE DATABASE auto_test_local;
      USE auto_test_local;
      CREATE TABLE tbl_a (a INT KEY) ENGINE=SPIDER;
      SELECT a.z FROM tbl_a AS a,tbl_a b WHERE a.z=b.z;
      ERROR 42S22: Unknown column 'a.z' in 'field list'
      ALTER TABLE tbl_a CHANGE c c INT;
      ERROR 42S22: Unknown column 'c' in 'tbl_a'
      LOCK TABLE tbl_a READ;
      

          Activity

            ycp Yuchen Pei added a comment - - edited

            I am linking this issue to MDEV-28856 because the CI failures in the
            description appear in bb-11.2-ycp-mdev-28856, a development branch
            containing spider commits that are under review or have been
            reviewed, with the changes for MDEV-28856 on top.

            Actually, this issue is probably not related to MDEV-28856, as the
            failures don't appear in bb-11.3-mdev-28856. Given that it fails
            in multiple builds in bb-11.3-ycp-mdev-28856, the cause is probably
            some other commits in the latter branch.

            ycp Yuchen Pei added a comment -

            Lowering the prio since it is not appearing in main branches.

            ycp Yuchen Pei added a comment -

            Hi julien.fritsch, I do not see it happening in 11.2, so I'm not
            sure if it makes sense to add 11.2 as a fixversion.

            The bug appeared in some of my custom branches, which contain unpushed
            commits (probably the cause), or pushed but not merged commits
            (probably not the cause).

            I think it is ok for it to not appear in my queue atm, as it will
            re-emerge if it still exists.

            ycp Yuchen Pei added a comment -

            Some observations:

            Today I applied commits for MDEV-32157 to a custom branch
            (bb-11.0-ycp-mdev-26247):

            2f2fbabe24c MDEV-32157 MDEV-28856 Spider: Tests, documentation, small fixes and cleanups
            13ca614e1b3 MDEV-32157 MDEV-28856 Spider: drop server in tests

            Before applying them, the branch seemed to fail mdev_27240
            consistently when running mtr on the spider suites:

            ./mysql-test/mtr --suite spider,spider/*,spider/*/* --skip-test="spider/oracle.*|.*/t\..*" --parallel=auto --big-test --force --max-test-fail=0

            But after the application of these commits, the failure disappeared.

            ycp Yuchen Pei added a comment -

            I could not reproduce the failure in a debuggable way, and
            ER_NET_READ_ERROR is a very rare and strange error. The failure is
            pretty consistent (unlike MDEV-31586, which displays multiple
            errors), and the alternative error code is more a nuisance than a
            problem and is irrelevant to the actual bug in MDEV-27240. So I
            am going to add the alternative error code as an option in the test
            file, like so:

            976ab215416 upstream/bb-10.10-mdev-22979 MDEV-32046 Fix flaky error codes in spider/bugfix.mdev_27240
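
A minimal sketch of what such a test-file change could look like, assuming the workaround relies on mysqltest's comma-separated `--error` list. The contents of commit 976ab215416 are not shown in this ticket, so the excerpt below is hypothetical and only illustrates the idea of accepting either error code for the statement that fails at line 18.

```sql
# Hypothetical excerpt of spider/bugfix/t/mdev_27240.test (not the actual commit).
# Accept either the expected error or the occasional ER_NET_READ_ERROR.
--error ER_CONNECT_TO_FOREIGN_DATA_SOURCE,ER_NET_READ_ERROR
LOCK TABLE tbl_a READ;
```

With more than one code listed, mysqltest is expected to record a generic line in place of the specific error message, so the corresponding .result file would probably need to be re-recorded along with the change.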

            ycp Yuchen Pei added a comment - - edited

            Another, even rarer, failure, this time in spider/bugfix.mdev_27239.

            https://buildbot.mariadb.org/#/builders/160/builds/23176

            spider/bugfix.mdev_27239                 w12 [ fail ]
                    Test ended at 2023-10-05 20:53:32
             
            CURRENT_TEST: spider/bugfix.mdev_27239
            --- /home/buildbot/ppc64le-ubuntu-2004/build/storage/spider/mysql-test/spider/bugfix/r/mdev_27239.result	2023-10-05 08:00:34.000000000 +0000
            +++ /home/buildbot/ppc64le-ubuntu-2004/build/storage/spider/mysql-test/spider/bugfix/r/mdev_27239.reject	2023-10-05 20:53:31.877054131 +0000
            @@ -9,10 +9,11 @@
             CREATE TABLE tbl_a (a INT) ENGINE=SPIDER;
             FLUSH TABLE tbl_a WITH READ LOCK;
             Warnings:
            +Error	1158	Got an error reading communication packets
             Error	1429	Unable to connect to foreign data source: localhost
            -Error	1429	Unable to connect to foreign data source: localhost
            -Error	1429	Unable to connect to foreign data source: localhost
            -Error	1429	Unable to connect to foreign data source: localhost
            +Error	1429	Got an error reading communication packets
            +Error	1429	Got an error reading communication packets
            +Error	1429	Got an error reading communication packets
             BEGIN;
             DROP DATABASE auto_test_local;
             for master_1

            It is worth further investigation, but not important enough to block
            MDEV-22979, so I'm gonna lower the prio back to major and push the
            workaround in the previous comment and leave this ticket open.

            Updated on [2023-10-17 Tue]: this happened again, in the recent
            10.6->10.10 push: https://buildbot.mariadb.org/#/builders/181/builds/23899


            People

              ycp Yuchen Pei
              ycp Yuchen Pei