Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-33322

Check of testcase failed for random spider tests

Details

    Description

      Testing of MariaDB 10.6.16 randomly reports Check of testcase failed. For example:

      ...
      rpl.semisync_future-7591 'row'           w4 [ pass ]  11906
      spider/bugfix.mdev_26544                 w7 [ pass ]     96
      spider/bugfix.mdev_26539                 w6 [ pass ]    161
      spider/bugfix.mdev_24517                 w8 [ pass ]    435
      spider/bugfix.mdev_27172                 w6 [ pass ]    687
      spider/bugfix.mdev_27184                 w8 [ pass ]    166
      rpl.show_status_stop_slave_race-7126 'row' w4 [ pass ]   1808
      spider/bugfix.mdev_25684                 w2 [ pass ]     79
      spider/bugfix.mdev_27240                 w8 [ pass ]   1717
      rpl.rpl_idempotency 'stmt'               w3 [ pass ]   1122
      connect.infoschema-9739                  w1 [ skipped ]  Need windows
      spider/bugfix.mdev_28996                 w2 [ pass ]    280
      spider/bugfix.mdev_29008                 w8 [ pass ]    640
      connect.infoschema2-9739                 w1 [ pass ]     95
      spider/bugfix.mdev_26582                 w7 [ pass ]    255
       
      MTR's internal check of the test case 'spider/bugfix.mdev_26582' failed.
      This means that the test case does not preserve the state that existed
      before the test case was executed.  Most likely the test case did not
      do a proper clean-up. It could also be caused by the previous test run
      by this thread, if the server wasn't restarted.
      This is the diff of the states of the servers before and after the
      test case was executed:                  
      mysqltest: Logging to '$(BUILD_DIR)/mysql-test/var/7/tmp/check-mysqld_1_1.log'.
      mysqltest: Results saved in '$(BUILD_DIR)/mysql-test/var/7/tmp/check-mysqld_1_1.result'.
      mysqltest: Connecting to server localhost:16120 (socket /tmp/HxLXgFojOK/7/mysqld.1.1.sock) as 'root', connection 'default', attempt 0 ...
      mysqltest: ... Connected.
      mysqltest: Start processing test commands from './include/check-testcase.test' ...
      mysqltest: ... Done processing test commands. 
      --- $(BUILD_DIR)/mysql-test/var/7/tmp/check-mysqld_1_1.result   Sun Jan 28 11:00:01 2024
      +++ $(BUILD_DIR)/mysql-test/var/7/tmp/check-mysqld_1_1.reject   Sun Jan 28 11:00:02 2024
      @@ -553,6 +553,126 @@
       SLOW_QUERY_LOG_FILE    $(BUILD_DIR)/mysql-test/var/7/mysqld.1.1/mysqld-slow.log
       SOCKET /tmp/HxLXgFojOK/7/mysqld.1.1.sock
       SORT_BUFFER_SIZE       262144
      +SPIDER_AUTO_INCREMENT_MODE     0
      +SPIDER_BGS_FIRST_READ  2
      +SPIDER_BGS_MODE        0
      +SPIDER_BGS_SECOND_READ 100
      +SPIDER_BKA_ENGINE      
      +SPIDER_BKA_MODE        1
      +SPIDER_BKA_TABLE_NAME_TYPE     0
      +SPIDER_BLOCK_SIZE      16384
      +SPIDER_BUFFER_SIZE     16000
      +SPIDER_BULK_SIZE       16000
      +SPIDER_BULK_UPDATE_MODE        0
      +SPIDER_BULK_UPDATE_SIZE        16000
      +SPIDER_CASUAL_READ     0
      +SPIDER_CONNECT_ERROR_INTERVAL  1
      +SPIDER_CONNECT_MUTEX   OFF
      +SPIDER_CONNECT_RETRY_COUNT     1000
      +SPIDER_CONNECT_RETRY_INTERVAL  1000
      +SPIDER_CONNECT_TIMEOUT 6
      +SPIDER_CONN_RECYCLE_MODE       0
      +SPIDER_CONN_RECYCLE_STRICT     0
      +SPIDER_CONN_WAIT_TIMEOUT       10
      +SPIDER_CRD_BG_MODE     2
      +SPIDER_CRD_INTERVAL    51
      +SPIDER_CRD_MODE        1
      +SPIDER_CRD_SYNC        0
      +SPIDER_CRD_TYPE        2
      +SPIDER_CRD_WEIGHT      2
      +SPIDER_DELETE_ALL_ROWS_TYPE    1
      +SPIDER_DIRECT_AGGREGATE        ON
      +SPIDER_DIRECT_DUP_INSERT       0
      +SPIDER_DIRECT_ORDER_LIMIT      9223372036854775807
      +SPIDER_DRY_ACCESS      OFF
      +SPIDER_ERROR_READ_MODE 0
      +SPIDER_ERROR_WRITE_MODE        0
      +SPIDER_FIRST_READ      0
      +SPIDER_FORCE_COMMIT    1
      +SPIDER_GENERAL_LOG     OFF
      +SPIDER_INDEX_HINT_PUSHDOWN     OFF
      +SPIDER_INIT_SQL_ALLOC_SIZE     1024
      +SPIDER_INTERNAL_LIMIT  9223372036854775807
      +SPIDER_INTERNAL_OFFSET 0
      +SPIDER_INTERNAL_OPTIMIZE       0
      +SPIDER_INTERNAL_OPTIMIZE_LOCAL 0
      +SPIDER_INTERNAL_SQL_LOG_OFF    -1
      +SPIDER_INTERNAL_UNLOCK OFF
      +SPIDER_INTERNAL_XA     OFF
      +SPIDER_INTERNAL_XA_ID_TYPE     0
      +SPIDER_INTERNAL_XA_SNAPSHOT    0
      +SPIDER_LOAD_CRD_AT_STARTUP     1
      +SPIDER_LOAD_STS_AT_STARTUP     1
      +SPIDER_LOCAL_LOCK_TABLE        OFF
      +SPIDER_LOCK_EXCHANGE   OFF
      +SPIDER_LOG_RESULT_ERRORS       0
      +SPIDER_LOG_RESULT_ERROR_WITH_SQL       0
      +SPIDER_LOW_MEM_READ    1
      +SPIDER_MAX_CONNECTIONS 0
      +SPIDER_MAX_ORDER       32767
      +SPIDER_MULTI_SPLIT_READ        100
      +SPIDER_NET_READ_TIMEOUT        600
      +SPIDER_NET_WRITE_TIMEOUT       600
      +SPIDER_PING_INTERVAL_AT_TRX_START      3600
      +SPIDER_QUICK_MODE      3
      +SPIDER_QUICK_PAGE_BYTE 10485760
      +SPIDER_QUICK_PAGE_SIZE 1024
      +SPIDER_READ_ONLY_MODE  0
      +SPIDER_REMOTE_ACCESS_CHARSET
      +SPIDER_REMOTE_AUTOCOMMIT       -1
      +SPIDER_REMOTE_DEFAULT_DATABASE
      +SPIDER_REMOTE_SQL_LOG_OFF      -1
      +SPIDER_REMOTE_TIME_ZONE
      +SPIDER_REMOTE_TRX_ISOLATION    -1
      +SPIDER_REMOTE_WAIT_TIMEOUT     -1
      +SPIDER_RESET_SQL_ALLOC 1
      +SPIDER_SAME_SERVER_LINK        OFF
      +SPIDER_SECOND_READ     0
      +SPIDER_SELECT_COLUMN_MODE      1
      +SPIDER_SELUPD_LOCK_MODE        1
      +SPIDER_SEMI_SPLIT_READ 2
      +SPIDER_SEMI_SPLIT_READ_LIMIT   9223372036854775807
      +SPIDER_SEMI_TABLE_LOCK 0
      +SPIDER_SEMI_TABLE_LOCK_CONNECTION      1
      +SPIDER_SEMI_TRX        ON
      +SPIDER_SEMI_TRX_ISOLATION      -1
      +SPIDER_SKIP_DEFAULT_CONDITION  0
      +SPIDER_SKIP_PARALLEL_SEARCH    0
      +SPIDER_SLAVE_TRX_ISOLATION     -1
      +SPIDER_SPLIT_READ      9223372036854775807
      +SPIDER_STORE_LAST_CRD  1
      +SPIDER_STORE_LAST_STS  1
      +SPIDER_STRICT_GROUP_BY 1
      +SPIDER_STS_BG_MODE     2
      +SPIDER_STS_INTERVAL    10
      +SPIDER_STS_MODE        1
      +SPIDER_STS_SYNC        0
      +SPIDER_SUPPORT_XA      ON
      +SPIDER_SYNC_AUTOCOMMIT ON
      +SPIDER_SYNC_SQL_MODE   ON
      +SPIDER_SYNC_TRX_ISOLATION      ON
      +SPIDER_TABLE_CRD_THREAD_COUNT  10
      +SPIDER_TABLE_INIT_ERROR_INTERVAL       1
      +SPIDER_TABLE_STS_THREAD_COUNT  10
      +SPIDER_UDF_CT_BULK_INSERT_INTERVAL     10
      +SPIDER_UDF_CT_BULK_INSERT_ROWS 100
      +SPIDER_UDF_DS_BULK_INSERT_ROWS 3000
      +SPIDER_UDF_DS_TABLE_LOOP_MODE  0
      +SPIDER_UDF_DS_USE_REAL_TABLE   0
      +SPIDER_UDF_TABLE_LOCK_MUTEX_COUNT      20
      +SPIDER_UDF_TABLE_MON_MUTEX_COUNT       20
      +SPIDER_USE_ALL_CONNS_SNAPSHOT  OFF
      +SPIDER_USE_COND_OTHER_THAN_PK_FOR_UPDATE       1
      +SPIDER_USE_CONSISTENT_SNAPSHOT OFF
      +SPIDER_USE_DEFAULT_DATABASE    ON
      +SPIDER_USE_FLASH_LOGS  OFF
      +SPIDER_USE_HANDLER     0
      +SPIDER_USE_PUSHDOWN_UDF        0
      +SPIDER_USE_SNAPSHOT_WITH_FLUSH_TABLES  0
      +SPIDER_USE_TABLE_CHARSET       1
      +SPIDER_VERSION 3.3.15
      +SPIDER_WAIT_TIMEOUT    604800
      +SPIDER_XA_REGISTER_MODE        1
       SQL_AUTO_IS_NULL       OFF
       SQL_BIG_SELECTS        ON
       SQL_BUFFER_RESULT      OFF
       
      mysqltest: Result length mismatch
       
      not ok
       
      spider/bugfix.mdev_29011                 w2 [ pass ]    542
      spider/bugfix.mdev_29447                 w8 [ pass ]    260
      rpl.rpl_insert 'stmt'                    w3 [ pass ]   2768
      spider/bugfix.mdev_24523                 w5 [ pass ]    312  
      connect.ini                              w1 [ pass ]    155
      spider/bugfix.mdev_29484                 w7 [ pass ]    250
      ...
      

      The testing was conducted using the following command on OpenIndiana:

      ./mtr --parallel=8 --mem --force --retry=0 --force-restart --max-test-fail=0 --skip-test-list=$(COMPONENT_DIR)/files/disabled-tests
      

      I saw spider/bugfix.self_reference failing similarly, but unfortunately I've no detailed log for the failure.

      Attachments

        Issue Links

          Activity

            ycp Yuchen Pei added a comment - - edited

            Thanks for reporting.

            I haven't seen this one locally or in CI, and I could not reproduce locally at 10.6.16 b83c379420a8846ae4b28768d3c81fa354cca056. I also tried mtr --no-reorder spider/bugfix.mdev_26544 spider/bugfix.mdev_26582.

            ycp Yuchen Pei added a comment - - edited Thanks for reporting. I haven't seen this one locally or in CI, and I could not reproduce locally at 10.6.16 b83c379420a8846ae4b28768d3c81fa354cca056. I also tried mtr --no-reorder spider/bugfix.mdev_26544 spider/bugfix.mdev_26582 .
            marcel Marcel Telka added a comment -

            I'm unable to reproduce this issue consistently either. It seems truly random.

            Anyway, I re-run the tests again today and I see even worse results:

            The servers were restarted 5478 times
            Spent 8704.701 of 4520 seconds executing testcases 
             
            Completed: Failed 7/5209 tests, 99.87% were successful. 
             
            Failing test(s): spider/bugfix.mdev_26544 spider/bugfix.mdev_26582 spider/bugfix.mdev_27184 spider/bugfix.mdev_27239 spider/bugfix.mdev_27240 spider/bugfix.mdev_29484 perfschema.threads_innodb
             
            The log files in var/log may give you some hint of what went wrong.
             
            If you want to report this error, please read first the documentation
            at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html
             
            Errors/warnings were found in logfiles during server shutdown after running the
            following sequence(s) of tests:
                spider/bugfix.mdev_27240
                spider/bugfix.mdev_29484
            969 tests were skipped, 277 by the test itself.
            

            The complete test log is available at https://telka.sk/illumos/MDEV-33322/2024-01-29-08:37.log

            marcel Marcel Telka added a comment - I'm unable to reproduce this issue consistently either. It seems truly random. Anyway, I re-run the tests again today and I see even worse results: The servers were restarted 5478 times Spent 8704.701 of 4520 seconds executing testcases   Completed: Failed 7/5209 tests, 99.87% were successful.   Failing test(s): spider/bugfix.mdev_26544 spider/bugfix.mdev_26582 spider/bugfix.mdev_27184 spider/bugfix.mdev_27239 spider/bugfix.mdev_27240 spider/bugfix.mdev_29484 perfschema.threads_innodb   The log files in var/log may give you some hint of what went wrong.   If you want to report this error, please read first the documentation at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html   Errors/warnings were found in logfiles during server shutdown after running the following sequence(s) of tests: spider/bugfix.mdev_27240 spider/bugfix.mdev_29484 969 tests were skipped, 277 by the test itself. The complete test log is available at https://telka.sk/illumos/MDEV-33322/2024-01-29-08:37.log
            ycp Yuchen Pei added a comment -

            > The complete test log is available at https://telka.sk/illumos/MDEV-33322/2024-01-29-08:37.log

            This looks different and more like MDEV-31902. Is there a way to
            reproduce or is it also highly sporadic?

            ycp Yuchen Pei added a comment - > The complete test log is available at https://telka.sk/illumos/MDEV-33322/2024-01-29-08:37.log This looks different and more like MDEV-31902 . Is there a way to reproduce or is it also highly sporadic?
            marcel Marcel Telka added a comment -

            I do always the same thing: just running tests. Sometimes it fully passes, sometimes reports Check of testcase failed for (ca 3 times so far, IIRC), and once it failed. So it behaves non-deterministic. I ran tests less than 10 times so far. And yes, the hard failure looks really like MDEV-31902.

            I'm going to try few more test runs to see how it behaves. If you have any suggestion what to do to get the problem root caused then I can try it.

            marcel Marcel Telka added a comment - I do always the same thing: just running tests. Sometimes it fully passes, sometimes reports Check of testcase failed for (ca 3 times so far, IIRC), and once it failed. So it behaves non-deterministic. I ran tests less than 10 times so far. And yes, the hard failure looks really like MDEV-31902 . I'm going to try few more test runs to see how it behaves. If you have any suggestion what to do to get the problem root caused then I can try it.
            marcel Marcel Telka added a comment -

            I do have one case where both this one and MDEV-31902 are reproduced: https://telka.sk/illumos/MDEV-33322/2024-01-30-11:00.log

            marcel Marcel Telka added a comment - I do have one case where both this one and MDEV-31902 are reproduced: https://telka.sk/illumos/MDEV-33322/2024-01-30-11:00.log
            ycp Yuchen Pei added a comment -

            For spider/bugfix.mdev_26582, can you replace -source ../../include/init_spider.inc with source ../../t/test_init.inc and source ../../include/deinit_spider.inc with -source ../../t/test_deinit.inc, and mtr --record, and see whether the check fails again?

            I know the bug from MDEV-26582 won't reproduce with this change and reverting the changes in the fix of that bug edde9084c2a8ee2e7b702c994945a4dfdb7e2bdf, but it would be interesting to see whether this would cause the check failures to disapppear and this is pretty much the only test using --source ../../include/(de)init_spider.inc.

            ycp Yuchen Pei added a comment - For spider/bugfix.mdev_26582, can you replace - source ../../include/init_spider.inc with source ../../t/test_init.inc and source ../../include/deinit_spider.inc with -source ../../t/test_deinit.inc , and mtr --record , and see whether the check fails again? I know the bug from MDEV-26582 won't reproduce with this change and reverting the changes in the fix of that bug edde9084c2a8ee2e7b702c994945a4dfdb7e2bdf, but it would be interesting to see whether this would cause the check failures to disapppear and this is pretty much the only test using --source ../../include/(de)init_spider.inc .
            marcel Marcel Telka added a comment -

            Honestly, I do not understand how that could help, because Check of testcase failed for is reported randomly for various tests. In my previous comment the Check of testcase failed for was for spider/bugfix.self_reference_multi and spider/bugfix.mdev_26582 simply failed, likely because of MDEV-31902.

            marcel Marcel Telka added a comment - Honestly, I do not understand how that could help, because Check of testcase failed for is reported randomly for various tests. In my previous comment the Check of testcase failed for was for spider/bugfix.self_reference_multi and spider/bugfix.mdev_26582 simply failed, likely because of MDEV-31902 .
            marcel Marcel Telka added a comment -

            Interestingly, the spider/bugfix.self_reference_multi test was effectively the first test ran by the worker:

            worker[03] mysql-test-run: WARNING: Test reserved for w1 picked up by w3
            spider/bugfix.mdev_26541                 w3 [ skipped ]  test needs to be run with UBSAN
            spider/bugfix.self_reference_multi       w3 [ pass ]    467
             
            MTR's internal check of the test case 'spider/bugfix.self_reference_multi' failed.
            

            This could mean that there is some interference between workers. I tried to run with --parallel=1 few times and I'm unable to reproduce this one (only MDEV-31902). But that proves nothing because maybe I was just unlucky.

            marcel Marcel Telka added a comment - Interestingly, the spider/bugfix.self_reference_multi test was effectively the first test ran by the worker: worker[03] mysql-test-run: WARNING: Test reserved for w1 picked up by w3 spider/bugfix.mdev_26541 w3 [ skipped ] test needs to be run with UBSAN spider/bugfix.self_reference_multi w3 [ pass ] 467   MTR's internal check of the test case 'spider/bugfix.self_reference_multi' failed. This could mean that there is some interference between workers. I tried to run with --parallel=1 few times and I'm unable to reproduce this one (only MDEV-31902 ). But that proves nothing because maybe I was just unlucky.
            ycp Yuchen Pei added a comment -

            > Honestly, I do not understand how that could help, because Check of
            > testcase failed for is reported randomly for various tests.

            My apologies, I missed the part in your second attached stdout where
            spider/bugfix.self_reference_multi also resulted in the same check
            failures of spider system variables. It is strange that of all the
            tests, these two have such check fails.

            > In my previous comment the Check of testcase failed for was for
            > spider/bugfix.self_reference_multi and spider/bugfix.mdev_26582
            > simply failed, likely because of MDEV-31902.

            Why do you think MDEV-31902 may be the cause of these check failures?

            ycp Yuchen Pei added a comment - > Honestly, I do not understand how that could help, because Check of > testcase failed for is reported randomly for various tests. My apologies, I missed the part in your second attached stdout where spider/bugfix.self_reference_multi also resulted in the same check failures of spider system variables. It is strange that of all the tests, these two have such check fails. > In my previous comment the Check of testcase failed for was for > spider/bugfix.self_reference_multi and spider/bugfix.mdev_26582 > simply failed, likely because of MDEV-31902 . Why do you think MDEV-31902 may be the cause of these check failures?
            marcel Marcel Telka added a comment -

            >> In my previous comment the Check of testcase failed for was for
            >> spider/bugfix.self_reference_multi and spider/bugfix.mdev_26582
            >> simply failed, likely because of MDEV-31902.
            >
            > Why do you think MDEV-31902 may be the cause of these check failures?

            Sorry, my statement was confusing. I wanted to say that in the https://telka.sk/illumos/MDEV-33322/2024-01-30-11:00.log file I noticed two things:

            1. The Check of testcase failed for was for spider/bugfix.self_reference_multi instead of spider/bugfix.mdev_26582.
            2. The spider/bugfix.mdev_26582 test didn't cause the Check of testcase failed for error. It simply failed, likely because of MDEV-31902.

            So I do not think the Check of testcase failed for issue is caused by MDEV-31902.

            marcel Marcel Telka added a comment - >> In my previous comment the Check of testcase failed for was for >> spider/bugfix.self_reference_multi and spider/bugfix.mdev_26582 >> simply failed, likely because of MDEV-31902 . > > Why do you think MDEV-31902 may be the cause of these check failures? Sorry, my statement was confusing. I wanted to say that in the https://telka.sk/illumos/MDEV-33322/2024-01-30-11:00.log file I noticed two things: The Check of testcase failed for was for spider/bugfix.self_reference_multi instead of spider/bugfix.mdev_26582 . The spider/bugfix.mdev_26582 test didn't cause the Check of testcase failed for error. It simply failed, likely because of MDEV-31902 . So I do not think the Check of testcase failed for issue is caused by MDEV-31902 .

            People

              ycp Yuchen Pei
              marcel Marcel Telka
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.