[MDEV-33322] Check of testcase failed for random spider tests Created: 2024-01-28  Updated: 2024-02-01

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - Spider
Affects Version/s: 10.6.16
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Marcel Telka Assignee: Yuchen Pei
Resolution: Unresolved Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-31902 spider/bugfix.mdev_20100, spider/bugf... Open

 Description   

Testing of MariaDB 10.6.16 randomly reports Check of testcase failed. For example:

...
rpl.semisync_future-7591 'row'           w4 [ pass ]  11906
spider/bugfix.mdev_26544                 w7 [ pass ]     96
spider/bugfix.mdev_26539                 w6 [ pass ]    161
spider/bugfix.mdev_24517                 w8 [ pass ]    435
spider/bugfix.mdev_27172                 w6 [ pass ]    687
spider/bugfix.mdev_27184                 w8 [ pass ]    166
rpl.show_status_stop_slave_race-7126 'row' w4 [ pass ]   1808
spider/bugfix.mdev_25684                 w2 [ pass ]     79
spider/bugfix.mdev_27240                 w8 [ pass ]   1717
rpl.rpl_idempotency 'stmt'               w3 [ pass ]   1122
connect.infoschema-9739                  w1 [ skipped ]  Need windows
spider/bugfix.mdev_28996                 w2 [ pass ]    280
spider/bugfix.mdev_29008                 w8 [ pass ]    640
connect.infoschema2-9739                 w1 [ pass ]     95
spider/bugfix.mdev_26582                 w7 [ pass ]    255
 
MTR's internal check of the test case 'spider/bugfix.mdev_26582' failed.
This means that the test case does not preserve the state that existed
before the test case was executed.  Most likely the test case did not
do a proper clean-up. It could also be caused by the previous test run
by this thread, if the server wasn't restarted.
This is the diff of the states of the servers before and after the
test case was executed:                  
mysqltest: Logging to '$(BUILD_DIR)/mysql-test/var/7/tmp/check-mysqld_1_1.log'.
mysqltest: Results saved in '$(BUILD_DIR)/mysql-test/var/7/tmp/check-mysqld_1_1.result'.
mysqltest: Connecting to server localhost:16120 (socket /tmp/HxLXgFojOK/7/mysqld.1.1.sock) as 'root', connection 'default', attempt 0 ...
mysqltest: ... Connected.
mysqltest: Start processing test commands from './include/check-testcase.test' ...
mysqltest: ... Done processing test commands. 
--- $(BUILD_DIR)/mysql-test/var/7/tmp/check-mysqld_1_1.result   Sun Jan 28 11:00:01 2024
+++ $(BUILD_DIR)/mysql-test/var/7/tmp/check-mysqld_1_1.reject   Sun Jan 28 11:00:02 2024
@@ -553,6 +553,126 @@
 SLOW_QUERY_LOG_FILE    $(BUILD_DIR)/mysql-test/var/7/mysqld.1.1/mysqld-slow.log
 SOCKET /tmp/HxLXgFojOK/7/mysqld.1.1.sock
 SORT_BUFFER_SIZE       262144
+SPIDER_AUTO_INCREMENT_MODE     0
+SPIDER_BGS_FIRST_READ  2
+SPIDER_BGS_MODE        0
+SPIDER_BGS_SECOND_READ 100
+SPIDER_BKA_ENGINE      
+SPIDER_BKA_MODE        1
+SPIDER_BKA_TABLE_NAME_TYPE     0
+SPIDER_BLOCK_SIZE      16384
+SPIDER_BUFFER_SIZE     16000
+SPIDER_BULK_SIZE       16000
+SPIDER_BULK_UPDATE_MODE        0
+SPIDER_BULK_UPDATE_SIZE        16000
+SPIDER_CASUAL_READ     0
+SPIDER_CONNECT_ERROR_INTERVAL  1
+SPIDER_CONNECT_MUTEX   OFF
+SPIDER_CONNECT_RETRY_COUNT     1000
+SPIDER_CONNECT_RETRY_INTERVAL  1000
+SPIDER_CONNECT_TIMEOUT 6
+SPIDER_CONN_RECYCLE_MODE       0
+SPIDER_CONN_RECYCLE_STRICT     0
+SPIDER_CONN_WAIT_TIMEOUT       10
+SPIDER_CRD_BG_MODE     2
+SPIDER_CRD_INTERVAL    51
+SPIDER_CRD_MODE        1
+SPIDER_CRD_SYNC        0
+SPIDER_CRD_TYPE        2
+SPIDER_CRD_WEIGHT      2
+SPIDER_DELETE_ALL_ROWS_TYPE    1
+SPIDER_DIRECT_AGGREGATE        ON
+SPIDER_DIRECT_DUP_INSERT       0
+SPIDER_DIRECT_ORDER_LIMIT      9223372036854775807
+SPIDER_DRY_ACCESS      OFF
+SPIDER_ERROR_READ_MODE 0
+SPIDER_ERROR_WRITE_MODE        0
+SPIDER_FIRST_READ      0
+SPIDER_FORCE_COMMIT    1
+SPIDER_GENERAL_LOG     OFF
+SPIDER_INDEX_HINT_PUSHDOWN     OFF
+SPIDER_INIT_SQL_ALLOC_SIZE     1024
+SPIDER_INTERNAL_LIMIT  9223372036854775807
+SPIDER_INTERNAL_OFFSET 0
+SPIDER_INTERNAL_OPTIMIZE       0
+SPIDER_INTERNAL_OPTIMIZE_LOCAL 0
+SPIDER_INTERNAL_SQL_LOG_OFF    -1
+SPIDER_INTERNAL_UNLOCK OFF
+SPIDER_INTERNAL_XA     OFF
+SPIDER_INTERNAL_XA_ID_TYPE     0
+SPIDER_INTERNAL_XA_SNAPSHOT    0
+SPIDER_LOAD_CRD_AT_STARTUP     1
+SPIDER_LOAD_STS_AT_STARTUP     1
+SPIDER_LOCAL_LOCK_TABLE        OFF
+SPIDER_LOCK_EXCHANGE   OFF
+SPIDER_LOG_RESULT_ERRORS       0
+SPIDER_LOG_RESULT_ERROR_WITH_SQL       0
+SPIDER_LOW_MEM_READ    1
+SPIDER_MAX_CONNECTIONS 0
+SPIDER_MAX_ORDER       32767
+SPIDER_MULTI_SPLIT_READ        100
+SPIDER_NET_READ_TIMEOUT        600
+SPIDER_NET_WRITE_TIMEOUT       600
+SPIDER_PING_INTERVAL_AT_TRX_START      3600
+SPIDER_QUICK_MODE      3
+SPIDER_QUICK_PAGE_BYTE 10485760
+SPIDER_QUICK_PAGE_SIZE 1024
+SPIDER_READ_ONLY_MODE  0
+SPIDER_REMOTE_ACCESS_CHARSET
+SPIDER_REMOTE_AUTOCOMMIT       -1
+SPIDER_REMOTE_DEFAULT_DATABASE
+SPIDER_REMOTE_SQL_LOG_OFF      -1
+SPIDER_REMOTE_TIME_ZONE
+SPIDER_REMOTE_TRX_ISOLATION    -1
+SPIDER_REMOTE_WAIT_TIMEOUT     -1
+SPIDER_RESET_SQL_ALLOC 1
+SPIDER_SAME_SERVER_LINK        OFF
+SPIDER_SECOND_READ     0
+SPIDER_SELECT_COLUMN_MODE      1
+SPIDER_SELUPD_LOCK_MODE        1
+SPIDER_SEMI_SPLIT_READ 2
+SPIDER_SEMI_SPLIT_READ_LIMIT   9223372036854775807
+SPIDER_SEMI_TABLE_LOCK 0
+SPIDER_SEMI_TABLE_LOCK_CONNECTION      1
+SPIDER_SEMI_TRX        ON
+SPIDER_SEMI_TRX_ISOLATION      -1
+SPIDER_SKIP_DEFAULT_CONDITION  0
+SPIDER_SKIP_PARALLEL_SEARCH    0
+SPIDER_SLAVE_TRX_ISOLATION     -1
+SPIDER_SPLIT_READ      9223372036854775807
+SPIDER_STORE_LAST_CRD  1
+SPIDER_STORE_LAST_STS  1
+SPIDER_STRICT_GROUP_BY 1
+SPIDER_STS_BG_MODE     2
+SPIDER_STS_INTERVAL    10
+SPIDER_STS_MODE        1
+SPIDER_STS_SYNC        0
+SPIDER_SUPPORT_XA      ON
+SPIDER_SYNC_AUTOCOMMIT ON
+SPIDER_SYNC_SQL_MODE   ON
+SPIDER_SYNC_TRX_ISOLATION      ON
+SPIDER_TABLE_CRD_THREAD_COUNT  10
+SPIDER_TABLE_INIT_ERROR_INTERVAL       1
+SPIDER_TABLE_STS_THREAD_COUNT  10
+SPIDER_UDF_CT_BULK_INSERT_INTERVAL     10
+SPIDER_UDF_CT_BULK_INSERT_ROWS 100
+SPIDER_UDF_DS_BULK_INSERT_ROWS 3000
+SPIDER_UDF_DS_TABLE_LOOP_MODE  0
+SPIDER_UDF_DS_USE_REAL_TABLE   0
+SPIDER_UDF_TABLE_LOCK_MUTEX_COUNT      20
+SPIDER_UDF_TABLE_MON_MUTEX_COUNT       20
+SPIDER_USE_ALL_CONNS_SNAPSHOT  OFF
+SPIDER_USE_COND_OTHER_THAN_PK_FOR_UPDATE       1
+SPIDER_USE_CONSISTENT_SNAPSHOT OFF
+SPIDER_USE_DEFAULT_DATABASE    ON
+SPIDER_USE_FLASH_LOGS  OFF
+SPIDER_USE_HANDLER     0
+SPIDER_USE_PUSHDOWN_UDF        0
+SPIDER_USE_SNAPSHOT_WITH_FLUSH_TABLES  0
+SPIDER_USE_TABLE_CHARSET       1
+SPIDER_VERSION 3.3.15
+SPIDER_WAIT_TIMEOUT    604800
+SPIDER_XA_REGISTER_MODE        1
 SQL_AUTO_IS_NULL       OFF
 SQL_BIG_SELECTS        ON
 SQL_BUFFER_RESULT      OFF
 
mysqltest: Result length mismatch
 
not ok
 
spider/bugfix.mdev_29011                 w2 [ pass ]    542
spider/bugfix.mdev_29447                 w8 [ pass ]    260
rpl.rpl_insert 'stmt'                    w3 [ pass ]   2768
spider/bugfix.mdev_24523                 w5 [ pass ]    312  
connect.ini                              w1 [ pass ]    155
spider/bugfix.mdev_29484                 w7 [ pass ]    250
...

The testing was conducted using the following command on OpenIndiana:

./mtr --parallel=8 --mem --force --retry=0 --force-restart --max-test-fail=0 --skip-test-list=$(COMPONENT_DIR)/files/disabled-tests

I saw spider/bugfix.self_reference failing similarly, but unfortunately I've no detailed log for the failure.



 Comments   
Comment by Yuchen Pei [ 2024-01-29 ]

Thanks for reporting.

I haven't seen this one locally or in CI, and I could not reproduce locally at 10.6.16 b83c379420a8846ae4b28768d3c81fa354cca056. I also tried mtr --no-reorder spider/bugfix.mdev_26544 spider/bugfix.mdev_26582.

Comment by Marcel Telka [ 2024-01-29 ]

I'm unable to reproduce this issue consistently either. It seems truly random.

Anyway, I re-run the tests again today and I see even worse results:

The servers were restarted 5478 times
Spent 8704.701 of 4520 seconds executing testcases 
 
Completed: Failed 7/5209 tests, 99.87% were successful. 
 
Failing test(s): spider/bugfix.mdev_26544 spider/bugfix.mdev_26582 spider/bugfix.mdev_27184 spider/bugfix.mdev_27239 spider/bugfix.mdev_27240 spider/bugfix.mdev_29484 perfschema.threads_innodb
 
The log files in var/log may give you some hint of what went wrong.
 
If you want to report this error, please read first the documentation
at http://dev.mysql.com/doc/mysql/en/mysql-test-suite.html
 
Errors/warnings were found in logfiles during server shutdown after running the
following sequence(s) of tests:
    spider/bugfix.mdev_27240
    spider/bugfix.mdev_29484
969 tests were skipped, 277 by the test itself.

The complete test log is available at https://telka.sk/illumos/MDEV-33322/2024-01-29-08:37.log

Comment by Yuchen Pei [ 2024-01-29 ]

> The complete test log is available at https://telka.sk/illumos/MDEV-33322/2024-01-29-08:37.log

This looks different and more like MDEV-31902. Is there a way to
reproduce or is it also highly sporadic?

Comment by Marcel Telka [ 2024-01-29 ]

I do always the same thing: just running tests. Sometimes it fully passes, sometimes reports Check of testcase failed for (ca 3 times so far, IIRC), and once it failed. So it behaves non-deterministic. I ran tests less than 10 times so far. And yes, the hard failure looks really like MDEV-31902.

I'm going to try few more test runs to see how it behaves. If you have any suggestion what to do to get the problem root caused then I can try it.

Comment by Marcel Telka [ 2024-01-30 ]

I do have one case where both this one and MDEV-31902 are reproduced: https://telka.sk/illumos/MDEV-33322/2024-01-30-11:00.log

Comment by Yuchen Pei [ 2024-01-31 ]

For spider/bugfix.mdev_26582, can you replace -source ../../include/init_spider.inc with source ../../t/test_init.inc and source ../../include/deinit_spider.inc with -source ../../t/test_deinit.inc, and mtr --record, and see whether the check fails again?

I know the bug from MDEV-26582 won't reproduce with this change and reverting the changes in the fix of that bug edde9084c2a8ee2e7b702c994945a4dfdb7e2bdf, but it would be interesting to see whether this would cause the check failures to disapppear and this is pretty much the only test using --source ../../include/(de)init_spider.inc.

Comment by Marcel Telka [ 2024-01-31 ]

Honestly, I do not understand how that could help, because Check of testcase failed for is reported randomly for various tests. In my previous comment the Check of testcase failed for was for spider/bugfix.self_reference_multi and spider/bugfix.mdev_26582 simply failed, likely because of MDEV-31902.

Comment by Marcel Telka [ 2024-01-31 ]

Interestingly, the spider/bugfix.self_reference_multi test was effectively the first test ran by the worker:

worker[03] mysql-test-run: WARNING: Test reserved for w1 picked up by w3
spider/bugfix.mdev_26541                 w3 [ skipped ]  test needs to be run with UBSAN
spider/bugfix.self_reference_multi       w3 [ pass ]    467
 
MTR's internal check of the test case 'spider/bugfix.self_reference_multi' failed.

This could mean that there is some interference between workers. I tried to run with --parallel=1 few times and I'm unable to reproduce this one (only MDEV-31902). But that proves nothing because maybe I was just unlucky.

Comment by Yuchen Pei [ 2024-02-01 ]

> Honestly, I do not understand how that could help, because Check of
> testcase failed for is reported randomly for various tests.

My apologies, I missed the part in your second attached stdout where
spider/bugfix.self_reference_multi also resulted in the same check
failures of spider system variables. It is strange that of all the
tests, these two have such check fails.

> In my previous comment the Check of testcase failed for was for
> spider/bugfix.self_reference_multi and spider/bugfix.mdev_26582
> simply failed, likely because of MDEV-31902.

Why do you think MDEV-31902 may be the cause of these check failures?

Comment by Marcel Telka [ 2024-02-01 ]

>> In my previous comment the Check of testcase failed for was for
>> spider/bugfix.self_reference_multi and spider/bugfix.mdev_26582
>> simply failed, likely because of MDEV-31902.
>
> Why do you think MDEV-31902 may be the cause of these check failures?

Sorry, my statement was confusing. I wanted to say that in the https://telka.sk/illumos/MDEV-33322/2024-01-30-11:00.log file I noticed two things:

  1. The Check of testcase failed for was for spider/bugfix.self_reference_multi instead of spider/bugfix.mdev_26582.
  2. The spider/bugfix.mdev_26582 test didn't cause the Check of testcase failed for error. It simply failed, likely because of MDEV-31902.

So I do not think the Check of testcase failed for issue is caused by MDEV-31902.

Generated at Thu Feb 08 10:38:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.