Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
11.4, 11.8
Red Hat Enterprise Linux 9 (amd64-rhel-9-rpm-autobake-migration)
Not for Release Notes
Q1/2026 Server Maintenance
Description
Starting with this 11.4 build of this merge from 10.11, as well as this 11.8 build of this merge from 11.4 and this 12.2 build of this merge to 12.2, we see apparent hangs of our regression test suite.
Here is some detail for an 11.4-based run (MDEV-37949, #4405) where I first noticed this:
innodb_fts.innodb_fts_stopword_charset 'orig' w14 [ pass ] 3914
sys_vars.allow_suspicious_udfs w10 [ pass ] 2610
stress.ddl_myisam w1 [ pass ] 31722
innodb.xa_recovery w6 [ pass ] 53826
Only 8011 of 8017 completed.
--------------------------------------------------------------------------
The servers were restarted 2054 times
Spent 4685.815 of 860 seconds executing testcases
Completed: All 6768 tests were successful.
mysql-test-run: *** ERROR: Not all tests completed (only 8011 of 8017)
There are only 14 concurrent workers, so it is not hard to find where each of them last appeared in the output. We can see w14 in the above snippet (line 11255 of the original input), but the last occurrence of w13 was in line 1473 (almost 10,000 lines earlier):
spider/bugfix.mdev_29562 'usual_handler' w13 [ pass ] 49
worker[13] > Restart [mysqld.1.1 - pid: 8893, winpid: 8893] - using different config file
worker[13] > Restart [mysqld.2.1 - pid: 8928, winpid: 8928] - using different config file
And so on:
w12 restart after spider/bugfix.mdev_29653 'usual_handler', line 1485
w11 restart after spider/bugfix.mdev_29644 'usual_handler', line 1479
w9 restart after spider/bugfix.mdev_29667 'group_by_handler', line 1488
w8 restart after spider/bugfix.mdev_29653 'group_by_handler', line 1482
w5 restart after spider/bugfix.mdev_29644 'group_by_handler', line 1476
Together with w13, this accounts for all 6 tests that did not complete.
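The manual search above can be approximated with a small script. This is only a sketch: it assumes the log lines look like the excerpts quoted here (a `wN [ status ]` field on result lines and a `worker[N]` prefix on restart lines), and the helper names `last_seen` and `stalest_first` are made up for illustration; they are not part of mysql-test-run.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Result lines look like "... w13 [ pass ] 49";
# restart lines look like "worker[13] > Restart [...]".
# Both patterns are assumptions based on the excerpts in this report.
RESULT_RE = re.compile(r"\bw(\d+)\s+\[[^\]]*\]")
RESTART_RE = re.compile(r"\bworker\[(\d+)\]")


def last_seen(lines: Iterable[str]) -> Dict[int, Tuple[int, str]]:
    """Map worker number -> (1-based line number, text) of its last appearance."""
    seen: Dict[int, Tuple[int, str]] = {}
    for lineno, line in enumerate(lines, start=1):
        match = RESULT_RE.search(line) or RESTART_RE.search(line)
        if match:
            seen[int(match.group(1))] = (lineno, line.rstrip())
    return seen


def stalest_first(seen: Dict[int, Tuple[int, str]]) -> List[Tuple[int, Tuple[int, str]]]:
    """Sort workers so that those that went silent earliest come first."""
    return sorted(seen.items(), key=lambda item: item[1][0])


if __name__ == "__main__":
    import sys

    # Usage: python last_seen.py < mtr_output.log
    for worker, (lineno, text) in stalest_first(last_seen(sys.stdin)):
        print(f"w{worker}: last seen at line {lineno}: {text}")
```

Workers whose last line is far earlier than the end of the log, and whose last action was a restart, are the candidates for a hang.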
In some of the related server log files in https://ci.mariadb.org/62260/logs/amd64-rhel-9-rpm-autobake-migration/logs.tar.gz we can see messages from ENGINE=Connect after server shutdown, like this:
2026-01-08 17:53:29 0 [Note] /usr/sbin/mariadbd: Shutdown complete
Exception 666: Cannot write expanded column when Pretty is not 2
We need the ability to reliably detect hangs in order to catch regressions in the server, especially when working on features that change how crash recovery works.
In MDEV-28976, back in 2022, I posted some evidence that mysql-test/mtr does not always wait for the server process to complete before starting a new one. It is unclear to me whether that was ever fixed. It could be related to these anomalies.
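To illustrate the kind of behavior being asked for, here is a minimal Python sketch of a restart that first waits for the old process to exit and reports a hang if it does not. It is not how mysql-test-run is implemented (mtr is written in Perl), and the names `stop_and_wait` and `restart` as well as the timeout value are hypothetical.

```python
import subprocess
from typing import List, Optional


def stop_and_wait(proc: subprocess.Popen, timeout: float) -> Optional[int]:
    """Ask the process to terminate and wait up to `timeout` seconds.

    Returns the exit status, or None if the process is still running
    after the timeout, which the caller should treat as a hang.
    """
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        return None


def restart(proc: subprocess.Popen, argv: List[str], timeout: float) -> subprocess.Popen:
    """Start a replacement only after the old process has really exited."""
    if stop_and_wait(proc, timeout) is None:
        # The old process ignored the request: kill it, reap it, and
        # report the hang instead of silently starting a second instance.
        proc.kill()
        proc.wait()
        raise RuntimeError(f"pid {proc.pid} did not exit within {timeout} s")
    return subprocess.Popen(argv)
```

The point of the design is that a timeout becomes a reported error rather than two server processes overlapping on the same data directory.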