Details
-
Bug
-
Status: Open (View Workflow)
-
Major
-
Resolution: Unresolved
-
N/A
-
None
-
None
Description
RQG tests fail sometimes with the symptom "mariabackup --backup" hangs because the builtin timeout of currently 200s (with valgrind or rr 400s) gets exceeded.
debarun's analysis of some of such runs (see MDEV-33669) revealed that grammars
conf/mariadb/table_stress_*.yy dicing BACKUP STAGE commands can cause trouble like
Some session A executes BLOCK STAGE START with success.
|
But there is no 100% guarantee this session will ever execute a BLOCK STAGE END.
|
So some later started "mariabackup --backup" might fail to aquire the required backup lock over some long timespan.
|
 |
The goal is to find some fix like
|
- simple, a small share of such false alarms will remain, reduced functional coverage
|
Modify the RQG grammars conf/mariadb/table_stress_*.yy
|
- sophisticated, no reduced functional coverage
|
Modify the RQG reporter 'Mariabackup_linux'.
|
If the timeout gets exceeded search like
|
SELECT * FROM information_schema.metadata_lock_info;
|
THREAD_ID LOCK_MODE LOCK_DURATION LOCK_TYPE TABLE_SCHEMA TABLE_NAME
|
5 MDL_BACKUP_FLUSH NULL Backup lock
|
and try to kill all connections holding a backup lock.
|
In case that fixes the problem ignore the hang. Otherwise report a fail.
|