Details
Type: Bug
Status: Stalled
Priority: Major
Resolution: Unresolved
10.2 (EOL), 10.3 (EOL), 10.4 (EOL), 10.5, 10.6, 10.7 (EOL), 10.8 (EOL), 10.9 (EOL), 10.10 (EOL), 10.11
None
Description
If a DEBUG_SYNC signal is overwritten before the target thread acknowledges it, that thread never sees the signal it is waiting for and stays blocked until the wait times out. The test rpl.rpl_seconds_behind_master_spike demonstrates this problem, and commit cdf19cd is an example fix.
Other tests that may be affected by this issue include rpl.rpl_dump_request_retry_warning, main.query_cache_debug, and main.partition_debug_sync. A comprehensive list of affected tests should be compiled, and each of those tests should then be fixed.
Edit:
The following is an ongoing list of tests potentially affected by this race condition; tests that have been fixed are marked "(fixed)". As part of this work, the debug_sync mechanism was extended to automatically detect when an unacknowledged signal is overwritten or reset. This list contains every test that fails under that detection:
- innodb.innodb-table-online
- innodb.innodb-index-online
- binlog_encryption.rpl_parallel
- binlog_encryption.rpl_parallel_ignored_errors
- rpl.rpl_get_master_version_and_clock
- rpl.rpl_parallel
- rpl.rpl_parallel_ignored_errors
- rpl.kill_race_condition
- rpl.rpl_seconds_behind_master_spike (fixed)
- rpl.rpl_dump_request_retry_warning (fixed)
- main.query_cache_debug (fixed)
- main.partition_debug_sync (fixed)
Issue Links
- relates to MDEV-32651 Lost Debug_sync signal in rpl_sql_thd_start_errno_cleared (Closed)