Details
- Bug
- Status: Stalled
- Major
- Resolution: Unresolved
- 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11
- None
Description
If a DEBUG_SYNC signal is overwritten before the target thread acknowledges it, the thread keeps waiting for the missed signal and stays stuck until its wait times out. The test rpl.rpl_seconds_behind_master_spike exhibits this problem; commit cdf19cd is an example fix.
Other tests which may be impacted by this issue are rpl.rpl_dump_request_retry_warning, main.query_cache_debug, and main.partition_debug_sync. A comprehensive list of affected tests should be created, and those tests should then be fixed.
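The race described above can be illustrated with a toy model. The Python sketch below is not MariaDB code: it assumes a single shared slot that holds only the most recent signal name, and the names SignalSlot, lost_signal_demo, and handshake_demo are hypothetical. It shows a signal being lost when a second one overwrites it, and one possible remedy (an explicit acknowledgement before the next signal is sent), which is not necessarily the approach taken in cdf19cd.

```python
import threading


class SignalSlot:
    """Toy model: one shared slot holding only the most recent signal name."""

    def __init__(self):
        self._cond = threading.Condition()
        self._current = None

    def signal(self, name):
        with self._cond:
            # A new signal replaces whatever was there, acknowledged or not.
            self._current = name
            self._cond.notify_all()

    def wait_for(self, name, timeout):
        # Returns True if the slot holds `name` before the timeout, else False.
        with self._cond:
            return self._cond.wait_for(
                lambda: self._current == name, timeout=timeout
            )


def lost_signal_demo():
    """The second signal overwrites the first before anyone has waited on it."""
    slot = SignalSlot()
    slot.signal("first")
    slot.signal("second")  # "first" is gone now
    return slot.wait_for("first", timeout=0.1)  # times out


def handshake_demo():
    """The signaller waits for an acknowledgement before sending again."""
    slot = SignalSlot()
    seen = []

    def target():
        if slot.wait_for("first", timeout=5):
            seen.append("first")
            slot.signal("first_acked")

    worker = threading.Thread(target=target)
    worker.start()
    slot.signal("first")
    acked = slot.wait_for("first_acked", timeout=5)
    slot.signal("second")  # safe: "first" was already consumed
    worker.join()
    return acked, seen
```

In this model lost_signal_demo() returns False after the short timeout, which corresponds to the stuck thread in the description, while handshake_demo() lets the target observe "first" before "second" is sent.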
Edit:
The following is an ongoing list of tests potentially impacted by this race condition; tests that have been fixed are marked "(fixed)". Part of this work extended the debug_sync mechanism to automatically detect when an unacknowledged signal is overwritten or reset, and the list contains every test that fails under that detection:
- innodb.innodb-table-online
- innodb.innodb-index-online
- binlog_encryption.rpl_parallel
- binlog_encryption.rpl_parallel_ignored_errors
- rpl.rpl_get_master_version_and_clock
- rpl.rpl_parallel
- rpl.rpl_parallel_ignored_errors
- rpl.kill_race_condition
- rpl.rpl_seconds_behind_master_spike (fixed)
- rpl.rpl_dump_request_retry_warning (fixed)
- main.query_cache_debug (fixed)
- main.partition_debug_sync (fixed)
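The issue does not describe how the detection of overwritten or reset signals is implemented. As a rough illustration only, the Python sketch below tracks whether the current signal has been consumed and records it as lost if it is overwritten or reset first. The class CheckedSignalSlot and all of its members are hypothetical and do not correspond to names in the server source.

```python
class CheckedSignalSlot:
    """Toy single-slot signal store that records unacknowledged signals."""

    def __init__(self):
        self.current = None
        self.acknowledged = True
        self.lost = []  # (signal name, how it was lost)

    def _note_if_lost(self, how):
        # A signal counts as lost if it is replaced before being consumed.
        if self.current is not None and not self.acknowledged:
            self.lost.append((self.current, how))

    def signal(self, name):
        self._note_if_lost("overwritten by " + name)
        self.current = name
        self.acknowledged = False

    def reset(self):
        self._note_if_lost("reset")
        self.current = None
        self.acknowledged = True

    def try_consume(self, name):
        # A waiter acknowledges the signal by matching its name.
        if self.current == name:
            self.acknowledged = True
            return True
        return False
```

A short usage example: after signal("a") followed by signal("b"), the slot reports "a" as overwritten; consuming "b" before reset() leaves nothing further to report.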
Issue Links
- relates to MDEV-32651 Lost Debug_sync signal in rpl_sql_thd_start_errno_cleared (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Fix Version/s | 10.2 [ 14601 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Assignee | Brandon Nesterenko [ JIRAUSER48702 ] | Andrei Elkin [ elkin ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Assignee | Andrei Elkin [ elkin ] | Brandon Nesterenko [ JIRAUSER48702 ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Description | Original text, linking the example fix commit cdf19cd (https://github.com/MariaDB/server/commit/cdf19cd618ed23fbf7051130b2a6b587c4a4316b) | Original text plus the "Edit:" note and the list of potentially impacted tests |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.2 [ 14601 ] |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.7 [ 24805 ] | |
Fix Version/s | 10.8 [ 26121 ] | |
Fix Version/s | 10.9 [ 26905 ] | |
Fix Version/s | 10.10 [ 27530 ] | |
Fix Version/s | 10.11 [ 27614 ] | |
Affects Version/s | 10.10 [ 27530 ] | |
Affects Version/s | 10.11 [ 27614 ] |
Fix Version/s | 10.7 [ 24805 ] |
Fix Version/s | 10.3 [ 22126 ] |
Fix Version/s | 10.8 [ 26121 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Link | This issue relates to |
Fix Version/s | 10.9 [ 26905 ] |
Fix Version/s | 10.10 [ 27530 ] |
Fix Version/s | 10.4 [ 22408 ] |
Hey Andrei!
Can you review my patch for fixing tests main.query_cache_debug, main.partition_debug_sync, and rpl.rpl_dump_request_retry_warning?
Commit: 883fe83
Buildbot: bb-10.2-MDEV-27850
Thanks!