[MDEV-37046] How does non-GTID replication implement this with crash compatibility? - Jira

Details

Type: Technical task
Status: Closed
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: N/A
Component/s: Replication
Labels: None
Sprint: Q3/2025 Maintenance, Q4/2025 Server Maintenance, Q1/2026 Server Maintenance

Description

The problem to be solved in this task is not the coding, it is the design, to carefully consider all relevant scenarios and decide how to handle them correctly.
Ad-hoc testing will not be able to exhaustively test all required cases and avoid tricky regressions in corner cases.
⸺ knielsen

How do the IO and SQL threads cooperate so that the IO thread doesn't simply start a new transaction right after an incomplete one?
- The IO thread can pause and resume mid-transaction. While paused, the SQL thread also waits mid-transaction (non-blocking).
  - This is why mariadb-binlog output has ROLLBACK /* added by mysqlbinlog */; at the end.
  - The SQL thread cannot be stopped and resumed mid-transaction: stopping it mid-transaction could cancel the transaction.
  - Additionally, the relay log rotates
    - during the startup sequence (specifically, when Relay_log_info::init() calls MYSQL_BIN_LOG::open()).
    - when the IO thread starts. This may be because of the fake Rotate from the MariaDB 10+ primary.
- START REPLICA rewinds the SQL thread’s read position so it can re-read the Format Description Event (FDEv) before jumping to its previous position.
How does it account for changed configs?
- Before it logs events, the IO thread applies the CHANGE MASTER TO filters, @@replicate_same_server_id, and the ignoring of MySQL events.
  When the IO thread shuts down, it calls write_ignored_events_info_to_relay_log() to make the skips visible to the SQL thread; this is the cause of MDEV-33268.
- The SQL thread applies the @@GLOBAL variable filters (despite being a Master_info member), replicate_events_marked_for_skip=FILTER_ON_SLAVE, and sql_slave_skip_counter to the cached relay log.
- How does it merge with user-specified positions?
  - The CHANGE MASTER statement usually deletes all relay log files. However, if the RELAY_LOG_FILE and/or RELAY_LOG_POS options are specified, then existing relay log files are kept.
    ⸺ https://mariadb.com/docs/server/reference/sql-statements/administrative-sql-statements/replication-statements/change-master-to#relay-log-options
  - If MASTER_LOG_FILE/POS positions are specified, they supersede others.
  - Otherwise, if the host and port are specified, CHANGE MASTER resets the position to the beginning.
  - Otherwise, if RELAY_LOG_FILE/POS are not specified, CHANGE MASTER uses the Relay_Log_File/Pos from SHOW REPLICA STATUS for the users’ convenience.
  - Otherwise, CHANGE MASTER keeps the IO position from master.info (i.e., Master_Log_File and Read_Master_Log_Pos from SHOW REPLICA STATUS).
  - What happens if the specified SQL thread position (RELAY_LOG_FILE/POS) is outside of the relay log (in a purged file or not yet reached by the IO thread), and what should happen?
    - 1380 ER_RELAY_LOG_INIT: Failed initializing relay log position: Could not find target log during relay log initialization
      - Making the IO thread re-fetch or the SQL thread wait would be a separate feature request or two.
    - On the other hand, GTID start positions (from modifying @@GLOBAL.gtid_slave_pos) are loose boundaries similar to mariadb-binlog’s --start-position (disregarding MDEV-37231).
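
The precedence rules for user-specified positions above can be read as a small decision cascade. The following is a minimal sketch in Python, not the server's implementation: `ChangeMasterOptions`, `IoPositionSource`, `io_position_source()` and `keeps_relay_logs()` are hypothetical names invented for illustration, and treating the host or the port alone as enough to trigger a reset is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class IoPositionSource(Enum):
    """Where the position comes from, following the list above."""
    USER_MASTER_LOG = "MASTER_LOG_FILE/POS given in the statement"
    RESET_TO_START = "position reset to the beginning"
    REPLICA_STATUS_RELAY_POS = "Relay_Log_File/Pos from SHOW REPLICA STATUS"
    KEEP_MASTER_INFO = "IO position kept from master.info"


@dataclass
class ChangeMasterOptions:
    """Hypothetical container for the CHANGE MASTER options discussed above."""
    master_log_file: Optional[str] = None
    master_log_pos: Optional[int] = None
    host: Optional[str] = None
    port: Optional[int] = None
    relay_log_file: Optional[str] = None
    relay_log_pos: Optional[int] = None


def io_position_source(opts: ChangeMasterOptions) -> IoPositionSource:
    """Walk the cascade in the order the list gives it."""
    if opts.master_log_file is not None or opts.master_log_pos is not None:
        return IoPositionSource.USER_MASTER_LOG
    # Assumption: either option on its own counts as "host and port specified".
    if opts.host is not None or opts.port is not None:
        return IoPositionSource.RESET_TO_START
    if opts.relay_log_file is None and opts.relay_log_pos is None:
        return IoPositionSource.REPLICA_STATUS_RELAY_POS
    return IoPositionSource.KEEP_MASTER_INFO


def keeps_relay_logs(opts: ChangeMasterOptions) -> bool:
    """Relay log files survive only if a relay-log option was specified."""
    return opts.relay_log_file is not None or opts.relay_log_pos is not None
```

For example, `io_position_source(ChangeMasterOptions(relay_log_pos=4))` falls through to `KEEP_MASTER_INFO`, and `keeps_relay_logs()` returns `True` for the same options.
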
How does it factor in Delayed Replication?
- The SQL thread sleeps for the Replication Delay. STOP REPLICA SQL_THREAD can wake it up.
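
The interruptible sleep described above can be sketched as follows. This is an illustrative Python model, not the server code: `DelayedApplier` and its methods are hypothetical names, and `threading.Event` merely stands in for whatever wake-up mechanism the SQL thread really uses.

```python
import threading
import time
from typing import Optional


class DelayedApplier:
    """Toy model of an applier that delays events but can be woken by a stop."""

    def __init__(self, delay_seconds: float):
        self.delay_seconds = delay_seconds
        self._stop = threading.Event()

    def stop(self) -> None:
        """Models STOP REPLICA SQL_THREAD: wake the sleeper immediately."""
        self._stop.set()

    def wait_until_due(self, event_timestamp: float,
                       now: Optional[float] = None) -> bool:
        """Sleep until the event is old enough to apply.

        Returns True if the event should now be applied, or False if a stop
        request arrived first.
        """
        now = time.time() if now is None else now
        remaining = event_timestamp + self.delay_seconds - now
        if remaining <= 0:
            return not self._stop.is_set()
        # Event.wait() returns True when the flag was set, False on timeout.
        interrupted = self._stop.wait(timeout=remaining)
        return not interrupted
```

Calling `stop()` from another thread while `wait_until_due()` is sleeping makes it return `False` right away instead of waiting out the full delay.
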
Is the relay log crash-safe?
- What were the crash safety concerns that removed this capability from GTID replication in the first place?
  - I copied my paraphrase to the description of the main issue, MDEV-4698.
- How does it recover from a crash, if at all?
  - Does it depend on binlog recovery?
    - Binlog recovery (do_binlog_recovery()) is skipped for relay logs during the startup sequence.
  - @@relay_log_recovery merely rewinds the replica threads’ positions.
  - And there is no recovery specific to the relay log… The replica threads must be detecting corruption at runtime.
- How malformed could the relay log become?
  - What file flushes does it use for crash safety?
    1. The IO thread pre-processes, writes and flushes in units of events.
    2. The @@master_info_file only updates:
      - with CHANGE MASTER
      - when the IO thread shuts down gracefully, to save the IO file-position
      - after a semi-sync transaction, if required
        
        Kristian is concerned that it would be a significant overhead to update this file during the loop.
        Indeed, the file also stores the general CHANGE MASTER configs.
      - GTID mode does not currently persist its IO position anywhere – Gtid_IO_Pos is a C++ variable.
    3. The SQL thread updates @@relay_log_info_file every transaction and each time it switches the log file to read.
      - See also MDEV-37584 “convert from GTID over rely on @@relay_log_info_file”
      - GTID mode stores its position in the mysql.gtid_slave_pos table and only uses @@relay_log_info_file for read switches and CHANGE MASTER TO master_delay.
  - A binary or relay log has a header but no footer.
  - How does the IO thread avoid starting a new transaction right after a corruption?
    How does the SQL thread identify corrupted (i.e., incomplete?) transactions?
    - The SQL thread moves to the next relay log when it reaches EOF without requiring a ROTATE_EVENT.
    - It does not care about missing COMMIT events of transactions, and neither do MariaDB transactions themselves (~~MDEV-32652~~).
      Together with the lack of regular persistence of the IO file-position (somewhat part of MDEV-8946), they lead to the infamous repeated events problem.
    - If, at an ungraceful shutdown, the SQL thread has already read and processed events that the IO thread had not yet durably written to disk, it apparently does not care about the missing events in the relay log and carries on from whatever positions it has recorded.
    - It is an error to read a corrupted event, whether it is from a network, hardware or power failure.
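
The repeated events problem mentioned above follows from two of the listed facts: the relay log is flushed per event, while the IO position in master.info is saved only on graceful shutdown. A toy Python simulation, under those two assumptions only, shows the effect. `ToyReplicaIO` and its methods are hypothetical names, and the model leaves out CHANGE MASTER, semi-sync updates, and @@relay_log_recovery.

```python
class ToyReplicaIO:
    """Toy IO thread: durable relay log, lazily persisted IO position."""

    def __init__(self):
        self.relay_log = []        # survives a crash (flushed per event)
        self.master_info_pos = 0   # survives a crash, but is rarely updated
        self.io_pos = None         # in-memory only

    def start(self):
        # On start, the IO thread resumes from the persisted position.
        self.io_pos = self.master_info_pos

    def fetch(self, primary_events, count):
        # Append each fetched event to the relay log and advance in memory.
        for event in primary_events[self.io_pos:self.io_pos + count]:
            self.relay_log.append(event)
            self.io_pos += 1

    def stop_gracefully(self):
        # Only a graceful shutdown saves the IO position.
        self.master_info_pos = self.io_pos
        self.io_pos = None

    def crash(self):
        # The in-memory position is lost; master_info_pos stays stale.
        self.io_pos = None


primary = ["e1", "e2", "e3", "e4"]

crashed = ToyReplicaIO()
crashed.start()
crashed.fetch(primary, 3)
crashed.crash()
crashed.start()
crashed.fetch(primary, 4)

clean = ToyReplicaIO()
clean.start()
clean.fetch(primary, 3)
clean.stop_gracefully()
clean.start()
clean.fetch(primary, 4)
```

After the crash path, `crashed.relay_log` holds `e1`, `e2`, `e3` twice before `e4`, while `clean.relay_log` holds each event once.
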

(skipping older binlog versions)
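
Because a binary or relay log has a header but no footer, a reader can only tell where the file stops being well-formed by walking the events. A minimal Python sketch of that walk follows. It assumes the version 4 event layout: a 4-byte magic, then events that each start with a 19-byte common header carrying the event length. It does not verify event checksums, so it detects truncation only, not corrupted contents. `last_complete_event_end()` and `make_event()` are hypothetical helpers, not server functions.

```python
import struct

MAGIC = b"\xfebin"
# Little-endian common header: timestamp, type code, server id,
# event length, next position, flags (4 + 1 + 4 + 4 + 4 + 2 = 19 bytes).
HEADER = struct.Struct("<IBIIIH")


def last_complete_event_end(data: bytes) -> int:
    """Return the offset just past the last fully present event."""
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("bad magic: not a binary or relay log")
    pos = len(MAGIC)
    while pos + HEADER.size <= len(data):
        event_length = HEADER.unpack_from(data, pos)[3]
        # Stop at an implausible length or an event cut off by EOF.
        if event_length < HEADER.size or pos + event_length > len(data):
            break
        pos += event_length
    return pos


def make_event(type_code: int, body: bytes) -> bytes:
    """Build a synthetic event for the demo (fields other than length are dummies)."""
    return HEADER.pack(0, type_code, 1, HEADER.size + len(body), 0, 0) + body


first = make_event(15, b"x" * 10)
second = make_event(2, b"y" * 20)
log = MAGIC + first + second
torn = log[:-3]  # simulate a write cut short by a crash
```

Here `last_complete_event_end(log)` equals `len(log)`, whereas for `torn` it stops at the end of the first event.
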

Issue Links

relates to: MDEV-8946 Add replication crash-safety for non-GTID slave. (Open)

People

Assignee: Jimmy Hú

Reporter: Jimmy Hú

Votes: 0

Watchers: 4

Dates

Created: 2025-06-19 22:04

Updated: 2025-10-09 22:18

Resolved: 2025-09-21 05:03
