Details
-
Technical task
-
Status: Stalled
-
Major
-
Resolution: Unresolved
-
Q3/2025 Maintenance, Q4/2025 Server Maintenance
Description
The problem to be solved in this task is not the coding, it is the design, to carefully consider all relevant scenarios and decide how to handle them correctly.
Ad-hoc testing will not be able to exhaustively test all required cases and avoid tricky regressions in corner cases.
⸺ knielsen
How do the IO and SQL threads cooperate so the IO thread doesn't simply start a new transaction right after an incomplete one?
- The IO thread can pause and resume mid-transaction. While paused, the SQL thread also waits mid-transaction (non-blocking).
- This is why mariadb-binlog output has ROLLBACK /* added by mysqlbinlog */; at the end.
- The SQL thread cannot pause mid-transaction: Stopping it mid-transaction could cancel the transaction.
- The relay log rotates whenever the IO thread starts. This is because of the fake Rotate from the MariaDB 10+ primary.
- START REPLICA rewinds the SQL thread’s read position so it can re-read the Format Description event (FDEv) before jumping to its previous position.
- ( ) How does it account for changed configs?
(Non-key) CHANGE MASTER TO
The CHANGE MASTER statement usually deletes all relay log files. However, if the RELAY_LOG_FILE and/or RELAY_LOG_POS options are specified, then existing relay log files are kept.
⸺ https://mariadb.com/docs/server/reference/sql-statements/administrative-sql-statements/replication-statements/change-master-to#relay-log-options
How does it merge with user-specified positions?
- If MASTER_LOG_FILE/POS positions are specified, they supersede others.
- Otherwise, if the host and port are specified, CHANGE MASTER resets the position to the beginning.
- Otherwise, if RELAY_LOG_FILE/POS are not specified, CHANGE MASTER uses the Relay_Log_File/Pos from SHOW REPLICA STATUS for the user's convenience.
- Otherwise, CHANGE MASTER keeps the IO position from master.info (i.e., Master_Log_File & Read_Master_Log_Pos from SHOW REPLICA STATUS).
- ( ) @@GLOBAL vars that apply to all connections
- ( ) Replication filters
- ( ) Do the new configs apply to the cached relay log (SQL thread) or the binlog dump that builds them (IO thread)?
It's not the primary (MDEV-9345).
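The position precedence listed above for CHANGE MASTER can be sketched as a small decision function. This is only a model of the four rules as written in these notes, not the server's code; the enum, the function, and the three boolean flags are hypothetical names introduced for illustration.

```python
from enum import Enum


class PositionSource(Enum):
    """Where the replica's starting position comes from (labels are illustrative)."""
    EXPLICIT_MASTER_LOG = "MASTER_LOG_FILE/POS given in the statement"
    BEGINNING = "position reset to the beginning"
    SQL_THREAD = "Relay_Log_File/Pos from SHOW REPLICA STATUS"
    IO_THREAD = "Master_Log_File/Read_Master_Log_Pos from master.info"


def resolve_position_source(master_log_given: bool,
                            host_port_given: bool,
                            relay_log_given: bool) -> PositionSource:
    """Apply the rules in the order the notes list them; the first match wins."""
    if master_log_given:
        # Rule 1: explicit MASTER_LOG_FILE/POS supersede everything else.
        return PositionSource.EXPLICIT_MASTER_LOG
    if host_port_given:
        # Rule 2: a new host/port resets the position to the beginning.
        return PositionSource.BEGINNING
    if not relay_log_given:
        # Rule 3: no RELAY_LOG_FILE/POS, so fall back to the SQL thread's position.
        return PositionSource.SQL_THREAD
    # Rule 4: otherwise keep the IO thread's position from master.info.
    return PositionSource.IO_THREAD


if __name__ == "__main__":
    # A statement that only changes, say, the password or retry interval.
    print(resolve_position_source(False, False, False).value)
```

Writing the rules as an ordered chain makes it easy to enumerate all eight flag combinations in a test, which is the kind of exhaustive coverage the description asks for.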
How does it factor in Delayed Replication?
- The SQL thread sleeps for the replication delay. STOP REPLICA SQL_THREAD can wake it.
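The "sleeps but can be woken" behaviour noted above is an interruptible wait. A minimal sketch of that pattern follows, assuming a hypothetical DelayedApplier class; it illustrates the idea only and does not reflect how the server implements the delay.

```python
import threading


class DelayedApplier:
    """Hypothetical stand-in for an SQL thread that honours a replication delay."""

    def __init__(self, delay_seconds: float) -> None:
        self.delay_seconds = delay_seconds
        self._stop_requested = threading.Event()

    def wait_for_delay(self) -> bool:
        """Block for the delay; return True if it elapsed, False if stopped early."""
        # Event.wait() returns True as soon as the flag is set, False on timeout.
        woken_by_stop = self._stop_requested.wait(timeout=self.delay_seconds)
        return not woken_by_stop

    def stop(self) -> None:
        """Plays the role of STOP REPLICA SQL_THREAD: wake the sleeper at once."""
        self._stop_requested.set()


if __name__ == "__main__":
    applier = DelayedApplier(delay_seconds=3600.0)
    applier.stop()  # stop arrives before or during the sleep
    print("apply event" if applier.wait_for_delay() else "stopped before applying")
```

A plain fixed-length sleep would keep the stop request waiting for the full delay, which is why the sketch waits on an event with a timeout instead.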
- ( ) What disk writes & syncs does it use for crash safety?
- ( ) How does it recover from a crash, if at all?
- ( ) Is the relay log crash-safe?
- ( ) How malformed could the relay log become?
- Normally, a binary or relay log has a header but no footer; they at most end with a Rotate event.
- ( ) How does it identify corrupted (i.e., incomplete?) transactions?
- ( ) How similar is one compared to a random FDEv from a restarted, crashed primary?
- ( ) Does it depend on binlog recovery?
- ( ) How do the IO and SQL threads cooperate so the IO thread doesn't simply start a new transaction right after a corruption?
- ( ) What were the crash safety concerns that removed this capability from GTID replication in the first place?