[MDEV-13564] TRUNCATE TABLE and undo tablespace truncation are not compatible with Mariabackup Created: 2017-08-17  Updated: 2023-12-22  Resolved: 2018-09-07

Status: Closed
Project: MariaDB Server
Component/s: Backup, Storage Engine - InnoDB
Affects Version/s: 10.2.2
Fix Version/s: 10.3.10, 10.4.0, 10.2.19

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: backup, ddl, performance, recovery

Attachments: File truncate.patch    
Issue Links:
Blocks
blocks MDEV-14481 Execute InnoDB crash recovery in the ... Closed
is blocked by MDEV-14717 RENAME TABLE in InnoDB is not crash-safe Closed
Duplicate
is duplicated by MDEV-9459 Truncate table causes innodb stalls Closed
Problem/Incident
causes MDEV-17816 InnoDB: Failing assertion: trx->dict_... Closed
causes MDEV-17849 Undo tablespace truncation recovery f... Closed
causes MDEV-17885 TRUNCATE on temporary table causes ER... Closed
causes MDEV-18836 Race conditions in TRUNCATE TABLE Closed
causes MDEV-19449 1030: Got error 168 "Unknown (generic... Closed
causes MDEV-21496 Downgrade from current 10.2 to 10.2.1... Closed
causes MDEV-23705 Assertion `table->data_dir_path || !s... Closed
causes MDEV-24532 Table corruption ER_NO_SUCH_TABLE_IN_... Closed
causes MDEV-26450 Corruption due to innodb_undo_log_tru... Closed
Relates
relates to MDEV-9459 Truncate table causes innodb stalls Closed
relates to MDEV-14585 Automatically remove #sql- tables in ... Closed
relates to MDEV-16557 Remove INNOBASE_SHARE::idx_trans_tbl Closed
relates to MDEV-17049 Enable --suite=innodb_undo on buildbot Closed
relates to MDEV-17138 Reduce redo log volume for undo table... Closed
relates to MDEV-17158 TRUNCATE is not atomic after MDEV-13564 Closed
relates to MDEV-17780 innodb.truncate_recover crashes in re... Closed
relates to MDEV-17794 Do not assign persistent ID for tempo... Closed
relates to MDEV-17831 Assertion `supports_instant()' failed... Closed
relates to MDEV-18739 crash (long semaphore wait) Closed
relates to MDEV-19769 Mariabackup should write warning duri... Open
relates to MDEV-22733 XA PREPARE breaks MDL in pseudo_slave... Stalled
relates to MDEV-24532 Table corruption ER_NO_SUCH_TABLE_IN_... Closed
relates to MDEV-25051 Race condition between persistent sta... Closed
relates to MDEV-25710 Dead code os_file_opendir() in the se... Closed
relates to MDEV-33112 innodb_undo_log_truncate=ON is blocki... Closed
relates to MDEV-13563 lock DDL for mariabackup in 10.2+ Closed
relates to MDEV-14481 Execute InnoDB crash recovery in the ... Closed
relates to MDEV-14545 Backup fails due to MLOG_INDEX_LOAD r... Closed
relates to MDEV-15154 WSREP: BF lock wait long after a TRUN... Closed
relates to MDEV-15522 Change galera suite MTR tests to use ... Closed
relates to MDEV-16306 TRUNCATE waits for metadata lock on t... Open
relates to MDEV-16465 Invalid (old?) table or database name... Closed
relates to MDEV-17043 Purge of indexed virtual columns may ... Closed
relates to MDEV-17304 Replace use of XtraBackup with MariaD... Closed
relates to MDEV-18654 Failing assertion: sym_node->table !=... Closed
relates to MDEV-18960 Assertion `!omits_virtual_cols(*form-... Closed

 Description   

MariaDB 10.2.2 imported MySQL 5.7.9, which introduced separate log files that server startup uses to determine whether any tables or undo tablespaces need "truncate fixup".

There is no logic in Mariabackup to deal with this.

A cleaner solution would be to remove the separate log files and to make the InnoDB redo log self-contained with respect to the truncate operations. This would likely require writing a new redo log record type MLOG_FILE_CREATE that would cause the file to be initialized from scratch, followed by some page-level redo log records that would initialize the page contents.
This would also remove the need for a redo log checkpoint during the truncate operations.

MDEV-13563 proposes a Mariabackup option that could be used to prevent TRUNCATE TABLE from occurring during backups. It would not prevent undo tablespace truncation from happening.



 Comments   
Comment by Marko Mäkelä [ 2018-04-23 ]

monty mentioned that a customer would like to have non-locking TRUNCATE TABLE: old transactions that are reading from the table would continue to see the table contents. The TRUNCATE action would basically rename the old table to an internal #sql name (so that MDEV-14585 can take care of crash recovery) and create an empty table under the original name. The old table would be dropped when the last reader closes the old table handle.

This could be refined further by implementing a multi-versioned data dictionary cache (which is work mostly outside InnoDB). In that case, old transactions would continue to see the table contents as they were before the TRUNCATE, even when the first access to the table is after the TRUNCATE was executed. (Write transactions would always refer to the newest table definition.)
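The handle-based variant (old readers keep seeing the old contents, and the old table goes away when the last reader closes its handle) is essentially reference counting. The following is a minimal in-memory sketch of that idea only; Catalog, Rows and the method names are hypothetical and have nothing to do with the actual InnoDB or SQL-layer code, and nothing here is persistent or crash-safe.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: a "table" is just a vector of row strings.
using Rows = std::vector<std::string>;

class Catalog {
 public:
  void create(const std::string& name, Rows rows) {
    tables_[name] = std::make_shared<Rows>(std::move(rows));
  }

  // A reader's handle pins the table version that was current at open time.
  std::shared_ptr<const Rows> open(const std::string& name) const {
    return tables_.at(name);
  }

  // Non-locking truncate: publish a new empty table under the same name.
  // The old version is freed when the last open handle is released.
  void truncate(const std::string& name) {
    auto it = tables_.find(name);
    assert(it != tables_.end() && "table must exist");
    it->second = std::make_shared<Rows>();
  }

 private:
  std::map<std::string, std::shared_ptr<Rows>> tables_;
};
```

A reader that called open() before truncate() keeps iterating over the old rows, while any open() after the truncate sees the empty table. The multi-versioned dictionary cache mentioned above would go further than this sketch, because it would also have to serve the old version to transactions that first open the table after the TRUNCATE.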

Comment by Marko Mäkelä [ 2018-08-02 ]

I believe that we need a twofold fix:

  1. Implement TRUNCATE TABLE as a combination of renaming the table to #sql name and creating one with the original name, in a single transaction. Then, issue DROP TABLE for the old copy (this can be executed in the background).
  2. Implement undo tablespace truncation as a single mini-transaction that rewrites the first few pages (including FSP_SIZE in the first page), then trims the file size. Make sure that recovery (and backup) will ignore old redo log records for pages that were after the trimmed end of the file. To do this, we can write an MLOG_FILE_CREATE2 record with the new size as the page number (instead of writing 0). The MLOG_FILE_CREATE2 records were previously parsed but ignored during recovery and backup.
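The recovery rule in item 2 (ignore older redo log records for pages at or beyond the trimmed end of the file) can be sketched as a filter over the log. This is only an illustration under stated assumptions: RedoRecord, RecType and records_to_apply() are hypothetical names, the record layout is not InnoDB's, and a FileCreate2 record is assumed to carry the new size in pages in page_no, with 0 meaning "no truncation".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical, heavily simplified redo log record.
enum class RecType { PageWrite, FileCreate2 };

struct RedoRecord {
  std::uint64_t lsn;
  RecType type;
  std::uint32_t space_id;
  std::uint32_t page_no;  // FileCreate2: new size in pages, 0 = no truncation
};

// Returns the records that recovery (or backup) still has to apply.
// A page write is dropped if a later truncation of the same tablespace
// trimmed the file to a size that no longer contains that page.
std::vector<RedoRecord> records_to_apply(const std::vector<RedoRecord>& log) {
  // The sketch relies on the log being ordered by LSN.
  assert(std::is_sorted(log.begin(), log.end(),
                        [](const RedoRecord& a, const RedoRecord& b) {
                          return a.lsn < b.lsn;
                        }));

  // Smallest truncated size seen so far among later records, per tablespace.
  std::map<std::uint32_t, std::uint32_t> min_later_size;
  std::vector<RedoRecord> kept;

  // Walk backwards so that every truncation is seen before the older
  // page writes that it invalidates.
  for (auto it = log.rbegin(); it != log.rend(); ++it) {
    auto found = min_later_size.find(it->space_id);
    if (it->type == RecType::FileCreate2) {
      if (it->page_no != 0) {
        if (found == min_later_size.end()) {
          min_later_size.emplace(it->space_id, it->page_no);
        } else {
          found->second = std::min(found->second, it->page_no);
        }
      }
      kept.push_back(*it);  // still needed in order to trim the file
      continue;
    }
    if (found != min_later_size.end() && it->page_no >= found->second) {
      continue;  // page was beyond the trimmed end of the file: ignore
    }
    kept.push_back(*it);
  }
  std::reverse(kept.begin(), kept.end());
  return kept;
}
```

Page writes logged after the truncation are kept even if they are beyond the new size, because the file may legitimately have been extended again by then.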

In this way, the TruncateLogger and some related code can be removed. But we will have to keep TruncateLogParser in order to be able to crash-upgrade from MariaDB Server 10.2 or 10.3 prior to this fix.

In MariaDB Server 10.2, TRUNCATE will no longer be crash-safe

MariaDB Server 10.2 is affected by MDEV-14717 (RENAME TABLE in InnoDB is not crash-safe).

If the server is killed in the middle of the BEGIN; RENAME; CREATE; COMMIT; transaction, after recovery we could end up with the table not being truncated, and with the data file having been renamed to #sql-ib….ibd. Some manual recovery would then be needed, such as renaming the .frm file to match the .ibd file name and then executing RENAME TABLE `#mysql50##sql-ib…` TO original_table_name;.

If the server is killed before the original table (#sql-ib….ibd) is dropped, then the table would remain orphaned after recovery. It could be dropped by copying the .frm file and then issuing DROP TABLE `#mysql50##sql-ib…`;.

MariaDB Server 10.3 is not affected by these issues, because there RENAME operations will be correctly rolled back, and #sql tables will be dropped on startup (MDEV-14585).

Comment by Marko Mäkelä [ 2018-08-02 ]

truncate.patch is a work-in-progress patch. The crash recovery for MLOG_FILE_CREATE2 records has not yet been implemented, and ALTER TABLE…TRUNCATE PARTITION is crashing due to a name mismatch in INNOBASE_SHARE. I think that we must remove INNOBASE_SHARE first, in MDEV-16557.

Comment by Marko Mäkelä [ 2018-08-16 ]

Mariabackup starting with 10.2.18 and 10.3.10 will refuse operation if any MLOG_TRUNCATE record was written (by the incompatible implementation of TRUNCATE TABLE). Unfortunately we cannot easily detect if the incompatible form of undo log tablespace truncation was attempted.

I plan to implement both undo log tablespace truncation and TRUNCATE TABLE in a backup-safe way in the first affected series (MariaDB Server 10.2).

Comment by Marko Mäkelä [ 2018-08-28 ]

There was an issue with mysql.gtid_slave_pos. With the old truncate, it was not a problem to have open table handles lingering around:

diff --git a/sql/rpl_gtid.cc b/sql/rpl_gtid.cc
index 2a0ac9a465f..c933ad4a0ab 100644
--- a/sql/rpl_gtid.cc
+++ b/sql/rpl_gtid.cc
@@ -402,6 +402,8 @@ rpl_slave_state::truncate_state_table(THD *thd)
                        NULL, TL_WRITE);
   if (!(err= open_and_lock_tables(thd, &tlist, FALSE, 0)))
   {
+    tdc_remove_table(thd, TDC_RT_REMOVE_UNUSED, "mysql",
+                     rpl_gtid_slave_state_table_name.str, false);
     err= tlist.table->file->ha_truncate();
 
     if (err)

Also, mroonga must pass the table options, because ha_innobase::truncate() will be calling ha_innobase::create():

diff --git a/storage/mroonga/ha_mroonga.cpp b/storage/mroonga/ha_mroonga.cpp
index b4bfc152053..4c63e95a364 100644
--- a/storage/mroonga/ha_mroonga.cpp
+++ b/storage/mroonga/ha_mroonga.cpp
@@ -12859,13 +12859,22 @@ int ha_mroonga::delete_all_rows()
 int ha_mroonga::wrapper_truncate()
 {
   int error = 0;
+  MRN_SHARE *tmp_share;
   MRN_DBUG_ENTER_METHOD();
+
+  if (!(tmp_share = mrn_get_share(table->s->table_name.str, table, &error)))
+    DBUG_RETURN(error);
+
   MRN_SET_WRAP_SHARE_KEY(share, table->s);
   MRN_SET_WRAP_TABLE_KEY(this, table);
-  error = wrap_handler->ha_truncate();
+  error = parse_engine_table_options(ha_thd(), tmp_share->hton, table->s)
+    ? MRN_GET_ERROR_NUMBER
+    : wrap_handler->ha_truncate();
   MRN_SET_BASE_SHARE_KEY(share, table->s);
   MRN_SET_BASE_TABLE_KEY(this, table);
 
+  mrn_free_share(tmp_share);
+
   if (!error && wrapper_have_target_index()) {
     error = wrapper_truncate_index();
   }

Comment by Marko Mäkelä [ 2018-08-28 ]

I have pushed this to bb-10.2-marko for testing. There are 2 open issues:

  1. Undo tablespace truncation (which is disabled by default) is generating a huge mini-transaction (more than 1 megabyte of log), which will cause crash recovery to hang when using a small buffer pool (8 megabytes).
  2. TRUNCATE TABLE is not crash-safe before MDEV-14717 (crash-safe RENAME TABLE inside InnoDB). If the server is killed, we might end up with the tablename.ibd being called #sql-ib….ibd, and only DROP TABLE would be allowed.
    We might also end up having an orphan table #sql-ib….

These issues could be resolved by implementing changes that will break crash-downgrade to earlier versions in the 10.2 or 10.3 series:

  1. Port MDEV-14717 to 10.2. (This change is already present in bb-10.2-ext and 10.3.) This will change the insert_undo log format, breaking crash-downgrade to earlier 10.2. Normal downgrade should not be affected, because DDL operations are always committed before shutdown, and the insert_undo log will be discarded at transaction commit.
  2. Introduce a more compact redo log format for undo tablespace truncation. This would break crash-downgrade to earlier 10.2 or 10.3 versions. Crash-downgrade with the current code should be possible; the only caveat is that the undo tablespace file size would not be shrunk in the recovery by earlier versions.

Comment by Marko Mäkelä [ 2018-08-31 ]

For the record, this would also fix the following hang, which I observed in innodb_zip.wl6501_scale_1:

10.2 206528f722799b04708c60a71b59d75bd32bdeb3

#7  0x00005654d44a76d0 in rw_lock_x_lock_wait_func (lock=0x5654d7b36c40, 
    pass=0, threshold=0, 
    file_name=0x5654d4ad0268 "/mariadb/10.2m/storage/innobase/btr/btr0sea.cc", 
    line=1259) at /mariadb/10.2m/storage/innobase/sync/sync0rw.cc:477
#8  0x00005654d44a7822 in rw_lock_x_lock_low (lock=0x5654d7b36c40, pass=0, 
    file_name=0x5654d4ad0268 "/mariadb/10.2m/storage/innobase/btr/btr0sea.cc", 
    line=1259) at /mariadb/10.2m/storage/innobase/sync/sync0rw.cc:541
#9  0x00005654d44a7be4 in rw_lock_x_lock_func (lock=0x5654d7b36c40, pass=0, 
    file_name=0x5654d4ad0268 "/mariadb/10.2m/storage/innobase/btr/btr0sea.cc", 
    line=1259) at /mariadb/10.2m/storage/innobase/sync/sync0rw.cc:692
#10 0x00005654d4540631 in btr_search_drop_page_hash_index (
    block=0x7f26b36d3278)
    at /mariadb/10.2m/storage/innobase/btr/btr0sea.cc:1259
#11 0x00005654d457cb46 in buf_LRU_free_page (bpage=0x7f26b36d3278, zip=true)
    at /mariadb/10.2m/storage/innobase/buf/buf0lru.cc:1767
#12 0x00005654d455dc47 in buf_page_io_complete (bpage=0x7f26b36d3278, 
    dblwr=false, evict=true)
    at /mariadb/10.2m/storage/innobase/buf/buf0buf.cc:6297
#13 0x00005654d45e4442 in fil_aio_wait (segment=5)
    at /mariadb/10.2m/storage/innobase/fil/fil0fil.cc:5290
#14 0x00005654d449ab11 in io_handler_thread (arg=0x5654d5203368 <n+40>)
    at /mariadb/10.2m/storage/innobase/srv/srv0start.cc:337

The above I/O thread is waiting for an adaptive hash index latch, which is being held by TRUNCATE (which appears to have initiated the flush and eviction):

#2  0x00005654d4579d31 in buf_flush_dirty_pages (buf_pool=0x5654d7ac6b30, 
    id=55, observer=0x0) at /mariadb/10.2m/storage/innobase/buf/buf0lru.cc:694
#3  0x00005654d4579e32 in buf_LRU_flush_or_remove_pages (id=55, observer=0x0)
    at /mariadb/10.2m/storage/innobase/buf/buf0lru.cc:712
#4  0x00005654d45dd5b4 in fil_reinit_space_header_for_table (
    table=0x7f265005c298, size=7, trx=0x7f26b8172138)
    at /mariadb/10.2m/storage/innobase/fil/fil0fil.cc:3185
#5  0x00005654d446df5e in row_truncate_table_for_mysql (table=0x7f265005c298, 
    trx=0x7f26b8172138)
    at /mariadb/10.2m/storage/innobase/row/row0trunc.cc:2036
#6  0x00005654d4302650 in ha_innobase::truncate (this=0x7f2650035f40)
    at /mariadb/10.2m/storage/innobase/handler/ha_innodb.cc:13073

Comment by Marko Mäkelä [ 2018-09-05 ]

bb-10.4-marko removes code for supporting crash-upgrade of TRUNCATE TABLE or undo tablespace truncation from pre-MDEV-13564 10.2 or 10.3 to 10.4.
To play it safe, I think that 10.4 should refuse crash-upgrade from 10.2 or 10.3 where a MDEV-13564 fix is not present. While we can detect the occurrence of unsupported TRUNCATE TABLE by the presence of an MLOG_TRUNCATE record, undo tablespace truncation appears to be signalled by the presence of a separate log file only. We can implement this by introducing a redo log format subtype that indicates whether the MDEV-13564 fix is present. Older versions would ignore this subtype byte and keep working.

Comment by Marko Mäkelä [ 2018-09-06 ]

To make the new TRUNCATE crash-safe in 10.2, I backported MDEV-14717, MDEV-14378, a follow-up to MDEV-13407, and MDEV-14585 to bb-10.2-marko.

With this, a crash-downgrade of a RENAME (or TRUNCATE or table-rebuilding ALTER TABLE or OPTIMIZE TABLE) operation to an earlier 10.2 version would trigger a debug assertion failure during rollback, in trx_roll_pop_top_rec_of_trx():

		ut_ad(undo == update || undo == temp);

In a non-debug build, this would cause the undo log record to be misinterpreted as an update. The table name would be misinterpreted as DB_TRX_ID,DB_ROLL_PTR and the PRIMARY KEY of the table. In the highly unlikely event that a record is found, the execution would be aborted in row_undo_mod(), on the switch (node->rec_type). Normally, the non-debug build would crash inside ha_innobase::open():

#2  0x00005555559702b7 in ut_dbg_assertion_failed (
    expr=expr@entry=0x555556101bac "table2 == NULL", 
    file=file@entry=0x5555561005a8 "/mariadb/10.2m/storage/innobase/dict/dict0dict.cc", line=line@entry=1319)
    at /mariadb/10.2m/storage/innobase/ut/ut0dbg.cc:61
#3  0x0000555555e5a21e in dict_table_add_to_cache (
    table=table@entry=0x7fff98017790, 
    can_be_evicted=can_be_evicted@entry=true, heap=heap@entry=0x7fff9801c8d0)
    at /mariadb/10.2m/storage/innobase/dict/dict0dict.cc:1319
#4  0x0000555555e6bec5 in dict_load_table_one(table_name_t&, bool, dict_err_ignore_t, std::deque<char const*, ut_allocator<char const*, true> >&) ()
    at /mariadb/10.2m/storage/innobase/dict/dict0load.cc:3013
#5  0x0000555555e6c4c2 in dict_load_table(char const*, bool, dict_err_ignore_t) () at /mariadb/10.2m/storage/innobase/dict/dict0load.cc:2810
#6  0x0000555555e5fd38 in dict_table_open_on_name(char const*, unsigned long, unsigned long, dict_err_ignore_t) ()
    at /mariadb/10.2m/storage/innobase/dict/dict0dict.cc:1170
#7  0x0000555555ceca9b in ha_innobase::open_dict_table (
    table_name=<optimized out>, norm_name=0x7ffff4fcad20 "test/t1", 
    is_partition=<optimized out>, ignore_err=DICT_ERR_IGNORE_NONE)
    at /mariadb/10.2m/storage/innobase/handler/ha_innodb.cc:6552
#8  0x0000555555cfb00e in ha_innobase::open(char const*, int, unsigned int) ()
    at /mariadb/10.2m/storage/innobase/handler/ha_innodb.cc:6216

I am not sure if this really counts as a regression. 10.2 without MDEV-14717 would not have crash-safe RENAME to begin with.

Comment by Marko Mäkelä [ 2018-09-06 ]

We can prevent a crash-downgrade to earlier MariaDB 10.2 versions by changing the InnoDB redo log format identifier to the 10.3 identifier, and by introducing a subformat identifier so that 10.2 can continue to refuse crash-downgrade from 10.3 or later. After a clean shutdown, a downgrade to MariaDB 10.2.13 or later would still be possible. Older MariaDB 10.2 versions are missing MDEV-14909, and a downgrade would only be possible after removing the log files (not recommended).
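The format/subformat check described above could look roughly like the following. This is a sketch under assumptions, not the actual InnoDB startup code: the struct and function names, the numeric format values and the subformat encoding are all illustrative placeholders, an older format is simply assumed to be recoverable, and the "older 10.2" model stands for 10.2.13 or later (which can start from a clean log).

```cpp
#include <cassert>
#include <cstdint>

// Illustrative placeholder values, not taken from the InnoDB source.
constexpr std::uint32_t kFormatOld10_2 = 1;
constexpr std::uint32_t kFormat10_3 = 103;

// Hypothetical view of the redo log header.
struct RedoLogHeader {
  std::uint32_t format;
  std::uint32_t subformat;  // assumed: 1 = written with the new TRUNCATE
  bool clean;               // true after a clean shutdown: nothing to recover
};

// Hypothetical description of what a given server version understands.
struct ServerModel {
  std::uint32_t format;     // newest format this server can recover
  bool reads_subformat;     // older servers ignore the subformat field
  std::uint32_t subformat;  // required subformat if reads_subformat is set
};

enum class Decision { StartClean, Recover, Refuse };

Decision check_redo_log(const ServerModel& s, const RedoLogHeader& h) {
  if (h.clean) return Decision::StartClean;  // normal downgrade stays possible
  if (h.format > s.format) return Decision::Refuse;  // crash-downgrade
  if (h.format == s.format && s.reads_subformat &&
      h.subformat != s.subformat) {
    return Decision::Refuse;  // same format id, but written by another series
  }
  return Decision::Recover;  // assumption: older formats can be recovered
}

void example_usage() {
  const ServerModel older_10_2{kFormatOld10_2, false, 0};
  const ServerModel patched_10_2{kFormat10_3, true, 1};
  const RedoLogHeader crashed_patched_10_2{kFormat10_3, 1, false};
  const RedoLogHeader crashed_10_3{kFormat10_3, 0, false};

  // An older 10.2 refuses to recover a log written by the patched 10.2.
  assert(check_redo_log(older_10_2, crashed_patched_10_2) == Decision::Refuse);
  // The patched 10.2 recovers its own log but still refuses a 10.3 log.
  assert(check_redo_log(patched_10_2, crashed_patched_10_2) == Decision::Recover);
  assert(check_redo_log(patched_10_2, crashed_10_3) == Decision::Refuse);
}
```

The point of the subformat is visible in the last two checks: once 10.2 writes the 10.3 format identifier, the identifier alone can no longer distinguish a 10.2 log from a 10.3 log.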

Comment by Marko Mäkelä [ 2018-09-07 ]

Even though I prepared a fix for 10.2, I decided not to push it yet, because I fear that MDEV-17158 could occasionally cause loss of data when InnoDB is killed during a table-rebuilding operation, such as TRUNCATE or ALTER or OPTIMIZE.

Comment by Marko Mäkelä [ 2018-09-13 ]

We decided not to push this to 10.2.18, because it backports a large amount of code to 10.2, which could be risky right before a 10.2 release. So, the backport could be merged to 10.2.19 at the earliest.

Note that the crash-downgrade prevention for 10.2 will prevent Percona Xtrabackup from working with MariaDB Server 10.2 with the backport included. (Xtrabackup already does not work with MariaDB Server 10.3 or later.)

Comment by Marko Mäkelä [ 2018-10-10 ]

bb-10.2-marko
MariaDB 10.2.19 will support the backup-unsafe TRUNCATE TABLE by default, to retain perceived compatibility with xtrabackup.
Undo tablespace truncation will use the redo log, but older versions of the server or older backup tools will fail to shrink the undo tablespace files on recovery.
The backup-safe TRUNCATE can be enabled in MariaDB 10.2 by setting the start-up parameter loose_innodb_unsafe_truncate=OFF. This parameter will not be available in 10.3 or later releases.

Edit: the option was renamed to innodb_safe_truncate.

Comment by Marko Mäkelä [ 2018-10-11 ]

Buildbot seems to be OK with the change. I conducted a manual test of crash-downgrading to mariadb-10.2.18:

./mtr --manual-gdb innodb.truncate_crash
./mtr --manual-gdb innodb_zip.wl6501_crash_3

Once the server is killed by the test, switch to the 10.2.18 executable (using the file command in GDB) and restart (run in GDB).

With the first test (which uses the backup-safe mechanism), the InnoDB in MariaDB Server 10.2.18 would refuse to start up, after emitting the following message to the error log:

2018-10-11 7:58:30 140737330922240 [ERROR] InnoDB: Downgrade after a crash is not supported. The redo log was created with MariaDB 10.2.19.

With the second test, which uses MySQL 5.7’s backup-unsafe but crash-safe TRUNCATE TABLE, the MariaDB 10.2.18 server would parse and apply the old-format redo log and the ib_*_*_trunc.log just fine. That test restarts the server several times. I ran the test twice: first, switching to 10.2.18 on the first restart; second, switching to 10.2.18 on the second restart and then, on subsequent restarts, switching between 10.2.18 and this 10.2.19 revision.

Comment by Marko Mäkelä [ 2018-10-17 ]

The default value for the 10.2-specific parameter will be innodb_safe_truncate=ON.
