MariaDB Server / MDEV-25886

CHECK TABLE crashes due to DB_MISSING_HISTORY in innodb_read_only mode

      Description

      Occasionally, the test innodb.alter_copy fails, with CHECK TABLE reporting DB_MISSING_HISTORY. It first caught my attention during the development of MDEV-25180, which introduced delays to the purge tasks in the form of purge_sys.stop_SYS(). If purge is delayed further during DDL operations, the test fails almost every time. Here is the first occurrence of this kind in a main branch:

      10.6 66165ae2210ac3230850fa60086564db

      CURRENT_TEST: innodb.alter_copy
      mysqltest: At line 84: query 'CHECK TABLE t' failed: <Unknown> (2013): Lost connection to server during query
      2021-05-21 22:21:54 3 [ERROR] [FATAL] InnoDB: Unknown error Required history data has been deleted
      

      I was able to reproduce this, and it affects older versions as well. In 10.2, the transaction isolation level is supposed to be effectively hard-wired as READ UNCOMMITTED when innodb_read_only is set or innodb_force_recovery has the value 4, 5, or 6. However, we actually end up using some other isolation level in trx_undo_prev_version_build():

      	if (trx_undo_get_undo_rec(
      		    roll_ptr, heap, rec_trx_id, index->table->name,
      		    &undo_rec)) {
      		if (v_status & TRX_UNDO_PREV_IN_PURGE) {
      			/* We are fetching the record being purged */
      			undo_rec = trx_undo_get_undo_rec_low(roll_ptr, heap);
      		} else {
      			/* The undo record may already have been purged,
      			during purge or semi-consistent read. */
      			return(false);
      		}
      	}
      

      There are two possible fixes: either make CHECK TABLE explicitly use the READ UNCOMMITTED isolation level, or avoid creating the purge view (purge_sys->view) when purge is not going to be executed, so that the READ UNCOMMITTED mode will be used. The patch below implements the second option for MariaDB 10.2:

      diff --git a/storage/innobase/trx/trx0sys.cc b/storage/innobase/trx/trx0sys.cc
      index 9138e9475bf..19dfa79d0ff 100644
      --- a/storage/innobase/trx/trx0sys.cc
      +++ b/storage/innobase/trx/trx0sys.cc
      @@ -535,7 +535,9 @@ trx_sys_init_at_db_start()
       
       	trx_sys_mutex_exit();
       
      -	trx_sys->mvcc->clone_oldest_view(&purge_sys->view);
      +	if (!high_level_read_only) {
      +		trx_sys->mvcc->clone_oldest_view(&purge_sys->view);
      +	}
       }
       
       /*****************************************************************//**
      

      Alternatively, CHECK TABLE could explicitly use the READ UNCOMMITTED isolation level in this case as well (see MDEV-15418 and MDEV-18952).

      MDEV-15418 claims the following:

      Since MariaDB 10.2 (along with upstream MySQL 5.7), InnoDB already does this [READ UNCOMMITTED] for innodb_read_only=1

      That hard-wiring occurs at a rather low level; for example, lock_clust_rec_cons_read_sees() checks srv_read_only_mode.

      The fix with the least impact and risk would seem to be to only touch CHECK TABLE, starting with 10.3, where MDEV-15418 first touched this code:

      diff --git a/storage/innobase/dict/dict0dict.cc b/storage/innobase/dict/dict0dict.cc
      index 10b878c0e49..1d0e2af3cd6 100644
      --- a/storage/innobase/dict/dict0dict.cc
      +++ b/storage/innobase/dict/dict0dict.cc
      @@ -5425,7 +5425,7 @@ dict_set_corrupted(
       
       	/* If this is read only mode, do not update SYS_INDEXES, just
       	mark it as corrupted in memory */
      -	if (srv_read_only_mode) {
      +	if (high_level_read_only) {
       		index->type |= DICT_CORRUPT;
       		goto func_exit;
       	}
      diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
      index 4e63edc65c2..65039919793 100644
      --- a/storage/innobase/handler/ha_innodb.cc
      +++ b/storage/innobase/handler/ha_innodb.cc
      @@ -14796,10 +14796,9 @@ ha_innobase::check(
       
       	/* We must run the index record counts at an isolation level
       	>= READ COMMITTED, because a dirty read can see a wrong number
      -	of records in some index; to play safe, we use always
      -	REPEATABLE READ here (except when undo logs are unavailable) */
      -	m_prebuilt->trx->isolation_level = srv_force_recovery
      -		>= SRV_FORCE_NO_UNDO_LOG_SCAN
      +	of records in some index; to play safe, we normally use
      +	REPEATABLE READ here */
      +	m_prebuilt->trx->isolation_level = high_level_read_only
       		? TRX_ISO_READ_UNCOMMITTED
       		: TRX_ISO_REPEATABLE_READ;
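
      To make the effect of the ha_innobase::check() change easier to see, here is a minimal, self-contained sketch of the isolation level choice before and after the patch. It is a simplified model, not server code: ServerMode, Isolation, kForceNoUndoLogScan and the two check_isolation_* functions are hypothetical names, the value 5 for SRV_FORCE_NO_UNDO_LOG_SCAN is an assumption, and high_level_read_only() simply encodes the condition stated in the description (innodb_read_only set, or innodb_force_recovery of 4, 5, or 6).

```cpp
#include <cassert>

// Simplified, hypothetical model of the server state; these are not the
// real InnoDB globals.
struct ServerMode {
    bool innodb_read_only;           // models the innodb_read_only setting
    unsigned innodb_force_recovery;  // models innodb_force_recovery, 0..6
};

enum class Isolation { ReadUncommitted, RepeatableRead };

// Assumed numeric value of SRV_FORCE_NO_UNDO_LOG_SCAN.
constexpr unsigned kForceNoUndoLogScan = 5;

// Assumption taken from the description: the server is "high-level
// read-only" when innodb_read_only is set or innodb_force_recovery
// is 4, 5, or 6.
constexpr bool high_level_read_only(const ServerMode& m) {
    return m.innodb_read_only || m.innodb_force_recovery >= 4;
}

// Choice made by CHECK TABLE before the patch: only a high
// innodb_force_recovery value selects READ UNCOMMITTED.
constexpr Isolation check_isolation_before(const ServerMode& m) {
    return m.innodb_force_recovery >= kForceNoUndoLogScan
        ? Isolation::ReadUncommitted
        : Isolation::RepeatableRead;
}

// Choice made by CHECK TABLE after the patch: any high-level read-only
// mode selects READ UNCOMMITTED.
constexpr Isolation check_isolation_after(const ServerMode& m) {
    return high_level_read_only(m)
        ? Isolation::ReadUncommitted
        : Isolation::RepeatableRead;
}

// Runtime check of the case from this report: innodb_read_only=1 with
// innodb_force_recovery=0.
inline void self_check() {
    const ServerMode read_only_server{true, 0};
    assert(check_isolation_before(read_only_server)
           == Isolation::RepeatableRead);
    assert(check_isolation_after(read_only_server)
           == Isolation::ReadUncommitted);
}
```

      In this model, a server started with innodb_read_only=1 and innodb_force_recovery=0 would run CHECK TABLE at REPEATABLE READ before the patch and at READ UNCOMMITTED after it, which is the case this report is about. A normal read-write server keeps REPEATABLE READ in both variants.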
       
      

              People

              Assignee:
              marko Marko Mäkelä
              Reporter:
              marko Marko Mäkelä